Chapter 10 Statistical Inferences Based on Two Samples

Statistical Inferences Based on Two Samples
• 10.1 Comparing Two Population Means by Using Independent Samples: Variances Known
• 10.2 Comparing Two Population Means by Using Independent Samples: Variances Unknown
• 10.3 Paired Difference Experiments
• 10.4 Comparing Two Population Proportions by Using Large, Independent Samples
• 10.5 Comparing Two Population Variances by Using Independent Samples

Comparing Two Population Means by Using Independent Samples: Variances Known
• Suppose a random sample has been taken from each of two different populations
• Suppose that the populations are independent of each other
  – Then the random samples are independent of each other
• Then the sampling distribution of the difference in sample means is normally distributed

Sampling Distribution of the Difference of Two Sample Means #1
• Suppose population 1 has mean µ1 and variance σ1²
  – From population 1, a random sample of size n1 is selected which has mean x̄1 and variance s1²
• Suppose population 2 has mean µ2 and variance σ2²
  – From population 2, a random sample of size n2 is selected which has mean x̄2 and variance s2²
• Then the sampling distribution of the difference of two sample means …

Sampling Distribution of the Difference of Two Sample Means #2
• Is normal, if each of the sampled populations is normal
  – Approximately normal if the sample sizes n1 and n2 are large
• Has mean µ(x̄1 – x̄2) = µ1 – µ2
• Has standard deviation
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Sampling Distribution of the Difference of Two Sample Means #3
[Figure: the sampling distribution of x̄1 – x̄2]

z-Based Confidence Interval for the Difference in Means (Variances Known)
• Let x̄1 be the mean of a sample of size n1 that has been randomly selected from a population with mean µ1 and standard deviation σ1
• Let x̄2 be the mean of a sample of size n2 that has been randomly selected from a population with mean µ2 and standard deviation σ2
• Suppose each sampled population is normally distributed or that the sample sizes n1 and n2 are large
• Suppose the samples are independent of each other, then …

z-Based Confidence Interval for the Difference in Means Continued
• A 100(1 – α) percent confidence interval for the difference in population means µ1 – µ2 is
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Example 10.1 The Bank Customer Waiting Time Case #1
• A random sample of 100 waiting times observed under the current system of serving customers has a sample mean of x̄1 = 8.79 minutes
  – Call this population 1
  – Assume population 1 is normal or sample size is large
  – The variance is σ1² = 4.7
• A random sample of 100 waiting times observed under the new system of serving customers has a sample mean of x̄2 = 5.14 minutes
  – Call this population 2
  – Assume population 2 is normal or sample size is large
  – The variance is σ2² = 1.9
• Then if the samples are independent …

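Example 10.1 The Bank Customer Waiting Time Case #2
• At 95% confidence, zα/2 = z0.025 = 1.96, and
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = (8.79 - 5.14) \pm 1.96\sqrt{\frac{4.7}{100} + \frac{1.9}{100}} = 3.65 \pm 0.5035 = [3.15,\ 4.15]$$
• According to the calculated interval, the bank manager can be 95% confident that the new system reduces the mean waiting time by between 3.15 and 4.15 minutes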
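The interval in Example 10.1 can be checked with a few lines of Python. This is only a sketch under the slide's assumptions (known variances, independent samples); the function name `z_interval` and its parameter names are made up for illustration, and the critical value 1.96 is taken from the slide rather than computed.

```python
import math

def z_interval(xbar1, xbar2, var1, var2, n1, n2, z_crit):
    """z-based interval for mu1 - mu2 when both population variances are known."""
    # Standard deviation of the sampling distribution of xbar1 - xbar2
    se = math.sqrt(var1 / n1 + var2 / n2)
    diff = xbar1 - xbar2
    margin = z_crit * se
    return diff - margin, diff + margin

# Bank waiting time data from Example 10.1; 1.96 is z_0.025 from the slide
low, high = z_interval(8.79, 5.14, 4.7, 1.9, 100, 100, 1.96)
print(round(low, 2), round(high, 2))
```

The rounded endpoints agree with the 3.15 and 4.15 minutes reported in the example.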

z-Based Test About the Difference in Means (Variances Known)
• Test the null hypothesis H0: µ1 – µ2 = D0
  – D0 = µ1 – µ2 is the claimed difference between the population means
  – D0 is a number whose value varies depending on the situation
  – Often D0 = 0, and the null means that there is no difference between the population means

z-Based Test About the Difference in Means (Variances Known)
• Use the notation from the confidence interval statement on a prior slide
• Assume that each sampled population is normal or that the sample sizes n1 and n2 are large

Test Statistic (Variances Known)
• The test statistic is
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
• The sampling distribution of this statistic is a standard normal distribution
• If the populations are normal and the samples are independent …

z-Based Test About the Difference in Means (Variances Known)
• Reject H0: µ1 – µ2 = D0 in favor of a particular alternative hypothesis at level of significance α if the appropriate rejection point rule holds or if the corresponding p-value is less than α
• Rules are on the next slide …

z-Based Test About the Difference in Means (Variances Known) Continued
Alternative          Reject H0 if     p-value
Ha: µ1 – µ2 > D0     z > zα           Area under the standard normal curve to the right of z
Ha: µ1 – µ2 < D0     z < –zα          Area under the standard normal curve to the left of z
Ha: µ1 – µ2 ≠ D0     |z| > zα/2 *     Twice the area under the standard normal curve to the right of |z|
* Either z > zα/2 or z < –zα/2

Example 10.2 Customer Waiting Time Case #1
• Test the claim that the new system reduces the mean waiting time
• Test at the α = 0.05 significance level the null H0: μ1 – μ2 = 0 against the alternative Ha: μ1 – μ2 > 0
  – Use the rejection rule: reject H0 if z > zα
  – At the 5% significance level, zα = z0.05 = 1.645
  – So reject H0 if z > 1.645

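Example 10.2 Customer Waiting Time Case #2
• Use the sample and population data in Example 10.1 to calculate the test statistic
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{(8.79 - 5.14) - 0}{\sqrt{\dfrac{4.7}{100} + \dfrac{1.9}{100}}} = \frac{3.65}{0.2569} = 14.21$$
• Because z = 14.21 > z0.05 = 1.645, reject H0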

Example 10.2 Customer Waiting Time Case #3
• Conclude μ1 – μ2 is greater than 0, so the new system does reduce the mean waiting time
  – The point estimate of the reduction in mean waiting time is x̄1 – x̄2 = 3.65 minutes
  – Note 3.65 is the center of the 95% confidence interval from Example 10.1
• The p-value for this test is the area under the standard normal curve to the right of 14.21
  – This z value is off the table, so the p-value has to be much less than 0.001
  – So, we have extremely strong evidence that H0 is false and that Ha is true
  – Therefore, we have extremely strong evidence the new system reduces the mean waiting time

Example 10.2 Customer Waiting Time Case #4
• The new system will be implemented only if it reduces mean waiting time by more than 3 minutes
• Set D0 = 3, and try to reject the null H0: μ1 – μ2 = 3 in favor of the alternative Ha: μ1 – μ2 > 3
• Calculate the test statistic:
$$z = \frac{(8.79 - 5.14) - 3}{\sqrt{\dfrac{4.7}{100} + \dfrac{1.9}{100}}} = \frac{0.65}{0.2569} = 2.53$$

Example 10.2 Customer Waiting Time Case #5
• Because z = 2.53 > z0.05 = 1.645, reject H0 in favor of Ha
• The p-value for this test is the area under the standard normal curve to the right of z = 2.53
  – With Table A.3, the p-value is 0.5 – 0.4943 = 0.0057
  – There is strong evidence against H0
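Both tests in Example 10.2 (D0 = 0 and D0 = 3) can be reproduced as follows. This is an illustrative sketch, not part of the original slides: the helper names `z_statistic` and `upper_tail_p` are hypothetical, and the upper-tail normal area is obtained from `math.erfc` instead of Table A.3.

```python
import math

def z_statistic(xbar1, xbar2, var1, var2, n1, n2, d0=0.0):
    """z statistic for H0: mu1 - mu2 = d0 with known population variances."""
    se = math.sqrt(var1 / n1 + var2 / n2)
    return ((xbar1 - xbar2) - d0) / se

def upper_tail_p(z):
    """Area under the standard normal curve to the right of z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Bank waiting time data from Example 10.1
z_no_diff = z_statistic(8.79, 5.14, 4.7, 1.9, 100, 100, d0=0.0)
z_three = z_statistic(8.79, 5.14, 4.7, 1.9, 100, 100, d0=3.0)
p_three = upper_tail_p(z_three)

print(round(z_no_diff, 2), round(z_three, 2), round(p_three, 4))
```

Both statistics exceed z0.05 = 1.645, and the p-value for D0 = 3 agrees with the table-based value of 0.0057 to four decimal places.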

Comparing Two Population Means by Using Independent Samples: Variances Unknown
• Generally, the true values of the population variances σ1² and σ2² are not known
• They have to be estimated from the sample variances s1² and s2², respectively

Comparing Two Population Means by Using Independent Samples: Variances Unknown #2
• Also need to estimate the standard deviation of the sampling distribution of the difference between sample means
• Two approaches:
  1. If it can be assumed that σ1² = σ2² = σ², then calculate the “pooled estimate” of σ²
  2. If σ1² ≠ σ2², then use approximate methods

Pooled Estimate of σ²
• Assume that σ1² = σ2² = σ²
• The pooled estimate of σ² is the weighted average of the two sample variances, s1² and s2²
• The pooled estimate of σ² is denoted by sp²
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
• The estimate of the standard deviation of the sampling distribution of x̄1 – x̄2 is
$$\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

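t-Based Confidence Interval for the Difference in Means (Variances Unknown)
• Select independent random samples from two normal populations with equal variances
• A 100(1 – α) percent confidence interval for the difference in population means µ1 – µ2 is
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
• where
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
• and tα/2 is based on (n1 + n2 – 2) degrees of freedom (df)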

Example 10.3 Catalyst Comparison Case
• The difference in mean hourly yields of a chemical process for catalysts XA-100 and ZB-200
• Given:
  – n1 = 5
  – x̄1 = 811.0
  – s1² = 386.0
  – n2 = 5
  – x̄2 = 750.2
  – s2² = 484.2

Example 10.3 Catalyst Comparison Case #2
• Assume that populations of all possible hourly yields for the two catalysts are both normal with the same variance
• The pooled estimate of σ² is
$$s_p^2 = \frac{(5 - 1)(386) + (5 - 1)(484.2)}{5 + 5 - 2} = 435.1$$
• Let μ1 be the mean hourly yield of catalyst 1 and let μ2 be the mean hourly yield of catalyst 2

Example 10.3 Catalyst Comparison Case #3
• Want the 95% confidence interval for μ1 – μ2
• df = (n1 + n2 – 2) = (5 + 5 – 2) = 8
• At 95% confidence, tα/2 = t0.025
• For 8 degrees of freedom, t0.025 = 2.306
• The 95% confidence interval is:
$$(811 - 750.2) \pm 2.306\sqrt{435.1\left(\frac{1}{5} + \frac{1}{5}\right)} = 60.8 \pm 30.4217 = [30.38,\ 91.22]$$

Example 10.3 Catalyst Comparison Case #4
• So we can be 95% confident that the mean hourly yield from catalyst 1 is between 30.38 and 91.22 pounds higher than that of catalyst 2
• The point estimate of the difference in mean hourly yields is 60.8 pounds
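The pooled-variance interval of Example 10.3 can be sketched in Python as below. The function names are hypothetical, the critical value 2.306 is copied from the slide rather than computed, and the second sample variance is taken as 484.2, the value listed with the data of Example 10.11.

```python
import math

def pooled_variance(s1_sq, s2_sq, n1, n2):
    """Weighted average of the two sample variances (equal-variance assumption)."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

def pooled_t_interval(xbar1, xbar2, s1_sq, s2_sq, n1, n2, t_crit):
    """t-based interval for mu1 - mu2 using the pooled variance estimate."""
    sp_sq = pooled_variance(s1_sq, s2_sq, n1, n2)
    se = math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    diff = xbar1 - xbar2
    return diff - t_crit * se, diff + t_crit * se

# Catalyst data; 484.2 is the s2^2 value given with Example 10.11 (assumption)
sp_sq = pooled_variance(386.0, 484.2, 5, 5)
low, high = pooled_t_interval(811.0, 750.2, 386.0, 484.2, 5, 5, t_crit=2.306)
print(round(sp_sq, 1), round(low, 2), round(high, 2))
```

With these inputs the pooled variance is 435.1 and the endpoints round to 30.38 and 91.22 pounds, matching the example.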

Test Statistic (Variances Unknown)
• The test statistic is
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
• where D0 = µ1 – µ2 is the claimed difference between the population means
• The sampling distribution of this statistic is a t distribution with (n1 + n2 – 2) degrees of freedom

t-Based Test About the Difference in Means (Variances Unknown) #1
• Test the null hypothesis H0: μ1 – μ2 = D0
  – D0 = μ1 – μ2 is the claimed difference between the population means
  – D0 is a number whose value varies depending on the situation
  – Often D0 = 0, and the null means that there is no difference between the population means
• Assume that each sampled population is normal with equal variances
• Then if the samples are independent …

t-Based Test About the Difference in Means (Variances Unknown) #2
• Reject H0: µ1 – µ2 = D0 in favor of a particular alternative hypothesis at level of significance α if the appropriate rejection point rule holds or if the corresponding p-value is less than α
• Rules are on the next slide …

t-Based Test About the Difference in Means (Variances Unknown)
Alternative          Reject H0 if     p-value
Ha: µ1 – µ2 > D0     t > tα           Area under the t distribution to the right of t
Ha: µ1 – µ2 < D0     t < –tα          Area under the t distribution to the left of t
Ha: µ1 – µ2 ≠ D0     |t| > tα/2 *     Twice the area under the t distribution to the right of |t|
where tα, tα/2, and p-values are based on (n1 + n2 – 2) degrees of freedom
* Either t > tα/2 or t < –tα/2

Example 10.4 Catalyst Comparison Case
• Test H0: µ1 – µ2 = 0 vs. Ha: µ1 – µ2 ≠ 0 at the α = 0.05 significance level
• Reject H0 if |t| > t0.025 (with 8 degrees of freedom)
• Again, at 95% confidence, tα/2 = t0.025 = 2.306 with 8 degrees of freedom
• Calculate the t statistic:
$$t = \frac{(811 - 750.2) - 0}{\sqrt{435.1\left(\dfrac{1}{5} + \dfrac{1}{5}\right)}} = \frac{60.8}{13.1924} = 4.6087$$

Example 10.4 continued
• Because |t| = 4.6087 is greater than t0.025 = 2.306, reject H0 in favor of Ha
  – Conclude at 5% significance level the mean hourly yields from the two catalysts do differ
• With computer, calculate p-value = 0.0017
• The very small p-value indicates that there is very strong evidence against H0 (that the means are the same)
  – Based on p-value, conclude the same as before, the two catalysts differ in their mean hourly yields
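A short sketch of the pooled t statistic from Example 10.4 follows. The function name `pooled_t_statistic` is made up for illustration, the sample variance 484.2 is the value given with Example 10.11, and the rejection point 2.306 is copied from the slide. The p-value is not computed here because the Python standard library has no t distribution.

```python
import math

def pooled_t_statistic(xbar1, xbar2, s1_sq, s2_sq, n1, n2, d0=0.0):
    """Equal-variance t statistic for H0: mu1 - mu2 = d0."""
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    se = math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    return ((xbar1 - xbar2) - d0) / se

# Catalyst data; 484.2 is the s2^2 value given with Example 10.11 (assumption)
t_stat = pooled_t_statistic(811.0, 750.2, 386.0, 484.2, 5, 5)
t_crit = 2.306  # t_0.025 with 8 degrees of freedom, from the slide
reject_h0 = abs(t_stat) > t_crit
print(round(t_stat, 4), reject_h0)
```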

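t-Based Confidence Intervals and Tests for Differences with Unequal Variances
• If populations are normal, but sample sizes and variances differ substantially, small-sample estimation and testing can be based on these “unequal variance” procedures
• Confidence interval
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
• Test statistic
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$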

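t-Based Confidence Intervals and Tests for Differences with Unequal Variances #2
• For both the confidence interval and hypothesis test, the degrees of freedom are equal to
$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$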
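The degrees of freedom for the unequal-variance procedure can be computed as below. This sketch assumes the slide's missing formula is the usual Welch-Satterthwaite approximation; the function name `welch_df` is hypothetical, and the catalyst sample variances are reused purely as illustrative inputs. Rounding the result down to a whole number is a common conservative convention, shown here as an assumption rather than something the slide states.

```python
import math

def welch_df(s1_sq, s2_sq, n1, n2):
    """Approximate degrees of freedom for the unequal-variance t procedures."""
    a = s1_sq / n1
    b = s2_sq / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

# Illustrative inputs: the catalyst sample variances and sample sizes
df = welch_df(386.0, 484.2, 5, 5)
df_rounded_down = math.floor(df)  # conservative convention (assumption)
print(round(df, 2), df_rounded_down)
```

For these inputs the approximation gives slightly fewer than the 8 degrees of freedom used by the pooled procedure.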

Paired Difference Experiments
• Before, drew random samples from two different populations
• Now, have two different processes (or methods)
• Draw one random sample of units and use those units to obtain the results of each process

Paired Difference Experiments Continued
• For instance, use the same individuals for the results from one process vs. the results from the other process
  – E.g., use the same individuals to compare “before” and “after” treatments
• Using the same individuals eliminates any differences among the individuals themselves, so the comparison reflects only the results from the two processes

Paired Difference Experiments #3
• Let µd be the mean of the population of paired differences
  – µd = µ1 – µ2, where µ1 is the mean of population 1 and µ2 is the mean of population 2
• Let d̄ and sd be the mean and standard deviation of a sample of paired differences that has been randomly selected from the population
  – d̄ is the mean of the differences between pairs of values from both samples

t-Based Confidence Interval for Paired Differences in Means
• If the sampled population of differences is normally distributed with mean µd
• A (1 – α)100% confidence interval for µd = µ1 – µ2 is
$$\bar{d} \pm t_{\alpha/2}\,\frac{s_d}{\sqrt{n}}$$
• where for a sample of size n, tα/2 is based on n – 1 degrees of freedom

Example 10.6 Repair Cost Comparison
• Refer to the data in Table 10.2 in the textbook.
• Sample of n = 7 damaged cars
  – Each damaged car is taken to Garage 1 for its estimated repair cost, and then is taken to Garage 2 for its estimated repair cost
• Mean estimated cost at Garage 1 (in hundreds of dollars): x̄1 = 9.329
• Mean estimated cost at Garage 2 (in hundreds of dollars): x̄2 = 10.129

Example 10.6 Repair Cost Comparison #2
• Sample of n = 7 paired differences
  – d̄ = x̄1 – x̄2 = 9.329 – 10.129 = –0.8
  – sd² = 0.2533
  – sd = 0.5033
• At 95% confidence, want tα/2 with n – 1 = 6 degrees of freedom
  – tα/2 = t0.025 = 2.447

Example 10.6 Repair Cost Comparison #3
• The 95% confidence interval (in hundreds of dollars) is
$$\bar{d} \pm t_{\alpha/2}\,\frac{s_d}{\sqrt{n}} = -0.8 \pm 2.447\left(\frac{0.5033}{\sqrt{7}}\right) = [-1.2654,\ -0.3346]$$
• Can be 95% confident the mean of all paired differences of repair cost estimates at the two garages is between –$126.54 and –$33.46
• Can be 95% confident the mean of all possible repair cost estimates at Garage 1 is between $126.54 and $33.46 less than the mean of all possible repair cost estimates at Garage 2
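The paired-difference interval of Example 10.6 can be checked from the summary statistics alone. This is a sketch with a made-up function name `paired_t_interval`; the critical value 2.447 comes from the slide, and the inputs are in hundreds of dollars.

```python
import math

def paired_t_interval(d_bar, s_d, n, t_crit):
    """t-based interval for the mean of the population of paired differences."""
    margin = t_crit * s_d / math.sqrt(n)
    return d_bar - margin, d_bar + margin

# Summary statistics from Example 10.6 (hundreds of dollars)
low, high = paired_t_interval(d_bar=-0.8, s_d=0.5033, n=7, t_crit=2.447)
print(round(low * 100, 1), round(high * 100, 1))  # endpoints converted to dollars
```

The endpoints agree with the reported –$126.54 and –$33.46 up to rounding of the intermediate values.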

Test Statistic for Paired Differences
• The test statistic is
$$t = \frac{\bar{d} - D_0}{s_d/\sqrt{n}}$$
  – D0 = µ1 – µ2 is the claimed or actual difference between the population means
  – D0 varies depending on the situation
  – Often D0 = 0, meaning that there is no difference between the population means
• The sampling distribution of this statistic is a t distribution with (n – 1) degrees of freedom

Paired Differences Testing
• If the population of differences is normal, we can reject H0: µd = D0 at the α level of significance (probability of Type I error equal to α) if and only if the appropriate rejection point condition holds or, equivalently, if the corresponding p-value is less than α
• Rules are on the next slide …

Paired Differences Testing Rules
Alternative       Reject H0 if     p-value
Ha: µd > D0       t > tα           Area under the t distribution to the right of t
Ha: µd < D0       t < –tα          Area under the t distribution to the left of t
Ha: µd ≠ D0       |t| > tα/2 *     Twice the area under the t distribution to the right of |t|
where tα, tα/2, and p-values are based on (n – 1) degrees of freedom
* Either t > tα/2 or t < –tα/2

Example 10.7 Repair Cost Comparison
• Now, it is believed the two garages cost the same
  – H0: μd = 0
• Test against the alternative that Garage 1 is less expensive than Garage 2, that is, test if μd = μ1 – μ2 is less than zero
  – Ha: μd < 0
• Test at the α = 0.01 significance level.
• Reject if t < –tα, that is, if t < –t0.01
  – With n – 1 = 6 degrees of freedom, t0.01 = 3.143
  – So reject H0 if t < –3.143

Example 10.7 Repair Cost Comparison Continued
• Calculate the t statistic
$$t = \frac{\bar{d} - D_0}{s_d/\sqrt{n}} = \frac{-0.8 - 0}{0.5033/\sqrt{7}} = -4.2053$$
• Since t = –4.2053 is less than –t0.01 = –3.143, reject H0
• Conclude at the α = 0.01 significance level that the mean repair cost at Garage 1 is less than the mean repair cost of Garage 2
• From a computer, for t = –4.2053, the p-value is 0.003
• Because this p-value is very small, there is very strong evidence that H0 should be rejected and that μ1 is actually less than μ2
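The paired t statistic of Example 10.7 can be recomputed from the summary statistics, as sketched below. The name `paired_t_statistic` is hypothetical, and the rejection point 3.143 is taken from the slide rather than computed.

```python
import math

def paired_t_statistic(d_bar, s_d, n, d0=0.0):
    """t statistic for H0: mu_d = d0 based on n paired differences."""
    return (d_bar - d0) / (s_d / math.sqrt(n))

# Summary statistics from Example 10.6
t_stat = paired_t_statistic(d_bar=-0.8, s_d=0.5033, n=7)
t_crit = 3.143  # t_0.01 with 6 degrees of freedom, from the slide
reject_h0 = t_stat < -t_crit  # lower-tailed alternative Ha: mu_d < 0
print(round(t_stat, 3), reject_h0)
```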

Comparing Two Population Proportions by Using Large, Independent Samples
• Select a random sample of size n1 from a population, and let p̂1 denote the proportion of units in this sample that fall into the category of interest
• Select a random sample of size n2 from another population, and let p̂2 denote the proportion of units in this sample that fall into the same category of interest
• Suppose that n1 and n2 are large enough
  – n1·p̂1 ≥ 5, n1·(1 – p̂1) ≥ 5, n2·p̂2 ≥ 5, and n2·(1 – p̂2) ≥ 5

Comparing Two Population Proportions Continued
• Then the population of all possible values of p̂1 – p̂2
  – Has approximately a normal distribution if each of the sample sizes n1 and n2 is large
    • Here, n1 and n2 are large enough so n1·p̂1 ≥ 5, n1·(1 – p̂1) ≥ 5, n2·p̂2 ≥ 5, and n2·(1 – p̂2) ≥ 5
  – Has mean µ(p̂1 – p̂2) = p1 – p2, where p1 and p2 are the population proportions
  – Has standard deviation
$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$

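Confidence Interval for the Difference of Two Population Proportions
• If the random samples are independent of each other, then the following is a 100(1 – α) percent confidence interval for p1 – p2
$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$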

Test Statistic for the Difference of Two Population Proportions
• The test statistic is
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - D_0}{\sigma_{\hat{p}_1 - \hat{p}_2}}$$
  – D0 = p1 – p2 is the claimed or actual difference between the population proportions
  – D0 is a number whose value varies depending on the situation
  – Often D0 = 0, and the null means that there is no difference between the population proportions
• The sampling distribution of this statistic is a standard normal distribution

Testing the Difference of Two Population Proportions #1
• If the samples are independent and the sample sizes are large, we can reject H0: p1 – p2 = D0 at the α level of significance (probability of Type I error equal to α) if and only if the appropriate rejection point condition holds or, equivalently, if the corresponding p-value is less than α
• Rules are on the next slide …

Testing the Difference of Two Population Proportions #2
Alternative          Reject H0 if     p-value
Ha: p1 – p2 > D0     z > zα           Area under the standard normal curve to the right of z
Ha: p1 – p2 < D0     z < –zα          Area under the standard normal curve to the left of z
Ha: p1 – p2 ≠ D0     |z| > zα/2 *     Twice the area under the standard normal curve to the right of |z|
* Either z > zα/2 or z < –zα/2

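Testing the Difference of Two Population Proportions #3
• If D0 = 0, estimate σ(p̂1 – p̂2) by
$$s_{\hat{p}_1 - \hat{p}_2} = \sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
  where p̂ is the combined proportion of units in both samples that fall into the category of interest
• If D0 ≠ 0, estimate σ(p̂1 – p̂2) by
$$s_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$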
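The two standard-error choices can be sketched as follows, assuming the slide's missing formulas are the usual pooled estimate (for D0 = 0) and unpooled estimate (for D0 ≠ 0). The function names and the counts in the usage lines are hypothetical and serve only to show the calling pattern.

```python
import math

def pooled_se(x1, n1, x2, n2):
    """Standard error for H0: p1 - p2 = 0, using the combined sample proportion."""
    p_hat = (x1 + x2) / (n1 + n2)
    return math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))

def unpooled_se(x1, n1, x2, n2):
    """Standard error for H0: p1 - p2 = D0 with D0 != 0."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    return math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

def z_for_proportions(x1, n1, x2, n2, d0=0.0):
    """z statistic, choosing the standard error according to D0."""
    se = pooled_se(x1, n1, x2, n2) if d0 == 0 else unpooled_se(x1, n1, x2, n2)
    return ((x1 / n1 - x2 / n2) - d0) / se

# Hypothetical counts: 60 of 200 units in sample 1, 45 of 200 in sample 2
z_equal = z_for_proportions(60, 200, 45, 200)           # tests D0 = 0
z_shift = z_for_proportions(60, 200, 45, 200, d0=0.02)  # tests D0 = 0.02
print(round(z_equal, 3), round(z_shift, 3))
```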

Example 10.9: The Advertising Media Case
• In Des Moines, 631 of 1,000 consumers were aware of the new product (p̂1 = 0.631)
• In Toledo, 798 of 1,000 consumers were aware (p̂2 = 0.798)
• Samples can be considered large
  – n1·p̂1, n1·(1 – p̂1), n2·p̂2, and n2·(1 – p̂2) all exceed five

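Example 10.9: The Advertising Media Case Continued
• The 95 percent confidence interval for p1 – p2 is
$$(0.631 - 0.798) \pm 1.96\sqrt{\frac{(0.631)(0.369)}{1000} + \frac{(0.798)(0.202)}{1000}} = -0.167 \pm 0.0389 = [-0.2059,\ -0.1281]$$
• We are 95 percent confident that the proportion of Des Moines consumers aware of the product is between 0.2059 and 0.1281 less than the proportion of Toledo consumers aware of the product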
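The interval in Example 10.9 can be reproduced from the raw counts. This sketch assumes the standard large-sample formula with n1 and n2 in the denominators; the function name `proportion_diff_interval` is made up, and 1.96 is the z0.025 value used earlier in the chapter.

```python
import math

def proportion_diff_interval(x1, n1, x2, n2, z_crit):
    """Large-sample interval for p1 - p2 from independent samples."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_crit * se, diff + z_crit * se

# Des Moines: 631 of 1,000 aware; Toledo: 798 of 1,000 aware
low, high = proportion_diff_interval(631, 1000, 798, 1000, z_crit=1.96)
print(round(low, 4), round(high, 4))
```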

Comparing Two Population Variances Using Independent Samples
• Population 1 has variance σ1² and population 2 has variance σ2²
• The null hypothesis H0 is that the variances are the same
  – H0: σ1² = σ2²
• The alternative is that one is smaller than the other
  – The population with the smaller variance has less variable measurements
  – Suppose σ1² > σ2²
  – More usual to state the hypotheses in terms of the ratio of the variances
    • Test H0: σ1²/σ2² = 1 vs. Ha: σ1²/σ2² > 1

Comparing Two Population Variances Using Independent Samples Continued
• Reject H0 in favor of Ha if s1²/s2² is significantly greater than 1
• s1² is the variance of a random sample of size n1 from a population with variance σ1²
• s2² is the variance of a random sample of size n2 from a population with variance σ2²
• To decide how large s1²/s2² must be to reject H0, describe the sampling distribution of s1²/s2²
• The sampling distribution of s1²/s2² is the F distribution

F Distribution
• The F distribution is skewed to the right
• Shape depends on two parameters: the numerator number of degrees of freedom (df1) and the denominator number of degrees of freedom (df2)

The Sampling Distribution of s1²/s2²
• Suppose we randomly select independent samples from two normally distributed populations with variances σ1² and σ2²
• If the null hypothesis H0: σ1²/σ2² = 1 is true, the population of all possible values of s1²/s2² has an F distribution with df1 = (n1 – 1) numerator degrees of freedom and df2 = (n2 – 1) denominator degrees of freedom

F Distribution
• The F point Fα is the point on the horizontal axis under the curve of the F distribution that gives a right-hand tail area equal to α
  – The value of Fα depends on α (the size of the right-hand tail area) and df1 and df2
  – Different F tables for different values of α
    • Table A.5 for α = 0.10
    • Table A.6 for α = 0.05
    • Table A.7 for α = 0.025
    • Table A.8 for α = 0.01

Testing Two Population Variances (One-Tailed > Alternative)
• Independent samples from two normal populations
• Test H0: σ1² = σ2² versus Ha: σ1² > σ2²
• Use the test statistic F = s1²/s2²
• The p-value is the area to the right of this value under the F curve having df1 = (n1 – 1) numerator degrees of freedom and df2 = (n2 – 1) denominator degrees of freedom
• Reject H0 at the α significance level if:
  – F > Fα, or
  – p-value < α

Testing Two Population Variances (One-Tailed < Alternative)
• Independent samples from two normal populations
• Test H0: σ1² = σ2² versus Ha: σ1² < σ2²
• Use the test statistic F = s2²/s1²
• The p-value is the area to the right of this value under the F curve having df1 = (n2 – 1) numerator degrees of freedom and df2 = (n1 – 1) denominator degrees of freedom
• Reject H0 at the α significance level if:
  – F > Fα, or
  – p-value < α

Example 10.11: The Catalyst Comparison Case
• Wish to test whether the variance of catalyst XA-100 is smaller than the variance of ZB-200
  – H0: σ1² = σ2²
  – Ha: σ1² < σ2², which is equivalent to
  – Ha: σ2² > σ1²
• Data
  – n1 = n2 = 5
  – s1² = 386
  – s2² = 484.2

Example 10.11: The Catalyst Comparison Case Continued
• F = s2²/s1² = 484.2/386 = 1.2544
• F0.05, based on n2 – 1 = 4 numerator df and n1 – 1 = 4 denominator df, is 6.39
• Since F = 1.2544 is not greater than F0.05 = 6.39, cannot reject H0
• There is little evidence that XA-100 produces yields that are more consistent than ZB-200
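The one-tailed comparison in Example 10.11 amounts to one ratio and one lookup. The sketch below uses a hypothetical helper `f_statistic_less_than`, which puts s2² in the numerator because the alternative is σ1² < σ2²; the rejection point 6.39 is copied from the slide rather than computed.

```python
def f_statistic_less_than(s1_sq, s2_sq, n1, n2):
    """F statistic and degrees of freedom for Ha: sigma1^2 < sigma2^2."""
    f_stat = s2_sq / s1_sq
    df_num, df_den = n2 - 1, n1 - 1
    return f_stat, df_num, df_den

# Catalyst data from Example 10.11
f_stat, df_num, df_den = f_statistic_less_than(386.0, 484.2, 5, 5)
f_crit = 6.39  # F_0.05 with 4 and 4 degrees of freedom, from the slide
reject_h0 = f_stat > f_crit
print(round(f_stat, 4), df_num, df_den, reject_h0)
```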

Testing Equality of Population Variances
• Independent samples from two normal populations
• Test H0: σ1² = σ2² versus Ha: σ1² ≠ σ2²
• Use as test statistic the ratio of the larger of s1² and s2² to the smaller of s1² and s2²
• The p-value is twice the area to the right of this value under the F curve having df1 = (size of the sample with the larger variance) – 1 numerator degrees of freedom and df2 = (size of the sample with the smaller variance) – 1 denominator degrees of freedom
• Reject H0 at the α significance level if:
  – F > Fα/2, or
  – p-value < α

Example 10.12: The Catalyst Comparison Case
• Data and degrees of freedom are the same as Example 10.11
• Wish to test:
  – H0: σ1² = σ2²
  – Ha: σ1² ≠ σ2²
• F = (larger of s1² and s2²)/(smaller of s1² and s2²) = 484.2/386 = 1.2544
• F0.025 = 9.60, so cannot reject H0
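For the two-sided test, the only extra step is deciding which sample variance goes in the numerator. The sketch below uses a made-up helper `two_sided_f_statistic` that also returns the matching degrees of freedom; the rejection point 9.60 is the F0.025 value quoted in the example, not something the code computes.

```python
def two_sided_f_statistic(s1_sq, s2_sq, n1, n2):
    """Larger-over-smaller variance ratio with its numerator and denominator df."""
    if s1_sq >= s2_sq:
        return s1_sq / s2_sq, n1 - 1, n2 - 1
    return s2_sq / s1_sq, n2 - 1, n1 - 1

# Catalyst data from Example 10.11
f_stat, df_num, df_den = two_sided_f_statistic(386.0, 484.2, 5, 5)
f_crit = 9.60  # F_0.025 with 4 and 4 degrees of freedom, from the slide
reject_h0 = f_stat > f_crit
print(round(f_stat, 4), df_num, df_den, reject_h0)
```

Because the ratio always has the larger variance on top, the statistic is at least 1, and it is compared against Fα/2 in place of Fα.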