Chapter 9 Statistical Inferences Based on Two Samples

Statistical Inferences Based on Two Samples
9.1 Comparing Two Population Means by Using Independent Samples: Variances Known
9.2 Comparing Two Population Means by Using Independent Samples: Variances Unknown
9.3 Paired Difference Experiments
9.4 Comparing Two Population Proportions by Using Large Independent Samples
9.5 Comparing Two Population Variances by Using Independent Samples

Comparing Two Population Means by Using Independent Samples: Variances Known
• Suppose a random sample has been taken from each of two different populations
• Suppose that the populations are independent of each other
  – Then the random samples are independent of each other
• Then, if each population is normal or both sample sizes are large, the sampling distribution of the difference in sample means is normal (or approximately normal)

Sampling Distribution of the Difference of Two Sample Means #1
• Suppose one population, call it population 1, has mean $\mu_1$ and variance $\sigma_1^2$
• From population 1, a random sample of size $n_1$ is selected, which has mean $\bar{x}_1$ and variance $s_1^2$
• Suppose another population, call it population 2, has mean $\mu_2$ and variance $\sigma_2^2$
• From population 2, a random sample of size $n_2$ is selected, which has mean $\bar{x}_2$ and variance $s_2^2$
• Then …

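Sampling Distribution of the Difference of Two Sample Means #2
The sampling distribution of the difference of two sample means $\bar{x}_1 - \bar{x}_2$:
1. Is normal if each of the sampled populations is normal
   • Is approximately normal if the sample sizes $n_1$ and $n_2$ are large
2. Has mean $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$
3. Has standard deviation $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$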

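z-Based Confidence Interval for the Difference in Means (Variances Known) #2
• Then a 100(1 − $\alpha$) percent confidence interval for the difference in population means $\mu_1 - \mu_2$ is
  $$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$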
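
The z-based interval for the difference in means can be computed directly from summary statistics. The following is a minimal sketch in Python using only the standard library. The function name `z_interval_diff_means` and all numeric inputs are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_diff_means(xbar1, xbar2, var1, var2, n1, n2, alpha=0.05):
    """100(1 - alpha)% interval for mu1 - mu2 with known population variances."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    se = sqrt(var1 / n1 + var2 / n2)               # std. dev. of xbar1 - xbar2
    diff = xbar1 - xbar2
    return diff - z_crit * se, diff + z_crit * se


# Illustrative (made-up) summary values, not taken from the text.
lo, hi = z_interval_diff_means(xbar1=10.0, xbar2=8.0,
                               var1=4.0, var2=9.0, n1=100, n2=100)
print(f"95% interval for mu1 - mu2: [{lo:.3f}, {hi:.3f}]")
```

If the interval excludes 0, the data suggest the two population means differ at the chosen confidence level.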

z-Based Test About the Difference in Means (Variances Known)
• Test the null hypothesis $H_0: \mu_1 - \mu_2 = D_0$
• $D_0 = \mu_1 - \mu_2$ is the claimed difference between the population means
• $D_0$ is a number whose value varies depending on the situation
• Often $D_0 = 0$, and the null means that there is no difference between the population means
• Use the notation from the confidence interval statement on the prior slide
• Assume that each sampled population is normal or that the sample sizes $n_1$ and $n_2$ are large

Test Statistic (Variances Known)
• The test statistic is
  $$z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
• The sampling distribution of this statistic is a standard normal distribution
  – If the populations are normal and the samples are independent . . .

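z-Based Test About the Difference in Means (Variances Known) #2
• Alternative $H_a: \mu_1 - \mu_2 > D_0$
  – Reject $H_0$ if $z > z_\alpha$
  – p-value: area under the standard normal curve to the right of $z$
• Alternative $H_a: \mu_1 - \mu_2 < D_0$
  – Reject $H_0$ if $z < -z_\alpha$
  – p-value: area under the standard normal curve to the left of $z$
• Alternative $H_a: \mu_1 - \mu_2 \neq D_0$
  – Reject $H_0$ if $|z| > z_{\alpha/2}$, that is, either $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$
  – p-value: twice the area under the standard normal curve to the right of $|z|$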
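
The rejection rules and p-values for the z-based test can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive implementation: the function name `z_test_diff_means`, the `alternative` labels, and the summary values are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_diff_means(xbar1, xbar2, var1, var2, n1, n2,
                      d0=0.0, alternative="two-sided"):
    """z statistic and p-value for H0: mu1 - mu2 = d0 (variances known)."""
    se = sqrt(var1 / n1 + var2 / n2)
    z = (xbar1 - xbar2 - d0) / se
    std_normal = NormalDist()
    if alternative == "greater":      # Ha: mu1 - mu2 > d0
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":       # Ha: mu1 - mu2 < d0
        p_value = std_normal.cdf(z)
    else:                             # Ha: mu1 - mu2 != d0
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p_value


# Illustrative (made-up) summary values.
z, p = z_test_diff_means(10.0, 8.0, 4.0, 9.0, 100, 100)
print(f"z = {z:.3f}, two-sided p-value = {p:.2g}")
```

Rejecting when the p-value is below $\alpha$ is equivalent to the critical-value rules on the slide.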

Comparing Two Population Means by Using Independent Samples: Variances Unknown
• Generally, the true values of the population variances $\sigma_1^2$ and $\sigma_2^2$ are not known. They have to be estimated from the sample variances $s_1^2$ and $s_2^2$, respectively
• Also need to estimate the standard deviation of the sampling distribution of the difference between sample means
• Two approaches:
  – If it can be assumed that $\sigma_1^2 = \sigma_2^2 = \sigma^2$, then calculate the “pooled estimate” of $\sigma^2$
  – If $\sigma_1^2 \neq \sigma_2^2$, then use approximate methods

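Pooled Estimate of $\sigma^2$
• Assume that $\sigma_1^2 = \sigma_2^2 = \sigma^2$
• The pooled estimate of $\sigma^2$ is the weighted average of the two sample variances, $s_1^2$ and $s_2^2$
• The pooled estimate of $\sigma^2$ is denoted by $s_p^2$
  $$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
• The estimate of the standard deviation of the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is
  $$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$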

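t-Based Confidence Interval for the Difference in Means (Variances Unknown)
• Select two independent random samples from two normal populations with equal variances
• Then a 100(1 − $\alpha$) percent confidence interval for the difference in population means $\mu_1 - \mu_2$ is
  $$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
• where
  $$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
• and $t_{\alpha/2}$ is based on $(n_1 + n_2 - 2)$ degrees of freedom (df)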

Test Statistic (Variances Unknown)
• The test statistic is
  $$t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
• where $D_0 = \mu_1 - \mu_2$ is the claimed difference between the population means
• The sampling distribution of this statistic is a t distribution with $(n_1 + n_2 - 2)$ degrees of freedom

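t-Based Test About the Difference in Means (Variances Unknown) #3
• Alternative $H_a: \mu_1 - \mu_2 > D_0$
  – Reject $H_0$ if $t > t_\alpha$
  – p-value: area under the t distribution to the right of $t$
• Alternative $H_a: \mu_1 - \mu_2 < D_0$
  – Reject $H_0$ if $t < -t_\alpha$
  – p-value: area under the t distribution to the left of $t$
• Alternative $H_a: \mu_1 - \mu_2 \neq D_0$
  – Reject $H_0$ if $|t| > t_{\alpha/2}$, that is, either $t > t_{\alpha/2}$ or $t < -t_{\alpha/2}$
  – p-value: twice the area under the t distribution to the right of $|t|$
• where $t_\alpha$, $t_{\alpha/2}$, and p-values are based on $(n_1 + n_2 - 2)$ degrees of freedom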
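
The equal-variance (pooled) procedure can be sketched as follows. The function name `pooled_t` and the summary values are hypothetical. Because the Python standard library has no t distribution, this sketch assumes the critical value is read from a t table, as in the text; the value 2.086 below is $t_{0.025}$ for 20 degrees of freedom.

```python
from math import sqrt


def pooled_t(xbar1, xbar2, s1_sq, s2_sq, n1, n2, d0=0.0):
    """Pooled variance, standard error, t statistic, and df (equal variances)."""
    df = n1 + n2 - 2
    # Weighted average of the sample variances, weights n1 - 1 and n2 - 1.
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df
    se = sqrt(sp_sq * (1 / n1 + 1 / n2))
    t = (xbar1 - xbar2 - d0) / se
    return t, df, sp_sq, se


# Illustrative (made-up) summary values.
t, df, sp_sq, se = pooled_t(xbar1=20.0, xbar2=18.0,
                            s1_sq=4.0, s2_sq=6.0, n1=10, n2=12)
t_crit = 2.086  # assumed input: t_{0.025} with 20 df, read from a t table
lo = (20.0 - 18.0) - t_crit * se
hi = (20.0 - 18.0) + t_crit * se
print(f"sp^2 = {sp_sq:.2f}, t = {t:.3f} on {df} df")
print(f"95% interval: [{lo:.3f}, {hi:.3f}]; reject H0: {abs(t) > t_crit}")
```

The two-sided test and the interval agree: $H_0: \mu_1 - \mu_2 = 0$ is rejected exactly when the interval excludes 0.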

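Small Sample Intervals and Tests about Differences in Means When Variances are Not Equal
If the sampled populations are both normal, but the sample sizes and variances differ substantially, small-sample estimation and testing can be based on the following “unequal variance” procedure
• Confidence interval
  $$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
• Test statistic
  $$t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
• For both the interval and the test, the degrees of freedom are equal to
  $$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$$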
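
The unequal-variance statistic and its approximate degrees of freedom are easy to get wrong by hand, so a short Python sketch may help. The function name `unequal_variance_t` and the inputs are hypothetical. Rounding the degrees of freedom down to an integer for table lookup is a common convention and is assumed here.

```python
from math import floor, sqrt


def unequal_variance_t(xbar1, xbar2, s1_sq, s2_sq, n1, n2, d0=0.0):
    """t statistic and approximate df when population variances differ."""
    a = s1_sq / n1
    b = s2_sq / n2
    t = (xbar1 - xbar2 - d0) / sqrt(a + b)
    # Approximate degrees of freedom for the unequal-variance procedure.
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Illustrative (made-up) values with very different variances and sizes.
t, df = unequal_variance_t(xbar1=50.0, xbar2=45.0,
                           s1_sq=16.0, s2_sq=2.0, n1=8, n2=20)
print(f"t = {t:.3f}, df = {df:.2f} (use {floor(df)} for table lookup)")
```

Here the degrees of freedom land close to $n_1 - 1 = 7$ because the small, high-variance sample dominates the standard error.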

Paired Difference Experiments
• Before, we drew random samples from two different populations
• Now, we have two different processes (or methods)
• Draw one random sample of units and use those units to obtain the results of each process
• For instance, use the same individuals for the results from one process vs. the results from the other process
  – E.g., use the same individuals to compare “before” and “after” treatments
• Using the same individuals eliminates any differences in the individuals themselves, so only the results from the two processes are compared

Paired Difference Experiments Continued
• Let $\mu_d$ be the mean of the population of paired differences
  – $\mu_d = \mu_1 - \mu_2$, where $\mu_1$ is the mean of population 1 and $\mu_2$ is the mean of population 2
• Let $\bar{d}$ and $s_d$ be the mean and standard deviation of a sample of paired differences that has been randomly selected from the population
  – $\bar{d}$ is the mean of the differences between pairs of values from both samples

t-Based Confidence Interval for Paired Differences in Means
If the sampled population of differences is normally distributed with mean $\mu_d$, then a 100(1 − $\alpha$) percent confidence interval for $\mu_d = \mu_1 - \mu_2$ is
  $$\bar{d} \pm t_{\alpha/2}\frac{s_d}{\sqrt{n}}$$
where, for a sample of size $n$, $t_{\alpha/2}$ is based on $n - 1$ degrees of freedom

Test Statistic for Paired Differences
• The test statistic for $H_0: \mu_d = D_0$ is
  $$t = \frac{\bar{d} - D_0}{s_d/\sqrt{n}}$$
• $D_0 = \mu_1 - \mu_2$ is the claimed or actual difference between the population means
• $D_0$ varies depending on the situation
• Often $D_0 = 0$, and the null means that there is no difference between the population means
• The sampling distribution of this statistic is a t distribution with $(n - 1)$ degrees of freedom

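Paired Differences Testing Rules
• Alternative $H_a: \mu_d > D_0$
  – Reject $H_0$ if $t > t_\alpha$
  – p-value: area under the t distribution to the right of $t$
• Alternative $H_a: \mu_d < D_0$
  – Reject $H_0$ if $t < -t_\alpha$
  – p-value: area under the t distribution to the left of $t$
• Alternative $H_a: \mu_d \neq D_0$
  – Reject $H_0$ if $|t| > t_{\alpha/2}$, that is, either $t > t_{\alpha/2}$ or $t < -t_{\alpha/2}$
  – p-value: twice the area under the t distribution to the right of $|t|$
• where $t_\alpha$, $t_{\alpha/2}$, and p-values are based on $(n - 1)$ degrees of freedom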
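
A paired analysis reduces to a one-sample t statistic on the differences. The sketch below shows this in Python with the standard library. The function name `paired_t` and the "before"/"after" measurements are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(sample1, sample2, d0=0.0):
    """t statistic and df for H0: mu_d = d0, using paired observations."""
    diffs = [x1 - x2 for x1, x2 in zip(sample1, sample2)]
    n = len(diffs)
    d_bar = mean(diffs)    # mean of the paired differences
    s_d = stdev(diffs)     # sample standard deviation (divides by n - 1)
    t = (d_bar - d0) / (s_d / sqrt(n))
    return t, n - 1, d_bar, s_d


# Hypothetical measurements on the same five individuals.
before = [10, 12, 9, 11, 13]
after = [8, 11, 9, 9, 10]
t, df, d_bar, s_d = paired_t(before, after)
print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.3f}, t = {t:.3f} on {df} df")
```

The resulting `t` is compared with $t_\alpha$ or $t_{\alpha/2}$ for `df` degrees of freedom from a t table.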

Comparing Two Population Proportions
• Select a random sample of size $n_1$ from a population, and let $\hat{p}_1$ denote the proportion of units in this sample that fall into the category of interest
• Select a random sample of size $n_2$ from another population, and let $\hat{p}_2$ denote the proportion of units in this sample that fall into the same category of interest
• Suppose that $n_1$ and $n_2$ are large enough
  – $n_1 p_1 \geq 5$, $n_1(1 - p_1) \geq 5$, $n_2 p_2 \geq 5$, and $n_2(1 - p_2) \geq 5$

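Comparing Two Population Proportions Continued
• Then the population of all possible values of $\hat{p}_1 - \hat{p}_2$
  – Has approximately a normal distribution if each of the sample sizes $n_1$ and $n_2$ is large
    Here, $n_1$ and $n_2$ are large enough if $n_1 p_1 \geq 5$, $n_1(1 - p_1) \geq 5$, $n_2 p_2 \geq 5$, and $n_2(1 - p_2) \geq 5$
  – Has mean $\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$
  – Has standard deviation $\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$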

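Confidence Interval for the Difference of Two Population Proportions
• If the random samples are independent of each other, then a 100(1 − $\alpha$) percent confidence interval for $p_1 - p_2$ is
  $$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$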

Test Statistic for the Difference of Two Population Proportions
• The test statistic is
  $$z = \frac{(\hat{p}_1 - \hat{p}_2) - D_0}{\sigma_{\hat{p}_1 - \hat{p}_2}}$$
• $D_0 = p_1 - p_2$ is the claimed or actual difference between the population proportions
• $D_0$ is a number whose value varies depending on the situation
• Often $D_0 = 0$, and the null means that there is no difference between the population proportions
• The sampling distribution of this statistic is a standard normal distribution
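
Both the interval and the test for two proportions can be sketched from raw counts. The function name `two_proportion_inference` and the counts are hypothetical. One assumption goes beyond the slides: for the test of $D_0 = 0$, this sketch estimates the standard deviation with the pooled sample proportion, which is a common convention when the null hypothesis says the two proportions are equal.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_inference(x1, n1, x2, n2, alpha=0.05):
    """Interval for p1 - p2 and a two-sided z test of H0: p1 - p2 = 0."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    diff = p1_hat - p2_hat
    std_normal = NormalDist()

    # Confidence interval: standard error from the separate sample proportions.
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    interval = (diff - z_crit * se, diff + z_crit * se)

    # Test with D0 = 0: pooled proportion (an assumed convention, see lead-in).
    p_pool = (x1 + x2) / (n1 + n2)
    se0 = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = diff / se0
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return interval, z, p_value


# Illustrative (made-up) counts: 120 of 200 versus 100 of 200.
(lo, hi), z, p = two_proportion_inference(120, 200, 100, 200)
print(f"95% interval: [{lo:.4f}, {hi:.4f}], z = {z:.3f}, p-value = {p:.4f}")
```

Before relying on either result, check the large-sample conditions on $n_1$ and $n_2$ given earlier.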

Comparing Two Population Variances Using Independent Samples
• Population 1 has variance $\sigma_1^2$ and population 2 has variance $\sigma_2^2$
• The null hypothesis $H_0$ is that the variances are the same
  – $H_0: \sigma_1^2 = \sigma_2^2$
• The alternative is that one of them is smaller than the other
  – That population has less variable, more consistent, measurements
• Suppose the alternative is $H_a: \sigma_1^2 > \sigma_2^2$
• More usual to normalize
  – Test $H_0: \sigma_1^2/\sigma_2^2 = 1$ vs. $H_a: \sigma_1^2/\sigma_2^2 > 1$

Comparing Two Population Variances Using Independent Samples Continued
• Reject $H_0$ in favor of $H_a$ if $s_1^2/s_2^2$ is significantly greater than 1
• $s_1^2$ is the variance of a random sample of size $n_1$ from a population with variance $\sigma_1^2$
• $s_2^2$ is the variance of a random sample of size $n_2$ from a population with variance $\sigma_2^2$
• To decide how large $s_1^2/s_2^2$ must be to reject $H_0$, describe the sampling distribution of $s_1^2/s_2^2$
• If both populations are normal and $H_0$ is true, the sampling distribution of $s_1^2/s_2^2$ is the F distribution

F Distribution
• The F distribution is skewed to the right
• Its shape depends on two parameters: the numerator number of degrees of freedom ($df_1$) and the denominator number of degrees of freedom ($df_2$)

F Distribution
• The F point $F_\alpha$ is the point on the horizontal axis under the curve of the F distribution that gives a right-hand tail area equal to $\alpha$
• The value of $F_\alpha$ depends on $\alpha$ (the size of the right-hand tail area) and on $df_1$ and $df_2$
• Different F tables for different values of $\alpha$
• See:
  – Table A.5 for $\alpha$ = 0.10
  – Table A.6 for $\alpha$ = 0.05
  – Table A.7 for $\alpha$ = 0.025
  – Table A.8 for $\alpha$ = 0.01
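
The one-sided variance comparison can be sketched as a ratio of sample variances compared with a table value. The function name `f_statistic` and the sample values are hypothetical. The sketch assumes $df_1 = n_1 - 1$ and $df_2 = n_2 - 1$, and it takes the critical value as an input read from an F table (about 2.54 for $\alpha$ = 0.05 with 10 and 15 degrees of freedom).

```python
def f_statistic(s1_sq, s2_sq, n1, n2):
    """F = s1^2 / s2^2 with numerator and denominator degrees of freedom."""
    return s1_sq / s2_sq, n1 - 1, n2 - 1


# Illustrative (made-up) sample variances and sample sizes.
F, df1, df2 = f_statistic(s1_sq=9.0, s2_sq=4.0, n1=11, n2=16)
f_crit = 2.54  # assumed input: F_{0.05} with df1 = 10, df2 = 15, from an F table
reject = F > f_crit
print(f"F = {F:.2f} on ({df1}, {df2}) df; reject H0 at alpha = 0.05: {reject}")
```

With these numbers the ratio 2.25 falls short of the table value, so $H_0$ would not be rejected at the 0.05 level.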