Comparing Populations: Proportions and Means
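The sampling distribution of differences of Normal Random Variables
If $X$ and $Y$ denote two independent normal random variables, with means $\mu_X$, $\mu_Y$ and standard deviations $\sigma_X$, $\sigma_Y$, then $D = X - Y$ is normal with
$$\mu_D = \mu_X - \mu_Y, \qquad \sigma_D = \sqrt{\sigma_X^2 + \sigma_Y^2}.$$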
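As a small illustration of the rule for differences of independent normal variables, the sketch below builds the distribution of D = X - Y with Python's standard `statistics.NormalDist`. The helper name `difference_distribution` and the parameter values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from math import sqrt
from statistics import NormalDist

def difference_distribution(mu_x, sigma_x, mu_y, sigma_y):
    """Distribution of D = X - Y for independent normal X and Y."""
    # Means subtract; variances (not standard deviations) add.
    return NormalDist(mu_x - mu_y, sqrt(sigma_x**2 + sigma_y**2))

# Hypothetical parameters, for illustration only.
d = difference_distribution(mu_x=10.0, sigma_x=3.0, mu_y=8.0, sigma_y=4.0)
print(d.mean, d.stdev)        # mean 2.0, standard deviation 5.0
prob_x_exceeds_y = 1 - d.cdf(0.0)   # P(X > Y) = P(D > 0)
print(round(prob_x_exceeds_y, 4))
```

Note that the standard deviation of the difference (5.0) is larger than either individual standard deviation, because the variances add even though the means subtract.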
Comparing proportions
Situation
• We have two populations (1 and 2).
• Let $p_1$ denote the probability (proportion) of “success” in population 1.
• Let $p_2$ denote the probability (proportion) of “success” in population 2.
• The objective is to compare the two population proportions.
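Consider the statistic $\hat{p}_1 - \hat{p}_2$, where $\hat{p}_1 = x_1/n_1$ and $\hat{p}_2 = x_2/n_2$ are the sample proportions of successes in samples of sizes $n_1$ and $n_2$. For large samples this statistic has an approximately normal distribution with
$$\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2, \qquad \sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}.$$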
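Thus
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}}$$
has (approximately) a standard normal distribution.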
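We want to test either:
$H_0: p_1 = p_2$ vs $H_A: p_1 \neq p_2$
or $H_0: p_1 = p_2$ vs $H_A: p_1 > p_2$
or $H_0: p_1 = p_2$ vs $H_A: p_1 < p_2$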
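If $p_1 = p_2$ ($= p$, say), then the test statistic
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
has (approximately) a standard normal distribution, where
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$
is an estimate of the common value of $p_1$ and $p_2$.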
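Thus, for comparing two binomial probabilities $p_1$ and $p_2$, the test statistic is
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
where $\hat{p}_1 = x_1/n_1$, $\hat{p}_2 = x_2/n_2$ and $\hat{p} = (x_1 + x_2)/(n_1 + n_2)$.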
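The Critical Region (significance level $\alpha$)
The Alternative Hypothesis $H_A$ | The Critical Region
$H_A: p_1 \neq p_2$ | $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
$H_A: p_1 > p_2$ | $z > z_{\alpha}$
$H_A: p_1 < p_2$ | $z < -z_{\alpha}$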
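The test for two proportions can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `two_proportion_z_test`, the `alternative` labels and the example counts are all assumptions made here, and the critical values come from the standard library's `NormalDist.inv_cdf`.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2, alternative="two-sided", alpha=0.05):
    """Return (z, reject) for H0: p1 = p2, using the pooled estimate of p."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_hat = (x1 + x2) / (n1 + n2)            # pooled estimate of the common p
    se = sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    std_normal = NormalDist()
    if alternative == "two-sided":            # HA: p1 != p2
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    elif alternative == "greater":            # HA: p1 > p2
        reject = z > std_normal.inv_cdf(1 - alpha)
    elif alternative == "less":               # HA: p1 < p2
        reject = z < -std_normal.inv_cdf(1 - alpha)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, reject

# Hypothetical counts: 30 successes in 100 trials versus 45 in 100.
z, reject = two_proportion_z_test(30, 100, 45, 100)
print(round(z, 3), reject)
```

The three branches correspond one-to-one to the three critical regions listed for the alternative hypotheses.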
Example
• In a national study to determine if there was an increase in mortality due to pipe smoking, a random sample of $n_1 = 1067$ male nonsmoking pensioners was observed for a five-year period.
• In addition, a sample of $n_2 = 402$ male pensioners who had smoked a pipe for more than six years was observed for the same five-year period.
• At the end of the five-year period, $x_1 = 117$ of the nonsmoking pensioners had died, while $x_2 = 54$ of the pipe-smoking pensioners had died.
• Is the mortality rate for pipe smokers higher than that for nonsmokers?
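We want to test: $H_0: p_1 = p_2$ vs $H_A: p_1 < p_2$ (mortality is higher for pipe smokers).
The test statistic:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$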
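Note:
$$\hat{p}_1 = \frac{117}{1067} = 0.1097, \qquad \hat{p}_2 = \frac{54}{402} = 0.1343, \qquad \hat{p} = \frac{117 + 54}{1067 + 402} = \frac{171}{1469} = 0.1164$$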
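The test statistic:
$$z = \frac{0.1097 - 0.1343}{\sqrt{0.1164\,(1 - 0.1164)\left(\dfrac{1}{1067} + \dfrac{1}{402}\right)}} = -1.31$$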
We reject $H_0$ if $z < -z_{\alpha} = -z_{0.05} = -1.645$.
This is not true ($z = -1.31$), hence we accept $H_0$.
Conclusion: There is not a significant ($\alpha = 0.05$) increase in the mortality rate due to pipe smoking.
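The arithmetic of the pipe-smoking example can be checked with a short script. The counts are those given in the example; the one-sided p-value at the end is an addition made here for illustration and does not appear in the slides.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 117, 1067   # deaths among nonsmokers
x2, n2 = 54, 402     # deaths among pipe smokers

p1_hat, p2_hat = x1 / n1, x2 / n2
p_hat = (x1 + x2) / (n1 + n2)                 # pooled estimate under H0
z = (p1_hat - p2_hat) / sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))

critical = -NormalDist().inv_cdf(0.95)        # -z_0.05 for HA: p1 < p2
p_value = NormalDist().cdf(z)                 # lower-tail probability of z

print(round(z, 2), round(critical, 3), z < critical)   # -1.31 -1.645 False
print(round(p_value, 3))
```

Because z does not fall below the critical value, the script reaches the same decision as the conclusion above.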
Estimating a difference in proportions using confidence intervals
Situation
• We have two populations (1 and 2).
• Let $p_1$ denote the probability (proportion) of “success” in population 1.
• Let $p_2$ denote the probability (proportion) of “success” in population 2.
• The objective is to estimate the difference in the two population proportions, $\delta = p_1 - p_2$.
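Confidence Interval for $\delta = p_1 - p_2$
The $100P\% = 100(1 - \alpha)\%$ confidence interval is
$$\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$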
Example
• Estimating the increase in the mortality rate for pipe smokers over that for nonsmokers, $\delta = p_2 - p_1$.
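A sketch of this interval for the pipe-smoking data follows. The 95% confidence level is an assumption made here (the level is not stated in the example text), and the helper name `proportion_difference_ci` is hypothetical. Unlike the test statistic, the interval uses the unpooled standard error, since it does not assume the two proportions are equal.

```python
from math import sqrt
from statistics import NormalDist

def proportion_difference_ci(x_a, n_a, x_b, n_b, confidence=0.95):
    """Confidence interval for p_a - p_b (unpooled standard error)."""
    pa_hat, pb_hat = x_a / n_a, x_b / n_b
    se = sqrt(pa_hat * (1 - pa_hat) / n_a + pb_hat * (1 - pb_hat) / n_b)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}
    diff = pa_hat - pb_hat
    return diff - z * se, diff + z * se

# delta = p2 - p1: pipe smokers (54 of 402) minus nonsmokers (117 of 1067).
low, high = proportion_difference_ci(54, 402, 117, 1067)
print(round(low, 4), round(high, 4))
```

Under the assumed 95% level the interval contains zero, which agrees with the test's failure to find a significant increase.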
Comparing Means
Situation
• We have two normal populations (1 and 2).
• Let $\mu_1$ and $\sigma_1$ denote the mean and standard deviation of population 1.
• Let $\mu_2$ and $\sigma_2$ denote the mean and standard deviation of population 2.
• Let $x_1, x_2, x_3, \ldots, x_n$ denote a sample from normal population 1.
• Let $y_1, y_2, y_3, \ldots, y_m$ denote a sample from normal population 2.
• The objective is to compare the two population means.
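We want to test either:
$H_0: \mu_1 = \mu_2$ vs $H_A: \mu_1 \neq \mu_2$
or $H_0: \mu_1 = \mu_2$ vs $H_A: \mu_1 > \mu_2$
or $H_0: \mu_1 = \mu_2$ vs $H_A: \mu_1 < \mu_2$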
Consider the test statistic:
$$z = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{\sigma_1^2}{n} + \dfrac{\sigma_2^2}{m}}}$$
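If $\mu_1 = \mu_2$ (that is, $H_0$ is true):
• $z$ will have a standard Normal distribution.
• This will also be true, approximately, for the statistic
$$z = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}}$$
(obtained by replacing $\sigma_1$ by $s_x$ and $\sigma_2$ by $s_y$) if the sample sizes $n$ and $m$ are large (greater than 30).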
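Note: $\bar{x}$ and $s_x$ are the sample mean and sample standard deviation of $x_1, \ldots, x_n$, and $\bar{y}$ and $s_y$ are those of $y_1, \ldots, y_m$:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$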
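The Alternative Hypothesis $H_A$ | The Critical Region
$H_A: \mu_1 \neq \mu_2$ | $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
$H_A: \mu_1 > \mu_2$ | $z > z_{\alpha}$
$H_A: \mu_1 < \mu_2$ | $z < -z_{\alpha}$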
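The large-sample test for two means needs only the summary statistics of each sample. The sketch below assumes exactly that; the function name `two_sample_z_test` and the numerical inputs are hypothetical and are not taken from any example in these slides.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z_test(mean_x, sd_x, n, mean_y, sd_y, m,
                      alternative="two-sided", alpha=0.05):
    """Large-sample z test of H0: mu1 = mu2 from summary statistics."""
    se = sqrt(sd_x**2 / n + sd_y**2 / m)     # sample sds replace sigma1, sigma2
    z = (mean_x - mean_y) / se
    std_normal = NormalDist()
    if alternative == "two-sided":            # HA: mu1 != mu2
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    elif alternative == "greater":            # HA: mu1 > mu2
        reject = z > std_normal.inv_cdf(1 - alpha)
    elif alternative == "less":               # HA: mu1 < mu2
        reject = z < -std_normal.inv_cdf(1 - alpha)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, reject

# Hypothetical summary statistics, for illustration only.
z, reject = two_sample_z_test(10.5, 4.0, 100, 9.0, 5.0, 100, alternative="greater")
print(round(z, 3), reject)
```

Both sample sizes here exceed 30, which is the condition under which replacing the population standard deviations with sample values is treated as harmless.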
Example
• A study was interested in determining if an exercise program had some effect on the reduction of blood pressure in subjects with abnormally high blood pressure.
• For this purpose a sample of $n = 500$ patients with abnormally high blood pressure were required to adhere to the exercise regime.
• A second sample of $m = 400$ patients with abnormally high blood pressure were not required to adhere to the exercise regime.
• After a period of one year the reduction in blood pressure was measured for each patient in the study.
We want to test:
$H_0: \mu_1 \le \mu_2$ (the exercise group did not have a higher average reduction in blood pressure)
vs
$H_A: \mu_1 > \mu_2$ (the exercise group did have a higher average reduction in blood pressure)
The test statistic:
$$z = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}}$$
Suppose the data has been collected and:
The test statistic:
We reject $H_0$ if $z > z_{\alpha} = z_{0.05} = 1.645$.
This is true, hence we reject $H_0$.
Conclusion: There is a significant ($\alpha = 0.05$) effect due to the exercise regime on the reduction in blood pressure.
Estimating a difference in means using confidence intervals
Situation
• We have two populations (1 and 2).
• Let $\mu_1$ denote the mean of population 1.
• Let $\mu_2$ denote the mean of population 2.
• The objective is to estimate the difference in the two population means, $\delta = \mu_1 - \mu_2$.
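Confidence Interval for $\delta = \mu_1 - \mu_2$
The $100(1 - \alpha)\%$ confidence interval (large samples) is
$$\bar{x} - \bar{y} \pm z_{\alpha/2}\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}$$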
Example
• Estimating the increase in the average reduction in blood pressure due to the exercise regime, $\delta = \mu_1 - \mu_2$.
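The large-sample interval for a difference in means can be sketched as follows. The blood-pressure summary statistics are not reproduced in this text, so the inputs below are hypothetical placeholders rather than the study's values; the function name `mean_difference_ci` is likewise an assumption.

```python
from math import sqrt
from statistics import NormalDist

def mean_difference_ci(mean_x, sd_x, n, mean_y, sd_y, m, confidence=0.95):
    """Large-sample confidence interval for mu1 - mu2 from summary statistics."""
    se = sqrt(sd_x**2 / n + sd_y**2 / m)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}
    diff = mean_x - mean_y
    return diff - z * se, diff + z * se

# Hypothetical summary statistics (NOT the blood-pressure study's data).
low, high = mean_difference_ci(10.5, 4.0, 100, 9.0, 5.0, 100)
print(round(low, 3), round(high, 3))
```

An interval that lies entirely above zero corresponds to rejecting equality of the means in a two-sided test at the matching significance level.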
Comparing Means: small samples
Situation
• We have two normal populations (1 and 2).
• Let $\mu_1$ and $\sigma_1$ denote the mean and standard deviation of population 1.
• Let $\mu_2$ and $\sigma_2$ denote the mean and standard deviation of population 2.
• Let $x_1, x_2, x_3, \ldots, x_n$ denote a sample from normal population 1.
• Let $y_1, y_2, y_3, \ldots, y_m$ denote a sample from normal population 2.
• The objective is to compare the two population means.
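We want to test either:
$H_0: \mu_1 = \mu_2$ vs $H_A: \mu_1 \neq \mu_2$
or $H_0: \mu_1 = \mu_2$ vs $H_A: \mu_1 > \mu_2$
or $H_0: \mu_1 = \mu_2$ vs $H_A: \mu_1 < \mu_2$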
Consider the test statistic:
$$z = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}}$$
If the sample sizes ($m$ and $n$) are large, the statistic will have approximately a standard normal distribution. This will not be the case if the sample sizes ($m$ and $n$) are small.
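A small simulation can make the small-sample problem concrete. The sketch below draws both samples from the same standard normal population (so $H_0$ is true) and counts how often the statistic falls outside ±1.96, the two-sided 5% normal critical values. The sample sizes, replication counts and seed are arbitrary choices made for this illustration.

```python
import random
from math import sqrt

def z_statistic(xs, ys):
    """(xbar - ybar) / sqrt(sx^2/n + sy^2/m), with sample variances."""
    n, m = len(xs), len(ys)
    xbar, ybar = sum(xs) / n, sum(ys) / m
    var_x = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - ybar) ** 2 for y in ys) / (m - 1)
    return (xbar - ybar) / sqrt(var_x / n + var_y / m)

def rejection_rate(n, m, reps, seed=0):
    """Share of simulated samples with |z| > 1.96 when H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [rng.gauss(0.0, 1.0) for _ in range(m)]
        if abs(z_statistic(xs, ys)) > 1.96:
            rejections += 1
    return rejections / reps

small = rejection_rate(3, 6, reps=5000)       # small samples
large = rejection_rate(100, 100, reps=2000)   # large samples
print(small, large)
```

With large samples the rejection rate should sit near the nominal 5%, while with small samples it is noticeably higher, which is why a different reference distribution is needed there.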
The t test for comparing means: small samples (equal variances)
Situation
• We have two normal populations (1 and 2).
• Let $\mu_1$ and $\sigma$ denote the mean and standard deviation of population 1.
• Let $\mu_2$ and $\sigma$ denote the mean and standard deviation of population 2.
• Note: we assume that the standard deviation for each population is the same, $\sigma_1 = \sigma_2 = \sigma$.
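Let
$$s_{\text{Pooled}} = \sqrt{\frac{(n - 1)s_x^2 + (m - 1)s_y^2}{n + m - 2}}$$
This is the pooled estimate of $\sigma$. Note: both $s_x$ and $s_y$ are estimators of $\sigma$. These can be combined to form a single estimator of $\sigma$, $s_{\text{Pooled}}$.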
The test statistic:
$$t = \frac{\bar{x} - \bar{y}}{s_{\text{Pooled}}\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}$$
If $\mu_1 = \mu_2$, this statistic has a t distribution with $n + m - 2$ degrees of freedom.
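The Alternative Hypothesis $H_A$ | The Critical Region
$H_A: \mu_1 \neq \mu_2$ | $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
$H_A: \mu_1 > \mu_2$ | $t > t_{\alpha}$
$H_A: \mu_1 < \mu_2$ | $t < -t_{\alpha}$
Here $t_{\alpha/2}$ and $t_{\alpha}$ are critical points under the t distribution with $n + m - 2$ degrees of freedom.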
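The pooled t statistic is straightforward to compute from raw data. In the sketch below the function name `pooled_t_statistic` and the measurements are hypothetical, and the critical value is read from a t table (2.015 for 5 degrees of freedom at a one-sided 0.05 level) rather than computed, because the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(xs, ys):
    """Return (t, degrees of freedom) for the equal-variance two-sample t test."""
    n, m = len(xs), len(ys)
    df = n + m - 2
    # statistics.variance is the sample variance (divisor n - 1).
    s_pooled = sqrt(((n - 1) * variance(xs) + (m - 1) * variance(ys)) / df)
    t = (mean(xs) - mean(ys)) / (s_pooled * sqrt(1 / n + 1 / m))
    return t, df

# Hypothetical measurements, for illustration only.
xs = [4.0, 6.0, 8.0]
ys = [7.0, 9.0, 11.0, 13.0]
t, df = pooled_t_statistic(xs, ys)

t_critical = 2.015                 # table value t_0.05 with 5 degrees of freedom
reject = t < -t_critical           # critical region for HA: mu1 < mu2
print(round(t, 3), df, reject)
```

For these illustrative numbers the pooled variance is 5.6, and the statistic falls in the lower-tail critical region.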
Example
• A study was interested in determining if administration of a drug reduces cancerous tumour size.
• For this purpose $n + m = 9$ test animals are implanted with a cancerous tumour.
• $n = 3$ are selected at random and administered the drug.
• The remaining $m = 6$ are left untreated.
• Final tumour sizes are measured at the end of the test period.
We want to test:
$H_0: \mu_1 \ge \mu_2$ (the treated group did not have a lower average final tumour size)
vs
$H_A: \mu_1 < \mu_2$ (the treated group did have a lower average final tumour size)
The test statistic:
$$t = \frac{\bar{x} - \bar{y}}{s_{\text{Pooled}}\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}$$
Suppose the data has been collected and:
The test statistic:
We reject $H_0$ if $t < -t_{\alpha} = -t_{0.05} = -1.895$, with d.f. $= n + m - 2 = 7$.
This is not true, hence we accept $H_0$.
Conclusion: The drug treatment does not result in a significantly ($\alpha = 0.05$) smaller final tumour size.
Summary of Tests
One Sample Tests: $H_0: p = p_0$ against $H_A: p \neq p_0$, $H_A: p > p_0$ or $H_A: p < p_0$
Two Sample Tests I am using p instead of p.