Comparing Populations Proportions and means

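The sampling distribution of differences of Normal Random Variables

If X and Y denote two independent normal random variables, with means $\mu_X$, $\mu_Y$ and standard deviations $\sigma_X$, $\sigma_Y$, then D = X – Y is normal with

$$\mu_D = \mu_X - \mu_Y \qquad \text{and} \qquad \sigma_D = \sqrt{\sigma_X^2 + \sigma_Y^2}$$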
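As a quick check of this rule (for independent normal X and Y the means subtract and the variances add), here is a minimal sketch using Python's standard-library `statistics.NormalDist`. The parameter values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
from statistics import NormalDist

# Hypothetical parameters, chosen only for illustration.
X = NormalDist(mu=10.0, sigma=3.0)
Y = NormalDist(mu=4.0, sigma=4.0)

# Subtracting two NormalDist objects treats them as independent:
# the means subtract and the variances add.
D = X - Y

print(D.mean)   # mean of X minus mean of Y
print(D.stdev)  # square root of (3**2 + 4**2)
```

Note that the standard deviations themselves do not add; only the variances do.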

Comparing proportions

Situation
• We have two populations (1 and 2).
• Let $p_1$ denote the probability (proportion) of “success” in population 1.
• Let $p_2$ denote the probability (proportion) of “success” in population 2.
• Objective is to compare the two population proportions.

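Consider the statistic $\hat{p}_1 - \hat{p}_2$, where $\hat{p}_1 = x_1/n_1$ and $\hat{p}_2 = x_2/n_2$ are the proportions of successes observed in samples of sizes $n_1$ and $n_2$ from the two populations.

For large samples this statistic has (approximately) a normal distribution with

$$\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2 \qquad \text{and} \qquad \sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$

Thus

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}}$$

has (approximately) a standard normal distribution.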

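We want to test either:
$H_0: p_1 = p_2$ vs $H_A: p_1 \ne p_2$
or
$H_0: p_1 \le p_2$ vs $H_A: p_1 > p_2$
or
$H_0: p_1 \ge p_2$ vs $H_A: p_1 < p_2$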

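If $p_1 = p_2$ ($= p$, say) then the test statistic

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

has (approximately) a standard normal distribution, where

$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$

is an estimate of the common value of $p_1$ and $p_2$.

Thus for comparing two binomial probabilities $p_1$ and $p_2$ we use the test statistic $z$ above, where $\hat{p}_1 = x_1/n_1$ and $\hat{p}_2 = x_2/n_2$.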

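The Critical Region

The Alternative Hypothesis $H_A$        The Critical Region
$H_A: p_1 \ne p_2$                       $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
$H_A: p_1 > p_2$                         $z > z_{\alpha}$
$H_A: p_1 < p_2$                         $z < -z_{\alpha}$

Here $z_{\alpha}$ denotes the point with area $\alpha$ to its right under the standard normal curve.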
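The critical regions for a z statistic (two tails of area α/2 each for a "not equal" alternative, a single tail of area α for a one-sided alternative) can be sketched as below. The function name and the labels "two-sided", "greater" and "less" are hypothetical, introduced only for this illustration.

```python
from statistics import NormalDist

def in_critical_region(z, alternative, alpha=0.05):
    """Return True if z falls in the critical region for the given H_A.

    alternative: "two-sided" (p1 != p2), "greater" (p1 > p2) or "less" (p1 < p2).
    These labels are illustrative names, not part of the original notes.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "two-sided":
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    if alternative == "greater":
        return z > std_normal.inv_cdf(1 - alpha)
    if alternative == "less":
        return z < -std_normal.inv_cdf(1 - alpha)
    raise ValueError(f"unknown alternative: {alternative!r}")

print(in_critical_region(2.0, "two-sided"))
print(in_critical_region(1.7, "two-sided"))
print(in_critical_region(1.7, "greater"))
```

The same observed z can be significant against a one-sided alternative and not against the two-sided one, because the one-sided cutoff (about 1.645 at α = 0.05) is smaller than the two-sided cutoff (about 1.960).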

Example
• In a national study to determine if there was an increase in mortality due to pipe smoking, a random sample of $n_1 = 1067$ male nonsmoking pensioners were observed for a five-year period.
• In addition a sample of $n_2 = 402$ male pensioners who had smoked a pipe for more than six years were observed for the same five-year period.
• At the end of the five-year period, $x_1 = 117$ of the nonsmoking pensioners had died while $x_2 = 54$ of the pipe-smoking pensioners had died.
• Is the mortality rate for pipe smokers higher than that for nonsmokers?

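We want to test:
$H_0: p_1 \ge p_2$ vs $H_A: p_1 < p_2$
where $p_1$ is the five-year mortality rate for nonsmokers and $p_2$ is that for pipe smokers.

The test statistic:

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

Note:

$$\hat{p}_1 = \frac{117}{1067} \approx 0.1097, \qquad \hat{p}_2 = \frac{54}{402} \approx 0.1343, \qquad \hat{p} = \frac{117 + 54}{1067 + 402} = \frac{171}{1469} \approx 0.1164$$

The test statistic:

$$z = \frac{0.1097 - 0.1343}{\sqrt{0.1164\,(1 - 0.1164)\left(\dfrac{1}{1067} + \dfrac{1}{402}\right)}} \approx -1.31$$

We reject $H_0$ if: $z < -z_{\alpha} = -z_{0.05} = -1.645$
Not true, hence we accept $H_0$.
Conclusion: There is not a significant ($\alpha = 0.05$) increase in the mortality rate due to pipe smoking.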
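The arithmetic for the pipe-smoking example can be reproduced with a short sketch. It uses the pooled-proportion z statistic and the counts given in the example ($n_1 = 1067$, $x_1 = 117$, $n_2 = 402$, $x_2 = 54$); the function name is a hypothetical helper, and the one-sided rejection rule assumes the alternative is that pipe smokers have the higher rate.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for comparing two binomial proportions."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_hat = (x1 + x2) / (n1 + n2)  # estimate of the common p under H0
    se = sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se

# Counts from the pipe-smoking example (1 = nonsmokers, 2 = pipe smokers).
z = two_proportion_z(x1=117, n1=1067, x2=54, n2=402)

alpha = 0.05
z_alpha = NormalDist().inv_cdf(1 - alpha)  # about 1.645

# H_A: p1 < p2, so the critical region is the lower tail.
reject_h0 = z < -z_alpha

print(f"z = {z:.2f}, reject H0: {reject_h0}")
```

The statistic comes out near -1.31, which is not below -1.645, so the sketch agrees with the conclusion that H0 is not rejected at α = 0.05.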

Estimating a difference in proportions using confidence intervals

Situation
• We have two populations (1 and 2).
• Let $p_1$ denote the probability (proportion) of “success” in population 1.
• Let $p_2$ denote the probability (proportion) of “success” in population 2.
• Objective is to estimate the difference in the two population proportions $\delta = p_1 - p_2$.

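Confidence Interval for $\delta = p_1 - p_2$

100P% = 100(1 – $\alpha$)% confidence limits:

$$\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$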

Example
• Estimating the increase in the mortality rate for pipe smokers over that for nonsmokers, $\delta = p_2 - p_1$.
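A minimal sketch of this interval for the pipe-smoking counts is given below. The confidence level is not recoverable from the notes, so 95% is used here purely as an assumption; the function name is also hypothetical. Unlike the test statistic, the interval uses the unpooled standard error, since it does not assume the two proportions are equal.

```python
from math import sqrt
from statistics import NormalDist

def diff_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Large-sample confidence interval for p2 - p1 (unpooled standard error).

    The 95% default is an assumption made for this sketch.
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    d_hat = p2_hat - p1_hat
    return d_hat - z * se, d_hat + z * se

# Counts from the pipe-smoking example (1 = nonsmokers, 2 = pipe smokers).
lower, upper = diff_proportion_ci(x1=117, n1=1067, x2=54, n2=402)
print(f"{lower:.4f} to {upper:.4f}")
```

Under the 95% assumption the interval runs from roughly -0.014 to 0.063. It contains 0, which is consistent with the test above not finding a significant increase.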

Comparing Means

Situation
• We have two normal populations (1 and 2).
• Let $\mu_1$ and $\sigma_1$ denote the mean and standard deviation of population 1.
• Let $\mu_2$ and $\sigma_2$ denote the mean and standard deviation of population 2.
• Let $x_1, x_2, x_3, \dots, x_n$ denote a sample from normal population 1.
• Let $y_1, y_2, y_3, \dots, y_m$ denote a sample from normal population 2.
• Objective is to compare the two population means.

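We want to test either:
$H_0: \mu_1 = \mu_2$ vs $H_A: \mu_1 \ne \mu_2$
or
$H_0: \mu_1 \le \mu_2$ vs $H_A: \mu_1 > \mu_2$
or
$H_0: \mu_1 \ge \mu_2$ vs $H_A: \mu_1 < \mu_2$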

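Consider the test statistic:

$$z = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{\sigma_1^2}{n} + \dfrac{\sigma_2^2}{m}}}$$

If $\mu_1 = \mu_2$:
• $z$ will have a standard Normal distribution.
• This will also be true (approximately) for the approximation

$$z = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}}$$

(obtained by replacing $\sigma_1$ by $s_x$ and $\sigma_2$ by $s_y$) if the sample sizes n and m are large (greater than 30).

Note: $\bar{x}$, $\bar{y}$ are the sample means and $s_x$, $s_y$ the sample standard deviations:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s_x = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}, \qquad \bar{y} = \frac{1}{m}\sum_{i=1}^{m} y_i, \qquad s_y = \sqrt{\frac{\sum_{i=1}^{m}(y_i - \bar{y})^2}{m - 1}}$$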

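The Alternative Hypothesis $H_A$        The Critical Region
$H_A: \mu_1 \ne \mu_2$                   $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
$H_A: \mu_1 > \mu_2$                     $z > z_{\alpha}$
$H_A: \mu_1 < \mu_2$                     $z < -z_{\alpha}$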
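The large-sample z test for two means needs only the sample sizes, sample means and sample standard deviations. The sketch below computes the statistic and checks each critical region. The summary statistics are hypothetical, invented only to make the arithmetic easy to follow, and the function name is likewise an assumption.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(xbar, s_x, n, ybar, s_y, m):
    """Large-sample z statistic for comparing two means from summary statistics."""
    return (xbar - ybar) / sqrt(s_x**2 / n + s_y**2 / m)

# Hypothetical summary statistics, invented for illustration only.
z = two_sample_z(xbar=10.0, s_x=4.0, n=64, ybar=8.0, s_y=3.0, m=36)

alpha = 0.05
z_alpha = NormalDist().inv_cdf(1 - alpha)           # about 1.645
z_half_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.960

print(f"z = {z:.3f}")
print("reject H0 for H_A: mu1 != mu2:", abs(z) > z_half_alpha)
print("reject H0 for H_A: mu1 >  mu2:", z > z_alpha)
print("reject H0 for H_A: mu1 <  mu2:", z < -z_alpha)
```

With these made-up numbers the standard error is the square root of 16/64 + 9/36 = 0.5, so z is 2 divided by about 0.707.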

Example
• A study was interested in determining if an exercise program had some effect on reduction of blood pressure in subjects with abnormally high blood pressure.
• For this purpose a sample of n = 500 patients with abnormally high blood pressure were required to adhere to the exercise regime.
• A second sample of m = 400 patients with abnormally high blood pressure were not required to adhere to the exercise regime.
• After a period of one year the reduction in blood pressure was measured for each patient in the study.

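We want to test:
$H_0: \mu_1 \le \mu_2$ (the exercise group did not have a higher average reduction in blood pressure)
vs
$H_A: \mu_1 > \mu_2$ (the exercise group did have a higher average reduction in blood pressure)
where $\mu_1$ is the mean reduction for the exercise group and $\mu_2$ is the mean reduction for the group not required to exercise.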

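The test statistic:

$$z = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}}$$

where $\bar{x}$ and $s_x$ are computed from the exercise group (n = 500) and $\bar{y}$ and $s_y$ from the group not required to exercise (m = 400).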

Suppose the data has been collected and:

The test statistic:

We reject $H_0$ if: $z > z_{\alpha} = z_{0.05} = 1.645$
True, hence we reject $H_0$.
Conclusion: There is a significant ($\alpha = 0.05$) effect due to the exercise regime on the reduction in blood pressure.

Estimating a difference in means using confidence intervals

Situation
• We have two populations (1 and 2).
• Let $\mu_1$ denote the mean of population 1.
• Let $\mu_2$ denote the mean of population 2.
• Objective is to estimate the difference in the two population means $\delta = \mu_1 - \mu_2$.

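Confidence Interval for $\delta = \mu_1 - \mu_2$

100(1 – $\alpha$)% confidence limits (large samples):

$$\bar{x} - \bar{y} \pm z_{\alpha/2}\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}$$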
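A large-sample confidence interval for the difference of two means is the observed difference plus or minus a normal critical value times the standard error. The sketch below assumes a 95% level and uses hypothetical summary statistics (not the blood pressure data, which are not reproduced in these notes); the function name is also an assumption.

```python
from math import sqrt
from statistics import NormalDist

def mean_difference_ci(xbar, s_x, n, ybar, s_y, m, confidence=0.95):
    """Large-sample confidence interval for mu1 - mu2 from summary statistics."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(s_x**2 / n + s_y**2 / m)
    d_hat = xbar - ybar
    return d_hat - half_width, d_hat + half_width

# Hypothetical summary statistics, invented for illustration only.
lower, upper = mean_difference_ci(xbar=10.0, s_x=4.0, n=64, ybar=8.0, s_y=3.0, m=36)
print(f"{lower:.3f} to {upper:.3f}")
```

An interval that excludes 0, as this hypothetical one does, corresponds to rejecting $\mu_1 = \mu_2$ in the two-sided test at the matching α.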

Example
• Estimating the increase in the average reduction in blood pressure due to the exercise regime, $\delta = \mu_1 - \mu_2$.

Comparing Means – small samples

Situation
• We have two normal populations (1 and 2).
• Let $\mu_1$ and $\sigma_1$ denote the mean and standard deviation of population 1.
• Let $\mu_2$ and $\sigma_2$ denote the mean and standard deviation of population 2.
• Let $x_1, x_2, x_3, \dots, x_n$ denote a sample from normal population 1.
• Let $y_1, y_2, y_3, \dots, y_m$ denote a sample from normal population 2.
• Objective is to compare the two population means.

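We want to test either:
$H_0: \mu_1 = \mu_2$ vs $H_A: \mu_1 \ne \mu_2$
or
$H_0: \mu_1 \le \mu_2$ vs $H_A: \mu_1 > \mu_2$
or
$H_0: \mu_1 \ge \mu_2$ vs $H_A: \mu_1 < \mu_2$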

Consider the test statistic:

$$z = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}}$$

If the sample sizes (m and n) are large, the statistic will have approximately a standard normal distribution. This will not be the case if the sample sizes (m and n) are small.

The t test – for comparing means – small samples (equal variances)

Situation
• We have two normal populations (1 and 2).
• Let $\mu_1$ and $\sigma$ denote the mean and standard deviation of population 1.
• Let $\mu_2$ and $\sigma$ denote the mean and standard deviation of population 2.
• Note: we assume that the standard deviation for each population is the same, $\sigma_1 = \sigma_2 = \sigma$.

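Let

$$s_{Pooled} = \sqrt{\frac{(n - 1)s_x^2 + (m - 1)s_y^2}{n + m - 2}}$$

denote the pooled estimate of $\sigma$.
Note: both $s_x$ and $s_y$ are estimators of $\sigma$. These can be combined to form a single estimator of $\sigma$, $s_{Pooled}$.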

The test statistic:

$$t = \frac{\bar{x} - \bar{y}}{s_{Pooled}\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}$$

If $\mu_1 = \mu_2$ this statistic has a t distribution with n + m – 2 degrees of freedom.

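The Alternative Hypothesis $H_A$        The Critical Region
$H_A: \mu_1 \ne \mu_2$                   $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
$H_A: \mu_1 > \mu_2$                     $t > t_{\alpha}$
$H_A: \mu_1 < \mu_2$                     $t < -t_{\alpha}$

$t_{\alpha/2}$ and $t_{\alpha}$ are critical points under the t distribution with degrees of freedom n + m – 2.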
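The equal-variance (pooled) t test can be sketched from raw samples as below: pool the two sample variances, weighting each by its degrees of freedom, then divide the difference in sample means by the pooled standard error. The two samples are hypothetical numbers invented for illustration, not data from any study in these notes, and the function name is an assumption. Python's standard library has no t quantile function, so the critical value 1.895 (one-sided, α = 0.05, 7 degrees of freedom) is taken from a t table.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(xs, ys):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    n, m = len(xs), len(ys)
    df = n + m - 2
    # statistics.variance is the sample variance (divisor n - 1).
    s_pooled = sqrt(((n - 1) * variance(xs) + (m - 1) * variance(ys)) / df)
    t = (mean(xs) - mean(ys)) / (s_pooled * sqrt(1 / n + 1 / m))
    return t, df

# Hypothetical samples, invented for illustration only (n = 3, m = 6).
sample_x = [1.0, 2.0, 3.0]
sample_y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

t, df = pooled_t(sample_x, sample_y)

# Table value: upper 5% point of the t distribution with 7 degrees of freedom.
T_CRIT = 1.895

# H_A: mu1 < mu2, so the critical region is the lower tail.
reject_h0 = t < -T_CRIT
print(f"t = {t:.3f}, d.f. = {df}, reject H0: {reject_h0}")
```

For other sample sizes or significance levels the table value must be changed to match the new degrees of freedom.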

Example
• A study was interested in determining if administration of a drug reduces cancerous tumour size.
• For this purpose n + m = 9 test animals are implanted with a cancerous tumour.
• n = 3 are selected at random and administered the drug.
• The remaining m = 6 are left untreated.
• Final tumour sizes are measured at the end of the test period.

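We want to test:
$H_0: \mu_1 \ge \mu_2$ (the treated group did not have a lower average final tumour size)
vs
$H_A: \mu_1 < \mu_2$ (the treated group did have a lower average final tumour size)
where $\mu_1$ is the mean final tumour size for treated animals and $\mu_2$ is that for untreated animals.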

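The test statistic:

$$t = \frac{\bar{x} - \bar{y}}{s_{Pooled}\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}$$

where $\bar{x}$ is computed from the n = 3 treated animals and $\bar{y}$ from the m = 6 untreated animals.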

Suppose the data has been collected and:

The test statistic:

We reject $H_0$ if: $t < -t_{0.05} = -1.895$ with d.f. = n + m – 2 = 7.
Not true, hence we accept $H_0$.
Conclusion: The drug treatment does not result in a significantly ($\alpha = 0.05$) smaller final tumour size.

Summary of Tests

One Sample Tests
$H_0$: $p = p_0$
$H_A$: $p \ne p_0$, $p > p_0$ or $p < p_0$

Two Sample Tests I am using p instead of p.