Sampling Distribution of a Sample Proportion Lecture 25

Lecture 25, Sections 8.1–8.2
Fri, Feb 29, 2008

Sampling Distributions
Sampling Distribution of a Statistic: the probability distribution of all the possible values of that statistic.

The Sample Proportion
The letter p represents the population proportion. The symbol p^ (“p-hat”) represents the sample proportion. p^ is a random variable. The sampling distribution of p^ is the probability distribution of all the possible values of p^.

Example
Suppose that 2/3 of all males wash their hands after using a public restroom. Suppose that we take a sample of 1 male. Find the sampling distribution of p^.

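Example
Tree diagram for one male: W (washes) with probability 2/3, so P(W) = 2/3; N (does not wash) with probability 1/3, so P(N) = 1/3.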

Example
Let x be the sample number of males who wash. The probability distribution of x is

x       0     1
P(x)    1/3   2/3

Example
Let p^ be the sample proportion of males who wash (p^ = x/n). The sampling distribution of p^ is

p^      0     1
P(p^)   1/3   2/3

Example
Now we take a sample of 2 males, sampling with replacement. Find the sampling distribution of p^.

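Example
Tree diagram for two males; each branch is W (washes, probability 2/3) or N (does not wash, probability 1/3):
First W (2/3), second W (2/3): P(WW) = 4/9
First W (2/3), second N (1/3): P(WN) = 2/9
First N (1/3), second W (2/3): P(NW) = 2/9
First N (1/3), second N (1/3): P(NN) = 1/9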

Example
Let x be the sample number of males who wash. The probability distribution of x is

x       0     1     2
P(x)    1/9   4/9   4/9

Example
Let p^ be the sample proportion of males who wash (p^ = x/n). The sampling distribution of p^ is

p^      0     1/2   1
P(p^)   1/9   4/9   4/9
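The tree-diagram reasoning can be checked mechanically: list every path through the tree, multiply the branch probabilities along each path, and add up the paths that give the same value of p^. The sketch below assumes sampling with replacement and P(wash) = 2/3 as in the example; the function name `tree_distribution` is hypothetical.

```python
from fractions import Fraction
from itertools import product

P_WASH = Fraction(2, 3)                 # assumed probability that a male washes
PROB = {"W": P_WASH, "N": 1 - P_WASH}   # branch probabilities on the tree

def tree_distribution(n):
    """Enumerate every path of the tree for n males (sampled with replacement)
    and accumulate the probability of each value of p^ = x/n."""
    dist = {}
    for outcome in product("WN", repeat=n):
        prob = Fraction(1)
        for letter in outcome:
            prob *= PROB[letter]              # multiply along the path
        p_hat = Fraction(outcome.count("W"), n)
        dist[p_hat] = dist.get(p_hat, Fraction(0)) + prob
    return dict(sorted(dist.items()))

for p_hat, prob in tree_distribution(2).items():
    print(p_hat, prob)
# 0 1/9
# 1/2 4/9
# 1 4/9
```

Exact fractions are used so the results match the slide's 1/9 and 4/9 rather than rounded decimals.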

Samples of Size n = 3
If we sample 3 males, then the sample proportion of males who wash has the following distribution.

p^      P(p^)
0       1/27  = .04
1/3     6/27  = .22
2/3     12/27 = .44
1       8/27  = .30

Samples of Size n = 4
If we sample 4 males, then the sample proportion of males who wash has the following distribution.

p^      P(p^)
0       1/81  = .01
1/4     8/81  = .10
2/4     24/81 = .30
3/4     32/81 = .40
1       16/81 = .20

Samples of Size n = 5
If we sample 5 males, then the sample proportion of males who wash has the following distribution.

p^      P(p^)
0       1/243  = .004
1/5     10/243 = .041
2/5     40/243 = .165
3/5     80/243 = .329
4/5     80/243 = .329
1       32/243 = .132
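Drawing the tree quickly becomes impractical as n grows. Because the n responses are independent when sampling with replacement, x follows a binomial distribution, P(x = k) = C(n, k) p^k (1 - p)^(n - k), and P(p^ = k/n) is the same number. The sketch below (with a hypothetical helper `sampling_distribution`, and p = 2/3 assumed) reproduces the tables for n = 3, 4, and 5.

```python
from fractions import Fraction
from math import comb

def sampling_distribution(n, p=Fraction(2, 3)):
    """Exact distribution of p^ = x/n when x ~ Binomial(n, p).

    Assumes independent responses (sampling with replacement, or a very
    large population), so P(x = k) = C(n, k) p^k (1 - p)^(n - k).
    """
    return {Fraction(k, n): comb(n, k) * p**k * (1 - p)**(n - k)
            for k in range(n + 1)}

for n in (3, 4, 5):
    print(f"n = {n}")
    for p_hat, prob in sampling_distribution(n).items():
        print(f"  p^ = {str(p_hat):>4}   P = {str(prob):>7} = {float(prob):.3f}")
```

Python's `Fraction` reduces automatically, so a value the slide writes as 2/4 prints as 1/2.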

Our Experiment
In our experiment, we had 80 samples of size 5. Based on the sampling distribution when n = 5, we would expect the following counts (predicted count = 80 × P(p^)):

Value of p^   Actual   Predicted
0.0                    0.3
0.2                    3.3
0.4                    13.2
0.6                    26.3
0.8                    26.3
1.0                    10.5
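Each predicted count is the number of samples times the probability of that value of p^. A minimal sketch of that arithmetic, assuming 80 samples of size 5 and p = 2/3; the helper name `predicted_counts` is hypothetical.

```python
from fractions import Fraction
from math import comb

NUM_SAMPLES = 80          # samples taken in the class experiment
N = 5                     # males per sample
P = Fraction(2, 3)        # assumed proportion who wash

def predicted_counts(num_samples, n, p):
    """Expected number of samples giving each value of p^: num_samples * P(p^ = k/n)."""
    return {k / n: num_samples * comb(n, k) * p**k * (1 - p)**(n - k)
            for k in range(n + 1)}

for p_hat, count in predicted_counts(NUM_SAMPLES, N, P).items():
    print(f"p^ = {p_hat:.1f}   predicted = {float(count):.1f}")
```

The predicted counts add up to exactly 80, so comparing them with the actual counts from class is a like-for-like comparison.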

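The pdf when n = 1 [Figure: histogram of p^ over the values 0 and 1]

The pdf when n = 2 [Figure: histogram of p^ over the values 0, 1/2, 1]

The pdf when n = 3 [Figure: histogram of p^ over the values 0, 1/3, 2/3, 1]

The pdf when n = 4 [Figure: histogram of p^ over the values 0, 1/4, 2/4, 3/4, 1]

The pdf when n = 5 [Figure: histogram of p^ over the values 0, 1/5, 2/5, 3/5, 4/5, 1]

The pdf when n = 10 [Figure: histogram of p^ over the values 0, 1/10, ..., 1]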

Observations and Conclusions
Observation: The values of p^ are clustered around p.
Conclusion: p^ is close to p most of the time.

Observations and Conclusions
Observation: As the sample size increases, the clustering becomes tighter.
Conclusion: Larger samples give better estimates.
Conclusion: We can make the estimates of p as good as we want, provided we make the sample size large enough.
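One way to put a number on "the clustering becomes tighter" is to compute, exactly, the probability that p^ lands within some margin of p. The sketch below assumes p = 2/3 as in the running example; the margin 0.05, the sample sizes 10, 100, and 1000, and the helper name `prob_within` are illustrative choices, not taken from the lecture.

```python
from fractions import Fraction
from math import comb

def prob_within(n, p, margin):
    """Exact P(|p^ - p| <= margin) when x ~ Binomial(n, p) and p^ = x/n.

    Fractions keep the arithmetic exact; plain floats would underflow
    for terms such as (1/3)**1000.
    """
    total = Fraction(0)
    for k in range(n + 1):
        if abs(Fraction(k, n) - p) <= margin:
            total += comb(n, k) * p**k * (1 - p)**(n - k)
    return total

p = Fraction(2, 3)
for n in (10, 100, 1000):
    print(n, round(float(prob_within(n, p, Fraction(5, 100))), 4))
```

Very small samples can behave irregularly because p^ can only take the values k/n, but across these sizes the probability of landing near p climbs steadily toward 1.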

Observations and Conclusions
Observation: The distribution of p^ appears to be approximately normal.
Conclusion: We can use the normal distribution to calculate just how close to p we can expect p^ to be.

One More Observation
However, we must know the values of $\mu_{\hat p}$ and $\sigma_{\hat p}$ (the mean and standard deviation of p^) for the distribution of p^. That is, we have to quantify the sampling distribution of p^.

The Central Limit Theorem for Proportions
It turns out that the sampling distribution of p^ is approximately normal with the following parameters:

$$\mu_{\hat p} = p, \qquad \sigma_{\hat p} = \sqrt{\frac{p(1-p)}{n}}.$$
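The parameters usually quoted for this result are a mean of p and a standard deviation of sqrt(p(1 - p)/n). They can be checked against the exact binomial distributions from the small-sample examples: compute the mean and variance of p^ directly from its probability table and compare. This sketch assumes p = 2/3; the helper name `exact_mean_var` is hypothetical.

```python
from fractions import Fraction
from math import comb

def exact_mean_var(n, p):
    """Mean and variance of p^ = x/n computed directly from its exact distribution."""
    dist = {Fraction(k, n): comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}
    mean = sum(value * weight for value, weight in dist.items())
    var = sum((value - mean) ** 2 * weight for value, weight in dist.items())
    return mean, var

p = Fraction(2, 3)
for n in (1, 2, 3, 4, 5):
    mean, var = exact_mean_var(n, p)
    # Compare with mu = p and sigma^2 = p(1 - p)/n; every line prints "True True"
    print(n, mean == p, var == p * (1 - p) / n)
```

The match is exact, not approximate: the mean and standard deviation formulas hold for every n. Only the claim that the shape is normal needs n to be large.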

The Central Limit Theorem for Proportions
The approximation to the normal distribution is excellent if $np > 5$ and $n(1-p) > 5$.

Example
If we gather a sample of 100 males, how likely is it that between 60 and 70 of them, inclusive, wash their hands after using a public restroom? This is the same as asking the likelihood that 0.60 ≤ p^ ≤ 0.70.

Example
Use p = 0.66. Check that
np = 100(0.66) = 66 > 5,
n(1 - p) = 100(0.34) = 34 > 5.
Then p^ is approximately normal with

$$\mu_{\hat p} = 0.66, \qquad \sigma_{\hat p} = \sqrt{\frac{(0.66)(0.34)}{100}} = 0.04737.$$
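The check-then-compute step can be packaged as a small function that refuses to return normal parameters when the rule of thumb np > 5 and n(1 - p) > 5 fails. This is a sketch; the name `normal_approx_params` and the `threshold` parameter are hypothetical, and the default of 5 follows the lecture (other texts use a different cutoff).

```python
import math

def normal_approx_params(n, p, threshold=5):
    """Return (mu, sigma) for p^ after checking np > threshold and n(1 - p) > threshold.

    Raises ValueError when either expected count is too small for the
    normal approximation to be trusted.
    """
    if not (n * p > threshold and n * (1 - p) > threshold):
        raise ValueError(
            f"np = {n * p:g} and n(1 - p) = {n * (1 - p):g}; need both > {threshold}")
    return p, math.sqrt(p * (1 - p) / n)

mu, sigma = normal_approx_params(100, 0.66)
print(mu, round(sigma, 5))  # 0.66 0.04737
```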

Example
So P(0.60 ≤ p^ ≤ 0.70) = normalcdf(.60, .70, .66, .04737) = 0.6981.
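`normalcdf(lower, upper, mean, sd)` is a TI-83/84 calculator command. Outside the calculator, the same area can be computed from the standard normal CDF, which Python's standard library exposes indirectly through the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2. The Python `normalcdf` below is a hypothetical stand-in that mirrors the calculator's argument order.

```python
import math

def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the N(mu, sigma) curve between lower and upper.

    Mirrors the argument order of the TI-83/84 command; built on math.erf.
    """
    def phi(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return phi((upper - mu) / sigma) - phi((lower - mu) / sigma)

sigma = math.sqrt(0.66 * 0.34 / 100)          # standard deviation of p^ for n = 100
print(round(normalcdf(0.60, 0.70, 0.66, sigma), 4))  # 0.6981, the value on the slide
```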

Why Surveys Work
Suppose that we are trying to estimate the proportion of the male population who wash their hands after using a public restroom. Suppose the true proportion is 66%. If we survey a random sample of 1000 males, how likely is it that our error will be no greater than 5 percentage points?

Why Surveys Work
Now we have

$$\mu_{\hat p} = 0.66, \qquad \sigma_{\hat p} = \sqrt{\frac{(0.66)(0.34)}{1000}} = 0.01498.$$

Why Surveys Work
Now find the probability that p^ is between 0.61 and 0.71: normalcdf(.61, .71, .66, .01498) = 0.9992. It is virtually certain that our estimate will be within 5 percentage points of 66%.
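The survey calculation follows the same recipe as the n = 100 example: compute the standard deviation of p^, then take the normal area between 0.61 and 0.71. A minimal sketch, assuming p = 0.66 and n = 1000 as on the slide; `normal_cdf` is a hypothetical helper built on `math.erf`.

```python
import math

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma), via Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

p, n = 0.66, 1000                      # assumed true proportion and survey size
sigma = math.sqrt(p * (1 - p) / n)     # standard deviation of p^, about 0.01498
prob = normal_cdf(0.71, p, sigma) - normal_cdf(0.61, p, sigma)
print(f"sigma = {sigma:.5f}, P(0.61 <= p^ <= 0.71) = {prob:.4f}")
```

With n = 1000 the margin of 0.05 is more than three standard deviations on each side, which is why the probability is so close to 1.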

Why Surveys Work
What if we had decided to save money and surveyed only 100 males? If it is important to be within 5 percentage points of the correct value, is it worth it to survey 1000 males instead of only 100?
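To weigh the cost question, it helps to see how the chance of landing within 5 percentage points grows with the survey size. Because the interval is symmetric about p, the normal probability simplifies to erf(z / sqrt(2)) with z = 0.05 / sigma. The sketch assumes p = 0.66; the intermediate sizes 250 and 500 and the helper name `prob_within_margin` are illustrative choices.

```python
import math

def prob_within_margin(margin, n, p):
    """Normal-approximation probability that p^ falls within `margin` of p."""
    sigma = math.sqrt(p * (1 - p) / n)
    z = margin / sigma
    return math.erf(z / math.sqrt(2.0))  # equals P(-z <= Z <= z)

results = {n: prob_within_margin(0.05, n, 0.66) for n in (100, 250, 500, 1000)}
for n, prob in results.items():
    print(f"n = {n:4d}   P(within 0.05 of p) = {prob:.4f}")
```

Because sigma shrinks like 1/sqrt(n), a tenfold increase in cost buys roughly a sqrt(10) ≈ 3.2-fold reduction in the standard deviation of p^.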

Quality Control
A company will accept a shipment of components if there is no strong evidence that more than 5% of them are defective.
H0: 5% of the parts are defective.
H1: More than 5% of the parts are defective.

Quality Control
They will take a random sample of 100 parts and test them. If no more than 10 of them are defective, they will accept the shipment. What is α (the probability of rejecting a shipment in which exactly 5% of the parts are defective)?
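α is the probability of rejecting H0 when H0 is true, here P(more than 10 defectives) when the defect rate is exactly 5%. With n = 100 and p = 0.05, np = 5, which does not satisfy the strict rule np > 5, so the sketch computes the exact binomial tail alongside the normal approximation. It treats "reject" as p^ > 0.10 and applies no continuity correction; both choices are assumptions, and the lecture may handle the boundary differently.

```python
import math
from fractions import Fraction

N_PARTS = 100            # parts sampled
MAX_ACCEPT = 10          # accept the shipment if at most this many are defective
P0 = Fraction(5, 100)    # defect rate under H0

# alpha = P(reject H0 | H0 true) = P(x >= 11) when x ~ Binomial(100, 0.05)
alpha_exact = sum(math.comb(N_PARTS, k) * P0**k * (1 - P0)**(N_PARTS - k)
                  for k in range(MAX_ACCEPT + 1, N_PARTS + 1))

# Normal approximation to p^ with mu = 0.05 and sigma = sqrt(0.05 * 0.95 / 100)
p0 = float(P0)
sigma = math.sqrt(p0 * (1 - p0) / N_PARTS)
z = (MAX_ACCEPT / N_PARTS - p0) / sigma
alpha_normal = 0.5 * math.erfc(z / math.sqrt(2.0))   # upper-tail area beyond p^ = 0.10

print(f"exact binomial alpha = {float(alpha_exact):.4f}")
print(f"normal approx alpha  = {alpha_normal:.4f}")
```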