Chapter 9 Estimation Using a Single Sample (Confidence Intervals!)

Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Branches of Statistics
• Descriptive statistics – what we’ve done so far.
• Inferential statistics – what we start today!
  ✓ Using values obtained from a sample (statistics) to predict values for a population (parameters)
  ✓ Confidence intervals
  ✓ Hypothesis testing

Point Estimation
A point estimate of a population characteristic is a single number that is based on sample data and represents a plausible value of the characteristic.

Examples of Point Estimates
• The percentage of orange Reese’s Pieces in a random sample of 25.
• The average length of the Jellyblubbers in a random sample of 25.
• The median size (diameter) of a random sample of 40 apples.
• The standard deviation of the ages of a random sample of 125 college students.
• The variance of the Algebra II grades of a random sample of 200 Algebra II students.

Examples of Point Estimates - Continued
A sample of 200 students at a large university is selected to estimate the proportion of students who wear contact lenses. In this sample, 47 wore contact lenses.
Let π = the true proportion of all students at this university who wear contact lenses. Consider “success” being a student who wears contact lenses.
What is the point estimate for π?

Example
The statistic
$\hat{p} = \frac{\text{number of successes in the sample}}{n}$
is a reasonable choice for a formula to obtain a point estimate for π. Such a point estimate is
$\hat{p} = \frac{47}{200} = 0.235$

Example
A sample of weights of 34 male freshman students was obtained.
185 202 197 188 166 148 161 139 214 170 231 180
174 177 283 207 176 194 175 170 184 180 184 176
202 151 189 167 179 178 176 168 177 155
To estimate the true mean weight of all male freshman students, you might use the sample mean as a point estimate.

Example – Same Data!
After looking at a histogram and boxplot of the data, you might notice that the data seem reasonably symmetric with an outlier, so you might use either the sample median or a sample trimmed mean as a point estimate.
[Figure: histogram and boxplot of the 34 weights; horizontal axis marked at 140, 180, 220, 260]
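The three candidate point estimates can be compared directly on the 34 weights. The following is a minimal Python sketch, not part of the original slides; the helper `trimmed_mean` and the 10% trimming fraction are illustrative assumptions, since the slides do not say how much to trim.

```python
import statistics

# The 34 weights listed in the example
weights = [
    185, 202, 197, 188, 166, 148, 161, 139, 214, 170, 231, 180,
    174, 177, 283, 207, 176, 194, 175, 170, 184, 180, 184, 176,
    202, 151, 189, 167, 179, 178, 176, 168, 177, 155,
]

def trimmed_mean(values, trim_fraction):
    """Mean after dropping trim_fraction of the values from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)  # number of values dropped per end
    return statistics.mean(ordered[k:len(ordered) - k])

sample_mean = statistics.mean(weights)
sample_median = statistics.median(weights)
sample_trimmed = trimmed_mean(weights, 0.10)  # 10% is an assumed choice

print(f"mean             = {sample_mean:.2f}")
print(f"median           = {sample_median}")
print(f"10% trimmed mean = {sample_trimmed:.2f}")
```

The outlier (283) pulls the mean above the median, while the trimmed mean lands between the two.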

Bias
A statistic with mean value equal to the value of the population characteristic being estimated is said to be an unbiased statistic. A statistic that is not unbiased is said to be biased.
[Figure: the original distribution, with the sampling distribution of an unbiased statistic and the sampling distribution of a biased statistic]

Bias
Another way to think of bias is this. An unbiased statistic does not systematically overestimate or underestimate: over many samples, its estimates that are too high and its estimates that are too low balance out on average!
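One way to see bias is by simulation. The sketch below is an illustration only and is not from the slides: it assumes a standard normal population (true variance 1), a sample size of 5, and 20,000 repeated samples, then compares two estimators of the population variance, one dividing by n and one dividing by n − 1.

```python
import random

rng = random.Random(9)  # fixed seed (arbitrary) so the run is repeatable
n = 5                   # assumed sample size
reps = 20000            # assumed number of repeated samples

total_divide_by_n = 0.0
total_divide_by_n_minus_1 = 0.0
for _ in range(reps):
    sample = [rng.gauss(0, 1) for _ in range(n)]  # population variance is 1
    mean = sum(sample) / n
    squared_deviations = sum((x - mean) ** 2 for x in sample)
    total_divide_by_n += squared_deviations / n
    total_divide_by_n_minus_1 += squared_deviations / (n - 1)

# Average value of each statistic across all simulated samples
avg_biased = total_divide_by_n / reps
avg_unbiased = total_divide_by_n_minus_1 / reps

print(f"average when dividing by n     : {avg_biased:.3f}")
print(f"average when dividing by n - 1 : {avg_unbiased:.3f}")
```

The n − 1 version should average close to the true value of 1, while the n version should settle noticeably below it, which is what a biased statistic looks like.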

What Makes a “Good” Point Estimate?
Given a choice between several unbiased statistics that could be used for estimating a population characteristic, the best statistic to use is the one with the smallest standard deviation.
[Figure: several unbiased sampling distributions; the one with the smallest standard deviation is the best choice]

Point Estimates - Summary
• Unbiased v. biased
• Small standard error is good.
What is standard error? The standard deviation of the sampling distribution of a sample statistic.

Confidence Intervals
A point estimate on its own is of limited value in estimating a parameter. Because of sampling variability, a point estimate can vary widely from sample to sample and is seldom exactly equal to the actual parameter.

Confidence Intervals
So…instead, we find a range of values that we can say with some degree of certainty contains the parameter.
A confidence interval for a population characteristic (parameter) is an interval of plausible values for the characteristic. It is constructed so that, with a chosen degree of confidence, the value of the characteristic will be captured inside the interval.

A Better Way! Confidence Intervals
• An interval estimate with an associated measure of precision.
• I am 95% confident that the true proportion of U.S. adults who believe that affirmative action programs should continue is between 0.499 and 0.561.
• I am 93% confident that the true mean number of students per 3rd hour class at MHS is 25 ± 4. The ± 4 is called the bound on the error.

More Examples
I am 99% confident that the true mean annual radiation exposure for Diablo Canyon Nuclear Power Plant Unit 2 workers is between 0.412 and 0.550 rem.
I am 90% confident that in 1993 the true mean salary for married men who received MBAs in the late 70s and who were the sole source of family income was between $121,406.03 and $127,613.97.
Figure out the bound on the error for each of these.
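When an interval has the form statistic ± bound, the bound on the error is half the interval's width. A small Python sketch of that arithmetic follows; the function name `bound_on_error` is an illustrative choice, not something from the slides.

```python
def bound_on_error(lower, upper):
    """Half the width of an interval reported as (lower, upper)."""
    return (upper - lower) / 2

# Endpoints taken from the two statements above
radiation_bound = bound_on_error(0.412, 0.550)        # rem
salary_bound = bound_on_error(121406.03, 127613.97)   # dollars

print(f"radiation exposure: ± {radiation_bound:.3f} rem")
print(f"salary:             ± ${salary_bound:,.2f}")
```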

Statistic ± Bound on the Error
Getting the statistic is easy. How do we get the “bound on the error”?
• We’ll call it “error” for short.
• Formula: critical value × standard error
For p-hat that means:
$z^* \sqrt{\hat{p}(1-\hat{p})/n}$
where z* is based on the “confidence level” (how certain you want to be).
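The "critical value × standard error" recipe is short enough to write out. The sketch below applies it to the contact-lens sample from earlier (47 of 200) with z* = 1.96 for 95% confidence; the function name `proportion_error` is hypothetical.

```python
import math

def proportion_error(p_hat, n, z_star):
    """Bound on the error for a sample proportion: z* times the standard error."""
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return z_star * standard_error

p_hat = 47 / 200                           # contact-lens example
error = proportion_error(p_hat, 200, 1.96)  # 1.96 is the 95% critical value

print(f"p-hat = {p_hat:.3f}, error = {error:.4f}")
print(f"interval: ({p_hat - error:.3f}, {p_hat + error:.3f})")
```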

Steps to Creating a Confidence Interval for Estimating π
To create a confidence interval with a 95% level of confidence, we take 95% of the area under the normal curve, right out of the center!
Next, calculate the z-scores that define the boundaries of this area. These are the critical values.
[Figure: normal curve centered at the actual value of π, with the middle 95% shaded between z*1 and z*2]

The Concept!
Only one of the point estimates possible is actually correct.
But, if we add or subtract this much from every value of p-hat possible, then all of the resulting intervals created by the p-hats in the shaded region will contain the actual value of π.
[Figure: normal curve centered at the actual value of π, with the middle 95% shaded between z*1 and z*2]

Continuing the Steps
Look up the z-scores that are the boundaries for the middle 95% of the normal curve. They are just additive inverses of each other (±1.96). Their common magnitude is the critical value for a 95% confidence interval.
Find the standard error for p-hat.
The critical value times the standard error gives the actual distance from π to the boundaries.

But Wait a Minute!!!
We don’t know the value of π. So use p-hat. And how do we know this is normal?
• Requirements for creating a z-confidence interval for π:
  ✓ P-hat must come from a random sample.
  ✓ The sample size must be large enough for n(p-hat) ≥ 10 and n(1 – p-hat) ≥ 10. (This allows us to say that p-hat has an approximately normal distribution and allows us to use p-hat to estimate π.)
  ✓ The sample must be less than 5% (or 10%) of the population.
If these requirements are met then we can proceed.
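The two numeric requirements can be checked mechanically; randomness of the sample cannot be verified by code and has to come from the study design. The following Python sketch is an illustration only: the function name, its parameters, and the population size of 30,000 in the usage line are assumptions.

```python
def check_z_interval_requirements(successes, n, population_size=None,
                                  min_count=10, max_fraction=0.05):
    """Return a dict of the numeric checks for a z-interval for a proportion.

    n * p-hat is simply the number of successes, and n * (1 - p-hat) is the
    number of failures, so the counts are compared directly.
    """
    checks = {
        "n(p-hat) >= 10": successes >= min_count,
        "n(1 - p-hat) >= 10": (n - successes) >= min_count,
    }
    if population_size is not None:
        checks["sample < 5% of population"] = n < max_fraction * population_size
    return checks

# Contact-lens sample: 47 successes out of 200, hypothetical population of 30,000
results = check_z_interval_requirements(47, 200, population_size=30000)
for name, passed in results.items():
    print(f"{name}: {'ok' if passed else 'NOT met'}")
```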

Large-sample Confidence Interval for a Population Proportion
[Figure: normal curve with the middle 95% shaded between z1 and z2]

Confidence Level
The confidence level associated with a confidence interval estimate is the success rate of the method used to construct the interval.
Even though it is written as a percentage, the confidence level is not the probability that one particular computed interval contains the parameter! It describes how often the method succeeds over many repeated samples.

So Now…what does confidence mean in each of these cases?
I am 99% confident that the true proportion of MHS students who are “middle children” is between 0.412 and 0.550.
I am 90% confident that in 1993 the true proportion of married men who received MBAs in the late 70s and who were the sole source of family income was between 0.12 and 0.185.
What were the p-hats in each of these cases?
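Because these intervals are built as p-hat ± error, each p-hat sits at the midpoint of its interval. A brief Python sketch of that observation (the function name is illustrative):

```python
def interval_center(lower, upper):
    """Midpoint of a symmetric interval, i.e. the statistic it was built around."""
    return (lower + upper) / 2

middle_children_p_hat = interval_center(0.412, 0.550)
mba_p_hat = interval_center(0.12, 0.185)

print(f"middle children: p-hat = {middle_children_p_hat:.3f}")
print(f"MBA example:     p-hat = {mba_p_hat:.4f}")
```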

Reese’s Pieces
Our class did an M&M lab. In past years we have done similar labs with Reese’s Pieces. Either way, the results suggest that even though sample values vary depending on which sample you happen to pick, there seems to be a pattern to the variation. However, we need more samples to investigate this pattern more thoroughly. Since it is time-consuming (and possibly fattening) to literally sample candies, we will use the TI-83 calculator to simulate the process.

Reese’s Pieces
To perform these simulations we need to suppose that we know the actual value of the parameter. Let us suppose that 45% of the population is orange.
Use the TI-83 calculator to draw 500 samples of 25 candies each. (Pretend that this is really 500 students, each taking 25 candies and counting the number of orange ones.)
randBin(25, .45, 500)→L1
L1/25→L2
(This will take time and battery power.)
Then look at a display of the sample proportions of orange obtained, and sketch it.
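For readers without a TI-83, the same simulation can be approximated in Python. This is a rough analogue, not the calculator's algorithm: the seed is arbitrary, and `sample_proportion` is a hypothetical helper that plays the role of one student's sample.

```python
import random
import statistics

rng = random.Random(2005)  # arbitrary fixed seed so the run is repeatable

def sample_proportion(n, p, rng):
    """Proportion of orange candies in one simulated sample of n candies."""
    orange = sum(1 for _ in range(n) if rng.random() < p)
    return orange / n

# 500 imaginary students, each drawing 25 candies from a 45%-orange population
proportions = [sample_proportion(25, 0.45, rng) for _ in range(500)]

print(f"mean of sample proportions = {statistics.mean(proportions):.3f}")
print(f"std. dev. of proportions   = {statistics.stdev(proportions):.3f}")
```

The mean should land near 0.45, and the standard deviation near sqrt(0.45 × 0.55 / 25) ≈ 0.10.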

Reese’s Pieces
Record the mean and standard deviation of these sample proportions.
Roughly speaking, are there more sample proportions close to the population proportion (which we said to be 0.45) than there are far from it?

Phone Home!* Reese’s Pieces
Let us quantify the previous question. Use the TI-83 calculator to count how many of the 500 sample proportions are within 0.10 of 0.45 (i.e., between 0.35 and 0.55). Then repeat for within 0.20 and for within 0.30.
SortA(L2)
Record the results:
                        Number of the 500      Percentage of these
                        sample proportions     sample proportions
within 0.10 of 0.45     ______                 ______
within 0.20 of 0.45     ______                 ______
within 0.30 of 0.45     ______                 ______
*E.T. reference

Reese’s Pieces
Suppose that each of the 500 imaginary students was to estimate the population proportion of orange candies by going a distance of 0.20 on either side of her/his sample proportion. What percentage of the 500 students would capture the actual population proportion (0.45) within this interval?
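A sample proportion lies within 0.20 of 0.45 exactly when the interval p-hat ± 0.20 captures 0.45, so counting one answers the other. The Python sketch below repeats the simulation and does the counting; it is an illustration under the same assumptions as the calculator exercise (500 samples of 25, true proportion 0.45), with an arbitrary seed and hypothetical function names.

```python
import random

rng = random.Random(45)  # arbitrary fixed seed so the run is repeatable
TRUE_P = 0.45
CANDIES_PER_SAMPLE = 25
NUM_STUDENTS = 500

def sample_proportion(n, p, rng):
    """Proportion of orange candies in one simulated sample of n candies."""
    return sum(1 for _ in range(n) if rng.random() < p) / n

proportions = [sample_proportion(CANDIES_PER_SAMPLE, TRUE_P, rng)
               for _ in range(NUM_STUDENTS)]

def fraction_capturing(distance):
    """Fraction of students whose interval p-hat ± distance contains TRUE_P."""
    hits = sum(1 for p_hat in proportions if abs(p_hat - TRUE_P) <= distance)
    return hits / len(proportions)

for distance in (0.10, 0.20, 0.30):
    print(f"within {distance:.2f}: {fraction_capturing(distance):.1%}")
```

Going 0.20 either side is roughly two standard deviations of p-hat here, which is why the capture rate for that distance should come out in the neighborhood of 95%.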

Reese’s Pieces
Forgetting that you actually (think you) know the population proportion of orange candies to be 0.45, suppose that you were one of these 500 imaginary students. Would you have any way of knowing definitively whether your sample proportion was within 0.20 of the population proportion? Would you be reasonably “confident” that your sample proportion was within 0.20 of the population proportion? Explain why.

The 95% Confidence Interval

Example
For a project, a student randomly sampled 182 other students at a large university to determine if the majority of students were in favor of a proposal to build a field house. He found that 75 were in favor of the proposal.

π = the true proportion of students that favor the proposal.
Requirements:
1. It is given to be a random sample.
2. n(p-hat) = 182(0.4121) = 75 ≥ 10 and n(1 – p-hat) = 182(0.5879) = 107 ≥ 10.
3. It is reasonable to assume that 182 students is less than 5% of the students attending a large university (182/0.05 = 3640, so the university needs more than 3640 students).
4. I will create a 95% z-confidence interval for π:
   $\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n} = 0.4121 \pm 1.96\sqrt{(0.4121)(0.5879)/182} = 0.4121 \pm 0.0715$
I am 95% confident that the true proportion of students that favor the proposal is between 0.341 and 0.484.
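The field-house interval can be reproduced step by step. This is a minimal Python sketch of the arithmetic using the 95% critical value 1.96; the variable names are illustrative.

```python
import math

successes, n = 75, 182
z_star = 1.96  # critical value for 95% confidence

p_hat = successes / n
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
error = z_star * standard_error

lower = p_hat - error
upper = p_hat + error

print(f"p-hat = {p_hat:.4f}")
print(f"standard error = {standard_error:.4f}")
print(f"95% interval: ({lower:.3f}, {upper:.3f})")
```

Since the whole interval lies below 0.5, the sample does not support the claim that a majority of students favor the proposal.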

Confidence Interval for π on the TI-84

So…The General Confidence Interval Formula for a population proportion
The general formula for a confidence interval for a population proportion π when
1. p-hat is the sample proportion from a random sample,
2. the sample size n is large (n(p-hat) ≥ 10 and n(1 – p-hat) ≥ 10), and
3. n < 0.05N (the sample is less than 5% of the population)
is given by
$\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}$

Finding a z Critical Value
Finding a z critical value for a 98% confidence interval. How would we do this on the calculator?
A 98% interval leaves 1% in each tail, so the upper boundary has a cumulative area of 0.9900. Looking up the cumulative area of 0.9900 in the body of the table, we find z = 2.33.

Some Common Critical Values
Confidence level    z critical value
80%                 1.28
90%                 1.645
95%                 1.96
98%                 2.33
99%                 2.58
99.8%               3.09
99.9%               3.29
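These critical values can also be computed rather than looked up. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `z_critical` is an illustrative choice.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """z* such that the middle `confidence_level` of the standard normal
    curve lies between -z* and +z*."""
    tail_area = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - tail_area)

for level in (0.80, 0.90, 0.95, 0.98, 0.99, 0.998, 0.999):
    print(f"{level:.1%}  z* = {z_critical(level):.3f}")
```

Small differences from the table (for example 2.326 versus 2.33) are just rounding.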

Terminology Review
The standard error of a statistic is the estimated standard deviation of the statistic.

Review Terminology
The bound on error of estimation, B, associated with a 95% confidence interval is (1.96)·(standard error of the statistic).
The bound on error of estimation, B, associated with a confidence interval is (z critical value)·(standard error of the statistic).

Sample Size
The sample size required to estimate a population proportion π to within an amount B with 95% confidence is
$n = \pi(1-\pi)\left(\frac{1.96}{B}\right)^2$
The value of π may be estimated by prior information. If no prior information is available, use π = 0.5 in the formula to obtain a conservatively large value for n.
Generally one rounds the result up to the nearest integer.

Sample Size Calculation Example
A TV executive would like to find a 95% confidence interval estimate within 0.03 for the proportion of all households that watch NYPD Blue regularly. How large a sample is needed if a prior estimate for π was 0.15?
We have B = 0.03 and the prior estimate of π = 0.15:
$n = (0.15)(0.85)\left(\frac{1.96}{0.03}\right)^2 = 544.2$
A sample of 545 or more would be needed.

Sample Size Calculation Example revisited
Suppose a TV executive would like to find a 95% confidence interval estimate within 0.03 for the proportion of all households that watch NYPD Blue regularly. How large a sample is needed if we have no reasonable prior estimate for π?
We have B = 0.03 and should use π = 0.5 in the formula:
$n = (0.5)(0.5)\left(\frac{1.96}{0.03}\right)^2 = 1067.1$
The required sample size is now 1068.
Notice that a reasonable ballpark estimate for π can lower the needed sample size.

Another Example
A college professor wants to estimate the proportion of students at a large university who favor building a field house with a 99% confidence interval accurate to 0.02. If one of his students performed a preliminary study and estimated π to be 0.412, how large a sample should he take?
We have B = 0.02, a prior estimate π = 0.412, and we should use the z critical value 2.58 (for a 99% confidence interval):
$n = (0.412)(0.588)\left(\frac{2.58}{0.02}\right)^2 = 4031.4$
The required sample size is 4032.
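All three sample-size examples follow the same formula, so one small function covers them. This Python sketch is an illustration; the function name and its default arguments (95% confidence, π = 0.5 when no prior estimate exists) are assumptions that mirror the slides.

```python
import math

def required_sample_size(bound, z_star=1.96, prior=0.5):
    """Smallest n with z* * sqrt(prior * (1 - prior) / n) <= bound.

    prior=0.5 gives the conservative (largest) answer.
    """
    n = prior * (1 - prior) * (z_star / bound) ** 2
    return math.ceil(n)  # always round up

tv_with_prior = required_sample_size(0.03, prior=0.15)
tv_no_prior = required_sample_size(0.03)
field_house = required_sample_size(0.02, z_star=2.58, prior=0.412)

print(f"TV example, prior 0.15:  n = {tv_with_prior}")
print(f"TV example, no prior:    n = {tv_no_prior}")
print(f"Field house, 99% level:  n = {field_house}")
```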

One-Sample z Confidence Interval for μ
If
1. $\bar{x}$ is the sample mean from a random sample,
2. the sample size n is large (generally n ≥ 30), and
3. σ, the population standard deviation, is known,
the general formula for a confidence interval for a population mean μ is given by
$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$

One-Sample z Confidence Interval for μ
If n is small (generally n < 30) but it is reasonable to believe that the distribution of values in the population is normal, a confidence interval for μ (when σ is known) is
$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$
Notice that this formula works when σ is known and either
1. n is large (generally n ≥ 30), or
2. the population distribution is normal (any sample size).

Example
A certain filling machine has a true population standard deviation σ = 0.228 ounces when used to fill catsup bottles. A random sample of 36 “6 ounce” bottles of catsup was selected from the output of this machine, and the sample mean was 6.018 ounces.
Find a 90% confidence interval estimate for the true mean fill of catsup from this machine.

Example I (continued)
The z critical value is 1.645.
$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}} = 6.018 \pm 1.645 \cdot \frac{0.228}{\sqrt{36}} = 6.018 \pm 0.063$
90% Confidence Interval: (5.955, 6.081)
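The catsup interval can be checked with a few lines of Python. This is a sketch of the arithmetic only, using the numbers given in the example; the function name `z_interval_for_mean` is illustrative.

```python
import math

def z_interval_for_mean(sample_mean, sigma, n, z_star):
    """Confidence interval for a mean when the population sigma is known."""
    error = z_star * sigma / math.sqrt(n)
    return sample_mean - error, sample_mean + error

# sigma = 0.228 oz, n = 36 bottles, sample mean = 6.018 oz, 90% level
lower, upper = z_interval_for_mean(6.018, 0.228, 36, 1.645)

print(f"90% interval: ({lower:.3f}, {upper:.3f}) ounces")
```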

Unknown σ [All Size Samples]
W. S. Gosset, an English statistician who worked at the Guinness brewery in Dublin and published under the name “Student”, developed the techniques and derived the Student’s t distributions that describe the behavior of
$\frac{\bar{x} - \mu}{s/\sqrt{n}}$

t Distributions
If X is a normally distributed random variable, the statistic
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
has a “t” distribution with df = n – 1 degrees of freedom, where s is the sample standard deviation.


t Distributions
Notice: As df increases, t distributions approach the standard normal distribution.
Since each t distribution would require a table similar to the standard normal table, we usually only create a table of critical values for the t distributions.


One-Sample t Procedures
Suppose that an SRS of size n is drawn from a population having unknown mean μ. The general confidence limits are
$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$
and the general confidence interval for μ is
$\left(\bar{x} - t^* \frac{s}{\sqrt{n}},\ \bar{x} + t^* \frac{s}{\sqrt{n}}\right)$
where t* is the critical value from the t distribution with n – 1 degrees of freedom.

Confidence Interval Example
Ten randomly selected shut-ins were each asked to list how many hours of television they watched per week. The results are
82 66 90 84 75 88 80 94 110 91
Find a 90% confidence interval estimate for the true mean number of hours of television watched per week by shut-ins.

Confidence Interval Example
Calculating the sample mean and standard deviation, we have n = 10, $\bar{x}$ = 86, s = 11.842.
We find the critical t value of 1.833 by looking on the t table in the row corresponding to df = 9, in the column with bottom label 90%.
The confidence interval for μ is
$86 \pm 1.833 \cdot \frac{11.842}{\sqrt{10}} = 86 \pm 6.864 = (79.14, 92.86)$
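The summary statistics and the interval can be recomputed from the ten data values. The Python sketch below uses only the standard library, so the critical value 1.833 is taken from the t table as in the slide rather than computed; variable names are illustrative.

```python
import math
import statistics

hours = [82, 66, 90, 84, 75, 88, 80, 94, 110, 91]

n = len(hours)
x_bar = statistics.mean(hours)
s = statistics.stdev(hours)  # sample standard deviation (divides by n - 1)
t_star = 1.833               # from the t table: df = 9, 90% confidence

error = t_star * s / math.sqrt(n)
lower, upper = x_bar - error, x_bar + error

print(f"n = {n}, mean = {x_bar}, s = {s:.3f}")
print(f"90% interval: ({lower:.2f}, {upper:.2f}) hours per week")
```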

Confidence Interval Example
To calculate the confidence interval, we had to make the assumption that weekly viewing times are normally distributed. Consider the normal plot of the 10 data points produced with Minitab that is given on the next slide.

Confidence Interval Example
Notice that the normal plot looks reasonably linear, so it is reasonable to assume that the number of hours of television watched per week by shut-ins is normally distributed.
Typically, if the p-value of the normality test is more than 0.05, we have no evidence against normality and proceed as if the distribution is normal.
[Figure: Minitab normal probability plot. Anderson-Darling Normality Test: A-Squared = 0.226, P-Value = 0.753]