Interval estimation ASW, Chapter 8 Economics 224, Notes for October 8, 2008

Central limit theorem – CLT (ASW, 271)

The sampling distribution of the sample mean, x̄, is approximated by a normal distribution when the sample is a simple random sample and the sample size, n, is large. In this case, the mean of the sampling distribution is the population mean, μ, and the standard deviation of the sampling distribution is the population standard deviation, σ, divided by the square root of the sample size. The latter is referred to as the standard error of the mean. In symbols, the standard error is σ_x̄ = σ/√n.

A sample size of 100 or more elements is generally considered sufficient to permit using the CLT. If the population from which the sample is drawn is symmetrically distributed, n > 30 may be sufficient to use the CLT.
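The CLT statement can be checked with a small simulation. The sketch below is illustrative only: the population (an exponential distribution with mean 10, which is strongly skewed), the seed, and the number of repeated samples are assumptions chosen for the demonstration and are not part of the notes.

```python
import random
import statistics

random.seed(224)  # arbitrary seed so the run is repeatable

# Hypothetical skewed population: exponential with mean 10.
# For an exponential distribution the standard deviation equals the mean.
mu = 10.0
sigma = 10.0
n = 100             # size of each simple random sample
num_samples = 5000  # number of repeated samples

# Draw many samples and record the mean of each one.
sample_means = []
for _ in range(num_samples):
    sample = [random.expovariate(1 / mu) for _ in range(n)]
    sample_means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
standard_error = sigma / n ** 0.5  # sigma / sqrt(n)

print(f"mean of sample means: {mean_of_means:.2f} (population mean {mu})")
print(f"sd of sample means:   {sd_of_means:.2f} (sigma/sqrt(n) = {standard_error:.2f})")
```

Even though the population is skewed, the sample means should centre on μ with a spread close to σ/√n.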

Large random sample from any population

                      Any population    Sampling distribution of x̄ when sample is random
No. of elements       N                 n
Mean                  μ                 μ
Standard deviation    σ                 σ/√n

A sample size n of greater than 100 is generally considered sufficiently large to use these results from the CLT.

Probability that a sample mean is within a specified distance of the population mean

What is the probability that a particular random sample of size n = 50 has a mean that is within $100 of the population mean? See next slide.

• Within $100 of the mean is from 2352 - 100 = 2252 to 2352 + 100 = 2452.
• The sampling distribution of the sample mean is approximately normal since the sample size n = 50 is large, with mean μ = 2352 and standard error σ_x̄ = 1485/√50 = 210.
• The required probability is the area under the normal curve between 2252 and 2452.
• Obtain the corresponding Z-values: Z = ±100/210 = ±0.48.
  For Z = -0.48, cumulative probability is 0.3156.
  For Z = 0.48, cumulative probability is 0.6844.
  Required probability is 0.6844 - 0.3156 = 0.3688.
• The probability that a sample yields a sample mean within $100 of the population mean is 0.37.
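The table lookup above can be reproduced with Python's standard library. This is a minimal sketch, assuming the population values μ = 2352 and σ = 1485 used in this example; the Z-value is rounded to two decimals only to mimic reading a printed normal table.

```python
from statistics import NormalDist

mu = 2352       # population mean
sigma = 1485    # population standard deviation
n = 50          # sample size
distance = 100  # "within $100 of the mean"

standard_error = sigma / n ** 0.5        # about 210
z = round(distance / standard_error, 2)  # rounded as in a printed Z table

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
prob = std_normal.cdf(z) - std_normal.cdf(-z)

print(f"standard error = {standard_error:.0f}, Z = ±{z}, probability = {prob:.4f}")
```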

Standard error for the sample mean

• The standard deviation of the sampling distribution of the sample mean is also referred to as the standard error. As n increases, the standard error decreases, so the sample means are less variable and tend to be closer to the population mean. That is, for a larger n, there is an increased probability that the sample mean lies within any specified distance of the population mean. See the next slide for a diagram.
• In the last example, if n = 200, the standard error is 1485 divided by the square root of 200, or 105. With this larger sample size, the probability that a sample mean is within $100 of the population mean is the area under a normal curve between Z = -0.95 and Z = 0.95, or 0.6578.

Example of the effect of changing sample size

n       Standard error    Value of standard error    Probability of sample mean being
                          when σ = 1485              within $100 of μ
50      σ/7.071           210                        0.37
200     σ/14.142          105                        0.66
500     σ/22.361          66.4                       0.87
1000    σ/31.623          47.0                       0.97

From these calculations, note how the larger sample size produces sampling distributions where the sample mean is generally closer to the population mean μ. The last column shows how there is increased probability that the sample mean is within $100 of the population mean μ as n becomes larger.
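The table can be regenerated for any set of sample sizes. The sketch below assumes μ = 2352 and σ = 1485 from the earlier example and uses unrounded Z-values, so the third decimal may differ slightly from values read off a printed table.

```python
from statistics import NormalDist

mu = 2352
sigma = 1485
distance = 100

results = {}  # n -> (standard error, probability within $100 of mu)
for n in (50, 200, 500, 1000):
    se = sigma / n ** 0.5
    sampling_dist = NormalDist(mu, se)  # approximate sampling distribution of the sample mean
    prob = sampling_dist.cdf(mu + distance) - sampling_dist.cdf(mu - distance)
    results[n] = (se, prob)
    print(f"n = {n:4d}  standard error = {se:6.1f}  P(within $100) = {prob:.2f}")
```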

Constructing interval estimates of a parameter

• The general form for the interval estimate of a population parameter is
    Point estimate of parameter ± Margin of error
• The margin of error is an amount that is added to and subtracted from the point estimate to produce an interval estimate of the parameter.
• The size of the margin of error depends on
  – The type of sampling distribution for the sample statistic.
  – The percentage of the area under the sampling distribution that a researcher decides to include, usually 90%, 95%, or 99%. This is termed a confidence level.
• Each interval estimate is an interval constructed around the point estimate, along with a confidence level.

Examples of interval estimates

• Statistics Canada reports that mean weekly food expenditures for Prairie households in 2001 were $127.78. But the data were obtained from a sample so there is sampling error associated with this estimate. The “true” value of the mean is between $123.78 and $131.78, 68% of the time and between $119.78 and $135.78, 95% of the time. Source: Statistics Canada, Food Expenditure in Canada 2001, catalogue no. 62-554-XIE, pp. 16, 70, 81.
• “The margin of error is estimated to be plus or minus 3.51 per cent, 19 times out of 20.” From the Palliser electoral district poll reporting Conservative at 43.3%, NDP at 35.7%, Liberal at 17.3%, and Green at 3.5% of decided voters, conducted by Sigma Analytics. Source: Leader-Post, Regina, October 3, 2008, pp. A1-A2.

Statistics Canada uses the following method

For example, if the estimate of an average expenditure for a given category is $75 and the corresponding CV is 5%, then the “true” value is between $71.25 and $78.75, 68% of the time and between $67.50 and $82.50, 95% of the time. (p. 70 of 62-554-XIE).

The intervals for mean food expenditure on the last slide were constructed by this method.
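The quoted figures are consistent with treating the CV (coefficient of variation) as the standard error divided by the estimate, and taking one standard error either side for 68% and two for 95%. The sketch below encodes that reading; the function name cv_interval is hypothetical, and the one/two standard error rule is inferred from the numbers in the quotation rather than taken from Statistics Canada documentation.

```python
def cv_interval(estimate, cv, num_se):
    """Interval of num_se standard errors either side of the estimate.

    cv is the coefficient of variation as a proportion (0.05 for 5%),
    so the standard error is estimate * cv.
    """
    se = estimate * cv
    return estimate - num_se * se, estimate + num_se * se

low68, high68 = cv_interval(75.0, 0.05, 1)  # about 68% of the time
low95, high95 = cv_interval(75.0, 0.05, 2)  # about 95% of the time

print(f"68%: ${low68:.2f} to ${high68:.2f}")
print(f"95%: ${low95:.2f} to ${high95:.2f}")
```

Applied to the food expenditure estimate of $127.78, the reported intervals imply a standard error of $4.00.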

Modified FIGURE 8.1 SAMPLING DISTRIBUTION OF THE SAMPLE MEAN AMOUNT SPENT FROM SIMPLE RANDOM SAMPLES OF 100 CUSTOMERS

The figure shows the sampling distribution of the sample mean x̄ for a simple random sample of 100 individuals from a population with a standard deviation of 20. The mean of the sampling distribution of x̄ is the population mean μ and its standard deviation, or standard error, is 20/√100 = 2. This distribution can be used to construct an interval estimate of μ.

Constructing an interval estimate for a population mean μ

• Obtain the point estimate of μ, that is, the sample mean x̄.
• Determine the distribution of the sample mean. If n is large, then the Central Limit Theorem can be used and x̄ is normally distributed with mean μ and standard deviation σ_x̄ = σ/√n.
• Select a confidence level. The most common level is 95%.
• Obtain the margin of error associated with the confidence level. For a normal distribution, the interval from Z = -1.96 to Z = 1.96 contains 95% of the area under the curve, or 95% of the sample means. See next slide to illustrate this.
• The 95% interval estimate is x̄ ± 1.96 σ/√n.

Modified FIGURE 8.2 SAMPLING DISTRIBUTION OF x̄ SHOWING THE LOCATION OF SAMPLE MEANS THAT ARE WITHIN 3.92 OF μ

In this example, the standard error is 2 and the margin of error is 2 × 1.96 = 3.92. For the general case, 1.96 is multiplied by the standard error to determine the margin of error.
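The margin of error calculation generalizes to a small helper. This is a sketch, not a prescribed method: the function name interval_estimate and the sample mean of 80 are hypothetical, while the standard deviation of 20 and n = 100 come from the figure. The Z-value is obtained from the standard normal distribution rather than typed in as 1.96.

```python
from statistics import NormalDist

def interval_estimate(sample_mean, sd, n, confidence=0.95):
    """Return (lower, upper, margin) for a large-sample interval estimate of the mean."""
    # Z-value leaving (1 - confidence)/2 in each tail, e.g. about 1.96 for 95%.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sd / n ** 0.5  # Z times the standard error
    return sample_mean - margin, sample_mean + margin, margin

# sd = 20 and n = 100 as in the figure; the sample mean of 80 is made up for illustration.
lower, upper, margin = interval_estimate(sample_mean=80.0, sd=20, n=100)
print(f"margin of error = {margin:.2f}, interval = ({lower:.2f}, {upper:.2f})")
```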

Example of interval estimates - I

Statistics of total income, Saskatchewan females employed full-time and full-year, by age, 2003

              Income in thousands of dollars
Age group     Mean     Standard deviation     Sample size
25-34         33.3     13.5                   55
35-44         40.3     20.7                   57
45-54         45.1     [missing]              37
55-64         40.1     25.9                   31

Source: Data for this question adapted from Statistics Canada. General Social Survey of Canada, 2003. Cycle 17: Social Engagement [machine readable data file]. 1st Edition. Ottawa, ON: Statistics Canada [publisher and distributor] 10/1/2004. Obtained through University of Regina Data Library Services.

Example of interval estimates - II

• Obtain 95% interval estimates for the mean income of all full-time, full-year employed females in Saskatchewan in these age groups.
• Describe the pattern of mean income by age.

Analysis: The pattern in the samples is clear: mean income increases from ages 25-34 to 45-54, then declines for ages 55-64. However, the data from each of the four age groups is a sample, so interval estimates are necessary to comment on whether this pattern appears to hold for all females.

Example of interval estimates - III

Obtain an interval estimate for the mean income of all females aged 25-34. Call this μ.
• The point estimate of μ is the sample mean, x̄ = 33.3.
• Since n = 55 is reasonably large, the Central Limit Theorem will be used. Thus, x̄ is normally distributed with mean μ and standard deviation σ_x̄ = σ/√n.
• Select the 95% confidence level, as requested.
• In a normal distribution, Z = -1.96 to Z = 1.96 has 95% of the area under the curve, or 95% of the sample means.
• The 95% interval estimate is x̄ ± 1.96 σ/√n.
• In this example, s is used as an estimate of σ.
• The interval is 33.3 ± 1.96 (13.5/√55) = 33.3 ± 3.6.
• The margin of error is ±3.6 and the 95% interval estimate of μ is (29.7, 36.9) thousand dollars.

Example of interval estimates - IV

              Income in thousands of dollars
Age group     Mean     Standard deviation     Sample size     Margin of error     95% interval estimate
25-34         33.3     13.5                   55              ±3.6                (29.7, 36.9)
35-44         40.3     20.7                   57              ±5.4                (34.9, 45.7)
45-54         45.1     [missing]              37              ±8.2                (36.9, 53.3)
55-64         40.1     25.9                   31              ±9.1                (31.0, 49.2)

• Explain why the margins of error differ as they do.
• Explain the pattern of mean income by age for all females of each age group, now that interval estimates are available.
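The margins of error in the table can be recomputed from the summary statistics. The sketch below covers only the three age groups whose standard deviation can be read from these notes, and it assumes that 25.9 is the standard deviation of the 55-64 group, since that value reproduces the reported margin of ±9.1 for n = 31.

```python
Z_95 = 1.96  # Z-value for the 95% confidence level

# age group -> (sample mean, sample standard deviation, sample size), income in $000s
# Assumption: 25.9 belongs to the 55-64 group; the 45-54 value is not available here.
groups = {
    "25-34": (33.3, 13.5, 55),
    "35-44": (40.3, 20.7, 57),
    "55-64": (40.1, 25.9, 31),
}

intervals = {}  # age group -> (margin, lower, upper), each rounded to one decimal
for age, (mean, s, n) in groups.items():
    margin = Z_95 * s / n ** 0.5  # s is used as the estimate of sigma
    intervals[age] = (round(margin, 1), round(mean - margin, 1), round(mean + margin, 1))
    print(f"{age}: ±{intervals[age][0]}  ({intervals[age][1]}, {intervals[age][2]})")
```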

Example of interval estimates - V

• The margin of error is greater when s is larger or n is smaller. All these interval estimates have the same Z = ±1.96 associated with the 95% confidence level. A larger confidence level produces a larger Z, a larger margin of error, and a wider interval.
• The intervals for each of the groups between ages 35 and 64 overlap a lot, meaning that there may not be differences in the mean income for all females of these ages. The intervals for the 45-54 and 25-34 age groups do not overlap (they meet only at the shared endpoint 36.9), so it is fairly certain that the mean income of all females aged 25-34 is lower than that of all those aged 45-54.
• Note that the target or sampled populations in this example are not really all Saskatchewan females of each age group, but only those employed full-time and full-year.

Interpretation of interval estimates

• The interval estimate is an interval of values constructed around the sample mean. We hope that this interval contains the population mean μ.
• With repeated random sampling, if a 95% confidence level is selected, the probability is 0.95 that the intervals contain the population mean μ. A particular interval may or may not contain μ, but the method employed here means that 95% of the intervals constructed this way contain the population mean μ. (For example, 95% confidence intervals for the two poor samples, samples 65 and 171, in the 192 sample simulation do not contain the population mean.) See following slide for an illustration of this.
• When reporting a confidence interval, make sure you report both the interval and the confidence level. One without the other is meaningless.
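The repeated-sampling interpretation can be illustrated by simulation. In this sketch the population is assumed to be normal with μ = 2352 and σ = 1485 (the values from the earlier example; the normal shape, the seed, n = 100, and 2000 repetitions are choices made only for illustration). Each sample gets its own 95% interval, and the fraction of intervals containing μ is counted.

```python
import random
import statistics

random.seed(2008)  # arbitrary seed so the run is repeatable

mu, sigma = 2352, 1485  # assumed population mean and standard deviation
n = 100                 # size of each sample
num_samples = 2000      # number of repeated samples
Z_95 = 1.96

covered = 0
for _ in range(num_samples):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # s estimates sigma
    margin = Z_95 * s / n ** 0.5
    if x_bar - margin <= mu <= x_bar + margin:
        covered += 1

coverage = covered / num_samples
print(f"{covered} of {num_samples} intervals contain mu (coverage = {coverage:.3f})")
```

The coverage should come out close to 0.95, with a small number of samples producing intervals that miss μ.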

Determination of σ

• In order to construct an interval estimate, it is necessary to obtain some estimate of σ, the variability of the population from which the sample is drawn. This is required to obtain an estimate of the standard error of the sample mean, σ/√n.
• Generally, the sample standard deviation s is used as an estimate of σ. For large sample size, assume the CLT holds and assume s provides a reasonable estimate of σ. For a small sample, where n < 30, the t-distribution should be used, again using s as an estimate of σ.
• In sections 8.1 and 8.2, ASW distinguish methods for when σ is known and unknown. In practice σ is rarely known and in note 1, p. 299, ASW state this. In addition, as n increases, the t-distribution approaches the normal distribution. Thus, so long as n > 30, it is acceptable to use s as an estimate of σ for purposes of constructing an interval estimate.

Selecting a confidence level

• There is no one confidence level that is appropriate for all circumstances.
• Greater confidence level means greater certainty that the interval estimate of μ actually contains μ. But for 99% or 99.9% confidence level, the interval may be very wide.
• Smaller confidence levels (e.g. 80% or 90%) produce smaller margins of error and seemingly more precise interval estimates, but they are less likely to contain μ.
• Use the level requested or the level others have used when researching similar issues.
• By tradition, the default level is 95%.
• Issues such as manufacturing products to be safe for human use, e.g. foods, should require high confidence levels (99.9%+). But this may increase costs of manufacture and checking for safety.
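The trade-off between confidence level and interval width can be seen by recomputing one margin of error at several levels. The sketch below reuses the 25-34 age group from the income example (s = 13.5, n = 55); the particular set of levels is chosen for illustration.

```python
from statistics import NormalDist

s, n = 13.5, 55            # 25-34 age group, income in $000s
standard_error = s / n ** 0.5

margins = {}  # confidence level -> margin of error
for level in (0.80, 0.90, 0.95, 0.99, 0.999):
    # Z-value leaving (1 - level)/2 in each tail of the standard normal.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margins[level] = z * standard_error
    print(f"{level:.1%} level: Z = {z:.3f}, margin of error = ±{margins[level]:.2f}")
```

Each step up in confidence level gives a larger Z and therefore a wider interval around the same sample mean.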

Cautions about interval estimates

• There are many assumptions involved in interval estimation:
  – The sample is randomly selected from a population.
  – The sample size is sufficiently large to use the CLT.
  – The population standard deviation is known or s is a good estimate of σ.
  – The selection of a confidence level is an arbitrary process.
  – The population is not too skewed (note 2, ASW, 308).
• As a result, interval estimates are not precise, but are estimates or approximations.
• Larger n, repeated sampling, comparisons with other studies, and careful sampling and survey design and practice can improve the quality of the estimates.

Next week

• t-distribution (ASW, sections 8.1, 8.2).
• Sample size (ASW, section 8.3).
• Interval estimates for proportions (ASW, sections 6.3, 7.6, 8.4).
• Extra office hour: Friday, October 10, 1-3 p.m., CL 237.