CHAPTER 6 Sampling error and confidence intervals


[Diagram: population → parameter; sample → statistic; the difference between the statistic and the parameter is the error]

Section 1 Sampling error of the mean
Section 2 t distribution
Section 3 Confidence intervals for the population mean

Section 1 sampling error of mean


A simple random sample is a sample of size n drawn from a population of size N in such a way that every possible sample of size n has the same probability of being selected. Variability among the simple random samples drawn from the same population is called sampling variability, and the probability distribution that characterizes some aspect of the sampling variability (usually the mean, but not always) is called a sampling distribution. Sampling distributions allow us to make objective statements about population parameters without measuring every object in the population.

[Example 1] The population mean of diastolic blood pressure (DBP) in Chinese adult men is 72 mmHg, with a standard deviation of 5 mmHg. Ten adult participants are chosen at random from this population, and the sample mean and sample standard deviation are calculated. If this sampling is repeated 100 times, what do the results look like?
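The repeated sampling in Example 1 can be tried out numerically. The following is a minimal sketch, assuming DBP is normally distributed in the population (the example does not state the distribution). The function name `draw_sample_means` and the random seed are illustrative choices, not part of the original slides.

```python
import random
import statistics

def draw_sample_means(mu=72.0, sigma=5.0, n=10, repeats=100, seed=1):
    """Draw `repeats` random samples of size n and return their means.

    Assumption: the population is normal with mean mu and SD sigma.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = []
    for _ in range(repeats):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.mean(sample))
    return means

means = draw_sample_means()
print("smallest sample mean:", round(min(means), 2))
print("largest sample mean: ", round(max(means), 2))
print("mean of the 100 sample means:", round(statistics.mean(means), 2))
print("SD of the 100 sample means:  ", round(statistics.stdev(means), 2))
```

The sample means scatter around 72 mmHg, and their standard deviation should come out near 5/√10 ≈ 1.58 mmHg, although the exact figures depend on the seed.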


If random samples are repeatedly drawn from a population with mean μ and standard deviation σ, we find that:
(1) the sample means differ from one another;
(2) a sample mean is not necessarily equal to the population mean μ;
(3) the distribution of the sample means is symmetric about μ.
How can we explore the sampling distribution of the mean?

The difference between a sample statistic and the population parameter, or the difference among sample statistics, is called sampling error.

In real life we sample only once, but we recognize that our sample comes from a theoretical sampling distribution of all possible samples of a particular size. The sampling distribution concept provides a link between sampling variability and probability. Choosing a random sample is a chance operation, and generating the sampling distribution consists of many repetitions of this chance operation.

Central limit theorem: When sampling from a normally distributed population with mean μ and standard deviation σ, the distribution of the sample mean is normal with mean μ and standard deviation σ/√n.


[Figure: population distribution with μ = 50 and σ = 10, and the sampling distributions of the sample mean for n = 4 and n = 16]

Central limit theorem: When sampling from a non-normally distributed population with mean μ and standard deviation σ, the distribution of the sample mean is approximately normal with mean μ and standard deviation σ/√n, as long as n is large enough (n > 50).
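The central limit theorem for a non-normal population can be checked by simulation. This is a rough sketch only: the exponential population, the sample size n = 60, the number of repetitions, and the function name `sample_means_exponential` are all assumptions chosen for illustration.

```python
import math
import random
import statistics

def sample_means_exponential(mean=1.0, n=60, repeats=2000, seed=2):
    """Means of `repeats` samples of size n from an exponential population.

    Assumption: an exponential population (mean = SD = `mean`) stands in
    for "a non-normally distributed population".
    """
    rng = random.Random(seed)
    means = []
    for _ in range(repeats):
        sample = [rng.expovariate(1.0 / mean) for _ in range(n)]
        means.append(statistics.mean(sample))
    return means

means = sample_means_exponential()
print("mean of sample means:", round(statistics.mean(means), 3))   # theory: mu = 1.0
print("SD of sample means:  ", round(statistics.stdev(means), 3))  # theory: sigma / sqrt(n)
print("sigma / sqrt(n):     ", round(1.0 / math.sqrt(60), 3))
```

Even though the population is strongly skewed, the sample means should centre on μ with a spread close to σ/√n.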


The standard error (SE) is used to assess the sampling error of the mean. Although sampling error is inevitable, its size can be quantified.

Calculation of the standard error (SE)
• Theoretical value: $\sigma_{\bar{X}} = \sigma / \sqrt{n}$
• Estimate of SE: $s_{\bar{X}} = s / \sqrt{n}$
The SE increases as s increases (s↑ → SE↑) and decreases as n increases (n↑ → SE↓).

Example 5.2 An analyst randomly selected a sample of n = 100 people and measured their weights, obtaining a mean of 72 kg and a standard deviation of 15 kg. Question: what is the standard error?

Solution: $s_{\bar{X}} = s / \sqrt{n} = 15 / \sqrt{100} = 1.5$ kg
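The standard error calculation for the weights example can also be sketched in code. The helper name `standard_error` is illustrative and does not come from the original slides.

```python
import math

def standard_error(s, n):
    """Estimated standard error of the mean: s / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return s / math.sqrt(n)

print(standard_error(15, 100))  # weights example: s = 15 kg, n = 100 -> 1.5
print(standard_error(15, 400))  # four times the sample size halves the SE -> 0.75
```

The second call illustrates n↑ → SE↓: the SE shrinks with the square root of the sample size, not with the sample size itself.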

Exercise 5.1 Consider a sample of 100 measurements with mean 121 cm and standard deviation 7 cm, drawn from a normal population. Compute its standard error.

Solution: $s_{\bar{X}} = s / \sqrt{n} = 7 / \sqrt{100} = 0.7$ cm

Section 2 t distribution


1. Definition
$X \sim N(\mu, \sigma^2) \;\rightarrow\; Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$

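Random sampling: for the mean of a random sample of size n from $N(\mu, \sigma^2)$, $\bar{X} \sim N(\mu, \sigma^2 / n)$, so $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)$.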

Usually the standard deviation σ is unknown and we have only its sample estimate s, so we calculate $t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$, which follows a t distribution with ν = n − 1 degrees of freedom.

This sampling distribution was developed by W. S. Gosset and published under the pseudonym “Student” in 1908. It is therefore sometimes called “Student’s t distribution”, and it is really a family of distributions that depends on the degrees of freedom, n − 1.

[Figure: the Z distribution compared with the t distribution, ν = n − 1]

2. Characteristics of the t distribution
[Figure 4: graph of the t distribution with different degrees of freedom]

(1) The t distribution is symmetric about 0.
(2) The shape of the t curve is determined by the degrees of freedom, df = n − 1.
(3) The t distribution approaches the standard normal distribution as n tends to infinity.

t critical value with one-sided probability α → $t_{\alpha,\nu}$
t critical value with two-sided probability α → $t_{\alpha/2,\nu}$

Example 5.2 With n = 15, find $t_0$ such that $P(-t_0 \le t \le t_0) = 0.90$.

Solution: From the t table with df = 15 − 1 = 14 and a two-tailed shaded area of 0.10, $-t_0 = -1.761$ and $t_0 = 1.761$.
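The table lookup can be cross-checked numerically. What follows is a sketch using only the Python standard library: it integrates the t density with Simpson's rule and finds the critical value by bisection. The function names `t_pdf`, `t_cdf`, and `t_critical` are hypothetical, and a statistics library's quantile function (for instance `scipy.stats.t.ppf`) would be the usual choice in practice.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), via Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_critical(alpha, df, two_sided=True):
    """Critical value with upper-tail area alpha/2 (two-sided) or alpha (one-sided)."""
    tail = alpha / 2 if two_sided else alpha
    if not 0 < tail < 0.5:
        raise ValueError("tail probability must be between 0 and 0.5")
    lo, hi = 0.0, 1.0
    while 1.0 - t_cdf(hi, df) > tail:  # widen until hi is past the critical value
        hi *= 2.0
    for _ in range(60):                # bisection on the upper-tail area
        mid = (lo + hi) / 2.0
        if 1.0 - t_cdf(mid, df) > tail:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(round(t_critical(0.10, 14), 3))  # n = 15, central area 0.90
for df in (5, 30, 1000):
    print(df, round(t_critical(0.05, df), 3))
```

The first value should reproduce the table entry 1.761, and the loop shows the two-sided 0.05 critical value moving toward the standard normal value of about 1.96 as the degrees of freedom grow.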

Section 3 confidence intervals for the population mean


Statistical methods
  • Descriptive statistics
  • Inferential statistics
      - Parameter estimation: point estimation, interval estimation
      - Hypothesis testing

1. Basic concepts
Parameter estimation: inferring the population parameter from sample statistics.

• Point estimate: a single-valued estimate; a single element chosen from a sampling distribution. It conveys little information about the actual value of the population parameter or about the accuracy of the estimate.

• Confidence interval (interval estimate): an interval or range of values believed to include the unknown population parameter.

[Diagram: a point estimate shown as a single value, and an interval estimate shown as a range from a lower limit to an upper limit]

[Figure: sampling distribution with central area 1 − α and area α/2 in each tail]

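2. Methods
• Z distribution, used when (1) σ is known, or (2) σ is unknown and n > 50:
  CI: $\bar{X} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ (σ known) or $\bar{X} \pm z_{\alpha/2}\,s/\sqrt{n}$ (σ unknown, n > 50)
• t distribution, used when σ is unknown and n ≤ 50:
  CI: $\bar{X} \pm t_{\alpha/2,\nu}\,s/\sqrt{n}$, with ν = n − 1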

Example 5.3 A horticultural scientist is developing a new variety of apple. One of the important traits, in addition to taste, color, and storability, is the uniformity of the fruit size. To estimate the weight, she samples 100 mature fruit and calculates a sample mean of 220 g and a standard deviation of 5 g. Develop a 95% confidence interval for the population mean μ from her sample.

Solution: $\bar{X} \pm z_{0.05/2}\,s/\sqrt{n} = 220 \pm 1.96 \times 5/\sqrt{100} = 220 \pm 0.98$. The 95% confidence interval for the population mean is 219.02 g to 220.98 g.
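The apple calculation can be reproduced with a few lines of code. This is a minimal sketch for the large-sample (Z distribution) case. The function name `z_confidence_interval` is an illustrative assumption, and `statistics.NormalDist` from the Python standard library supplies the normal quantile.

```python
import math
from statistics import NormalDist

def z_confidence_interval(mean, sd, n, confidence=0.95):
    """CI for the population mean using the Z distribution: mean ± z * sd / sqrt(n)."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for 95%
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin

lower, upper = z_confidence_interval(220, 5, 100)
print(round(lower, 2), round(upper, 2))  # prints: 219.02 220.98
```

Raising `confidence` to 0.99 widens the interval, because a larger z value is needed to cover more of the sampling distribution.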

Exercise A forester is interested in estimating the average number of ‘count trees’ per acre. A random sample of n = 64 one-acre plots is selected and examined. The mean number of count trees per acre is found to be 27.3, with a standard deviation of 12.1. Use this information to construct a 95% confidence interval for μ.

Solution: $\bar{X} \pm z_{0.05/2}\,s/\sqrt{n} = 27.3 \pm 1.96 \times 12.1/\sqrt{64} = 27.3 \pm 2.96$. The 95% confidence interval for the population mean is 24.34 to 30.26.

The forester is 95% confident that the population mean number of “count trees” per acre is between 24.34 and 30.26.

Example 5.4 An ecologist samples 25 plants and measures their heights. He finds that the sample has a mean of 15 cm and a standard deviation of 4 cm. What is the 95% confidence interval for the population mean μ?

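Solution: df = 25 − 1 = 24, so $t_{0.05/2,24} = 2.064$ and $\bar{X} \pm t_{0.05/2,24}\,s/\sqrt{n} = 15 \pm 2.064 \times 4/\sqrt{25} = 15 \pm 1.651$.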

The plant ecologist is 95% confident that the population mean height of these plants is between 13.349 cm and 16.651 cm.
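The plant-height interval can be recomputed as follows. This is a small sketch for the small-sample (t distribution) case. The function name `t_confidence_interval` is illustrative, and the critical value 2.064 is supplied by hand as the two-sided 0.05 table value for df = 24 rather than computed.

```python
import math

def t_confidence_interval(mean, s, n, t_crit):
    """CI for the population mean: mean ± t_crit * s / sqrt(n).

    t_crit is the two-sided critical value for df = n - 1, supplied by
    the caller (for example, read from a t table).
    """
    margin = t_crit * s / math.sqrt(n)
    return mean - margin, mean + margin

# 2.064 is the table value for two-sided probability 0.05 with df = 24.
lower, upper = t_confidence_interval(15, 4, 25, t_crit=2.064)
print(round(lower, 3), round(upper, 3))  # prints: 13.349 16.651
```

Passing the critical value in explicitly keeps the sketch close to the hand calculation, where the value is read from a table for the appropriate degrees of freedom.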

Exercise 1 A doctor samples 25 men and measures their heights. He finds that the sample has a mean of 172.12 cm and a standard deviation of 4.50 cm. What is the 95% confidence interval for the population mean μ?

Solution: $\bar{X} \pm t_{0.05/2,24}\,s/\sqrt{n} = 172.12 \pm 2.064 \times 4.50/\sqrt{25} = 172.12 \pm 1.86$. The 95% confidence interval for the population mean is 170.26 cm to 173.98 cm.

Exercise 2 Random samples of size 9 are repeatedly drawn from a normal distribution with a mean of 65 and a standard deviation of 18. Describe the sampling distribution of the mean.

[Figure: sampling distribution of the sample mean centered at 65, with a lower limit and an upper limit marked]
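One way to describe the sampling distribution in Exercise 2 is to compute its standard error and the limits that enclose the central 95% of sample means. This is a hedged sketch: treating the limits as central 95% limits is an assumption, since the slides do not say which probability the lower and upper limits refer to.

```python
import math
from statistics import NormalDist

mu, sigma, n = 65, 18, 9
se = sigma / math.sqrt(n)           # standard error of the mean: 18 / 3 = 6
sampling_dist = NormalDist(mu, se)  # distribution of the sample mean

# Assumed reading of the exercise: limits enclosing the central 95% of sample means.
lower = sampling_dist.inv_cdf(0.025)
upper = sampling_dist.inv_cdf(0.975)
print(se, round(lower, 2), round(upper, 2))  # prints: 6.0 53.24 76.76
```

Because the population is normal, the sample mean is exactly normal here, with mean 65 and standard error 6, even though n = 9 is small.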

PROBLEMS
1. What is the difference between SD and SE?
2. What is the medical reference range? What is the confidence interval for the population mean?