Sampling distribution
• Repeated samples are drawn from a population
• The sample statistic (e.g. mean) from each specific sample becomes one observation
• The probability distribution of these sample statistics gives us a measure of the "precision" of the sample statistic
Human genome (N = 20290 genes)
• All human genes are completely sampled, so the human genome can be treated as a population
• The longest gene is 99630 nucleotides!
The length of 1000 randomly sampled genes: Sample 1
The length of 1000 randomly sampled genes: Sample 2
The length of 1000 randomly sampled genes: Sample 3
Panels: Population distribution, Sample 1, Sample 2, Sample 3
Sampling distribution
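The idea of building a sampling distribution from repeated samples can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene-length data from the slides is not available here, so a lognormal stand-in population of 20290 values is used, and the lognormal parameters and the number of repeats (2000) are arbitrary choices, not values from the slides.

```python
import random
import statistics

random.seed(0)

# Hypothetical stand-in population: 20290 right-skewed "gene lengths".
# (The real gene-length data from the slides is not reproduced here.)
N_GENES = 20290
population = [random.lognormvariate(7.5, 0.9) for _ in range(N_GENES)]


def sample_mean(pop, n):
    """Draw n values without replacement and return their mean."""
    return statistics.fmean(random.sample(pop, n))


# Each sample mean becomes one observation of the sampling distribution.
sample_means = [sample_mean(population, 1000) for _ in range(2000)]

pop_mean = statistics.fmean(population)
mean_of_means = statistics.fmean(sample_means)
pop_sd = statistics.pstdev(population)
sd_of_means = statistics.pstdev(sample_means)

print(f"population mean      : {pop_mean:.1f}")
print(f"mean of sample means : {mean_of_means:.1f}")
print(f"population SD        : {pop_sd:.1f}")
print(f"SD of sample means   : {sd_of_means:.1f}")
```

With this setup the mean of the sample means should land close to the population mean, while the spread of the sample means is far smaller than the spread of the population.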
Central limit theorem
The means of samples drawn from a population of any distribution (with finite variance) approach a normal distribution as sample size increases
Panels: n = 20, n = 40, n = 80
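One way to watch the central limit theorem at work is to track the skewness of the sample means as n grows. The sketch below assumes an exponential population (skewness 2), which is a stand-in and not the gene data; the helper names `skewness` and `simulate_means` and the 20000 repeats are illustrative choices. In theory the skewness of the mean of n independent values is the population skewness divided by √n, so it should shrink toward 0 (a symmetric, normal-like shape).

```python
import random
import statistics

random.seed(1)


def skewness(values):
    """Mean cubed z-score; 0 for a perfectly symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


def simulate_means(n, reps=20000):
    """Means of `reps` independent samples of size n from Exp(1)."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]


# Same sample sizes as the panels on the slide.
skew_by_n = {n: skewness(simulate_means(n)) for n in (20, 40, 80)}
for n, g in skew_by_n.items():
    print(f"n = {n:3d}: skewness of sample means = {g:.3f}")
```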
The population distribution of human gene lengths is not normally distributed
The sampling distribution of mean human gene lengths is approximately normally distributed
The sampling distribution is narrower than the population distribution but has the same mean
Panels: Population, Sample mean
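Population SD vs. Sample SE
Variance of the sample means: $\sigma_{\bar{X}}^2 = \sigma^2 / n$ (population variance $\sigma^2$, sample size $n$)
Standard error of the sample mean: $\sigma_{\bar{X}} = \sigma / \sqrt{n}$ (population standard deviation $\sigma$)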
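The relation between the population SD and the standard error of the mean can be checked numerically. The sketch below is a rough check under assumptions: the population is a hypothetical exponential one with mean about 2500 (not the gene data), samples are drawn with replacement so that draws are independent, and the sample sizes 25, 100 and 400 are chosen so that each step should roughly halve the SE.

```python
import math
import random
import statistics

random.seed(2)

# Hypothetical population (not the gene data): 20000 right-skewed values.
population = [random.expovariate(1 / 2500) for _ in range(20000)]
sigma = statistics.pstdev(population)  # population standard deviation

results = {}
for n in (25, 100, 400):
    # Sample with replacement so draws are independent,
    # which is what sigma / sqrt(n) assumes.
    means = [
        statistics.fmean(random.choices(population, k=n))
        for _ in range(3000)
    ]
    empirical_se = statistics.pstdev(means)  # SD of the sample means
    theoretical_se = sigma / math.sqrt(n)    # sigma / sqrt(n)
    results[n] = (empirical_se, theoretical_se)
    print(f"n = {n:3d}: empirical SE = {empirical_se:7.1f}, "
          f"sigma/sqrt(n) = {theoretical_se:7.1f}")
```

Quadrupling the sample size halves the standard error, whereas the population SD itself does not depend on n.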
SD or SE?
• SD: you are telling people about the spread of the data values X
• SE: you are telling people about the spread of the statistic (e.g. means)
Inferring a population parameter (e.g. mean) from a single sample is one guess at the true value
How close is this sample mean to the true population mean? It depends on:
• How fair the sampling process is (= randomness)
• How variable the population is (= σ² or σ)
• How big the sample size is (n)
Confidence intervals for the mean
Assuming the sampling distribution fits the Z (standard normal) distribution: $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
Confidence intervals
Lower 95% limit (L1): z = -1.960
Upper 95% limit (L2): z = 1.960
Confidence intervals
Confidence intervals
With the sampling distribution of 10,000 mean gene lengths, we can also observe the 2.5th and 97.5th percentiles, which will be very close to 2500 and 2746, respectively.
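The agreement between the Z-based limits and the observed percentiles of a simulated sampling distribution can be illustrated as follows. This is a sketch under assumptions: the population is a hypothetical gamma-distributed one with mean near 2600 (not the gene data), and smaller values of n and of the number of repeats are used than on the slides to keep the run short, so the limits will not match 2500 and 2746.

```python
import math
import random
import statistics

random.seed(3)

# Hypothetical population with known mu and sigma (not the gene data).
population = [random.gammavariate(2.0, 1300.0) for _ in range(20000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

n = 100      # sample size (the slides use 1000)
reps = 4000  # number of sample means (the slides use 10,000)
se = sigma / math.sqrt(n)

# Z-based limits: mu -/+ 1.960 standard errors.
z = statistics.NormalDist().inv_cdf(0.975)  # about 1.960
z_lower, z_upper = mu - z * se, mu + z * se

# Observed 2.5th and 97.5th percentiles of the simulated sample means.
means = [statistics.fmean(random.choices(population, k=n)) for _ in range(reps)]
cuts = statistics.quantiles(means, n=40)  # cut points every 2.5%
pct_lower, pct_upper = cuts[0], cuts[-1]

print(f"Z-based limits   : {z_lower:.0f} to {z_upper:.0f}")
print(f"percentile limits: {pct_lower:.0f} to {pct_upper:.0f}")
```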
Confidence intervals
In the real world, the population standard deviation, σ, is often unknown.
Replace σ with s: $t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$
s is the standard deviation of the observations in the only sample we have
Student's t distribution is not a standard normal distribution because s is only a point estimate of σ and will not always equal σ (the shape of the standard normal distribution requires the exact σ)
William S. Gosset (pseudonym: "Student")
R. A. Fisher: the only one at the time who appreciated Student's t
Student's t distribution has μ = 0 and a σ that depends on sample size (or degrees of freedom, df = n - 1).
As sample size increases (degrees of freedom increase), the t distribution approaches the standard normal distribution.
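The convergence of t toward the standard normal can be seen by simulating the t statistic directly. The sketch below assumes a normal population with μ = 0 and σ = 1, and the helper name `t_statistic`, the sample sizes and the 10000 repeats are illustrative choices. It estimates the 97.5th percentile of t for each n and compares it with the normal value of about 1.960.

```python
import math
import random
import statistics

random.seed(4)


def t_statistic(n, mu=0.0, sigma=1.0):
    """t = (xbar - mu) / (s / sqrt(n)) for one normal sample of size n."""
    x = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(x) / n
    s = math.sqrt(sum((v - xbar) ** 2 for v in x) / (n - 1))  # sample SD
    return (xbar - mu) / (s / math.sqrt(n))


z_crit = statistics.NormalDist().inv_cdf(0.975)  # about 1.960

t_crit = {}
for n in (5, 30, 100):
    ts = [t_statistic(n) for _ in range(10000)]
    # Last of the 39 cut points is the empirical 97.5th percentile.
    t_crit[n] = statistics.quantiles(ts, n=40)[-1]
    print(f"n = {n:3d} (df = {n - 1:2d}): 97.5th percentile of t = {t_crit[n]:.3f}")

print(f"standard normal        : 97.5th percentile of z = {z_crit:.3f}")
```

Small samples give a noticeably larger cutoff than 1.960 because s is a noisy estimate of σ; the gap should shrink as df grows.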
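Confidence intervals
Lower limit: $L_1 = \bar{X} - t_{\alpha/2,\,df}\; s / \sqrt{n}$
Upper limit: $L_2 = \bar{X} + t_{\alpha/2,\,df}\; s / \sqrt{n}$
where $t_{\alpha/2,\,df}$ is the critical value of the t distribution with df = n - 1 (α = 0.05 for a 95% interval)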
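A confidence interval from a single sample, and the reason for using t rather than z when σ is replaced by s, can be sketched as below. Assumptions are labeled in the code: the population is a hypothetical normal one (μ = 2600, σ = 2000), the sample size is 10, and the critical value 2.262 for df = 9 is taken from a standard t table. The helper name `ci` is illustrative. The loop counts how often each kind of interval contains the true mean.

```python
import math
import random
import statistics

random.seed(5)

T_CRIT_DF9 = 2.262  # t critical value for 95%, df = 9 (standard t table)
Z_CRIT = 1.960      # z critical value for 95%


def ci(sample, crit):
    """Return (L1, L2) = xbar -/+ crit * s / sqrt(n)."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    half_width = crit * statistics.stdev(sample) / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical normal population and settings (not from the slides).
MU, SIGMA, N, REPS = 2600.0, 2000.0, 10, 5000

hits_t = hits_z = 0
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    lo_t, hi_t = ci(sample, T_CRIT_DF9)
    lo_z, hi_z = ci(sample, Z_CRIT)
    hits_t += lo_t <= MU <= hi_t  # interval built with t
    hits_z += lo_z <= MU <= hi_z  # interval built with z but using s

cover_t, cover_z = hits_t / REPS, hits_z / REPS
print(f"coverage with t (df = 9): {cover_t:.3f}")
print(f"coverage with z         : {cover_z:.3f}")
```

The t-based interval should cover the true mean in roughly 95% of samples, while plugging s into the z formula gives an interval that is too narrow for n = 10.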
Confidence intervals based on just one sample and the t distribution: Sample 1
Confidence intervals based on just one sample and the t distribution: Sample 2
Confidence intervals based on just one sample and the t distribution: Sample 3
Confidence intervals for the mean