STAT 101, Dr. Kari Lock Morgan
Estimation: Sampling Distribution
SECTION 3.1
• Sampling Distributions (3.1)
Statistics: Unlocking the Power of Data, Lock5

Question of the Day
What proportion of M&M candies are blue?

The Big Picture
[Diagram: Sampling leads from the Population to a Sample; Statistical Inference (interval estimation, hypothesis testing) leads from the Sample back to the Population]

Statistical Inference
Statistical inference is the process of drawing conclusions about the entire population based on information in a sample.
• Example: use the sample of M&M candies we have here to draw conclusions about all M&Ms

Statistic and Parameter
A parameter is a number that describes some aspect of a population.
A statistic is a number that is computed from data in a sample.
• We usually have a sample statistic and want to use it to make inferences about the population parameter

M&M Candies
• p = proportion of M&M candies that are blue
• Get an estimate from one sample.
p = ???

The Big Picture
[Diagram: the Population is described by PARAMETERS and the Sample by STATISTICS; Sampling leads from Population to Sample, and Statistical Inference leads from Sample back to Population]

Parameter versus Statistic
mu, x-bar, p-hat, sigma, rho

Point Estimate
We use the statistic from a sample as a point estimate for a population parameter.
• Point estimates will not match population parameters exactly, but they are our best guess, given the data

How far might the population parameter fall from the sample statistic?
p? p?
GOAL: Identify an interval of plausible values.

Key Question and Answer
• Key question: For a given sample statistic, what are plausible values for the population parameter? How far might the true population parameter be from the sample statistic?
• Key answer: It depends on how much the statistic varies from sample to sample!

More Samples
• Let’s collect a few more point estimates!
• Important point: Sample statistics vary from sample to sample, and knowing how much a statistic varies from sample to sample helps us assess uncertainty in the statistic!

Lots of Samples
• To really see how statistics vary from sample to sample, let’s take lots of samples and compute lots of statistics! (eat lots of M&Ms!)
• Enter your sample proportion on the Google form emailed to you before class (if you don’t have a computer, have someone near you enter your number)
• You just made your first sampling distribution!

Sampling Distribution
A sampling distribution is the distribution of sample statistics computed for different samples of the same size from the same population.
• A sampling distribution shows us how the sample statistic varies from sample to sample
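
The definition above can be explored with a short simulation. This is only a minimal sketch, not the class data: the true proportion `P_TRUE = 0.2`, the sample size, the number of samples, and the function names are all assumptions chosen for illustration.

```python
import random
import statistics

P_TRUE = 0.2        # ASSUMPTION: hypothetical population proportion of blue candies
SAMPLE_SIZE = 50    # ASSUMPTION: candies per sample (the same n for every sample)
NUM_SAMPLES = 1000  # ASSUMPTION: how many samples to collect


def sample_proportion(p_true, n, rng):
    """Draw one random sample of n candies; return the proportion that are blue."""
    blue_count = sum(1 for _ in range(n) if rng.random() < p_true)
    return blue_count / n


def simulate_sampling_distribution(p_true, n, num_samples, seed=0):
    """Return a list of sample proportions, one per sample of size n."""
    rng = random.Random(seed)
    return [sample_proportion(p_true, n, rng) for _ in range(num_samples)]


p_hats = simulate_sampling_distribution(P_TRUE, SAMPLE_SIZE, NUM_SAMPLES)
print("number of sample statistics:", len(p_hats))
print("center of the sampling distribution:", round(statistics.mean(p_hats), 3))
```

Every sample here uses the same n and the same population, which is what makes the resulting list a sampling distribution.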

Sampling Distribution
In the M&M sampling distribution, what does each dot represent?
a) One M&M
b) One sample statistic

Center and Shape
Center: If samples are randomly selected, the sampling distribution will be centered around the population parameter.
Shape: For most of the statistics we consider, if the sample size is large enough, the sampling distribution will be symmetric and bell-shaped.

Sampling Caution
• If you take random samples, the sampling distribution will be centered around the true population parameter
• If sampling bias exists (if you do not take random samples), your sampling distribution may give you bad information about the true parameter
• “The. Polls. Have. Stopped. Making. Any. Sense.”
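
The caution above can be illustrated by comparing random and biased sampling from the same population. This is a sketch under stated assumptions: the population of 1000 candies with 20% blue is hypothetical, and so is the particular bias (blue candies twice as likely to be picked).

```python
import random
import statistics

# ASSUMPTION: a hypothetical population of 1000 candies, 20% of them blue.
POPULATION = ["blue"] * 200 + ["other"] * 800
TRUE_P = 0.2
N = 50             # candies per sample
NUM_SAMPLES = 500  # samples per sampling method


def proportion_blue(sample):
    """Sample statistic: proportion of blue candies in one sample."""
    return sample.count("blue") / len(sample)


def random_sample(rng):
    """Simple random sample: every candy is equally likely to be chosen."""
    return rng.sample(POPULATION, N)


def biased_sample(rng):
    """Biased sample (hypothetical): blue candies are twice as likely to be
    picked. Drawn with replacement, which is fine for this illustration."""
    weights = [2 if candy == "blue" else 1 for candy in POPULATION]
    return rng.choices(POPULATION, weights=weights, k=N)


rng = random.Random(1)
random_center = statistics.mean(
    [proportion_blue(random_sample(rng)) for _ in range(NUM_SAMPLES)]
)
biased_center = statistics.mean(
    [proportion_blue(biased_sample(rng)) for _ in range(NUM_SAMPLES)]
)
print("true p:", TRUE_P)
print("center with random samples:", round(random_center, 3))
print("center with biased samples:", round(biased_center, 3))
```

With random samples the center should land near the true proportion; with the biased sampler it settles noticeably above it, no matter how many samples are taken.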

We really care about the spread of the statistic…
[Figure: sampling distribution, with its spread marked “?”]
How much do statistics vary from sample to sample?

Standard Error
The standard error of a statistic, SE, is the standard deviation of the sample statistic.
• The standard error measures how much the statistic varies from sample to sample
• The standard error can be calculated as the standard deviation of the sampling distribution
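
As the last bullet says, the SE can be computed as the standard deviation of a sampling distribution. The sketch below does this for simulated sample proportions; the true proportion, sample size, and number of samples are assumptions for illustration, not values from the class.

```python
import random
import statistics

P_TRUE = 0.2        # ASSUMPTION: hypothetical population proportion
N = 50              # ASSUMPTION: sample size
NUM_SAMPLES = 2000  # ASSUMPTION: number of simulated samples

rng = random.Random(42)

# Build a sampling distribution: one sample proportion per simulated sample.
p_hats = [
    sum(rng.random() < P_TRUE for _ in range(N)) / N
    for _ in range(NUM_SAMPLES)
]

# The standard error is the standard deviation of those sample statistics.
standard_error = statistics.stdev(p_hats)
print("estimated SE:", round(standard_error, 3))
```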

Standard Error
The more the statistic varies from sample to sample, the
a) higher
b) lower
the standard error.

M&M Standard Error
Based on our sampling distribution, the standard error is closest to (distribution below is based on 100 samples):
a) 0.01
b) 0.1
c) 0.2
d) 0.35

Lower SE means statistics closer to true parameter value…
[Figure: sampling distributions centered at p with SE = 0.1 and SE = 0.04, marking the distance from parameter to statistic]
SE measures the “typical” distance between parameter and statistic

Distance from parameter to statistic gives distance from statistic to parameter
[Figure: sampling distribution centered at p]
• SE can be used to determine width of interval!
• Rare for statistics to be further than this from parameter
• So rare for parameter to be further than this from statistic
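
The idea that statistics rarely fall far from the parameter can be checked by simulation. The slide does not say how far "this" is; the sketch below assumes a distance of two standard errors, a common rule of thumb for bell-shaped distributions. The population proportion and sample size are also hypothetical.

```python
import random
import statistics

P_TRUE = 0.2        # ASSUMPTION: hypothetical population proportion
N = 50              # ASSUMPTION: sample size
NUM_SAMPLES = 2000  # ASSUMPTION: number of simulated samples
MULTIPLIER = 2      # ASSUMPTION: "this far" taken to mean 2 standard errors

rng = random.Random(7)
p_hats = [
    sum(rng.random() < P_TRUE for _ in range(N)) / N
    for _ in range(NUM_SAMPLES)
]

se = statistics.stdev(p_hats)
margin = MULTIPLIER * se

# Count how many sample statistics land within the margin of the parameter.
within = sum(abs(p_hat - P_TRUE) <= margin for p_hat in p_hats)
fraction_within = within / NUM_SAMPLES
print("margin (2 * SE):", round(margin, 3))
print("fraction of statistics within the margin:", round(fraction_within, 3))
```

Because distance is symmetric, a statistic that is within the margin of the parameter also has the parameter within the margin of it, which is why the SE can set the width of an interval built around a single sample statistic.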

The larger the SE, the larger the interval
[Figure: sampling distributions centered at p with SE = 0.1 and SE = 0.04, each marking the distance beyond which it is rare for statistics to fall from the parameter]

Sample Size Matters!
As the sample size increases, the variability of the sample statistics tends to decrease and the sample statistics tend to be closer to the true value of the population parameter.
• For larger sample sizes, you get less variability in the statistics, so less uncertainty in your estimates
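
The effect of sample size on variability can be seen by estimating the SE at two different sample sizes. This is a minimal sketch: the population proportion, the two sample sizes (20 and 200), and the helper name `estimate_se` are assumptions made for illustration.

```python
import random
import statistics

P_TRUE = 0.2        # ASSUMPTION: hypothetical population proportion
NUM_SAMPLES = 2000  # ASSUMPTION: simulated samples per sample size


def estimate_se(n, rng):
    """Simulate a sampling distribution of proportions for samples of size n
    and return its standard deviation (the estimated standard error)."""
    p_hats = [
        sum(rng.random() < P_TRUE for _ in range(n)) / n
        for _ in range(NUM_SAMPLES)
    ]
    return statistics.stdev(p_hats)


rng = random.Random(3)
se_small_n = estimate_se(20, rng)   # ASSUMPTION: "small" sample size
se_large_n = estimate_se(200, rng)  # ASSUMPTION: "large" sample size

print("SE with n = 20: ", round(se_small_n, 3))
print("SE with n = 200:", round(se_large_n, 3))
```

The SE shrinks as n grows, so an interval built from the larger samples would be narrower.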

M&Ms
• StatKey

Sample Size
Suppose we were to take samples of size 10 and samples of size 100 from the same population, and compute the sample means. Which sample means would have the higher standard error?
a) The sample means using n = 10
b) The sample means using n = 100

So larger n means narrower intervals
[Figure: panels labeled “Small n” and “Large n”, with “p?” marked]

Summary
• Interval estimates are superior to just point estimates because they account for uncertainty
• A sampling distribution is a collection of many statistics from the same population and same n
• The width of the interval depends on how much the statistic varies from sample to sample, measured by the standard error (SE)
• Larger SE => wider interval
• Larger n => smaller SE => narrower interval

To Do
• Read Section 3.1
• HW 3.1 due Friday, 10/2