Generalization: How Broadly do the Results Apply? Chapter 2
Statistic and Parameter • So far in our study of tests of significance we’ve focused on process probabilities. • Is the probability that Buzz will push the correct button more than 0.50? • Is the probability that scissors will be thrown less than 1/3? • With Buzz, the statistic was the proportion of times Buzz pushed the correct button in the trials we looked at (15/16) and the parameter was Buzz’s probability (long-term proportion) of pushing the correct button. • We used this sample statistic to tell us something about the parameter.
Generalization • Now we want to focus more on finite populations instead of an infinite process. • Typically the entire population is not measured for what we are interested in. • We therefore take samples (some subgroup of the population) to give us the information we want about a population. • We want to generalize the information from the sample to the population.
Sampling from a Finite Population Section 2.1
Sampling Students Example 2.1A
Sampling Students • We will look at data collected by the registrar’s office at the College of the Midwest for ALL students for Spring 2011, focusing on the two variables in the spreadsheet below, which shows the first 8 students.
Student ID: 1 2 3 4 5 6 7 8 …
Cumulative GPA: 3.92 2.80 3.08 2.71 3.31 3.83 3.80 3.58 …
On campus?: Yes Yes No Yes …
Sampling Students • What type of variable is “On campus”? • What type is Cumulative GPA?
Sampling Students • Here are graphs (a histogram and a bar graph) representing all of the 2919 students at the College of the Midwest for our two variables of interest.
Sampling Students • We usually don’t have information on an entire population (a census) like we do here. • We usually need to make inferences about a population based on a sample. • Suppose a researcher asks the first 30 students he finds on campus one morning (say, at 8:00 am outside of Phelps) whether they live on campus. This would be a quick and convenient way to get a sample.
Sampling Students For this scenario: • What is the population? • What is the sample? • What is the parameter? • What is the statistic? • Do you think this quick and convenient sampling method will result in a similar sample proportion to the population proportion?
Sampling Students • The researcher’s sampling method might overestimate the proportion of students who live on campus: because the sample is taken early in the morning, most of those who live off campus might not have arrived yet. • We call this sampling method biased. • A sampling method is biased if statistics from samples consistently over- or underestimate the population parameter.
Sampling Students • Bias is a property of a sampling method, not the sample • A method must consistently produce nonrepresentative results to be considered biased • Sampling bias also depends on what is measured • Would the morning sampling method be biased in estimating the average GPA of students at the college? • What about estimating the proportion of students with black hair?
Biased Sampling Let’s take a look at a biased sampling method before we get back to our example.
ESPN Website: What is college basketball's fiercest rivalry? • Connecticut vs. Tennessee (Women) • Duke vs. North Carolina • Hope vs. Calvin • Illinois vs. Missouri • Indiana vs. Purdue • Louisville vs. Kentucky • Penn vs. Princeton • Philadelphia's Big 5 • Oklahoma vs. Oklahoma State • Xavier vs. Cincinnati http://proxy.espn.go.com/chat/sportsnation/polling?event_id=1194
ESPN Website: What is college basketball's fiercest rivalry? • 75.1% Hope vs. Calvin • 9.3% Duke vs. North Carolina • 5.4% Indiana vs. Purdue • 5.2% Philadelphia's Big 5 • 1.7% Penn vs. Princeton • 1.5% Oklahoma vs. Oklahoma State • 0.7% Louisville vs. Kentucky • 0.6% Connecticut vs. Tennessee (Women) • 0.3% Illinois vs. Missouri • 0.3% Xavier vs. Cincinnati • Total Votes: 46,084
Random Sample • To get a sample that represents its population: • You can’t have people self-select into the sample. (Basketball poll) • You can’t choose a convenient sample that is clearly not representative of the population. (For example, using this class when you are interested in the proportion of college students who major in the social sciences.)
Random Sample • A simple random sample is the easiest way to ensure that your sampling method is unbiased. • Remember that a sampling method is biased if statistics from samples consistently over- or underestimate the population parameter. • Hence, an unbiased method of sampling does not have a tendency to over- or underestimate the population parameter.
Simple Random Sample • A simple random sample is like drawing names out of a hat. • Technically, a simple random sample is a way of randomly selecting members of a population so that every sample of a certain size from a population has the same chance of being chosen.
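Drawing names out of a hat can be sketched in a few lines of Python. This is only a minimal illustration, assuming students are identified by the integers 1 to 2919 (the numbering is an assumption; the population size is the one given for the College of the Midwest). The standard library's `random.sample` selects without replacement, so every set of 30 IDs has the same chance of being chosen.

```python
import random

POPULATION_SIZE = 2919   # students at the College of the Midwest
SAMPLE_SIZE = 30

# Assumption: students are identified by the integers 1..2919.
student_ids = range(1, POPULATION_SIZE + 1)

rng = random.Random(2011)                     # fixed seed so the sketch is repeatable
srs = rng.sample(student_ids, SAMPLE_SIZE)    # draw 30 IDs without replacement

print(sorted(srs))
```

Changing the seed gives a different sample, which is the "names out of a hat" behavior: no student and no group of students is favored.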
Sampling Students • We took 5 different SRSs of 30 students • Each sample gives different statistics • This is sampling variability • The values don’t change much: • Average GPAs from 3.22 to 3.40 • Sample proportions from 0.63 to 0.83
Random sample: 1 2 3 4 5
Average GPA: 3.22 3.29 3.40 3.26 3.25
Proportion on campus: 0.80 0.83 0.77 0.63 0.83
Sampling Students • We took 1000 SRSs and have graphs of the 1000 sample means (for the GPAs) and 1000 sample proportions (for living on campus). • The mean of each distribution falls near the population parameter
Sampling Students • If we took all possible random samples of 30 students from this population, the averages of the statistics would match the parameters exactly. • This distribution of statistics is called a sampling distribution, and it is what we are approximating with our null distributions and our theory-based distributions. • Statistics computed from SRSs cluster around the parameter, so this is an unbiased sampling method: there is no tendency to over- or underestimate the parameter.
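The idea that statistics from simple random samples center on the parameter can be checked with a small simulation. The sketch below is hedged: the slides do not state how many of the 2919 students live on campus, so the count of 2265 is a made-up value used only for illustration.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population: 2919 students, 2265 of whom live on campus.
# The count 2265 is an assumption made for illustration only.
population = [1] * 2265 + [0] * (2919 - 2265)
parameter = sum(population) / len(population)

# Take 1000 simple random samples of 30 and record each sample proportion.
sample_props = [
    sum(rng.sample(population, 30)) / 30
    for _ in range(1000)
]

print(f"population proportion:      {parameter:.3f}")
print(f"mean of sample proportions: {statistics.mean(sample_props):.3f}")
```

Individual sample proportions vary, but their average should land close to the population proportion, which is what "unbiased" means here.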
Sampling Students • We can generalize when we use simple random sampling because it creates: • A sample that is representative of the population • A sample statistic that is close to the parameter
Sampling Students • If the researcher at the College of the Midwest uses 75 students instead of 30 with the same early morning sampling method, will it be less biased? • No, selecting more students in the same manner doesn’t fix the tendency to oversample students who live on campus. • A smaller sample that is random is actually more accurate.
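A rough simulation can show why a bigger sample does not repair a biased method. Everything about the population and the morning crowd below is an assumption for illustration: 2265 of 2919 students living on campus, and only one third of off-campus students being on campus at 8:00 am.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population: 2265 of 2919 students live on campus (assumed count).
on_campus, off_campus = 2265, 654
parameter = on_campus / (on_campus + off_campus)

# Assumed model of the 8:00 am crowd: every on-campus student could be
# encountered, but only one third of off-campus students have arrived.
morning_crowd = [1] * on_campus + [0] * (off_campus // 3)

def mean_morning_proportion(n, reps=1000):
    """Average sample proportion over many morning samples of size n."""
    return statistics.mean(
        sum(rng.sample(morning_crowd, n)) / n for _ in range(reps)
    )

for n in (30, 75):
    print(f"n = {n}: average sample proportion = {mean_morning_proportion(n):.3f} "
          f"(parameter = {parameter:.3f})")
```

Under these assumptions both sample sizes overestimate the parameter by about the same amount; the larger sample only gives a more consistent wrong answer.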
Sampling Students • What is an advantage of a larger sample size? • Less sample-to-sample variability • Statistics from different samples cluster more closely around the center of the distribution
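The effect of sample size on sample-to-sample variability can be seen by comparing the spread of sample proportions for two sample sizes. This is a sketch under an assumed population (2265 of 2919 students on campus, a made-up count); the sample sizes 30 and 120 are also chosen only for illustration.

```python
import random
import statistics

rng = random.Random(3)

# Hypothetical population: 2265 of 2919 students live on campus (assumed count).
population = [1] * 2265 + [0] * 654

def sd_of_sample_proportions(n, reps=1000):
    """Standard deviation of the sample proportion across many SRSs of size n."""
    props = [sum(rng.sample(population, n)) / n for _ in range(reps)]
    return statistics.stdev(props)

sd_30 = sd_of_sample_proportions(30)
sd_120 = sd_of_sample_proportions(120)
print(f"n = 30:  SD of sample proportions = {sd_30:.3f}")
print(f"n = 120: SD of sample proportions = {sd_120:.3f}")
```

The larger samples should produce a noticeably smaller standard deviation, meaning their statistics cluster more tightly around the center.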
Notation Check Remember that statistics summarize a sample and parameters summarize a population
Learning Objectives for Section 2.1 • Identify the (finite) population and the sample in a statistical study. • Identify parameters and statistics in a statistical study. • Be able to fill in a data table where rows are the observational units and columns are the variables. • Identify when a sampling method might be biased and understand what happens when a sampling method is biased. • Recognize that the types of statistics and graphs used for categorical and quantitative variables differ, and be able to identify which statistics (proportions, means, SDs) and graphs (bar graph, dotplot, histogram) are appropriate for each type of variable.
Learning Objectives for Section 2.1 • State that collecting a representative sample from a population allows for generalizing results of inference procedures from the sample statistic(s) to the population parameter(s). • Recognize that small random samples can be representative of the population; you do not have to have a large proportion of the population in your sample to be representative.
Exploration 2.1A: Sampling Words • We need to sample from a population of interest if it is very large or if it is difficult to measure every single member of the population. • If we were interested in High School GPA for Hope students we would not need to sample. The registrar’s office has all that information. If we were interested in something that has not already been collected, we might want to sample.
Exploration 2.1A: Sampling Words • That being said, in this activity we will be using the words in the Gettysburg Address as our population. • There are fewer than 300 words in this speech and we could easily look at the entire speech to find out average word length, proportion of words that contain an e, etc. • We will be sampling from this speech not to get information from the population, but to help us learn some things about sampling.
Only picture of Lincoln at Gettysburg (There is another picture in which there is some dispute as to whether or not two blurry images are that of Lincoln.) (Edward Everett spoke for over two hours. Lincoln followed with his two-minute speech.)
Exploration 2.1A • Select what you think is a representative sample of 10 words from the Gettysburg Address (pg 109). Record your words in the table in question 2. • Make dotplots of both average length and proportion containing e on the board.
Exploration 2.1A • Select a random sample of 10 words from the Gettysburg Address (pg 112). • Again we will make dotplots of both average length and proportion containing e on the board. • Which sample is more representative of the population?
Exploration 2.1A • We should have seen that our simple random sample gave us an unbiased estimate of the population mean and proportion while the self-selected sample was biased.
Exploration 2.1A Are these sampling methods biased? • Close our eyes and blindly point a pencil at 10 words. • Cut all the words out of the book, put them in a hat and draw out 10. • Put all the words on the same size pieces of paper, put them in a hat and draw out 10.
Exploration 2.1A • Now let’s go to the Sampling Words applet and see how: • The sample size changes the variability in the sampling distribution. • The population size doesn’t change the sampling distribution.
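The claim about population size can also be explored outside the applet. The sketch below compares two hypothetical populations that share the same proportion (0.40, an assumed value, where 1 might stand for "the word contains an e") but differ in size by a factor of 100. It is not the Gettysburg Address data, and the result holds as long as the population is much larger than the sample.

```python
import random
import statistics

rng = random.Random(5)

def sd_of_sample_proportions(population, n=30, reps=2000):
    """SD of the sample proportion across many simple random samples of size n."""
    props = [sum(rng.sample(population, n)) / n for _ in range(reps)]
    return statistics.stdev(props)

# Two hypothetical populations with the same proportion of 1s (0.40).
# Only the population sizes differ.
small_pop = [1] * 400 + [0] * 600           # 1,000 members
large_pop = [1] * 40_000 + [0] * 60_000     # 100,000 members

sd_small = sd_of_sample_proportions(small_pop)
sd_large = sd_of_sample_proportions(large_pop)
print(f"population of   1,000: SD of sample proportions = {sd_small:.3f}")
print(f"population of 100,000: SD of sample proportions = {sd_large:.3f}")
```

The two standard deviations should be nearly identical, while changing `n` would change both.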
Central Limit Theorem • The idea that the distribution of sample means is approximately normal (with predictable mean and standard deviation) when the sample size is large enough is known as the Central Limit Theorem. • In the Gettysburg Address example, we saw that when we had a population distribution that was skewed, even with a fairly small sample size, the distribution of sample means was fairly symmetric.
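The "predictable mean and standard deviation" part can be illustrated with a simulation. The population below is a hypothetical right-skewed set of word lengths, not the actual Gettysburg Address data, and the predicted standard deviation uses the standard result that sample means have SD close to the population SD divided by the square root of the sample size.

```python
import math
import random
import statistics

rng = random.Random(11)

# Hypothetical right-skewed population of 1,000 word lengths (not the real
# Gettysburg Address data): many short words, a few long ones.
lengths = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 7, 9, 11]
population = [rng.choice(lengths) for _ in range(1000)]
mu = statistics.mean(population)
sigma = statistics.pstdev(population)

n = 20
sample_means = [statistics.mean(rng.sample(population, n)) for _ in range(2000)]

print(f"population mean = {mu:.2f}, "
      f"mean of sample means = {statistics.mean(sample_means):.2f}")
print(f"predicted SD = {sigma / math.sqrt(n):.2f}, "
      f"observed SD of sample means = {statistics.stdev(sample_means):.2f}")
```

A dotplot or histogram of `sample_means` would be the natural next step to see the roughly bell-shaped pattern.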
Predicting Mean and SD for a Sampling Distribution • Let’s also use the Sampling Words applet to take samples from different distributions so we can see the Central Limit Theorem at work.
Review of Section 2.1 • A sampling method is biased if statistics from samples consistently over- or underestimate the population parameter.
Review of Section 2.1 • A simple random sample is the easiest way to ensure that your sample is unbiased. • Therefore, if we have a simple random sample, we can generalize our results to the population from which it was drawn. • Even small samples can be representative of a very large population. If we have a simple random sample, we can generalize our results to a large population.
Review of Section 2.1 • We saw biased and unbiased sampling in the Gettysburg Address exploration. We also saw that: • When we increase sample size, the variability of our sampling distribution decreases. • This variability can be predicted. • Changing the population size has no effect on variability.
[Figures: population distribution of word lengths; distribution of average word length from samples of size 20] When we sample from a population, calculate a sample mean, and then repeat this process over and over again, the distribution of sample means will look bell shaped under certain conditions.
Section 2.2: Inference for a Single Quantitative Variable • Using methods similar to what we did in the last section, we will see how a null distribution for a single quantitative variable can be obtained and even predicted.
Example 2.2: Estimating Elapsed Time • Students in a stats class (for their final project) collected data on students’ perception of time • Subjects were told that they’d listen to music and be asked questions when it was over. • The researchers played 10 seconds of the Jackson 5’s “ABC” and asked how long subjects thought it lasted • Can students accurately estimate the length?
Hypotheses • Null Hypothesis: People will accurately estimate the length of a 10-second song snippet, on average. (μ = 10 seconds) • Alternative Hypothesis: People will not accurately estimate the length of a 10-second song snippet, on average. (μ ≠ 10 seconds)
Estimating Time • A sample of 48 students on campus were subjects and song length estimates were recorded. • What does a single dot represent? • What are the observational units? Variable?
Skewed, mean, median • The distribution obtained is not symmetric, but is right skewed. • When data are skewed right, the mean gets pulled out to the right while the median is more resistant to this.
Mean vs Median • The mean is 13.71 and the median is 12. • How would these numbers change if one of the people that gave an answer of 30 seconds actually said 300 seconds? • The standard deviation is 6.5 sec. Is it resistant to outliers? • Let’s look at this in the Descriptive Statistics Applet.
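The 30-versus-300 question can be tried on a small data set. The ten values below are hypothetical time estimates invented for illustration; they are not the 48 values collected in the study, so the numbers will not match 13.71, 12, or 6.5.

```python
import statistics

# Hypothetical time estimates in seconds (not the class's actual 48 values).
estimates = [5, 8, 10, 10, 12, 12, 13, 15, 20, 30]

# Same data, but the answer of 30 seconds is recorded as 300 seconds.
with_outlier = [300 if x == 30 else x for x in estimates]

for label, data in [("original", estimates), ("30 -> 300", with_outlier)]:
    print(f"{label:>10}: mean = {statistics.mean(data):.1f}, "
          f"median = {statistics.median(data):.1f}, "
          f"SD = {statistics.stdev(data):.1f}")
```

In this sketch the median is unchanged by the extreme value, while the mean and the standard deviation both jump, which suggests which of the three summaries are resistant to outliers.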
Inference • H0: μ = 10 seconds • Ha: μ ≠ 10 seconds • Our problem now is, how do we develop a null distribution? (The second S in our 3S strategy.) • Flipping coins and spinning spinners will not work to model what would happen under a true null hypothesis. • Let’s build on what we did last time with the Gettysburg Address exploration.
[Figures: population distribution of word lengths; distribution of average word length from samples of size 20]
All we have is our sample data. We don’t have population data.
Central Limit Theorem • Remember that the idea that the distribution of sample means is approximately normal (with predictable mean and standard deviation) when the sample size is large enough is known as the Central Limit Theorem. • In the Gettysburg Address example, we saw that when we had a population distribution that was skewed, even with a fairly small sample size, the distribution of sample means was fairly symmetric.
t-distribution • When testing means, we not only don’t know the population mean exactly, we also don’t know the population standard deviation. There are more unknowns. • This is why we need to use a t-distribution. The t-distribution has slightly “heavier” tails than a normal distribution.
Validity Conditions • The theory-based test for a single mean requires either a symmetric sample distribution, or a sample size of at least 20 and a sample distribution that is not strongly skewed. • In practice we will use the theory-based applet to run this test. (Let’s do this, but first we need to get the data or the mean, SD, and sample size.)
Estimating Time Summary • H0: μ = 10 seconds • Ha: μ ≠ 10 seconds • t = 3.95 and p-value = 0.0003 • Based on our small p-value (or large standardized statistic), we can conclude that people don’t accurately estimate the length of a 10-second song snippet and in fact they overestimate it.
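The reported standardized statistic can be reproduced from the summary numbers given earlier (mean 13.71, SD 6.5, n = 48). The sketch assumes the usual one-sample t formula, t = (sample mean − hypothesized mean) / (SD / √n); the applet's internals are not shown in these slides, so treat this as a consistency check rather than a description of the applet.

```python
import math

sample_mean = 13.71   # seconds
sample_sd = 6.5       # seconds
n = 48                # students in the sample
null_mean = 10        # seconds, the value of mu under H0

# Standard error of the sample mean, then the standardized statistic.
standard_error = sample_sd / math.sqrt(n)
t = (sample_mean - null_mean) / standard_error

print(f"standard error = {standard_error:.3f}")
print(f"t = {t:.2f}")
```

The p-value additionally requires the t-distribution with n − 1 = 47 degrees of freedom, which is what the theory-based applet supplies.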
Summary • When we test a single quantitative variable, our hypotheses have the following form: • H0: μ = some number • Ha: μ ≠, <, or > some number • We will get our data (or the mean, sample size, and SD for our data) and use the Theory-Based Inference applet to determine the p-value. • The p-values we get with this test have the same general meaning as those from a test for a single proportion.
• Let’s work exercise 2.2.24
Learning Objectives for Section 2.2
Exploration 2.2: Sleepless Nights? • Page 131. • We will set up the test and explore the data with questions 1-8, 10, 11, 15-22. • We will do theory-based inference as a group.
Section 2.3: Significance and Errors
Significance Level • We think of a p-value as telling us something about the strength of evidence from a test of significance. • The lower the p-value the stronger the evidence. • Some people think of this in more black and white terms. • Either we reject the null (and accept alternative) or not.
Significance Level • The value that we use to decide whether to reject the null is called the significance level. • We reject the null when the p-value is less than or equal to (≤) the significance level. • The significance level is often represented by the Greek letter alpha, α.
Significance Level • Typically we use 0.05 for our significance level. There is nothing magical about 0.05. We could set up our test to make it • harder to conclude the alternative (smaller significance level, say 0.01) or • easier (larger significance level, say 0.10).
Significance Level • If the p-value is 0.023 and the significance level is 0.05, do I reject the null hypothesis? • If the p-value is 0.023 and the significance level is 0.01, do I reject the null hypothesis? • The smaller the significance level, the harder it is to reject the null and conclude the alternative. • When we make a conclusion, hopefully it is correct. However, we could have made one of two types of errors.
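The decision rule is simple enough to write as a one-line function. The function name `reject_null` is made up for this sketch; the rule itself (reject when the p-value is less than or equal to α) is the one stated above.

```python
def reject_null(p_value, alpha=0.05):
    """Return True when the p-value is at or below the significance level."""
    return p_value <= alpha

# The two questions from the slide: same p-value, different significance levels.
print(reject_null(0.023, alpha=0.05))   # True
print(reject_null(0.023, alpha=0.01))   # False
```

The same evidence leads to different decisions, which is why the significance level should be chosen before looking at the data.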
Type I error • Think back to Buzz and Doris. • We concluded that they could communicate. • Suppose that in reality they couldn’t (Buzz just had a lucky day). • What we have done is to reject a true null hypothesis. This is called a type I error and is sometimes referred to as a false alarm.
Type II error • Now suppose we obtained a large p-value so we didn’t get significant results in the Buzz and Doris example. • Hence, we could not conclude that they could communicate. • Also suppose that in reality they really could. • What we have done is to not reject a false null hypothesis. This is called a type II error and is sometimes referred to as a missed opportunity.
Type I and Type II errors • In medical tests: • A type I error is a false positive. (They conclude someone has a disease when they don’t.) • A type II error is a false negative. (They conclude someone does not have a disease when they actually do.) • As you can see, these errors can have very different consequences.
Type I and Type II Errors What are type I and type II errors in the following: • In the St. George’s Hospital example where they were trying to determine if the Hospital had a higher death rate after heart transplants than the country average of 15%. • Since we found a significant result for the St. George’s Hospital example, which error could we have made? • In the Halloween treats example where they were trying to determine if kids had a preference between toy and candy. • Since we did not find significant results in the Halloween treats example, which error could we have made?
The probability of a Type I error • The probability of a type I error is the significance level. • Suppose the significance level is 0.05. If the null is true we would reject it 5% of the time and thus make a type I error 5% of the time. • If you make the significance level lower, you have reduced the probability of making a type I error, but have increased the probability of making a type II error.
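A simulation can make the 5% figure concrete. The sketch below repeatedly generates data for which the null hypothesis is true and counts how often a two-sided t-test at α = 0.05 rejects it. Several things are assumptions chosen for illustration: a normal population with mean 10 and SD 6.5, samples of 48, and a hard-coded cutoff of 2.0117, which is the tabulated two-sided 5% critical value for a t-distribution with 47 degrees of freedom.

```python
import math
import random
import statistics

rng = random.Random(42)

N = 48
T_CRITICAL = 2.0117   # tabulated two-sided 5% cutoff for t with 47 df

def rejects_true_null():
    """Simulate one study in which H0 (mu = 10) is actually true."""
    data = [rng.gauss(10, 6.5) for _ in range(N)]
    standard_error = statistics.stdev(data) / math.sqrt(N)
    t = (statistics.mean(data) - 10) / standard_error
    return abs(t) > T_CRITICAL

reps = 4000
type_i_rate = sum(rejects_true_null() for _ in range(reps)) / reps
print(f"estimated type I error rate: {type_i_rate:.3f}")
```

Every rejection here is a false alarm, and the estimated rate should come out close to 0.05.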
The probability of a Type II error • The probability of a type II error is more difficult to calculate. • In fact, the probability of a type II error is not even a fixed number. It depends on the value of the true parameter. • The probability of a type II error can be very high if: • The true value of the parameter and the value you are testing are close. • The sample size is small.
Power • The probability of rejecting a false null hypothesis is called the power of a test. • Power can also be thought of as 1 minus the probability of a type II error. • We want a test with high power, and this is accomplished by: • A true parameter value far away from the value in the null hypothesis. • A large sample size. • For quantitative data, a small standard deviation.
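Power for a test of a single proportion can be computed exactly for small cases. The sketch below is one possible approach, not the method used by the course applets: it uses an exact two-sided binomial p-value (the total null probability of all outcomes no more likely than the one observed) and then adds up the probability, under an assumed true value of π, of every outcome that would lead to rejection. The sample sizes and true values are chosen only for illustration.

```python
from math import comb

def pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def two_sided_p_value(k, n, p0=0.5):
    """Exact p-value: total null probability of outcomes no more likely than k."""
    observed = pmf(k, n, p0)
    return sum(pmf(j, n, p0) for j in range(n + 1)
               if pmf(j, n, p0) <= observed * (1 + 1e-9))

def power(n, true_p, alpha=0.05, p0=0.5):
    """Probability of rejecting H0: pi = p0 when the true probability is true_p."""
    return sum(pmf(k, n, true_p) for k in range(n + 1)
               if two_sided_p_value(k, n, p0) <= alpha)

for n, true_p in [(16, 0.6), (16, 0.9), (64, 0.6)]:
    print(f"n = {n:2d}, true pi = {true_p}: power = {power(n, true_p):.3f}")
```

Comparing the first two rows shows the effect of a true value farther from 0.50; comparing the first and third shows the effect of a larger sample size.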
Just one more thing… • We never, ever, ever, ever, ever conclude the null or find strong evidence for the null. • We either find strong evidence against the null (or for the alternative) or we do not. • A large p-value is always weak evidence of something and never strong evidence.
One more thing… • Suppose you are testing H0: π = 0.50, get a sample proportion of 0.55, and end up with a large p-value. • What possible logic is there for you to conclude that since the sample proportion is 0.55, the population proportion is 0.50? • You can say that 0.50 is plausible, but so are many other values.
One more thing… • Suppose you have a two-headed quarter and for some strange reason want to test to see if it is not a fair coin when flipped. • H0: π = 0.50 and Ha: π ≠ 0.50. • You flip it three times. • What is your sample proportion? • What is your p-value? • What is your conclusion?
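The arithmetic for the two-headed quarter is short enough to work through directly. This sketch assumes the two-sided p-value counts both "all heads" and "all tails" as results at least as extreme as the one observed.

```python
flips = 3
heads = 3                        # a two-headed coin always lands heads
sample_proportion = heads / flips

# Under H0 (pi = 0.50), "3 heads" and "3 tails" are the two results at least
# as extreme as the one observed, each with probability 0.5 ** 3.
p_value = 2 * 0.5 ** flips

print(sample_proportion, p_value)   # 1.0 0.25
```

With only three flips the p-value cannot fall below 0.25, so the test fails to reject a null hypothesis that is certainly false. A large p-value here reflects the tiny sample size, not evidence that the coin is fair.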
Learning Objectives for Section 2.3 • State, justify, and explain the reasoning behind a test decision about rejecting the null hypothesis or not, depending on the significance level and p-value of a test. • Describe what Type I and Type II Errors mean in a particular context and describe consequences of making such an error in that context. • Recognize that the significance level is the probability of a Type I error, assuming the null hypothesis is true. • Recognize that decreasing the probability of one type of error typically means increasing the probability of the other type of error, unless the sample size or other factors also change. • Recognize which error could have been made after drawing a conclusion in a test of significance.
Exploration 2.3: Parapsychology Studies (pg 142)