Bell Work 1/16 1) In your own words, tell me what you think statistics is about. Give examples of how statistics is applied or how it is relevant to your own life.
Objective(s): 1. Identify the Who, What, When, Where, Why and How of data, or recognize when some of this information has not been provided. 2. Identify the cases and variables in any data set. 3. Identify the population from which a sample was chosen.
Unit 1: Introduction to Statistics
What Is Statistics? Statistics is the science of collecting, organizing, analyzing, and interpreting data in order to make decisions. Examples of Statistical Applications: • Search engines and websites that you frequent collect and store data on your internet usage to tailor your online experience. • Data are collected on each of you to chart your progress throughout your high school experience.
Data: systematically recorded information, whether numbers or labels, together with its context. Data are meaningless without context. Who: the individuals the data are collected on, whose characteristics we record (the cases) What: the characteristics recorded about each individual case (the variables) Why: why we are gathering the data and what their intended use is Where: where the study was conducted When: when the information was collected How: how the data were collected
Data Tables Data table: an arrangement of data in which each row represents a case and each column represents a variable. [Figure: data table of data collected by Amazon]
Important Terms • Population: the collection of all responses, measurements, or counts that are of interest. • Sample: a portion, or subset, of the population. [Figure: a population with a sample drawn from it]
Which of the following Venn diagrams shows the relationship between population data (P) and sample data (S)? [Figure: four Venn diagrams of P and S, labeled a) through d)]
Contextualizing Data: Identifying the 5 W’s and the How Ian Walker, a psychologist at the University of Bath, wondered whether drivers treat bicycle riders differently when they wear helmets. He rigged his bicycle with an ultrasonic sensor that could measure how close each passing car came to him. He then rode on alternating days with and without a helmet. Out of 2500 cars passing him, he found that when he wore his helmet, motorists passed 3.35 inches closer to him, on average, than when his head was bare. Who: What: Why: When: Where: How: Population of Interest:
Contextualizing Data: Identifying the 5 W’s and the How Ian Walker, a psychologist at the University of Bath, wondered whether drivers treat bicycle riders differently when they wear helmets. He rigged his bicycle with an ultrasonic sensor that could measure how close each passing car came to him. He then rode on alternating days with and without a helmet. Out of 2500 cars passing him, he found that when he wore his helmet, motorists passed 3.35 inches closer to him, on average, than when his head was bare. Who: 2500 motorists Population of Interest: all motorists passing cyclists What: the distance between the cars and the bicycle rider Why: to determine whether wearing a helmet influenced how drivers treated bicycle riders When: Not specified Where: Not specified How: He rigged his bicycle with an ultrasonic sensor that could measure how close each passing car came to him
Contextualizing Data: Identifying the 5 W’s and the How Some companies offer 401(k) retirement plans to employees, permitting them to shift part of their before-tax salaries into investments such as mutual funds. Employers typically match 50% of the employees’ contributions up to about 6% of salary. One company, concerned with what it believed was a low employee participation rate in its 401(k) plan, sampled 30 other companies with similar plans and asked for their 401(k) participation rates. Who: What: Why: When: Where: How: Population of Interest:
Contextualizing Data: Identifying the 5 W’s and the How Some companies offer 401(k) retirement plans to employees, permitting them to shift part of their before-tax salaries into investments such as mutual funds. Employers typically match 50% of the employees’ contributions up to about 6% of salary. One company, concerned with what it believed was a low employee participation rate in its 401(k) plan, sampled 30 other companies with similar plans and asked for their 401(k) participation rates. Who: 30 similar companies Population of Interest: all similar companies What: 401(k) participation rates Why: concern over low employee participation in its own 401(k) plan When: Not specified Where: Not specified How: by sampling 30 similar companies
Contextualizing Data: Identifying the 5 W’s and the How Because of the difficulty of weighing a bear in the woods, researchers caught and measured 54 bears, recording their weight, neck size, length, and sex. They hoped to find a way to estimate weight from the other, more easily determined quantities. Who: What: Why: When: Where: How:
Contextualizing Data: Identifying the 5 W’s and the How Because of the difficulty of weighing a bear in the woods, researchers caught and measured 54 bears, recording their weight, neck size, length, and sex. They hoped to find a way to estimate weight from the other, more easily determined quantities. Who: 54 bears What: weight, neck size, length, and sex Why: to find an easier way of estimating weight When: Not specified Where: Not specified How: Researchers collected data from 54 bears they were able to catch
One of the reasons that the Monitoring the Future (MTF) project was started was “to study changes in the beliefs, attitudes, and behavior of young people in the United States.” Data are collected from 8th, 10th, and 12th graders each year. To get a representative nationwide sample, surveys are given to a randomly selected group of students. In Spring 2004, students were asked about alcohol, illegal drug, and cigarette use. Describe the W’s, if the information is given. If the information is not given, state that it is not specified. Who: What: When: Where: How: Why:
In June 2003 Consumer Reports published an article on some sport-utility vehicles they had tested recently. They reported some basic information about each of the vehicles and the results of some tests conducted by their staff. Among other things, the article told the brand of each vehicle, its price, and whether it had a standard or automatic transmission. They reported the vehicle’s fuel economy, its acceleration (number of seconds to go from zero to 60 mph), and its braking distance to stop from 60 mph. The article also rated each vehicle’s reliability as much better than average, worse, or much worse than average. Describe the W’s and the How, if the information is given: Who: What: When: Where: How: Why:
A listing posted by Arby’s restaurant chain gives, for each of the sandwiches it sells, the type of meat in the sandwich, the number of calories, and the serving size in ounces. The data might be used to assess the nutritional value of the different sandwiches. Describe the W’s and the How, if the information is given: Who: What: When: Where: How: Why:
Bell Work 1/18 1) For the description of data below, identify the Who and the What that were investigated, and the population of interest. Some motion pictures are profitable and others are not. Understandably, the movie industry would like to know what makes movies successful. Data from 120 first-run movies released in 2005 suggest that longer movies actually make less profit. 2) From the Venn diagram, identify the population and the sample.
Objectives • Classify a variable as categorical (qualitative) or quantitative. • Identify whether variables are quantitative or categorical based on the context of the data • For any quantitative variable, identify the units in which the variable has been measured.
Some variables have units that tell how each value has been measured and tell the scale of the measurement.
Categorical and Quantitative Variables Variable: an attribute or characteristic of an individual or object whose value varies from case to case. Categorical variables consist of attributes, labels, or non-numerical entries. Examples: Quantitative variables consist of numerical measurements or counts. (Quantitative variables always have units.) Examples:
Categorical and Quantitative Variables The suggested retail prices of several Ford vehicles are shown in the table. Which data are qualitative data and which are quantitative data? Explain your reasoning. The populations of several U.S. cities are shown in the table. Which data are qualitative data and which are quantitative data?
Categorical and Quantitative Variables Determine whether the data are qualitative or quantitative. Explain your reasoning. a) telephone numbers in a directory b) body temperatures of patients c) lengths of songs on an MP3 player d) student ID numbers e) heights of hot air balloons f) eye colors of models g) carrying capacities of pickups h) age
A February 2007 Gallup Poll question asked, “In politics, as of today, do you consider yourself a Republican, a Democrat, or an Independent?” The possible responses were “Democrat”, “Republican”, “Independent”, “Other”, and “No Response”. What kind of variable is the response? A pharmaceutical company conducts an experiment in which a subject takes 100 mg of a substance orally. The researchers measure how many minutes it takes for half of the substance to exit the bloodstream. What kind of variable is the company studying?
Categorical and Quantitative Variables For each description of data, identify the W’s, name the variables, specify for each variable whether its use indicates that it should be treated as categorical or quantitative, and, for any quantitative variable, identify the units in which it was measured (or note that they were not provided). Age and party. The Gallup Poll conducted a representative telephone survey of 1180 American voters during the first quarter of 2007. Among the reported results were the voter’s region (Northeast, South, etc.), age, party affiliation, and whether or not the person had voted in the 2006 midterm congressional election. Who: What: Where: Why: When: How: Variables:
Categorical and Quantitative Variables For each description of data, identify the W’s, name the variables, specify for each variable whether its use indicates that it should be treated as categorical or quantitative, and, for any quantitative variable, identify the units in which it was measured (or note that they were not provided). Schools. The State Education Department requires local school districts to keep these records on all students: age, race or ethnicity, days absent, current grade level, standardized test scores in reading and mathematics, and any disabilities or special educational needs. Who: What: Where: Why: When: How: Variables:
Categorical and Quantitative Variables The Kentucky Derby is a horse race that has been run every year since 1875 at Churchill Downs, Louisville, Kentucky. The race started as a 1.5-mile race, but in 1896 it was shortened to 1.25 miles because experts felt that 3-year-old horses shouldn’t run such a long race that early in the season. (It has been run in May every year but one, 1901, when it took place on April 29.) Here are the data for the first four and several recent races.
Identify the variables as quantitative or categorical. Explain your reasoning. a) player numbers for a soccer team b) student ID numbers c) wait times at a grocery store d) species of trees in a forest e) weights of infants at a hospital f) responses on an opinion poll
Categorical and Quantitative Variables In November 2003 Discover published an article on the colonies of ants. They reported some basic information about many species of ants and the results of some discoveries made by myrmecologist Walter Tschinkel of the University of Florida. Information included the scientific name of the ant species, the geographic location, the depth of the nest (in feet), the number of chambers in the nest, and the number of ants in the colony. The article documented how new ant colonies begin, the ant-nest design, and how nests differ in shape, in the number and size of chambers, and in how the chambers are connected, depending on the species. It reported that nest designs include vertical, horizontal, or inclined tunnels for the movement of ants and the transport of food. 1. Describe the W’s, if the information is given: • Who: Colonies of ants. “Many species of ants,” but no indication of exactly how many. • What: scientific name, geographic location, average nest depth, average number of chambers, average colony size, how new ant colonies begin, the ant-nest design, and how nests differ in architecture. • When: November 2003 • Where: not specified • How: The results of some discoveries made by myrmecologist Walter Tschinkel of the University of Florida • Why: Information of interest to readers of the magazine
The 2.5-mile Indianapolis Motor Speedway has been home to a race on Memorial Day nearly every year since 1911. Here are the data for the first five races and five recent Indianapolis 500 races. Also included are the pole winners (the winners of the trial races, in which each driver drives alone to determine the starting position on race day). Identify the W’s, name the variables, specify for each variable whether its use indicates that it should be treated as categorical or quantitative, and, for any quantitative variable, identify the units in which it was measured.
What can go wrong? • Don’t label a variable as categorical or quantitative without thinking about the question you want to answer. The same variable can sometimes take on different roles. • Just because your variables are numbers, don’t assume that they are quantitative. • Always be skeptical.
What have we learned? • Data are information in a context. – The W’s help with context. – We must know the Who (cases), What (variables), and Why to be able to say anything useful about the data. We treat variables as categorical or quantitative. • Categorical variables identify a category for each case. • Quantitative variables record measurements or amounts of something and must have units. • Some variables can be treated as categorical or quantitative depending on what we want to learn from them.
Bell Work 1/22 1) From the diagram, identify the population and the sample. 2) From the description, identify the variables, and for each variable tell whether it should be treated as categorical or quantitative. Business analysts hoping to provide information helpful to American grape growers compiled these data about vineyards: size (acres), number of years in existence, state, varieties of grapes grown, average case price, gross sales, and percent profit.
Designing a Statistical Study 1. Identify the variable(s) of interest (the focus) and the population of the study. 2. Develop a detailed plan for collecting data. If you use a sample, make sure the sample is representative of the population. 3. Collect the data. 4. Describe the data, using descriptive statistics techniques. 5. Interpret the data and make decisions about the population using inferential statistics. 6. Identify any possible errors
Identifying Populations and Samples Determine whether each data set is a population or a sample. a) The height of each player on a school’s basketball team b) The amount of energy collected from every wind turbine on a wind farm c) A survey of 500 spectators from a stadium with 42,000 spectators d) The annual salary of each pharmacist at a pharmacy e) The cholesterol levels of 20 patients in a hospital with 100 patients
Identifying Populations and Samples Identify the sample and the population of interest. a) A survey of 1000 U.S. adults found that 59% think buying a home is the best investment a family can make. (Source: Rasmussen Reports) b) A study of 33,043 infants in Italy was conducted to find a link between a heart rhythm abnormality and sudden infant death syndrome. (Source: New England Journal of Medicine) c) A survey of 1442 U.S. adults found that 36% received an influenza vaccine for the current flu season. (Source: Zogby International) d) A survey of 1600 people found that 76% plan on using the Microsoft Windows 7™ operating system at their businesses. (Source: Information Technology Intelligence Corporation and Sunbelt Software) e) A survey of 800 registered voters found that 50% think economic stimulus is the most important issue to consider when voting for Congress.
Methods of Collecting Data: Sample Surveys Objectives: Identify population parameters and sample statistics Identify sampling techniques as simple random, stratified, cluster, systematic, or convenience. Identify the sampling frame, sample and any potential biases.
Sample Surveys We would like to gather information about an entire population of individuals. We could perform a census, but reaching every individual is often impractical, so instead we examine a smaller group of individuals: a sample. Survey: a set of questions asked of a small group of people in the hope of learning something about the entire population.
Population Parameters and Sample Statistics Definitions Parameter: a numerical description of a population characteristic. Statistic: a numerical description of a sample characteristic. Distinguishing between Statistics and Parameters: Examples 1. A recent survey of 200 college career centers reported that the average starting salary for petroleum engineering majors is $83,121. (Source: National Association of Colleges and Employers) 2. The 2182 students who accepted admission offers to Northwestern University in 2009 have an average SAT score of 1442. (Source: Northwestern University) 3. In a random check of a sample of retail stores, the Food and Drug Administration found that 34% of the stores were not storing fish at the proper temperature.
Population Parameters and Sample Statistics Determine whether each numerical value is a parameter or a statistic. a) In 2009, Major League Baseball teams spent a total of $2,655,395,194 on players’ salaries. b) Sixty-two of the 97 passengers aboard the Hindenburg airship survived its explosion. c) In January 2010, 52% of the governors of the 50 states in the United States were Democrats. d) In a survey of 300 computer users, 8% said their computers had malfunctions that needed to be repaired by service technicians. e) In a recent year, the interest category for 12% of all new magazines was sports.
Bias Sampling methods that, by their nature, tend to over- or underemphasize some characteristic of the population are said to be biased. Biased sampling methods tend to over- or underestimate parameters.
Random Sample: Each member of the population has an equal chance of being selected. Simple Random Sample: All samples of the same size are equally likely. [Figure: population with randomly selected members] • Assign a number to each member of the population. • Generate random numbers using a random number table, a software program, or a calculator. • Data from members of the population that correspond to these numbers become members of the sample.
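To make the parameter/statistic distinction concrete, here is a minimal Python sketch. The population (the whole numbers 1 through 100) is hypothetical and invented only for illustration. The parameter is computed from every member, so it is a single fixed number; a statistic is computed from a random sample, so it usually changes from sample to sample.

```python
import random
import statistics

# Hypothetical population: the whole numbers 1 through 100 (invented for illustration).
population = list(range(1, 101))

# Parameter: describes the entire population, so it is one fixed number.
parameter_mean = statistics.mean(population)


def sample_mean(pop, n, seed=None):
    """Return the mean of a simple random sample of size n: a statistic."""
    rng = random.Random(seed)
    return statistics.mean(rng.sample(pop, n))


if __name__ == "__main__":
    print("Parameter (population mean):", parameter_mean)  # 50.5
    # Each new sample can give a different statistic.
    for seed in (1, 2, 3):
        print("Statistic (mean of a sample of 10):", sample_mean(population, 10, seed))
```

Note that a "sample" containing the whole population is a census, and its statistic equals the parameter.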
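The three SRS steps above can be sketched with software instead of a table. This is a minimal Python illustration, assuming the standard `random` module as the random number generator; the staff names are borrowed from the CCHS sampling frame used later in these slides, and the function name `simple_random_sample` is hypothetical.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw a simple random sample of size n from a sampling frame.

    Step 1: number every member 1..N.
    Step 2: generate n distinct random numbers in 1..N.
    Step 3: the members with those numbers form the sample.
    """
    numbered = {i: member for i, member in enumerate(frame, start=1)}
    rng = random.Random(seed)
    chosen_numbers = rng.sample(range(1, len(frame) + 1), n)  # distinct, so no repeats
    return [numbered[i] for i in chosen_numbers]


staff = ["Mr. Warren", "Dr. Crosby", "Mr. Locklair", "Mrs. Shipp", "Mr. Mambou",
         "Dr. Dixon", "Mr. Allen", "Mrs. Stroble", "Mr. Thomas", "Mr. Bordieanu"]
print(simple_random_sample(staff, 3))
```

Because `random.sample` gives every set of n numbers the same chance, every sample of size 3 is equally likely, which is exactly the SRS definition.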
Simple Random Sample (SRS) Sampling frame: A list of individuals from whom the sample is drawn Books: Appendix G pg. A-101
Bell Work 1/22 1) Tell whether the value given describes a parameter or a statistic. a) The 2009 team payroll of the Philadelphia Phillies was $113,004,046. (Source: USA Today) b) In a survey of 752 adults in the United States, 42% think there should be a law that prohibits people from talking on cell phones in public places. (Source: University of Michigan) 2) Using the sequence of random numbers below and reading from left to right, generate a set of 4 numbers between 1 and 60. 71622 35940 81807 59225 18192 Reading two digits at a time: 71 | 62 | 23 | 59 | 40 | 81 | 80 | 75 | 92 | 25 → {23, 59, 40, 25}
Simple Random Sample (SRS) Sammy’s Salsa, a small local company, produces 240 jars of salsa a day. Each jar is imprinted with a code indicating the date and batch number. To help maintain consistency, at the end of each day Sammy selects three jars of salsa, weighs the contents, and tastes the product. Help Sammy select the sample jars. Today’s jars are coded 07N61 through 07N300. a) Describe how you might set this up using a simple random sample. b) Show how to use random numbers to pick 3 jars. Random numbers: 20639 28642 06962 08710 84395
Stratified Sampling Stratified Random Sampling: a sampling design in which the population is divided into subpopulations, or strata, and random samples are then drawn from each stratum. Strata are homogeneous groups (groups sharing some common characteristic). Examples of strata:
Cluster Sampling Cluster Sample: a sampling design in which entire groups, or clusters, are chosen at random. Each cluster should be representative of the population, so the clusters should be heterogeneous. Example:
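The table-reading procedure in question 2 can be written out as a short Python sketch. It assumes the usual classroom rules: read the digits left to right in fixed-width groups, skip any value outside the allowed range, and skip repeats. The function name `select_from_digits` is hypothetical.

```python
def select_from_digits(digits, width, low, high, count):
    """Read a random-digit string left to right in groups of `width` digits.

    Keep a value only if it lies in [low, high] and has not been chosen
    already; stop once `count` values have been found.
    """
    stream = "".join(ch for ch in digits if ch.isdigit())  # ignore the spaces between blocks
    chosen = []
    for start in range(0, len(stream) - width + 1, width):
        value = int(stream[start:start + width])
        if low <= value <= high and value not in chosen:
            chosen.append(value)
            if len(chosen) == count:
                break
    return chosen


print(select_from_digits("71622 35940 81807 59225 18192", 2, 1, 60, 4))  # [23, 59, 40, 25]
```

The same function handles the 1/24 bell work (3 distinct numbers between 1 and 40 from 56282 69928 14125 38872), which gives 28, 26, 14 because the second 28 is skipped as a repeat.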
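One possible way to answer part b), sketched in Python: use the batch part of each jar code (61 through 300) directly as a three-digit label, read the random digits three at a time, and skip labels outside 061 to 300 and repeats. Using the batch numbers as labels is an assumption; renumbering the jars 001 to 240 would be an equally valid setup and would select different jars. The function name `select_from_digits` is hypothetical.

```python
def select_from_digits(digits, width, low, high, count):
    """Read digits left to right in groups of `width`, keeping in-range, non-repeated values."""
    stream = "".join(ch for ch in digits if ch.isdigit())  # ignore the spaces between blocks
    chosen = []
    for start in range(0, len(stream) - width + 1, width):
        value = int(stream[start:start + width])
        if low <= value <= high and value not in chosen:
            chosen.append(value)
            if len(chosen) == count:
                break
    return chosen


# Assumption: label each jar by its batch number (61..300), read as three digits.
batches = select_from_digits("20639 28642 06962 08710 84395", 3, 61, 300, 3)
print(["07N" + str(b) for b in batches])  # ['07N206', '07N87', '07N108']
```

Here 392, 864, and 962 are out of range, and the second 206 is skipped as a repeat.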
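A stratified random sample is just a separate SRS inside each stratum. The Python sketch below illustrates this with hypothetical strata (students grouped by grade level, with invented IDs) and assumes proportional allocation, taking 10% of each stratum; the slide does not prescribe an allocation rule. The function name `stratified_sample` is hypothetical.

```python
import random


def stratified_sample(strata, sizes, seed=None):
    """Draw a separate simple random sample from every stratum.

    strata: dict mapping each stratum name to the list of its members
    sizes:  dict mapping each stratum name to how many members to draw from it
    """
    rng = random.Random(seed)
    return {name: rng.sample(members, sizes[name]) for name, members in strata.items()}


# Hypothetical strata: students grouped by grade level (IDs invented for illustration).
strata = {
    "9th": [f"9-{i}" for i in range(1, 41)],
    "10th": [f"10-{i}" for i in range(1, 31)],
    "11th": [f"11-{i}" for i in range(1, 21)],
}
# Assumed proportional allocation: 10% of each stratum (4, 3, and 2 students).
sizes = {name: len(members) // 10 for name, members in strata.items()}
print(stratified_sample(strata, sizes, seed=1))
```

Every stratum is guaranteed to appear in the sample, which a single SRS of 9 students would not guarantee.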
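In a cluster sample the randomization happens at the level of whole groups, and then everyone in a chosen group is included. A minimal Python sketch, using hypothetical homerooms with invented student names as the clusters; the function name `cluster_sample` is also hypothetical.

```python
import random


def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly choose whole clusters; every member of a chosen cluster is sampled.

    clusters: dict mapping each cluster name to the list of its members
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)  # pick cluster names at random
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members


# Hypothetical clusters: homerooms, each a mix of students (names invented).
homerooms = {
    "Room 101": ["Ana", "Ben", "Cal"],
    "Room 102": ["Dee", "Eli", "Fay", "Gus"],
    "Room 103": ["Hal", "Ivy"],
    "Room 104": ["Jo", "Kai", "Lu"],
}
rooms, sample = cluster_sample(homerooms, 2, seed=7)
print(rooms, sample)
```

Compare this with the stratified sketch: there, some members are drawn from every group; here, all members are drawn from only some groups.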
Systematic Sample Choose a starting value at random. Then choose sample members at regular intervals. When there is no relationship between the order of the sampling frame and the variables of interest, a systematic sample can be representative
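The systematic procedure (random start, then every kth member) fits in a few lines of Python. This sketch assumes a hypothetical ordered frame of 100 members numbered 1 to 100 and k = 5; the function name `systematic_sample` is hypothetical.

```python
import random


def systematic_sample(frame, k, seed=None):
    """Choose a random start among the first k members, then take every kth member."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # random position 0..k-1 in the ordered frame
    return frame[start::k]


# Hypothetical ordered frame: members numbered 1..100.
frame = list(range(1, 101))
print(systematic_sample(frame, 5, seed=3))
```

Only the starting point is random; after that, the order of the frame decides who is chosen, which is why the method is representative only when that order is unrelated to the variables of interest.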
Bad or Biased Sampling Methods: Voluntary Response Sample and Convenience Sample Voluntary Response Sample: a large group of individuals is invited to respond, and all who respond are counted. Convenience Sample: readily available members of the population are chosen for the sample.
Identify the Sampling Method Used a) You divide the student population with respect to majors and randomly select and question some students in each major. b) You assign each student a number and generate random numbers. You then question each student whose number is randomly selected. c) Using random digit dialing, researchers call 1400 people and ask what obstacles (such as childcare) keep them from exercising. d) Questioning students as they leave a university library, a researcher asks 358 students about their drinking habits. e) After a hurricane, a disaster area is divided into 200 equal grids. Thirty of the grids are selected, and every occupied household in each selected grid is interviewed to help focus relief efforts on what residents require the most. f) Every tenth person entering a mall is asked to name his or her favorite store.
Bell Work 1/24 1) From the sequence of random numbers, select 3 distinct numbers (no repeats) between 1 and 40, reading from left to right. 56282 69928 14125 38872 2) The Web site www.gamefaqs.com asked, as their question of the day to which visitors to the site were invited to respond, “Do you ever use emoticons when you type online?” Of the 87,262 respondents, 27% said that they did not use emoticons. a) What kind of sample was this? b) How much confidence would you place in using 27% as an estimate of the fraction of people who do not use emoticons?
Simple Random Sample (SRS) A small sampling frame of administrators and teachers who work at CCHS is given in the box. Using the sequence of random numbers given, perform an SRS (simple random sample) to select a sample of size 3. Sampling frame: 1 Mr. Warren 2 Dr. Crosby 3 Mr. Locklair 4 Mrs. Shipp 5 Mr. Mambou 6 Dr. Dixon 7 Mr. Allen 8 Mrs. Stroble 9 Mr. Thomas 10 Mr. Bordieanu Random numbers: 83010 97601 89105 98803 Sample: Mr. Warren, Mr. Thomas, Mr. Bordieanu
Stratified Random Samples Divide the population into groups (strata) and select a random sample from each group. Strata could be age groups, genders, or levels of education, for example.
Stratified Sampling Working with the sampling frame, let’s say that 20% of our school’s employees are administrators and 80% are teachers. Perform a stratified random sample on administrators and teachers with a sample size of 5, which means 1 administrator (20% of 5) and 4 teachers (80% of 5). Administrators: 1 Mr. Warren 2 Dr. Crosby 3 Mr. Locklair 4 Dr. Dixon Sequence of Random Numbers: 96299 07196 Teachers: 1 Mr. Allen 2 Mrs. Stroble 3 Mrs. Shipp 4 Mr. Thomas 5 Mr. Mambou 6 Mr. Bordieanu Sequence of Random Numbers: 98642 20639 23185 Our sample: Dr. Crosby, Mr. Bordieanu, Mr. Thomas, Mrs. Stroble, Mrs. Shipp
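The worked SRS can be checked with a short Python sketch that reads the digits two at a time, skips labels not in the frame (00 and 11 to 99) and repeats, and stops after three names. The function name `srs_from_table` is hypothetical; the frame and digits are the ones on the slide.

```python
def srs_from_table(frame, digits, width, n):
    """Select n distinct members of a numbered frame from a random-digit string.

    frame:  dict mapping each label number to a member
    digits: the random digits, read left to right in groups of `width`
    """
    stream = "".join(ch for ch in digits if ch.isdigit())
    chosen = []
    for start in range(0, len(stream) - width + 1, width):
        label = int(stream[start:start + width])
        if label in frame and label not in chosen:  # skip unused labels and repeats
            chosen.append(label)
            if len(chosen) == n:
                break
    return [frame[label] for label in chosen]


cchs_frame = {1: "Mr. Warren", 2: "Dr. Crosby", 3: "Mr. Locklair", 4: "Mrs. Shipp",
              5: "Mr. Mambou", 6: "Dr. Dixon", 7: "Mr. Allen", 8: "Mrs. Stroble",
              9: "Mr. Thomas", 10: "Mr. Bordieanu"}
print(srs_from_table(cchs_frame, "83010 97601 89105 98803", 2, 3))
# ['Mr. Warren', 'Mr. Thomas', 'Mr. Bordieanu']
```

The pairs read are 83, 01, 09, 76, 01, 89, 10: the second 01 is a repeat, so the third name comes from 10.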
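The stratified example above can be reproduced in Python: allocate the sample of 5 in proportion to the 20%/80% split, then run a separate table-based SRS inside each stratum. Because the labels 1 to 4 and 1 to 6 are single digits, the digits are read one at a time. The names `srs_from_table` and `stratified_staff_sample` are hypothetical; the frames and digit strings come from the slide.

```python
def srs_from_table(frame, digits, width, n):
    """Select n distinct members of a numbered frame from a random-digit string."""
    stream = "".join(ch for ch in digits if ch.isdigit())
    chosen = []
    for start in range(0, len(stream) - width + 1, width):
        label = int(stream[start:start + width])
        if label in frame and label not in chosen:  # skip unused labels and repeats
            chosen.append(label)
            if len(chosen) == n:
                break
    return [frame[label] for label in chosen]


ADMINISTRATORS = {1: "Mr. Warren", 2: "Dr. Crosby", 3: "Mr. Locklair", 4: "Dr. Dixon"}
TEACHERS = {1: "Mr. Allen", 2: "Mrs. Stroble", 3: "Mrs. Shipp",
            4: "Mr. Thomas", 5: "Mr. Mambou", 6: "Mr. Bordieanu"}


def stratified_staff_sample(total_n):
    """Proportional stratified sample: 20% administrators, 80% teachers."""
    n_admin = total_n * 20 // 100  # e.g. 20% of 5 = 1 administrator
    n_teach = total_n - n_admin    # the rest of the sample comes from the teachers
    admins = srs_from_table(ADMINISTRATORS, "96299 07196", 1, n_admin)
    teachers = srs_from_table(TEACHERS, "98642 20639 23185", 1, n_teach)
    return admins + teachers


print(stratified_staff_sample(5))
```

For the teachers, the digits 9, 8 are skipped, 6, 4, 2 are kept, the second 2, the 0, and the second 6 are skipped, and 3 completes the four teachers.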
Cluster Samples Divide the population into individual units or groups and randomly select one or more units. The sample consists of all members from selected unit(s). Cluster Sample:
Systematic Samples Choose a starting value at random. Then choose sample members at regular intervals. We say we choose every kth member. In this example, k = 5: every 5th member of the population is selected. [Figure: a row of population members with every 5th member selected]
Identifying Sampling Methods Management at a retail store is concerned about the possibility of drug abuse by people who work there. They decide to check on the extent of the problem by having a random sample of the employees undergo a drug test. Several plans for choosing the sample are proposed. Name the sampling strategy in each. a. Randomly select a store location and test all the people who work in that store: supervisors, full-time clerks, part-time clerks, and maintenance staff. b. Choose the fourth person who arrives to work for each shift. c. There are four employee classifications: supervisors, full-time clerks, part-time clerks, and maintenance staff. Randomly select ten people from each category. d. Each employee has a three-digit identification number. Randomly choose 40 numbers.
Causes of Bias • Undercoverage: part of the population is given less representation in the sample than it has in the population. • Nonresponse bias • Voluntary response bias • Sampling from a bad or incomplete sampling frame • Response bias
Response Bias: Examples of Biased Questions Determine whether each survey question is biased. If the question is biased, suggest a better wording. a) Why does eating whole-grain foods improve your health? b) Why does text messaging while driving increase the risk of a crash? c) How much do you exercise during an average week? d) Why do you think the media have a negative effect on teen girls’ dieting habits? e) Do you think high school students should be required to wear uniforms? f) Given humanity’s great tradition of exploration, do you favor continued funding for space flights?
Bell Work 1/24 1) From the sampling frame given in the box, perform an SRS (simple random sample) of sample size 3, using the sequence of random numbers below. 1 Franklin 2 Elizabeth 3 Enrique 4 Steven 5 William 6 Jordan 7 Dean 8 Tishia 9 Ray Kwan 10 Chardesia 11 James 12 Sandra 13 Craig 14 Debra Random numbers: 07196 08607 41081 34125 38872 Our sample: Dean, Tishia, Craig 2) Identify whether each numerical value is a statistic or a parameter. a) The class average for a Probability and Statistics test was 81%. b) In a survey of 880 students, 76% said they enjoyed having music played in class. c) A sample of U.S. voters found that President Obama had a 50% approval rating. d) Voter turnout in the 2012 election was 57.5%.
Identifying Sampling Methods a) We want to know what percentage of local doctors accept Medicaid patients. We call the offices of 50 doctors randomly selected from local Yellow Page listings. b) We want to know what percentage of local businesses anticipate hiring additional employees in the upcoming month. We randomly select a page in the Yellow Pages and call every business listed there. c) We want to know if there is neighborhood support to turn a vacant lot into a playground. We spend a Saturday afternoon going door-to-door in the neighborhood, asking people to sign a petition. d) We want to know if students at our college are satisfied with the selection of food available on campus. We go to the largest cafeteria and interview every 10th person in line.
Identifying Parameters and Statistics Occasionally, when I fill my car with gas, I figure out how many miles per gallon my car got. I wrote down those results after 6 fill-ups in the past few months. Overall, it appears my car gets 28.8 miles per gallon. a) What statistic have I calculated? b) What is the parameter I’m trying to estimate? c) When the Environmental Protection Agency (EPA) checks a car like mine to predict its fuel economy, what parameter is it trying to estimate? During the course of your Statistics class, you are given 6 equally weighted tests. You have taken three tests so far, and calculated your average score to be 89%. a) What statistic have you calculated? b) What is the parameter you are trying to estimate? c) If you ask five of your friends in class how they did on the test, what parameter are you trying to estimate?
For the following reports on statistical studies, identify the population, the parameter of interest, the sampling frame, the sampling method (including whether or not randomization was employed), and potential sources of bias. a) Consumers Union asked all subscribers whether they had used alternative medical treatments and, if so, whether they had benefited from them. For almost all of the treatments, approximately 20% of those responding reported cures or substantial improvement in their condition. b) Researchers waited outside a bar they had randomly selected from a list of such establishments. They stopped every 10th person who came out of the bar and asked whether he or she thought drinking and driving was a serious problem.
a) Population – all U.S. adults Parameter – proportion who have used and benefited from alternative medical treatments Sampling Frame – all Consumers Union subscribers Sample – those subscribers who responded Method – not specified, but probably a questionnaire mailed to all subscribers Bias – nonresponse bias, specifically voluntary response bias. Those who respond may have strong feelings one way or another. b) Population – adults Parameter – proportion who think drinking and driving is a serious problem Sampling Frame – bar patrons Sample – every 10th person leaving the bar Method – systematic sampling Bias – undercoverage. Those interviewed had just left a bar, and may have opinions about drinking and driving that differ from the opinions of the population in general.
For the following reports on statistical studies, identify the population, the parameter of interest, the sampling frame, the sampling method (including whether or not randomization was employed), and potential sources of bias. 1) A question posted on the Lycos Web site on 18 June 2000 asked visitors to the site to say whether they thought that marijuana should be legally available for medicinal purposes. (www.lycos.com) 2) A company packaging snack foods maintains quality control by randomly selecting 10 cases from each day’s production and weighing the bags. Then they open one bag from each case and inspect the contents.
1) Population – all U.S. adults Parameter – proportion who feel marijuana should be legalized for medicinal purposes Sampling Frame – none given; potentially all people with access to the web site Sample – those visiting the web site who responded Method – voluntary response (no randomization employed) Bias – voluntary response sample. Those who visit the web site and respond may be predisposed to a particular answer. 2) Population – snack food bags Parameter – proportion passing inspection Sampling Frame – all bags produced each day Sample – 10 bags, one from each of 10 randomly selected cases Method – multistage sampling. Presumably, they take a simple random sample of 10 cases, followed by a simple random sample of one bag from each case. Bias – no indication of bias
Identifying Sampling Method and Potential Sources of Bias In a large city school system with 20 elementary schools, the school board is considering the adoption of a new policy that would require elementary students to pass a test in order to be promoted to the next grade. The PTA wants to find out whether parents agree with this plan. Listed below are some of the ideas proposed for gathering data. For each, indicate what kind of sampling strategy is involved and what (if any) biases might result. a) Put a big ad in the newspaper asking people to log their opinions on the PTA Web site. b) Randomly select one of the elementary schools and contact every parent by phone. c) Send a survey home with every student, and ask parents to fill it out and return it the next day. d) Randomly select 20 parents from each elementary school. Send them a survey, and follow up with a phone call if they do not return the survey within a week.
Identifying Sampling Method and Potential Sources of Bias In a large city school system with 20 elementary schools, the school board is considering the adoption of a new policy that would require elementary students to pass a test in order to be promoted to the next grade. The PTA wants to find out whether parents agree with this plan. Listed below are some of the ideas proposed for gathering data. For each, indicate what kind of sampling strategy is involved and what (if any) biases might result. a) This is a voluntary response sample. Only those who see the ad, feel strongly about the issue, and have web access will respond. b) This is cluster sampling, but probably not a good idea. The opinions of parents in one school may not be typical of the opinions of all parents. c) This is an attempt at a census, and will probably suffer from nonresponse bias. d) This is stratified sampling. If the follow-up is carried out carefully, the sample should be unbiased.
Identifying Sampling Method and Potential Sources of Bias Four new sampling strategies have been proposed to help the PTA determine whether parents favor requiring elementary students to pass a test in order to be promoted to the next grade. For each, indicate what kind of sampling strategy is involved and what (if any) biases might result. a) Run a poll on the local TV news, asking people to dial one of two phone numbers to indicate whether they favor or oppose the plan. b) Hold a PTA meeting at each of the 20 elementary schools, and tally the opinions expressed by those who attend the meetings. c) Randomly select one class at each elementary school and contact each of those parents. d) Go through the district’s enrollment records, selecting every 40th parent. PTA volunteers will go to those homes to interview the people chosen.
Four new sampling strategies have been proposed to help the PTA determine whether parents favor requiring elementary students to pass a test in order to be promoted to the next grade. For each, indicate what kind of sampling strategy is involved and what (if any) biases might result. a) This sampling method suffers from voluntary response bias. Only those who see the show and feel strongly will call. b) Although this method may result in a more representative sample than the method in part (a), this is still a voluntary response sample. Only strongly motivated parents attend PTA meetings. c) This is multistage sampling, stratified by elementary school and then clustered by class. This is a good design, as long as the parents in the class respond. There should be follow-up to get the opinions of parents who do not respond. d) This is systematic sampling. As long as the starting point is randomized, this method should produce reliable data.