Journalism 614: Sampling and Non-Response

Sampling
• Probability sampling
  – Based on random selection
• Non-probability sampling
  – Based on convenience

Sampling Miscues: Alf Landon for President (1936)
• Literary Digest: post cards to voters in 6 states
  – Correctly predicted elections from 1920 to 1932
    • Names selected from telephone directories and automobile registrations
  – In 1936, they sent out 10 million post cards
    • Results picked Landon 57% to Roosevelt 43%
  – Election: Roosevelt won in the largest landslide
    • Roosevelt took 61% of the vote and won 523-8 in the Electoral College
• Why so inaccurate? Poor sampling frame
  – Led to selection of wealthy respondents

Sampling Miscues: Thomas E. Dewey for President (1948)
• Gallup picked the winner, 1936-1944
  – Used quota sampling:
    • Matches sample characteristics to population
  – Gallup drew quota samples on the basis of income
• In 1948, Gallup picked Dewey to defeat Truman
  – Reasons:
    • 1. Most pollsters quit polling in October
    • 2. Undecided voters went for Truman
    • 3. Unrepresentative samples: WWII had changed society since the census

Non-probability Sampling
• Used in situations where a sampling frame for randomization doesn’t exist
• Types of non-probability samples:
  – 1. Reliance on available subjects
    • Convenience sampling
  – 2. Purposive or judgmental sampling
  – 3. Snowball sampling
  – 4. Quota sampling

Reliance on Available Subjects
• Person on the street, easily accessible
• Examples:
  – Mall intercepts, college students, e-polls
• Frequently used, but usually biased
• Notoriously inaccurate
  – Especially in making inferences about the larger population, even with many respondents

Purposive or Judgmental Sampling
• Dictated by the purpose of the study
  – Situational judgments about which individuals should be surveyed to make for a useful or representative sample
    • E.g., using college students to study third-person effects regarding rap and metal music
  – Third-person effect (3PE): others are perceived as more affected by exposure than self
    • Assessing effects on self and others
  – Using college students makes for homogeneity of self

Snowball Sampling
• Used when the population of interest is difficult to locate
  – E.g., homeless people, meth addicts
• Researcher collects data from a few people in the targeted group
  – Initially surveyed individuals are asked to name other people to contact
    • Good for exploration
    • Bad for generalizability

Quota Sampling
• Begins with a table of relevant characteristics of the population
  – Proportions of gender, age, education, ethnicity from census data
  – Selecting a sample to match those proportions
• Problems:
  – 1. Quota frame must be accurate
  – 2. Sample is not random, but can be representative

Probability Sampling
• Goal: Representativeness
  – Sample resembles larger population
• Random selection
  – Enhances the likelihood of a representative sample
  – Each unit of the population has an equal chance of being selected into the sample

Population Parameters
• Parameter: Summary statistic for the population
  – E.g., mean age of the population
• Sample allows parameter estimates
  – E.g., mean age of the sample
    • Used as an estimate of the population parameter

Sampling Error
• Every time you draw a sample from the population, the parameter estimate will fluctuate slightly
  – E.g.:
    • Sample 1: Mean age = 37.2
    • Sample 2: Mean age = 36.4
    • Sample 3: Mean age = 38.1
• If you draw lots of samples, you would get a normal curve of values

Normal Curve of Sample Estimates
[Figure: frequency of estimated means from multiple samples, plotted against the estimated mean; the likely population parameter is marked on the curve.]
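
The idea that repeated samples produce a bell-shaped spread of estimates can be checked with a small simulation. The sketch below is illustrative only: the population of ages is randomly generated (not real data), and the function name `simulate_sample_means` and all sizes are assumptions chosen for the example.

```python
import random
import statistics

def simulate_sample_means(population, sample_size, n_samples, seed=0):
    """Draw repeated simple random samples and return the mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]

# Hypothetical population of 100,000 ages (made up for illustration).
_rng = random.Random(42)
population = [_rng.randint(18, 90) for _ in range(100_000)]

# 1,000 samples of 400 people each; every sample gives a slightly different mean.
means = simulate_sample_means(population, sample_size=400, n_samples=1_000)

print("population mean:        ", round(statistics.mean(population), 2))
print("mean of sample means:   ", round(statistics.mean(means), 2))
print("std dev of sample means:", round(statistics.stdev(means), 2))
```

Individual sample means differ from one another, but they cluster around the population mean, and their spread is far narrower than the spread of ages in the population itself.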

Error and Sample Size
• As the sample size increases:
  – The error decreases
  – In other words, an estimate from a large sample is likely to be closer to the population parameter
  – As the sample size increases, we become more confident in our parameter estimate
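
One standard way to express this relationship, assuming simple random sampling from a large population, is the standard error of the sample mean, where $s$ is the sample standard deviation and $n$ is the sample size (these symbols are introduced here for illustration and do not appear in the slides):

```latex
\mathrm{SE}(\bar{x}) = \frac{s}{\sqrt{n}}
\qquad\Longrightarrow\qquad
\frac{\mathrm{SE}_{4n}}{\mathrm{SE}_{n}} = \frac{s/\sqrt{4n}}{s/\sqrt{n}} = \frac{1}{2}
```

Because the error shrinks with the square root of $n$, quadrupling the sample size only halves the error.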

Confidence Interval
• Range around the estimate that we are 95% confident contains the population parameter
• For example, we predict that Candidate X will receive 45% of the vote, with a confidence interval of ±3 percentage points
  – We are 95% sure the parameter will be between 42% and 48%
  – This is the “margin of error” in a poll
• Confidence interval shrinks as:
  – Error is smaller
  – Sample size is larger
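
A margin of error like the one in the Candidate X example can be approximated with the usual normal-approximation formula for a proportion. This is a minimal sketch, assuming simple random sampling; the sample size of 1,000 is an assumption (the slide does not give one), and the function names are made up for the example.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a sample proportion p from n respondents.

    z=1.96 corresponds to a 95% confidence level under the normal approximation.
    """
    return z * math.sqrt(p * (1 - p) / n)

def confidence_interval(p, n, z=1.96):
    """Return the (lower, upper) bounds of the approximate confidence interval."""
    moe = margin_of_error(p, n, z)
    return p - moe, p + moe

# Candidate X at 45%, with an assumed sample of 1,000 respondents.
low, high = confidence_interval(0.45, 1000)
print(f"margin of error: +/-{100 * margin_of_error(0.45, 1000):.1f} points")
print(f"95% interval: {100 * low:.1f}% to {100 * high:.1f}%")
```

With these assumed inputs the margin comes out at roughly three percentage points, in line with the slide's example.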

Sample Size & Confidence Interval
• How precise does the estimate have to be?
  – More precise: larger sample size
• Larger samples increase precision
  – But at a diminishing rate
  – Each unit you add to your sample contributes to the accuracy of your estimate
    • But the amount it adds shrinks with each additional unit

95% Confidence Intervals
Sampling error in percentage points (±), by sample size and % split

% split   N=100   N=200   N=300   N=400   N=500   N=700   N=1000   N=1500
50/50      10.0     7.1     5.8     5.0     4.5     3.8      3.2      2.6
70/30       9.2     6.5     5.3     4.6     4.1     3.5      2.9      2.4
90/10       6.0     4.2     3.5     3.0     2.7     2.3      1.9      1.5
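
Entries in a table of this kind can be approximated as two standard errors of a proportion, 2 × sqrt(p(1 − p)/n), expressed in percentage points. The multiplier of 2 is an inference rather than something the slides state, and the function and constant names below are hypothetical.

```python
import math

SAMPLE_SIZES = [100, 200, 300, 400, 500, 700, 1000, 1500]
SPLITS = {"50/50": 0.5, "70/30": 0.7, "90/10": 0.9}

def sampling_error_pct(p, n, z=2.0):
    """Sampling error in percentage points, assuming z standard errors."""
    return round(100 * z * math.sqrt(p * (1 - p) / n), 1)

def build_table():
    """Return {split label: [error for each sample size]}."""
    return {
        label: [sampling_error_pct(p, n) for n in SAMPLE_SIZES]
        for label, p in SPLITS.items()
    }

for label, row in build_table().items():
    print(label, "  ".join(f"{value:5.1f}" for value in row))
```

Reading across any row shows the diminishing return: going from 100 to 400 respondents halves the error, but it takes a further jump to 1,600 to halve it again.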

Describe Sampling Frame
• List of units from which sample is drawn
  – Defines your population
  – E.g., list of members of population
• Ideally you’d like to list all members of your population as your sampling frame
  – Randomly select your sample from that list
• Often impractical to list entire population

Sampling Frames for Surveys
• Limitations of the telephone book:
  – Misses unlisted numbers/mobile numbers
  – SES and age bias:
    • Poor people may not have a phone
    • Poor people are less likely to have multiple phone lines
    • Young people have mobile phone numbers
• Most studies use a technique such as Random Digit Dialing (RDD) as a way around this

Types of Sampling Designs
• Simple Random Sampling
• Systematic Sampling
• Stratified Sampling
• Multi-stage Cluster Sampling

Simple Random Sampling
• Establish a sampling frame
  – A number is assigned to each element
  – Elements are randomly selected into the sample
  – Use a random number generator to select every case you need for inclusion
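
The procedure can be sketched in a few lines using Python's standard `random` module. The sampling frame here is a made-up list of 1,000 placeholder IDs, and `simple_random_sample` is a hypothetical helper name.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Select n distinct elements from the sampling frame, each equally likely."""
    rng = random.Random(seed)
    # random.Random.sample draws without replacement.
    return rng.sample(frame, n)

# Hypothetical sampling frame: every element gets a number.
frame = [f"person_{i:04d}" for i in range(1, 1001)]

sample = simple_random_sample(frame, 50, seed=1)
print(len(sample), "elements selected, for example:", sample[:3])
```

Fixing the seed makes the draw repeatable, which is useful when the selection needs to be documented or audited.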

Systematic Sampling
• Establish sampling frame
  – Select every kth element with a random start
  – E.g., with 1000 on the list, choosing every 5th name yields a sample size of 200
• Sampling interval: standard distance between units selected from the sampling frame
  – Sampling interval = population size / sample size
• Sampling ratio: proportion of the population selected
  – Sampling ratio = sample size / population size
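
The slide's example (1000 names, every 5th selected, 200 in the sample) can be worked through as a short sketch. The frame is a placeholder list of numbers and `systematic_sample` is a hypothetical helper; it assumes the population size is at least the sample size and uses integer division for the interval.

```python
import random

def systematic_sample(frame, sample_size, seed=None):
    """Select every kth element of the frame after a random start."""
    k = len(frame) // sample_size          # sampling interval
    if k < 1:
        raise ValueError("sample_size cannot exceed the size of the frame")
    rng = random.Random(seed)
    start = rng.randrange(k)               # random start within the first interval
    return frame[start::k][:sample_size]

frame = list(range(1, 1001))               # hypothetical list of 1000 names
sample = systematic_sample(frame, 200, seed=3)

interval = len(frame) // len(sample)       # 1000 / 200 = 5
ratio = len(sample) / len(frame)           # 200 / 1000 = 0.2
print("sampling interval:", interval)
print("sampling ratio:", ratio)
```

One caution that applies to any systematic sample: if the list has a periodic pattern that lines up with the interval, the sample can be biased, so the ordering of the frame is worth checking first.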

Stratified Sampling
• Modification used to reduce the potential for sampling error
  – Researcher ensures that certain groups are represented proportionately in the sample
    • E.g., if the population is 60% female, a stratified sample selects 60% females into the sample
    • E.g., stratifying by region of the country to make sure that each region is proportionately represented
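
A proportionate stratified sample along the lines of the 60% female example might look like the sketch below. The frame is fabricated for illustration, and `proportionate_stratified_sample` and `stratum_of` are hypothetical names; each stratum's share is rounded, so the total can differ slightly from the target when shares are not whole numbers.

```python
import random
from collections import defaultdict

def proportionate_stratified_sample(frame, stratum_of, n, seed=None):
    """Randomly sample within each stratum in proportion to its population share."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)

    sample = []
    for units in strata.values():
        # Allocation for this stratum (rounded to a whole number of units).
        k = round(n * len(units) / len(frame))
        sample.extend(rng.sample(units, k))
    return sample

# Hypothetical frame: 600 women and 400 men (60% / 40%).
frame = [("F", i) for i in range(600)] + [("M", i) for i in range(400)]

sample = proportionate_stratified_sample(frame, lambda unit: unit[0], n=100, seed=5)
women = sum(1 for unit in sample if unit[0] == "F")
print("women:", women, "men:", len(sample) - women)
```

Selection within each stratum is still random; stratifying only removes the chance that the sample's gender split drifts away from the population's.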

Cluster Sampling
• Frequently, there is no convenient way of listing the population for sampling
  – E.g., sample of Dane County or Wisconsin
    • Hard to get a list of the population members
• Cluster sample
  – Sample of census blocks
    • List census blocks, then list people for selected blocks
    • Select a sub-sample of people living on each block

Multi-stage Cluster Sample
• Cluster sampling done in a series of stages:
  – List, then sample within
• Example:
  – Stage 1: List zip codes
    • Randomly select zip codes
  – Stage 2: List census blocks within selected zip codes
    • Randomly select census blocks
  – Stage 3: List households on selected census blocks
    • Randomly select households
  – Stage 4: List residents of selected households
    • Randomly select person to interview
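
The four stages above can be sketched as nested random selections. Everything in this example is hypothetical: the nested dictionary stands in for the lists a researcher would compile at each stage, and the function name, labels, and counts are invented for illustration.

```python
import random

def multistage_sample(zip_codes, n_zips, n_blocks, n_households, seed=None):
    """Sample zip codes, then blocks, then households, then one resident each.

    zip_codes maps zip -> block -> household -> list of residents.
    """
    rng = random.Random(seed)
    respondents = []
    # Stage 1: randomly select zip codes from the full list.
    for z in rng.sample(sorted(zip_codes), n_zips):
        blocks = zip_codes[z]
        # Stage 2: randomly select census blocks within each selected zip code.
        for b in rng.sample(sorted(blocks), min(n_blocks, len(blocks))):
            households = blocks[b]
            # Stage 3: randomly select households on each selected block.
            for h in rng.sample(sorted(households), min(n_households, len(households))):
                # Stage 4: randomly select one resident to interview.
                respondents.append((z, b, h, rng.choice(households[h])))
    return respondents

# Hypothetical frame: 8 zip codes x 5 blocks x 10 households x 3 residents.
frame = {
    f"zip{z}": {
        f"block{b}": {
            f"hh{h}": [f"resident{r}" for r in range(1, 4)]
            for h in range(1, 11)
        }
        for b in range(1, 6)
    }
    for z in range(1, 9)
}

respondents = multistage_sample(frame, n_zips=3, n_blocks=2, n_households=4, seed=7)
print(len(respondents), "people selected, for example:", respondents[0])
```

Only the selected clusters need to be listed in detail at each stage, which is the practical appeal of the design when no full population list exists.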

Nonresponse
• Declining contact and cooperation rates
  – Especially for “gold standard” RDD national telephone surveys
• Early research suggests the issues are rather small, with little bias on results
  – Examined by comparing “easy to contact” individuals to “hard to contact” individuals
  – A more systematic version is to compare a standard 5-day survey with a “rigorous” survey

Accelerating Problem
• Survey firms reporting increasingly high rates of non-contact and non-cooperation
  – Americans leading increasingly busy lives
  – More and more unsolicited calls to home
  – Sophisticated technologies to avoid calls
• Big drop-offs in the last 15-20 years
  – Call screening (I only take known callers)
  – Cell phones (I pay for minutes during survey)

Hard to Gauge the Effect
• Initial work conducted in the late 90s
• Curtin et al.
  – Low-effort “restricted call” design versus high-effort “all call” design
  – See no difference in population estimates
• Keeter et al.
  – Two parallel surveys, one using a standard 5-day design vs. a “rigorous” design
  – On average, a two percentage point difference
• These seem to suggest that a lower response rate does not affect survey quality

Non-response in this Century
• A lot has changed in the last decade or more
  – More legislative restrictions
  – More mobile technologies
  – More VOIP technologies
• Re-ran the study and found similar results comparing 5-day and rigorous designs
  – 5-day: 10 call backs, one refusal conversion
  – Rigorous: 21 weeks, advance letters, left messages, additional call backs, etc.
  – Little difference in findings

The Problem of Cell Phones
• In 2006, 13% of households (HHs) were cell phone only
  – Increasing 1 percentage point every six months, 2003-2006
  – Increasing 2 percentage points every six months after 2006
    • By 2015, 46% of U.S. adults live in cellphone-only HHs
    • 64% of Millennials (born 1977-1994) are cellphone-only
• Bias in terms of who is missed is most prominent among young people
  – “Serious coverage problem”
  – “Particular challenge”

Big differences in wireless-only HHs by Age and SES
• Only 16% of those 65+
• Nearly 70% among those 25-29
• Just over 40% among “not poor”
• Over 50% for “near poor”
• Nearly 60% for the “poor”
• This creates systematic biases

Some substantial differences
• Big differences between cell and non-cell respondents on a range of questions
• Especially for issues that affect younger people, and behaviors such as voting
  – Registered to vote?
  – Political knowledge?
  – Media usage?

Strive for Higher Response Rate

To Achieve a High Response Rate
• Incentives: gifts or drawings for completion
  – Prize drawing vs. smaller incentives to all
  – Donating to a charity as an inducement
  – Run experiments to see which incentive works
• Online: pre-invitation and landing page
  – Test different versions to see what encourages respondents to click on the survey link
• Reminders and follow-ups to boost response
  – In general, you only want to send one or two email reminders
• Make the online survey friendly for all devices/browsers
  – Make it usable on mobile or tablet
  – See if there is a high bounce rate from particular devices