# Sampling


## Sampling

- Probability sampling: based on random selection
- Non-probability sampling: based on convenience

## Sampling Miscues: Alf Landon for President (1936)

- Literary Digest mailed postcards to voters in 6 states
  - Had correctly predicted elections from 1920 to 1932
  - Names selected from telephone directories and automobile registrations
- In 1936, they sent out 10 million postcards
  - Results picked Landon, 57% to Roosevelt's 43%
- Election: Roosevelt won in a landslide
  - Roosevelt took 61% of the vote and won the Electoral College 523-8
- Why so inaccurate?
  - Poor sampling frame
  - Led to selection of wealthy respondents

## Sampling Miscues: Thomas E. Dewey for President (1948)

- Gallup used quota sampling to pick the winner from 1936 to 1944
  - Quota sampling: matches sample characteristics to characteristics of the population
  - Gallup quota sampled on the basis of income
- In 1948, Gallup picked Dewey to defeat Truman
- Reasons:
  1. Most pollsters quit polling in October
  2. Undecided voters went for Truman
  3. Unrepresentative samples: WWII had changed society since the census

## Non-probability Sampling

- Used in situations where a sampling frame for randomization doesn't exist
- Types of non-probability samples:
  1. Reliance on available subjects (convenience sampling)
  2. Purposive or judgmental sampling
  3. Snowball sampling
  4. Quota sampling

## Reliance on Available Subjects

- Person on the street, easily accessible
- Examples: mall intercepts, college students, person on the street
- Frequently used, but usually biased
- Notoriously inaccurate, especially in making inferences about the larger population

## Purposive or Judgmental Sampling

- Dictated by the purpose of the study
- Situational judgments about which individuals should be surveyed to make for a useful or representative sample
- E.g., using college students to study third-person effects (3PE) regarding rap and metal music
  - 3PE: others are seen as more affected by exposure than oneself
  - Assessing effects on self and others
  - Using college students makes for homogeneity of self

## Snowball Sampling

- Used when the population of interest is difficult to locate
  - E.g., homeless people
- Researcher collects data from a few people in the targeted group
- Initially surveyed individuals are asked to name other people to contact
- Good for exploration
- Bad for generalizability

## Quota Sampling

- Begins with a table of relevant characteristics of the population
  - Proportions of gender, age, education, ethnicity from census data
- Select a sample to match those proportions
- Problems:
  1. Quota frame must be accurate
  2. Sample is not random

## Probability Sampling

- Goal: representativeness
  - Sample resembles the larger population
- Random selection
  - Enhances the likelihood of a representative sample
  - Each unit of the population has an equal chance of being selected into the sample

## Population Parameters

- Parameter: summary statistic for the population
  - E.g., mean age of the population
- Sample is used to make parameter estimates
  - E.g., mean age of the sample
  - Used as an estimate of the population parameter

## Sampling Error

- Every time you draw a sample from the population, the parameter estimate will fluctuate slightly
- E.g.:
  - Sample 1: mean age = 37.2
  - Sample 2: mean age = 36.4
  - Sample 3: mean age = 38.1
- If you draw lots of samples, the estimates form a normal curve
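
The fluctuation described above can be seen in a small simulation. This is only a sketch: the population of ages is randomly generated for illustration (it is not real data), and the sample size of 500 and the 1,000 repetitions are arbitrary assumptions.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population of 100,000 ages (illustrative, not real data).
population = [random.randint(18, 90) for _ in range(100_000)]
parameter = statistics.mean(population)  # the population parameter

# Draw many samples of 500 people and record each sample's mean age.
sample_means = [
    statistics.mean(random.sample(population, 500)) for _ in range(1_000)
]

# The estimates cluster around the parameter; their spread is the standard error.
spread = statistics.stdev(sample_means)
print(f"population mean: {parameter:.1f}")
print(f"first three estimates: {[round(m, 1) for m in sample_means[:3]]}")
print(f"spread of estimates: {spread:.2f}")
```

Plotting `sample_means` as a histogram would give the bell-shaped curve of estimates centered near `parameter`.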

## Normal Curve of Sample Estimates

[Figure: normal curve. Vertical axis: frequency of estimated means from multiple samples. Horizontal axis: estimated mean. The peak marks the likely population parameter.]

## Standard Error

- The standard deviation of sample estimates around the population parameter (roughly, their typical distance from it)
- 68% of sample estimates will fall within one standard error of the population parameter

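## Normal Curve of Sample Estimates

[Figure: normal curve. Vertical axis: frequency of estimated means from multiple samples. Horizontal axis: estimated mean. The population parameter is marked at the center, with one standard error unit marked on either side.]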

## Normal Curve of Sample Estimates

[Figure: normal curve. Vertical axis: frequency of estimated means from multiple samples. Horizontal axis: estimated mean. About 2/3 of samples fall within one standard error unit of the population parameter.]

## Standard Error Estimates and Sample Size

- As the sample size increases, the standard error decreases
  - In other words, our sample estimate is likely to be closer to the population parameter
- As the sample size increases, we get more confident in our parameter estimate
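
The slides do not give a formula, but for a proportion estimated from a simple random sample the textbook standard error makes the relationship concrete. The symbols below are introduced here for illustration: $p$ is the sample proportion and $n$ the sample size.

```latex
SE = \sqrt{\frac{p(1-p)}{n}}
\qquad
\text{e.g. } p = 0.5:\quad
n = 100 \Rightarrow SE = 0.05,
\qquad
n = 400 \Rightarrow SE = 0.025
```

Because $n$ sits under a square root, quadrupling the sample size only halves the standard error.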

## Confidence Levels

- Two thirds of samples will fall within one standard error of the population parameter
- Therefore, a single sample has a 68% chance of being within one standard error
- Confidence levels:
  - 68% sure the estimate is within 1 s.e. of the parameter
  - 95% sure the estimate is within 2 s.e. of the parameter
  - 99.7% sure the estimate is within 3 s.e. of the parameter

## Confidence Interval

- The range that we are 95% confident contains the population parameter
- For example, we predict that Candidate X will receive 45% of the vote, with a confidence interval of plus or minus 3 percentage points
  - We are 95% sure the parameter will be between 42% and 48%
- Confidence interval shrinks as:
  - Standard error gets smaller
  - Sample size gets larger
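
As a rough check on the Candidate X example, the interval can be computed from the sample proportion and sample size. This is a sketch under assumptions the slide does not state: a simple random sample of 1,000 respondents and an interval of two standard errors on either side; `confidence_interval` is a hypothetical helper name.

```python
import math

def confidence_interval(p, n, z=2.0):
    """Return (low, high) for a sample proportion p from a simple random sample of size n."""
    margin = z * math.sqrt(p * (1 - p) / n)  # z standard errors
    return p - margin, p + margin

# Assumed values: 45% support in a sample of 1,000 respondents.
low, high = confidence_interval(0.45, 1000)
print(f"{low:.1%} to {high:.1%}")
```

With these assumed inputs the margin comes out at about 3 percentage points, in line with the 42% to 48% range.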

## Sample Size & Confidence Interval

- How precise does the estimate have to be?
  - More precise: larger sample size
- Larger samples increase precision
  - But at a diminishing rate
- Each unit you add to your sample contributes to the accuracy of your estimate
  - But the amount it adds shrinks with each additional unit

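## 95% Confidence Intervals

Width of the interval (plus or minus, in percentage points) by sample size and percentage split:

| % split | N=100 | N=200 | N=300 | N=400 | N=500 | N=700 | N=1000 | N=1500 |
|---------|-------|-------|-------|-------|-------|-------|--------|--------|
| 50/50   | 10.0  | 7.1   | 5.8   | 5.0   | 4.5   | 3.8   | 3.2    | 2.6    |
| 70/30   | 9.2   | 6.5   | 5.3   | 4.6   | 4.1   | 3.5   | 2.9    | 2.4    |
| 90/10   | 6.0   | 4.2   | 3.5   | 3.0   | 2.7   | 2.3   | 1.9    | 1.5    |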
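
Figures like these can be generated from the standard error of a proportion. The sketch below assumes the interval is two standard errors wide on each side and that the sample is a simple random sample; both are inferences from the values, not something the slide states. `margin_of_error` is a hypothetical helper name.

```python
import math

def margin_of_error(p, n, z=2.0):
    """Margin of error in percentage points, using z standard errors of a proportion."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

sizes = [100, 200, 300, 400, 500, 700, 1000, 1500]
for p in (0.5, 0.7, 0.9):
    row = [round(margin_of_error(p, n), 1) for n in sizes]
    print(f"{round(p * 100)}/{round((1 - p) * 100)}", row)
```

Under these assumptions the 50/50 row comes out as 10.0, 7.1, 5.8, 5.0, 4.5, 3.8, 3.2, 2.6, and a 90/10 split at N=100 gives 6.0. Going from 100 to 400 respondents halves the margin, while going from 1,000 to 1,500 trims it only slightly, which is the diminishing return described earlier.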

## Sampling Frame

- List of units from which the sample is drawn
- Defines your population
  - E.g., list of members of an organization or community
- Ideally, you'd like to list all members of your population as your sampling frame
  - Randomly select your sample from that list
- Often impractical to list the entire population

## Sampling Frames for Surveys

- Limitations of the telephone book:
  - Misses unlisted numbers
  - Class bias:
    - Poor people may not have a phone
    - Less likely to have multiple phone lines
- Most studies use a technique such as random digit dialing as a surrogate for a sampling frame

## Types of Sampling Designs

- Simple random sampling
- Systematic sampling
- Stratified sampling
- Multi-stage cluster sampling

## Simple Random Sampling

- Establish a sampling frame
- A number is assigned to each element
- Numbers are randomly selected into the sample
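
The three steps above can be sketched with Python's standard library. The frame of 1,000 made-up member IDs and the sample size of 100 are assumptions for illustration only.

```python
import random

# Hypothetical sampling frame: one numbered entry per population element.
frame = [f"member_{i:04d}" for i in range(1, 1001)]

rng = random.Random(42)          # seeded only so the sketch is repeatable
sample = rng.sample(frame, 100)  # every element has the same chance; no repeats

print(sample[:5])
```

`random.sample` draws without replacement, so no member can be selected twice.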

## Systematic Sampling

- Establish a sampling frame
- Select every kth element with a random start
  - E.g., with 1,000 names on the list, choosing every 10th name yields a sample size of 100
- Sampling interval: standard distance between selected units on the sampling frame
  - Sampling interval = population size / sample size
- Sampling ratio: proportion of the population that is selected
  - Sampling ratio = sample size / population size
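
A minimal sketch of the procedure, using the 1,000-name example. `systematic_sample` is a hypothetical helper, and the sketch assumes the population size is a multiple of the sample size so the interval is a whole number.

```python
import random

def systematic_sample(frame, sample_size, rng=random):
    """Select every kth element of the frame after a random start."""
    interval = len(frame) // sample_size  # sampling interval k
    start = rng.randrange(interval)       # random start within the first interval
    return frame[start::interval][:sample_size]

frame = list(range(1, 1001))              # stand-in for a list of 1,000 names
sample = systematic_sample(frame, 100, random.Random(1))

interval = len(frame) // len(sample)      # 1000 / 100 = 10
ratio = len(sample) / len(frame)          # 100 / 1000 = 0.1
print(sample[:5], interval, ratio)
```

Only the starting point is random; once it is chosen, every later selection is determined by the interval.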

## Stratified Sampling

- Modification used to reduce the potential for sampling error
- Researcher ensures that certain groups are represented proportionately in the sample
  - E.g., if the population is 60% female, a stratified sample selects 60% females into the sample
  - E.g., stratifying by region of the country to make sure that each region is proportionately represented

## Two Methods of Stratification

1. Sort the population into groups
   - Randomly select within groups in proportion to relative group size
2. Sort the population into groups
   - Systematically select within groups using a random start

- Disproportionate stratification:
  - Some stratification groups can be over-sampled for sub-group analysis
  - Samples are then weighted to restore population proportions
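
The first method, and the weighting used after over-sampling, can be sketched as follows. The frame of 600 women and 400 men is made up for illustration, and `stratified_sample` and `design_weight` are hypothetical helper names. The weight shown is the usual ratio of population share to sample share, which is an assumption about how the weighting would be done.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, sample_size, rng):
    """Proportionate stratified sample: random selection within each group."""
    groups = defaultdict(list)
    for unit in frame:
        groups[stratum_of(unit)].append(unit)
    sample = []
    for members in groups.values():
        # Allocate in proportion to relative group size (rounded).
        k = round(sample_size * len(members) / len(frame))
        sample.extend(rng.sample(members, k))
    return sample

def design_weight(population_share, sample_share):
    """Weight that restores a group's population proportion after over-sampling."""
    return population_share / sample_share

# Hypothetical frame: 600 women and 400 men.
frame = [("F", i) for i in range(600)] + [("M", i) for i in range(400)]
sample = stratified_sample(frame, lambda unit: unit[0], 100, random.Random(7))
women = sum(1 for unit in sample if unit[0] == "F")
print(women, len(sample) - women)

# A group that is 10% of the population but 30% of the sample
# is weighted down by a factor of one third.
print(design_weight(0.10, 0.30))
```

Rounding the allocation can leave the total a unit or two off the target when group sizes do not divide evenly; the sketch does not correct for that.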

## Cluster Sampling

- Frequently, there is no convenient way of listing the population for sampling purposes
  - E.g., a sample of Dane County or Wisconsin
  - Hard to get a list of the population members
- Cluster sample:
  - Sample census blocks
  - List the people on each selected census block
  - Select a sub-sample of people living on each block

## Multi-stage Cluster Sample

- Cluster sampling done in a series of stages: list, then sample within
- Example:
  - Stage 1: List zip codes; randomly select zip codes
  - Stage 2: List census blocks within selected zip codes; randomly select census blocks
  - Stage 3: List households on selected census blocks; randomly select households
  - Stage 4: List residents of selected households; randomly select a person to interview
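
The four stages can be sketched as nested random draws. Everything in the frame below (the number of zip codes, blocks, households, and residents, and the number selected at each stage) is invented for illustration.

```python
import random

rng = random.Random(3)  # seeded only so the sketch is repeatable

# Hypothetical nested frame: zip code -> census block -> household -> residents.
frame = {
    f"zip{z}": {
        f"block{b}": {
            f"house{h}": [f"z{z}b{b}h{h}p{p}" for p in range(rng.randint(1, 4))]
            for h in range(20)
        }
        for b in range(10)
    }
    for z in range(8)
}

respondents = []
for z in rng.sample(sorted(frame), 3):                     # Stage 1: zip codes
    blocks = frame[z]
    for b in rng.sample(sorted(blocks), 2):                # Stage 2: census blocks
        households = blocks[b]
        for h in rng.sample(sorted(households), 5):        # Stage 3: households
            respondents.append(rng.choice(households[h]))  # Stage 4: one resident

print(len(respondents), respondents[:3])
```

Only the selected clusters ever need to be listed in full, which is the practical appeal of the design: here 3 zip codes, 6 blocks, and 30 households rather than the whole population.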

## Multi-stage Sampling and Sampling Error

- Sampling error is introduced at each stage
- One solution is to use stratification at each stage to try to reduce sampling error
