Determining Subsampling Rates for Nonrespondents

Rachel Harter, NORC at the University of Chicago
Traci Mach, Board of Governors of the Federal Reserve
John Wolken, Board of Governors of the Federal Reserve
Janella Chapline, NORC at the University of Chicago

The views expressed herein are those of the authors. They do not necessarily reflect the opinions of the Federal Reserve Board or its staff.

Overview
• Double Sampling
• Examples
• Methods for Determining Subsampling Rates
• Illustrations Using the Survey of Small Business Finances
• Comments

Introduction
• Double sampling (two-phase or sequential sampling)
  – Hansen and Hurwitz (1946)
  – A relatively inexpensive data collection method is applied to a larger sample
  – A more expensive follow-up method is applied to a subsample

Introduction (cont.)
• Subsample phase 1 nonrespondents to:
  – Increase weighted response rates
  – Maintain response rates while reducing costs
  – Reduce nonresponse bias

Introduction (cont.)
• Subsampling affects:
  – Variability in weights
  – Effective sample size
  – Number of completed cases
  – Costs

Introduction (cont.)
• The subsampling rate depends on:
  – Whether it is decided at the design stage or late in data collection
  – Design objectives
  – Assumed or known parameters
  – Contractual constraints

Example 1: American Community Survey
• Three modes of data collection: mail → telephone → in-person interview of a subsample
• Subsampling rates are based on expected completion rates at the tract level
• Typically subsample 1-in-3 or 2-in-3

Example 2: Chicago Health and Social Life Survey
• Subsampled nonrespondents when response rates were lower than expected
• Subsampled 1 in 4

Example 3: National Survey of Family Growth 2006
• Used response propensity models to stratify segments
• Subsampling rates varied by stratum to favor segments likely to yield more completed cases at lower cost

Example 4: General Social Survey 2004, 2006
• Nonrespondents were subsampled to boost weighted response rates and control costs
• Subsampled 45% as a balance between the unweighted number of completed cases and the weighted response rate

General Guidelines for Subsampling Nonrespondents
• Kish's rule of thumb (1965)
  – To be economical, phase 2 data collection should cost at least 10 times as much per case as phase 1 data collection.

General Guidelines for Subsampling Nonrespondents (cont.)
• Elliott et al. (2000)
  – Subsampling saves resources whenever the cost per callback or per interview increases with each attempt, or when the probability of a successful interview attempt decreases.

Hansen-Hurwitz Method (1946)
• Basic strategy
  – Determine the sample size needed to achieve the desired precision, assuming no nonresponse.
  – Assume the cost structure and expected response rates for each phase are known.
  – Solve for the initial sample size $n$ and subsampling rate $f$ that minimize cost subject to the desired precision level.

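Hansen-Hurwitz Method (cont.)
• Per-unit cost structure: $C = c_0 n + c_1 n_1 + c_2 n_2'$
  – $n$ = initial sample size, $n_1$ = phase 1 respondents, $n_2'$ = phase 1 nonrespondents subsampled for phase 2, $r_1$ = phase 1 response rate
  – $c_0$ = cost per case of the phase 1 attempt, $c_1$ = processing cost per phase 1 respondent, $c_2$ = cost per subsampled phase 2 case
• Optimal subsampling rate: $f = \sqrt{(c_0 + c_1 r_1) / (c_2 r_1)}$ (capped at 1), assuming nonrespondents have the same element variance as the full population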
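A minimal sketch of the Hansen-Hurwitz calculation, assuming equal element variance $S^2$ for nonrespondents and the full population and ignoring the finite population correction, so that the variance of the mean is $V = (S^2/n)\,[1 + (1 - r_1)(1/f - 1)]$. The function names and the values of `s2` (the element variance) and `v0` (the target variance) are hypothetical; the costs and $r_1$ are taken from the SSBF illustration later in this deck.

```python
import math


def hh_optimal_f(c0: float, c1: float, c2: float, r1: float) -> float:
    """Hansen-Hurwitz rate f = sqrt((c0 + c1*r1) / (c2*r1)), capped at 1."""
    return min(1.0, math.sqrt((c0 + c1 * r1) / (c2 * r1)))


def hh_initial_n(s2: float, v0: float, r1: float, f: float) -> float:
    """n giving variance v0 for the mean, from V = (s2/n)(1 + (1 - r1)(1/f - 1)).

    Assumes nonrespondents share the population element variance s2 and
    ignores the finite population correction.
    """
    return s2 * (1 + (1 - r1) * (1 / f - 1)) / v0


def hh_expected_cost(n, c0, c1, c2, r1, f):
    """Expected C = c0*n + c1*n1 + c2*n2' with n1 = n*r1, n2' = n*(1 - r1)*f."""
    return n * (c0 + c1 * r1 + c2 * (1 - r1) * f)


# Costs and r1 from the SSBF illustration; s2 and v0 are hypothetical.
c0, c1, c2, r1 = 0.98, 17.48, 26.03, 0.50
s2, v0 = 1.0, 0.0004

f_opt = hh_optimal_f(c0, c1, c2, r1)  # about 0.864
for f in (0.5, f_opt, 1.0):
    n = hh_initial_n(s2, v0, r1, f)
    print(f"f = {f:.3f}: n = {n:,.0f}, expected cost = ${hh_expected_cost(n, c0, c1, c2, r1, f):,.0f}")
```

At a fixed target variance, the expected cost at `f_opt` comes out lower than at either $f = 0.5$ or $f = 1$, which is what the closed-form rate promises.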

Hansen-Hurwitz Method (cont.)
• Drawbacks (Groves 1989)
  – Takes into account sampling error only.
  – Assumes the completion rate in phase 2 is high.
  – Ignores mode effects between phases.
  – Makes no distinction between noncontacts and refusals.
  – Assumes completion rates and cost structures are known in advance.

Deming Method (1953)
• Goal: minimize cost for a specified mean squared error, or vice versa
• Basic strategy
  – Attempt all sample cases once.
  – Use variance, nonresponse bias, and cost to determine the number of callback attempts.
  – Subsample cases for the callback attempts.

Deming Method (cont.)
• Mean squared error of the estimator: $\mathrm{MSE} = A + B/n + C/(nf)$
• Cost function: $\mathrm{Cost} = Dn + Enf$
• Subsampling rate: $f = \sqrt{CD/(BE)}$
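A short derivation of the rate above, writing $m = nf$ for the number of callback cases and minimizing the MSE for a fixed budget $T$ with a Lagrange multiplier $\lambda$ (both $T$ and $\lambda$ are notation introduced here for illustration; minimizing cost at a fixed MSE leads to the same ratio):

$$
\begin{aligned}
&\text{minimize } A + \frac{B}{n} + \frac{C}{m} \quad \text{subject to } Dn + Em = T \\
&\frac{\partial}{\partial n}:\; -\frac{B}{n^2} + \lambda D = 0 \;\Rightarrow\; n = \sqrt{\frac{B}{\lambda D}} \\
&\frac{\partial}{\partial m}:\; -\frac{C}{m^2} + \lambda E = 0 \;\Rightarrow\; m = \sqrt{\frac{C}{\lambda E}} \\
&f = \frac{m}{n} = \sqrt{\frac{C/E}{B/D}} = \sqrt{\frac{CD}{BE}}
\end{aligned}
$$

The multiplier cancels, so the rate depends only on the ratios $C/B$ and $D/E$; if the result exceeds 1, no subsampling ($f = 1$) is optimal.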

Deming Method (cont.)
• Drawbacks
  – Estimates of means and variances for each attempt are needed in advance for the MSE function.
  – Assumes cases are equally likely to respond on each attempt.

Elliott-Little-Lewitzky Method (2000)
• Allows different response probabilities at each callback attempt and nonzero costs of refusals.
• Defines the efficiency ratio as the cost under the subsampling approach divided by the cost under the full-callback approach.
• Subsampling is effective when the efficiency ratio is less than 1.

Elliott-Little-Lewitzky Method (cont.)
• Basic strategy
  – Subsample at the $m$th of $K$ total callback attempts.
  – Find the subsampling rate $f$ that minimizes the efficiency ratio for the $m$th attempt.
  – Repeat for all values of $m$ up to $K$.
  – Choose the values of $m$ and $f$ that minimize the efficiency ratio.
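The search over $m$ and $f$ can be sketched as a grid search. This is a hedged illustration, not the efficiency ratio derived by Elliott, Little, and Lewitzky: as a hypothetical stand-in, it compares cost per effective complete (cost times Kish weighting effect, divided by respondents) under subsampling with the same quantity under full callbacks. It assumes a unit cost per attempt, hypothetical per-attempt response probabilities `p`, and ignores refusal costs and bias. All function names are hypothetical.

```python
from itertools import product


def _simulate(p, m, f):
    """Expected (cost, early respondents, late respondents) per sampled case.

    p[k-1] is a hypothetical probability that a still-pending case completes
    at attempt k; every attempt is assumed to cost 1 unit. Attempts before m
    go to all pending cases; at attempt m a random fraction f of the pending
    cases is kept for all remaining attempts (those cases get weight 1/f).
    """
    pending = 1.0
    cost = early = late = 0.0
    for k, pk in enumerate(p, start=1):
        if k == m:
            pending *= f  # subsample the still-pending cases
        cost += pending  # one attempt per pending case
        done = pending * pk
        if k < m:
            early += done
        else:
            late += done
        pending -= done
    return cost, early, late


def cost_per_effective_complete(p, m, f):
    cost, early, late = _simulate(p, m, f)
    resp = early + late
    w_sum = early + late / f
    w2_sum = early + late / f**2
    kish_weff = resp * w2_sum / w_sum**2  # Kish weighting effect
    return cost * kish_weff / resp


def efficiency_ratio(p, m, f):
    """Stand-in ratio: subsampling versus full callbacks (f = 1)."""
    return cost_per_effective_complete(p, m, f) / cost_per_effective_complete(p, 1, 1.0)


def best_m_and_f(p, f_grid):
    """Try every m = 1..K and every f in f_grid; return (ratio, m, f)."""
    return min(
        (efficiency_ratio(p, m, f), m, f)
        for m, f in product(range(1, len(p) + 1), f_grid)
    )


p = [0.40, 0.25, 0.15, 0.10, 0.08, 0.06]  # hypothetical, declining by attempt
f_grid = [i / 20 for i in range(1, 21)]  # 0.05, 0.10, ..., 1.00
ratio, m, f = best_m_and_f(p, f_grid)
print(f"subsample at attempt {m} with f = {f:.2f}: efficiency ratio {ratio:.3f}")
```

Because $f = 1$ is in the grid and reproduces full callbacks, the best ratio found can never exceed 1; a value below 1 signals that subsampling pays off under these assumptions.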

Alternative Constraints
• Required completed cases
• Required response rate
• Keep costs and weighting effect within limits, given required completes

Required Completed Cases
• $n_{\mathrm{spec}} = n r_1 + n (1 - r_1) f r_2$
• When $r_1$ and $r_2$ are fixed, $f$ is determined by $n$ (and vice versa).
• To also minimize cost, set $f = 0$ if $c_2 > c_1$ and $f = 1$ if $c_1 > c_2$.
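A minimal sketch of the required-completes constraint, assuming (as an interpretation, since the slide does not define them) that $c_1$ and $c_2$ are costs per completed case in each phase. Under that assumption total cost is linear in how the completes are split between phases, so the cheapest $f$ sits at an endpoint. The function names and inputs are hypothetical.

```python
def n_for_completes(n_spec: float, r1: float, r2: float, f: float) -> float:
    """Solve n_spec = n*r1 + n*(1 - r1)*f*r2 for the initial sample size n."""
    return n_spec / (r1 + (1 - r1) * f * r2)


def cost_given_completes(n_spec, r1, r2, f, c1, c2):
    """Total cost, treating c1 and c2 as costs per completed case (an assumption)."""
    n = n_for_completes(n_spec, r1, r2, f)
    return c1 * n * r1 + c2 * n * (1 - r1) * f * r2


# Hypothetical inputs: 1,000 required completes, r1 = 0.50, r2 = 0.33.
f_grid = [0.0, 0.25, 0.5, 0.75, 1.0]
for c1, c2 in ((20.0, 80.0), (80.0, 20.0)):
    best = min(f_grid, key=lambda f: cost_given_completes(1000, 0.50, 0.33, f, c1, c2))
    print(f"c1 = {c1}, c2 = {c2}: cheapest f = {best}")
```

With $c_2 > c_1$ the cheapest grid point is $f = 0$, and with $c_1 > c_2$ it is $f = 1$, matching the rule on the slide.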

Required Response Rate
• The response rate is a function of $r_1$ and $r_2$, the completion rates for each phase, not of the subsampling rate $f$.
• Subsampling affects the response rate by redeploying funds to change $r_2$.
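A minimal sketch of why $f$ drops out of the weighted response rate, assuming equal base weights, a weight of $1/f$ for subsampled cases, and no ineligible cases or further adjustments (a simplification, not a specific published response rate formula). The function name and inputs are hypothetical.

```python
def weighted_response_rate(n: int, r1: float, r2: float, f: float) -> float:
    """Weighted response rate under double sampling with equal base weights.

    Phase 1 respondents get weight 1; subsampled nonrespondents carry weight
    1/f, so their respondents stand in for all phase 1 nonrespondents.
    """
    phase1_resp = n * r1
    subsample = n * (1 - r1) * f
    phase2_resp = subsample * r2
    return (phase1_resp + phase2_resp / f) / n  # equals r1 + (1 - r1) * r2


for f in (0.3, 0.6, 1.0):
    print(round(weighted_response_rate(1000, 0.50, 0.33, f), 4))  # prints 0.665 each time
```

Changing $f$ changes how many cases are followed up, but not the weighted rate; only a change in $r_1$ or $r_2$ moves it.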

Weighting Effect and Cost Within Limits, Given Required Completes
• Each phase may have multiple outcomes, and each outcome may have a different known cost.
• The expected rates for each outcome are assumed known.
• Determine the initial sample size and cost needed to achieve the required completes without subsampling.

WEFF, Cost Constraints Given Completes (cont.)
• Determine the increase in initial sample size needed to compensate for fewer completes with subsampling (a function of $f$).
• Determine the cost with subsampling (a function of $f$).

WEFF, Cost Constraints Given Completes (cont.)
• The ratio of the cost with subsampling to the cost without subsampling must be less than a specified percentage.
• Solve for the acceptable range of $f$ that meets the cost reduction constraint.

WEFF, Cost Constraints Given Completes (cont.)
• Assuming equal base weights, the weighting effect is a function of $f$.
• The weighting effect must be less than a specified value.
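Under equal base weights, and ignoring any later nonresponse or calibration adjustments (an assumption of this sketch), phase 1 respondents keep weight 1 and phase 2 respondents get weight $1/f$. Kish's weighting effect $n \sum w^2 / (\sum w)^2$ over respondents then reduces to the following, with $a = 1 - r_1$:

$$
\mathrm{WEFF}(f) = \frac{(r_1 + a f r_2)\,(r_1 + a r_2 / f)}{(r_1 + a r_2)^2}
$$

It equals 1 at $f = 1$ and increases as $f$ decreases. As an illustration with the rounded SSBF screener rates ($r_1 = 0.50$, $r_2 = 0.33$, $f = 0.60$): $\mathrm{WEFF} = (0.599)(0.775)/(0.665)^2 \approx 1.05$.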

WEFF, Cost Constraints Given Completes (cont.)
• Solve for the acceptable range of $f$.
• Use the intersection of the cost range and the WEFF range for $f$ (if the intersection is nonempty).
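The procedure on the last few slides can be sketched as a grid scan over $f$. This assumes that every sampled case incurs one phase 1 outcome, that only subsampled phase 1 nonrespondents incur a phase 2 outcome, and that weights are equal before subsampling. The outcome rates, per-case costs, limits, and all names (`Phase`, `cost_with_subsampling`, `feasible_f`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    """Expected outcomes for one phase: name -> (rate, cost per case).

    Rates should sum to 1 and must include a "complete" outcome.
    """
    outcomes: dict[str, tuple[float, float]]

    @property
    def r(self) -> float:
        return self.outcomes["complete"][0]

    @property
    def cost_per_case(self) -> float:
        return sum(rate * cost for rate, cost in self.outcomes.values())


def cost_with_subsampling(n_spec: float, p1: Phase, p2: Phase, f: float) -> float:
    """Expected cost of reaching n_spec completes with subsampling rate f."""
    a = 1 - p1.r  # share of the sample that are phase 1 nonrespondents
    n = n_spec / (p1.r + a * f * p2.r)  # initial sample grows as f falls
    return n * (p1.cost_per_case + a * f * p2.cost_per_case)


def weff(r1: float, r2: float, f: float) -> float:
    """Kish weighting effect with weights 1 (phase 1) and 1/f (phase 2)."""
    a = 1 - r1
    return (r1 + a * f * r2) * (r1 + a * r2 / f) / (r1 + a * r2) ** 2


def feasible_f(n_spec, p1, p2, max_cost_ratio, max_weff, steps=200):
    """Lowest and highest grid values of f in (0, 1] meeting both limits.

    The cost ratio and WEFF are each monotone in f, so each constraint
    gives an interval and their intersection is also an interval.
    """
    base = cost_with_subsampling(n_spec, p1, p2, 1.0)  # no subsampling
    ok = [
        f
        for f in (i / steps for i in range(1, steps + 1))
        if cost_with_subsampling(n_spec, p1, p2, f) / base <= max_cost_ratio
        and weff(p1.r, p2.r, f) <= max_weff
    ]
    return (min(ok), max(ok)) if ok else None


# Hypothetical inputs: outcome rates and per-case costs are illustrative only.
phase1 = Phase({"complete": (0.50, 10.0), "refusal": (0.30, 6.0), "noncontact": (0.20, 8.0)})
phase2 = Phase({"complete": (0.33, 60.0), "not complete": (0.67, 40.0)})
print(feasible_f(1000, phase1, phase2, max_cost_ratio=0.80, max_weff=1.10))  # (0.49, 0.625)
```

Here the 80% cost limit caps $f$ near 0.63 and the WEFF limit of 1.10 sets a floor near 0.49; tightening either limit until the two ranges no longer overlap makes the function return `None`.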

2003 Survey of Small Business Finances (SSBF)
• Two types of data collection
  – Screener: respondents screened by phone after an advance mailing
  – Main interview: eligible businesses interviewed by phone after a worksheet was sent

SSBF (cont.)
• Four batches (replicates) of sample firms
• Double sampling applied to both the screener and the main interview

Illustrations Using SSBF
• Use screener cases in batch 2
• n = 5,666 total sample cases
• 2,838 completed the screener by the end of phase 1 ($r_1 = 50\%$)
• 1,099 cases selected for phase 2 ($f = 60\%$)
• An additional 359 cases completed the screener in phase 2 ($r_2 = 33\%$)

Hansen-Hurwitz Subsampling Rate
• $C = \$0.98\,n + \$17.48\,n_1 + \$26.03\,n_2'$
• $f = \sqrt{(0.98 + 17.48 \times 0.50) / (26.03 \times 0.50)} = \sqrt{9.72 / 13.015} \approx 86\%$

Deming Subsampling Rate
• $\mathrm{Cost} = \$9.72\,n + \$4.29\,nf$
• $\mathrm{MSE} = A + B/n + C/(nf)$
• $f = \sqrt{CD/(BE)} = \sqrt{(C/B)(9.72/4.29)}$
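The SSBF cost coefficients fix $D/E$, so the Deming rate depends only on the unknown variance ratio $C/B$. A minimal sketch, assuming hypothetical values of $C/B$ and a hypothetical $50,000 budget to show how $f$ and the affordable initial sample size move together; the function names are hypothetical.

```python
import math


def deming_f(c_over_b: float, d: float, e: float) -> float:
    """Deming rate f = sqrt(C*D / (B*E)), capped at 1 (no subsampling)."""
    return min(1.0, math.sqrt(c_over_b * d / e))


def deming_n(budget: float, d: float, e: float, f: float) -> float:
    """Initial sample size that spends the budget: Cost = D*n + E*n*f."""
    return budget / (d + e * f)


D, E = 9.72, 4.29  # SSBF cost coefficients from the slide
for c_over_b in (0.1, 0.2, 0.5):  # hypothetical variance-component ratios
    f = deming_f(c_over_b, D, E)
    print(f"C/B = {c_over_b}: f = {f:.3f}, n for $50,000 = {deming_n(50_000, D, E, f):,.0f}")
```

Larger values of $C/B$ (more variance contributed by the callback component) push $f$ toward 1; here $C/B = 0.5$ already reaches the cap.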

Subsampling Rate for Sample Size Constraint
• $n_{\mathrm{spec}} = n r_1 + n (1 - r_1) f r_2$

Subsampling Rate for Sample Size Constraint (cont.)
• $c_2 > c_1$
• If cost and sample size were the only considerations, take a larger initial sample and set $f = 0$.
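As an illustration using the rounded SSBF screener rates ($r_1 = 0.50$, $r_2 = 0.33$) and $f = 0.60$, the required-completes equation shows how much larger the initial sample must be with $f = 0$ to yield the same number of completes:

$$
\frac{n_{f=0}}{n_{f=0.6}} = \frac{r_1 + (1 - r_1)(0.6)\,r_2}{r_1} = \frac{0.50 + 0.50 \times 0.6 \times 0.33}{0.50} = \frac{0.599}{0.50} \approx 1.20
$$

That is, dropping phase 2 entirely would require an initial sample roughly 20% larger.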

Discussion
• Consider bias and variance implications
  – Oh and Scheuren (1983)
• Alternatives to subsampling for nonresponse
  – Politz and Simmons (1949)
  – Groves (1989)
• Limitations of cost/error models
  – Fellegi and Sunter (1974)

Next Step
• Explore relationships among subsampling rates, cost redeployment, and response rates.
• These relationships may be institution-specific.

Contact info: Harter-Rachel@norc.org