A prediction approach to representative sampling

Ib Thomsen & Li-Chun Zhang
Statistics Norway
E-mail: lcz@ssb.no

The birth of the representative method
• Kruskal and Mosteller (1979a, b, c): origins and development of the concept of representative sampling
• N. Kiær’s representative method (ISI meeting, 1895, Bern)
  – A three-stage design, with the 1890 census as frame:
    ▪ 1st: 128 counties and 23 towns throughout the country
    ▪ 2nd: cohorts of males of age 17, 22, 27, 32, etc.
    ▪ 3rd: persons with surname initial A, B, C, L, M, N
  – Comparison of sample marginal averages with census averages
• ISI committee in 1924 & report at the following meeting: “I think I may venture to say that nowadays there is hardly one statistician, who in principle will contest the legitimacy of the representative method”. (Jensen)
• Bowley (1926), member of the committee.

Rise and fall of the representative method: Balance vs. randomization
• Kiær did not take a probabilistic point of view.
  – Representative sample surveys instead of representative sampling
  – Idea of variability of population over time (quote)
  – Miniature population: multivariate simple balance
• Design-based approach:
  – Neyman (1934): representative sampling = randomization (quote)
  – Subsequent development: Hansen & co., Deming, Kish, Cochran, Mahalanobis, etc.
  – Godambe (1955): no minimum variance linear estimator
  – Representative sampling vs. efficient estimation
• Prediction approach:
  – Royall (1970): purposive sample
  – Royall and Eberhardt (1975): Simple balance for bias protection (quote)
  – Representative sample vs. efficiency

A definition of representative sampling from a prediction point of view
• Representative sampling is connected to the individual mean squared error of prediction (IMSEP), i.e. to the prediction of each individual in the population.
• Conditional IMSEP: zero inside the sample, positive outside.
• Use a randomization design to control the unconditional IMSEP, i.e. the expected amount of information about each population unit.
• Control of individual prediction (CIP) as a design criterion.
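The contrast between conditional and unconditional IMSEP can be illustrated with a minimal sketch. It assumes a homogeneous model (common mean, common variance `sigma2`, independent units), the sample mean as predictor of every unsampled unit, and simple random sampling of `n` out of `N` units. It also reads "unconditional" as the expectation of the conditional IMSEP over the sampling design. The function names are hypothetical, and the slides do not prescribe this particular setup.

```python
def conditional_imsep(in_sample: bool, n: int, sigma2: float) -> float:
    """Conditional IMSEP of one unit, given the realized sample.

    Assumed model: y_i = mu + e_i, Var(e_i) = sigma2, units independent.
    A sampled unit is observed, so its prediction error is zero.
    An unsampled unit is predicted by the sample mean, which gives
    E(ybar_s - y_i)^2 = sigma2 / n + sigma2.
    """
    if in_sample:
        return 0.0
    return sigma2 * (1.0 + 1.0 / n)


def unconditional_imsep_srs(N: int, n: int, sigma2: float) -> float:
    """Design expectation of the conditional IMSEP under SRS.

    Under SRS every unit is left out of the sample with the same
    probability 1 - n / N, so the result is identical for all units.
    """
    prob_out = 1.0 - n / N
    return prob_out * conditional_imsep(False, n, sigma2)


if __name__ == "__main__":
    N, n, sigma2 = 1000, 100, 4.0
    print(conditional_imsep(True, n, sigma2))    # sampled unit
    print(conditional_imsep(False, n, sigma2))   # unsampled unit
    print(unconditional_imsep_srs(N, n, sigma2)) # any unit, before sampling
```

Under these assumptions the unconditional IMSEP does not depend on the unit, which is one way to read SRS as an "equal prediction" design.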

An example under ratio model
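As a minimal sketch of what the conditional IMSEP looks like under a ratio model, the code below assumes the common specification y_i = beta * x_i + e_i with Var(e_i) = sigma2 * x_i, independent units, and the ratio estimator of beta. This variance structure, the function name `ratio_model_imsep`, and the toy numbers are assumptions for illustration and are not taken from the slides.

```python
from typing import List, Sequence


def ratio_model_imsep(
    x: Sequence[float], sample: Sequence[int], sigma2: float
) -> List[float]:
    """Conditional IMSEP of every unit under an assumed ratio model.

    Assumed model: y_i = beta * x_i + e_i, Var(e_i) = sigma2 * x_i,
    with x_i > 0 and independent errors.
    beta is estimated by sum_s(y) / sum_s(x), whose variance is
    sigma2 / sum_s(x). An unsampled unit i is predicted by
    beta_hat * x_i, so its mean squared prediction error is
    x_i^2 * sigma2 / sum_s(x) + sigma2 * x_i.
    Sampled units are observed and have zero prediction error.
    """
    in_sample = set(sample)
    if not in_sample:
        raise ValueError("the sample must contain at least one unit")
    x_s = sum(x[i] for i in in_sample)
    return [
        0.0 if i in in_sample else sigma2 * x[i] * (1.0 + x[i] / x_s)
        for i in range(len(x))
    ]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    # Units 1 and 3 are sampled; units 0 and 2 must be predicted.
    print(ratio_model_imsep(x, sample=[1, 3], sigma2=1.0))
```

In this sketch the conditional IMSEP of an unsampled unit grows with its size x_i, so larger units are harder to predict when left out of the sample.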

Motivating familiar but seemingly unconnected sampling techniques from a unified point of view
• Constant mean and variance throughout the population: equal prediction → epsem/SRS
• Constant mean and variance in subpopulation groups: stratified equal prediction → stratified epsem/SRS; relative equal prediction w.r.t. individual variance → stratified epsem/SRS with proportional allocation
• Business survey:
  – Division of take-all, take-some and take-none units
  – Stratified SRS with progressive allocation
• Two-stage sampling:
  – PPS-SRS and SRS-SRS are equal prediction designs, respectively, provided zero or unity intra-cluster correlation
  – Stratified SRS-SRS with progressive first-stage allocation
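The link between proportional allocation and relative equal prediction can be sketched numerically. The code assumes a constant mean and variance within each stratum, the stratum sample mean as predictor, and SRS within strata, so that the unconditional IMSEP of a unit in stratum h is (1 - n_h / N_h) * sigma_h^2 * (1 + 1 / n_h). The helper names, stratum labels and sizes are hypothetical.

```python
from typing import Dict


def proportional_allocation(sizes: Dict[str, int], fraction: float) -> Dict[str, int]:
    """Allocate n_h = fraction * N_h (rounded, at least 1) to each stratum."""
    return {h: max(1, round(fraction * N_h)) for h, N_h in sizes.items()}


def relative_imsep(sizes: Dict[str, int], alloc: Dict[str, int]) -> Dict[str, float]:
    """Unconditional IMSEP divided by the stratum variance sigma_h^2.

    Assumed per-stratum formula:
    (1 - n_h / N_h) * sigma_h^2 * (1 + 1 / n_h), divided by sigma_h^2.
    """
    return {
        h: (1.0 - alloc[h] / sizes[h]) * (1.0 + 1.0 / alloc[h])
        for h in sizes
    }


if __name__ == "__main__":
    sizes = {"small": 1000, "medium": 500, "large": 200}  # toy stratum sizes
    alloc = proportional_allocation(sizes, fraction=0.1)
    for h, value in relative_imsep(sizes, alloc).items():
        print(h, alloc[h], round(value, 3))
```

With a common sampling fraction the factor 1 - n_h / N_h is the same in every stratum, and the remaining factor 1 + 1 / n_h is close to one once n_h is moderately large, so the relative IMSEP is approximately equal across strata under these assumptions.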

Three principal advantages of CIP as a design criterion
• Model-based inference as a mode of inference
  – Prediction of individuals is impossible under the design-based perspective
• Randomization designs motivated by prediction
  – Simple random sampling (SRS) unmotivated for efficiency
  – SRS yields non-informative sampling, but so can any randomization.
  – SRS aims at simple balance, but it is not effective for that.
• Combination with optimality/efficiency for total (OPT)
  – Need for population totals
  – Need for socio-economic micro-data
  – Need for statistics at more detailed levels