Indirect Sampling
Jerilyn Boykin and Zhongxue Chen
Indirect Sampling
- Introduction: What is indirect sampling?
- Generalized Weight Sharing: A Unified Method
- Some specific cases
  - Cross Sectional Estimation (Ernst, 1989)
  - Multiplicity Estimation (Sirken, 1970)
  - Frames Containing Unknown Amount of Duplicity (Rao, 1968)
Indirect Sampling
- Why indirect sampling?
  - A frame for the target population $U^B$ is not available
  - A frame for a related population $U^A$ is available
  - There is a relationship (link) between these two populations
- The Generalized Weight Sharing Method (GWSM) is a unified method for indirect sampling developed by Lavallée (2002)
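GWSM: Notation
- Sampling population $U^A$ (frame available): $N^A$ units, labelled $j$; selected sample $s^A$
- Target population $U^B$ (no frame): $N^B$ units, labelled $i$
- Link matrix $[l_{ji}]$, where $l_{ji} = 1$ if unit $j$ of $U^A$ is linked to unit $i$ of $U^B$, and $l_{ji} = 0$ otherwise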
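GWSM: Sampling
- Select a sample $s^A$ from the sampling population $U^A$; $\pi_j^A > 0$ is the selection probability of unit $j$
- For each $j$ in $s^A$, identify the units $i$ in the target population $U^B$ such that $l_{ji} = 1$
- The set of units identified this way is $s^B = \{ i \in U^B : l_{ji} = 1 \text{ for some } j \in s^A \}$
- Want to estimate the total $Y^B = \sum_{i \in U^B} y_i$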
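GWSM: Estimation
- Let $L_i^B = \sum_{j \in U^A} l_{ji}$ be the total number of links of target unit $i$ (assumed positive for every $i$)
- And let $\tilde{l}_{ji} = l_{ji} / L_i^B$, so that $\sum_{j \in U^A} \tilde{l}_{ji} = 1$
- Then $Y^B = \sum_{i \in U^B} y_i = \sum_{j \in U^A} \sum_{i \in U^B} \tilde{l}_{ji} y_i$
GWSM: Estimation
- Horvitz-Thompson:
  - Let $z_j = \sum_{i \in U^B} \tilde{l}_{ji} y_i$
  - HT-estimator: $\hat{Y}^B = \sum_{j \in s^A} z_j / \pi_j^A$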
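The weight-sharing step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function names `gwsm_weights` and `gwsm_estimate`, the toy link matrix, and the equal selection probabilities (a simple random sample of 2 out of 3 sampling units) are all hypothetical. The sketch also takes the full link matrix as input for simplicity, whereas a real survey only needs the total number of links for the target units actually reached.

```python
import numpy as np


def gwsm_weights(links, sample_a, pi_a):
    """Return one shared weight per unit of the target population.

    links    : (N_A, N_B) 0/1 array; links[j, i] = 1 if sampling unit j
               is linked to target unit i (assumed fully known here)
    sample_a : indices j of the units selected from the sampling population
    pi_a     : (N_A,) selection probabilities of the sampling units
    """
    links = np.asarray(links, dtype=float)
    pi_a = np.asarray(pi_a, dtype=float)
    sample_a = np.asarray(sample_a, dtype=int)

    # Total number of links of each target unit; must be positive.
    total_links = links.sum(axis=0)
    if np.any(total_links == 0):
        raise ValueError("every target unit needs at least one link")

    # Standardized links: each column now sums to one.
    standardized = links / total_links

    # Share the design weights 1/pi_j of the sampled units across links.
    design_weights = 1.0 / pi_a[sample_a]
    return (standardized[sample_a] * design_weights[:, None]).sum(axis=0)


def gwsm_estimate(links, y, sample_a, pi_a):
    """Estimate the target-population total of y from a sample of A-units."""
    weights = gwsm_weights(links, sample_a, pi_a)
    return float(weights @ np.asarray(y, dtype=float))


# Toy example (hypothetical data): 3 sampling units, 3 target units.
links = [[1, 0, 0],
         [1, 1, 0],
         [0, 1, 1]]
y = [10.0, 20.0, 30.0]
pi_a = [2 / 3, 2 / 3, 2 / 3]  # simple random sample of 2 out of 3

# The three possible samples are equally likely under this design.
estimates = [gwsm_estimate(links, y, s, pi_a) for s in ([0, 1], [0, 2], [1, 2])]
```

Averaging `estimates` over the three equally likely samples gives back the true total of 60, which illustrates the unbiasedness argument. Target units that no sampled unit links to simply receive weight zero.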
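GWSM: Variance
- Let $z_j = \sum_{i \in U^B} l_{ji} y_i / L_i^B$, where $L_i^B = \sum_{j \in U^A} l_{ji}$ is the total number of links of target unit $i$
- Then $\hat{Y}^B = \sum_{j \in s^A} z_j / \pi_j^A$ is a Horvitz-Thompson estimator in the variable $z_j$
- Variance: $\mathrm{Var}(\hat{Y}^B) = \sum_{j \in U^A} \sum_{j' \in U^A} \frac{\pi_{jj'}^A - \pi_j^A \pi_{j'}^A}{\pi_j^A \pi_{j'}^A} z_j z_{j'}$
- Where $\pi_{jj'}^A$ is the joint selection probability of units $j$ and $j'$, with $\pi_{jj}^A = \pi_j^A$
GWSM: Variance Estimation
- Variance estimation (Horvitz-Thompson): $\widehat{\mathrm{Var}}(\hat{Y}^B) = \sum_{j \in s^A} \sum_{j' \in s^A} \frac{\pi_{jj'}^A - \pi_j^A \pi_{j'}^A}{\pi_{jj'}^A \pi_j^A \pi_{j'}^A} z_j z_{j'}$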
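A minimal sketch of the Horvitz-Thompson variance estimator follows, applied to values $z_j$ attached to the sampled units of the population that has a frame. The function names and the example numbers are hypothetical. Simple random sampling without replacement is assumed only so that the first-order and joint selection probabilities have a closed form; any design with positive joint probabilities could be substituted.

```python
import numpy as np


def ht_variance_estimate(z_sample, pi_sample, pi_joint_sample):
    """Horvitz-Thompson variance estimate from sampled units only.

    z_sample        : (n,) values z_j of the sampled units
    pi_sample       : (n,) first-order selection probabilities
    pi_joint_sample : (n, n) joint selection probabilities, with the
                      first-order probabilities on the diagonal
    """
    z = np.asarray(z_sample, dtype=float)
    pi = np.asarray(pi_sample, dtype=float)
    pi_joint = np.asarray(pi_joint_sample, dtype=float)

    pi_outer = np.outer(pi, pi)
    # Coefficient of z_j * z_j' in the double sum over the sample.
    coeff = (pi_joint - pi_outer) / (pi_joint * pi_outer)
    return float(z @ coeff @ z)


def srswor_probabilities(n, N):
    """Selection probabilities for a simple random sample of n out of N
    without replacement (assumed design; requires n >= 2)."""
    pi = np.full(n, n / N)
    pi_joint = np.full((n, n), n * (n - 1) / (N * (N - 1)))
    np.fill_diagonal(pi_joint, n / N)
    return pi, pi_joint


# Hypothetical z-values for a sample of 2 units out of 3.
z_sample = [5.0, 40.0]
pi, pi_joint = srswor_probabilities(n=2, N=3)
v_hat = ht_variance_estimate(z_sample, pi, pi_joint)
```

Under this equal-probability design the double sum collapses to the familiar expression $N^2 (1 - n/N) s_z^2 / n$, where $s_z^2$ is the sample variance of the $z_j$, which makes a convenient cross-check.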
Specific Examples
- Monroe G. Sirken (1970), Multiplicity Estimation
- J. N. K. Rao (1968), Sampling a Frame with an Unknown Amount of Duplicity
- L. R. Ernst (1989), Longitudinal Household Surveys
Household Surveys with Multiplicity (Sirken, 1970)
- Estimate the number of individuals in the population with a certain attribute
- A complete frame is not available
- Sample households report information about their own residents as well as other persons who live elsewhere
  - Relatives
  - Neighbors
Multiplicity Rule
- Other persons are specified by a “multiplicity rule” adopted in the survey
- Example: “siblings report each other”
- The total number of households in the population reporting an individual is referred to as that individual's multiplicity
  - Under the sibling rule, the multiplicity of a person is the number of different households in which he or one of his siblings is a resident.
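As a small illustration of the sibling rule, the sketch below counts the distinct households in which a person or one of the person's siblings lives. The function name, the dictionaries, and the people in them are hypothetical.

```python
def sibling_multiplicity(person, household_of, siblings_of):
    """Number of distinct households that would report `person` under
    the rule "siblings report each other".

    household_of : dict mapping each person to a household label
    siblings_of  : dict mapping a person to a list of siblings
                   (people without siblings may be omitted)
    """
    # The person's own household always reports the person.
    reporters = {person} | set(siblings_of.get(person, ()))
    return len({household_of[p] for p in reporters})


# Hypothetical data: Bob and Cal share a household; Dee has no siblings.
household_of = {"ann": "H1", "bob": "H2", "cal": "H2", "dee": "H3"}
siblings_of = {
    "ann": ["bob", "cal"],
    "bob": ["ann", "cal"],
    "cal": ["ann", "bob"],
}

ann_multiplicity = sibling_multiplicity("ann", household_of, siblings_of)
dee_multiplicity = sibling_multiplicity("dee", household_of, siblings_of)
```

Ann is reported by her own household and by the one shared by her two siblings, so her multiplicity is 2; Dee is reported only by her own household.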
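Some Notation…
- Let $I_\alpha$ ($\alpha = 1, \dots, N$) denote the individuals with the attribute and $H_i$ ($i = 1, \dots, M$) the households in the population
- Consider the conventional survey indicator variable: $\delta_{\alpha i} = 1$ if $I_\alpha$ is a resident of $H_i$, and $0$ otherwise
- For the survey with multiplicity, add: $\delta'_{\alpha i} = 1$ if $I_\alpha$ is not a resident of $H_i$ but is reported by $H_i$, and $0$ otherwise
Some Notation…
- Number of individuals reported by $H_i$ in the conventional survey: $\lambda_i = \sum_{\alpha=1}^{N} \delta_{\alpha i}$
- Weighted number of individuals reported by $H_i$ in the survey with multiplicity: $\lambda'_i = \sum_{\alpha=1}^{N} (\delta_{\alpha i} + \delta'_{\alpha i}) / s_\alpha$
- where $s_\alpha = \sum_{i=1}^{M} (\delta_{\alpha i} + \delta'_{\alpha i})$, the multiplicity of $I_\alpha$, is the number of households reporting $I_\alpha$
Some Notation…
- Notice that the variate $\lambda'_i$ based on the multiplicity survey requires the multiplicity $s_\alpha$ of every individual reported by household $H_i$.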
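The Estimators
- Assume a random sample of $m$ households drawn without replacement from the $M$ households in the population; for sampled household $H_i$, let $\lambda_i$ be the number of individuals it reports in the conventional survey and $\lambda'_i$ the multiplicity-weighted number it reports in the survey with multiplicity
- Then $\hat{N} = \frac{M}{m} \sum_{i=1}^{m} \lambda_i$ is the estimate of $N$ derived from the conventional survey, and
- $\hat{N}' = \frac{M}{m} \sum_{i=1}^{m} \lambda'_i$ is the estimate from the survey with multiplicity.
Variance
- The variances of $\hat{N}$ and $\hat{N}'$ are $\mathrm{Var}(\hat{N}) = \frac{M(M-m)}{m} S_\lambda^2$ and $\mathrm{Var}(\hat{N}') = \frac{M(M-m)}{m} S_{\lambda'}^2$, where $S_\lambda^2 = \frac{1}{M-1} \sum_{i=1}^{M} (\lambda_i - N/M)^2$ and $S_{\lambda'}^2$ is defined in the same way from the $\lambda'_i$
- It follows that $\mathrm{Var}(\hat{N}') = (1 - G)\, \mathrm{Var}(\hat{N})$, where
- $G = 1 - S_{\lambda'}^2 / S_\lambda^2$ is the relative gain in sampling efficiency resulting from the survey with multiplicity.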
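The two estimators can be compared on a toy population. The sketch below is illustrative only: the function names and the data layout (each household's reports as a list of `(is_resident, multiplicity)` pairs) are assumptions, and the expansion factor assumes an equal-probability sample of households without replacement.

```python
def household_counts(reports):
    """Return (conventional count, multiplicity-weighted count) for one
    household. `reports` lists one (is_resident, multiplicity) pair per
    individual with the attribute that the household reports."""
    # Conventional survey: only the household's own residents count.
    conventional = sum(1 for is_resident, _ in reports if is_resident)
    # Survey with multiplicity: every reported individual counts 1/s.
    weighted = sum(1.0 / multiplicity for _, multiplicity in reports)
    return conventional, weighted


def multiplicity_estimates(sample_reports, total_households):
    """Expand the sampled household counts to population estimates."""
    expansion = total_households / len(sample_reports)
    counts = [household_counts(reports) for reports in sample_reports]
    n_conventional = expansion * sum(c for c, _ in counts)
    n_multiplicity = expansion * sum(w for _, w in counts)
    return n_conventional, n_multiplicity


# Hypothetical population of 4 households and 2 individuals with the
# attribute: one lives in H1 and is also reported by a sibling in H2
# (multiplicity 2); the other lives in H3 and has multiplicity 1.
reports_by_household = {
    "H1": [(True, 2)],
    "H2": [(False, 2)],
    "H3": [(True, 1)],
    "H4": [],
}

census = multiplicity_estimates(list(reports_by_household.values()), 4)
sample = multiplicity_estimates(
    [reports_by_household["H2"], reports_by_household["H4"]], 4
)
```

With every household included, both estimates equal the true count of 2. For the sample consisting of H2 and H4, the conventional survey finds nobody, while the multiplicity survey still picks up half of the individual reported by the sibling in H2.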
Surveys with Multiplicity
- Household surveys with multiplicity are applicable whenever multiplicity rules can be devised that produce estimates having smaller MSEs than those from conventional surveys
- Non-sampling error may be a problem
Sampling Theory When Frame Contains Unknown Amount of Duplicity (Rao, 1968)
- Arose in connection with a sample survey of beef cattle producers
- The unit of interest was the beef cattle producing operation, which could be operated by an individual or a partnership
- A frame of operations was not available
- The available frame was a list of addresses of individuals believed to be beef cattle producers
Rao, 1968
- A questionnaire was mailed to a random sample of addresses; a random sub-sample of the non-respondents was then followed up
- Respondents identified as partnerships were asked to give the names and addresses of partners and to complete only one questionnaire for the partnership
- Names and addresses were used to determine the number of times an operation appeared in the list frame
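Some Notation…
- $n_1$ is the number of names in the sample of $n$ addresses that respond to the mail questionnaire
- $n_2 = n - n_1$ is the number in the nonresponse group
- Data are obtained by direct interview for a random subsample of $m_2$ of the $n_2$ nonrespondents
Some Notation…
- $N'$ is the unknown number of beef operations covered by the list frame
- $Y$ is the population total of a character attached to beef operations
- $y_i$ is the total attached to the $i$th operation
- $N$ is the number of addresses on the list frame, and $M_i$ is the number of times the $i$th operation is listed on the list frame.
Some Notation…
- For a sampled address, let $y_i$ and $M_i$ denote the total and the number of listings of the sample operation contactable via that address
The Estimator
- Using the Hansen-Hurwitz estimator, and the fact that summing $y_i / M_i$ over all $N$ addresses on the frame gives $\sum_{i=1}^{N'} M_i \cdot \frac{y_i}{M_i} = Y$, an unbiased estimate of $Y$ can be obtained
The Estimator
- The unbiased estimator of $Y$ is $\hat{Y} = \frac{N}{n} \left[ \sum_{i=1}^{n_1'} t_{1i} \frac{y_i}{M_i} + \frac{n_2}{m_2} \sum_{i=1}^{m_2'} t_{2i} \frac{y_i}{M_i} \right]$
- where $n_1'$ and $m_2'$ denote the number of distinct operations in the sample of respondents and in the subsample, and $t_{1i}$ and $t_{2i}$ are the number of times the $i$th operation appears in the sample of respondents and in the subsample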
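The estimator can be sketched as a Hansen-Hurwitz double-sampling total applied to each operation's value divided by its number of listings. This is a hedged illustration rather than Rao's exact formulation: the function name, the `(y, M, t)` tuple layout, and all numbers are hypothetical, and an operation is assumed to appear either among the respondents or among the nonrespondents, not both.

```python
def duplicity_adjusted_total(frame_size, n_nonrespondents, respondents, subsample):
    """Estimate the population total when operations may be listed more
    than once on the address frame.

    frame_size       : number of addresses on the list frame
    n_nonrespondents : number of sampled addresses that did not respond
    respondents      : list of (y, M, t) for distinct responding operations
    subsample        : list of (y, M, t) for distinct operations reached in
                       the follow-up subsample of nonrespondents
    Here y is the operation's total, M the number of times it is listed
    on the frame, and t the number of sampled addresses leading to it.
    """
    def weighted_sum(operations):
        # Each sampled address contributes y / M for its operation.
        return sum(t * y / M for y, M, t in operations)

    n1 = sum(t for _, _, t in respondents)  # responding addresses
    m2 = sum(t for _, _, t in subsample)    # followed-up addresses
    n = n1 + n_nonrespondents               # total sampled addresses
    if n == 0:
        raise ValueError("the sample is empty")

    followup = 0.0
    if n_nonrespondents > 0:
        if m2 == 0:
            raise ValueError("nonrespondents exist but none were followed up")
        # Inflate the subsample to represent all nonrespondents.
        followup = (n_nonrespondents / m2) * weighted_sum(subsample)

    return (frame_size / n) * (weighted_sum(respondents) + followup)


# Hypothetical numbers: 100 addresses on the frame, 10 sampled,
# 4 responding addresses, 6 nonrespondents of which 3 are followed up.
respondents = [(50.0, 1, 1), (120.0, 2, 2), (30.0, 3, 1)]
subsample = [(80.0, 2, 1), (40.0, 1, 2)]
y_hat = duplicity_adjusted_total(100, 6, respondents, subsample)
```

Dividing by the number of listings is what removes the effect of duplication: an operation listed twice is twice as likely to be reached, so each time it is reached it contributes only half of its total.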
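Variance
- Writing $z = y / M$ for the value attached to each address on the frame, the variance for the estimator with multiplicity is given by the Hansen-Hurwitz form $\mathrm{Var}(\hat{Y}) = \frac{N(N-n)}{n} S_z^2 + \frac{N N_2 (k-1)}{n} S_{2z}^2$
- where $S_z^2$ is the variance of $z$ over the $N$ addresses on the frame, $N_2$ is the number of frame addresses in the nonresponse stratum, $S_{2z}^2$ is the variance of $z$ within that stratum, and $k = n_2 / m_2$ is the fixed subsampling ratio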
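The Estimator
- Estimators that do not depend on $t_{1i}$ and $t_{2i}$ (the number of times an operation appears in the sample and in the subsample) may be obtained
- They rely on the concept of sufficiency in sampling theory
- They are very cumbersome for moderate to large sample sizes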
Cross Sectional Estimation from Longitudinal Household Survey, Ernst (1989)
- What happens to households and families over time?
- The composition of households and families can change over time
- What weighting procedures should be used to obtain unbiased estimates?
Ernst (1989)
- Take a month to be the basic unit of time
- $H_t$ denotes the cross-sectional universe of households at month $t$
- $U_t$ is the set of units residing in a household in $H_t$
- There are several rounds of interviews, one each month or at an interval of months
- The initial sample is taken at month $t_0$
- The final interview month for the sample panel is month $t_1$
Ernst (1989)
- An individual in a chosen household at month $t_0$, the initial sample month, is an “original sample person”
- For each later month, the sample consists of all original sample people still in the universe plus all other people residing with an original sample person
- The latter group of people are “associated sample people”
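Longitudinal Household (LHH)
- Each LHH is of the form $\{h_t\}$ over a span of consecutive months, where $h_t$ is a given household at month $t$
- The definition has two parts:
  - For any households at different months, specify which, if any, can be in the same LHH
  - What kinds of LHHs can exist in $L$, the universe of LHHs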
LHH
- Ernst (1989) considers the restriction that $L$ consists of a cohort of LHHs:
  - LHHs in existence at the initial sample month $t_0$, the initial LHHs
  - LHHs formed after month $t_0$, those generated by the initial LHHs
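Obtaining Weights
- Let $X = \sum_{i} x_i$, summed over the LHHs in the universe of interest, be the parameter of interest
- If the $i$th unit has a known positive probability $p_i$ of being chosen, $X$ would be estimated by $\hat{X} = \sum_{i} w_i x_i$, where $w_i = 1/p_i$ if the unit is in sample and $w_i = 0$ otherwise
Obtaining Weights
- Subsequent LHHs would only be in sample if at least one household member is an original sample person
- To use the regular estimator we need to know those probabilities
- It is “operationally impossible” to determine this probability, since it requires:
  - Determining the 1st round household for each member of the current household
  - Computing the probability that at least one 1st round household was selected
Obtaining Weights
- In order for the estimator to be unbiased it is only necessary that $E(w_i) = 1$
- Let $j = 1, \dots, M$ index the individuals in the universe at the initial sample month $t_0$
- Let $p_j$ denote the probability that the $j$th individual's household is in sample at month $t_0$
- Their associated weight is $w'_j = 1/p_j$ if that household is in sample, and $w'_j = 0$ otherwise
Obtaining Weights
- For the $i$th LHH associate a set of constants $\alpha_{ij}$, $j = 1, \dots, M$, independent of the $w'_j$ and satisfying $\sum_{j=1}^{M} \alpha_{ij} = 1$
- The weight of the $i$th LHH is $w_i = \sum_{j=1}^{M} \alpha_{ij} w'_j$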
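A short sketch shows how a household weight can be built from individual weights. It assumes, for illustration only, that the constants are spread equally over the current household members unless other constants are supplied; the function name, the argument names, and the probabilities are hypothetical.

```python
def lhh_weight(members, selection_prob, selected, alphas=None):
    """Weight of one longitudinal household as a fixed convex
    combination of its members' individual weights.

    members        : individuals currently in the household
    selection_prob : dict mapping an individual to the probability that
                     the individual's household was in the initial sample
    selected       : set of individuals whose initial household was
                     actually sampled (original sample persons)
    alphas         : optional dict of constants summing to one; they must
                     be fixed in advance, not chosen after seeing `selected`
    """
    if alphas is None:
        # Assumed default: equal shares among the current members.
        alphas = {j: 1.0 / len(members) for j in members}
    if abs(sum(alphas.values()) - 1.0) > 1e-9:
        raise ValueError("the constants must sum to one")

    weight = 0.0
    for j in members:
        # Individual weight: inverse probability if sampled, else zero.
        individual_weight = 1.0 / selection_prob[j] if j in selected else 0.0
        weight += alphas[j] * individual_weight
    return weight


# Hypothetical household of three people with known initial probabilities.
members = ["a", "b", "c"]
selection_prob = {"a": 0.1, "b": 0.2, "c": 0.25}

one_original = lhh_weight(members, selection_prob, selected={"a"})
two_originals = lhh_weight(members, selection_prob, selected={"a", "b"})
```

Because each individual weight has expectation one and the constants sum to one, the household weight also has expectation one, without ever computing the probability that the household itself ends up in sample.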
References
- Lavallée, P. and Caron, P. Estimation using the generalised weight share method: the case of record linkage. http://www.statcan.ca/english/ads/12-001XXPB/pdf/27_2_lavallee_e.pdf
- Deville, J.-C. and Maumy, M. A new survey methodology for describing tourism activities and expenses. http://www.tourismforum.scb.se/papers/Papers.Selected/CS/Paper33FRANCE/Deville_Maumy_article.pdf
- Ernst, L. R. (1989). Weighting issues for longitudinal household and family estimates. In Panel Surveys.
- Rao, J. N. K. (1968). Some nonresponse sampling theory when the frame contains an unknown amount of duplication. Journal of the American Statistical Association.
- Sirken, M. G. (1970). Household surveys with multiplicity. Journal of the American Statistical Association.