Introduction to weighting February 28, 2020 Nadi, FIJI


Sampling
In the simplest case, we select n units from a population of N units. From this sample, we want to produce estimates that are representative of the whole population, i.e. all N units. But how do we do that when we only have information for the n units in the sample?

Sampling weights, or simply weights, are:
• used to ensure estimates produced from a sample are representative of the target population
• positive values used to “rate-up” or adjust the data collected for each sample unit
• calculated for and assigned to each unit in the sample
• adjustments for different probabilities of selection between units in a sample, whether these arise from the sample design or from happenstance (e.g. non-response).
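To make the "rate-up" idea concrete, here is a minimal sketch of how weights turn sample values into population estimates. The function names and the four-household sample are hypothetical and chosen only for illustration.

```python
def weighted_total(values, weights):
    """Estimate a population total as the sum of weight * value over sample units."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * y for w, y in zip(weights, values))


def weighted_mean(values, weights):
    """Estimate a population mean as the weighted total divided by the sum of weights."""
    return weighted_total(values, weights) / sum(weights)


# Hypothetical sample: household sizes for four sampled households.
household_size = [4, 6, 3, 5]
# Hypothetical weights: each household "stands for" this many households.
weights = [10, 10, 5, 5]

est_total = weighted_total(household_size, weights)  # estimated number of people
est_mean = weighted_mean(household_size, weights)    # estimated mean household size
print(est_total, round(est_mean, 2))
```

Each sampled household contributes its value multiplied by the number of population households it represents, which is why the weights must be positive.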

Sampling Weights

Sampling Weights – Selection
A typical weight for each household in a household survey may be composed as follows:
w = w_sel × w_ps × w_nr
with sel representing selection, ps representing post-stratification, and nr representing non-response.
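A minimal sketch of composing a final weight from its three components, assuming (as is the usual convention) that the components combine multiplicatively. The function name `final_weight` is hypothetical, and the example values are illustrative inputs rather than results from a single survey.

```python
def final_weight(w_sel, w_ps=1.0, w_nr=1.0):
    """Combine selection, post-stratification and non-response components.

    Assumption: the components multiply. A component of 1.0 means
    "no adjustment" for that step.
    """
    for name, value in (("w_sel", w_sel), ("w_ps", w_ps), ("w_nr", w_nr)):
        if value <= 0:
            raise ValueError(f"{name} must be positive, got {value}")
    return w_sel * w_ps * w_nr


# Illustrative inputs: a selection weight of 10, a post-stratification
# factor of 1.0865 and a non-response factor of 12/11.
w = final_weight(w_sel=10, w_ps=1.0865, w_nr=12 / 11)
print(round(w, 3))
```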

Probability of selection – SRS

Example: Probability of selection – SRS
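Under simple random sampling, every unit has the same selection probability, n/N, and the selection weight is its inverse, N/n. A minimal sketch follows; the sample and population sizes are hypothetical, and `srs_selection` is an illustrative name.

```python
from fractions import Fraction


def srs_selection(n, N):
    """Return (selection probability, selection weight) for an SRS of n from N."""
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    p = Fraction(n, N)   # probability that any given unit is selected
    return p, 1 / p      # the weight is the inverse of the probability


# Hypothetical numbers: 500 households sampled from 10,000.
p, w = srs_selection(n=500, N=10_000)
print(float(p), float(w))  # each sampled household represents w households
```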

Probability of selection – Stratified SRS

Example: Probability of selection – Stratified SRS
             Population (Nh)   Sample size (nh)   Probability of selection (Ph)   Selection weight (w_sel)
Province 1   2,000             200                = 200/2,000 = 0.1 or 10%        = 2,000/200 = 10
Province 2   1,000             200                = 200/1,000 = 0.2 or 20%        = 1,000/200 = 5
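The two-province example can be reproduced in a few lines. This is a sketch: the dictionary layout and the function name `stratum_selection` are illustrative choices, while the population and sample sizes are those given in the example.

```python
# Population size (N_h) and sample size (n_h) per stratum, from the example.
strata = {
    "Province 1": {"N_h": 2000, "n_h": 200},
    "Province 2": {"N_h": 1000, "n_h": 200},
}


def stratum_selection(N_h, n_h):
    """Return (P_h, w_sel) for simple random sampling within one stratum."""
    p_h = n_h / N_h      # probability of selection within the stratum
    w_sel = N_h / n_h    # selection weight, the inverse of p_h
    return p_h, w_sel


results = {name: stratum_selection(**sizes) for name, sizes in strata.items()}
for name, (p_h, w_sel) in results.items():
    print(f"{name}: P_h = {p_h:.0%}, w_sel = {w_sel:g}")
```

Note that the same sample size gives different weights in the two provinces: households in the larger province each represent more of the population.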

Probability of selection – Two-stage selection

Probability of selection – 1st stage

Example: Probability of selection – Two-stage selection

Probability of selection – 2nd stage

Probability of selection – Two-stage selection

Selection weight – Two-stage selection
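In a two-stage design the overall selection probability is the product of the stage probabilities, and the selection weight is its inverse. The sketch below assumes equal-probability selection at both stages (a of A clusters, then m of M households within a selected cluster). That is an assumption made for illustration only: many household surveys select clusters with probability proportional to size, which changes the first-stage probability. All numbers and names are hypothetical.

```python
from fractions import Fraction


def two_stage_weight(a, A, m, M):
    """Selection probabilities and weight for a two-stage sample.

    Assumption: a of A clusters are drawn with equal probability (stage 1),
    then m of the M households in a selected cluster are drawn with equal
    probability (stage 2).
    """
    p1 = Fraction(a, A)   # 1st stage: probability the cluster is selected
    p2 = Fraction(m, M)   # 2nd stage: probability the household is selected,
                          # given that its cluster was selected
    p = p1 * p2           # overall probability of selection
    return p1, p2, p, 1 / p


# Hypothetical design: 10 of 100 clusters, then 12 of 60 households.
p1, p2, p, w_sel = two_stage_weight(a=10, A=100, m=12, M=60)
print(float(p1), float(p2), float(p), float(w_sel))
```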

Sampling Weights – Non-response
A typical weight for each household in a household survey may be composed as follows:
w = w_sel × w_ps × w_nr
with sel representing selection, ps representing post-stratification, and nr representing non-response.

Non-Response
• Nearly every survey has some degree of non-response for which we can make adjustments in the weights.
• It is important to note that this is a non-response adjustment, not a “correction.” Without perfect information on all variables, it is not possible to completely correct for non-response.
• The best we can do is try to estimate the bias and make the best adjustment possible based on the information available.

Non-Response (2)
Simplest form: if your cluster has 12 households but one refuses, a simple calculation for the nr component of the weights would be
w_nr = 12 / 11 ≈ 1.09
=> Each remaining household counts a bit more than 1 to make up for its missing neighbour.
=> The selection weight for each household in the cluster is adjusted upwards (by 9%).
Weighting class: divides the data into cells (such as age × gender × geography) and assigns an adjustment factor based on the response rate in each cell.
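Both forms of the adjustment can be sketched with the same ratio of selected to responding units. The 12-household cluster is the example above; the weighting-class cells and their counts are hypothetical, as are the function and variable names.

```python
def nonresponse_factor(n_selected, n_responded):
    """Non-response factor: selected units divided by responding units."""
    if not 0 < n_responded <= n_selected:
        raise ValueError("need 0 < n_responded <= n_selected")
    return n_selected / n_responded


# Simplest form: a cluster of 12 households in which one refuses.
w_nr = nonresponse_factor(n_selected=12, n_responded=11)
print(f"w_nr = {w_nr:.4f}")  # each respondent is rated up by about 9%

# Weighting-class form: one factor per cell.
# Hypothetical cells with (selected, responded) counts.
cells = {
    ("urban", "female"): (40, 36),
    ("urban", "male"): (40, 30),
    ("rural", "female"): (25, 25),
}
cell_factors = {
    cell: nonresponse_factor(selected, responded)
    for cell, (selected, responded) in cells.items()
}
for cell, factor in cell_factors.items():
    print(cell, round(factor, 3))
```

A cell where everyone responds gets a factor of 1, so its weights are left unchanged.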

Sampling Weights – Post-stratification
A typical weight for each household in a household survey may be composed as follows:
w = w_sel × w_ps × w_nr
with sel representing selection, ps representing post-stratification, and nr representing non-response.

Post-Stratification
Post-stratification is generally the last step in the process. It adjusts weighted totals to known population totals and has been shown in the literature to reduce overall variance.
Example:
State      Weighted total population from survey   Known population   Adjustment factor*
Maryland   5,245,757                               5,699,478          1.0865
Virginia   8,475,901                               7,882,590          0.9300
DC         662,842                                 599,657            0.9047
* Adjustment factor = known population / weighted total from survey.
=> The selection weight for each person in Maryland is adjusted upwards (by approx. 9%).
=> The selection weights for each person in Virginia and DC are adjusted downwards (by approx. 7% and 10% respectively).
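The adjustment factors in the example are the known population divided by the weighted survey total for each state. A short sketch that reproduces them; the totals are those from the example, and the function name `poststrat_factors` is illustrative.

```python
# Weighted totals from the survey and known population totals, from the example.
weighted_totals = {"Maryland": 5_245_757, "Virginia": 8_475_901, "DC": 662_842}
known_totals = {"Maryland": 5_699_478, "Virginia": 7_882_590, "DC": 599_657}


def poststrat_factors(known, weighted):
    """Post-stratification factor per group: known total / weighted total."""
    return {group: known[group] / weighted[group] for group in known}


factors = poststrat_factors(known_totals, weighted_totals)
for state, factor in factors.items():
    print(f"{state}: {factor:.4f}")

# Multiplying every weight in a state by its factor makes the weighted
# total match the known population total for that state.
adjusted_totals = {s: factors[s] * weighted_totals[s] for s in factors}
```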

Reasons for Weighting
Kish (1992) identifies six reasons for weighting survey data prior to analysis:
1. To reflect differential probabilities of selection (=> produce unbiased sample estimates)
2. To reduce biases introduced by errors in the sampling frame
3. To reduce bias introduced by non-response
4. To reduce sampling variance, by making use of auxiliary information (post-stratification)
5. To produce standardised, consistent estimates
6. To produce approximately unbiased estimates from a sample formed by combining other samples

Final Thoughts

Extra slides

Trimming weights replaces outlier weights to reduce the variance of the resulting estimates. This introduces some bias in the estimates, which must be weighed carefully against the gains in precision.

Trimming (2)
Trimming the outlier weights decreases the standard errors but biases the estimate.
            untrimmed   99      95      90      75
Mean        6.899       6.882   6.875   6.895   6.964
Std. Err.   0.434       0.429   0.421   0.422   0.386
CV          0.878       0.830   0.760   0.713   0.544
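One common way to trim is to cap every weight at a chosen percentile of the weight distribution. The sketch below assumes that approach and a nearest-rank percentile definition. Both are illustrative assumptions, as are the function names and the weights, and this is not necessarily the method behind the table above.

```python
import math


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def trim_weights(weights, pct):
    """Cap every weight at the pct-th percentile of the weights."""
    cap = percentile_nearest_rank(weights, pct)
    return [min(w, cap) for w in weights]


# Hypothetical weights with one clear outlier.
weights = [1, 1, 2, 2, 2, 3, 3, 4, 5, 40]
trimmed = trim_weights(weights, 90)
print(trimmed)
```

Capping shrinks the spread of the weights, which is what reduces the variance. It also lowers the sum of the weights, and that is one source of the bias mentioned above.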