STATISTICS NETHERLANDS – STATISTICS NORWAY
Partial (donor) imputation with adjustments
Jeroen Pannekoek and Li-Chun Zhang
Work Session on Statistical Data Editing
Ljubljana, Slovenia, 9-11 May 2011
CBS - SSB

Contents
• The problem of inconsistent micro-data
• Simple solutions and their limitations
• More general approaches

Example

Edit rules: x1 = x5 - x8, x5 = x3 + x4, x8 = x6 + x7

Variable              Response I   Response II   Donor values
x1: Profit                   330           330            330
x2: Employees                 20            25             20
x3: Turnover main           1000          1000           1000
x4: Turnover other            30            30             30
x5: Turnover total           950           950           1030
x6: Wages                    500           550            500
x7: Other costs              200           200            200
x8: Total costs              700           700            700

Simple solutions (for response pattern I)
• Prorating
  Edit 1: Turnover = Profit + Total costs. Since 950 ≠ 330 + 700, multiply the imputed values by 950/(330 + 700) ≈ 0.92.
  Edit 2: Total costs = Wages + Other costs. Since 0.92 × 700 ≠ 500 + 200, multiply the right-hand side by 0.92.
• Ratio adjustment (ratio imputation) with R = Turnover total (observed) / Turnover total (donor) = 950/1030 ≈ 0.92. In this case the results are the same as for prorating, except that Employees, which does not appear in any edit rule, is also adjusted.
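The arithmetic of both simple solutions can be traced with a short sketch. It uses the response pattern I values from the example; the variable names are illustrative only and are not taken from the slides.

```python
# Response pattern I after donor imputation: only Turnover total is observed.
turnover_total = 950.0                       # observed, kept fixed
profit, wages, other_costs, total_costs = 330.0, 500.0, 200.0, 700.0  # donor values
employees = 20.0                             # donor value, appears in no edit rule

# Prorating, edit 1: Turnover total = Profit + Total costs.
factor1 = turnover_total / (profit + total_costs)
profit_pr = factor1 * profit
total_costs_pr = factor1 * total_costs

# Prorating, edit 2: Total costs = Wages + Other costs (right-hand side rescaled).
factor2 = total_costs_pr / (wages + other_costs)
wages_pr = factor2 * wages
other_costs_pr = factor2 * other_costs

# Ratio adjustment: every donor value is multiplied by R, including Employees.
R = turnover_total / 1030.0                  # observed / donor Turnover total
employees_ratio = R * employees

print(round(factor1, 2), round(factor2, 2), round(R, 2))  # 0.92 0.92 0.92
```

Both prorating factors and R coincide here because the donor record itself satisfies the edits, so 330 + 700 equals the donor Turnover total of 1030.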

Problems with single constraint adjustments
Consider response pattern II.
Edit violations
E1: Turnover ≠ Profit + Total costs
E2: Total costs ≠ Wages + Other costs
Option:
1. Adjust Profit and Total costs to fit E1.
2. For the resulting value of Total costs, adjust Other costs to fit E2.
Problems:
- The order matters: doing it the other way around gives a different solution.
- Information on Wages is not used in adjusting Total costs.
- Infeasible solutions for adjusted Total costs can occur (adjusted Total costs < Wages).
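A minimal sketch of the two-step option, assuming step 1 is done by prorating (the slide does not say how Profit and Total costs are adjusted). The function name and the Wages value of 680 in the second call are hypothetical; the latter is chosen only to show how an infeasible result can arise.

```python
def sequential_adjust(turnover_total, profit, total_costs, wages):
    """Fit E1 by prorating Profit and Total costs, then fit E2 via Other costs."""
    factor = turnover_total / (profit + total_costs)
    profit_adj = factor * profit
    total_costs_adj = factor * total_costs
    # Wages is observed and fixed, so Other costs absorbs the whole E2 adjustment.
    other_costs_adj = total_costs_adj - wages
    return profit_adj, total_costs_adj, other_costs_adj

# Response pattern II from the example (Turnover total and Wages observed).
p, tc, oc = sequential_adjust(950.0, 330.0, 700.0, 550.0)
print(round(p, 1), round(tc, 1), round(oc, 1))  # 304.4 645.6 95.6

# Hypothetical observed Wages of 680: adjusted Total costs falls below Wages,
# so the implied Other costs becomes negative.
_, tc_bad, oc_bad = sequential_adjust(950.0, 330.0, 700.0, 680.0)
print(tc_bad < 680.0, oc_bad < 0.0)  # True True
```

Note that the Wages value never enters step 1, which is the second problem listed above.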

Edit constraints as a system of equations
For the vector of values x the constraints are Ex = 0. Each row of E is a constraint and the columns correspond to the variables.
Constraints E1 and E2 are linked because they have variable x5 (Turnover total) in common. E2 and E3 are also linked (through E1).
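As an illustration, the three edit rules of the example can be written as a matrix E. The row order follows the linking described above (E1 is the profit edit, E2 the turnover edit, E3 the costs edit); the sign convention is an assumption, since any row may be multiplied by -1 without changing the constraint.

```python
import numpy as np

# Columns: x1 Profit, x2 Employees, x3 Turnover main, x4 Turnover other,
#          x5 Turnover total, x6 Wages, x7 Other costs, x8 Total costs
E = np.array([
    [1, 0, 0, 0, -1, 0, 0,  1],   # E1: x1 - x5 + x8 = 0
    [0, 0, 1, 1, -1, 0, 0,  0],   # E2: x3 + x4 - x5 = 0
    [0, 0, 0, 0,  0, 1, 1, -1],   # E3: x6 + x7 - x8 = 0
], dtype=float)

donor = np.array([330, 20, 1000, 30, 1030, 500, 200, 700], dtype=float)
response_ii = np.array([330, 25, 1000, 30, 950, 550, 200, 700], dtype=float)

print(E @ donor)        # all zeros: the donor record satisfies every edit
print(E @ response_ii)  # violations 80, 80 and 50 for E1, E2 and E3

# Number of variables shared by each pair of constraints (off-diagonal entries).
pattern = (E != 0).astype(int)
shared = pattern @ pattern.T
print(shared[0, 1], shared[0, 2], shared[1, 2])  # 1 1 0
```

E2 and E3 share no variable directly; they are connected only because each shares one variable with E1.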

An optimization approach
Change the values of the imputed variables such that:
• Edit rules are satisfied
• Change is as small as possible
Formally, find an adjusted data vector $x^A$ such that:
$x^A = \arg\min D(x^A, x)$ s.t. $E x^A \le 0$
Here $\le$ means that we consider both equalities and inequalities.

Distance functions
Least Squares (LS): $\sum_i (x_i - x_i^A)^2$
Weighted Least Squares (WLS): $\sum_i w_i (x_i - x_i^A)^2$
Kullback-Leibler Divergence (KL): $\sum_i x_i (\ln x_i - \ln x_i^A)$

Adjustment models 1/2
• Least squares (LS): $D = \sum_i (x_i - x_i^A)^2$
  $x_i^A = x_i + \sum_k e_{ki} \alpha_k$
  Additive adjustments: the total adjustment for a variable is the sum of the adjustments to each of the constraints. The same adjustment parameter ($\alpha_k$) applies to all variables in constraint k.
• Weighted least squares (WLS): $D = \sum_i w_i (x_i - x_i^A)^2$
  $x_i^A = x_i + (1/w_i) \sum_k e_{ki} \alpha_k$
  Additive adjustments, but the amount of adjustment varies according to the weights.
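For equality constraints the additive models above have a closed-form solution, sketched here with NumPy. The function name `wls_adjust`, the index list `free` (the imputed variables) and the sign convention of E are assumptions made for this illustration; observed values are kept fixed.

```python
import numpy as np

def wls_adjust(x, E, free, w=None):
    """Adjust the free (imputed) entries of x so that E @ x_adj = 0.

    Minimises sum_i w_i (x_i - x_adj_i)^2 over the free entries only;
    w=None gives plain least squares.
    """
    x = np.asarray(x, dtype=float)
    E = np.asarray(E, dtype=float)
    free = np.asarray(free, dtype=int)
    w = np.ones(len(x)) if w is None else np.asarray(w, dtype=float)

    Ef = E[:, free]              # coefficients e_ki of the free variables
    w_inv = 1.0 / w[free]        # 1 / w_i
    violation = E @ x            # current edit violations
    # Adjustment parameters alpha_k solve (Ef W^-1 Ef') alpha = -violation.
    alpha = np.linalg.solve((Ef * w_inv) @ Ef.T, -violation)
    x_adj = x.copy()
    x_adj[free] += w_inv * (Ef.T @ alpha)   # x_i + (1/w_i) sum_k e_ki alpha_k
    return x_adj

E = np.array([[1, 0, 0, 0, -1, 0, 0, 1],
              [0, 0, 1, 1, -1, 0, 0, 0],
              [0, 0, 0, 0, 0, 1, 1, -1]], dtype=float)
x = np.array([330, 25, 1000, 30, 950, 550, 200, 700], dtype=float)  # response II
free = [0, 2, 3, 6, 7]           # Profit, Turnover main/other, Other/Total costs

ls = wls_adjust(x, E, free)               # approx. 260, 25, 960, -10, 950, 550, 140, 690
wls = wls_adjust(x, E, free, w=1.0 / x)   # approx. 249, 25, 922, 28, 950, 550, 151, 701
```

Under these assumptions the LS result includes a negative Turnover other, while the weights 1/x_i keep the adjustments proportional to the size of each value.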

Adjustment models 2/2
• Kullback-Leibler Divergence (KL): $D = \sum_i x_i (\ln x_i - \ln x_i^A)$
  $x_i^A = x_i \times \prod_k \exp(e_{ki} \alpha_k)$
  The factor $\exp(e_{ki} \alpha_k)$ can be written as $\beta_k$ if $e_{ki} = 1$ and $1/\beta_k$ if $e_{ki} = -1$.
  Multiplicative adjustments: the total adjustment to a variable is the product of the adjustments to each constraint. The same multiplicative adjustment parameter ($\beta_k$) applies to all variables in constraint k.
It can be shown that for weights $1/x_i$, KL ≈ WLS.

Algorithm
Simple iterative procedures exist to estimate the adjustments for general convex distances. Adjust the x-vector to each constraint, one by one; these single-constraint adjustments are easy to perform. After all constraints have been visited, one iteration is completed. Repeat.
• For sum-to-total constraints and KL divergence, this is equivalent to repeated prorating and Iterative Proportional Fitting.
• But it also handles more general constraints: differences, linear inequalities, interval constraints.
• And more general distances and confidence weights.
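The iteration can be sketched for the simplest case: least squares distance and equality constraints only. This is an illustrative cyclic-projection scheme under those assumptions, not necessarily the authors' implementation; the function name, the fixed iteration count and the sign convention of E are all choices made here.

```python
import numpy as np

def iterative_ls_adjust(x, E, free, n_iter=200):
    """Cyclically adjust the free entries of x to one constraint at a time.

    Each step is the least-squares adjustment to a single equality constraint;
    one pass over all rows of E is one iteration.
    """
    x_adj = np.asarray(x, dtype=float).copy()
    E = np.asarray(E, dtype=float)
    free = np.asarray(free, dtype=int)
    for _ in range(n_iter):
        for e in E:                      # visit the constraints one by one
            ef = e[free]
            violation = e @ x_adj        # how far this edit is from holding
            # Spread the correction equally over the free variables in the edit.
            x_adj[free] -= ef * violation / (ef @ ef)
    return x_adj

E = np.array([[1, 0, 0, 0, -1, 0, 0, 1],
              [0, 0, 1, 1, -1, 0, 0, 0],
              [0, 0, 0, 0, 0, 1, 1, -1]], dtype=float)
x = np.array([330, 25, 1000, 30, 950, 550, 200, 700], dtype=float)  # response II
free = [0, 2, 3, 6, 7]                   # imputed variables

x_iter = iterative_ls_adjust(x, E, free)  # approx. 260, 25, 960, -10, 950, 550, 140, 690
```

For this example the iterates settle on the same values as the simultaneous LS adjustment, even though each step looks at a single constraint.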

The generalized ratio approach 1/2
• The methods so far adjust only variables that appear in edit constraints. The aim is only to satisfy “hard” edits.
• Inconsistencies between imputed and observed values indicate a difference between the donor record and the receptor record. Therefore: adjust all donor values to better fit the receptor record.
• For response pattern I, with only Turnover total observed, all donor values were multiplied by the ratio Observed/Donor Turnover, thus rescaling with a measure of “size”.

The generalized ratio approach 2/2
As a generalisation we propose the following componentwise multiplicative adjustments:
$x_i^A = x_i \delta_i$
The $\delta_i$ are determined by minimizing their variance subject to the resulting adjusted record satisfying the edit constraints.
• Adjustments are as uniform as possible, as with ratio imputation.
• But all kinds of constraints can be satisfied.
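One way to read this proposal is as a small equality-constrained quadratic problem, sketched below. The sketch assumes that the variance is taken over the factors of the imputed variables only, that observed values stay fixed, and that all edits are equalities. The slides do not spell out these details, so the output may differ from the figures reported in the example. All names are hypothetical.

```python
import numpy as np

def generalized_ratio_adjust(x, E, free):
    """Find factors delta for the free entries with minimal variance s.t. E @ x_adj = 0."""
    x = np.asarray(x, dtype=float)
    E = np.asarray(E, dtype=float)
    free = np.asarray(free, dtype=int)
    fixed = np.setdiff1d(np.arange(len(x)), free)
    n, m = len(free), E.shape[0]

    A = E[:, free] * x[free]             # edits written in terms of delta: A @ delta = b
    b = -E[:, fixed] @ x[fixed]          # contribution of the fixed (observed) values
    C = np.eye(n) - np.ones((n, n)) / n  # delta' C delta = n * variance of delta
    # KKT system for: minimise delta' C delta subject to A @ delta = b.
    kkt = np.block([[2.0 * C, A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([np.zeros(n), b])
    delta = np.linalg.solve(kkt, rhs)[:n]

    x_adj = x.copy()
    x_adj[free] = x[free] * delta        # componentwise multiplicative adjustment
    return x_adj, delta

E = np.array([[1, 0, 0, 0, -1, 0, 0, 1],
              [0, 0, 1, 1, -1, 0, 0, 0],
              [0, 0, 0, 0, 0, 1, 1, -1]], dtype=float)
x = np.array([330, 25, 1000, 30, 950, 550, 200, 700], dtype=float)  # response II
free = [0, 2, 3, 6, 7]                   # imputed variables

x_gr, delta = generalized_ratio_adjust(x, E, free)
print(np.round(x_gr, 1), np.round(delta, 3))
```

When a single common factor can satisfy all edits, as in response pattern I, the minimal variance is zero and this reduces to ordinary ratio adjustment.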

Example revisited (response pattern II)

Variable             Imputed unadj.   LS adjust.   WLS / KL   Gen. ratio
x1: Profit                      330          260        249          239
x2: Employees                    25           25         25           25
x3: Turnover main              1000          960        922          921
x4: Turnover other               30          -10         28           29
x5: Turnover total              950          950        950          950
x6: Costs wages                 550          550        550          550
x7: Costs other                 200          140        151          161
x8: Costs total                 700          690        701          711

Concluding remarks
Optimization approach to solving inconsistency problems.
• Simultaneous adjustment to all constraints
• Generalizes prorating and ratio adjustment for single constraints
• Minimum distance approach that aims at consistency with minimum (optimal) adjustments
• Generalized ratio approach that aims to better preserve the structure of the imputed record, as in ratio imputation