SHARE data versions IDs Stephanie Stuck MEA Antwerpen

SHARE data versions & IDs
Stephanie Stuck, MEA
Antwerpen, February 2008
Mannheim Research Institute for the Economics of Aging
www.mea.uni-mannheim.de

Data versions and ID-variables

The internal version is used for data cleaning, the public version for publications.

Household ID
  • Who gets the id: all households
  • Internal version: sampid
  • Public version: sampid2 (scrambled version of sampid)

Person ids
  • cvid: all household members (in CV). That means non-eligible persons get a cvid too, e.g. children or other people living in the household. cvid should be used to merge modules within waves.
  • respid: all eligible household members (in CV). That means all household members that should be interviewed, e.g. respondents and partners, even if the partners are younger than 50 years. respid should be used to merge waves.

Data versions and ID-variables

Internal version (data cleaning)
  • Household ID: sampid
  • Country specific versions
  • Download site:
      • Raw version (updates during fieldwork and ‘shortly’ after fieldwork): CentERdata site http://cdata8.uvt.nl/share/version2.7/
      • Corrected versions during cleaning process: new internal SHARE site (not yet available)

Public version (publications)
  • Household ID: sampid2 (scrambled version of sampid)
  • All countries
  • Download site: public website www.share-project.org & internal website (data for working groups)

Available for: respective country team, CentERdata, MEA working groups, external users

sampid rules (old)
  • Digits 1-2: country code (e.g. 23 for Belgium French speaking)
  • Digits 3-5: wave indicator (042 for wave 1 and 062 for wave 2 main survey)
  • Digits 6-11: household ID
  • Digits 12-13: longitudinal household split indicator. It is 00 by default; if a respondent moves out, it is set based on respid, e.g. if the ‘moving out respondent’ has respid 01, it is changed to 01.

Examples
  1104200010000: Austria, starting in wave 1 (longitudinal sample)
  2306214010300: Belgium (French), starting in wave 2 (refresher)

  → One needs to combine sampid with the respondent ID (respid) to identify and merge cases on the respondent level.
  → Merging problems arise, especially for split households / ‘moving’ respondents across waves.
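
The digit positions of the old sampid can be illustrated with a short Python sketch. The function name parse_old_sampid and the field names are assumptions made for illustration and are not part of any SHARE tooling; only the digit positions and the two example values are taken from the rules above.

```python
from typing import NamedTuple


class OldSampid(NamedTuple):
    country_code: str     # digits 1-2, numeric country code
    wave_indicator: str   # digits 3-5, e.g. 042 or 062
    household_id: str     # digits 6-11
    split_indicator: str  # digits 12-13, 00 by default


def parse_old_sampid(sampid: str) -> OldSampid:
    """Split a 13-digit old-style sampid into its four parts."""
    if len(sampid) != 13 or not sampid.isdigit():
        raise ValueError(f"expected 13 digits, got {sampid!r}")
    return OldSampid(sampid[0:2], sampid[2:5], sampid[5:11], sampid[11:13])


# The two examples from the slide.
print(parse_old_sampid("1104200010000"))
print(parse_old_sampid("2306214010300"))
```

Keeping the parts as strings preserves leading zeros, which matter for the wave indicator and the household ID.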

Therefore...
  • We will change the system and have unique person ids that can be used to merge modules and waves.
  • The person id will not change across waves, even if a household splits.
  • We will have string country codes instead of numeric ones.
  • We will divide sampid into different parts:
      • household id (fixed part and split indicator if needed)
      • new wave indicator variable ‘wi’, which indicates when a household first entered the sample

New household identifier hhid1 (internal) & hhid (public)
  • Digits 1-2: country code in letters, e.g. AT for Austria, Bf for Belgium French speaking (internal)
  • Digits 3-8: fixed household ID. This part will not change across waves if a household splits off.
  • Digit 9: one character added to the fixed household id to identify whether it is an ‘additional’ household that resulted from a split:
      • A for all ‘original’ households (all in wave 1, refresher in wave 2)
      • B is used only if a household has split. A is then still used for the ‘first’ part of the household and B for the ‘splitting part’ (the one that is interviewed second, normally the one that moved out).
      • C is used for the very rare case of a further split-off household, for example when the original household in wave 1 consisted of 3 eligible sisters and split into 3 parts.

Examples for new household id
  AT100100A: Austria, ‘original’ household
  AT100100B: Austria, split off household
  Bf140103A: Belgium French speaking household (internal)
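
As a rough illustration of this layout, the following Python sketch splits a new household identifier into its three parts. The names parse_hhid, Hhid and is_split_off are hypothetical, and the pattern assumes exactly two letters, six digits and one of the letters A, B or C, as described above.

```python
import re
from typing import NamedTuple

# Two letters, six digits, one split letter (A, B or C), per the rules above.
HHID_PATTERN = re.compile(r"([A-Za-z]{2})(\d{6})([ABC])")


class Hhid(NamedTuple):
    country: str       # positions 1-2, e.g. AT or Bf
    household_id: str  # positions 3-8, fixed across waves
    split: str         # position 9: A, B or C


def parse_hhid(hhid: str) -> Hhid:
    """Split a new-style household id into country, fixed id and split letter."""
    match = HHID_PATTERN.fullmatch(hhid)
    if match is None:
        raise ValueError(f"not a valid household id: {hhid!r}")
    return Hhid(*match.groups())


def is_split_off(hhid: str) -> bool:
    """True if the household resulted from a split (split letter B or C)."""
    return parse_hhid(hhid).split != "A"


print(parse_hhid("AT100100A"))
print(is_split_off("AT100100B"))
```

Because the first eight characters stay fixed, two household ids that differ only in the last letter belong to the same original household.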

New person identifier: person1
  • Digits 1-2: country code (CC) in letters, e.g. AT for Austria, Bf for Belgium French speaking
  • Digits 3-8: fixed household ID. This part will not change across waves.
  • Digits 9-10: respondent id, e.g. if respid is 1 it will be 01

Respondent identifier

  old                      new
  sampid        respid     person1
  1110010000    1          AT10010001
  1110010000    2          AT10010002
  2314010300    1          Bf14010301
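
The construction of the new person identifier can be sketched in a few lines of Python. The function make_person1 and its argument names are assumptions for illustration only; the checks simply mirror the three parts listed above (two letters, six digits, two-digit respondent id).

```python
def make_person1(country: str, household_id: str, respid: int) -> str:
    """Build a person id from country letters, fixed household id and respid."""
    if len(country) != 2 or not country.isalpha():
        raise ValueError(f"country code must be two letters: {country!r}")
    if len(household_id) != 6 or not household_id.isdigit():
        raise ValueError(f"household id must be six digits: {household_id!r}")
    if not 0 <= respid <= 99:
        raise ValueError(f"respid must fit in two digits: {respid!r}")
    # respid is zero-padded to two digits, so 1 becomes 01.
    return f"{country}{household_id}{respid:02d}"


# The three rows of the respondent identifier table.
print(make_person1("AT", "100100", 1))
print(make_person1("AT", "100100", 2))
print(make_person1("Bf", "140103", 1))
```

Since the split letter is not part of the person id, the result stays the same when a respondent's household splits.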

Old and new ids

                                   internal version               public version (scrambled)
                                   old               new          old                      new
  Household ID & wave indicator    sampid            hhid1 & wi   sampid2                  hhid
  Person id                        sampid & respid   person1      sampid & respid & wi     personid

In addition:
  • A dataset will be generated that shows to which households a respondent belonged during her or his ‘SHARE history’, e.g.:

      person1       hhid1_w2     hhid1_w3
      AT10010001    AT100100A
      AT10010002    AT100100A    AT100100B
      Bf14010301    Bf140103A

  • A compatibility file will be made for internal use to merge the old sampid respid files with the new ids.
  • We will have an additional person id (uuid) to ensure uniqueness, but it will be used in the background only, for technical reasons.

Data cleaning
  • Always use the unscrambled version that includes sampid for data cleaning.
  • Use sampid and respid to identify respondents.
  • Generate/compute sampid_original, respid_original and cvid_original before you change ids.
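
The last point, keeping the original ids before changing them, might look like the following Python sketch. It assumes a hypothetical layout in which each case is a dictionary with the keys sampid, respid and cvid; the helper name keep_original_ids is likewise an assumption, and the actual cleaning may well be done in a different statistics package.

```python
ID_FIELDS = ("sampid", "respid", "cvid")


def keep_original_ids(record: dict) -> dict:
    """Return a copy of record with *_original backups of the id fields."""
    result = dict(record)
    for field in ID_FIELDS:
        if field in result:
            # setdefault leaves an existing backup untouched, so calling
            # this again after an id was changed does not overwrite it.
            result.setdefault(f"{field}_original", result[field])
    return result


case = keep_original_ids({"sampid": "1104200010000", "respid": 1, "cvid": 1})
case["respid"] = 2  # a correction made during cleaning
print(case["respid"], case["respid_original"])
```

Making the backup before any correction means a changed id can always be traced back to the value that was delivered.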