Disseminating census microdata: the IPUMS and IECM experiences

Disseminating census microdata: the IPUMS and IECM experiences, 2002-2010 (and plans for beyond)

Robert McCaa and Albert Esteve
Minnesota Population Center and Centre d’Estudis Demogràfics
rmccaa@umn.edu; aesteve@ced.uab.es
www.ipums.org/international (Global)
www.iecm-project.org (Europe portal)

“Only used statistics are useful statistics.”

-- Joint UNECE/Eurostat Meeting on Population and Housing Censuses, inf. 1

3 goals of presentation: IPUMS/IECM census microdata projects
1. Discuss dissemination statistics from 59,170 extracts downloaded by IPUMS registered users
2. Invite 21 European partners to entrust 2010 round samples as expeditiously as possible
3. Invite non-partners to entrust samples of historical censuses (2000 and earlier rounds) as well as samples of the 2010 round

Outline: Integrating census samples and metadata for timely dissemination via the IPUMS-International and IECM initiatives, 2010-2014
1. IPUMS-International: massive, global dissemination
2. IPUMS-International: usage statistics
3. Conclusion

1. IPUMS-International: Massive, Global Integration and Dissemination

“…best practice for a data repository of international statistical data” -- Dennis Trewin, chair, UNECE Task Force on Statistical Confidentiality & Microdata Access

See also:
» 2006: "IPUMS-Europe: Confidentiality measures for licensing and disseminating restricted access census microdata extracts to academic users," Monographs of Official Statistics: Work Session on Statistical Data Confidentiality.
» 2009: "Entrusting census microdata and metadata for timely integration and dissemination via the IPUMS-EurAsia and IECM initiatives, 2010-2014." ECE/CES/GE.41/2009/23.

IPUMS-International:
» Begun in 1999, IPUMS-International is the world’s largest integrated demographic database:
  » 159 integrated, anonymized census samples (55 countries)
  » 325 million person records
  » 3,600 approved researchers
» The database is likely to double over the next five years through the addition of:
  » 2010 round samples of 17 current Eur-Asian partners: Armenia, Austria, Belarus, Canada, France, Greece, Hungary, Italy, Kyrgyzstan, Netherlands, Portugal, Romania, Slovenia, Spain, Switzerland, UK, USA, etc.
  » Samples for 8 Eur-Asian countries currently in development: Belgium, Czech Republic, Ireland, Germany, Poland, Turkey, Turkmenistan, Ukraine
  » Future partners? Albania? Bulgaria? Croatia? Estonia? Finland? …

59,170 extracts (586,643 variables) disseminated; usage jumped 10% in June with the 2010 launch
» IPUMS-International NEVER disseminates source microdata!
» 4 IPUMS constructed variables ranked in the top 30:
  » Spouse’s location in household
  » Mother’s location in household
  » Father’s location in household
  » Spouse rule for inferring location in household
» These variables are constructed from household samples
» 3 countries with person samples are invited to construct household samples:
  » Canada
  » Netherlands
  » UK
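The four constructed "pointer" variables can be illustrated with a minimal sketch of how a SPLOC-like value might be derived within one household. It assumes a simplified record with a person number (`pernum`) and a relationship-to-head code (`relate`) whose values are hypothetical, not the IPUMS RELATE codes; the actual IPUMS-International rules (and the SPRULE variable recording which rule was applied) cover many more cases, such as spouses of persons other than the head.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical, simplified relationship codes (not the IPUMS RELATE coding scheme)
HEAD, SPOUSE, CHILD, OTHER = "head", "spouse", "child", "other"


@dataclass
class Person:
    pernum: int   # position of the person within the household (1-based)
    relate: str   # relationship to the household head (simplified)


def spouse_locations(household: List[Person]) -> Dict[int, int]:
    """Return a SPLOC-like mapping: pernum -> pernum of spouse (0 = no spouse found).

    Only the head / spouse-of-head link is inferred in this sketch.
    """
    sploc = {p.pernum: 0 for p in household}
    heads = [p for p in household if p.relate == HEAD]
    spouses = [p for p in household if p.relate == SPOUSE]
    # Link only when the pairing is unambiguous: exactly one head and one spouse
    if len(heads) == 1 and len(spouses) == 1:
        h, s = heads[0], spouses[0]
        sploc[h.pernum] = s.pernum
        sploc[s.pernum] = h.pernum
    return sploc


household = [Person(1, HEAD), Person(2, SPOUSE), Person(3, CHILD)]
print(spouse_locations(household))  # {1: 2, 2: 1, 3: 0}
```

A person-level sample carries no household context, so no pointer of this kind can be built from it; this is why household samples are requested.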

IPUMS-International (map, Mollweide projection)
» dark green = integrated and disseminating (55 countries, 159 censuses, 325 million person records)
» green = to be integrated (35 countries, 90 censuses, 150 million)
» 2011: Cambodia 2008, Egypt 2006, France 2006, Germany, Indonesia, Ireland, etc.
» 2012: why not yours?

2011 launch at the 58th Session of the ISI: Dublin, Aug 21-26, 2011, http://www.isi2011.ie
» European samples to be launched:
  » France, 2006
  » Germany (1970-87; DFR ’71, ’81)
  » Ireland (1971-2006)
» Beyond Europe, samples for:
  » Cambodia 2008
  » Egypt 2006
  » Jamaica, 1981-2001
  » Iran 2006
  » Etc.
» Successive annual launches planned for 2012, 2013, 2014.

Dissemination of microdata extracts via IPUMS-International
» IPUMS-International NEVER disseminates source microdata!
» Usage is restricted to bona fide researchers who agree to stringent conditions of use to protect statistical confidentiality
» IPUMS disseminates extracts custom-tailored to researchers’ needs
  » unlike most statistical agencies, which disseminate an identical, entire sample to every user

Dissemination of microdata and metadata extracts
» The massive scale of IPUMS requires users to be selective:
  » Select country (or countries)
  » Select samples (census years)
  » Select variables (e.g., age, sex, educational attainment, etc.)
  » Select sub-populations (e.g., nurses)
  » Select sample density
» Once an extract request is submitted, the IPUMS extract engine:
  » Constructs the microdata extract
  » Constructs the metadata
  » Emails the researcher to retrieve the extract
» Retrieval is password protected; transmission is encrypted with 128-bit SSL
» The researcher downloads the extract, unzips it and analyzes it
» The extract system has been validated as usage has soared
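The selection steps a user makes can be sketched as a single filtering pass. This is a hypothetical illustration, not the IPUMS extract engine: the record layout and the `COUNTRY`, `YEAR` and `SERIAL` (household identifier) fields are assumptions. The density step draws once per household, on the assumption that a subsample of a household sample should keep households intact.

```python
import random
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

Record = Dict[str, object]  # one person record; field names are illustrative


def build_extract(
    records: Iterable[Record],
    countries: Set[str],
    years: Set[int],
    variables: List[str],
    subpop: Optional[Callable[[Record], bool]] = None,
    density: float = 1.0,
    seed: int = 0,
) -> List[Record]:
    """Hypothetical extract builder: select samples, sub-population, density and variables."""
    rng = random.Random(seed)  # seeded so a given request is reproducible
    keep_household: Dict[Tuple[object, object, object], bool] = {}
    extract: List[Record] = []
    for rec in records:
        # 1. Country and census year (i.e., which samples)
        if rec["COUNTRY"] not in countries or rec["YEAR"] not in years:
            continue
        # 2. Optional sub-population filter
        if subpop is not None and not subpop(rec):
            continue
        # 3. Sample density: one random draw per household so households stay intact
        hh = (rec["COUNTRY"], rec["YEAR"], rec["SERIAL"])
        if hh not in keep_household:
            keep_household[hh] = density >= 1.0 or rng.random() < density
        if not keep_household[hh]:
            continue
        # 4. Keep only the identifying fields plus the requested variables
        extract.append({v: rec[v] for v in ["COUNTRY", "YEAR", *variables]})
    return extract


records = [
    {"COUNTRY": "Mexico", "YEAR": 2000, "SERIAL": 1, "AGE": 34, "SEX": 2},
    {"COUNTRY": "Mexico", "YEAR": 2000, "SERIAL": 1, "AGE": 36, "SEX": 1},
    {"COUNTRY": "Brazil", "YEAR": 2000, "SERIAL": 7, "AGE": 50, "SEX": 1},
]
women_mx = build_extract(records, {"Mexico"}, {2000}, ["AGE"], subpop=lambda r: r["SEX"] == 2)
print(women_mx)  # [{'COUNTRY': 'Mexico', 'YEAR': 2000, 'AGE': 34}]
```

Only the selected rows and columns leave the function, which mirrors the point that researchers receive custom-tailored extracts rather than the source microdata.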

2. IPUMS-International usage statistics
See the card hand-out for the list of current samples and usage statistics

Usage statistics (June 4, 2010)
» 59,170 extracts (jumped 10% in June)
» Average: about 1,000 extracts per country
» Smallest number of extracts: Kyrgyz Republic, 116 (census of 1999; first year of availability)
» Largest number of extracts: Mexico, 7,637 (6 censuses, 8 years of availability)
  » Mexico 2000: 2,464 extracts
» Usage statistics by country: see Table 2
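The "about 1,000" average is total extracts divided by countries. A quick check, assuming the 55 countries in the database are the right denominator, also gives Mexico’s share of all extracts:

```python
# Figures from the usage statistics above (June 4, 2010)
total_extracts = 59_170
n_countries = 55
mexico_extracts = 7_637

average = total_extracts / n_countries
mexico_share = mexico_extracts / total_extracts

print(f"Average per country: {average:,.0f}")  # Average per country: 1,076
print(f"Mexico's share of all extracts: {mexico_share:.1%}")  # Mexico's share of all extracts: 12.9%
```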

Table 2. Usage statistics: sample rank and details
Extract rank and sample details for the top five and all European countries

Rank | Country | Sample %* | Variables (n)* | Years of census samples | Extracts
---|---|---|---|---|---
1 | Mexico | 10 | 120 | 1960p, 70, 95, 2000, 05 | 7,637
2 | Brazil | 5 | 106 | 1960, 70, 80, 91, 2000 | 5,191
3 | United States | 5 | 92 | 1960, 70, 80, 90, 2000, 05 | 4,559
4 | Colombia | 10 | 120 | 1964p, 72, 85, 93, 2005 | 3,428
5 | France | 5 | 99 | 1962, 68, 75, 82, 90, 99 | 2,795
10 | Canada | 2.5 | 59 | 1971p, 81p, 91p, 2001p | 1,614
12 | Spain | 5 | 99 | 1981, 91, 2001 | 1,514
13 | Greece | 10 | 89 | 1971, 81, 91, 2001 | 1,496
19 | Hungary | 5 | 74 | 1970, 80, 90, 2001 | 1,132
21 | Austria | 10 | 75 | 1971, 81, 91, 2001 | 1,087
22 | Portugal | 5 | 96 | 1981, 91, 2001 | 1,028
23 | Romania | 10 | 97 | 1976, 92, 2002 | 1,012
29 | UK | 3 | 47 | 1991, 2001p | 657
30 | Netherlands | 1 | 33 | 1960p, 71p, 2001p | 570
32 | Belarus | 10 | 84 | 1999 | 333
38 | Italy | 5 | 81 | 2001 | 209
43 | Slovenia | 10 | 80 | 2002 | 133
 | Total extracts from the IPUMS-International database for 55 countries (158 samples), Jun 4, 2010 | | | | 59,170

* 2000 round census; the variable count refers to all integrated variables, including IPUMS constructed variables.

Table 3. Thirty-two most popular variables in IPUMS-International

Rank | Label | Extracts | Mnemonic | Comment
---|---|---|---|---
1 | Educational attainment | 19,307 | EDATTAN |
2 | Age (single years to 85+) | 19,009 | AGE | Grouped age, n=3,838
3 | Employment status | 18,490 | EMPSTAT |
4 | Marital status | 18,214 | MARST |
5 | Person weight | 17,511 | WTPER | Technical variable
6 | Relationship to head | 15,783 | RELATE |
7 | Sex | 14,595 | SEX |
8 | Class of work | 12,583 | CLASSWK |
9 | Ownership of dwelling | 8,050 | OWNRSHP |
10 | Occupation ISCO recode | 8,004 | OCCISCO |
11 | School attendance | 7,919 | SCHOOL |
12 | Years of schooling | 7,576 | YRSCHL |
13 | Literate | 7,290 | LIT |
14 | Urban/rural | 7,098 | URBAN |
15 | Industry-general code | 7,044 | INDGEN |
16 | Household weight | 6,656 | WTHH | Technical variable
17 | Children ever born | 6,363 | CHBORN |
18 | Nativity (native/foreign born) | 6,332 | NATIVTY |
19 | Occupation | 6,246 | OCC |

Table 3. Thirty-two most popular variables in IPUMS-International (cont.)

Rank | Label | Extracts | Mnemonic | Comment
---|---|---|---|---
20 | Country of birth | 6,153 | BPLCTRY |
21 | Religion | 6,075 | RELIG |
22 | Industry | 5,670 | IND |
23 | Location of spouse in household | 5,007 | SPLOC | Constructed (household)
24 | Rule for locating spouse | 4,171 | SPRULE | Constructed (household)
25 | Location of mother in household | 4,153 | MOMLOC | Constructed (household)
26 | Number of children surviving | 4,074 | CHSURV |
27 | Place of residence 5 years ago | 4,064 | MGRATE5 |
28 | Location of father in household | 3,983 | POPLOC | Constructed (household)
29 | Total household income | 3,965 | INCTOT | Household variable
30 | Earned income | 3,655 | INCEARN |
31 | Number of rooms | 3,465 | ROOMS |
32 | Consensual union | 3,443 | CONSENS |
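One way to read Table 3 is as the share of all extracts that include a given variable. The sketch below divides selected counts by the 59,170 total, assuming that both Table 3 and the total cover the same period (the table does not state its date). Even the most popular variable appears in only about a third of extracts, which is consistent with users being selective.

```python
# Counts from Table 3; the total is from the June 4, 2010 usage statistics
total_extracts = 59_170
popular = {
    "EDATTAN": 19_307,
    "AGE": 19_009,
    "EMPSTAT": 18_490,
    "SPLOC": 5_007,
    "CONSENS": 3_443,
}

# Share of all extracts that requested each variable
shares = {mnemonic: n / total_extracts for mnemonic, n in popular.items()}
for mnemonic, share in shares.items():
    print(f"{mnemonic:8s} {share:6.1%}")
```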

For uses, see http://bibliography.ipums.org

And: search scholar.google.com for "IPUMS" plus the name of a country, subject, etc.

Minimum standards for samples entrusted to IPUMS for dissemination
1. Household samples only
2. High precision: 5% minimum, 10% preferred
3. Broad set of variables: omit only those required for statistical confidentiality (low-level geography, low-frequency attributes)
4. Detailed codes:
  » Age: single years to 85
  » Occupation, industry: 3-digit ISCO, ISIC
  » Country of birth: individual countries in as much detail as is consistent with statistical confidentiality

Thanks to INSEE France for the sample of the recensement rénové (renovated census), 2004-2008: 20 million person records to be launched next year.
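The checkable parts of these standards can be expressed as a simple validation routine. This is a sketch under stated assumptions: the `SampleDescription` fields are hypothetical metadata, not an IPUMS format. Standard 3 (a broad set of variables) requires human judgement and is not checked, and only the 5% minimum density is enforced, not the 10% preference.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SampleDescription:
    # Hypothetical metadata describing an entrusted census sample
    unit: str            # "household" or "person"
    density_pct: float   # sampling fraction, in percent
    age_top_code: int    # oldest single year of age before grouping (e.g. 85 for "85+")
    isco_digits: int     # digits of ISCO occupation detail
    isic_digits: int     # digits of ISIC industry detail


def check_minimum_standards(s: SampleDescription) -> List[str]:
    """Return the checked minimum standards a sample misses (empty list = all met)."""
    problems: List[str] = []
    if s.unit != "household":
        problems.append("not a household sample")
    if s.density_pct < 5:
        problems.append(f"density {s.density_pct:g}% below the 5% minimum")
    if s.age_top_code < 85:
        problems.append(f"single-year age stops at {s.age_top_code}, below 85")
    if s.isco_digits < 3:
        problems.append("occupation coded to fewer than 3 ISCO digits")
    if s.isic_digits < 3:
        problems.append("industry coded to fewer than 3 ISIC digits")
    return problems


# A hypothetical 2.5% person sample with 2-digit industry codes
print(check_minimum_standards(SampleDescription("person", 2.5, 85, 3, 2)))
```

Returning a list of reasons, rather than a single pass/fail flag, would make it easier to see which samples could be re-drawn to meet the standards.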

Conclusion: Invitation to continued cooperation
» In 1999, our dream: integrate samples of 21 countries in 10 years
» By 2009, integrated samples for 44 countries
  » Thanks to the generous cooperation of 55 National Statistical Offices
  » Undreamed-of technological innovations
  » Number of users and usage far exceeded expectations
» For the 2010 decade, our dream:
  » Double the number of users
  » Double the number of integrated samples
  » Re-draw samples that do not meet minimum standards, where feasible
» Participating statistical agencies: please entrust 2010 samples in due course
» Other statistical agencies: entrust a series of samples for each census for which microdata exist

…and to the 58th Session of the ISI: Dublin, Aug 21-26, 2011, http://www.isi2011.ie
» IPUMS Workshop, Aug 19-20
  » New IPUMS initiatives
  » Reports by IPUMS users
  » Reports by National Statistical Office partners
» IPUMS sponsorship for delegates from participating countries:
  » economy air,
  » registration fees,
  » 8 nights’ accommodation and a modest per diem
» Simultaneous interpretation: Russian/French/English

Thank you for your cooperation!!
rmccaa@umn.edu
aepalos@ced.uab.es
www.ipums.org/international
www.iecm-project.org