Disseminating integrated census microdata to academic researchers and policy makers at no cost (plus we pay US$1,000-5,000 per census to the NSO-owner)

Robert McCaa
Minnesota Population Center
rmccaa@umn.edu
www.ipums.org/international
www.hist.umn.edu/~rmccaa/IPUMSI

IPUMS-International, 2009
Special thanks to: CSO-Vietnam, NIS-Cambodia, BPS-Indonesia, PCO-Pakistan, NBS-China, NSSO-India, BBS-Bangladesh, DOS-Malaysia, NSO-Mongolia, CBS-Nepal, NSO-Philippines, NSO-Thailand
Map (Mollweide projection): dark green = disseminating; medium green = integrating; lightest green = negotiating

Integrating Asian Census Microdata
Map legend: dark green = disseminating; medium green = integrating; lightest green = talking
Respectful invitation to the National Statistical Offices of: Afghanistan, Bhutan, Iran, DPR Korea, DPR Laos, Maldives, Sri Lanka, Timor Leste

Outline: Disseminating Census Microdata
1. What are census microdata? (3 slides)
2. Electronic archiving of census microdata: do it now! (4 slides)
3. Why are census microdata essential? (2 slides)
4. IPUMS-International: invitation to participate (10 slides)
   What is IPUMS? What are the benefits? How are the integrated metadata and microdata constructed and accessed?
5. Conclusions (3 slides)

1. What are census microdata? And how do they differ from “raw data”? (3 slides)

16th-century Aztec census, written on fig-bark paper in Nahuatl; will survive another 500 years
Manuscript: transcribed, translated and converted to microdata
Sources:
Museo Nacional de Antropología e Historia (Mexico City). "Libro de Tributos," Colección Antigua, ms. 549 bis.
Sarah Cline, The Book of Tributes. Early Sixteenth-Century Nahuatl Censuses from Morelos. Los Angeles: 1993.

What are “census microdata”? Anonymized, computerized census records of individuals, households & dwellings
12100102600700720000011210000104
22200202600700720000011210000104
32300100600700720000012123000000
4230020040070000000000
5230020020070000000000
6230020000070000000000
» Study any desired set of characteristics.
» Easier to integrate than tables.
» Facilitate comparative research.
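The sample lines above are fixed-width strings of digits, so reading them requires a codebook that maps character positions to variables. The slide does not show that codebook; the Python sketch below therefore uses an invented layout (the names `line_no`, `rec_code` and `rest`, and their positions, are hypothetical) purely to illustrate the slicing technique.

```python
# Hypothetical layout: (name, start, end), 0-based and end-exclusive.
# These names and positions are invented for illustration; the real
# codebook for these records is not given in the slides.
LAYOUT = [
    ("line_no", 0, 1),
    ("rec_code", 1, 4),
    ("rest", 4, None),  # None means "to the end of the line"
]


def parse_record(line, layout=LAYOUT):
    """Split one fixed-width record into named string fields."""
    line = line.rstrip("\n")
    return {name: line[start:end] for name, start, end in layout}


# Two of the sample records shown on the slide (32 and 22 characters long).
records = [
    "12100102600700720000011210000104",
    "4230020040070000000000",
]
parsed = [parse_record(r) for r in records]
```

Fields are kept as strings so that leading zeros survive; converting to numbers is left to whatever the actual codebook prescribes.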

How do census microdata differ from “raw data”?
1. Detailed geography is suppressed.
2. Strict measures are implemented to protect the privacy of individuals, households, dwellings & other entities.
12100102600700720000011210000104
22200202600700720000011210000104
32300100600700720000012123000000
4230020040070000000000
5230020020070000000000
6230020000070000000000
Note the absence of detailed geography.

2. Digital Archiving (4 slides)
Census tablet (digital image): Assyria, 2700 B.P., Library of King Ashurbanipal

Bangladesh Bureau of Statistics tape archive (photo: April 14, 2006)
2009: Census data on most of these tapes were recovered.

Archiving: no longer a problem for recent censuses (generally excellent in the Asian agencies I have visited)
» Documentation (forms, instructions, definitions, dictionaries, methodological reports, etc.):
  Preserve at least two copies in at least 2 institutes
  » Paper
  » .PDF
  » and one of the following: .HTML, .DOC, .XLS, or .TXT
  Census docs: send 1 copy by pre-paid courier to MPC
» Data:
  Preserve at least two copies in at least 2 institutes on the most stable media (CD and servers)
  » Un-edited “raw data” (ASCII)
  » Edited data (ASCII)
  Census microdata: send a copy by pre-paid courier to MPC
Photo: 1981 census of Bangladesh, 3 tapes containing microdata. Even the moldy one was recovered!

IPUMSi RECOVERS
Centro Latino Americano y Caribeño de Demografía (CELADE: Santiago, Chile): ~3000 microdata tapes recovered and fully documented (funded by NSF)

IPUMSi RECOVERS
IPUMS now has the largest collection of census documentation in the world, having acquired paper/electronic archives from:
» United Nations Statistical Division
» United Nations Population Division
» CELADE (Latin America)
» East-West Center (Asia/Pacific)
» U.S. Census Bureau International Programs Center

Archived census microdata by region and decade: % of censuses conducted (inventory by IPUMS-International)
Columns: Region/continent, Countries, 2000s, 1990s, 1980s, 1970s, 1960s
Latin America: 21 countries; 100%, 89%, 81%, 72%
North America: 27 countries; 100%, 72%, 64%, 24%, 8%
Africa: 58 countries; 100%, 53%, 46%, 25%, 2%
Asia: 44 countries; 100%, 54%, 30%, 13%
Europe: 46 countries; 100%, 67%, 55%, 41%, 13%
Pacific (pop > .5 m): 7 countries; 100%, 43%, 29%
Note: cases confirmed by the corresponding official statistical institute. Some datasets remain to be certified. Some countries have not responded to the invitation to inventory their stocks of data.
Source: http://www.hist.umn.edu/~rmccaa/IPUMS/country6.htm (March 15, 2009)

What Asian census microdata and documentation still exist
» …for the 1960s? …1970s? …1980s? …1990s?
How much will be lost before they can be recovered, documented and archived?
Help preserve these treasures now: IPUMS pays the costs of shipping and recovery.

3. Why is the dissemination of census microdata essential? (2 slides)

Julia Lane, European Statisticians Conference (2003): 6 benefits from disseminating microdata
» 1. Analyze more realistic questions
» 2. Acquire new constituencies and stakeholders
» 3. Build trust; reduce suspicion
» 4. Replicate findings
  » a. use standards of UNSD, Eurostat, ISCO, ISCED, etc.
  » b. facilitate comparative research in time and space
» 5. Calculate marginal effects
» 6. Assess data quality
» …and much, much more…

UNSD Principles and Recommendations (Rev. 1, 1997) endorse dissemination of census microdata
» § 1.218: “There [are] a range of methods…that can be used to make such microdata available while still protecting individuals’ rights to privacy.”
» 2006 Africa Symposium on Statistical Development (Cape Town, Jan 30-Feb. 2, 2006): “microdata may be disseminated provided that confidentiality is preserved”
» Most (all?) advanced statistical agencies make census microdata available (some more widely than others). Since the:
  » 1960s: USA, Finland, France, Korea, plus 18 Latin American countries
  » 1970s: Canada, Czechoslovakia, Japan, Malaysia, Norway, Philippines
  » 1980s: Australia, Italy, Spain, Thailand, plus many Asian countries
  » 1990s: Germany, Russia, Switzerland, UK, plus many other countries
» In four decades of distributing census microdata there is not a single allegation of violation of confidentiality or privacy.

4. Invitation to participate in IPUMS-International (10 slides)

What is IPUMS-International?
…a global collaboratory of National Statistical Institutes & Universities to:
» 1. Inventory the world’s census microdata
» 2. Archive census microdata and documentation
» 3. Integrate census microdata
  » a. use standards of UNSD, Eurostat, ISCO, ISCED, etc.
  » b. facilitate comparative research in time and space
» 4. Anonymize census microdata to preserve statistical confidentiality, using highest standards
» 5. Disseminate restricted-access, custom extracts to approved researchers/research projects at no cost

IPUMS-International (2009): 130 high-precision samples, 44 countries, 279.5 million person records
Country            Censuses    Samples
France             1962-1999   6
Netherlands        1960-2001   3
Argentina          1970-2001   4
Ghana              2002
Palestine          1997
Armenia            2001        1
Greece             1981-2001   3
Panama             1960-2000   5
Austria            1971-2001   4
Guinea (Conakry)   1983-1996   1
Philippines        1990-2000   3
Belarus            1999        1
Hungary            1970-2001   4
Portugal           1981-2001   3
Bolivia            1976-2001   3
India              1983-1999   4
Romania            1977-2001   3
Brazil             1960-2001   3
Iraq               1996        1
Rwanda             1991-2002   2
Cambodia           1998        1
Israel             1972-1995   3
Slovenia           2001
Canada             1971-2001   4
Italy              2001        1
South Africa       1991-2007   3
Chile              1960-2002   5
Jordan             2004        1
Spain              1981-2001   3
China              1982-1990   2
Kenya              1989-1999   2
Uganda             1991-2002   2
Colombia           1964-2005   5
Kyrgyz Republic    1999
Costa Rica         1963-2000   4
Malaysia           1970-2000   4
United States      1960-2005   6
Ecuador            1962-2001   5
Mexico             1960-2005   5
Venezuela          1971-2001   4
Egypt              1996        1
Mongolia           1989-2000   2
Vietnam            1989-1999   2
United Kingdom     1991-2001   2

IPUMS-International strengths
1. Uniform legal authorization with each National Statistical Office
2. Access restricted to bona fide researchers with need
3. MPC: Experienced integration teams
4. MPC: Proven web-based distribution system
5. High user satisfaction
6. NSO: Improved research and empirically based policy-making
7. Sustainable: NSF, NIH funded through 2014

Legal: NSO (Austria) and U. of Minnesota

IPUMS microdata integration method: composite codes (multiple digits) not only retain significant distinctions but also integrate comparable concepts
Code  Label
0     NIU
      ACTIVE (In Labor Force)
100   EMPLOYED, not specified
111   At work, and 'student'
112   At work, and 'housework'
113   At work, and 'seeking work'
114   At work, and 'retired'
115   At work, and 'no work'
116   At work, and 'other'
117   At work, family holding, not specified
118   At work, family holding, not agricultural
119   At work, family holding, agricultural
120   Have job, not at work last week
Availability columns (X = code present in the sample): Chile 1992, Chile 2002, México 1990, México 2000

Metadata: Employment Status
EMPSTAT: Employment status

Description
EMPSTAT indicates whether or not the respondent was part of the labor force -- working or seeking work -- over a specified period of time. Depending on the sample, EMPSTAT can also convey further information. The first digit of EMPSTAT is fully comparable, and classifies the population into three groups: employed, unemployed, and inactive. The combination of employed and unemployed yields the total labor force. The second and third digits of EMPSTAT preserve additional information available for some countries and census years but not for others. Employment status is sometimes referred to in other sources as "activity status."

Comparability -- General
The age of persons to whom the question applies varies across the samples (see Universe). The reference period for the employment status question varies. For most samples, employment status was reported with respect to the day of the census or…
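The composite-code idea in the description can be sketched in a few lines of Python: the first digit carries the fully comparable group, and the remaining digits carry sample-specific detail. The slides confirm only that code 0 is NIU and that the 1xx codes belong to the employed; the assignment of 2 to unemployed and 3 to inactive is an assumption made here for illustration, and the function names are hypothetical.

```python
# First-digit groups. Only 0 (NIU) and 1 (employed) are supported by the
# code list in the slides; 2 and 3 are ASSUMED for this illustration.
GENERAL_GROUPS = {0: "NIU", 1: "employed", 2: "unemployed", 3: "inactive"}


def general_empstat(code: int) -> str:
    """Collapse a three-digit detailed code to its comparable first digit."""
    return GENERAL_GROUPS.get(code // 100, "unknown")


def in_labor_force(code: int) -> bool:
    """Employed plus unemployed yields the total labor force."""
    return general_empstat(code) in ("employed", "unemployed")


# 112 is "At work, and 'housework'"; 120 is "Have job, not at work last week".
detailed_codes = [0, 112, 120]
groups = [general_empstat(c) for c in detailed_codes]
```

Because comparisons across countries use only the first digit, an analyst can pool samples that differ in how much second- and third-digit detail they record.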

Integrate: retain all significant detail, harmonize everything
Not standardize: force square pegs in round holes

Comparability -- Mexico
The universe and reference period are fully comparable across the Mexico samples. The 1970 Census did not provide detail on the inactive population except for "houseworkers," while the later samples have numerous subcategories. In 1990, the employment status question refers to "Principal Activity" and therefore underreports secondary economic activity by students, housewives, family-workers, the semiretired, and others. The 2000 Census sought to overcome deficiencies in reporting work status for people whose primary activity was not work (students, housewives, retirees, etc.), but who in fact were working according to international definitions. A second question introduced for the first time in 2000 sought to capture this secondary economic activity. For strict comparability with earlier Mexican censuses, this recovered activity (coded “at work and …”) should be…

6 steps using https://www.ipums.org/international:
1. Log on with password
2a. Study documentation
2b. Design extract
3. Receive email; log on with password
4. Download extract (SSL encrypted) (also SAS, STATA)
5. Unzip data
6. Analyze

Project pays all costs, including:
» License fee to participating National Statistical Institute
» Asian Producer/User workshop, Durban, South Africa, 2009 (ISI)
Initiative work plan (3 years):
1. Establish legal foundations: negotiate Memorandum of Understanding: National Statistical Institute (NSI) & Minnesota Population Center (MPC)
2. Entrust copies of microdata and documentation to project (NSI)
3. License microdata (MPC pays US$5,000 per census to NSI, upon receipt of microdata, documentation and invoice)
4. Design regional harmonization protocols census-by-census, concept-by-concept, code-by-code and write integrated metadata (MPC)
5. Impose confidentiality protections customized for each census
6. Disseminate microdata to licensed users (MPC, NSI) free of charge
7. Cooperate with regional partners in education and training

IPUMS-EurAsia: Will your statistical institute participate?
» Formalities:
  1. Sign Memorandum of Understanding
  2. Entrust microdata and documentation to project
  3. Collect license fee
» 2009+: advise on technical details as needed; workshops as funding permits
» 2011: ISI meeting, Dublin, Ireland: inauguration of integrated database with 180 census samples
» 2013: ISI meeting, Hong Kong SAR, China: inauguration of integrated database with 220 census samples

5. Conclusions (3 slides)

Benefits from Disseminating Census Microdata
» National Statistical Institutes:
  1. Manage statistics for more equitable societies
  2. Increase trust, transparency and stakeholders
  3. Increase usage, better science and policy
  4. Enhance cost-benefit ratio
  5. Little marginal cost (project pays $5,000 per census)
» Citizens, Society and Government:
  1. Who we are
  2. What the future may bring
  3. How policies might be improved

Integrating Asian Census Microdata
Map legend: dark green = disseminating; medium green = integrating; lightest green = talking
Respectful invitation to the National Statistical Offices of: Afghanistan, Bhutan, Iran, DPR Korea, DPR Laos, Maldives, Sri Lanka, Timor Leste

What needs to be done to participate?
» National Statistical Office:
  1. Endorse project Memorandum of Understanding (80+ countries)
  2. Entrust copies of documentation & microdata to MPC (75+ countries)
  3. Invoice for each census:
     $1,000 per census for less than one million person records
     $5,000 per census for one million or more person records
» MPC:
  1. Endorse project Memorandum of Understanding: Afghanistan?, Bhutan?, Iran?, DPR Korea?, DPR Laos?, Maldives?, Timor Leste?
  2. Pay license fee for microdata and documentation: Indonesia!!
  3. Digitize metadata and translate to English: Pakistan!!
  4. Harmonize microdata: Cambodia!!
  5. Disseminate microdata with copies on CDs to each NSO: Vietnam!!
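The invoicing rule above is a simple threshold on the number of person records, which the following Python sketch makes explicit. The function name `license_fee_usd` is hypothetical; the two amounts and the one-million cut-off come from the slide.

```python
ONE_MILLION = 1_000_000


def license_fee_usd(person_records: int) -> int:
    """Fee per census: $1,000 below one million person records, else $5,000."""
    if person_records < 0:
        raise ValueError("person_records must be non-negative")
    return 1000 if person_records < ONE_MILLION else 5000


# Illustrative record counts only, not figures from any actual census sample.
fees = [license_fee_usd(n) for n in (250_000, 999_999, 1_000_000, 12_000_000)]
```

A census sample with exactly one million person records falls in the higher tier, since the slide words the lower tier as "less than one million".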

Thank you!!
https://www.ipums.org/international
Additional information at: www.hist.umn.edu/~rmccaa/IPUMSI
Contact: rmccaa@umn.edu