Public Use Microdata Samples Using PDQ Explore Software

Grace York, University of Michigan Library, May 2004

2000 Census Data Tabulations
• Summary Files 1-4, Equal Employment Opportunity, School District Data, and Work Flow data are TABULATED data
• American FactFinder EXTRACTS the tabulated data

Public Use Microdata Samples
• Copies of the original questionnaires with identifying information edited out
• Create your own cross tabulations of census data

Typical PUMS Questions
• Single years of age by sex for teachers in Michigan (e.g. when will they retire?)
• Race of those with Arab ancestry (no, they are not all white)
• Demographic characteristics of immigrants from Senegal (age, sex, education, occupation, income, citizenship for a social survey)
• Age, race and sex of automotive industry employees (campaign for organ donations)

PUMS Software Programs
• FTP data from Census Bureau (and manipulate with SAS or SPSS) http://www.census.gov/Press-Release/www/2003/PUMS5.html
• Census Bureau CD-ROMs (Beyond 20/20 software) http://www.census.gov/mp/www/Tempcat/PUMS.html
• SDA Software for Michigan (UMich Only) http://nds.umdl.umich.edu/n/nds/
• PDQ Explore http://www.pdq.com

PDQ Explore Software
• Easy interface to
  – Public Use Microdata Samples, 1% and 5%, 1980-2000
  – IPUMS, edited PUMS, 1850-1880, 1900-1920, 1940-1990
  – Current Population Survey, 1991+
  – Mortality Schedules
• Permits users to tabulate their own variables

Access to PDQ
• Librarians may request free IDs, passwords, and software from PDQ
• Send e-mail to info@pdq.com stating that
  – You are a librarian who talked to Grace York
  – You are requesting an ID and password for using PDQ Explore
  – You want to download software for the PDQ Toolbox, Expert Edition
http://www.pdq.com

Software
• Download the software to your hard drive per the instructions
• To begin searching, open the icon on your desktop

Before Beginning… Choose File
Two PUMS files: 1% and 5% sample
• 1% has data for the nation, states, MSAs and super-PUMAs (areas of 400,000 or more people)
• 5% has data for the nation, states, MSAs and PUMAs (areas of 100,000 or more people)

Before Beginning…
Define the data you want in terms of a spreadsheet. The longer dimension should be defined as rows rather than columns.
Example: "I want single years of age by sex for all Vietnam-era veterans in the United States"
• Universe = Vietnam-era veterans in the U.S.
• Column = sex (not very wide)
• Row = single years of age (could be long)
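The universe/row/column idea can be sketched outside PDQ Explore as a simple counting exercise. The Python below is only an illustration: the records are invented, the helper name crosstab is hypothetical, and the codes sex=1/2 and vps5=0 for "no" are assumptions (the slides only state that vps5=1 means Vietnam-era veteran). It does not show how PDQ works internally.

```python
from collections import Counter

# Toy records invented for illustration; these are NOT real PUMS data.
# Assumed coding: sex 1 = male, 2 = female; vps5 1 = Vietnam-era veteran.
records = [
    {"age": 52, "sex": 1, "vps5": 1},
    {"age": 52, "sex": 2, "vps5": 1},
    {"age": 53, "sex": 1, "vps5": 1},
    {"age": 52, "sex": 1, "vps5": 1},
    {"age": 30, "sex": 1, "vps5": 0},
]


def crosstab(rows, universe, row_var, col_var):
    """Count the records inside the universe, keyed by (row value, column value)."""
    return Counter((r[row_var], r[col_var]) for r in rows if universe(r))


# Universe = Vietnam-era veterans, Row = age, Column = sex.
table = crosstab(records, lambda r: r["vps5"] == 1, "age", "sex")

for (age, sex), count in sorted(table.items()):
    print(age, sex, count)
```

The record with vps5=0 falls outside the universe and is never counted, which is the role the Universe box plays in the query form.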

Before Beginning…
Consult Chapter 7 of the PUMS codebook to check the possible variables, and the appendices for place, language, ancestry, and occupation codes: http://www.census.gov/prod/cen2000/doc/pums.pdf
Chapter 7 is also available on the University of Michigan web site at: http://www.lib.umich.edu/govdocs/census2/pums2000/pums7.pdf

Before Beginning…
Housing Record
• All geographic codes (state, MSA, PUMA)
• All housing records
• Some population records
Population Record
• All population variables
• OK to combine with geographic codes in housing
Ask for help with other population/housing combinations at: info@pdq.com

Before Beginning…
Variable Codes for the Question in the Technical Documentation Data Dictionary
AGE     Single Years of Age
SEX     Male or Female
VPS5    Veteran’s Period of Service 5: On active duty during the Vietnam Era (Aug. 1964 to Apr. 1975)
http://www.lib.umich.edu/govdocs/census2/pums2000/pums7.pdf

Logging On
Enter the subscriber name and password that you were given by the PDQ staff

Logging On
Press OK to close the message of the day

Defining Workspace
• To conduct a new search, create a new workspace
• Press Finish or return twice

Defining Workspace
Name the file and save it to your hard drive.

Defining Workspace
At the next screen, use the top menu to choose Workspace, then Add a Data Set

Defining Workspace
Browse the data sets; highlight the ipums, cps, or mortality file; then click Open

Defining Variables
• Once you choose a data set, its codebook will open up
• Click on the plus button to get a list of variables, their alphabetic symbols, and any numeric values

Defining Variables
• Determine the alphanumeric variables you want (e.g. Vietnam-era veteran: yes is VPS5=1)
• Use the top menu to choose Query/Setup New Expert Query
(Access the codebook later through a tab on the desktop toolbar)

Expert Query Form
• Make sure you have the correct data set
• Determine if you want a tabulation (counts or numbers)
• Name your file

Expert Query Form
Enter the code for UNIVERSE (what you’re counting) in the Universe box (e.g. vps5=1 selects Vietnam-era veterans for the entire U.S.)

Expert Query Form
• Enter the code for the variables in the ROW box (age = single years of age; age/5 would be five-year age groups)
• Enter the code for the variables in the COLUMN box (e.g. sex)
• Press RESULTS to run the query
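Why age/5 yields five-year groups can be shown with a little arithmetic. The sketch below assumes the division drops the remainder, which is what the slide's description implies; the function name five_year_group is hypothetical, and Python's // operator stands in for the PDQ expression.

```python
def five_year_group(age):
    """Return the (low, high) bounds of the five-year group that contains age."""
    bucket = age // 5  # whole-number division, assumed to mirror age/5 in the ROW box
    return bucket * 5, bucket * 5 + 4


# Ages 50 through 54 all land in the same bucket (10), so they share one row.
for age in (50, 52, 54, 55):
    print(age, five_year_group(age))
```

Under this assumption, every age from 50 to 54 maps to bucket 10, and 55 starts bucket 11.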

Search Results
Search results appear in spreadsheet format

Saving Results
• Click on File/Export Query Results
• You can save as CSV, tab delimited, and several other formats. CSV (WYSIWYG) is recommended for use with Excel
• Use the SETUP button to return to the query, or the icon at the bottom to review the codebook

Geographic Codes
• Geographic codes are found in the Housing documentation
• Limit files to Michigan with the code state=26
• Click on Query/New Expert Query to continue

Narrowing the Universe
Narrow the universe by appending & and a new condition (e.g. vps5=1 & state=26)

Logical Operators in PDQ
http://www.lib.umich.edu/govdocs/census2/pdqop.pdf
& is one of numerous operators used in PDQ

Operator     Name                     Example/Comment
x:a..b       range                    age:15..44
unary +      plus                     sex=+1 (never needed)
unary -      minus                    income4<=-1000
*            multiply                 73*income1/100
/            divide                   rhhinc/persons
%            modulo                   subsample%10
+            add                      income1+income2
-            subtract                 rhhinc-rearning
<            less than                age<65
>            greater than             age>64
<=           less than or equal       age<=65
>=           greater than or equal    age>=65
= or ==      equal                    age=23
!= or <>     not equal                income!=0
& or &&      and                      race=2 & looking=1
^            exclusive or             bit-wise--use with caution
| or ||      or                       age<18 | age>=65
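For readers who check exported results in a general-purpose language, a few of the example expressions above can be restated in Python. This is a loose translation, not PDQ syntax: the record is invented, the range is assumed to include both end points, and Python needs == for equality and and/or for the logical operators (Python's own & and | are bit-wise and bind more tightly than comparisons).

```python
# Toy record invented for illustration; the values are not real data.
person = {"age": 70, "race": 2, "looking": 0, "income1": 1200, "income2": 300}

# PDQ: age<18 | age>=65
child_or_senior = person["age"] < 18 or person["age"] >= 65

# PDQ: race=2 & looking=1
both_conditions = person["race"] == 2 and person["looking"] == 1

# PDQ: age:15..44  (range, assumed inclusive at both ends)
in_age_range = 15 <= person["age"] <= 44

# PDQ: income1+income2
total_income = person["income1"] + person["income2"]

print(child_or_senior, both_conditions, in_age_range, total_income)
```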

Altering the Spreadsheet Tabulations
Once you have a spreadsheet, click on Options to create totals or percentages for tables or columns

Adding More Parameters
Expand the table detail by repeating the row and column data for another parameter (e.g. race) as shown in Dimension 3

Altering Spreadsheet Appearance
• The default shows separate tables for each of the values in the third dimension (e.g. separate spreadsheets for white and black)
• Change the Axis 3 tab to FOREACH to put everything on the same spreadsheet

Calculating Means or Averages
• Calculate averages by changing the query type at the top to summary statistics (e.g. mean or average)
• Fill in the new Describe Expression box at the bottom with a variable code (e.g. age, income)

Complex Table
Mean income of white male Vietnam-era veterans in Michigan by age, whether or not they have earnings. You can respecify the query to include only veterans with earnings.

Altering Mean Income
Add & incws > 0 to the universe to count only Vietnam-era veterans who are earning more than $0

Complex Table
Mean income is higher when the data are limited to wage-earning veterans
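The effect of adding incws > 0 to the universe can be reproduced with a small calculation. The sketch below uses invented records and a hypothetical helper, mean_incws; it assumes incws is also the variable being averaged, which the slides do not state. It only illustrates why dropping zero earners raises the mean.

```python
from statistics import mean

# Toy records invented for illustration; these are NOT real PUMS data.
records = [
    {"vps5": 1, "state": 26, "incws": 0},
    {"vps5": 1, "state": 26, "incws": 30000},
    {"vps5": 1, "state": 26, "incws": 50000},
    {"vps5": 1, "state": 39, "incws": 45000},  # a different state code, excluded
    {"vps5": 0, "state": 26, "incws": 60000},  # not a Vietnam-era veteran, excluded
]


def mean_incws(rows, universe):
    """Average incws over the records that satisfy the universe condition."""
    return mean([r["incws"] for r in rows if universe(r)])


# Universe: vps5=1 & state=26
mean_all = mean_incws(records, lambda r: r["vps5"] == 1 and r["state"] == 26)

# Universe: vps5=1 & state=26 & incws > 0
mean_earners = mean_incws(
    records, lambda r: r["vps5"] == 1 and r["state"] == 26 and r["incws"] > 0
)

print(round(mean_all, 2), mean_earners)
```

Removing the zero-income record leaves the same total spread over fewer people, so the mean rises.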

Small Area Geography
• Data from the PUMS 5% file is available for states, metropolitan areas, and Public Use Microdata Areas (PUMAs) of 100,000
• You can identify a PUMA or group of PUMAs using
  – Maps in American FactFinder (http://factfinder.census.gov/)
  – PDF maps on the Census Bureau web site (http://www.census.gov/geo/www/maps/puma5pct.htm)
  – Mable/Geocorr Search Engine (http://mcdc2.missouri.edu/websas/geocorr2k.html)

Small Area Geography
This map shows Detroit as PUMAs 3701-3708

PUMA Codes for Michigan
Ann Arbor       3200
Detroit         3701-3708
Flint           2200
Grand Rapids    1300
Lansing         1800
PUMA to Place: http://www.lib.umich.edu/govdocs/census2/pumapl00.txt
Place to PUMA: http://www.lib.umich.edu/govdocs/census2/plpuma00.txt

Codebook and PUMAs
The Explore Codebook shows PUMA5 as the term for 5% PUMA boundaries

Small Area Geography and Ranges
When creating data sets for PUMAs, be sure to include the correct state in the universe (e.g. state=26)

Small Area Geography and Ranges
puma5:3701..3708 lists the data for each individual area

Small Area Geography and Ranges
Search result for each individual PUMA

Small Area Geography for Ranges
To get the total for the area, list it in the universe as puma5 > 3700 & puma5 < 3709 & state=26
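The range form and the pair of inequalities pick out the same eight Detroit PUMA codes. The check below is a sketch that assumes the a..b range includes both end points; the helper in_range is hypothetical, and the candidate codes are simply every integer from 3600 to 3799.

```python
def in_range(value, low, high):
    """Inclusive range test, assumed to mirror low..high in a PDQ expression."""
    return low <= value <= high


candidates = range(3600, 3800)

# Range form: 3701..3708
by_range = [code for code in candidates if in_range(code, 3701, 3708)]

# Universe form: puma5 > 3700 & puma5 < 3709
by_bounds = [code for code in candidates if code > 3700 and code < 3709]

print(by_range == by_bounds, len(by_range))
```

The difference between the two slides is therefore where the condition sits: as a range in a row or column it breaks out each PUMA, while in the universe it pools them into one total.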

Small Area Geography for Ranges
To get a listing of single years of age between 65 and 85, list the column as age:65..85

Calculating Totals
• To find the most spoken languages among 65-85 year olds as a group, add row totals
• Click on Options/Total Options/Row
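A row total simply adds up every age column for a given language, after which the languages can be ranked. The sketch below uses invented counts and placeholder language names; it is not output from PDQ Explore and says nothing about the actual Detroit figures.

```python
from collections import defaultdict

# Toy counts invented for illustration: (language, age) -> number of persons.
cells = {
    ("language A", 65): 4,
    ("language A", 66): 3,
    ("language B", 65): 5,
    ("language B", 66): 1,
    ("language C", 65): 2,
}

# Sum across the age columns to get one total per language row.
row_totals = defaultdict(int)
for (language, _age), count in cells.items():
    row_totals[language] += count

# Rank the languages by their row total, largest first.
ranked = sorted(row_totals.items(), key=lambda item: item[1], reverse=True)

for language, total in ranked:
    print(language, total)
```

Note that language B leads at age 65 alone, while language A leads once the ages are totalled, which is why the group total is needed to answer the question.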

Complex Result
Spanish and Polish are the two most commonly spoken languages among seniors aged 65-85 in Detroit

Contacts for Research Assistance
Initial Queries
• Grace York, Documents Center, 203 Hatcher, graceyor@umich.edu or 936-2378
• JoAnn Dionne, Numeric and Spatial Data Services, 825 Hatcher, jdionne@umich.edu, 763-9408
Complex Data Sets
• Lisa Neidert, Population Studies Center, 426 Thompson, lisan@umich.edu, 763-2163
• PDQ Staff, 310 Depot Street, Suite C, Ann Arbor 48104, info@pdq.com