Using XML Mapper and XMLMAP to Read Data

Using XML Mapper and XMLMAP to Read Data Documented by Data Documentation Initiative (DDI) Files
Larry Hoyle
Policy Research Institute, University of Kansas

SAS is a registered trademark or trademark of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.

Overview
• A SAS program reads an XML metadata file and writes a SAS program to read the raw data file described by the metadata file.
[Diagram labels: makeReader.sas, 2825.xml, DDI.map, Read2825.sas, Da2825.txt, Work.ICPSR2825Household, Work.ICPSR2825Family, Work.ICPSR2825Person]

DDI
• “an international effort to establish a standard for technical documentation describing social science data” - http://www.icpsr.umich.edu/DDI/index.html

DDI Files
• XML
  • DTD - http://www.icpsr.umich.edu/DDI/users/dtd/index.html
• Metadata about:
  • The DDI file itself
  • The study that collected the data
  • The data file
  • Variables within the data file
  • Other Material

The Minimal DDI File
<?xml version="1.0"?>
<codeBook>
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl>Howdy World: Valid but Useless Metadata</titl>
      </titlStmt>
    </citation>
  </stdyDscr>
</codeBook>
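To make the nesting concrete, here is a minimal sketch that walks the minimal DDI file down to its title. It is written in Python rather than SAS, purely as an illustration, and is not part of makeReader.sas. The function name study_title is an assumption introduced for this example.

```python
import xml.etree.ElementTree as ET

# The minimal DDI file from the slide, held in a string.
MINIMAL_DDI = """<?xml version="1.0"?>
<codeBook>
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl>Howdy World: Valid but Useless Metadata</titl>
      </titlStmt>
    </citation>
  </stdyDscr>
</codeBook>"""


def study_title(ddi_text):
    """Return the study title (hypothetical helper for illustration)."""
    root = ET.fromstring(ddi_text)  # root is the <codeBook> element
    # Follow the element path codeBook/stdyDscr/citation/titlStmt/titl.
    return root.findtext("stdyDscr/citation/titlStmt/titl")


print(study_title(MINIMAL_DDI))
```

The path string plays the same role as a column path in an XMLMAP file: it names the route from the document root to the value of interest.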

Real Example: ICPSR 6084 Raw Data File
100001 161132146211115555299 99911219 200001 49992 30000102534 100002 000325222641942 12213112421111212112222221121 2122 12 200002 30000202574 3834101202 12221 000756221622052 4261103202

ICPSR 6084 – About The Study
<citation>
  <titlStmt>
    <titl>CBS News Monthly Poll #2, August 1992</titl>

ICPSR 6084 – About The File
<dimensns>
  <caseQnty>1,546</caseQnty>
  <varQnty>70</varQnty>
  <logRecL>80</logRecL>
  <recPrCas>3</recPrCas>

ICPSR 6084 – Reading The File With SAS
<dimensns>
  <caseQnty>1,546</caseQnty>
  <varQnty>70</varQnty>
  <logRecL>80</logRecL>
  <recPrCas>3</recPrCas>
  <recNumTot>4,638</recNumTot>
</dimensns>

infile 'C:DDRIVEdataicpsrdata6084da6084.txt' LRECL=80 PAD;

More From ICPSR 6084 – first variable

More From ICPSR 6084 – Reading the first variable
input #1 cardno 1-1

More From ICPSR 6084 – another variable
#3 respno 2-6
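The two fragments above combine into one column-input statement. As a hedged sketch of that assembly step (in Python, not the SAS logic of makeReader.sas), the function below builds the statement from a list of (record, name, start, end) tuples. The function name input_statement and the tuple layout are assumptions, and the sketch covers only numeric variables (character variables would also need a $).

```python
def input_statement(variables):
    """Build a SAS column-input statement from (record, name, start, end) tuples."""
    parts = []
    current_record = None
    # Order variables by record number, then by starting column.
    for record, name, start, end in sorted(variables, key=lambda v: (v[0], v[2])):
        if record != current_record:
            # A #n pointer moves the input pointer to record n of the case.
            parts.append(f"#{record}")
            current_record = record
        parts.append(f"{name} {start}-{end}")
    return "input " + " ".join(parts) + ";"


# The two variables shown on the slides for ICPSR 6084.
variables = [(1, "cardno", 1, 1), (3, "respno", 2, 6)]
print(input_statement(variables))
```

For the two variables shown, this prints: input #1 cardno 1-1 #3 respno 2-6;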

The Tasks
• Pull the necessary information from a hierarchical XML file into SAS as tables
  • Use the XML libname engine with an XMLMAP file
• Use that information in SAS to read the raw data file
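The first task, turning a hierarchy into rows and columns, can be sketched outside SAS. The Python example below is only an analogy for what an XMLMAP file declares: a row path selects the repeating element, and each column path locates a value relative to that row. The sample XML is a hypothetical fragment; the attribute names StartPos, EndPos, and RecSegNo are assumed here rather than taken from the slides, and table_from_xml is a name invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical DDI-like fragment; attribute names are assumptions.
SAMPLE = """<codeBook>
  <dataDscr>
    <var name="cardno"><location StartPos="1" EndPos="1" RecSegNo="1"/></var>
    <var name="respno"><location StartPos="2" EndPos="6" RecSegNo="3"/></var>
  </dataDscr>
</codeBook>"""


def table_from_xml(xml_text, row_path, columns):
    """Return one dict per element matching row_path.

    columns maps a column name to (relative_path, attribute);
    an attribute of None means "use the element text".
    """
    root = ET.fromstring(xml_text)
    rows = []
    for element in root.findall(row_path):  # each match becomes one row
        row = {}
        for column, (path, attribute) in columns.items():
            target = element if path == "." else element.find(path)
            if target is None:
                row[column] = None  # element absent in this row
            elif attribute is None:
                row[column] = (target.text or "").strip()
            else:
                row[column] = target.get(attribute)
        rows.append(row)
    return rows


columns = {
    "name": (".", "name"),
    "start": ("location", "StartPos"),
    "end": ("location", "EndPos"),
    "record": ("location", "RecSegNo"),
}
for row in table_from_xml(SAMPLE, "dataDscr/var", columns):
    print(row)
```

Each var element yields one row, which is the shape of table the second task needs in order to write INPUT statements.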

Making the XMLMAP File – SAS XML Mapper

Defining Tables – What Defines Rows
• Drag the element that defines rows to the root of the XMLMAP structure

Defining Tables – Row Defined

Defining Tables – What Defines Columns
• Drag an element that defines a column to the root of the table

Defining Tables – Column Defined

Viewing The XMLMap File

Viewing The XMLMap File – Row Path

Viewing The XMLMap File – Column Path

Viewing Sample SAS Code

Previewing the Table

XMLMapper Limitations
• Not every XML file will have all the elements of any possible XML file of that type.
  • Use XML Schema instead of XML file
• An XML Schema file may not work
  • XML file type defined by DTD
  • XML Schema too complex for XML Mapper

What then
• You can use XMLMapper to start and then hand edit the XMLMAP file.

Lots of Tables From DDI Mostly for Comments
DATADSCR_VARGRP
DATADSCR_VAR_CATGRY
DATADSCR_VAR_INVALRNG_ITEM
DATADSCR_VAR_INVALRNG_RANGE
DATADSCR_VALRNG_ITEM
DATADSCR_VALRNG_RANGE
DOCDSCR_CITATION__AUTHENTY
DOCDSCR_CITATION__COPYRIGHT
DOCDSCR_CITATION__IDNO
DOCDSCR_CITATION__OTHID
DOCDSCR_CITATION__PRODDATE
DOCDSCR_CITATION__PRODUCER
DOCDSCR_CITATION__TITL
FILEDSCR_FILETXT_RECGRP
STDYDSCR_CITATION_BIBLCIT
STDYDSCR_CITATION_TITLSTMT
STDYDSCR_CITATION_VERSTMT
STDYDSCR_CITATION__AUTHENTY
STDYDSCR_CITATION__COPYRIGHT
STDYDSCR_CITATION__DISTRBTR
STDYDSCR_CITATION__FUNDAG
STDYDSCR_CITATION__GRANTNO
STDYDSCR_CITATION__PRODDATE
STDYDSCR_CITATION__PRODUCER
STDYDSCR_CITATION__SOFTWARE
STDYDSCR_METHOD__COLLMODE
STDYDSCR_METHOD__DATACOLLECTOR
STDYDSCR_METHOD__FREQUENC
STDYDSCR_METHOD__RESINSTRU
STDYDSCR_METHOD__SAMPPROC
STDYDSCR_METHOD__TIMEMETH
STDYDSCR_METHOD__WEIGHT
STDYDSCR_STDYINFO_ABSTRACT
STDYDSCR_STDYINFO__ANLYUNIT
STDYDSCR_STDYINFO__COLLDATE
STDYDSCR_STDYINFO__DATAKIND
STDYDSCR_STDYINFO__GEOGCOVER
STDYDSCR_STDYINFO__KEYWORD
STDYDSCR_STDYINFO__NATION
STDYDSCR_STDYINFO__TIMEPRD
STDYDSCR_STDYINFO__TOPCCLAS
STDYDSCR_STDYINFO__UNIVERSE

Write a SAS Program – Metadata Comment
data _null_;
  file reader lrecl=1024;
  length vEdited $ 2000;
  set DDIfile.stdyDscr_citation_titlStmt;
  /* On the first row, open a block comment in the generated program */
  if _n_=1 then put '/*' / ' SAS program to read ' agency ' ' IDNo;
  /* Replace tabs with blanks and neutralize any comment terminator in the title */
  stdyDscrTitl = compbl(tranwrd(translate(stdyDscrTitl, ' ', '09'x), '*/', '*_/'));
  put 'Study Title' _n_ ': ' stdyDscrTitl;
  altTitl = compbl(tranwrd(translate(altTitl, ' ', '09'x), '*/', '*_/'));
  put ' ' altTitl;

Generated output:
/*
 SAS program to read ICPSR 6084
Study Title 1 : CBS News Monthly Poll #2, August 1992
 August National Poll II, Republican National Convention

Cntlin file for Formats
data makeTheFormats;
  input fmtname $ 1-7 type $ 9-9 start $ 11-26 default 28-35
        / label : &$512.;
datalines;
V00006f N 1                1
Yes
V00006f N 3                1
Converted Refusal
;
run;
proc format cntlin=makeTheFormats;
run;

Input
• Logic to produce different input statements for:
  • Fixed column data
  • Delimited data

Output
• Multiple datasets if different record types
  • Separate keep dataset options
  • Logic to output to appropriate dataset

if "&fileStructureType" eq "hierarchical" then do;
  put 'if left(_RecordSetIdentifier) eq left("' catValu '") then DO; '
    / " output " safeAgency +(-1) safeIDNo +(-1) rectype '; '
    / 'END; ' //;
end;

Generated code:
if left(_RecordSetIdentifier) eq left("2 ") then DO;
 output ICPSR2825FAMILY ;
END;
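The same generation step can be sketched in Python to show what the PUT statement is assembling. This is an illustration only, not the makeReader.sas implementation. The function name routing_statements and its argument layout are assumptions; the sample values come from the generated code shown on the slide.

```python
def routing_statements(agency, idno, record_types):
    """Generate SAS statements that send each record type to its own dataset.

    record_types is a list of (identifier_value, record_type_name) pairs.
    """
    lines = []
    for value, rectype in record_types:
        # Dataset name is agency + study number + record type, with no gaps.
        dataset = f"{agency}{idno}{rectype}"
        lines.append(f'if left(_RecordSetIdentifier) eq left("{value}") then DO;')
        lines.append(f" output {dataset} ;")
        lines.append("END;")
    return "\n".join(lines)


print(routing_statements("ICPSR", "2825", [("2 ", "FAMILY")]))
```

Passing more (value, name) pairs would emit one IF block per record type, which is how a hierarchical file ends up split across several datasets.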

Some of the Limitations
• DDI can describe nCubes, geographic coverage, variable groups
  • Current makeReader.sas can’t handle these
• DDI definition includes recursive elements
  • E.g. recGrps within recGrps
  • Current makeReader.sas would not find nested elements
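The recursion problem is easy to reproduce with any path-based lookup. In the Python sketch below (an analogy, not the SAS code), a fixed path finds only the outermost recGrp, while a recursive walk finds every level. The XML fragment, the rectype attribute, and the nesting shown are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment with recGrp elements nested inside recGrp elements.
NESTED = """<fileTxt>
  <recGrp rectype="HOUSEHOLD">
    <recGrp rectype="FAMILY">
      <recGrp rectype="PERSON"/>
    </recGrp>
  </recGrp>
</fileTxt>"""

root = ET.fromstring(NESTED)

# A fixed path matches only recGrp elements directly under fileTxt.
top_level_only = [group.get("rectype") for group in root.findall("recGrp")]

# iter() walks the whole subtree, so nested recGrp elements are included.
all_levels = [group.get("rectype") for group in root.iter("recGrp")]

print(top_level_only)
print(all_levels)
```

A map built from fixed paths behaves like the first lookup, which is why nested record groups would be missed.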

Questions?

About the Speaker
Larry Hoyle
Associate Scientist
Policy Research Institute, University of Kansas
1541 Lilac Lane
Lawrence, KS 66044-3177
LarryHoyle@ku.edu