Using XML Mapper and XMLMAP to Read Data
Using XML Mapper and XMLMAP to Read Data Documented by Data Documentation Initiative (DDI) Files
Larry Hoyle
Policy Research Institute, University of Kansas
SAS is a registered trademark or trademark of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.
Overview
- A SAS program reads an XML metadata file and writes a SAS program to read the raw data file described by the metadata file.
[Figure: makeReader.sas uses DDI.map to read 2825.xml and writes Read2825.sas; Read2825.sas reads Da2825.txt into Work.ICPSR2825Household, Work.ICPSR2825Family, and Work.ICPSR2825Person.]
DDI
- “an international effort to establish a standard for technical documentation describing social science data” - http://www.icpsr.umich.edu/DDI/index.html
DDI Files
- XML
  • DTD - http://www.icpsr.umich.edu/DDI/users/dtd/index.html
- Metadata about:
  • The DDI file itself
  • The study that collected the data
  • The data file
  • Variables within the data file
  • Other material
The Minimal DDI File
<?xml version="1.0"?>
<codeBook>
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl>Howdy World: Valid but Useless Metadata</titl>
      </titlStmt>
    </citation>
  </stdyDscr>
</codeBook>
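As a rough illustration of pulling one value out of this hierarchy (written in Python rather than SAS, and not part of the author's makeReader.sas), the sketch below uses the standard-library `xml.etree.ElementTree` to read the study title from the minimal file. It assumes the file has no XML namespace; a namespaced DDI file would need qualified tag names in the path. The helper name `study_title` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The minimal DDI file from the slide (DDI 2 element names, no namespace).
MINIMAL_DDI = """<?xml version="1.0"?>
<codeBook>
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl>Howdy World: Valid but Useless Metadata</titl>
      </titlStmt>
    </citation>
  </stdyDscr>
</codeBook>"""


def study_title(ddi_text: str) -> str:
    """Return the text of codeBook/stdyDscr/citation/titlStmt/titl, or ''."""
    root = ET.fromstring(ddi_text)  # root is the <codeBook> element
    titl = root.find("stdyDscr/citation/titlStmt/titl")  # path relative to root
    if titl is None or titl.text is None:
        return ""
    return titl.text.strip()


print(study_title(MINIMAL_DDI))  # Howdy World: Valid but Useless Metadata
```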
Real Example: ICPSR 6084 Raw Data File
100001 161132146211115555299 99911219 200001 49992 30000102534 100002 000325222641942 12213112421111212112222221121 2122 12 200002 30000202574 3834101202 12221 000756221622052 4261103202
ICPSR 6084 – About The Study
<citation>
  <titlStmt>
    <titl>CBS News Monthly Poll #2, August 1992</titl>
ICPSR 6084 – About The File
<dimensns>
  <caseQnty>1,546</caseQnty>
  <varQnty>70</varQnty>
  <logRecL>80</logRecL>
  <recPrCas>3</recPrCas>
ICPSR 6084 – Reading The File With SAS
<dimensns>
  <caseQnty>1,546</caseQnty>
  <varQnty>70</varQnty>
  <logRecL>80</logRecL>
  <recPrCas>3</recPrCas>
  <recNumTot>4,638</recNumTot>
</dimensns>
infile 'C: DDRIVEdataicpsrdata6084da 6084. txt' LRECL=80 PAD;
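A minimal Python sketch of how the `dimensns` values above could drive the generated INFILE statement. The function `infile_statement` and the path `da6084.txt` are hypothetical. It also checks that caseQnty times recPrCas equals recNumTot (1,546 times 3 is 4,638 here), since DDI counts are stored as text that may contain thousands separators.

```python
import xml.etree.ElementTree as ET

DIMENSNS = """<dimensns>
  <caseQnty>1,546</caseQnty>
  <varQnty>70</varQnty>
  <logRecL>80</logRecL>
  <recPrCas>3</recPrCas>
  <recNumTot>4,638</recNumTot>
</dimensns>"""


def to_int(text: str) -> int:
    """DDI counts may contain thousands separators, e.g. '1,546'."""
    return int(text.replace(",", "").strip())


def infile_statement(dimensns_xml: str, path: str) -> str:
    d = ET.fromstring(dimensns_xml)
    cases = to_int(d.findtext("caseQnty"))
    lrecl = to_int(d.findtext("logRecL"))
    per_case = to_int(d.findtext("recPrCas"))
    total = d.findtext("recNumTot")
    # Sanity check: cases times records per case should equal the record total.
    if total is not None and cases * per_case != to_int(total):
        raise ValueError("caseQnty * recPrCas does not match recNumTot")
    # PAD fills short records with blanks out to LRECL, as on the slide.
    return f"infile '{path}' LRECL={lrecl} PAD;"


print(infile_statement(DIMENSNS, "da6084.txt"))
```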
More From ICPSR 6084 – first variable
More From ICPSR 6084 – Reading the first variable
input #1 cardno 1-1
More From ICPSR 6084 – another variable
#3 respno 2-6
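The step from variable metadata to column input can be sketched the same way. This assumes each DDI 2 `var` carries a `location` element with `StartPos`, `EndPos`, and `RecSegNo` (record number within the case) attributes; the two-variable fragment is made up to reproduce the two slides, and `input_pieces` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical dataDscr fragment reproducing the two variables on the slides.
DATA_DSCR = """<dataDscr>
  <var name="cardno"><location StartPos="1" EndPos="1" RecSegNo="1"/></var>
  <var name="respno"><location StartPos="2" EndPos="6" RecSegNo="3"/></var>
</dataDscr>"""


def input_pieces(data_dscr_xml: str) -> list[str]:
    """Build '#record name start-end' pieces for a SAS column-input statement."""
    root = ET.fromstring(data_dscr_xml)
    pieces = []
    for var in root.iter("var"):
        loc = var.find("location")
        record = loc.get("RecSegNo", "1")  # assume record 1 when not given
        pieces.append(f"#{record} {var.get('name')} {loc.get('StartPos')}-{loc.get('EndPos')}")
    return pieces


print("input " + " ".join(input_pieces(DATA_DSCR)) + ";")
# input #1 cardno 1-1 #3 respno 2-6;
```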
The Tasks
- Pull the necessary information from a hierarchical XML file into SAS as tables
  • Use the XML LIBNAME engine with an XMLMAP file
- Use that information in SAS to read the raw data file
Making the XMLMAP File – SAS XML Mapper
Defining Tables – What Defines Rows
Drag the element that defines rows to the root of the XMLMAP structure.
Defining Tables – Row Defined
Defining Tables – What Defines Columns
Drag an element that defines a column to the root of the table.
Defining Tables – Column Defined
Viewing The XMLMap File
Viewing The XMLMap File – Row Path
Viewing The XMLMap File – Column Path
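The row path / column path idea can be imitated in a few lines of Python: every element matched by the row path becomes one row, and each column path is evaluated relative to that element. This is only an analogy to what the XML LIBNAME engine does with an XMLMAP, not its implementation. `extract_table`, the `@name` convention for attributes, and the sample document are assumptions, and the ElementTree paths are relative to the root rather than absolute like XMLMAP paths.

```python
import xml.etree.ElementTree as ET

DDI = """<codeBook>
  <dataDscr>
    <var name="cardno"><labl>Card number</labl></var>
    <var name="respno"><labl>Respondent number</labl></var>
  </dataDscr>
</codeBook>"""


def extract_table(root, row_path, columns):
    """One row per element matched by row_path; column paths are relative to it.

    A column path starting with '@' names an attribute; otherwise it names a
    child element whose text becomes the value (None if missing).
    """
    table = []
    for row_elem in root.findall(row_path):
        row = {}
        for col_name, col_path in columns.items():
            if col_path.startswith("@"):
                row[col_name] = row_elem.get(col_path[1:])
            else:
                row[col_name] = row_elem.findtext(col_path)
        table.append(row)
    return table


root = ET.fromstring(DDI)
rows = extract_table(root, "dataDscr/var", {"name": "@name", "label": "labl"})
print(rows)
```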
Viewing Sample SAS Code
Previewing the Table
XML Mapper Limitations
- A particular XML file may not contain every element that its file type allows.
  • Use an XML Schema instead of an XML file
- An XML Schema file may not work
  • The file type is defined by a DTD rather than a schema
  • The XML Schema is too complex for XML Mapper
What Then?
- Use XML Mapper to get started, then hand-edit the XMLMAP file.
Lots of Tables From DDI, Mostly for Comments
DATADSCR_VARGRP
DATADSCR_VAR_CATGRY
DATADSCR_VAR_INVALRNG_ITEM
DATADSCR_VAR_INVALRNG_RANGE
DATADSCR_VALRNG_ITEM
DATADSCR_VALRNG_RANGE
DOCDSCR_CITATION__AUTHENTY
DOCDSCR_CITATION__COPYRIGHT
DOCDSCR_CITATION__IDNO
DOCDSCR_CITATION__OTHID
DOCDSCR_CITATION__PRODDATE
DOCDSCR_CITATION__PRODUCER
DOCDSCR_CITATION__TITL
FILEDSCR_FILETXT_RECGRP
STDYDSCR_CITATION_BIBLCIT
STDYDSCR_CITATION_TITLSTMT
STDYDSCR_CITATION_VERSTMT
STDYDSCR_CITATION__AUTHENTY
STDYDSCR_CITATION__COPYRIGHT
STDYDSCR_CITATION__DISTRBTR
STDYDSCR_CITATION__FUNDAG
STDYDSCR_CITATION__GRANTNO
STDYDSCR_CITATION__PRODDATE
STDYDSCR_CITATION__PRODUCER
STDYDSCR_CITATION__SOFTWARE
STDYDSCR_METHOD__COLLMODE
STDYDSCR_METHOD__DATACOLLECTOR
STDYDSCR_METHOD__FREQUENC
STDYDSCR_METHOD__RESINSTRU
STDYDSCR_METHOD__SAMPPROC
STDYDSCR_METHOD__TIMEMETH
STDYDSCR_METHOD__WEIGHT
STDYDSCR_STDYINFO_ABSTRACT
STDYDSCR_STDYINFO__ANLYUNIT
STDYDSCR_STDYINFO__COLLDATE
STDYDSCR_STDYINFO__DATAKIND
STDYDSCR_STDYINFO__GEOGCOVER
STDYDSCR_STDYINFO__KEYWORD
STDYDSCR_STDYINFO__NATION
STDYDSCR_STDYINFO__TIMEPRD
STDYDSCR_STDYINFO__TOPCCLAS
STDYDSCR_STDYINFO__UNIVERSE
Write a SAS Program – Metadata Comment
data _null_;
  file reader lrecl=1024;
  length vEdited $ 2000;
  set DDIfile.stdyDscr_citation_titlStmt;
  if _n_=1 then put '/*' / ' SAS program to read ' agency ' ' IDNo;
  * Tabs become blanks, comment terminators are neutralized, blank runs are compressed;
  stdyDscrTitl = compbl(tranwrd(translate(stdyDscrTitl, ' ', '09'x), '*/', '*_/'));
  put 'Study Title' _n_ ': ' stdyDscrTitl;
  altTitl = compbl(tranwrd(translate(altTitl, ' ', '09'x), '*/', '*_/'));
  put ' ' altTitl;

Resulting comment in the generated program:
/*
 SAS program to read ICPSR 6084
Study Title 1 : CBS News Monthly Poll #2, August 1992
 August National Poll II, Republican National Convention
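The cleanup expression in that DATA step can be mirrored in Python to show what each nested call contributes, applied in the same order SAS evaluates them (TRANSLATE innermost, then TRANWRD, then COMPBL). This is an illustrative analogue only; `comment_safe` is a hypothetical name.

```python
import re


def comment_safe(text: str) -> str:
    """Python analogue of compbl(tranwrd(translate(text, ' ', '09'x), '*/', '*_/'))."""
    text = text.replace("\t", " ")      # TRANSLATE: tab characters become blanks
    text = text.replace("*/", "*_/")    # TRANWRD: stop '*/' from closing the comment early
    return re.sub(r" {2,}", " ", text)  # COMPBL: runs of blanks become one blank


title = comment_safe("CBS News Monthly Poll #2,\tAugust 1992")
print("Study Title 1 : " + title)
```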
Cntlin File for Formats
data makeTheFormats;
  * Format name, type, start value, and default length in fixed columns,
    then the label on the following line;
  input fmtname $ 1-7 type $ 9-9 start $ 11-26 default 28-35
        / label : &$512.;
datalines;
V00006f N 1                1
Yes
V00006f N 3                1
Converted Refusal
;
run;
proc format cntlin=makeTheFormats;
run;
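Records like these datalines could be generated from DDI `catgry` elements. A sketch, assuming each category has `catValu` and `labl` children and that the format name is the variable's `ID` plus a trailing `f` (a hypothetical rule that mirrors V00006f; SAS format names cannot end in a digit). The column layout matches the INPUT statement above. Setting the default length to the longest label is a choice made here, not something taken from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical var element; the catValu/labl values are taken from the slide.
VAR = """<var ID="V00006">
  <catgry><catValu>1</catValu><labl>Yes</labl></catgry>
  <catgry><catValu>3</catValu><labl>Converted Refusal</labl></catgry>
</var>"""


def cntlin_datalines(var_xml: str) -> list[str]:
    var = ET.fromstring(var_xml)
    fmtname = var.get("ID") + "f"  # hypothetical rule; SAS format names cannot end in a digit
    cats = [(c.findtext("catValu").strip(), c.findtext("labl").strip())
            for c in var.iter("catgry")]
    default_len = max(len(label) for _, label in cats)  # design choice: longest label
    lines = []
    for start, label in cats:
        # fmtname in columns 1-7, type in 9, start in 11-26, default in 28-35
        lines.append(f"{fmtname:<7} N {start:<16} {default_len:>8}")
        lines.append(label)  # read by '/ label : &$512.' from the next line
    return lines


print("\n".join(cntlin_datalines(VAR)))
```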
Input
- Logic to produce different INPUT statements for:
  • Fixed-column data
  • Delimited data
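One way to branch between the two kinds of INPUT statement, sketched in Python. The `(name, start, end, is_char)` tuple layout and `input_statement` are hypothetical. For delimited files the INFILE statement would also need DLM= (and usually DSD), and character variables longer than 8 bytes would need a LENGTH statement, neither of which this sketch writes.

```python
def input_statement(variables, delimited: bool) -> str:
    """variables: list of (name, start, end, is_char) tuples (hypothetical layout)."""
    parts = []
    for name, start, end, is_char in variables:
        dollar = " $" if is_char else ""
        if delimited:
            parts.append(f"{name}{dollar}")                # list input: order matters, not columns
        else:
            parts.append(f"{name}{dollar} {start}-{end}")  # column input: fixed positions
    return "input " + " ".join(parts) + ";"


VARS = [("cardno", 1, 1, False), ("state", 2, 3, True)]
print(input_statement(VARS, delimited=False))  # input cardno 1-1 state $ 2-3;
print(input_statement(VARS, delimited=True))   # input cardno state $;
```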
Output
- Multiple data sets if there are different record types
  • Separate KEEP= data set options
  • Logic to output each record to the appropriate data set
if "&fileStructureType" eq "hierarchical" then do;
  put 'if left(_RecordSetIdentifier) eq left("' catValu '") then DO; ' /
      " output " safeAgency +(-1) safeIDNo +(-1) rectype '; ' /
      'END; ' //;
end;

Resulting generated code:
if left(_RecordSetIdentifier) eq left("2 ") then DO;
 output ICPSR2825FAMILY ;
END;
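The same per-record-type routing can be generated from Python, as a sketch under stated assumptions: `safe_name` is a hypothetical stand-in for whatever produces safeAgency and safeIDNo (here it replaces characters not allowed in SAS names with underscores, without handling names that start with a digit), and `output_block` is likewise hypothetical.

```python
import re


def safe_name(text: str) -> str:
    """Replace characters that are not allowed in SAS names with underscores."""
    return re.sub(r"[^A-Za-z0-9_]", "_", text.strip())


def output_block(cat_valu: str, agency: str, idno: str, rectype: str) -> list[str]:
    """Lines that route one record type to its own data set."""
    dataset = safe_name(agency) + safe_name(idno) + safe_name(rectype)
    return [
        f'if left(_RecordSetIdentifier) eq left("{cat_valu}") then DO;',
        f"   output {dataset};",
        "END;",
    ]


for line in output_block("2", "ICPSR", "2825", "FAMILY"):
    print(line)
```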
Some of the Limitations
- DDI can describe nCubes, geographic coverage, and variable groups
  • The current makeReader.sas can’t handle these
- The DDI definition includes recursive elements
  • E.g. recGrps within recGrps
  • The current makeReader.sas would not find nested elements
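The recursive-element problem is easy to reproduce with Python's ElementTree: a one-level path such as `findall("recGrp")` visits only direct children, while `iter()` or an explicit recursive walk reaches every depth. The three-level nesting and labels below are invented for illustration, and `walk` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Invented three-level nesting of recGrp elements.
FILE_TXT = """<fileTxt>
  <recGrp><labl>Household</labl>
    <recGrp><labl>Family</labl>
      <recGrp><labl>Person</labl></recGrp>
    </recGrp>
  </recGrp>
</fileTxt>"""

root = ET.fromstring(FILE_TXT)

# A one-level path sees only the outermost group.
top_level = [g.findtext("labl") for g in root.findall("recGrp")]

# iter() walks the whole subtree in document order, so nested groups are found too.
every_level = [g.findtext("labl") for g in root.iter("recGrp")]


def walk(elem, depth=0):
    """Recursively yield (depth, label) for recGrp children at every level."""
    for child in elem.findall("recGrp"):
        yield depth, child.findtext("labl")
        yield from walk(child, depth + 1)


print(top_level)         # ['Household']
print(every_level)       # ['Household', 'Family', 'Person']
print(list(walk(root)))  # [(0, 'Household'), (1, 'Family'), (2, 'Person')]
```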
Questions?
About the Speaker
Larry Hoyle
Associate Scientist
Policy Research Institute, University of Kansas
1541 Lilac Lane
Lawrence, KS 66044-3177
LarryHoyle@ku.edu