Dissemination of Statistical Data, Publications and Metadata - Process Based on Common Structure of Statistical Information (CoSSI)
Harri Lehtinen (harri.lehtinen@stat.fi)
CoSSI (Common Structure of Statistical Information)
- The point of departure for CoSSI was an infological analysis of the information under consideration.
- The analysis concluded that, although in practice the definition of statistical information has varied with the situation and application, statistical information in fact has a universal structure that can be simplified and generally accepted.
- CoSSI describes this general structure, which does not depend on the situation or on the format in which the statistical information is presented.
- => CoSSI defines the structures of statistical data, metadata and publications.
Harri Lehtinen, 25.5.2007
XML based dissemination - CoSSI
- Modules:
  - Document metadata
  - Statistical metadata
  - Processing metadata
  - Publications
  - Data:
    - Matrices (XDF)
    - Tables (CALS)
    - Sparse matrix (KEYS)
- CoSSI: www.stat.fi/cossi
Implementation
- Modular DTD system
  - Document Type Definitions
  - Use of standards: CALS, XDF, Dublin-Core...
  - Statistical matrix (statinfo_xdf.dtd): statmeta.dtd, docmeta.dtd, xdf.dtd
  - Statistical table (statinfo_cals.dtd): statmeta.dtd, docmeta.dtd, cals.dtd
  - Publications and documents (publication.dtd): docmeta.dtd, statinfo_cals.dtd, figure.dtd...
- XML
  - One XML file -> data and metadata
  - Multi-lingual documents
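The "one XML file -> data and metadata" idea, with language versions side by side, can be sketched roughly as follows. This is a minimal illustration using Python's standard library; the element names (statinfo, docmeta, statmeta, data, row, value) and the example content are assumptions made for the sketch and are not taken from the CoSSI DTDs. Only the xml:lang attribute is standard XML.

```python
import xml.etree.ElementTree as ET

# ElementTree serialises this qualified name as the standard xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_statinfo(titles, variable_names, rows):
    """Build one XML tree holding document metadata, statistical metadata and data.

    All element names are hypothetical; they are not taken from the CoSSI DTDs.
    """
    root = ET.Element("statinfo")

    # Document metadata: one title element per language version.
    docmeta = ET.SubElement(root, "docmeta")
    for lang, text in titles.items():
        ET.SubElement(docmeta, "title", {XML_LANG: lang}).text = text

    # Statistical metadata: one entry per variable.
    statmeta = ET.SubElement(root, "statmeta")
    for name in variable_names:
        ET.SubElement(statmeta, "variable", {"name": name})

    # Data: one row element per statistical unit.
    data = ET.SubElement(root, "data")
    for row in rows:
        row_el = ET.SubElement(data, "row")
        for value in row:
            ET.SubElement(row_el, "value").text = str(value)
    return root


def title_in(root, lang):
    """Return the document title in the requested language, or None."""
    for title in root.iter("title"):
        if title.get(XML_LANG) == lang:
            return title.text
    return None


# Made-up example content, for illustration only.
doc = build_statinfo(
    titles={"fi": "Esimerkki", "en": "Example"},
    variable_names=["x1", "x2"],
    rows=[[1, 2], [3, 4]],
)
xml_text = ET.tostring(doc, encoding="unicode")
```

Keeping every language version in the same tree means a consumer can pick a language at output time (for example with title_in(doc, "en")) without needing a second file.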
Metadata
- Statistical metadata
  - Information vital for the interpretation of numerical statistical information
- Document metadata
  - Information about the producer of the document and the document's content
- Processing metadata
  - Information that software needs to process the data
Statistical metadata: content model of statistical metadata
- Document metadata
- Statistical metadata
- Variable name
- Concept definition
- Operational definition
- Description
- Calculation formula
- Measurement unit
- ID
- Type
- Classification
- Author
- Date
- Values
- Figure
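As a rough illustration of how such a content model could be held in a program, the sketch below turns some of the listed fields into a Python dataclass and reports which of them are still empty. The class name, the choice of fields, the helper function and the example values (including the measurement unit) are assumptions made for this sketch, not part of CoSSI.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class VariableMetadata:
    """Hypothetical record for the statistical metadata of one variable."""
    name: str
    concept_definition: Optional[str] = None
    operational_definition: Optional[str] = None
    description: Optional[str] = None
    calculation_formula: Optional[str] = None
    measurement_unit: Optional[str] = None
    classification: Optional[str] = None


def missing_fields(meta: VariableMetadata) -> List[str]:
    """List the fields that are still empty, in declaration order."""
    return [f.name for f in fields(meta) if getattr(meta, f.name) in (None, "")]


# Illustrative values only; the unit is an assumption, not taken from the slides.
income = VariableMetadata(name="Disposable income", measurement_unit="EUR")
gaps = missing_fields(income)
```

A check of this kind is one simple way to see how much interpretive metadata a variable still lacks before it is published.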
Document metadata: content model of document metadata
- Creator: Person
- Subject: Keywords
- Content description
- Publisher: Organisation
- Contributor: Person
- Date: Published, modified
- Type
- Format
- Language: Main and other language
- Document information
- Identifier: URN, URL, ISBN, ISSN, DOI, Number
- SVT and Category
- Rights
- Coverage
- Relations
- Source
Content model of statistical data matrix
- Statistical data
  - Title
  - Document metadata
  - Statistical metadata
  - Processing metadata
  - Statistical data matrix (XDF)
    - Variable
    - Class values
    - Statistical units
    - Footnotes
- Matrix of statistical units (rows a1..an) by variables (columns x1..xp):

  Statistical unit   x1    x2    ...  xj    ...  xp
  a1                 x11   x12   ...  x1j   ...  x1p
  a2                 x21   x22   ...  x2j   ...  x2p
  ..                 ..    ..         ..         ..
  ai                 xi1   xi2   ...  xij   ...  xip
  ..                 ..    ..         ..         ..
  an                 xn1   xn2   ...  xnj   ...  xnp
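The matrix of n statistical units by p variables can be pictured in code as follows. This is only a sketch of the units-by-variables layout; the class and method names are hypothetical and the numbers are made up. It does not reproduce the XDF format itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataMatrix:
    """Hypothetical in-memory form of a units-by-variables data matrix."""
    variables: List[str]        # column labels x1..xp
    units: List[str]            # row labels a1..an
    values: List[List[float]]   # values[i][j] holds x_ij (0-based internally)

    def __post_init__(self):
        # Every statistical unit needs exactly one value per variable.
        if len(self.values) != len(self.units):
            raise ValueError("expected one row per statistical unit")
        for row in self.values:
            if len(row) != len(self.variables):
                raise ValueError("expected one value per variable in each row")

    def x(self, i: int, j: int) -> float:
        """Return x_ij using the 1-based indices of the matrix notation."""
        return self.values[i - 1][j - 1]

    def column(self, variable: str) -> List[float]:
        """Return all values of one variable, one per statistical unit."""
        j = self.variables.index(variable)
        return [row[j] for row in self.values]


# Made-up numbers, for illustration only.
matrix = DataMatrix(
    variables=["x1", "x2"],
    units=["a1", "a2", "a3"],
    values=[[1, 2], [3, 4], [5, 6]],
)
```

Here matrix.x(2, 1) reads the value of variable x1 for statistical unit a2, mirroring the x_ij notation.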
Content model of statistical table
- Statistical table
  - Table title
  - Document metadata
  - Statistical metadata
  - Processing metadata
  - Statistical table (CALS)
    - Column headings
    - Row headings
    - Numerical data
    - Table footnotes
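To make the table part more concrete, the sketch below builds a small table with the element names of the CALS table model (table, title, tgroup, thead, tbody, row, entry). The helper function, the decision to put row headings in the first column, and the example content are assumptions for illustration; the CoSSI cals.dtd may constrain or extend the model differently, and footnotes are left out.

```python
import xml.etree.ElementTree as ET


def build_cals_table(title, column_headings, row_headings, data):
    """Build a CALS-style table: a header row plus one body row per row heading."""
    if len(row_headings) != len(data):
        raise ValueError("expected one data row per row heading")

    table = ET.Element("table")
    ET.SubElement(table, "title").text = title

    # One extra column holds the row headings.
    tgroup = ET.SubElement(table, "tgroup", {"cols": str(len(column_headings) + 1)})

    head_row = ET.SubElement(ET.SubElement(tgroup, "thead"), "row")
    ET.SubElement(head_row, "entry")  # empty corner cell above the row headings
    for heading in column_headings:
        ET.SubElement(head_row, "entry").text = heading

    tbody = ET.SubElement(tgroup, "tbody")
    for heading, values in zip(row_headings, data):
        row = ET.SubElement(tbody, "row")
        ET.SubElement(row, "entry").text = heading
        for value in values:
            ET.SubElement(row, "entry").text = str(value)
    return table


# Made-up content, for illustration only.
cals = build_cals_table(
    title="Example table",
    column_headings=["2005", "2006"],
    row_headings=["Group A", "Group B"],
    data=[[10, 11], [20, 21]],
)
body_rows = cals.findall("./tgroup/tbody/row")
```

Serialising the result with ET.tostring(cals, encoding="unicode") gives markup that a CALS-aware tool could render, subject to the assumptions above.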
Documents and publications
- Document metadata
- Document main title
- Ingress
- Introduction
- Abstract
- Headnote
- Product specification
- Chapters
  - Title
- Sections
  - Title
- Paragraphs
- Summary
- Footnotes
- Bibliography
- Appendix
- Definition lists
Paragraph
- List (unordered / ordered)
- Statistical table
- Figure
- Link
- Footnote reference
- Bibliographical reference
- Emphasis
Implementation for PC-Axis
- Need for an XML format for PC-Axis
- The CoSSI matrix format is close to the PC-Axis data format and also supports multi-lingual data
- Processing metadata for PC-Axis (pxmeta)
- Mapping of PC-Axis metadata to the CoSSI model: statistical, document and processing metadata
- Three data formats
  - Matrix (XDF)
  - Table (CALS)
  - Keys (PC-Axis)
  => but the same metadata for all formats!
- Allows more metadata than the original PC-Axis format
- Automatic conversion between data formats
CoSSI for PC-Axis
- Matrix
  - Docmeta
  - Procmeta
  - Statmeta
  - Data -> XDF
- Table
  - Docmeta
  - Procmeta
  - Statmeta
  - Data -> CALS
- Keys
  - Docmeta
  - Procmeta
  - Statmeta
  - Data -> Keys
- The data part is in different formats but everything else stays the same
- The information is the same in all formats!
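The principle that only the data part changes between formats can be sketched as below. The package layout (a dict with docmeta, procmeta, statmeta, format and data) and the keyed representation are simplifying assumptions made for this illustration; they are not the PC-Axis KEYS syntax or the XDF encoding.

```python
def to_keys(package):
    """Return a copy of the package whose data part is a sparse keyed mapping."""
    if package["format"] != "matrix":
        raise ValueError("expected a package in matrix format")
    stat = package["statmeta"]
    keyed = {
        (row_label, col_label): value
        for row_label, row in zip(stat["rows"], package["data"])
        for col_label, value in zip(stat["columns"], row)
        if value is not None  # empty cells are simply not stored
    }
    # Only 'format' and 'data' are replaced; all metadata entries are carried over.
    return {**package, "format": "keys", "data": keyed}


def to_matrix(package):
    """Return a copy of the package whose data part is a dense matrix again."""
    if package["format"] != "keys":
        raise ValueError("expected a package in keys format")
    stat = package["statmeta"]
    dense = [
        [package["data"].get((row_label, col_label)) for col_label in stat["columns"]]
        for row_label in stat["rows"]
    ]
    return {**package, "format": "matrix", "data": dense}


# Made-up package, for illustration only.
matrix_package = {
    "docmeta": {"title": "Example"},
    "procmeta": {"decimals": 0},
    "statmeta": {"rows": ["a", "b"], "columns": ["x", "y"]},
    "format": "matrix",
    "data": [[1, None], [None, 4]],
}
keys_package = to_keys(matrix_package)
round_trip = to_matrix(keys_package)
```

Because the metadata entries are passed through untouched, converting to keys and back yields the original package, which is the property the slide emphasises.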
Dissemination process - Office 97
- Sources of PC-Axis tables (.PX): Statistical application, SuperStar to PX, SAS to PX
- PX-Edit: manual or batch processing
  - checking
  - edit metadata
- PX-Edit or PC-Axis: manual or batch processing
  - exclusion
  - save as: Excel or txt
- Metadata:
  - statistical metadata
  - classifications
  - processing metadata
- Automatic publishing (timer controlled): PX-Web, Database services, www.stat.fi
- Publication production (monthly & quarterly publications, publication tables...): PX-templates, Publication editor (Word, Excel, ...)
- FastWeb (timer controlled): XLS, conversion to XHTML, conversion to PDF
- Outputs: HTML, PDF, Web-site
What we need:
- More and better metadata
- Validation
- Language versions
- All information in a single file
- Archiving
- Automatic conversion to different dissemination channels
- Structured searches
- SVG
- Vendor-free solution
- Ability to add new dissemination channels
XML based dissemination process - XML and PC-Axis
- Statistical application
- Conversion: PX-Edit -> PX & CoSSI, SuperStar -> PX & CoSSI, SAS -> PX & CoSSI
- PX-Web: PC-Axis tables (.PX)
- FastWeb-XML
- Metadata (eXist XML database):
  - statistical metadata
  - classifications
  - processing metadata
- Publication editor: Arbortext (monthly & quarterly publications, publication tables...)
- Dissemination database: eXist XML database
- Publishing and preview
- Outputs: HTML, PDF, RSS, SDMX
- Channels: Web-site www.stat.fi (Database services, PX-Web), Printing house (PDF)
XML based dissemination process - integration completed
- Statistical application
- Conversion: PX-Edit -> PX & CoSSI, SuperStar -> PX & CoSSI, SAS -> PX & CoSSI
- PX-Web: .xml matrices (PXML)
- FastWeb-XML
- Metadata (eXist XML database):
  - statistical metadata
  - classifications
  - processing metadata
- Publication editor: Arbortext (monthly & quarterly publications, publication tables...)
- Dissemination database: eXist XML database
- Publishing and preview
- Outputs: HTML, PDF, RSS, SDMX
- Channels: Web-site www.stat.fi (Database services, PX-Web), Printing house (PDF)
XML Database and Statistical Information
eXist XML database
- Statistical metadata
- Statistics
- Statistical publications
- Statistical tables
Statistical publication in the Arbortext editor
Statistical metadata for a variable in a table
- Statistical metadata for the variable "Disposable income"
HTML output of a statistical publication with statistical metadata
- Link to the statistical metadata
User interface for publishing and preview