Developing Statistical Information Systems and XML Information Technologies

  • Slides: 31

Developing Statistical Information Systems and XML Information Technologies - Possibilities and Practicable Solutions
Heikki Rouhuvirta, Statistical Methodology R&D, heikki.rouhuvirta@stat.fi
Geneva, 8-10 May 2007

Approaches to Statistics Production
  • Sources to statistics – Data Processing
  • Sources to statistics – Statistical Methodology
  • Statistics as Information
Heikki Rouhuvirta, 01.04.2007

IT in Statistics Production
Process diagram, labels in reading order:
  • registers, inquiries, other statistical data
  • compilation / combining of data
  • logical verifications
  • processing into statistical concepts
  • statistics (tilasto), Datum, dirty data, data material (aineisto)
  • imputation etc.
  • quality control and approval of data for the purpose of statistics compilation
  • protection of unit-level data
  • reporting, analyses, release
  • reporting, further processing, release

Methodological processing of statistical data in statistics production

Statistical Information

Challenge: create solutions that unite the foregoing points of view
  • the solutions offer the services that statistics production needs
  • the solutions are easily recognisable by a user and offer an adequate informative basis for each individual task
  • with the solutions, the whole set of tasks is manageable for the statistician
Key for Solution:
  • exploitation of XML Technology

Basics of XML Specification for Statistical Information: Common Structure of Statistical Information (CoSSI)

… the result from a statistics standpoint …

Statistics Production and Statistical Information

Stages of Processing
  0. Defining
  1. Collecting
  2. Editing
  3. Producing public statistics
  4. Using

Formats and transitions (diagram labels)
  • condensed format: table and description
  • interpreting
  • basic format: data matrix and description
  • condensing

Model of Data Organisation
  • descriptions in different documents
  • matrix model including statmeta
  • table model including statmeta
  • statistical metadata model
  • matrix module
  • table module
  • statmeta module

… case studies of XML in statistics production …

XML Database and Statistical Information

Retrieval of Statistical Metadata for a Variable - Simple User Interface

Turn over the Documents in XML Database

Saving Documents to XML Database

Event log of XML Database

/db
/system  admin dba
/config  admin dba
users.xml  admin dba  rwurwu---
/Tilastot  admin dba
/logs  admin dba  rwurwur--
contents.xml

/db/logs/contents.xml
...
<event timestamp="2007-03-02T10:57:47.941+02:00">
  <type>STORE</type>
  <path>/db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4.xml</path>
</event>
<event timestamp="2007-03-02T10:57:48.235+02:00">
  <type>STORE</type>
  <path>/db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4_001.gif</path>
</event>
<event timestamp="2007-03-02T10:57:48.898+02:00">
  <type>STORE</type>
  <path>/db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4_002.gif</path>
</event>
<event timestamp="2007-03-02T10:57:49.89+02:00">
  <type>STORE</type>
  <path>/db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4_002.png</path>
</event>
<event timestamp="2007-03-02T10:58:35.741+02:00">
  <type>STORE</type>
  <path>/db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4_eq_00.gif</path>
</event>
<event timestamp="2007-03-02T11:26:28.432+02:00">
  <type>UPDATE</type>
  <path>/db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu1.xml</path>
</event>
</events>
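An event log of this shape can be summarised with a few lines of standard-library Python. The sketch below is only an illustration: it assumes the events sit inside a single `<events>` root element (the slide shows only the closing tag), uses two events abridged from the log with normalised file names, and the function name `summarise_events` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Two events abridged from the log; the <events> root element is an assumption.
LOG_EXCERPT = """<events>
  <event timestamp="2007-03-02T10:57:47.941+02:00">
    <type>STORE</type>
    <path>/db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4.xml</path>
  </event>
  <event timestamp="2007-03-02T11:26:28.432+02:00">
    <type>UPDATE</type>
    <path>/db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu1.xml</path>
  </event>
</events>"""


def summarise_events(xml_text):
    """Count events per type and collect (timestamp, path) pairs per type."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    paths_by_type = {}
    for event in root.iter("event"):
        event_type = event.findtext("type")
        path = event.findtext("path")
        counts[event_type] += 1
        # Timestamps are kept as strings; no date parsing is attempted here.
        paths_by_type.setdefault(event_type, []).append((event.get("timestamp"), path))
    return counts, paths_by_type


counts, paths_by_type = summarise_events(LOG_EXCERPT)
print(dict(counts))
for event_type, entries in paths_by_type.items():
    for timestamp, path in entries:
        print(event_type, timestamp, path)
```

Grouping by event type keeps STORE and UPDATE operations apart, which is the distinction the log itself records.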

Tabulation Application Architecture in SAS

Tabulation Wizard User Interface in SAS EG

SAS Data Editing Process

Logical schema of an XML file: Statistical data

Archiving and Backing Up to XML

Example of XQuery/SQL

Content of XML file

Production and Dissemination of Tables in Publishing Process

XML Publication Editor - User Interface

Retrieval of Statistical Information

… and statistical information in tables

Table 1. Statistical metadata informative statistical table (I)

Table layout (placeholders): Variable 1, Variable 2, Variable 3; Class value 1, Class value 2; Statistical figures 1 to 8

Metadata shown with the table:
  • Statistical metadata: title, subtitle, footnote, metadata reference (quality declaration)
  • Document metadata elements: subject, keywords, content description, date, identifier
  • Statistical metadata elements: name, specification, concept definition description, operational definition description, calculation name, calculation formula, calculation description, measurement unit, measurement description
  • Statistical metadata elements: note
  • Statistical metadata elements: code, name, description
  • Register metadata elements: name, concept definition, formation instruction, law, interpretation of law, law cases, etc.
  • Document metadata elements: classification id, type, author, date
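The idea of a table that carries its own statistical metadata can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the CoSSI specification: the class names, the XML element names (`statmeta`, `variable`, `figure`, and so on), and the choice of which metadata group annotates which table element are all hypothetical, and the example values are invented.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Variable:
    # A subset of the variable-level metadata elements named on the slide.
    name: str
    concept_definition: str
    measurement_unit: str


@dataclass
class Figure:
    # One statistical figure (table cell) with an optional note.
    value: float
    note: str = ""


@dataclass
class StatTable:
    # Table-level statistical metadata plus the table content.
    title: str
    subtitle: str
    footnote: str
    variables: list = field(default_factory=list)
    figures: list = field(default_factory=list)


def to_xml(table):
    """Serialise the table and its metadata into one XML element tree."""
    root = ET.Element("table")
    meta = ET.SubElement(root, "statmeta")
    for tag in ("title", "subtitle", "footnote"):
        ET.SubElement(meta, tag).text = getattr(table, tag)
    for var in table.variables:
        v = ET.SubElement(root, "variable", name=var.name)
        ET.SubElement(v, "conceptDefinition").text = var.concept_definition
        ET.SubElement(v, "measurementUnit").text = var.measurement_unit
    for fig in table.figures:
        f = ET.SubElement(root, "figure")
        f.text = str(fig.value)
        if fig.note:
            f.set("note", fig.note)
    return root


# Invented example data, for illustration only.
example = StatTable(
    title="Hypothetical table",
    subtitle="Illustration only",
    footnote="Figures are invented",
    variables=[Variable("Variable 1", "Hypothetical concept definition", "persons")],
    figures=[Figure(12.5, note="preliminary"), Figure(7.0)],
)
root = to_xml(example)
print(ET.tostring(root, encoding="unicode"))
```

Keeping the metadata in the same document as the figures means any application that reads the table can also read the definitions behind it.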

Table 1. Statistical metadata informative statistical table (II)

Table layout (placeholders): Variable 1, Variable 2, Variable 3; Class value 1, Class value 2; Statistical figures 2 to 8

Quality declaration
  • Quality Indicators: Coefficient of Variation, Value=0.92
  • Quality Indicators: Coefficient of Variation, Value=0.87

Table 1. Statistical metadata informative statistical table (III)

Table layout (placeholders): Variable 1, Variable 2, Variable 3; Class value 1, Class value 2; Statistical figures 2 to 8

Quality declaration
  • Quality Indicators: Coefficient of Variation, Value=0.92
  • Quality Indicators: Coefficient of Variation, Value=0.87
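A quality indicator of this kind could be computed and attached to a class value roughly as sketched below. The slides do not say how the coefficient of variation was obtained, so the definition used here (population standard deviation divided by the mean) is an assumption; statistical offices often use the standard error of an estimate divided by the estimate instead. The function names and the input data are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (assumed definition)."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.pstdev(values) / mean


def quality_indicator(class_value, values):
    """Build a quality-indicator record for one class value of the table."""
    return {
        "class_value": class_value,
        "indicator": "Coefficient of Variation",
        "value": round(coefficient_of_variation(values), 2),
    }


# Invented observations; they are not the data behind the slide's values.
example = quality_indicator("Class value 1", [2, 4, 4, 4, 5, 5, 7, 9])
print(example)
```

Storing the indicator as a record keyed by class value mirrors the slide, where each quality indicator is displayed next to the part of the table it describes.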

Conclusions: XML-Based Service Environment in Statistics Production

The statistics production solution briefly described above indicates the kinds of services that a statistical information system could produce in future, both for statisticians and for users of statistical data. The foundation for statistics production is an XML-based information architecture and standard applications that exploit it. Basing the information architecture on XML allows the use of standard and standard-like specifications, but the special characteristics of statistical information must be taken into account when applying and implementing them. If, for instance, the possibilities of a semantic structural specification are not exploited in the structural analysis and in the final structure of statistical data, the solutions become complicated from the point of view of information management and ineffective in practice. From the perspective of application development, it seems especially important that the information architecture itself contains no application-specific data specifications, because a single monolithic application serving both statistics production and information service provision is unlikely. A semantically relevant structure helps the statistician and the user of statistics to check the correctness of contents.

Thank you for your attention!