SDMX Standards Relationships to ISO/IEC 11179/CMR (Arofan Gregory)

SDMX Standards Relationships to ISO/IEC 11179/CMR
Arofan Gregory, Chris Nelson
Joint UNECE/Eurostat/OECD workshop on statistical metadata (METIS): Geneva, 3-5 April 2006

Major Points
• Activity and Results
• Scope
• Areas of Commonality
• Applying ISO 11179 to SDMX
• CMR and SDMX
• Registries
• Use in Applications: A Case Study

Activity
• We were asked to describe the relationship between ISO 11179/CMR and SDMX
• There is not complete overlap
  – We focused on points of contact between them
• This was an interesting project, which we will continue working on

Results
• ISO 11179/CMR and SDMX:
  – ISO 11179 provides a useful pivotal model for mapping between CMR (or other models) and SDMX constructs
  – ISO 11179 does not model the semantics of a metamodel; it is useful for the semantics of a model (a specific metadata structure) or even of an instance of the model (a specific data set)
  – The utility of this mapping depends on the application
• The SDMX and ISO 11179 registries are complementary in function (semantics + access)

Characterization
• ISO 11179 models the semantics of data elements
• CMR extends this to cover survey lifecycle metadata
  – adds structure
• SDMX models the structure of metadata and data for aggregate statistical data
  – you supply your own semantics (concepts)
• This is not a comparison of like things: it is a connecting of complementary ones

Scope – ISO 11179
• ISO 11179 is a metadata content standard focused primarily on the semantics of data. Secondarily, it provides the rules and structures for registering descriptions of data.
• The standard exists in 6 Parts, and the 2nd edition is now available.
  – Part 1: Framework for the specification and standardization of data elements
  – Part 2: Classification for data elements
  – Part 3: Registry metamodel and basic attributes
  – Part 4: Rules and guidelines for the formulation of data definitions
  – Part 5: Naming and identification principles for data elements
  – Part 6: Registration of data elements
• This presentation is concerned mainly with Parts 3 and 5

SCOPE: ISO 11179 Part 3 High Level View - Data Elements and Concepts
[Diagram labels: Country Identifier; Country Name; Country Code; Countries of the World; ISO 3166: Country Code List]

SCOPE – CMR
• The CMR model is designed to support
  – metadata necessary to describe the survey life cycle
  – linkages between similar designs and processes used across surveys
  – use of metadata to drive systems in support of the survey life cycle
• The CMR model extends the ISO 11179 (Part 3) metamodel to support CMR-specific artefacts

SCOPE - CMR
[Diagram: ISO 11179 and CMR. CMR links to the appropriate ISO 11179 artefact to supply semantics and management]

SCOPE - SDMX
• SDMX standards comprise
  – Information Model
  – Syntax implementations based on the Information Model (XML schemas and UN/EDIFACT)
  – Content-oriented guidelines
    • Statistical subject matter domain scheme
    • Cross-domain metadata concepts
    • Metadata vocabulary (MCV – Metadata Common Vocabulary)

Information Meta Model
[Diagram labels, in order:]
  – Aggregated data, metadata, structural metadata
  – Design Meta Schema for aggregated data and metadata
  – Generate structure-specific schema
  – Create data/metadata structure instance (data/metadata structure definition)
  – Report/publish data/metadata
  – Register
  – SDMX Registry
  – Use cross-domain metadata concepts and code lists

Areas of Commonality: ISO 11179 and SDMX
• Both standards support the definition of
  – Concepts
  – Use of concepts in structures
  – Representations/Value domains
• Both standards support the concept of a registry
  – ISO 11179 specifies a registry metamodel
  – SDMX does not specify a registry metamodel
  – ISO 11179 does not specify registry interfaces based on the registry model
  – SDMX specifies registry interfaces based on the SDMX model
• Implementation of a registry and mapping to a registry model is left to the implementor
• Can use an ISO 11179 registry implementation, an ebXML registry implementation, or a bespoke registry implementation

Concepts – ISO 11179

Concepts - SDMX
[Diagram annotations giving the ISO 11179 equivalent: Data Element Conceptual Domain (for SDMX this defines the core/default representation; implicit here is whether it is enumerated or non-enumerated) [not given explicit name in SDMX]]

Data Element: ISO 11179

Concept Usage - SDMX
[Diagram annotations:]
  – Concept Usage = Data Element [can be given an explicit name in SDMX (e.g. the ISO 11179 equivalent name), but this is not required and is not used for identification purposes]
  – Value Domain

Concept Usage - SDMX
• There is no explicit “Data Element” name for usage of a Concept (e.g. FREQUENCY Concept used for a Dimension)
  – Concept names are used within the context of the role of the Concept (Dimension, Attribute, Measure); these are structure components
  – The unique identifier for a structure component comprises: <data structure definition.structure component list.concept>, e.g. BOP.KEYDESCRIPTOR.FREQUENCY [actual identifier includes maintenance agencies]
• This is useful if one wishes to reference the component in, say, a registry scenario, but it is not normally retained in processing systems
  – For instance, in the data structure definition the identifier of the Concept is used, but the full identifier can be constructed from its context in the structure
  – However, the SDMX-ML specification does allow the full identifier [URN] to be specified as well
• ISO 11179 equivalent name could be BOP_KEYDESCRIPTOR.FREQUENCY.CODE
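As a rough illustration of the identifier construction described on this slide, the sketch below assembles the dotted component identifier from its context and derives an ISO 11179-style name from it. The function names are hypothetical, and the convention of joining the first two parts with an underscore is an assumption read off the single BOP example, not a rule taken from either standard.

```python
def component_identifier(dsd: str, component_list: str, concept: str) -> str:
    """Join the three context parts with dots (maintenance agencies omitted)."""
    return ".".join([dsd, component_list, concept])


def iso11179_style_name(dsd: str, component_list: str, concept: str,
                        representation: str) -> str:
    """Treat '<dsd>_<component list>' as the object, the concept as the
    property, and append a representation term (assumed convention)."""
    return f"{dsd}_{component_list}.{concept}.{representation.upper()}"


# The concept ID alone is what a data structure definition carries; the
# full identifier is rebuilt from the surrounding structure when needed.
full_id = component_identifier("BOP", "KEYDESCRIPTOR", "FREQUENCY")
name = iso11179_style_name("BOP", "KEYDESCRIPTOR", "FREQUENCY", "code")
print(full_id)
print(name)
```

Because the full identifier is derived rather than stored, a processing system can keep only the short concept ID and still produce the registry reference on demand.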

SDMX: Part of Key Family

CMR and ISO 11179
• CMR is a model that supports the survey lifecycle
• Element and Concept semantics are provided by an association to ISO 11179, e.g.
  • Question is associated to Data Element Concept
  • Response Domain is associated to Value Domain
  • Data Set component (element) is associated to Data Element
• Data set contains the output from a survey
• All data elements are explicitly named
• ISO 11179 is the pivot between CMR and SDMX
• We need to apply ISO 11179 to SDMX

Applying ISO 11179 to SDMX
• ISO 11179 gives you: [Object_Class].[Property].[Representation]
• An ISO 11179 instance would have: [Object_ID].[Property_Term].[Representation_Term]
• We have these constructs in SDMX data and metadata sets

SDMX Objects
• For data, we have a limited set of object classes:
  – Data Sets
  – Groups
  – Series
  – Observations
• For reference metadata, we have a larger set of object classes:
  – Data Providers
  – Category Schemes
  – Data and Metadata Flows
  – Etc.
• Note that SDMX instance IDs are compound
  – For data, a “key”
  – For metadata, a “target identifier”

SDMX Properties and Representations
• In SDMX, properties are taken from Concepts
• The names and definitions are supplied by the user for most properties
  – Some are required by the model
• Representations are equivalent to those in ISO 11179 (code, text, etc.)

Name components:
  – Object (from the SDMX model)
  – Property (from the Concept that is the Data Attribute or Measure in the Data Structure Definition)
  – Representation Term (derived from the Representation of the Data Attribute or Measure in the Data Structure Definition)
Examples:
  [Observation_ID].Confidentiality.code
  [Series_ID].Availability.code
  [Group_ID].Title.text
  [Observation_ID].Value.number
  [Data Set_ID].Title.text
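The derivation of the three name parts from a data structure definition can be sketched as follows. The `Component` class and its field names are hypothetical stand-ins for a Data Attribute or Measure declaration; they are not SDMX artefact names, and the declarations simply mirror the examples on this slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """A Data Attribute or Measure as declared in a data structure definition."""
    concept: str          # supplies the property term
    representation: str   # supplies the representation term
    attached_to: str      # SDMX object class whose instance carries the value


def iso11179_name(component: Component) -> str:
    """Render [Object_ID].Property.Representation for one component."""
    return (f"[{component.attached_to}_ID]."
            f"{component.concept}.{component.representation}")


# Hypothetical declarations mirroring the examples above.
components = [
    Component("Confidentiality", "code", "Observation"),
    Component("Availability", "code", "Series"),
    Component("Title", "text", "Group"),
    Component("Value", "number", "Observation"),
    Component("Title", "text", "Data Set"),
]
names = [iso11179_name(c) for c in components]
print("\n".join(names))
```

The bracketed object part stays a placeholder here; at instance level it would be replaced by the compound key of the actual observation, series, group, or data set.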

SDMX Object IDs
• Compound object IDs in SDMX are either:
  – Data Keys (for Group, Series, or Observation), which depend on the data structure definition
  – Metadata Target Identifiers (for data provider, metadata flow, categorization scheme, etc.), which depend on the metadata structure definition
• Data Examples
  – Observation Key: “Annual – Total Population – Kenya – 1994”
  – Series Key: “Annual – Total Population – Kenya”
  – Group Key: “Annual – Total Population” (this is the group of all countries)

Examples, cont.
• For the concept of “Availability” for a series, in ISO 11179 we would have: Annual – Total Population – Kenya.Availability.Code
• Note that the SDMX distinction between “dimensions” and “attributes” is not important to ISO 11179; the semantic models use the same basic approach for both
  – You only model the semantics of discrete, value-containing constructs: attributes and observation values
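A small sketch of how compound data keys of this kind could be built, and how a series key then serves as the object part of an ISO 11179-style name. The dimension IDs (FREQ, TOPIC, COUNTRY, TIME) and the helper name are assumptions made for illustration; only the values come from the examples in this deck.

```python
from typing import Mapping, Sequence

SEPARATOR = " – "  # the separator used in the key examples


def compound_key(values: Mapping[str, str], dimensions: Sequence[str]) -> str:
    """Concatenate the values of the listed dimensions, in order."""
    return SEPARATOR.join(values[d] for d in dimensions)


# Hypothetical dimension IDs; the values are taken from the slide examples.
values = {
    "FREQ": "Annual",
    "TOPIC": "Total Population",
    "COUNTRY": "Kenya",
    "TIME": "1994",
}

observation_key = compound_key(values, ["FREQ", "TOPIC", "COUNTRY", "TIME"])
series_key = compound_key(values, ["FREQ", "TOPIC", "COUNTRY"])
group_key = compound_key(values, ["FREQ", "TOPIC"])

# The series key is the object ID; the attribute concept and its
# representation supply the other two parts of the name.
availability_name = f"{series_key}.Availability.Code"
print(observation_key)
print(availability_name)
```

Each shorter key is a prefix of the longer one, which is why which dimensions make up a group or series depends on the data structure definition.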

Reference Metadata Object IDs in SDMX
• All of the objects in the SDMX Information Model can be Object Classes for ISO 11179
• They can be combined to identify new object classes
• The IDs for instances of objects are composite “target identifiers”

Example of Some SDMX Object Classes
[Diagram]
Object classes: Category Scheme; Category; Structure Definition; Data Set or Metadata Set; Data Provider; Data or Metadata Flow; Provision Agreement
Relationship labels:
  – publishes/reports data sets or metadata sets
  – uses specific data/metadata structure
  – conforms to business rules of the data/metadata flow
  – can provide data/metadata for many data/metadata flows using agreed data/metadata structure
  – can be linked to categories in multiple category schemes
  – can get data/metadata from multiple data/metadata providers
  – comprises subject or reporting categories
  – can have child categories

Reference Metadata Example
• A data provider is identified with an ID which references the organization scheme (and agency) the data provider ID code comes from: [OrgSchemeAgencyID]_[OrgSchemeID]_[Data_ProviderID]
• Example: sdmx_sdmxProviderScheme1_IMF shows the ID for a data provider from agency scheme 1 maintained by SDMX, identifying the IMF as a data provider

Reference Metadata Example, cont.
• To model a “Contact_Name” concept in an SDMX metadata report for a specific data provider, we have in ISO 11179: sdmx_sdmxProviderScheme1_IMF.Contact_Name.Text
• A more useful approach is possible: Data_Provider.Contact_Name.Text
• This is not possible for data objects in SDMX.
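The difference between naming at the instance level and at the object-class level can be sketched like this. The helper names are hypothetical, and the scheme ID `SCHEME1` is a placeholder chosen for the sketch, not the identifier used in the presentation.

```python
def data_provider_id(scheme_agency: str, scheme_id: str, provider: str) -> str:
    """Compose a target identifier: scheme agency, scheme ID, provider code."""
    return "_".join([scheme_agency, scheme_id, provider])


def element_name(object_part: str, property_term: str, representation: str) -> str:
    """Three-part ISO 11179-style name; object_part is an ID or a class name."""
    return ".".join([object_part, property_term, representation])


# Instance level: the name is tied to one specific data provider.
provider = data_provider_id("sdmx", "SCHEME1", "IMF")  # placeholder scheme ID
instance_name = element_name(provider, "Contact_Name", "Text")

# Class level: one name covers the concept for every data provider.
class_name = element_name("Data_Provider", "Contact_Name", "Text")

print(instance_name)
print(class_name)
```

The class-level form yields a single reusable definition, whereas the instance-level form would produce one name per provider.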

Registry Model: ISO 11179
• But no API specification for implementing the model

Registry Metamodel: SDMX
• SDMX does not have an abstract registry metamodel
• SDMX specifies registry interfaces based on the SDMX Information Model
• Compliance with SDMX registry standards is based on implementation of the SDMX registry APIs
• The actual registry implementation can be whatever the registry service provider chooses
  – The JEDH pilot project is implemented using an ebXML registry (this has a model which is very similar to the ISO 11179 model)
  – An SDMX-compliant registry could use an ISO 11179 registry (if such an implementation exists)
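The idea that compliance attaches to the interface while the backend is free can be illustrated with a generic sketch. Everything named here (`RegistryInterface`, `submit`, `lookup`, both backend classes, the `example:BOP` reference) is invented for illustration and does not reproduce the actual SDMX registry interfaces.

```python
from typing import Dict, List, Optional, Protocol


class RegistryInterface(Protocol):
    """Hypothetical stand-in for a standardised registry interface."""

    def submit(self, urn: str, artefact: str) -> None: ...

    def lookup(self, urn: str) -> Optional[str]: ...


class InMemoryRegistry:
    """One possible backend: a plain dictionary."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def submit(self, urn: str, artefact: str) -> None:
        self._store[urn] = artefact

    def lookup(self, urn: str) -> Optional[str]:
        return self._store.get(urn)


class LoggingRegistry(InMemoryRegistry):
    """A second backend with different internals behind the same interface."""

    def __init__(self) -> None:
        super().__init__()
        self.log: List[str] = []

    def submit(self, urn: str, artefact: str) -> None:
        self.log.append(urn)  # extra bookkeeping the client never sees
        super().submit(urn, artefact)


def publish_and_fetch(registry: RegistryInterface, urn: str,
                      artefact: str) -> Optional[str]:
    """Client code depends only on the interface, not on the backend."""
    registry.submit(urn, artefact)
    return registry.lookup(urn)


results = [publish_and_fetch(r, "example:BOP", "data structure definition")
           for r in (InMemoryRegistry(), LoggingRegistry())]
print(results)
```

Both backends satisfy the same client call, which is the sense in which an ebXML, ISO 11179, or bespoke registry could sit behind a common set of interfaces.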

Functionality
• An ISO 11179 registry is used to provide a semantic registry of specific data elements
  – The focus is on understanding the meaning of data and metadata
• An SDMX registry provides mechanistic services to facilitate the exchange of data and metadata sets
  – The focus is on access to data and metadata
• These are complementary functions

Using the Mapping
• CMR is one example of a model which could be mapped against SDMX
  – SDMX is about aggregate data exchange
  – CMR is about survey lifecycle metadata
  – These are potentially related, but are not the same thing
• To be useful in an application, we care about the semantic equivalencies of data and metadata
  – These equivalencies allow us to re-use and map data between models and systems
  – ISO 11179 acts as a pivot
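The pivot idea can be sketched as two small lookup tables that both point into ISO 11179, composed to find candidate equivalences. The CMR associations are the ones listed in this deck; the SDMX side (Concept Usage as Data Element, Representation as Value Domain) is one reading of the earlier slides, and the function name is hypothetical.

```python
from typing import Dict, List

# CMR construct -> ISO 11179 construct (associations named in this deck).
CMR_TO_ISO: Dict[str, str] = {
    "Question": "Data Element Concept",
    "Response Domain": "Value Domain",
    "Data Set component": "Data Element",
}

# SDMX construct -> ISO 11179 construct (an assumed reading of the slides).
SDMX_TO_ISO: Dict[str, str] = {
    "Concept Usage": "Data Element",
    "Representation": "Value Domain",
}


def via_pivot(source_to_pivot: Dict[str, str],
              target_to_pivot: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each source construct to the target constructs that share
    the same pivot construct."""
    pivot_to_target: Dict[str, List[str]] = {}
    for target, pivot in target_to_pivot.items():
        pivot_to_target.setdefault(pivot, []).append(target)
    return {source: pivot_to_target.get(pivot, [])
            for source, pivot in source_to_pivot.items()}


cmr_to_sdmx = via_pivot(CMR_TO_ISO, SDMX_TO_ISO)
for construct, matches in cmr_to_sdmx.items():
    print(construct, "->", matches or "no counterpart in this table")
```

An empty match list, as for Question in this table, shows where the two models do not overlap, consistent with the point that they are related but not the same thing.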

[Diagram: ISO 11179, SDMX, Semantics]
  – Structures and rules are proprietary to individual organisations
  – Data and Metadata Structures: CMR defines a conceptual model for data and metadata structures for the survey lifecycle, using the ISO 11179 model for structuring semantics
  – Describe SDMX semantics based on SDMX structures
  – Organisations map between their data and metadata structures and SDMX data and metadata structures using ISO 11179 semantics
  – Structures and rules prescribed by SDMX standards
  – Data and Metadata Structures: SDMX defines a conceptual model for data and metadata structures for the aggregated data and reference metadata and provides a canonical syntax representation

Another Case Study
• One use case (CMR) is not enough to validate our work
• At the ISO TC-154 meeting in Vancouver we met with the designers of ISO-15000 part 5, the “Core Components Technical Specification” (CCTS)
  – CCTS is a modelling methodology for e-commerce, standardized in ISO TC 154
  – It is based on ISO-11179
• We were asked: “Could SDMX data be modelled according to CCTS?”

Why Bother?
• Statistics and e-commerce do have connection points:
  – Customs and international trade
  – Payments transactions and business reporting
• Some systems today map between business transactions and statistical reports
• Thus, mapping between CCTS (e-commerce transactions) and SDMX (statistics) is potentially a real-world use case.

The Answer is “Yes”
• Based on our SDMX-to-ISO 11179 mapping:
  – We could express our semantics
  – Compare them to the e-commerce semantics
  – Determine the relationship between specific data elements
• This was a quick, straightforward validation of our work

Conclusions
• ISO 11179 is very useful as a pivotal semantic model for working with other models (CMR, CCTS, etc.)
• SDMX is mapped at the model level, not the metamodel level
  – Semantics are introduced at this level
• The SDMX and ISO 11179 registries are complementary in function (semantics + access)
• The structural mapping of any model depends on the ISO 11179 implementation
  – ISO 11179 only does semantics, not structure
  – Applications need both structural and semantic mapping