LOD reference architecture, 31-10-2017, Version 0.1


LOD reference architecture
31-10-2017, Version #0.1
ESS EA team: ESTAT-EA@ec.europa.eu

Linked Open Data use cases
1. Use case 1: Added service by a 3rd party. As an external party I would like to access ESS data in LOD format so that I can combine multiple data sets (any) to create a new service, visualization or information.
2. Use case 2: ESS Member interconnected data research. As an ESS Member I would like to integrate data with other ESS Members' data sets so that I can perform statistical research on enriched data sets.
3. Use case 3: Citizen users. As a citizen user I would like to find linked data sets through search engines (e.g. SPARQL) so that I can see information derived from linked data.
4. Use case 4: ESS Member publishing linked data sets. As an ESS Member State I would like to publish interoperable linked data so that consumers of the data sets can easily find similar data sets from other ESS Members.
5. Use case 5: ESS Member linking data to another ESS Member. As an ESS Member I would like to link my data sets to another ESS Member's data sets so that linked data consumers can benefit from enriched information (e.g. linking European-level statistics to country statistics and back).

Scope: Statistical Organisation delivering Smart Data/Metadata Services
Smart data services provide conditioned and calculated data. The data is not just discovered, collected and conditioned: additional analytical rules and calculations are applied to derive further insight from the collected data.
Current Scope
Simple data services are the simplest and most common. Data brokers collect data from multiple sources and offer it in collected and conditioned form; without them the data would be fragmented, conflicted and sometimes unreliable.
Adaptive data services apply analysis to a customer's request-specific data combined with data in a context store. This is a more advanced form of service. Examples are Rage Frameworks and Anomaly 42.

ESS LOD-related/-required business capabilities (link with ESS BCM)
Statistical content management
§ LOD API management
§ LOD metadata management
§ Development and maintenance of common vocabularies and basic ontology
§ LOD content management
Flexible data access provisioning
§ Provide bookmarkable data
§ Provide data linked to other data on the web
§ Provide access to data schema
§ Provide discoverable data

Linked Open Data delivers flexible data provisioning through its bookmarkability, interlinking and self-describing, discoverable data mechanisms
§ Provide bookmarkable data: Consumers can bookmark linked open data sets and use them at any point in time. This works because linked open data requires permanent URIs, which can be bookmarked or stored for future use.
§ Provide data linked to other data on the web: Linked open data sets of ESS Members can be linked to other data on the web and vice versa. Consumers can look up every URI in an RDF graph over the Web to retrieve additional information. A common, standard vocabulary is needed so that data is described in a contextually consistent way.
§ Provide access to data schema: Linked data uses a self-describing data mechanism, where the data schema is represented with the data itself.
§ Provide discoverable data: The use of common vocabularies and formal ontologies makes data discoverable. Semantic meaning is attached to the data so that it can be branched across domains of knowledge automatically.
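
The mechanisms above can be sketched with a tiny in-memory graph. This is only an illustration, not the ESS implementation: the example.org and example.com URIs, the `EX_VALUE` property and the value "42" are made up, while the RDF, RDFS and RDF Data Cube terms are existing vocabulary URIs. The sketch shows a bookmarkable subject URI, a type statement that travels with the data, and a link to a resource elsewhere on the web.

```python
from typing import NamedTuple, Union


class Literal(NamedTuple):
    """A plain string literal (anything that is not a URI)."""
    value: str


Node = Union[str, Literal]  # a bare str is treated as a URI

# Existing vocabulary terms (RDF, RDFS, RDF Data Cube).
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
QB_OBSERVATION = "http://purl.org/linked-data/cube#Observation"
QB_DATASET = "http://purl.org/linked-data/cube#dataSet"

# Hypothetical URIs and a made-up value, for illustration only.
OBS = "https://example.org/id/observation/demo-1"
DATASET = "https://example.org/id/dataset/demo"
ELSEWHERE = "https://example.com/resource/demo-area"
EX_VALUE = "https://example.org/def/value"

GRAPH = [
    (OBS, RDF_TYPE, QB_OBSERVATION),   # schema information travels with the data
    (OBS, QB_DATASET, DATASET),
    (OBS, EX_VALUE, Literal("42")),
    (OBS, RDFS_SEE_ALSO, ELSEWHERE),   # link to other data on the web
]


def describe(graph, uri):
    """Return every triple whose subject is the given (bookmarked) URI."""
    return [triple for triple in graph if triple[0] == uri]


def linked_uris(graph, uri):
    """URIs a consumer could look up next, starting from `uri`."""
    return [o for _, _, o in describe(graph, uri) if isinstance(o, str)]


def to_ntriples(graph):
    """Serialize triples as N-Triples lines (basic escaping only)."""
    def term(node):
        if isinstance(node, Literal):
            escaped = (node.value.replace("\\", "\\\\")
                       .replace('"', '\\"')
                       .replace("\n", "\\n"))
            return f'"{escaped}"'
        return f"<{node}>"
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in graph)
```

A consumer who stored `OBS` can call `describe(GRAPH, OBS)` later and follow any URI returned by `linked_uris` to retrieve more information.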

Linked Open Data requires a number of content-management related capabilities
§ LOD API management: This capability includes persistent URI management, versioning, and the serialization formats and protocols of the provided data.
§ LOD metadata management: This capability includes mapping of different data sets (e.g. disseminated multi-domain data in SDMX format) to RDF semantics (e.g. vocabularies, ontologies).
§ Development and maintenance of common vocabularies and basic ontology: Vocabularies and ontology require further development and formalization to cover all statistical domains. Only consistent use of vocabularies and ontologies across all ESS Members will ensure discoverability and cross-linking of the data.
§ LOD content management: This capability ensures management of LOD-related content.
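
Persistent URI management with versioning could look roughly like the sketch below. The URI pattern `<base>/id/<kind>/<local_id>[/<version>]`, the base domain and the function name `mint_uri` are assumptions made for illustration; the slides do not prescribe a URI scheme.

```python
from urllib.parse import quote

BASE = "https://example.org"  # hypothetical publisher domain


def mint_uri(kind, local_id, version=None, base=BASE):
    """Build a URI of the form <base>/id/<kind>/<local_id>[/<version>].

    The unversioned URI is the stable identifier; an optional version
    segment addresses one specific release of the same resource.
    """
    segments = ["id", kind, local_id]
    if version is not None:
        segments.append(version)
    # Percent-encode each segment so identifiers containing spaces or
    # slashes still yield a valid, unambiguous path.
    path = "/".join(quote(segment, safe="") for segment in segments)
    return base.rstrip("/") + "/" + path
```

Because the function is deterministic, the same inputs always produce the same URI, which is the property a bookmarkable identifier depends on.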

Definitions
Vocabulary: A collection of terms given a well-defined meaning that is consistent across contexts.
Ontology: Defines the contextual relationships behind a defined vocabulary. It is the cornerstone of defining a knowledge domain. A formal syntax for defining ontologies is OWL (Web Ontology Language), which is an extension of RDFS (RDF Schema).
Note: In LOD, an ontology is always used together with vocabularies. The vocabularies give things formal, consistent names, and the ontology interconnects them.
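
The distinction can be made concrete with a small sketch: the vocabulary is a set of named terms with agreed meanings, and the ontology is a set of relationships between those terms. All `ex:` terms and their definitions below are hypothetical; `rdfs:subClassOf` is the standard RDFS property, written here as a prefixed name.

```python
# Vocabulary: terms with one agreed meaning (hypothetical names and definitions).
VOCABULARY = {
    "ex:GeographicArea": "Any delimited area on the Earth's surface",
    "ex:AdministrativeArea": "An area governed by a public authority",
    "ex:Municipality": "A local administrative area",
}

# Ontology: relationships between vocabulary terms, as (subject, predicate, object).
ONTOLOGY = [
    ("ex:Municipality", "rdfs:subClassOf", "ex:AdministrativeArea"),
    ("ex:AdministrativeArea", "rdfs:subClassOf", "ex:GeographicArea"),
]


def superclasses(term, ontology=ONTOLOGY):
    """All classes reachable from `term` by following rdfs:subClassOf."""
    found, todo = [], [term]
    while todo:
        current = todo.pop()
        for s, p, o in ontology:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.append(o)
                todo.append(o)
    return found


def undefined_terms(ontology=ONTOLOGY, vocabulary=VOCABULARY):
    """Terms used in the ontology that the vocabulary does not define."""
    used = {term for s, _, o in ontology for term in (s, o)}
    return sorted(used - set(vocabulary))
```

Here `superclasses("ex:Municipality")` follows the ontology to derive that a municipality is also a geographic area, a fact that the vocabulary alone does not state.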

LOD reference architecture
A common standard vocabulary is used to describe organisation datasets and to make them contextually consistent. Adopting a common base ontology and a common vocabulary for expressing the meaning behind the data they expose allows organisations to publish that data on a queryable endpoint, so that data sets can be linked across organisations.
[Diagram labels: SPARQL, RDF/XML, SOAP/XML, REST/JSON; API gateway; Conversion (RDF/XML, N-Triples, Turtle); Cache/store; Data access; Data repositories (data sets, data files); LOD metadata (data semantic mapping, vocabularies, ontology); Other metadata]
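
As an illustration of the queryable endpoint, the sketch below prepares a SPARQL query as an HTTP GET request in the style of the SPARQL 1.1 Protocol (query text in the `query` parameter). The endpoint and dataset URIs are hypothetical, and the request is only built, not sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import Request

ENDPOINT = "https://example.org/sparql"          # hypothetical endpoint
DATASET = "https://example.org/id/dataset/demo"  # hypothetical dataset URI

# Select up to ten observations that belong to the dataset.
QUERY = f"""
PREFIX qb: <http://purl.org/linked-data/cube#>
SELECT ?obs WHERE {{
  ?obs a qb:Observation ;
       qb:dataSet <{DATASET}> .
}}
LIMIT 10
"""


def build_request(endpoint, query):
    """Prepare a GET request carrying the query in the `query` parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the JSON serialization of SPARQL SELECT results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
# To actually run the query against a real endpoint, one would pass
# `request` to urllib.request.urlopen and parse the JSON response.
```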

Standards to agree on
§ Ways to collaborate/working groups
§ API standards (e.g. format, URI)
§ Vocabularies selection and development
§ Ontology selection
§ SDMX to RDF transformation according to selected vocabularies and ontology

LOD building blocks
Building block: Description
§ Data repositories: Any sources of statistical data, including SDMX, CSV, etc.
§ LOD metadata: Semantic mapping of the current data sets to LOD vocabularies (linking). It also contains vocabulary descriptions and ontologies. It should include an interface for entering mappings between existing data structures and linked data ontologies and vocabularies.
§ Data access: An internal interface to extract data from different data repositories. This building block can be a direct interface (such as ESB, Message Queue).
§ Converter: Depending on the data source and the required interface/format, a suitable conversion and conversion approach are selected to transform data from the original data source to RDF using the proper ontologies and vocabularies. The response is stored or cached in the cache/store.
§ Cache in-memory/store: Cache or storage of a converted response (e.g. in-memory cache, triple store).
§ API gateway: Building block to create, publish, maintain, monitor, and secure APIs.
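
How the building blocks could interact is sketched below, under stated assumptions: the repository is a CSV string with made-up values, the column-to-predicate mapping stands in for the LOD metadata, and all URIs and function names are hypothetical. A real converter would also escape literals and assign datatypes.

```python
import csv
import io

# Data repository stand-in: a CSV extract with made-up values.
REPOSITORY = {
    "demo": "area,period,value\nA1,2016,10\nA2,2016,20\n",
}

# LOD metadata stand-in: maps source columns to predicates (hypothetical URIs).
MAPPING = {
    "area": "https://example.org/def/refArea",
    "period": "https://example.org/def/refPeriod",
    "value": "https://example.org/def/obsValue",
}
BASE = "https://example.org/id/observation"

stats = {"conversions": 0}  # counts how often the converter actually runs
_cache = {}                 # in-memory cache of converted responses


def data_access(dataset_id):
    """Data access: read rows from the repository."""
    return list(csv.DictReader(io.StringIO(REPOSITORY[dataset_id])))


def convert(dataset_id, rows):
    """Converter: turn rows into N-Triples lines using the mapping."""
    stats["conversions"] += 1
    lines = []
    for number, row in enumerate(rows, start=1):
        subject = f"<{BASE}/{dataset_id}/{number}>"
        for column, predicate in MAPPING.items():
            lines.append(f'{subject} <{predicate}> "{row[column]}" .')
    return "\n".join(lines)


def gateway(dataset_id):
    """API gateway: serve from the cache, convert only on a miss."""
    if dataset_id not in _cache:
        _cache[dataset_id] = convert(dataset_id, data_access(dataset_id))
    return _cache[dataset_id]
```

Calling `gateway("demo")` twice converts the data once; the second call is answered from the cache.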

LOD principles
§ Principle: HTTP URI as names
  Statement: Use URIs as names that can be looked up via HTTP.
  Rationale: LOD principle.
§ Principle: Re-use before build
  Statement: Consider using an existing solution within ESS before building a new one.
  Rationale: Re-use is one of the main principles of ESS. It reduces costs and enables sharing best practices.
§ Principle: Multiple interfaces
  Statement: Data is available in multiple formats.
  Rationale: RDF is one of the data representations, which might be required for machine-to-machine interaction. Multiple interfaces and formats will enable more consumers to consume data.
§ Principle: Uniform vocabularies
  Statement: All member states should use uniform vocabularies. Everyone is involved in vocabulary development.
  Rationale: Similar data should be represented by identical vocabularies.
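
One common way to realise the "Multiple interfaces" principle over HTTP is content negotiation, where the consumer names the formats it accepts and the server picks one. The sketch below is a simplified illustration, not a complete implementation of HTTP content negotiation: it takes the highest q-value among matching ranges and ignores specificity rules. The list of supported formats is an assumption; the media types themselves are the registered ones for Turtle, RDF/XML, N-Triples and JSON-LD.

```python
# Serializations this hypothetical service can produce, in order of preference.
SUPPORTED = [
    "text/turtle",
    "application/rdf+xml",
    "application/n-triples",
    "application/ld+json",
]


def parse_accept(header):
    """Split an Accept header into (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media_range, q))
    return ranges


def matches(media_range, media_type):
    """True if a range such as */* or application/* covers the media type."""
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.startswith(media_range[:-1])
    return media_range == media_type


def negotiate(header, supported=SUPPORTED):
    """Pick the supported type with the highest q; ties go to server order."""
    ranges = parse_accept(header)
    best, best_q = None, 0.0
    for media_type in supported:
        q = max((q for r, q in ranges if matches(r, media_type)), default=0.0)
        if q > best_q:
            best, best_q = media_type, q
    return best
```

When `negotiate` returns `None`, the service has no acceptable format to offer and would typically answer with HTTP status 406.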

LOD Standards and Architecture Governance (Roles)
§ Environment Management
§ Compliance
§ Policy Management
§ Dispensation
§ Reference Architecture Quality Control
§ Monitoring and Reporting
§ Business Control
§ Organizational Architecture Development
§ Solution Development
§ Definition of standards and guidelines
§ LOD Standards Management

Template for mapping of PoCs on LOD architecture
PoC 1: Linking official statistics within an NSI to improve data dissemination
  Focus: Guidelines, data and metadata integration, development of an open source tool to transform a traditional data source (CSV, database, etc.) into linked (open) data, publication of the linked data through a SPARQL endpoint.
  LOD architecture:
  § Touches Conversion (replicated scenario) and also touches LOD metadata
  § Standards
PoC 2: Publishing standardised nomenclatures as linked open metadata
  Focus: Publishing nomenclatures (such as NUTS) as linked open data, based on the SKOS ontology.
  LOD architecture:
  § Touches also LOD metadata
  § Standards
PoC 3: Linking official statistics with other data to develop value added services and apps
  Focus: The transformation of official statistics dissemination will be demonstrated, so that data is published in linkable, machine-readable formats, based on open standards such as SDMX, RDF Data Cube or CSV, and at a level of granularity that matches the requirements of reusers, from single observations to full datasets.
  LOD architecture:
  § Touches Conversion (replicated scenario) and also touches LOD metadata
  § Standards
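
For PoC 2, publishing a nomenclature with SKOS might be sketched as follows. This is an illustration under assumptions, not the PoC's actual tool: the URI base is hypothetical, and the parent of a code is assumed to be the code with its last character removed, which matches how NUTS codes are built level by level. The SKOS terms used (`ConceptScheme`, `Concept`, `inScheme`, `notation`, `broader`) are existing SKOS vocabulary terms.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "https://example.org/id/nuts"  # hypothetical URI base
SCHEME = BASE + "/scheme"


def concept_uri(code):
    """URI of the concept that represents one nomenclature code."""
    return f"{BASE}/{code}"


def nuts_to_skos(codes):
    """Return N-Triples lines describing the codes as a SKOS concept scheme."""
    known = set(codes)
    lines = [f"<{SCHEME}> <{RDF_TYPE}> <{SKOS}ConceptScheme> ."]
    for code in sorted(known):
        subject = f"<{concept_uri(code)}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{SKOS}Concept> .")
        lines.append(f"{subject} <{SKOS}inScheme> <{SCHEME}> .")
        lines.append(f'{subject} <{SKOS}notation> "{code}" .')
        # Assumption: dropping the last character gives the parent code.
        parent = code[:-1]
        if parent in known:
            lines.append(f"{subject} <{SKOS}broader> <{concept_uri(parent)}> .")
    return lines


triples = nuts_to_skos(["BE", "BE1", "BE10"])
```

The `skos:broader` links let a consumer navigate from a region to the larger area that contains it without knowing the coding rules.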