Linking Open Data in the European Statistical System

Linking Open Data in the European Statistical System
Eoin MacCuirc, Central Statistics Office, Ireland
CPS 03, Linked Open Data, NTTS, Brussels, 12 March 2019

Ultimately we must digitally transform so we can remove the “fog of war,” and have clear visibility and insights into our businesses and the needs of our customers. The end goal of digital transformation, however, is the ability to rapidly act and react to changing data, competitive conditions and strategies fast enough to succeed. Knowledge is nothing, if not tied to action. In a recent survey of 500 managers, they reported the number one mistake companies are making in digital transformation is moving too slow. They may have all the necessary information and strategies, but if they are incapable of acting or reacting fast enough to matter, then it is wasted. Digital transformation includes the information logistics systems capable of collecting, analyzing and reporting data fast enough to be useful, plus the ability to act and react in response.
Kevin Benedict, “The End Goal of Digital Transformation”

Top 15 Global Brand Ranking (2000-2018): https://www.youtube.com/watch?v=BQovQUga0VE

Overview
• Linked Open Data – the terrain
• ESSnet Linked Open Statistical Data (LOSD)
• ESS LOSD – the proposed solution
• LOSD challenges and opportunities in the ESS

Linked Open Data – the terrain

M&M’s Metaphor

M&Ms – linking data, greater insight

M&M’s linked to Gala apples

What is Open Data?
Open data is data that can be freely used, re-used and redistributed by anyone - subject only, at most, to the requirement to attribute and sharealike. (Open Data Handbook)

Brussels: 390,000 hits

Linking open data – seeing stars
★ make your stuff available on the web (whatever format)
★★ make it available as structured data (e.g. Excel instead of an image scan of a table)
★★★ use a non-proprietary format (e.g. CSV instead of Excel)
★★★★ use URLs to identify things, so that people can point at your stuff
★★★★★ link your data to other people's data to provide context
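
Because each star presupposes the ones before it, the scheme behaves like a checklist that stops at the first unmet step. The Python sketch below illustrates only that cumulative logic; the `Dataset` fields and the sets of formats treated as "structured" and "non-proprietary" are assumptions made for this example, not part of the scheme.

```python
from dataclasses import dataclass

# Assumed format sets for this sketch; the scheme itself names no list.
STRUCTURED_FORMATS = {"xlsx", "xls", "csv", "json", "xml", "rdf", "ttl"}
OPEN_FORMATS = {"csv", "json", "xml", "rdf", "ttl"}


@dataclass
class Dataset:
    on_web: bool       # published on the web in any form
    file_format: str   # e.g. "png", "xlsx", "csv", "ttl"
    uses_uris: bool    # things are identified with URLs
    links_out: bool    # links to other people's data


def star_rating(ds: Dataset) -> int:
    """Return 0-5 stars; a star only counts if all earlier stars are met."""
    checks = [
        ds.on_web,
        ds.file_format.lower() in STRUCTURED_FORMATS,
        ds.file_format.lower() in OPEN_FORMATS,
        ds.uses_uris,
        ds.links_out,
    ]
    stars = 0
    for passed in checks:
        if not passed:
            break  # stop at the first unmet requirement
        stars += 1
    return stars


# An image scan of a table, an Excel sheet, a CSV file and linked RDF.
print(star_rating(Dataset(True, "png", False, False)))   # 1
print(star_rating(Dataset(True, "xlsx", False, False)))  # 2
print(star_rating(Dataset(True, "csv", False, False)))   # 3
print(star_rating(Dataset(True, "ttl", True, True)))     # 5
```

A fully linked RDF file that is not on the web still scores 0 here, since the first star gates all the others.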

ESSnet Linked Open Statistical Data (LOSD)

Background to ESSnet LOSD
• ESS (European Statistical System) Network (Grant agreement no. 11102.2017.001-2017.661, €619,696.32)
• DIGICOM (DIGItal COMmunication) project, Work Package 3: Open Data Dissemination, part of ESS Vision 2020
  - DIGICOM key areas of ESS Vision 2020: focus on users; improve dissemination and communication
  - Work Package 3: facilitate automated access to aggregate data for heavy users/re-disseminators, and access to microdata
• Four NSIs (Bulgaria as project coordinator, France, Ireland and Italy) collaborating on an ESSnet on Linked Open Statistical Data (LOSD)
• Work subcontracted to industry and academic experts in France and Ireland (in Ireland, a consortium of Derilinx, the Adapt Centre (TCD) and the Insight Centre (NUIG))
• Objectives of the ESSnet:
  - To explore NSIs publishing statistical data as LOSD on the semantic web
  - To allow citizens to engage easily with statistical data as linked open data
  - To prepare NSIs in the wider ESS to publish LOSD
  - To recommend a way forward for the ESS to adopt and exploit LOSD
• Launched November 2017; initial LOSD published April and October 2018; final deliverables in April 2019

Outline of ESSnet LOSD Work Packages
• WP 0 – Project Coordination, Bulgaria NSI
• WP 1 – LOSD Pilots, user assessments and recommendations on horizontal topics, CSO Ireland
  - Set up LOSD platform
  - LOSD pilot: publishing LOSD for a limited number of datasets (Census, EU SILC, Labour Force Survey, spatial and local government data) locally, nationally and internationally
  - Use cases for LOSD and user assessment of the LOSD solutions provided
  - Recommendations for the future of LOSD in the ESS and lessons learned
• WP 2 – ESS Networking, Cooperation and Capacity Building, Insee France
  - Dissemination of outputs from the ESSnet
  - Set up a cooperation and collaboration platform for LOSD
  - Organise capacity-building materials and webinars

Technical Consortium
• Derilinx
• Adapt Centre, Trinity College Dublin
• Insight Centre for Data Analytics, National University of Ireland Galway

Use Cases

ESSnet approach – agile, MVP
• Agile: initially linking one Census dataset, then expanding further
• Link data locally, nationally and internationally, incorporating non-statistical datasets, e.g. spatial and local government data
• Each NSI hosts its own single source of RDF data
• Each NSI links its RDF datasets to the European Data Portal (EDP) through its national data portal
• Data analysts access LOSD through a cloud-based tool pulling from the EDP (CKAN-enabled)
• Access to the LOSD Open Cube toolkit (openly licensed LOSD, open source tools, freely available, cloud-based)
• NSI data analysts can access and create cross-domain products and services, e.g. linking statistical, spatial and local government datasets for an easy-to-use, citizen-friendly view
• RDF can be published 'on the fly' from the NSO's single source of data (the ESSnet focused on CSV and JSON-stat)
• Capacity building in the ESS once a proven model is created
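
The discovery step, an analyst's tool pulling from a CKAN-enabled portal, can be sketched against CKAN's Action API, whose `package_search` action returns JSON with a `success` flag and the matching datasets under `result.results`. In the Python sketch below, the portal base URL and the list of formats treated as RDF are assumptions, and the sample response is made up; no network request is made.

```python
import json
from urllib.parse import urlencode

# Hypothetical portal; CKAN serves its Action API under /api/3/action/.
PORTAL = "https://data-portal.example.org"
# Assumed set of resource formats to treat as RDF.
RDF_FORMATS = {"rdf", "rdf/xml", "turtle", "ttl", "n-triples", "json-ld"}


def package_search_url(query: str, rows: int = 10) -> str:
    """Build a CKAN package_search request URL."""
    return f"{PORTAL}/api/3/action/package_search?" + urlencode({"q": query, "rows": rows})


def rdf_resources(response_text: str) -> list[tuple[str, str]]:
    """Return (dataset name, resource URL) pairs whose format looks like RDF."""
    body = json.loads(response_text)
    if not body.get("success"):
        return []
    found = []
    for package in body["result"]["results"]:
        for resource in package.get("resources", []):
            fmt = (resource.get("format") or "").lower()
            if fmt in RDF_FORMATS:
                found.append((package["name"], resource["url"]))
    return found


# A trimmed response in the shape CKAN returns; names and URLs are made up.
sample = json.dumps({
    "success": True,
    "result": {"count": 1, "results": [{
        "name": "census-2016-population",
        "resources": [
            {"format": "CSV", "url": "https://nsi.example.org/pop.csv"},
            {"format": "Turtle", "url": "https://nsi.example.org/pop.ttl"},
        ],
    }]},
})
print(package_search_url("census population"))
print(rdf_resources(sample))
```

Filtering on the declared `format` field is a simplification; real catalogue entries label formats inconsistently, so a production tool would also inspect media types.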

ESS LOSD - The proposed solution

LOSD Publication Pipeline

LOSD Platform Architecture

LOSD Publication Pipeline tools
• CKAN serves as a data catalog for the datasets and resources used in the LOSD project
• Juma is a CSV-to-RDF mapping and conversion tool built on top of R2RML, the W3C standard for mapping relational (tabular) data to RDF
• Virtuoso has been added to the pipeline to expose the data via a SPARQL endpoint
• Cube Visualizer is a visual analytics tool for exploring data cubes
• OLAP Browser enables OLAP (Online Analytical Processing) operations (e.g. pivot, drill-down and roll-up) on top of multiple linked data cubes
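
The idea behind the CSV-to-RDF step can be illustrated without Juma or R2RML: the Python sketch below hand-codes the mapping, turning each CSV row into an observation in the W3C RDF Data Cube vocabulary (`qb:DataSet`, `qb:Observation`, `qb:dataSet`) with SDMX `obsValue` as the measure, serialised as N-Triples. The `example.org` namespace, the dimension property URIs and the sample rows are hypothetical, and a complete cube would also need a data structure definition (`qb:structure`), which is omitted here.

```python
import csv
import io
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB = "http://purl.org/linked-data/cube#"
OBS_VALUE = "http://purl.org/linked-data/sdmx/2009/measure#obsValue"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
BASE = "http://example.org/losd/"  # hypothetical namespace for this sketch


def csv_to_ntriples(csv_text: str, dataset_id: str, dimensions: list[str], measure: str) -> str:
    """Map each CSV row to one qb:Observation and serialise it as N-Triples."""
    dataset = f"<{BASE}dataset/{quote(dataset_id)}>"
    triples = [f"{dataset} <{RDF_TYPE}> <{QB}DataSet> ."]
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        obs = f"<{BASE}dataset/{quote(dataset_id)}/obs/{i}>"
        triples.append(f"{obs} <{RDF_TYPE}> <{QB}Observation> .")
        triples.append(f"{obs} <{QB}dataSet> {dataset} .")
        for dim in dimensions:
            # Code values become URIs so that other datasets can link to them.
            code = f"<{BASE}code/{quote(dim)}/{quote(row[dim].strip())}>"
            triples.append(f"{obs} <{BASE}dimension/{quote(dim)}> {code} .")
        value = int(row[measure])
        triples.append(f'{obs} <{OBS_VALUE}> "{value}"^^<{XSD_INTEGER}> .')
    return "\n".join(triples)


# Illustrative input only; the numbers are not CSO figures.
sample_csv = "area,sex,population\nAreaA,Male,100\nAreaA,Female,110\n"
print(csv_to_ntriples(sample_csv, "pop-demo", ["area", "sex"], "population"))
```

Minting dimension values as URIs rather than plain strings is what makes later linking possible: another dataset that uses the same code URI for `Female` joins to these observations directly.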

Key Points of Access to ESSnet
• CROS Portal: https://ec.europa.eu/eurostat/cros/content/essnetdocuments_en
• Linked Open Statistical Data Hub: https://losddata.staging.derilinx.com
• Learning Platform: http://losd.staging.derilinx.com/moodle/
• Google Drive: ESSnet CSO-LD Deliverables
• Slack Workspace: LOS Essnet

Linked Open Statistical Data Hub

LOSD challenges and opportunities in ESS

ESS Challenges
1. Data/metadata alignment
2. Hosting RDF data and LOSD cloud service

Population of Ireland
• HC55 defines Population Usually Resident and Present in the State 2011 to 2016 by Single Year of Age, Sex, Nationality and Census Year.
• The Irish HC55 dataset is published multi-annually and is available in the CSO's StatBank database.
• This usual residence criterion includes all persons who are usually resident in the State and who were also present in the State on Census night, 24 April 2016. It excludes those who were present in the State on Census night but were usually resident elsewhere, as well as those usually resident in the State but temporarily absent abroad on Census night.

The key population figures produced by CSO from Census 2016 were as follows:

Table 1: Irish Census Population Estimates

Population | Usually resident and present on Census night | Visitors on Census night | Residents temporarily absent abroad on Census night | Total
A) Census de facto population: all persons present in the State on Census night, irrespective of usual residence | 4,689,921 | 71,944 | X | 4,761,865
B) Census usually resident and present population: all usual residents present on Census night | 4,689,921 | X | X | 4,689,921
C) Usually resident population: all usual residents present on Census night, plus usually resident persons absent abroad on Census night from households where other persons were present | 4,689,921 | X | 49,676 | 4,739,597
Totals | 4,689,921 | 71,944 | 49,676 |

X = component not included. Populations A) and B) are included in the Census summary publication. Population C) is used for CSO's population estimates.
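
Each total in Table 1 is the sum of the components included in that population definition. The arithmetic can be checked with a few lines of Python using only the figures from the table:

```python
# Component counts from Table 1 (Census 2016).
resident_present = 4_689_921  # usual residents present on Census night
visitors = 71_944             # visitors present on Census night
absent_abroad = 49_676        # usual residents temporarily absent abroad,
                              # from households where other persons were present

populations = {
    "A) de facto": resident_present + visitors,
    "B) usually resident and present": resident_present,
    "C) usually resident": resident_present + absent_abroad,
}

for name, total in populations.items():
    print(f"{name}: {total:,}")
# A) de facto: 4,761,865
# B) usually resident and present: 4,689,921
# C) usually resident: 4,739,597
```

The three definitions share the same core of 4,689,921 persons and differ only in which edge groups they add, which is why figures published under different definitions cannot be linked as if they measured the same thing.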

Population of Ireland
• CSO is working towards producing a usual resident count from Census 2021 that will include all of the elements of Population C) above, and will also include usual residents who were absent abroad on Census night from households where no persons were present.
• The obvious challenge with census data is reconciling the census definitions set by Eurostat with the definitions put in place nationally.

Table 2: Options for Census data definition

Option 1: The Irish pilot should convert its census data to be compliant with HC55 definitions
• Pros: the input data will be compliant with HC55 definitions; only one RDF mapping will be required for the conversion
• Cons: the data is not currently available and might need some time to be prepared

Option 2: We propose two different vocabularies to describe census data
• Pros: no need for any data pre-processing
• Cons: querying the different NSOs' SPARQL endpoints will not return results for the Irish pilot (if the query uses the vocabulary compliant with HC55 definitions), and vice versa

Option 3: Ignore the difference in definitions and proceed as if the de facto census population were similar to the usual residence population
• Pros: no need for any data pre-processing; querying the different NSOs' SPARQL endpoints will return results
• Cons: the results do not properly match a standard definition, so linking datasets across countries might lead to wrong interpretations
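
The drawback of option 2 comes from the fact that a SPARQL query names its properties explicitly. The Python sketch below builds such a query and encodes it for an HTTP GET request using the `query` parameter defined by the SPARQL 1.1 Protocol. The endpoint URL and the HC55-style property URI are hypothetical; an endpoint whose observations use a different property would simply return no bindings for this query.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint and property URI; a real query must use whatever
# vocabulary the publishing NSI actually chose.
ENDPOINT = "https://sparql.nsi.example.org/sparql"
HC55_POPULATION = "http://example.org/def/hc55/usuallyResidentPresentPopulation"

QUERY = f"""
PREFIX qb: <http://purl.org/linked-data/cube#>
SELECT ?obs ?value WHERE {{
  ?obs a qb:Observation ;
       <{HC55_POPULATION}> ?value .
}}
LIMIT 10
"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Encode a query as the 'query' parameter of a SPARQL GET request."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)

# Round-trip check: the endpoint would decode exactly the query we wrote.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY)  # True
```

Nothing in the request is wrong when it returns nothing; the mismatch is in the vocabulary, which is why option 2 shifts the problem onto whoever writes the query.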

This is a data/metadata challenge, not a technical challenge
• Challenges of definition, e.g. population definition
• Challenges of classification, e.g. LFS age ranges
• Challenges of data type, e.g. unadjusted/seasonally adjusted
• Challenges of periodicity, e.g. monthly, quarterly, annual
• Challenges of granularity, e.g. LAU, NUTS 2 & 3, national
• This is not an issue the ESSnet LOSD can resolve, only highlight.
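
Classification differences such as age ranges can sometimes be bridged when finer data exists. The Python sketch below aggregates single-year-of-age counts into a common set of bands; the band boundaries and the counts are hypothetical, chosen only to show the mechanics.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical target bands (low, high); None means open-ended.
AGE_BANDS = [(15, 24), (25, 49), (50, 64), (65, None)]


def band_for(age: int) -> str | None:
    """Return the label of the band containing age, or None if out of scope."""
    for low, high in AGE_BANDS:
        if age >= low and (high is None or age <= high):
            return f"{low}-{high}" if high is not None else f"{low}+"
    return None


def rebin(counts_by_age: dict[int, int]) -> dict[str, int]:
    """Aggregate single-year-of-age counts into the common bands."""
    totals: Counter[str] = Counter()
    for age, count in counts_by_age.items():
        label = band_for(age)
        if label is not None:  # ages outside every band are dropped
            totals[label] += count
    return dict(totals)


# Illustrative counts only.
print(rebin({14: 5, 15: 10, 24: 20, 25: 30, 70: 40}))
```

The mapping only works in one direction: counts already published in different bands (for example 15-19 versus 15-24) cannot be split back into single years, which is why such differences remain data/metadata issues rather than technical ones.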

Data challenges identified and documented
• There are data challenges with Census, EU SILC and LFS
• There are challenges linking to mapping agencies
• There are challenges linking to data publishers outside the ESS

How to link data?
• Don't link
• Sort out the data anomalies
• Link, but somehow inform the machine of the data anomalies (a technical challenge that might not be solvable)
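
One possible reading of the third option is to carry the population definition alongside every figure, so that software can flag a comparison across definitions instead of silently making it. The Python sketch below does this with Ireland's populations A) and C) from Table 1; the `Figure` class and the definition codes are hypothetical, and in RDF the definition would more naturally be an attribute on each observation drawn from a shared ESS code list.

```python
from dataclasses import dataclass

# Hypothetical definition codes; a shared ESS code list would be needed in practice.
DE_FACTO = "de-facto"
USUAL_RESIDENCE = "usual-residence"


@dataclass(frozen=True)
class Figure:
    label: str
    population: int
    definition: str  # which population concept this figure follows


def compare(a: Figure, b: Figure) -> str:
    """Report a - b, flagging (not hiding) a mismatch in definitions."""
    diff = a.population - b.population
    text = f"{a.label} minus {b.label}: {diff:+,}"
    if a.definition != b.definition:
        text += f" [warning: {a.definition} vs {b.definition}; not like for like]"
    return text


# Ireland, Census 2016, Table 1: populations A) and C).
de_facto = Figure("IE de facto", 4_761_865, DE_FACTO)
usual = Figure("IE usually resident", 4_739_597, USUAL_RESIDENCE)
print(compare(de_facto, usual))
```

The whole 22,268 difference here is definitional (71,944 visitors minus 49,676 absent residents), which is exactly the kind of anomaly a machine cannot detect unless the definition travels with the data.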

Hosting RDF data and LOSD cloud service

ESS Opportunities
1. An ESS web of Linked Open Statistical Data and Metadata
2. New products and services, new insights

What can linked open data achieve?

The Vision – Killer App
Draft National Profile Design
Linking ESS data:
1. Locally, nationally and internationally
2. With other data sources outside the ESS (exemplars): spatially with maps, and with local government data
• Common ESS data and metadata across four partner NSIs: Census, EU SILC, Labour Force Survey (LFS)
• Local & national maps
• Local government information
• Share code on GitHub

Opportunities to improve the process
• Automation: there are 855,242 datasets on the European Data Portal (EDP), so automating the production of RDF from these datasets is important. The ESSnet explored JSON-stat to RDF conversion.
• Automating linkage discovery across variables in the 855,242 EDP datasets
• Optimising the size of the RDF files
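
A first step in any JSON-stat to RDF conversion is unrolling the cube into one record per cell, each of which can then become an RDF Data Cube observation. The Python sketch below assumes a JSON-stat 2.0 dataset with a dense `value` array and an explicit category `index` for every dimension; it is an illustration, not the converter the ESSnet used, and the sample values are made up.

```python
import itertools
import json


def category_codes(dimension: dict) -> list[str]:
    """Return category codes in index order (JSON-stat allows array or object)."""
    index = dimension["category"]["index"]
    if isinstance(index, list):
        return index
    return sorted(index, key=index.get)  # object form: code -> position


def jsonstat_observations(doc: dict) -> list[dict]:
    """Unroll a JSON-stat 2.0 dataset into one dict per cell (dense values only)."""
    ids = doc["id"]
    codes = [category_codes(doc["dimension"][d]) for d in ids]
    values = doc["value"]
    if not isinstance(values, list):
        raise ValueError("this sketch only handles dense (array) values")
    observations = []
    # itertools.product varies the last dimension fastest, matching
    # JSON-stat's row-major ordering of the value array.
    for position, combo in enumerate(itertools.product(*codes)):
        obs = dict(zip(ids, combo))
        obs["value"] = values[position]
        observations.append(obs)
    return observations


# A tiny illustrative dataset; the values are made up.
doc = json.loads("""{
  "version": "2.0", "class": "dataset",
  "id": ["sex", "year"], "size": [2, 2],
  "dimension": {
    "sex":  {"category": {"index": ["M", "F"]}},
    "year": {"category": {"index": {"2011": 0, "2016": 1}}}
  },
  "value": [1, 2, 3, 4]
}""")
for obs in jsonstat_observations(doc):
    print(obs)
```

Because JSON-stat already separates dimensions from values, this step is mechanical; the harder part of automation at EDP scale is deciding which URIs each dimension and code should map to.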

Takeaways
• Quickening pace of change
• ESS staying relevant given this rapid change
• ESS data/metadata challenges highlighted by LOSD
• LOSD is an opportunity for innovative products and services and greater insights, and a challenge to align data/metadata across the ESS

Please play with it and give feedback…