LOD2 Introduction jordse@gmail.com Seoul National University BIKE lab

Creating Knowledge out of Interlinked Data
Introduction
• FP7 project (2010-2014)
• 15 partners (technology researchers, companies and service providers) from 11 European countries, plus 1 associated partner from Korea
• Coordinated by the AKSW research group at the University of Leipzig

The emerging Web of Data: achievements and challenges
• Web: a global, distributed platform for data, information and knowledge integration
• Exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF
[Figure: snapshots dated July 2007, April 2008, September 2008, July 2009]
Achievements
1. Extension of the Web with a data commons (currently amounting to 25 billion facts)
2. Still vibrant, global RTD community
3. Industrial uptake begins (e.g. BBC, Thomson Reuters, Eli Lilly)
4. Emerging governmental adoption in sight
5. Establishing Linked Data as a deployment path for the Semantic Web
Challenges
1. Coherence: relatively few, expensively maintained links
2. Quality: partly low quality data and inconsistencies
3. Performance: still substantial penalties compared to relational
4. Data consumption: large-scale processing, schema mapping and data fusion still in its infancy
5. Usability: missing direct end-user tools and network effect

Why Linked Open Data?
Problem: Try to search for these things on the current Web:
• Apartments near German-Russian bilingual childcare in Leipzig.
• ERP service providers with offices in Vienna and London.
• Researchers working on multimedia topics in Eastern Europe.
Information is available on the Web, but opaque to current Web search.
Solution: complement text on Web pages with structured linked open data and intelligently combine/integrate such structured information from different sources.
[Figure: a search engine consumes HTML and RDF from two sites, each served by a Web server backed by a DB: berlin.de ("Has everything about childcare in Berlin") and Immobilienscout.de ("Knows all about real estate offers in Germany")]

Objectives of LOD2

Linked Data Lifecycle Challenges
[Figure: lifecycle stages]
• Extraction
• Storage/Querying
• Manual revision/authoring
• Inter-linking/Fusing
• Classification/Enrichment
• Quality Analysis
• Evolution/Repair
• Search/Browsing/Exploration

LOD2 in a Nutshell
Research focus
• Very large RDF data management
• Knowledge Enrichment & Interlinking
• Fusion & Information Quality
• Adaptive, semantic user interfaces
Use Cases
• Media & Publishing
• Enterprise Data Webs
• Open Gov Data
Main Result
• Integrated LOD2 Stack for Linked Data lifecycle management
Partner
• Uni Leipzig, CWI, DERI Galway, FU Berlin, Semantic Web Company, OpenLink, Tenforce, Exalead, Wolters Kluwer, OKFN

LOD2 STACK

LOD2 stack as Debian package repository
• LOD2 stack repository is a Debian package repository: http://stack.lod2.eu/deb/distributions/dists/
• We have chosen a new reference OS: Ubuntu 12.04 LTS
  o This version is supported for the next 5 years.
• Changes in repository management system for
  o enabling quality control (development -> test -> stable)
  o enabling architecture-dependent distribution support (e.g. Virtuoso RDF store)
• Public access to documentation
  o http://wiki.lod2.eu

LOD2 stack contribution process

LOD2 stack components

Linked Data publishing capabilities currently offered
• Covers most of the LOD publishing cycle
• Combination of
  o locally installed software,
  o online available software, and
  o online available data sources as well as data packages
• About page in the LOD demonstrator (http://demo.lod2.eu/lod2demo)

LOD2 STACK – Extraction: Virtuoso Sponger, D2RQ

Virtuoso Sponger
• An RDFizer introduced in Virtuoso 5.0
• Provides built-in RDF middleware for transforming non-RDF data into RDF "on the fly".
• You can use non-RDF data sources as Semantic Web data sources.
• Inputs: a wide variety of non-RDF Web data sources, e.g.:
  o (X)HTML Web pages (including hosted microformats)
  o Web services (Google, Del.icio.us, Flickr etc.)
  o Binary files (MS Office, PDF, OpenDocument etc.)
• Output: RDF structured data

Inputs: Supported Data Sources
• RDF (inc. N3, Turtle)
  o SIOC, SKOS, FOAF, AtomOWL, Annotea …
• (X)HTML pages
  o HTML header metadata: Dublin Core
  o Microformats: eRDF, RDFa, hCard, hCalendar, XFN, xFolk …
• Syndication formats
  o RSS 2.0, Atom, OPML, OCS, XBEL
• GRDDL
• Web service APIs: Google Base, Flickr, Del.icio.us, Ning …
• Files:
  o Binary files: MS Office, OpenOffice, images, audio, video …
  o Data exchange formats: iCalendar, vCard
• 3rd party metadata extractors: Aperture, Spotlight, SIMILE RDFizers
• or add your own!

Output: Structured Data
In the context of the Semantic Data Web:
"Data organized into semantic chunks or entities, with similar entities grouped together into relations or classes"
Michael Bergman (http://www.mkbergman.com)
Article: "More Structure, More Terminology and (hopefully) More Clarity"

Sponger Benefits
• The majority of the world's data currently resides in non-RDF form
• Sponger provides a "Swiss army knife" for RDF structured data generation from non-RDF sources
• Extracting data from non-RDF Web sources and converting it to RDF
  o helps "bootstrap" the Semantic Web
  o helps drive the transition of the traditional Document-Web into the emerging Semantic Data-Web
  o exposes the data in a canonical form for querying and inference

Sponger Inputs & Outputs

Sponger Architecture
• Sponger is composed of Sponger Cartridges
• The default cartridge collection is bundled as a Virtuoso VAD
• Cartridge = Metadata Extractor + Ontology Mapper
• Metadata extracted from non-RDF resources is mapped to a suitable ontology by the Ontology Mapper to produce structured data
• Sponger is highly customizable
• Custom cartridges can be developed
  o Using any language (e.g. Virtuoso PL, C/C++, Java) supported by the Virtuoso Server Extensions API
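The "Cartridge = Metadata Extractor + Ontology Mapper" idea can be illustrated with a minimal sketch. This is not the Virtuoso cartridge API: the class MetadataExtractor, the table ONTOLOGY_MAP, the function cartridge and the sample page are all hypothetical names made up for illustration, and only the Python standard library is used. The extractor pulls the title and meta tags out of an HTML page, and the mapper turns the keys it recognises into Dublin Core properties.

```python
from html.parser import HTMLParser

DC = "http://purl.org/dc/elements/1.1/"


class MetadataExtractor(HTMLParser):
    """Metadata extractor: collects the <title> text and <meta name/content> pairs."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and "name" in attrs and "content" in attrs:
            self.metadata[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data


# Ontology mapper: extracted metadata key -> property URI (hypothetical mapping table).
ONTOLOGY_MAP = {
    "title": DC + "title",
    "author": DC + "creator",
    "description": DC + "description",
}


def cartridge(resource_uri, html):
    """Run extractor and mapper; return (subject, predicate, literal) triples."""
    extractor = MetadataExtractor()
    extractor.feed(html)
    triples = []
    for key, value in extractor.metadata.items():
        prop = ONTOLOGY_MAP.get(key)
        if prop:  # keys without a mapping are dropped
            triples.append((resource_uri, prop, value.strip()))
    return triples


# Made-up sample page and resource URI.
SAMPLE_HTML = """<html><head><title>Example page</title>
<meta name="author" content="Jane Doe">
<meta name="generator" content="SomeCMS">
</head><body>Hello</body></html>"""

triples = cartridge("http://example.org/page", SAMPLE_HTML)
for s, p, o in triples:
    print(f'<{s}> <{p}> "{o}" .')
```

Keeping the extractor and the mapping table separate mirrors the two halves of a cartridge: a different ontology can be targeted by swapping the table without touching the extraction code.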

D2RQ Platform
• System for accessing relational databases as virtual RDF graphs
• Offers RDF-based access to the content of relational databases without having to replicate it into an RDF store
• Features:
  o query a non-RDF database using SPARQL
  o access the content of the database as Linked Data over the Web
  o create custom dumps of the database in RDF
  o access information using the Apache Jena API

D2RQ Platform: Components
The D2RQ Platform consists of:
• D2RQ Mapping Language, a declarative mapping language for describing the relation between an ontology and a relational data model.
• D2RQ Engine, which uses the mappings to rewrite Jena API calls to SQL queries against the database and passes query results up to the higher layers of the framework.
• D2R Server, an HTTP server that provides a Linked Data view, an HTML view for debugging and a SPARQL Protocol endpoint over the database.
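As a rough illustration of how a client might address the SPARQL Protocol endpoint of a D2R Server, the sketch below builds (but does not send) an HTTP GET request. The endpoint URL http://localhost:2020/sparql, the helper build_sparql_request and the query are assumptions for illustration; the only protocol detail relied on is that SPARQL Protocol passes the query text in the "query" parameter.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint of a locally running D2R Server; adjust to the real deployment.
ENDPOINT = "http://localhost:2020/sparql"

# Illustrative query: list up to ten resources typed as foaf:Person.
QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person WHERE { ?person a foaf:Person } LIMIT 10"""


def build_sparql_request(endpoint, query):
    """Build a SPARQL Protocol GET request with the query URL-encoded in 'query'."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.get_method(), request.full_url)

# To execute against a running server (not done here):
#   import json, urllib.request
#   with urllib.request.urlopen(request) as response:
#       results = json.load(response)
```

The request is only constructed, so the sketch runs without a server; sending it is left as a commented step.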

Mapping Examples

map:MyDatabase a d2rq:Database;
    d2rq:jdbcDSN "jdbc:mysql://localhost/mydb";
    d2rq:jdbcDriver "com.mysql.jdbc.Driver";
    d2rq:username "user";
    d2rq:password "password".

map:People a d2rq:ClassMap;
    d2rq:uriPattern "http://.../people/@@User.ID@@";
    d2rq:condition "User.deleted=0";
    d2rq:class foaf:Person.

map:photo a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:People;
    d2rq:property foaf:made;
    d2rq:join "User.ID = Photo.UserID";
    d2rq:refersToClassMap map:Photos.
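To make the effect of a class map concrete, here is a minimal sketch of what a mapping like map:People expresses: select the rows that satisfy the condition, expand the @@Table.Column@@ placeholders in the URI pattern, and type each resulting resource. It is not the D2RQ engine; CLASS_MAP, class_map_to_triples, the example.org base URI and the in-memory SQLite table with its rows are hypothetical stand-ins for the MySQL database in the mapping.

```python
import re
import sqlite3

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"

# Hypothetical Python mirror of map:People; the base URI is made up.
CLASS_MAP = {
    "uri_pattern": "http://example.org/people/@@User.ID@@",
    "condition": "User.deleted=0",
    "class": FOAF_PERSON,
}


def class_map_to_triples(conn, class_map):
    """Expand the URI pattern for every row that satisfies the condition."""
    # Find the Table.Column placeholders used in the pattern.
    columns = re.findall(r"@@(\w+)\.(\w+)@@", class_map["uri_pattern"])
    table = columns[0][0]
    col_list = ", ".join(f'"{t}"."{c}"' for t, c in columns)
    sql = f'SELECT {col_list} FROM "{table}" WHERE {class_map["condition"]} ORDER BY 1'
    triples = []
    for row in conn.execute(sql):
        uri = class_map["uri_pattern"]
        for (t, c), value in zip(columns, row):
            uri = uri.replace(f"@@{t}.{c}@@", str(value))
        triples.append((uri, RDF_TYPE, class_map["class"]))
    return triples


# Made-up sample data; user 2 is flagged as deleted and must be skipped.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "User" (ID INTEGER PRIMARY KEY, name TEXT, deleted INTEGER)')
conn.executemany(
    'INSERT INTO "User" VALUES (?, ?, ?)',
    [(1, "alice", 0), (2, "bob", 1), (3, "carol", 0)],
)

people = class_map_to_triples(conn, CLASS_MAP)
for s, p, o in people:
    print(f"<{s}> <{p}> <{o}> .")
```

A property bridge such as map:photo would add a second step that joins User and Photo on the stated join condition and emits one foaf:made triple per matching pair.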

LOD2 STACK - OntoWiki

OntoWiki
OntoWiki enables intuitive authoring of semantic content, with an inline editing mode for editing RDF content, similar to WYSIWYG for text documents.
  o Knowledge Bases (aka graphs, Linked Data optional)
  o Generic list and resource views
  o Versioning
  o Commenting on arbitrary resources
  o User management + access control
  o Inline editing
  o Navigation hierarchies (e.g. class hierarchies)

OntoWiki Screenshots

LOD2 STACK - Interlinking: LIMES, SILK

LIMES
• Declarative Link Discovery Framework
• Tuned towards efficiency and extensibility
• Set-theoretical grammar for specifying links
• Time-efficient mappers for single data types
• Machine learning for detecting link specs

LIMES: Architecture

LIMES Link Specifications
1. Metadata
2. SourceandTarget
3. SimilarityMeasure
4. AcceptanceConditions
5. ReviewConditions
6. ExecutionMode
7. OutputFormat
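The roles of the similarity measure, acceptance conditions and review conditions can be sketched as follows. This is not LIMES configuration syntax or its algorithm: the function names, the thresholds 0.9 and 0.7, the string measure (difflib's ratio) and the example.org URIs are assumptions chosen for illustration, and the comparison is a naive all-pairs loop rather than the time-efficient mappers the slides mention.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity measure: case-insensitive string ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def discover_links(source, target, accept=0.9, review=0.7):
    """Split candidate pairs into accepted links and links needing manual review."""
    accepted, to_review = [], []
    for s_uri, s_label in source.items():
        for t_uri, t_label in target.items():
            score = similarity(s_label, t_label)
            if score >= accept:        # acceptance condition
                accepted.append((s_uri, t_uri, score))
            elif score >= review:      # review condition
                to_review.append((s_uri, t_uri, score))
    return accepted, to_review


# Hypothetical source and target data sets: resource URI -> label.
source = {
    "http://example.org/a/leipzig": "Leipzig",
    "http://example.org/a/vienna": "Vienna",
}
target = {
    "http://example.org/b/1": "leipzig",
    "http://example.org/b/2": "Wien",
    "http://example.org/b/3": "Vienne",
}

accepted, to_review = discover_links(source, target)
print("accepted:", [(s, t) for s, t, _ in accepted])
print("review:  ", [(s, t) for s, t, _ in to_review])
```

Pairs scoring between the two thresholds are neither written out as links nor discarded, which is the purpose of having a separate review condition.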

Silk: Link Discovery Framework
• Tool for discovering links between data items within different Linked Data sources.
• The Silk Link Specification Language (Silk-LSL) allows complex linkage rules to be expressed
• Can be used to generate owl:sameAs links as well as other relationships
• Scalability and high performance through efficient data handling
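A linkage rule that combines several comparisons and emits owl:sameAs links might look roughly like the sketch below. It is plain Python, not Silk-LSL: the comparison functions, the minimum aggregation, the 0.8 threshold, the property names and all sample values are hypothetical and chosen only to show the shape of such a rule.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def exact(a, b):
    """Comparison: 1.0 if the strings match ignoring case and surrounding space."""
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


def numeric_closeness(a, b, max_diff):
    """Comparison: 1.0 for equal numbers, falling linearly to 0.0 at max_diff."""
    return max(0.0, 1.0 - abs(a - b) / max_diff)


def linkage_rule(x, y):
    """Aggregate by minimum: every comparison has to score well."""
    return min(
        exact(x["label"], y["name"]),
        numeric_closeness(x["population"], y["inhabitants"], max_diff=50000),
    )


def generate_links(source, target, threshold=0.8):
    """Yield N-Triples owl:sameAs statements for pairs at or above the threshold."""
    for s_uri, s in source.items():
        for t_uri, t in target.items():
            if linkage_rule(s, t) >= threshold:
                yield f"<{s_uri}> <{OWL_SAME_AS}> <{t_uri}> ."


# Made-up data: two target items share the name, only one has a close population.
source = {
    "http://example.org/a/alphaville": {"label": "Alphaville", "population": 520000},
}
target = {
    "http://example.org/b/1": {"name": "alphaville ", "inhabitants": 525000},
    "http://example.org/b/2": {"name": "Alphaville", "inhabitants": 9000},
}

links = list(generate_links(source, target))
for line in links:
    print(line)
```

Replacing OWL_SAME_AS with another property URI would produce other relationship types from the same rule.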

Silk Versions
• Silk Single Machine
  o Generate links on a single machine
  o Local or remote data sets
• Silk MapReduce
  o Generate RDF links using a cluster of multiple machines
  o Based on Hadoop (usable with Amazon Elastic MapReduce)
• Silk Server
  o Provides an HTTP API for matching instances from an incoming stream of RDF data
  o Can be used as an identity resolution component within applications that consume Linked Data from the Web

SILK: Linking Workflow

SILK: Linkage Rule Components

LODRefine
• LOD-enabled OpenRefine
• Google Refine ==> OpenRefine
• LODGrefine ==> LODRefine
  o Supporting DBpedia (and Freebase)
  o Supporting crowdsourcing
  o Exporting RDF
  o Extracting named entities

OpenRefine

The Extensions
• Extend functionalities of OpenRefine
  o RDF Refine extension
    • Reconciliation and interlinking
    • Exporting RDF
  o DBpedia extension
    • Extending reconciled data with columns from DBpedia
    • Extracting named entities using the Zemanta API
  o NER extension
    • Extracts named entities from unstructured text
  o Crowdsourcing extension
• Developed by
  o Zemanta: DBpedia extension, Crowdsourcing
  o DERI: RDF Refine
  o Free Your Metadata Group: Named Entity Extraction extension

References
• LOD2 Webinar: The 2nd release of the LOD2 stack
• LOD2 Webinar Series: Zemanta / OpenRefine
• LOD2 Webinar Series: LIMES
• LOD2 Webinar Series: SILK
• LOD2 Webinar Series: OntoWiki
• Virtuoso Sponger - RDFizer Middleware for creating RDF from non-RDF Data Sources
• LOD2 Webinar Series: D2R and Sparqlify
• LOD2 HomePage, http://stack.lod2.eu/blog/
• LOD2 Prototype, http://demo.lod2.eu/lod2demo