Mapping Relational Databases to RDF with OpenLink Virtuoso
Mapping Relational Databases to RDF with OpenLink Virtuoso
Orri Erling - Lead Developer, Virtuoso Team
© 2008 OpenLink Software, All rights reserved.
Who Wants to Map?
- Semantic Web Scalers
  - Expose whatever there is as RDF, the next guy will unify terms, make search and apps
- Data Warehouse Keepers
  - Data is spread out, has implicit semantics, complex schemas, heterogeneous sources, ambiguous terms, but we must make it join and aggregate cleanly
Present State
- SPARQL-to-SQL mapping exists, but complex integrations are still built as data warehouses
- We'd really like to map, but...
- Can it be otherwise?
Why RDF Data Warehouse?
- Pros
  - Even query performance across all data
  - Possibility of forward-chaining inference
  - Some SPARQL features may be better supported, e.g. unspecified predicates
- Cons
  - Keeping data up to date
  - Complex setup that needs dedicated servers: you don't build them on a whim
Why Map?
- No copying, no timeliness issues
- RDBMS outperforms RDF for analytics workloads
- Agile reconfiguration without reloading data
Virtuoso
- Mapping of SPARQL to SQL against any existing schema, whether stored in Virtuoso or elsewhere
- Physical quad store
- Federated/local RDBMS
For Mapping to Deliver...
- Tackle any SQL analytics workload in SPARQL without extra cost
- Deal with arbitrary SQL schema
- Produce single SQL statements, optimizable by target RDBMS
- Have intelligence for cases where one RDF entity can come from many relational sources
The Cases of Integration
- Bring similar but heterogeneous schemas into a unified ontology: Union View
- Translate FKs of one schema to PKs in another: Distributed Join
- Hide differences in normalization: Views for hiding joins
- Unit/Terminology conversions
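
The union-view and unit-conversion cases can be illustrated with a small, self-contained sketch. It uses SQLite from Python purely for convenience, not Virtuoso, and every table, column, and view name here (`eu_product`, `us_item`, `all_products`) is a made-up assumption rather than anything from the talk.

```python
import sqlite3

# Two hypothetical sources describing the same kind of thing with
# different column names and different weight units.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table eu_product (id integer primary key, title text, weight_kg real);
    create table us_item (sku integer primary key, name text, weight_lb real);
    insert into eu_product values (1, 'bolt', 2.0);
    insert into us_item values (10, 'nut', 1.0);

    -- Union view: one unified shape, with pounds converted to kilograms.
    -- The 'source' column keeps track of where each row came from.
    create view all_products as
      select 'eu' as source, id as pk, title as name, weight_kg
        from eu_product
      union all
      select 'us' as source, sku as pk, name, weight_lb * 0.45359237 as weight_kg
        from us_item;
    """
)

rows = conn.execute(
    "select source, pk, name, weight_kg from all_products order by source"
).fetchall()
for row in rows:
    print(row)
```

A mapping layer could then describe `all_products` once, instead of describing each source separately.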
Defining a Mapping
Use SPARQL/SQL to:
- Define URI formats and their subclass relations
- Define which key-column-value combinations make a triple
- Arbitrary SQL is allowed for mapping values and filtering
- A single RDF node can be a composite of many columns, e.g. multipart key
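
As a rough illustration of these ideas, the sketch below turns one relational row into triples using URI format strings. It is not Virtuoso's mapping syntax; the URI formats, the `ex:` predicates, and the helper `row_to_triples` are assumptions made up for this example, and only the TPC-H style column names echo the talk.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical URI formats. The line item IRI is a composite of two key
# columns, showing one RDF node built from a multipart key.
LINEITEM_IRI = "http://example.com/tpch/lineitem/{l_orderkey}/{l_linenumber}"
ORDER_IRI_FROM_FK = "http://example.com/tpch/order/{l_orderkey}"

# Hypothetical mapping entries: (subject format, predicate, value spec).
# A value spec is either a column name or another URI format.
LINEITEM_MAP: List[Tuple[str, str, str]] = [
    (LINEITEM_IRI, "ex:quantity", "l_quantity"),
    (LINEITEM_IRI, "ex:has_order", ORDER_IRI_FROM_FK),
]

Triple = Tuple[str, str, object]


def row_to_triples(row: Dict[str, object],
                   mapping: List[Tuple[str, str, str]]) -> Iterator[Triple]:
    """Yield one triple per mapping entry for a single relational row."""
    for subject_format, predicate, value_spec in mapping:
        subject = subject_format.format(**row)
        if "{" in value_spec:
            # The object is an IRI built from key columns (a foreign key).
            obj: object = value_spec.format(**row)
        else:
            # The object is the plain value of one column.
            obj = row[value_spec]
        yield (subject, predicate, obj)


row = {"l_orderkey": 7, "l_linenumber": 2, "l_quantity": 36}
triples = list(row_to_triples(row, LINEITEM_MAP))
for t in triples:
    print(t)
```

A real mapping layer does not materialize triples like this; it uses the same declarations to rewrite SPARQL into SQL. The sketch only shows what the declarations mean.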
The TPC-H Case
http://demo.openlinksw.com/tpc-h/
- The 22 queries as extended SPARQL
- Each generates a single SQL statement, executable by Virtuoso, Oracle, and others
- Next: make several TPC-H databases on different servers and run the queries against the union
Where Problems Begin
- In OpenLink Data Spaces, 6 collaborative apps are all mapped to SIOC:
    select * from <ods> where { ?s ?p ?o . ?s has_comment ?c . ?c has_author <xxx> }
- Trivially, this becomes a union of everything: 1000+ lines of SQL
- Intelligently (once per app), it becomes a union of:
    select post.* from post, comment, user where c_post = p_id and c_author = u_id and u_name = f('xxx')
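
The "once per app" shape can be sketched as a small SQL generator that emits one join per application and unions them. This is only an illustration of the target shape, not how Virtuoso arrives at it: the table names in `APPS`, the shared `app_user` table, and the helper names are hypothetical, and only the join conditions and `f(...)` follow the slide.

```python
from typing import Dict

# Hypothetical post/comment tables for three of the collaborative apps.
APPS: Dict[str, Dict[str, str]] = {
    "blog": {"post": "blog_post", "comment": "blog_comment"},
    "wiki": {"post": "wiki_topic", "comment": "wiki_comment"},
    "news": {"post": "news_item", "comment": "news_comment"},
}
USER_TABLE = "app_user"  # assumed to be shared by all apps


def branch_sql(tables: Dict[str, str]) -> str:
    """One join per app: posts that have a comment by the given author."""
    return (
        f"select p.* from {tables['post']} p, {tables['comment']} c, "
        f"{USER_TABLE} u "
        "where c.c_post = p.p_id and c.c_author = u.u_id and u.u_name = f(?)"
    )


def union_sql(apps: Dict[str, Dict[str, str]]) -> str:
    """Union of exactly one branch per app, never a cross-app combination."""
    return "\nunion all\n".join(branch_sql(t) for t in apps.values())


sql = union_sql(APPS)
print(sql)
```

The point of the per-app form is that a comment in one app can only belong to a post in the same app, so branches mixing tables from different apps never need to be generated.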
What One Must Know
- Mapping for integration is not trivial
- Be careful when mapping multiple tables/columns to one class/property
- Make URI schemes that encode type and source, so that senseless joins are not attempted when types are not specified in the query
- Understand what the mapping logic can and cannot optimize
- Understand what SQL can and cannot optimize
- View the resulting SQL as a sanity check
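
One way to see why URI schemes should encode type and source: if two URI formats have incompatible fixed prefixes, no value can ever match both, so a join between them can be skipped without looking at any data. The check below is a deliberately conservative sketch under that assumption; the formats and the helpers `fixed_prefix` and `may_join` are hypothetical and do not describe Virtuoso's actual logic.

```python
def fixed_prefix(uri_format: str) -> str:
    """Return the constant text before the first {placeholder}."""
    return uri_format.split("{", 1)[0]


def may_join(format_a: str, format_b: str) -> bool:
    """Conservative test: can the two formats ever yield the same IRI?

    Any IRI produced by a format starts with that format's fixed prefix.
    If neither prefix is a prefix of the other, no IRI fits both formats,
    so the join is provably empty. A True result only means "maybe".
    """
    a, b = fixed_prefix(format_a), fixed_prefix(format_b)
    return a.startswith(b) or b.startswith(a)


BLOG_POST = "http://example.com/ods/blog/post/{p_id}"
WIKI_TOPIC = "http://example.com/ods/wiki/topic/{t_id}"
UNTYPED = "http://example.com/{id}"

print(may_join(BLOG_POST, WIKI_TOPIC))  # prefixes differ: join can be pruned
print(may_join(BLOG_POST, UNTYPED))     # untyped scheme: nothing can be pruned
```

The untyped format shows the failure mode: when the scheme carries no type or source, every format stays a candidate for every join.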
SQL Extensions
- Mapping must work against any RDBMS/schema, as is
- But there is Virtuoso SQL between the mapping and the target RDBMS(s)
- Location- and latency-conscious distributed cost model
- Breakup for making a wide result set into a row per property
- Inverse functions
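
The last two items can be pictured with a short sketch. It shows the general ideas only, not Virtuoso's SQL syntax: `breakup`, `user_iri`, and `user_iri_inverse` are hypothetical helpers, the column names and IRI prefix are invented, and skipping NULL columns is an assumption made for this example.

```python
from typing import Dict, Iterator, Optional, Tuple


def breakup(row: Dict[str, object], key_column: str,
            property_columns: Dict[str, str]) -> Iterator[Tuple[object, str, object]]:
    """Turn one wide row into one (key, predicate, value) row per property."""
    for column, predicate in property_columns.items():
        value = row[column]
        if value is not None:  # assumption: a NULL column yields no row
            yield (row[key_column], predicate, value)


USER_IRI_PREFIX = "http://example.com/user/"


def user_iri(u_name: str) -> str:
    """Forward function: column value to IRI."""
    return USER_IRI_PREFIX + u_name


def user_iri_inverse(iri: str) -> Optional[str]:
    """Inverse function: IRI back to column value, or None if it cannot match."""
    if iri.startswith(USER_IRI_PREFIX):
        return iri[len(USER_IRI_PREFIX):]
    return None


wide = {"c_id": 1, "c_name": "Acme", "c_phone": None}
narrow = list(breakup(wide, "c_id", {"c_name": "ex:name", "c_phone": "ex:phone"}))
print(narrow)
print(user_iri_inverse(user_iri("xxx")))
```

Knowing the inverse lets a condition on an IRI be rewritten as a condition on the underlying column, so the target database can still use its index on that column.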
Use Cases
- OpenLink Data Spaces: Blog, Wiki, News, Social Network, Feed Aggregation, Tag Clouds, Bookmarks, etc.
- OpenLink's own MIS, “total information awareness”: URI for any CRM Object, Account, Product, Support Case, Email, etc.
- MusicBrainz
- phpBB, Drupal, MediaWiki, WordPress, Bugzilla, and others
OpenLink Software
Thank You!
http://virtuoso.openlinksw.com