From Relational Databases to RDF in a Jiffy: Linked Data and Visualization of VLDBs
Open Government Data
Open government data is public government information – such as government records – that is shared with the public digitally, over the Internet, in open raw formats and in ways that make it accessible and readily available to all, promote analysis and allow reuse – such as the creation of data mashups.
• Inclusion
• Transparency
• Multiple views, not just one
• Accountability
• Reuse
• Data Integration
• Free availability
speedily tailored data of a saleable kind
A computer as a research and communication instrument could enhance retrieval, obsolesce mass library organization, retrieve the individual’s encyclopedic function and flip into a private line to speedily tailored data of a saleable kind. M. M., 1962
In a nutshell
Why is this important?
• A majority of dynamic Web content is backed by relational databases (RDB), and so are many enterprise systems.
• The emerging Web of Data, based on RDF, enables data sharing and integration on a Web scale.
• RDF can also be used for data integration: a common standard data model with a standard query language (SPARQL) is very attractive.
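To make the appeal of a standard query language concrete, here is a minimal Python sketch of the core of SPARQL evaluation: matching a basic graph pattern (a conjunction of triple patterns whose "?"-prefixed terms are variables) against a set of triples. The data and the ex:/vcard: abbreviations are hypothetical, literals and IRIs are both plain strings here, and a real SPARQL engine does far more (typed literals, OPTIONAL, FILTER, indexes).

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Try to unify one triple pattern with one triple, extending the binding."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # Variable: bind it, or check it agrees with an earlier binding.
            if p_term in result and result[p_term] != t_term:
                return None
            result[p_term] = t_term
        elif p_term != t_term:
            # Constant term that does not match.
            return None
    return result


def query(patterns: List[Triple], graph: List[Triple]) -> List[Binding]:
    """Evaluate a basic graph pattern with naive nested-loop joins."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for b in bindings:
            for triple in graph:
                extended = match_pattern(pattern, triple, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Hypothetical data: two addresses described with vCard-style terms.
graph = [
    ("ex:address/1", "rdf:type", "vcard:Address"),
    ("ex:address/1", "vcard:city", "Lethbridge"),
    ("ex:address/2", "rdf:type", "vcard:Address"),
    ("ex:address/2", "vcard:city", "Woodridge"),
]

# Roughly: SELECT ?a WHERE { ?a rdf:type vcard:Address . ?a vcard:city "Lethbridge" }
results = query([("?a", "rdf:type", "vcard:Address"), ("?a", "vcard:city", "Lethbridge")], graph)
print(results)  # [{'?a': 'ex:address/1'}]
```

The point of the design is that the same pattern language works over any data once it is expressed as triples, which is what makes a common data model attractive for integration.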
Standard RDB2RDF Mapping Language
• “R2RML: RDB to RDF Mapping Language”: R2RML is the language for expressing customized mappings from relational databases to RDF
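R2RML mappings themselves are written in Turtle and executed by an R2RML processor. The Python sketch below only imitates what such a processor does for a single triples map over an in-memory SQLite table: build a subject IRI from a template, assert its class, and emit one triple per mapped column, skipping NULL values. The table, its columns, and the mapping dictionary are hypothetical, and the sketch skips the IRI-safe encoding of template values that the specification requires.

```python
import sqlite3
from typing import Dict, List

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
VCARD = "http://www.gnowsis.org/ont/vcard#"

# Hypothetical triples map; the keys only mirror R2RML concepts
# (rr:logicalTable, rr:template, rr:class, rr:predicateObjectMap).
triples_map: Dict = {
    "logical_table": "SELECT address_id, address, city FROM address ORDER BY address_id",
    "subject_template": "http://example/address/{address_id}",
    "class": VCARD + "Address",
    "predicate_object_maps": [
        (VCARD + "streetAddress", "address"),
        (VCARD + "city", "city"),
    ],
}


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def run_triples_map(conn: sqlite3.Connection, tm: Dict) -> List[str]:
    """Apply one triples map to every row of its logical table."""
    conn.row_factory = sqlite3.Row  # lets us read columns by name
    lines: List[str] = []
    for row in conn.execute(tm["logical_table"]):
        # Subject IRI from the template (no percent-encoding in this sketch).
        subject = "<" + tm["subject_template"].format(**dict(row)) + ">"
        lines.append(f"{subject} <{RDF_TYPE}> <{tm['class']}> .")
        for predicate, column in tm["predicate_object_maps"]:
            value = row[column]
            if value is None:  # NULL values produce no triple
                continue
            lines.append(f'{subject} <{predicate}> "{escape_literal(str(value))}" .')
    return lines


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE address (address_id INTEGER PRIMARY KEY, address TEXT, city TEXT)")
conn.execute("INSERT INTO address VALUES (1, '47 MySakila Drive - Alberta', 'Lethbridge')")
conn.execute("INSERT INTO address VALUES (2, '28 MySQL Boulevard - QLD', NULL)")
lines = run_triples_map(conn, triples_map)
print("\n".join(lines))
```

Row 2 yields only a type triple and a street address, because its city is NULL.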
RDB end?
[Figure: example relational schema; labels include Games, Gaming, ESRB, Name, Price, Rating]
05/12/2020 (c) Dept. Informática - PUC-Rio
• “The Top 10 Reasons Why Federated Can’t Succeed (and Why it Will Anyway)”, Haas and Carey, http://research.microsoft.com/~gray/lowell/
Number 10: Robustness
Number 9: Security
Number 8: Updates
Number 7: Configurability
Number 6: Administration
Number 5: Semantic heterogeneity
Number 4: Insufficient Metadata
Number 3: Performance (Data Movement)
Number 2: Performance (Complexity)
Number 1: Performance (Pathlength)
A Priori Approach
– “The database designer should select an appropriate standard, if one exists, to guide design of the database. If none exists, the designer should publish a proposal for a common schema covering the application domain” [Casanova 2007]
– “One should reuse terms from well-known vocabularies wherever possible. You should only define new terms yourself if you can not find required terms in existing vocabularies” [Bizer 2010]
[Image: Bruegel, Pieter. The Tower of Babel. c. 1563. Oil on oak panel, 114 x 155 cm. Kunsthistorisches Museum Wien, Vienna]
Design Decisions
• Goal
– published data should describe objects and relationships that are meaningful to the external world
[Figure: three-schema architecture (External Schema, Conceptual Schema, Internal Schema)]
Design Decisions
• Observation #1
– the vocabulary of the database internal schema is… “internal”!
STD
S-PK | S-ID | S-SSN | S-NM | S-LV
1 | 102.0153 2 | 10395799 | John | 3
– data should be published using a vocabulary that is meaningful to the external world
[Figure: STD matched to wn:wordsense-student-noun-1 and ims:student]
Design Decisions
• Observation #2
– the database internal schema is optimized for performance, data integrity, etc.
• normalized tables, artificial primary keys, domain enumerations
STD
S-PK | S-ID | S-SSN | S-NM | S-LV
1 | 102.0153 2 | 10395799 | John | 3
LV
L-PK | L-NM
1 | BA
2 | MSc
3 | DSc
4 | Ph.D
UNIV
U-PK | U-NM | U-Web
32 | PUCRio | www.puc-rio.br
ENRL
S-PK | U-PK
1 | 32
Design Decisions
• Goal
– published data should come from an external schema whose objects, relationships and vocabulary are meaningful to the external world
STUDENT
SSN | Name | Level
10395799 | John | DSc
UNIVERSITY
Name | Website
PUCRio | www.puc-rio.br
ENROLL
Student | University
103957-99 | www.puc-rio.br
[Figure: three-schema architecture (External Schema, Conceptual Schema, Internal Schema)]
Std. Tool
• Std. Tool – External Schema Design
– helps users specify an external schema, using the entity-relationship model
External schema: Entity(Student), Entity(University), Relationship(Student, University)
– helps users specify how to map the external schema into the database internal schema
[Figure: three-schema architecture (External Schema, Conceptual Schema, Internal Schema)]
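One way to picture the mapping from the external schema into the internal schema is as SQL views. The sketch below, assuming SQLite through Python's standard sqlite3 module, loads the internal tables STD, LV, UNIV and ENRL with the sample rows from the Design Decisions slides and defines STUDENT, UNIVERSITY and ENROLL as views that hide the artificial keys and decode the level enumeration. Expressing the mapping as views is an assumption made for illustration, not necessarily how Std. Tool records it; also, the slide prints the enrolled student as 103957-99, while this view simply reuses the stored SSN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Internal schema: normalized tables, artificial keys, level enumeration
CREATE TABLE LV   ("L-PK" INTEGER PRIMARY KEY, "L-NM" TEXT);
CREATE TABLE STD  ("S-PK" INTEGER PRIMARY KEY, "S-ID" TEXT, "S-SSN" TEXT,
                   "S-NM" TEXT, "S-LV" INTEGER REFERENCES LV("L-PK"));
CREATE TABLE UNIV ("U-PK" INTEGER PRIMARY KEY, "U-NM" TEXT, "U-Web" TEXT);
CREATE TABLE ENRL ("S-PK" INTEGER REFERENCES STD("S-PK"),
                   "U-PK" INTEGER REFERENCES UNIV("U-PK"),
                   PRIMARY KEY ("S-PK", "U-PK"));

INSERT INTO LV VALUES (1, 'BA'), (2, 'MSc'), (3, 'DSc'), (4, 'Ph.D');
INSERT INTO STD VALUES (1, '102.0153 2', '10395799', 'John', 3);
INSERT INTO UNIV VALUES (32, 'PUCRio', 'www.puc-rio.br');
INSERT INTO ENRL VALUES (1, 32);

-- External schema: views that hide keys and decode the enumeration
CREATE VIEW STUDENT AS
  SELECT s."S-SSN" AS SSN, s."S-NM" AS Name, l."L-NM" AS Level
  FROM STD s JOIN LV l ON s."S-LV" = l."L-PK";
CREATE VIEW UNIVERSITY AS
  SELECT "U-NM" AS Name, "U-Web" AS Website FROM UNIV;
CREATE VIEW ENROLL AS
  SELECT s."S-SSN" AS Student, u."U-Web" AS University
  FROM ENRL e JOIN STD s ON e."S-PK" = s."S-PK"
              JOIN UNIV u ON e."U-PK" = u."U-PK";
""")

print(conn.execute("SELECT * FROM STUDENT").fetchall())  # [('10395799', 'John', 'DSc')]
print(conn.execute("SELECT * FROM ENROLL").fetchall())   # [('10395799', 'www.puc-rio.br')]
```

Publishing from the views rather than the base tables keeps S-PK, U-PK and the numeric level codes out of the published data.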
Std. Tool
• Std. Tool – Vocabulary Selection
– helps users select a vocabulary for the external schema
• locate published vocabularies
• match distinct vocabularies
[Figure: STD matched to wn:wordsense-student-noun-1 and ims:student]
STEP 1: Relational to ER [1]
[Figure: pipeline overview with Relational databases, Entity Relationship, OWL Ontology, RDF Vocabularies, Ontology Alignment, Suggestions and Triples set; steps 1 to 4]
• Extract metadata from the DB
• Adjust relationship names (meaningful names)
1. Casanova, M. A., Amaral de Sá, J. E., “Mapping Uninterpreted Schemes into Entity-Relationship Diagrams: Two Applications to Conceptual Schema Design”. IBM Journal of Research and Development 28(1), pp. 82–94 (1984).
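Extracting metadata from the database can be illustrated with SQLite's catalog. The sketch below assumes a common heuristic, not necessarily the rule set of Casanova and Amaral de Sá [1]: a table whose primary key consists only of foreign-key columns is read as a relationship, and every other table as an entity. The function name classify_tables is hypothetical, and the heuristic misreads, for example, a subclass table whose key is also a foreign key.

```python
import sqlite3
from typing import Dict, List


def classify_tables(conn: sqlite3.Connection) -> Dict[str, List[str]]:
    """Heuristic: a table whose primary key consists only of foreign-key
    columns becomes a relationship; every other table becomes an entity."""
    result: Dict[str, List[str]] = {"entities": [], "relationships": []}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for table in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        pk_cols = {r[1] for r in conn.execute(f'PRAGMA table_info("{table}")') if r[5] > 0}
        # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        fk_cols = {r[3] for r in conn.execute(f'PRAGMA foreign_key_list("{table}")')}
        if pk_cols and pk_cols <= fk_cols:
            result["relationships"].append(table)
        else:
            result["entities"].append(table)
    return result


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STD  ("S-PK" INTEGER PRIMARY KEY, "S-NM" TEXT);
CREATE TABLE UNIV ("U-PK" INTEGER PRIMARY KEY, "U-NM" TEXT);
CREATE TABLE ENRL ("S-PK" INTEGER REFERENCES STD("S-PK"),
                   "U-PK" INTEGER REFERENCES UNIV("U-PK"),
                   PRIMARY KEY ("S-PK", "U-PK"));
""")
print(classify_tables(conn))  # {'entities': ['STD', 'UNIV'], 'relationships': ['ENRL']}
```

Renaming the result (for example ENRL to a meaningful relationship name) is the manual adjustment the slide mentions.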
STEP 2: ER to OWL ontology [2]
[Figure: pipeline overview]
2. Fahad, M., “ER2OWL: Generating OWL Ontology from ER Diagram”. Intelligent Information Processing IV, pp. 28–37 (2008).
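A simplified version of the ER-to-OWL step can be written as a few translation rules: each entity becomes an owl:Class, each attribute an owl:DatatypeProperty with that class as rdfs:domain, and each binary relationship an owl:ObjectProperty with a domain and a range. The sketch below emits Turtle under those assumptions; it is not Fahad's full ER2OWL rule set, every attribute is assumed to be xsd:string, and the base IRI and the relationship name enrolledIn are hypothetical.

```python
from typing import Dict, List, Tuple

PREFIXES = [
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
]


def er_to_owl(entities: Dict[str, List[str]],
              relationships: List[Tuple[str, str, str]],
              base: str = "http://example.org/onto#") -> str:
    """Turn entities, attributes and binary relationships into OWL (Turtle)."""
    lines = [f"@prefix : <{base}> ."] + PREFIXES + [""]
    for entity, attributes in entities.items():
        lines.append(f":{entity} a owl:Class .")
        for attr in attributes:
            # Prefix with the entity name so that attributes with the same name
            # in different entities (e.g. Name) do not share one rdfs:domain.
            lines.append(f":{entity}_{attr} a owl:DatatypeProperty ; "
                         f"rdfs:domain :{entity} ; rdfs:range xsd:string .")
    for name, domain, range_ in relationships:
        lines.append(f":{name} a owl:ObjectProperty ; "
                     f"rdfs:domain :{domain} ; rdfs:range :{range_} .")
    return "\n".join(lines)


ttl = er_to_owl({"Student": ["SSN", "Name", "Level"], "University": ["Name", "Website"]},
                [("enrolledIn", "Student", "University")])
print(ttl)
```

Prefixing attribute names matters because two rdfs:domain statements on one property would make every subject an instance of both classes.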
STEP 3: Match OWL representation with known vocabularies
[Figure: pipeline overview]
• K-mmatch Tool
– Ontology matching tool
– Combines ontology matchers
Matching Approach
[Figure: schema input → ontology matching tools (Matcher 1, …) → similarity cube → result combination → mappings]
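The matching approach can be sketched as follows: each matcher fills one layer of a similarity cube (source term × target term × matcher), the layers are combined, and the best target above a threshold becomes a mapping. The two string matchers (an edit-based ratio from difflib and a token Jaccard score) and the weighted average used for combination are assumptions for illustration, not K-mmatch's actual matchers or combination strategy; the source and target terms are hypothetical.

```python
import re
from difflib import SequenceMatcher
from typing import Callable, List, Set, Tuple

Matcher = Callable[[str, str], float]


def local_name(term: str) -> str:
    """'vcard:streetAddress' -> 'streetAddress' (also handles # and /)."""
    return re.split(r"[:#/]", term)[-1]


def tokens(term: str) -> Set[str]:
    """Split camelCase, hyphens and underscores into lowercase tokens."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", local_name(term))
    return {t.lower() for t in re.split(r"[\s_\-]+", spaced) if t}


def edit_matcher(a: str, b: str) -> float:
    return SequenceMatcher(None, local_name(a).lower(), local_name(b).lower()).ratio()


def token_matcher(a: str, b: str) -> float:
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def similarity_cube(sources: List[str], targets: List[str],
                    matchers: List[Matcher]) -> List[List[List[float]]]:
    """cube[k][i][j]: similarity of sources[i] and targets[j] under matcher k."""
    return [[[m(s, t) for t in targets] for s in sources] for m in matchers]


def combine(cube: List[List[List[float]]], weights: List[float]) -> List[List[float]]:
    """Collapse the matcher dimension with a weighted average."""
    total = sum(weights)
    n_src, n_tgt = len(cube[0]), len(cube[0][0])
    return [[sum(w * cube[k][i][j] for k, w in enumerate(weights)) / total
             for j in range(n_tgt)] for i in range(n_src)]


def select_mappings(sources: List[str], targets: List[str],
                    combined: List[List[float]],
                    threshold: float) -> List[Tuple[str, str, float]]:
    """Keep, for each source term, its best target if it clears the threshold."""
    mappings = []
    for i, s in enumerate(sources):
        j = max(range(len(targets)), key=lambda col: combined[i][col])
        if combined[i][j] >= threshold:
            mappings.append((s, targets[j], round(combined[i][j], 2)))
    return mappings


sources = ["address", "city", "postal_code"]
targets = ["vcard:streetAddress", "vcard:city", "vcard:postalcode"]
cube = similarity_cube(sources, targets, [edit_matcher, token_matcher])
mappings = select_mappings(sources, targets, combine(cube, [1.0, 1.0]), threshold=0.4)
print(mappings)
```

Keeping the cube rather than a single matrix lets the combination strategy (weights, or a different aggregation) change without rerunning the matchers.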
STEP 4: Generate Suggestions
[Figure: pipeline overview]
• Suggest N candidate matches for each attribute
• The best match is chosen manually (by the user)
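Generating suggestions amounts to ranking candidate vocabulary terms per attribute and keeping the top N, leaving the final pick to the user. In the sketch below, the scores and the candidate terms (including vcard:locality and vcard:region) are hypothetical, and the choose function merely stands in for the manual decision.

```python
import heapq
from typing import Dict, List, Tuple

Suggestions = Dict[str, List[Tuple[str, float]]]


def suggest(scores: Dict[str, Dict[str, float]], n: int) -> Suggestions:
    """For each attribute, keep the n candidate terms with the highest score."""
    return {attr: heapq.nlargest(n, candidates.items(), key=lambda kv: kv[1])
            for attr, candidates in scores.items()}


def choose(suggestions: Suggestions, attr: str, pick: int) -> str:
    """Stand-in for the manual step: the user picks one suggestion by position."""
    return suggestions[attr][pick][0]


# Hypothetical combined scores for one attribute of an address table.
scores = {"city": {"vcard:city": 0.95, "vcard:locality": 0.40,
                   "vcard:region": 0.20, "vcard:streetAddress": 0.05}}
top = suggest(scores, n=2)
print(top)  # {'city': [('vcard:city', 0.95), ('vcard:locality', 0.4)]}
print(choose(top, "city", pick=0))  # vcard:city
```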
[Figure: pipeline overview]
Output: triples set + ontology (from step 2) + mappings to known vocabularies (from step 3)
<http://example/address/1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.gnowsis.org/ont/vcard#Address> .
<http://example/address/1> <http://www.gnowsis.org/ont/vcard#streetAddress> "47 MySakila Drive - Alberta" .
<http://example/address/1> <http://www.gnowsis.org/ont/vcard#city> "Lethbridge" .
<http://example/address/2> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.gnowsis.org/ont/vcard#Address> .
<http://example/address/2> <http://www.gnowsis.org/ont/vcard#streetAddress> "28 MySQL Boulevard - QLD" .
<http://example/address/2> <http://www.gnowsis.org/ont/vcard#city> "Woodridge" .
<http://example/address/3> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.gnowsis.org/ont/vcard#Address> .
<http://example/address/3> <http://www.gnowsis.org/ont/vcard#streetAddress> "23 Workhaven Lane - Alberta" .
<http://example/address/3> <http://www.gnowsis.org/ont/vcard#city> "Lethbridge" .
<http://example/address/4> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.gnowsis.org/ont/vcard#Address> .
<http://example/address/4> <http://www.gnowsis.org/ont/vcard#streetAddress> "1411 Lillydale Drive - QLD" .
<http://example/address/4> <http://www.gnowsis.org/ont/vcard#city> "Woodridge" .
<http://example/address/5> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.gnowsis.org/ont/vcard#Address> .
<http://example/address/5> <http://www.gnowsis.org/ont/vcard#streetAddress> "1913 Hanoi Way - Nagasaki" .
<http://example/address/5> <http://www.gnowsis.org/ont/vcard#postalcode> "35200" .
<http://example/address/5> <http://www.gnowsis.org/ont/vcard#city> "Sasebo" .
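The output triples are in N-Triples form: one subject, predicate and object per line, terminated by a period. The Python sketch below reads lines of that shape and regroups them into one record per subject, which is a quick way to check that each source row survived the translation. The regular expression covers only IRIs and plain literals (no blank nodes, language tags, datatypes or escaped quotes), so it is a simplification rather than a conforming N-Triples parser.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Simplified N-Triples line: IRI subject, IRI predicate, IRI or plain-literal object.
NT_LINE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"([^"]*)")\s*\.$')


def parse_ntriples(text: str) -> List[Tuple[str, str, str]]:
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = NT_LINE.match(line)
        if match is None:
            raise ValueError(f"unsupported line: {line}")
        s, p, o_iri, o_lit = match.groups()
        triples.append((s, p, o_iri if o_iri is not None else o_lit))
    return triples


def group_by_subject(triples: List[Tuple[str, str, str]]) -> Dict[str, Dict[str, List[str]]]:
    """Rebuild one record per subject, keyed by the predicate's local name."""
    records = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        records[s][re.split(r"[#/]", p)[-1]].append(o)
    return {s: dict(props) for s, props in records.items()}


VC = "http://www.gnowsis.org/ont/vcard#"
sample = f"""
<http://example/address/5> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <{VC}Address> .
<http://example/address/5> <{VC}streetAddress> "1913 Hanoi Way - Nagasaki" .
<http://example/address/5> <{VC}postalcode> "35200" .
<http://example/address/5> <{VC}city> "Sasebo" .
"""
records = group_by_subject(parse_ntriples(sample))
print(records)
```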
Challenges
• On-the-fly approach
– RDB2RDF
– Vocabularies are being created
Challenges • Wrong end fascination – Numbers & visualization – Not modeling
06/09/10
Resolve?
Example – your usual IT topology
• Complex IT environments
• Biodiversity
– resources are provided by multiple vendors
– they implement different manageability interfaces
– applications are point solutions that are not easily integrated in a multi-vendor environment
• Different knowledge resources are likely to be represented in different formats
• True autonomic solutions must be vendor-independent
K. Breitman / M. Perazolo
Tivoli Windows_OS_Agent Ontology
March 8, 2007
What worked
• Hybrid approach = lighter-weight ontology + instances
• In our favor:
– representations are different, but well known
• CIM
• Public
– fairly easy to harvest
– resources being monitored have (good approximations of) universal identifiers
Because the more complex they are…
• the harder it is to add a new export schema E0 to a mediated schema M
[Figure: mediated schema M over import schemas I0, I1, …, In, each derived from an export schema E0, E1, …, En]
Sub-problems in a nutshell
1. Adjust the classes and properties of M to accommodate the classes and properties of E0
2. Revise the schema mappings
3. Change the constraints of M
3.1. Translate the constraints of E0 to the vocabulary of M
3.2. Define a minimum set of changes to the constraints of M to accommodate the translated constraints
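Mediated schema
classes: $P$, $B$
constraint: $B \sqsubseteq P$
Mediated mapping: $P \equiv P_1 \sqcup P_2$, $B \equiv B_1 \sqcup B_2$
Import schema
classes: $P_2$, $B_2$
constraint: $B_2 \sqsubseteq P_2$
Local mapping: $P_2 \equiv R_2$, $B_2 \equiv L_2 \sqcup A_2$
Export schema
classes: $R_2$, $L_2$, $A_2$
constraints: $L_2 \sqsubseteq R_2$, $A_2 \sqsubseteq R_2$
$B_2 \sqsubseteq P_2$ is a constraint of the import schema iff $\Sigma \models L_2 \sqcup A_2 \sqsubseteq R_2$, where $\Sigma = \{L_2 \sqsubseteq R_2,\ A_2 \sqsubseteq R_2\}$ is the set of constraints of the export schema.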
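Under the simplifying assumption that Σ contains only atomic subsumptions C ⊑ D between class names, Σ ⊨ X1 ⊔ … ⊔ Xk ⊑ T holds exactly when every Xi ⊑ T is in the reflexive-transitive closure of Σ. The Python sketch below uses that fact to check the condition for B2 ⊑ P2 through the local mapping; the function names and the dictionary encoding of the mapping (each import class listed as the union of export classes that defines it) are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Subsumption = Tuple[str, str]  # (C, D) encodes C ⊑ D


def closure(sigma: Iterable[Subsumption]) -> Set[Subsumption]:
    """Reflexive-transitive closure of a set of atomic subsumptions."""
    pairs = set(sigma)
    pairs |= {(c, c) for pair in list(pairs) for c in pair}
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for c, d in list(pairs):
                if b == c and (a, d) not in pairs:
                    pairs.add((a, d))
                    changed = True
    return pairs


def entails_union(sigma: Iterable[Subsumption], union: List[str], target: str) -> bool:
    """Sigma |= X1 ⊔ ... ⊔ Xk ⊑ target iff every Xi ⊑ target is entailed."""
    closed = closure(sigma)
    return all(x == target or (x, target) in closed for x in union)


def import_constraint_holds(sub: str, sup: str,
                            local_mapping: Dict[str, List[str]],
                            sigma: Iterable[Subsumption]) -> bool:
    """Check sub ⊑ sup in the import schema; local_mapping[c] lists the export
    classes whose union defines import class c (sup must map to exactly one)."""
    targets = local_mapping[sup]
    if len(targets) != 1:
        raise ValueError("this sketch only handles a single export class on the right")
    return entails_union(sigma, local_mapping[sub], targets[0])


sigma = [("L2", "R2"), ("A2", "R2")]                # constraints of the export schema
local_mapping = {"P2": ["R2"], "B2": ["L2", "A2"]}  # P2 ≡ R2, B2 ≡ L2 ⊔ A2
print(import_constraint_holds("B2", "P2", local_mapping, sigma))  # True
```

Dropping A2 ⊑ R2 from Σ makes the check fail, which is exactly the situation in which B2 ⊑ P2 could not be kept as an import-schema constraint.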
So what? • Are RDF schema representations expressive enough to promote interoperability? • Can they help visualize Big Data?