R2D: A Bridge between the Semantic Web and Relational Visualization Tools


Ramanujam, Sunitha, et al. "R2D: A bridge between the semantic web and relational visualization tools." 2009 IEEE International Conference on Semantic Computing.
M1 machida, 2018/11/29

INTRODUCTION
• The Resource Description Framework (RDF) is increasingly popular in the Semantic Web community.
• The number of RDF stores is increasing, but most visualization tools are based on relational models.
• R2D (RDF-to-Database): the authors aim to extract a relational structure from RDF stores so that existing relational visualization tools can be used with RDF data.

INTRODUCTION (contribution)
The authors propose:
• A mapping scheme for translating RDF graph structures into an equivalent normalized relational schema.
• A transformation process that presents a normalized, non-generic, domain-specific, virtual relational schema view of the given RDF store, based on the mapping file.
• An SQL-to-SPARQL mechanism that transforms SQL queries issued against the virtual relational schema into their SPARQL equivalents and returns the triple data to end users in tabular format.
• A framework in which RDF data does not need to be duplicated.
• A JDBC interface that exposes all of this functionality.

RELATED WORK
• D2RQ [7], Virtuoso RDF Views [8], Triplify [10]
  • input: relational database
  • output: RDF
• RDF123 [9]
  • input: spreadsheet
  • output: RDF

RELATED WORK
• RDF2RDB [3]
  • input: RDF, output: RDB
  • very close to R2D
• R2D improves on RDF2RDB because RDF2RDB
  • needs data replication, which causes synchronization and storage-space problems
  • requires the presence of ontological information such as rdfs:Class and rdf:Property
  • has no SQL-to-SPARQL conversion component

RELATED WORK
Related work on SPARQL-to-SQL conversion (R2D performs the reverse, SQL-to-SPARQL, conversion)
• RDF/RDFS-based Relational Database Integration [12]
  • integrates different relational databases using the RDF model
• An Efficient SQL-based RDF Querying Scheme [13]

R2D PRELIMINARIES
[Architecture diagram]
A) R2D's MapFileGenerator: RDF store → R2D map file
B) R2D's DBSchemaGenerator: R2D map file → relational schema, which the visualization tool inspects
C) R2D's SQL-to-SPARQL Translation: SQL from the visualization tool → SPARQL for the SPARQL query engine (query processing against the RDF store)

A: R2D Mapping Constructs (Blank Node)
• The R2D mapping language is at the heart of the RDF-to-Database transformation.
• This paper discusses the R2D constructs specific to blank nodes.
• Constructs for non-blank nodes are covered in [5][6].

R2D constructs specific to non-blank nodes (Additional Explanation) [6]

R2D constructs specific to non-blank nodes
• r2d:TableMap
• r2d:keyField
• r2d:ColumnBridge
• r2d:MultiValuedColumnBridge (MVCB)
• r2d:belongsToTableMap (BTTM)
• r2d:refersToTableMap (RTTM)
• r2d:predicate
• r2d:MultiValuedPredicate (MVP)
• r2d:datatype

R2D constructs specific to non-blank nodes
• r2d:TableMap
  • refers to a table in a relational database
  • an rdfs:Class object maps to an r2d:TableMap
  • Example: the RDF graph in Figure 2 results in the creation of a TableMap called "Student"
• r2d:keyField
  • specifies the primary key attribute for the r2d:TableMap
  • the data value associated with the field specified by r2d:keyField is the subject of the "rdf:type" predicate belonging to the rdfs:Class object
  • Example: an r2d:keyField called "Student_PK" is attached to the "Student" TableMap

R2D constructs specific to non-blank nodes
• r2d:ColumnBridge
  • relates single-valued RDF graph predicates to relational database columns
  • each rdf:Property object maps to a distinct column attached to the table specified in the rdfs:domain predicate
  • Example: the "Name" and "Member Of" predicates in Figure 2 become r2d:ColumnBridges belonging to the "Student" r2d:TableMap
• r2d:MultiValuedColumnBridge (MVCB)
  • RDF graph predicates that have multiple object values for the same subject are mapped using the MVCB construct
  • Example: the "Works On" predicate in Figure 2 is an example of an MVCB mapping

R2D constructs specific to non-blank nodes
• r2d:belongsToTableMap (BTTM)
  • connects an r2d:ColumnBridge or MVCB to an r2d:TableMap
  • Example: the BTTM construct corresponding to the "Name" r2d:ColumnBridge is set to a value of "Student"
• r2d:refersToTableMap (RTTM)
  • only used for triples whose predicate has a resource object
  • used to generate primary key-foreign key relationships within the virtual relational schema
  • Example: the "MemberOf" r2d:ColumnBridge includes the RTTM construct with a value of "Department"

R2D constructs specific to non-blank nodes
• r2d:predicate
  • stores the fully qualified property name of the predicate that corresponds to the column bridge
• r2d:MultiValuedPredicate (MVP)
  • used when multiple predicate names refer to the same overall object type even though each individual object has a different value
• r2d:datatype
  • specifies the datatype of its column bridge and is derived from the rdfs:range predicate
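The non-blank-node constructs above can be pictured as a small in-memory model. The following is only an illustrative sketch, not the R2D map file format: the class names, field names, and the example namespace are assumptions made for this example, and only the construct names in the comments come from the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ColumnBridge:
    name: str                        # column name in the virtual schema
    belongs_to: str                  # r2d:belongsToTableMap (BTTM)
    predicate: str                   # r2d:predicate, fully qualified property name
    datatype: str = "string"         # r2d:datatype, derived from rdfs:range
    refers_to: Optional[str] = None  # r2d:refersToTableMap (RTTM), resource objects only
    multi_valued: bool = False       # True models r2d:MultiValuedColumnBridge (MVCB)


@dataclass
class TableMap:
    name: str                        # r2d:TableMap
    key_field: str                   # r2d:keyField
    columns: List[ColumnBridge] = field(default_factory=list)

    def foreign_keys(self) -> List[str]:
        # Tables referenced through RTTM become foreign-key targets.
        return [c.refers_to for c in self.columns if c.refers_to is not None]


EX = "http://example.org/univ#"  # hypothetical namespace, not from the paper

student = TableMap("Student", "Student_PK")
student.columns.append(ColumnBridge("Name", "Student", EX + "name"))
student.columns.append(
    ColumnBridge("MemberOf", "Student", EX + "memberOf", refers_to="Department"))
student.columns.append(
    ColumnBridge("WorksOn", "Student", EX + "worksOn", multi_valued=True))
```

Here `student.foreign_keys()` lists the tables that the "Student" table would reference, which mirrors how RTTM is described as the source of primary key-foreign key relationships.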


A: R2D Mapping Constructs (Blank Node)
• r2d:SimpleLiteralBlankNode (SLBN)
• r2d:MultiValuedSimpleLiteralBlankNode (MVSLBN)
• r2d:ComplexLiteralBlankNode (CLBN)
• r2d:MultiValuedComplexLiteralBlankNode (MVCLBN)
• r2d:SimpleResourceBlankNode (SRBN)
• r2d:ComplexResourceBlankNode (CRBN)
• r2d:MultiValued{Simple/Complex}ResourceBlankNode (MVSRBN and MVCRBN)

A: R2D Mapping Constructs (Blank Node)
[Figure: example RDF graph with blank nodes labeled SLBN, MVSLBN, CLBN, SRBN, and CRBN]

A: R2D Mapping Constructs (Blank Node)
• r2d:SimpleLiteralBlankNode (SLBN)
  • relates RDF graph blank nodes that consist purely of distinct simple literal objects to relational database columns
  • Example: the object of the "Name" predicate is an SLBN

A: R2D Mapping Constructs (Blank Node)
• r2d:MultiValuedSimpleLiteralBlankNode (MVSLBN)
  • maps duplicate SLBNs
  • generates a separate r2d:TableMap with a foreign key

A: R2D Mapping Constructs (Blank Node)
• r2d:ComplexLiteralBlankNode (CLBN)
  • refers to blank nodes in the RDF graph that have multiple literal object values for the same subject
  • generates a separate r2d:TableMap with a foreign key

A: R2D Mapping Constructs (Blank Node)
• r2d:MultiValuedComplexLiteralBlankNode (MVCLBN)
  • maps duplicate complex literal blank nodes
  • Example: consider a scenario where the "Phone" predicate in Figure 2 is replaced with two similar predicates, "PastPhNums" and "CurrentPhNums", each of which is a CLBN. The objects of the two predicates together form an MVCLBN

A: R2D Mapping Constructs (Blank Node)
• r2d:SimpleResourceBlankNode (SRBN)
  • helps map blank nodes that have multiple predicates leading to resource objects belonging to the same object class

A: R2D Mapping Constructs (Blank Node)
• r2d:ComplexResourceBlankNode (CRBN)
  • represents blank nodes that have distinct or non-distinct predicates leading to objects belonging to different object classes

A: R2D Mapping Constructs (Blank Node)
• r2d:MultiValued{Simple/Complex}ResourceBlankNode (MVSRBN and MVCRBN)
  • duplicate simple and complex resource blank nodes are represented using the MVSRBN and MVCRBN constructs, respectively
  • Example: consider a scenario where the "Projects" predicate in Figure 2 is replaced with two similar predicates, "PastProjects" and "CurrentProjects", each of which is an SRBN. The objects of these two predicates together form an MVSRBN

A: R2D Mapping Constructs (Blank Node)
• r2d:MixedBlankNode
  • maps blank nodes consisting of a mixture of literals, resources, and other blank nodes

B: Types of Blank Nodes and Relationships


R2D: A PROTOTYPE DESIGN
A. RDFMapFileGenerator
• Map file generation through the RDFMapFileGenerator is the first step in the R2D framework.
• It automatically generates an RDF-to-relational mapping file through extensive examination of the RDF data.


A. RDFMapFileGenerator
• The transformation process is not always as straightforward or well-defined as Table 2 suggests.
• Many RDF graphs have incomplete structural information.
• RDFMapFileGenerator works on RDF stores with or without such structural information.

A. RDFMapFileGenerator
• The data structure discovery process
When structural information about the RDF database is available:
1. The algorithm discovers the schema definitions.
2. It creates the appropriate Table and Column structures.
Next, instance data is processed using three procedures.

A. RDFMapFileGenerator
• ProcessLiteralPredicate (first procedure)
  • used to process predicates that have literal objects
  • if the resource contains multiple literal object values for the same predicate → r2d:MultiValuedColumnBridge
  • otherwise → r2d:ColumnBridge
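The rule for literal predicates can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's ProcessLiteralPredicate: the function name, the triple representation as plain tuples, and the choice to mark a predicate as multi-valued as soon as any one subject has several values are all assumptions of this example.

```python
from collections import defaultdict


def process_literal_predicates(triples):
    """Classify literal predicates from (subject, predicate, literal) tuples.

    Returns {predicate: construct name}.
    """
    values = defaultdict(set)
    for subject, predicate, literal in triples:
        values[(subject, predicate)].add(literal)

    kinds = {}
    for (_subject, predicate), literals in values.items():
        if len(literals) > 1:
            # Several literal values for the same subject and predicate.
            kinds[predicate] = "r2d:MultiValuedColumnBridge"
        else:
            # Keep an earlier multi-valued decision if one was already made.
            kinds.setdefault(predicate, "r2d:ColumnBridge")
    return kinds


sample = [
    ("s1", "name", "Ann"),
    ("s1", "phone", "555-0100"),
    ("s1", "phone", "555-0101"),
    ("s2", "phone", "555-0102"),
]
literal_kinds = process_literal_predicates(sample)
```

In this sample, "phone" is classified as multi-valued because subject s1 has two values for it, while "name" stays a plain column bridge.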

A. RDFMapFileGenerator
• ProcessResourcePredicate (second procedure)
  • used to process predicates that have resource objects
1. A new potential column is added for every resource predicate that belongs to the subject resource.
2. Duplicate predicates (predicates that have objects belonging to the same object class) are examined and eliminated.
3. Any potential columns that refer to the same object resource class are set to r2d:MultiValuedColumnBridges.
4. Columns referring to distinct object classes are set to r2d:ColumnBridges.
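One possible reading of the four steps above is to group a subject's resource references by the class of the referenced object. The sketch below follows that reading only; the function name, the input shapes, and the decision to name the resulting column after the object class are assumptions of this example and are not taken from the paper.

```python
from collections import defaultdict


def process_resource_predicates(references, class_of):
    """Classify the resource predicates of one subject.

    references: list of (predicate, object_uri) pairs for that subject.
    class_of:   dict mapping object_uri to its class name.
    Returns {object class: construct name}.
    """
    # Step 1: every resource predicate yields a potential column.
    by_class = defaultdict(list)
    for predicate, obj in references:
        by_class[class_of[obj]].append((predicate, obj))

    columns = {}
    for object_class, refs in by_class.items():
        # Steps 2 and 3: references to the same class collapse into one
        # multi-valued column. Step 4: a single reference stays single-valued.
        if len(refs) > 1:
            columns[object_class] = "r2d:MultiValuedColumnBridge"
        else:
            columns[object_class] = "r2d:ColumnBridge"
    return columns


refs = [("memberOf", "d1"), ("worksOn", "p1"), ("worksOn", "p2")]
classes = {"d1": "Department", "p1": "Project", "p2": "Project"}
resource_kinds = process_resource_predicates(refs, classes)
```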

A. RDFMapFileGenerator
• ProcessBlankNode (third procedure)
  • used to process blank node predicates
1. Blank node predicates are classified into categories depending on whether the blank node objects are
  • literals
  • resources
  • blank nodes
  • a combination of these

A. RDFMapFileGenerator
2. Case 1: the blank node objects are literals (such as the Name and Phone blank nodes)
  I. The ProcessLiteralPredicate procedure is called for each predicate off of the blank node.
  II. If every column generated through ProcessLiteralPredicate is a simple r2d:ColumnBridge (such as for the Name blank node)
    • the blank node is set to r2d:SimpleLiteralBlankNode
  III. If any of the columns are r2d:MultiValuedColumnBridges (such as for the Phone blank node)
    • the blank node is set to r2d:ComplexLiteralBlankNode
  IV. If no such blank node has been encountered previously
    • this blank node is added to the set of blank nodes
  V. If a similar blank node is already an element of the set of blank nodes
    • the blank node type is set to r2d:MVSLBN or r2d:MVCLBN

A. RDFMapFileGenerator
3. Case 2: the blank node objects are resources (such as the Project and OtherActivities blank nodes)
  I. The ProcessResourcePredicate procedure is called for each predicate off of the blank node.
  II. If the number of columns is equal to 1 (such as for the Project blank node)
    • the blank node type is set to r2d:SimpleResourceBlankNode
  III. Otherwise (such as for the OtherActivities blank node)
    • the blank node type is set to r2d:ComplexResourceBlankNode
  IV. If a similar blank node exists
    • the blank node type is set to r2d:MVSRBN or r2d:MVCRBN
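The decision rules of cases 1 and 2 can be condensed into a single function. This is a sketch under assumptions rather than the paper's ProcessBlankNode: the function name, the short "CB"/"MVCB" labels, and the boolean flag standing in for the "similar blank node already seen" check are all hypothetical.

```python
def classify_blank_node(column_kinds, object_kind, seen_similar):
    """Return the blank node construct for a literal or resource blank node.

    column_kinds: list of "CB" / "MVCB" labels, one per column produced for
                  the predicates off of the blank node.
    object_kind:  "literal" (case 1) or "resource" (case 2).
    seen_similar: True if a similar blank node was already recorded.
    """
    if object_kind == "literal":
        # Any multi-valued column makes the node complex.
        base = "CLBN" if "MVCB" in column_kinds else "SLBN"
    elif object_kind == "resource":
        # Exactly one column means a simple resource blank node.
        base = "SRBN" if len(column_kinds) == 1 else "CRBN"
    else:
        raise ValueError("mixed blank nodes are handled separately")
    # A duplicate of an earlier blank node becomes the multi-valued variant.
    return "r2d:" + ("MV" + base if seen_similar else base)
```

For example, a blank node with two plain literal columns and no earlier twin is classified as `r2d:SLBN`, while the same shape seen a second time becomes `r2d:MVSLBN`.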

A. RDFMapFileGenerator
4. Case 3: the blank node objects are a mixture of literal objects, resource objects, and other blank nodes
  I. They are considered to be of type r2d:MixedBlankNode.
  II. They are processed using the depth-first search graph algorithm.
  III. For every literal or resource predicate off of a blank node, a column is created and added to the blank node entity.
  IV. For every blank node predicate off of a blank node, a new blank node entity is created and added to an array of blank nodes, and it is also added as a column to the original blank node.
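The traversal in case 3 can be illustrated with an iterative depth-first search. The sketch assumes a toy adjacency-list representation of the graph; the function name, the `kind` labels, and the blank node identifiers are hypothetical and only serve this example.

```python
def process_mixed_blank_node(graph, root):
    """Walk a mixed blank node depth-first and collect one entity per blank node.

    graph: dict mapping a blank node id to a list of (predicate, kind, obj),
           where kind is "literal", "resource", or "blank".
    Returns {blank node id: [column names]}.
    """
    entities = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if node in entities:
            continue  # already processed, guards against cycles
        columns = []
        entities[node] = columns
        for predicate, kind, obj in graph.get(node, []):
            # Every predicate becomes a column of the current entity.
            columns.append(predicate)
            if kind == "blank":
                # Nested blank nodes become entities of their own.
                stack.append(obj)
    return entities


address_graph = {
    "_:b0": [("street", "literal", "Main St"),
             ("owner", "resource", "ex:ann"),
             ("geo", "blank", "_:b1")],
    "_:b1": [("lat", "literal", "1.0"), ("long", "literal", "2.0")],
}
mixed_entities = process_mixed_blank_node(address_graph, "_:b0")
```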


B. DBSchemaGenerator
• DBSchemaGenerator
  • generates the actual relational schema
  • takes the RDF-to-relational schema mapping file as input
  • returns a virtual, appropriately normalized relational database schema

B. DBSchemaGenerator
• SLBNs (such as the Name blank node object)
  • every r2d:ColumnBridge entry that belongs to the blank node is simply added as a column to the table to which the SLBN belongs
• CLBNs (such as the Phone blank node object)
  • result in the creation of a new table that represents a 1:N relationship between the subject and the object of the blank node
  • CLBNs always include a "Type" column associated with the r2d:MultiValuedPredicate
• SRBNs and CRBNs (such as Projects and OtherActivities)
  • result in the creation of join tables in which the primary keys of the tables corresponding to the subject resource and the object resource are included as fields

B. DBSchemaGenerator
• MVSLBN
  • results in the creation of a new table
  • the columns of this table are the primary key of the table corresponding to the blank node's r2d:belongsToTableMap value and all the r2d:ColumnBridges that belong to the MVSLBN
• MVCLBN, MVSRBN, and MVCRBN
  • very similar to their single-valued counterparts
  • the only difference is the inclusion of an additional field when the predicate corresponding to the blank node is an "MVP"
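The table-building rules for single-valued blank nodes can be sketched as follows. This is an illustration only: the function name, the dictionary keys, the `<blank node>_<field>` column naming, and the `<table><blank node>` derived-table naming are assumptions of this example, loosely modeled on names such as name_First and EmployeePhone that appear elsewhere in the slides.

```python
def generate_tables(table, key, blank_nodes):
    """Return {table name: [column names]} for one TableMap and its blank nodes.

    blank_nodes is a list of dicts with hypothetical keys: name, kind
    ("SLBN", "CLBN", "SRBN" or "CRBN"), columns, and target_keys (primary
    keys of the referenced tables, resource kinds only).
    """
    tables = {table: [key]}
    for bn in blank_nodes:
        kind = bn["kind"]
        if kind == "SLBN":
            # SLBN columns are added directly to the owning table.
            tables[table] += [bn["name"] + "_" + c for c in bn["columns"]]
        elif kind == "CLBN":
            # CLBN: new 1:N table with the owner's key, a Type column, values.
            tables[table + bn["name"]] = [key, "Type"] + list(bn["columns"])
        elif kind in ("SRBN", "CRBN"):
            # Resource blank nodes: join table of subject and object keys.
            tables[table + bn["name"]] = [key] + list(bn["target_keys"])
        else:
            raise ValueError("unsupported blank node kind: " + kind)
    return tables


schema = generate_tables("Employee", "Employee_PK", [
    {"name": "Name", "kind": "SLBN", "columns": ["First", "Last"]},
    {"name": "Phone", "kind": "CLBN", "columns": ["Number"]},
    {"name": "Projects", "kind": "SRBN", "target_keys": ["Project_PK"]},
])
```

The multi-valued variants are left out on purpose; per the slide they differ from these cases mainly by an extra field when the predicate is an MVP.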


C. SQL-to-SPARQL Translation
• SQL-to-SPARQL Translation
  • takes an SQL statement as input
  • returns an appropriate SPARQL equivalent as output
• The input SQL query is parsed to identify
  • tables
  • fields
  • WHERE clause (if present)
  • GROUP BY clause (if present)
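The parsing step can be illustrated with a deliberately small regular expression. This is a toy sketch, not the parser used by R2D: it assumes a flat SELECT ... FROM ... [WHERE ...] [GROUP BY ...] statement without sub-queries, joins, or string literals containing keywords, and the function name and sample query are hypothetical.

```python
import re

_SELECT = re.compile(
    r"^\s*SELECT\s+(?P<fields>.+?)\s+FROM\s+(?P<tables>.+?)"
    r"(?:\s+WHERE\s+(?P<where>.+?))?"
    r"(?:\s+GROUP\s+BY\s+(?P<group>.+?))?\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def parse_simple_select(sql):
    """Split a flat SELECT statement into the parts named on the slide."""
    match = _SELECT.match(sql)
    if match is None:
        raise ValueError("unsupported SQL statement")

    def split(text):
        return [part.strip() for part in text.split(",")]

    group = match.group("group")
    return {
        "fields": split(match.group("fields")),
        "tables": split(match.group("tables")),
        "where": match.group("where"),       # None if absent
        "group_by": split(group) if group else [],
    }


parsed = parse_simple_select(
    "SELECT name_First, deptName FROM Employee, Department "
    "WHERE name_First LIKE 'ABC%' GROUP BY deptName"
)
```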

C. SQL-to-SPARQL Translation
[Figure: sample SQL query and the corresponding SPARQL SELECT]

C. SQL-to-SPARQL Translation
• The SQL WHERE clauses are added to the FILTER clause of the SPARQL statement with minor modifications
[Figure: example translation annotated with ?subject0, ?subject1, and the filter regex(?name_First, "^ABC")]
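One example of such a "minor modification" is rewriting a pattern match into a SPARQL regex() filter. The slide only shows the resulting regex(?name_First, "^ABC"); that it originates from an SQL condition like name_First LIKE 'ABC%' is an assumption of this sketch, as are the function name and the handling of the % and _ wildcards.

```python
import re


def like_to_filter(variable, pattern):
    """Translate `column LIKE pattern` into a SPARQL FILTER using regex()."""
    anchored_start = not pattern.startswith("%")
    anchored_end = not pattern.endswith("%")
    core = pattern.strip("%")
    # Inner wildcards: % matches any run of characters, _ a single character.
    body = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in core
    )
    regex = ("^" if anchored_start else "") + body + ("$" if anchored_end else "")
    # Backslashes must be doubled inside a SPARQL string literal.
    regex = regex.replace("\\", "\\\\")
    return 'FILTER regex(?%s, "%s")' % (variable, regex)


prefix_filter = like_to_filter("name_First", "ABC%")
```

With the input above, `prefix_filter` is `FILTER regex(?name_First, "^ABC")`, the same filter expression that appears on the slide.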

C. SQL-to-SPARQL Translation
• For non-derived tables (such as Employee and Department):
  ?subject<tableIndex> <FieldPredicate> ?<FieldName>
• For derived tables corresponding to blank nodes, and for fields belonging indirectly to non-derived tables (SLBN fields):
  ?subject<tableIndex> <BlankNodePredicate> ?<BlankNodeName> <FieldPredicate> ?<FieldName>
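The two templates above can be filled in mechanically. The sketch below is one interpretation: for the blank node case it emits two triple patterns that share the blank node variable, which is how the flattened template is read here and may differ from the exact form used in the paper. The function names and example URIs are hypothetical.

```python
def field_pattern(table_index, field_predicate, field_name):
    """Triple pattern for a field of a non-derived table."""
    return "?subject%d <%s> ?%s ." % (table_index, field_predicate, field_name)


def blank_node_field_patterns(table_index, bn_predicate, bn_name,
                              field_predicate, field_name):
    """Triple patterns for a field reached through a blank node.

    Assumed reading: subject -> blank node, then blank node -> field.
    """
    return [
        "?subject%d <%s> ?%s ." % (table_index, bn_predicate, bn_name),
        "?%s <%s> ?%s ." % (bn_name, field_predicate, field_name),
    ]


EX = "http://example.org/emp#"  # hypothetical namespace
dept = field_pattern(1, EX + "deptName", "deptName")
first = blank_node_field_patterns(0, EX + "name", "name", EX + "first", "name_First")
```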


C. SQL-to-SPARQL Translation
• For derived tables corresponding to non-mixed blank nodes that contain multi-valued predicates (such as EmployeePhone):
  ?subject<tableIndex> ?<MVPColumnName> ?<NonMVPColumnName> ?<blankNodeName> ?<MVPColumnName> ?<NonMVPColumnName>
• [Figure: final SPARQL query]

IMPLEMENTATION SPECIFICS
The hardware used in the implementation of R2D:
• RAM: 2 GB
• CPU: Intel Core 2 Duo (2 GHz)
• OS: Windows Vista
The software platform and tools used include:
• MySQL 5.0
• Jena 2.5.6 (to manipulate the RDF triples)
• Java 1.5 (for development of the algorithms)
• DataVision v1.2.0 (to visualize reports based on RDF data)

A. Experimental Dataset
• Two datasets were used in the experiments.
1. Dataset 1 is based on the publications domain described in [6]
  • includes information about journals, issues, and articles (ingentaconnect.com)
  • an optimized version of the map file generation process was executed against dataset 1 to enable an apples-to-apples performance comparison against [6]
2. Dataset 2 is a subset of the scenario in Figure 2
  • includes "Employee", "Department", and "Projects"
  • the query performance experiments and reporting tool outputs presented are based on dataset 2

B. Experimental Results
[Figure: map file generation times with and without data sampling, for RDF stores with and without ontological information]


Conclusion and Future Work
• The implementation of R2D was expanded from the previous work
  • by including the ability to handle different kinds of blank nodes
• Future directions for R2D include
  • support for reification concepts
  • improving the normalization process for mixed blank nodes
  • translation rules for nested/correlated SQL sub-queries