Ontology materialization from relational database sources using D2RQ
Rajashree Deka
Tetherless World Constellation
Rensselaer Polytechnic Institute
RDBMS
- The majority of data underpinning the Web are stored in relational databases (RDB).
- Advantages:
    - Secure and scalable architecture.
    - Efficient storage.
    - Reliability.
- Disadvantages:
    - Difficult to share data across large organizations where different database schemata are used.
    - Most importantly, there is no check on semantics.
RDBMS to RDF
- As the Semantic Web matures, there is a growing need for RDF applications to access the content of legacy databases.
- Compared to RDB, RDF is:
    - More expressive.
    - More easily processed and interpreted.
    - Easily reasoned over by software agents.
- We need a way to make data in an RDBMS available as RDF.
Mapping data from RDBMS to RDF
To generate Semantic Web content from an RDB, Tim Berners-Lee proposed a very direct mapping:
- Each table in the RDB is an RDF class.
- Each field (column) name is an RDF property.
- Each record is an RDF node: an instance of the RDF class, which can therefore play the role of a subject or an object in an RDF statement.
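The direct mapping above can be sketched in a few lines of Python. This is only an illustration of the table/column/record rules, not the output format of any particular tool; the base URI `http://example.org/`, the URI layout, and the sample row are assumptions made up for the example.

```python
from typing import Iterator

Triple = tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def direct_map(base: str, table: str, key: str, rows: list[dict]) -> Iterator[Triple]:
    """Yield triples for every row of one table using the direct mapping."""
    class_uri = f"{base}{table}"              # table  -> RDF class
    for row in rows:
        node = f"{base}{table}/{row[key]}"    # record -> RDF node
        yield (node, RDF_TYPE, class_uri)
        for column, value in row.items():
            prop = f"{base}{table}#{column}"  # column -> RDF property
            yield (node, prop, str(value))


# Hypothetical sample row, for illustration only.
rows = [{"dataset_id": 1, "name": "example"}]
triples = list(direct_map("http://example.org/", "dataset", "dataset_id", rows))
for triple in triples:
    print(triple)
```

Each row contributes one type triple plus one triple per column, which is why a plain export of this kind grows with the size of the database.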
Two Approaches
- Semi-automatic generation of an ontology from the RDB
    - Read all records, export as RDF triples.
    - Mappings are direct; complex mappings do not usually appear.
    - Need to convert to RDF regularly.
    - Does not allow the population of an existing ontology – a BIG limitation!
- Map an existing RDB to an existing ontology
    - Customize the mapping according to the existing ontology.
    - Complex mappings can be implemented.
The D2RQ platform
- Provides an integrated environment for accessing the content of non-RDF, relational databases as virtual, read-only RDF graphs.
- Using D2RQ we can:
    - Query a non-RDF database using SPARQL queries.
    - Access information in a non-RDF database using the Jena API or the Sesame API.
    - Access the content of the database as Linked Data over the Web.
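As a rough illustration of the SPARQL access path, the sketch below builds a SPARQL Protocol GET request, where the query text travels in the `query` parameter. The endpoint URL `http://localhost:2020/sparql` is an assumption about a locally running server, and the query itself is a generic example rather than one from these slides.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumption: a server exposing a SPARQL endpoint at this address.
ENDPOINT = "http://localhost:2020/sparql"

# Generic example query: fetch a handful of arbitrary triples.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol GET URL with the query in the 'query' parameter."""
    return f"{endpoint}?{urlencode({'query': query.strip()})}"


url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```

With a server actually running at that address, the URL could be fetched with `urllib.request.urlopen(url)`; the sketch stops short of the network call so that it runs anywhere.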
The D2RQ platform
- D2RQ mapping language: describes the relation between the ontology and the RDB.
- D2RQ engine: uses mappings to rewrite Jena and Sesame API calls to SQL queries.
- D2R Server: provides a Linked Data view, an HTML view for debugging and a SPARQL Protocol endpoint over the database.
D2R Server
More about D2RQ
- Mapping language formally defined by http://www4.wiwiss.fu-berlin.de/bizer/d2rq/0.1/
- D2RQ namespace is defined by http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#
- Database compatibility:
    - Oracle
    - MySQL
    - PostgreSQL
    - Microsoft SQL Server
    - ODBC data sources (e.g. Microsoft Access); the mapping generator and automatic detection of column types do not work with these.
Command line tools
Two command line tools (only on Windows and Unix systems):
- Mapping generator:
    - Analyzes the database schema.
    - Generates a default mapping file.
    - The resulting D2RQ map is an RDF document in N3 format.
    - The mapping can be used as-is or can be customized.
- Dump script:
    - Writes the content of the RDB into a single RDF file.
    - Supported syntaxes are "RDF/XML" (the default), "RDF/XML-ABBREV", "N3", "N-TRIPLE".
D2RQ mapping – how it works
An ontology is mapped to a database schema using:
- d2rq:ClassMap: represents a class or a group of similar classes in the ontology. Specifies how instances of the class are identified.
- d2rq:PropertyBridge: a ClassMap has a set of PropertyBridges which specify how the properties of an instance are created.
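To make the division of labour concrete, here is a small Python sketch of what a class map and its property bridges do to one database row. It is a conceptual model only, not the D2RQ engine: the dictionaries stand in for the mapping, the `@@column@@` placeholder syntax follows the mapping excerpt in these slides, and the row value `7` is invented for the example.

```python
import re

PLACEHOLDER = re.compile(r"@@(.+?)@@")


def expand(pattern: str, row: dict) -> str:
    """Replace each @@column@@ placeholder with that column's value from the row."""
    return PLACEHOLDER.sub(lambda m: str(row[m.group(1)]), pattern)


def materialize(class_map: dict, bridges: list[dict], row: dict) -> list[tuple]:
    """Produce the triples one row yields under a class map and its bridges."""
    subject = expand(class_map["uriPattern"], row)   # how the instance is identified
    triples = [(subject, "rdf:type", class_map["class"])]
    for bridge in bridges:                           # how its properties are created
        if "pattern" in bridge:
            value = expand(bridge["pattern"], row)
        else:
            value = row[bridge["column"]]
        triples.append((subject, bridge["property"], value))
    return triples


# Simplified stand-ins for a class map and two property bridges.
class_map = {"uriPattern": "dataset/@@dataset_id@@", "class": "vocab:dataset"}
bridges = [
    {"property": "rdfs:label", "pattern": "dataset #@@dataset_id@@"},
    {"property": "vocab:dataset_id", "column": "dataset_id"},
]
result = materialize(class_map, bridges, {"dataset_id": 7})  # hypothetical row
for triple in result:
    print(triple)
```

The class map alone decides the subject URI; each bridge then adds exactly one property to that subject.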
BCODMO ontology materialization from MySQL database using D2RQ
BCODMO - D2RQ map
Excerpt of the mapping file

    # Table dataset (default mapping)
    map:dataset a d2rq:ClassMap;
        d2rq:dataStorage map:database;
        d2rq:uriPattern "dataset/@@dataset_id@@";
        d2rq:class vocab:dataset;
        d2rq:classDefinitionLabel "dataset";
        .
    map:dataset__label a d2rq:PropertyBridge;
        d2rq:belongsToClassMap map:dataset;
        d2rq:property rdfs:label;
        d2rq:pattern "dataset #@@dataset_id@@";
        .
    map:dataset_id a d2rq:PropertyBridge;
        d2rq:belongsToClassMap map:dataset;
        d2rq:property vocab:dataset_id;
        d2rq:propertyDefinitionLabel "dataset_id";
        d2rq:column "dataset_id";
        d2rq:datatype xsd:int;
        .

    # Table dataset (customized mapping)
    map:dataset a d2rq:ClassMap;
        d2rq:dataStorage map:database;
        d2rq:uriPattern "http://escience.rpi.edu/ontology/BCODMO/bcodmo/2/0/DeploymentDatasetCollection_@@dataset_id@@";
        d2rq:class bcodmo:DeploymentDatasetCollection;
        d2rq:classDefinitionLabel "DeploymentDatasetCollection";
        .
    map:seeAlsoStatement a d2rq:PropertyBridge;
        d2rq:belongsToClassMap map:dataset;
        d2rq:property rdfs:seeAlso;
        d2rq:uriPattern "http://osprey.bcodmo.org/dataset.cfm?id=@@dataset.dataset_id@@&flag=view";
        .
    map:hasIdentifier a d2rq:PropertyBridge;
        d2rq:property bcodmo:hasIdentifier;
        d2rq:belongsToClassMap map:dataset;
        d2rq:column "dataset_id";
        d2rq:datatype xsd:int;
        .
    map:dataset_id a d2rq:PropertyBridge;
        d2rq:belongsToClassMap map:dataset;
        d2rq:property bcodmo:hasParameter;
        d2rq:refersToClassMap map:parameters;
        d2rq:propertyDefinitionLabel "dataset_id";
        d2rq:join "dataset_id = dataset_parameters.dataset_id";
        d2rq:join "dataset_parameters_id = parameters_id";
        .
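The last bridge in the customized mapping links a dataset to its parameters through the dataset_parameters table, using two join conditions. The sketch below shows the kind of SQL such a two-step join implies, run against an in-memory SQLite database rather than MySQL. The column name `parameter_id`, the fully qualified join columns, the short `dataset/…` and `parameters/…` identifiers, and all sample rows are assumptions for illustration; the slide abbreviates the join columns, so the real schema may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema: only the key columns needed for the join.
CREATE TABLE dataset (dataset_id INTEGER PRIMARY KEY);
CREATE TABLE parameters (parameter_id INTEGER PRIMARY KEY);
CREATE TABLE dataset_parameters (dataset_id INTEGER, parameter_id INTEGER);
-- Made-up rows, for illustration only.
INSERT INTO dataset VALUES (1), (2);
INSERT INTO parameters VALUES (10), (11);
INSERT INTO dataset_parameters VALUES (1, 10), (1, 11), (2, 11);
""")

# Each ON clause plays the role of one join condition in the bridge.
rows = conn.execute("""
SELECT d.dataset_id, p.parameter_id
FROM dataset d
JOIN dataset_parameters dp ON d.dataset_id = dp.dataset_id
JOIN parameters p ON dp.parameter_id = p.parameter_id
ORDER BY d.dataset_id, p.parameter_id
""").fetchall()

# One hasParameter statement per (dataset, parameter) pair found by the join.
triples = [(f"dataset/{d}", "bcodmo:hasParameter", f"parameters/{p}") for d, p in rows]
for triple in triples:
    print(triple)
```

The link table never appears in the resulting statements; it only determines which dataset and parameter resources are connected.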
Customization of mapping
- Customization is straightforward when a class in the ontology is represented by a table in the database.
- Mapping is complicated, and sometimes not possible, when a class in the ontology corresponds not to a table in the database but to a record in a database table.
Optimizing D2R's performance
- Define primary keys wherever possible and create indexes.
- Indicate directions in d2rq:join conditions.
- Set d2rq:autoReloadMapping to false whenever it is not needed.
- Use hint properties:
    - d2rq:valueMaxLength
    - d2rq:valueRegex
    - d2rq:valueContains
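The hint properties help because they let a query for an impossible value be ruled out before any SQL is issued. The following Python sketch models that idea only; it is not D2RQ's implementation, the dictionary keys simply borrow the hint names, and treating the regex as a search over the value is an assumption.

```python
import re


def may_match(value: str, hints: dict) -> bool:
    """Return False when the hints prove no column value can equal `value`,
    so the database query can be skipped."""
    if "valueMaxLength" in hints and len(value) > hints["valueMaxLength"]:
        return False
    if "valueRegex" in hints and not re.search(hints["valueRegex"], value):
        return False
    if "valueContains" in hints and hints["valueContains"] not in value:
        return False
    return True


# Hypothetical hints for a short, purely numeric column.
hints = {"valueMaxLength": 10, "valueRegex": r"^[0-9]+$"}

print(may_match("42", hints))           # passes both hints, query still needed
print(may_match("abc", hints))          # ruled out by the regex hint
print(may_match("12345678901", hints))  # ruled out by the length hint
```

A value that passes every hint may still be absent from the database; the hints can only exclude candidates, never confirm them.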
Limitations
- Performs reasonably well with basic triple patterns, but performance deteriorates when SPARQL features such as OPTIONAL, FILTER and LIMIT are used.
- Does not have reasoning capability. Reasoning can be added by using the D2RQ engine within Jena.
- Integration of multiple databases or other data sources using D2RQ alone is not possible.
- Read-only: cannot perform INSERT, DELETE or UPDATE operations.
- Cannot handle complicated database structures like VIEWS.
Other tools/applications for publishing databases on Semantic Web
- Virtuoso RDF View:
    - Uses the table-to-class and column-to-predicate approach.
    - RDB data are represented as virtual RDF graphs.
    - Customization of mapping possible.
- Triplify:
    - Maps HTTP-URI requests to relational database queries expressed in SQL.
    - No SPARQL support.
Tools/Applications continued…
- R2O:
    - XML-based declarative mapping language.
- DartGrid Semantic Web toolkit:
    - Provides a visual tool to define mappings.
- RDBToOnto:
    - User-oriented tool that creates a static mapping (RDF dump).
- Asio Semantic Bridge for Relational Databases (SBRD) and Automapper:
    - Uses the table-to-class approach.
A note of thanks to…
- Prof. Peter Fox
- Patrick West
- Eric Rozell
- Ankesh Khandelwal
- Evan Patton