Enterprise Information Integration using Semantic Web Technologies: RDF as the Lingua Franca
David Booth, Ph.D., HP Software
Semantic Technology Conference, 20-May-2008
In collaboration with Steve Battle, HP Labs
Latest version of these slides: http://dbooth.org/2008/stc/slides.ppt
© 2008 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice
Disclaimer This work reflects research and is presented for discussion purposes only. No product commitment whatsoever is expressed or implied. Furthermore, views expressed herein are those of the author and do not necessarily reflect those of HP.
Outline
• PART 0: The problem
• PART 1: RDF: The lingua franca for information exchange
  − Why
    1. Focus on semantics
    2. Easier data integration
    3. Easier to bridge other formats/models
    4. Looser coupling
  − How
    1. RDF message semantics
    2. REST-based SPARQL endpoints
    3. XML with GRDDL transformations
    4. Aggregators
• PART 2: POC: A SPARQL adaptor for UCMDB
  − What is UCMDB
  − SPARQL adaptor
PART 0: The problem
Problem 1: Integration complexity • Multiple producers/consumers need to share data • Tight coupling hampers independent versioning [Figure: IT tools and roles that share data: Compliance Management, Discovery, Provisioning, Release Management, Incident Management, Change Management, Source Control, Monitoring, Ticketing; Release Managers, Compliance Managers, Operation Centers, Networking Engineers, Storage Administrators, Unix System Administrators, Networking Administrators, Windows System Administrators]
Problem 2: Babelization • Proliferation of data models (XML schemas, etc.) • Parsing issues influence data models • No consistent semantics • Data chaos [Image: Tower of Babel, Abel Grimmer (1570-1619)]
PART 1: RDF: The lingua franca for information exchange
Why? Four reasons...
Why? 1. Focus on semantics • XML: − Schema is focused on how to serialize • Constrains more than the model − Parent/child and sibling relationships are not named • Are their semantics documented? E.g., does sibling order matter? • RDF: − One URI per concept − Syntax independent • Who cares about syntax?
Why? 2. Easier data integration
Why? 2. Easier data integration • Blue App has model
Why? 2. Easier data integration • Red App has model • Need to integrate Red & Blue models
Why? 2. Easier data integration • Step 1: Merge RDF • Same nodes (URIs) join automatically
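The merge step can be sketched in a few lines of Python, assuming each model is held as a set of (subject, predicate, object) triples. The example.org URIs and property names are hypothetical and do not come from the slides, and the sketch ignores blank nodes, which need extra care in a real merge. Because nodes are identified by URI, a plain set union joins the two graphs wherever they mention the same URI.

```python
# Hypothetical URIs for illustration only; not taken from the slides.
EX = "http://example.org/"

# Each model is a set of (subject, predicate, object) triples.
blue_model = {
    (EX + "cust/42", EX + "blue#name", "Acme Corp"),
    (EX + "cust/42", EX + "blue#account", EX + "acct/7"),
}
red_model = {
    (EX + "cust/42", EX + "red#region", "EMEA"),
    (EX + "order/9", EX + "red#placedBy", EX + "cust/42"),
}

# Merging ground RDF graphs is a set union of their triples.
merged = blue_model | red_model


def properties_of(graph, subject):
    """Return the (predicate, object) pairs recorded for one subject URI."""
    return {(p, o) for (s, p, o) in graph if s == subject}


# The shared node now carries properties from both models.
cust = properties_of(merged, EX + "cust/42")
print(len(merged), len(cust))
```

No mapping table is needed for the join itself: the shared URI does the work.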
Why? 2. Easier data integration • Step 2: Add relationships and rules • (Relationships are also RDF)
Why? 2. Easier data integration • Step 3: Define Green model • (Making use of Red & Blue models)
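Steps 2 and 3 might look like the following sketch, under the assumption that the bridging relationships are rdfs:subClassOf triples and the only rule is the standard RDFS subclass rule. The class names (red#Customer, blue#Client, green#Customer) are hypothetical. The slides do not say which rules or rule engine were used, so this is only one possible reading.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/"  # hypothetical namespace

# Instance data from the Red and Blue models (hypothetical class names).
data = {
    (EX + "cust/1", RDF_TYPE, EX + "red#Customer"),
    (EX + "cust/2", RDF_TYPE, EX + "blue#Client"),
}

# Step 2: the bridging relationships are themselves just triples.
bridge = {
    (EX + "red#Customer", SUBCLASS, EX + "green#Customer"),
    (EX + "blue#Client", SUBCLASS, EX + "green#Customer"),
}


def infer_types(graph):
    """Apply (x type C) and (C subClassOf D) => (x type D) until nothing new appears."""
    result = set(graph)
    while True:
        new = {
            (x, RDF_TYPE, d)
            for (x, p1, c) in result if p1 == RDF_TYPE
            for (c2, p2, d) in result if p2 == SUBCLASS and c2 == c
        }
        if new <= result:
            return result
        result |= new


# Step 3: the Green model is defined in terms of Red and Blue.
green_view = infer_types(data | bridge)
green_customers = sorted(
    s for (s, p, o) in green_view
    if p == RDF_TYPE and o == EX + "green#Customer"
)
print(green_customers)
```

The original Red and Blue triples are still present in `green_view`, which is why the existing apps see no difference.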
Why? 2. Easier data integration • What the Blue app sees: − No difference!
Why? 2. Easier data integration • What the Red app sees: − No difference!
Why? 3. RDF helps bridge other formats/models • Producers and consumers may use different formats/models • Rules can specify transformations • Inference engine finds path to desired result model [Figure: nodes A1, A2, A3, B1, B2, C1, C2 and X, Y, Z connected through an "RDF Model Transform" box driven by "Ontologies & Rules"]
Why? 4. Looser coupling • Without breaking consumers: − Ontologies can be mixed and extended − Triples can be added • Producer & consumer can be versioned more independently
Example of looser coupling • RedCust and GreenCust ontologies added • Blue app is not affected [Figure: a Producer and a Consumer (Blue app)]
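A small Python sketch of why added triples leave the Blue app unaffected, assuming the consumer selects only triples whose predicates it knows. All property names (blue#name, redcust#region, greencust#tier) are hypothetical stand-ins for the ontologies named on the slide.

```python
EX = "http://example.org/"  # hypothetical namespace
BLUE_NAME = EX + "blue#name"


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return {
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }


def blue_app_view(graph):
    """The Blue app only asks for the Blue property it understands."""
    return match(graph, p=BLUE_NAME)


# Version 1 of the producer's output.
v1 = {(EX + "cust/42", BLUE_NAME, "Acme Corp")}

# Version 2 adds triples from two further (hypothetical) ontologies.
v2 = v1 | {
    (EX + "cust/42", EX + "redcust#region", "EMEA"),
    (EX + "cust/42", EX + "greencust#tier", "gold"),
}

print(blue_app_view(v1) == blue_app_view(v2))
```

The consumer ignores what it does not ask for, so the producer can add vocabulary without a coordinated release.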
How? Four ways...
How? 1. RDF message semantics • Interface contract specifies RDF, regardless of serialization • RDF pins the semantics [Figure: Consumer and Producer exchanging RDF]
How? 2. REST-based SPARQL endpoints [Figure: Consumer and Producer, labeled RDF, SPARQL, HTTP]
REST-based SPARQL endpoints • Why REST: − HTTP is ubiquitous − Simpler than SOAP-based Web services (WS*) − Looser process coupling
REST-based SPARQL endpoints • Why SPARQL: − One endpoint supports multiple data needs • Each consumer gets what it wants − Insulates consumers from internal model changes • Inferencing transforms data to consumer's desired model • Looser data coupling
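As a rough illustration of "one endpoint, multiple data needs", the sketch below builds two SPARQL Protocol requests against the same endpoint using only the Python standard library. The endpoint URL is hypothetical and the requests are constructed but not sent. The protocol carries the query text in a `query` parameter of an HTTP GET.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import Request

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint URL


def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(
        url,
        headers={"Accept": "application/sparql-results+xml"},
        method="GET",
    )


# Two consumers with different data needs share one endpoint.
query_a = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5"
query_b = "SELECT DISTINCT ?p WHERE { ?s ?p ?o }"

req_a = build_sparql_request(ENDPOINT, query_a)
req_b = build_sparql_request(ENDPOINT, query_b)

print(req_a.full_url)
print(req_b.full_url)
```

Against a live endpoint, a request like this could be passed to `urllib.request.urlopen`. Each consumer changes only its query text, not the interface.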
How? 3. XML with GRDDL transformations • GRDDL is a W3C standard • GRDDL permits RDF to be "gleaned" from XML − XML document or schema specifies desired GRDDL transformation − GRDDL transformation produces RDF from XML document − Mostly intended for getting microformat and other data/metadata from HTML pages
Using GRDDL for XML document semantics • Each XML format can be viewed as a custom serialization of RDF! − GRDDL transformation produces semantics of the XML document • Helps bridge XML and RDF worlds • Same XML document can be consumed by: − Legacy XML app − RDF app • App interface contract can specify RDF − Serializations can vary − Semantics are pinned by RDF
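The idea that one XML document can serve both a legacy XML app and an RDF app can be sketched as follows. A real GRDDL transformation is typically an XSLT stylesheet referenced from the document or its namespace. The Python function here is only a stand-in for that transformation, and the XML format, URIs, and property names are hypothetical.

```python
import xml.etree.ElementTree as ET

EX = "http://example.org/"  # hypothetical namespace

# A hypothetical XML format, not taken from the slides.
XML_DOC = """<hosts>
  <host id="h1"><name>alpha</name><ip>10.0.0.1</ip></host>
  <host id="h2"><name>beta</name><ip>10.0.0.2</ip></host>
</hosts>"""


def glean_triples(xml_text):
    """Stand-in for a GRDDL transformation: one triple per child of each <host>."""
    root = ET.fromstring(xml_text)
    triples = set()
    for host in root.findall("host"):
        subject = EX + "host/" + host.get("id")
        for child in host:
            triples.add((subject, EX + "schema#" + child.tag, child.text))
    return triples


def legacy_names(xml_text):
    """Legacy XML app: reads host names straight from the same document."""
    return [h.findtext("name") for h in ET.fromstring(xml_text).findall("host")]


triples = glean_triples(XML_DOC)
names = legacy_names(XML_DOC)
print(len(triples), names)
```

The XML stays unchanged for existing consumers, while the gleaned triples give RDF-aware consumers the document's semantics.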
Using GRDDL for XML document semantics [Figure: a Client and a Service; the Service normalizes input to RDF, performs core app processing over an RDF engine/store, and serializes output as XML/other/RDF] See: http://dbooth.org/2007/rdf-and-soa-paper.htm
How? 4. Aggregators • Gets data from multiple sources • Provides data to consumers [Figure: nodes A1, A2, A3, B1, B2, C1, C2 and X, Y, Z connected through an Aggregator, labeled SPARQL, driven by "Ontologies & Rules"]
Aggregator • Conceptual component − Not necessarily a separate physical service • Handles mechanics of getting data − Different adaptors for different sources • REST, WS*, Relational, XML, etc. • Diverse data models − Might do caching and query distribution (federation) • Provides model transformation − Plug in ontologies and inference rules as needed
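One way to picture the aggregator is as a component that calls a set of source adaptors, merges their triples, applies plug-in rules, and caches the result. The sketch below assumes adaptors are plain functions returning sets of triples. The adaptor names, the data, and the `name_to_label` rule are all hypothetical, and federation is not shown.

```python
EX = "http://example.org/"  # hypothetical namespace


def relational_adaptor():
    """Hypothetical adaptor: lifts rows from a relational source to triples."""
    rows = [("h1", "alpha"), ("h2", "beta")]
    return {(EX + "host/" + hid, EX + "name", name) for hid, name in rows}


def rest_adaptor():
    """Hypothetical adaptor: lifts records from a REST source to triples."""
    records = [{"id": "h1", "ip": "10.0.0.1"}]
    return {(EX + "host/" + r["id"], EX + "ip", r["ip"]) for r in records}


def name_to_label(graph):
    """Example plug-in rule: every ex:name is also exposed as ex:label."""
    return graph | {(s, EX + "label", o) for (s, p, o) in graph if p == EX + "name"}


class Aggregator:
    def __init__(self, adaptors, rules=()):
        self.adaptors = list(adaptors)
        self.rules = list(rules)  # each rule maps a graph to a graph
        self._cache = None        # simple whole-graph cache

    def graph(self):
        """Fetch from every adaptor, merge, apply each rule once, and cache."""
        if self._cache is None:
            merged = set()
            for adaptor in self.adaptors:
                merged |= adaptor()
            for rule in self.rules:
                merged = rule(merged)
            self._cache = merged
        return self._cache

    def objects(self, subject, predicate):
        """Return the sorted objects for one subject/predicate pair."""
        return sorted(
            o for (s, p, o) in self.graph() if s == subject and p == predicate
        )


agg = Aggregator([relational_adaptor, rest_adaptor], rules=[name_to_label])
print(agg.objects(EX + "host/h1", EX + "label"))
print(agg.objects(EX + "host/h1", EX + "ip"))
```

Consumers query the aggregator's merged view and never deal with the individual source formats.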
PART 2: Proof-of-Concept: A SPARQL adaptor for UCMDB
IT Service Management (ITSM) • Manage IT environment • Configuration Management Data Base (CMDB) is central
The HP Universal CMDB (UCMDB) Goal: • Maintain a comprehensive and current record of all configuration items (CIs) and their relationships (CMDB: Configuration Management DB)
Example: host information [Figure: RDF graph of the host properties of one particular host machine, http://cmdb.mercury.com#nt.35014541]
SPARQL adaptor • Uses existing SOAP interface to UCMDB • Enables SPARQL queries • Results can be RDF • No model transformation (yet) [Figure: SPARQL adaptor, SOAP interface, HP UCMDB]
Architecture of SPARQL adaptor [Figure: Run time: the SPARQL adaptor compiles SPARQL to TQL, submits it through the SOAP interface to the HP UCMDB database, and lifts the returned XML to RDF. Design time: CMDB metadata is exported as OWL.]
UCMDB ontology • The HP UCMDB ontology defines CI types and relationship hierarchies. • Derived automatically from HP UCMDB metadata.
Jena-based implementation • Jena, ARQ, Joseki developed at HP Labs*.
Jena: Semantic Web toolkit (RDF, OWL, inference)
ARQ: Query engine (SPARQL query algebra, evaluation)
Joseki: SPARQL server (SPARQL protocol)
* http://www.hpl.hp.com/semweb/
Query returning a table
Select the names of host servers on the network with address 192.168.81.0
SELECT ?host_name
WHERE {
  [ a object:network ]
    attr:network_netaddr "192.168.81.0" ;
    link:member [ a object:host ;
                  attr:host_dnsname ?host_name ]
}
Result:
host_name
"ILDTRD129"^^<http://www.w3.org/2001/XMLSchema#string>
"JONI"^^<http://www.w3.org/2001/XMLSchema#string>
"MBADIR-IL"^^<http://www.w3.org/2001/XMLSchema#string>
Query returning an RDF subgraph Describe a network (192.168.81.0) with host servers containing a DB.
Example RDF result set [Figure: RDF graph with nodes labeled Database and Host]
Questions?