From XML to Semantic Web
Changqing Li, Tok Wang Ling
Department of Computer Science, School of Computing
National University of Singapore

Outline
• Introduction
• Background
• Preliminary
• A genetic model to organize ontologies
• The translations
• Related work
• Conclusion

Background – Ontology description languages
• Resource Description Framework (RDF) [8]
– Organizes information as Subject-Verb-Object (SVO) triples, i.e. Resource-Property-Resource triples
[8] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF). 1999.

Background – Ontology description languages
• RDF Schema (RDFS) [1]
• DARPA Agent Markup Language (DAML) [10]
• Ontology Inference Layer (OIL) [6]
• DAML+OIL [4]
• Web Ontology Language (OWL) [3]
– They are all based on the RDF syntax
– They all define additional primitives to describe information, e.g. “rdfs:subClassOf”, “owl:equivalentClass”, etc.
[1] Dan Brickley and R. V. Guha. Resource Description Framework (RDF) Schema Specification 1.0, W3C Candidate Recommendation, 27 March 2000.
[3] Frank van Harmelen, Jim Hendler, Ian Horrocks, Deborah L. McGuinness, Peter F. Patel-Schneider, and Lynn Andrea Stein. OWL Web Ontology Language Reference.
[4] Frank van Harmelen, Peter F. Patel-Schneider, and Ian Horrocks. Reference description of the DAML+OIL (March 2001) ontology markup language.
[6] I. Horrocks, D. Fensel, J. Broekstra, S. Decker, M. Erdmann, C. Goble, F. van Harmelen, M. Klein, S. Staab, R. Studer, and E. Motta. The Ontology Inference Layer OIL.
[10] Lynn Andrea Stein, Dan Connolly, and Deborah McGuinness. DAML Ontology language specification. October 2000.

Preliminary
• ORA-SS [9] models three concepts:
– Object class
– Relationship type
– Attribute of an object class or relationship type, e.g. “grade” is an attribute of the relationship type “sc”

The “Student.xml” file:
    <student id="HD1234567">
      <name>John</name>
      <contact_no>9876543</contact_no>
      <course code="CS4321">
        <name>database</name>
        <grade>A</grade>
      </course>
      <part_time>
        <position>programmer</position>
      </part_time>
    </student>

[Figure: ORA-SS schema diagram. Object class student (id, name, contact_no) with the nested object classes course (code, name) and part_time (position). Edge labels: “sc, 2, 3:8, 4:n” and “2, 0:1, 0:n”; grade is attached to course with the label “sc”.]
[9] Tok Wang Ling, Mong Li Lee, and Gillian Dobbie. Semistructured Database Design. Springer, 2005.

Motivating example
• The element “name” is ambiguous: the student name and the course name must be distinguished
• The semantics is clearer when “student” is changed to “student_employee”
• The semantics of the relationship type name “sc” is not clear
• DTD and XML Schema have the same problems, and are even worse off: e.g., they have no counterpart of “sc” to name the relationship type
(The example is the same “Student.xml” file and ORA-SS schema diagram shown on the Preliminary slide.)

A genetic model to organize ontologies
• Inheritance
– An ontology inherits the ontology languages
– Lower-level ontologies inherit higher-level ontologies
• Block
– Employee Ontology does not inherit “home_phone” from Person Ontology
• Atavism: a concept of a grandparent ontology that is blocked by the parent ontology is reused in this ontology
– “home_phone” is an atavism in EmployeeWorkingHome Ontology
• Mutation: the same name has different semantics in the parent and child ontologies
– “contact_number” of Person Ontology is a mutation in Employee Ontology
• Multiple inheritance
– StudentEmployee Ontology inherits both the Student and Employee ontologies
– StudentEmployee Ontology inherits “contact_number” from Student Ontology
[Figure: Ontology hierarchy, from the ontology languages (RDF, RDFS, OWL) down through the Course, Person, Employee, Student, EmployeeWorkingHome and StudentEmployee ontologies, with edges labeled Inheritance, Block, Mutation and Atavism.]

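The four relationships of the genetic model can be illustrated with a minimal Python sketch. The `Ontology` class, its field names and the `kind` method are hypothetical and not part of the presented system; the placement of “name” and “office_phone” in the Person and Employee ontologies is an assumption read off the hierarchy figure.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Hypothetical in-memory stand-in for one node of the ontology hierarchy."""
    name: str
    parents: list = field(default_factory=list)
    defined: set = field(default_factory=set)   # concepts introduced or redefined here
    blocked: set = field(default_factory=set)   # inherited concept names not passed on
    reused: set = field(default_factory=set)    # atavisms: blocked upstream, reused here

    def concepts(self):
        """Names of all concepts usable in this ontology."""
        inherited = set()
        for parent in self.parents:
            inherited |= parent.concepts()
        return (inherited - self.blocked) | self.defined | self.reused

    def kind(self, concept):
        """Classify how `concept` relates to this ontology's parents."""
        in_parents = any(concept in p.concepts() for p in self.parents)
        if concept in self.reused:
            return "atavism"
        if concept in self.blocked:
            return "block"
        if concept in self.defined:
            return "mutation" if in_parents else "new"
        return "inheritance" if in_parents else "absent"


# Assumed concept placement, following the examples on the slide.
person = Ontology("Person", defined={"name", "contact_number", "home_phone"})
employee = Ontology("Employee", parents=[person],
                    defined={"contact_number", "office_phone"},
                    blocked={"home_phone"})
working_home = Ontology("EmployeeWorkingHome", parents=[employee],
                        reused={"home_phone"})

for onto, concept in [(employee, "name"), (employee, "home_phone"),
                      (employee, "contact_number"), (working_home, "home_phone")]:
    print(onto.name, concept, onto.kind(concept))
```

In this sketch a block removes a concept from what an ontology passes on to its children, which is why EmployeeWorkingHome has to reuse “home_phone” explicitly for it to count as an atavism.
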
Our translations
• Semantic translation
• Structural translation
• Schematic translation

Semantic translation
• The Semantic Translation (SemT) from an XML file or schema to a semantic web file or schema means that the XML elements, attributes and values are replaced with concepts from ontologies.
• Rule SemT 1 (Rule for exactly one matched concept). If only one matched concept is returned from the ontologies, the XML (schema) element, attribute or value is replaced with that concept.

Semantic translation
• Rules SemT 2 to 4 apply when more than one matched concept is returned.
• Rule SemT 2 (Rule for Multiple Inheritance and Block). If the child ontology inherits several parent ontologies, the concept from the unblocked parent ontology is selected for the replacement.
• Rule SemT 3 (Rule for Atavism). If a concept of the grandparent or ancestor ontology is an atavism in the grandchild or descendant ontology, the concept in the grandchild or descendant ontology is used for the replacement.
• Rule SemT 4 (Rule for Mutation). If a concept in the parent ontology is a mutation in the child ontology, the concept in the child ontology is used for the replacement.
Example 1. If an XML file is about student employees, the StudentEmployee ontology is specified for the search, and the ancestor ontologies of this specified ontology are searched as well. The “contact_number” from Student Ontology is used for the replacement.

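One way to picture how the rules for multiple matches select a concept is the following minimal Python sketch. The dictionary layout and the `resolve` function are hypothetical. For brevity, Student is modeled as supplying “contact_number” directly, which is an assumption: the slides only state that the concept is inherited from the Student Ontology.

```python
def resolve(concept, ontology):
    """Return the name of the ontology whose `concept` should be used, or None."""
    # SemT 3 and SemT 4: an atavism or mutation declared in this ontology wins.
    if concept in ontology["own"]:
        return ontology["name"]
    # SemT 2: otherwise search the parents, skipping any parent whose
    # concept this ontology blocks.
    for parent in ontology["parents"]:
        if (parent["name"], concept) in ontology["blocked"]:
            continue
        supplier = resolve(concept, parent)
        if supplier is not None:
            return supplier
    return None


def make(name, parents=(), own=(), blocked=()):
    """Build one hypothetical ontology record."""
    return {"name": name, "parents": list(parents),
            "own": set(own), "blocked": set(blocked)}


person = make("Person", own={"name", "contact_number", "home_phone"})
student = make("Student", own={"contact_number"})  # assumption, see lead-in
employee = make("Employee", parents=[person], own={"contact_number"},
                blocked={("Person", "home_phone")})
working_home = make("EmployeeWorkingHome", parents=[employee],
                    own={"home_phone"})
student_employee = make("StudentEmployee", parents=[employee, student],
                        blocked={("Employee", "contact_number")})

print(resolve("contact_number", student_employee))  # Example 1
print(resolve("contact_number", employee))          # mutation
print(resolve("home_phone", working_home))          # atavism
```

Listing Employee before Student as a parent is deliberate: the block on Employee's “contact_number” is what steers the search to Student, as in Example 1.
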
Semantic translation
• Rule SemT 5 (Rule for no matched concept). If the element, attribute or value cannot be found in the ontologies, our system suggests adding new concepts to the ontologies (adding new concepts requires confirmation from a domain expert).

Semantic translation
• Rule SemT 6 (Rule for Numbers). If the values in the XML are numbers, such as the contact_no “9876543”, they need not be searched in the ontologies.
• Rule SemT 7 (Rule for Person Names). If the values in the XML are person names (or company names, etc.), such as “John”, they need not be searched in the ontologies.

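The order in which the semantic rules apply to a single XML value might be sketched as follows. This is a minimal Python illustration, not the presented system: `translate_value`, the `concept_index` lookup table and the `is_proper_name` predicate are all hypothetical, and how person or company names are actually recognized is not described in the slides.

```python
import re

NUMBER = re.compile(r"[+-]?\d+(\.\d+)?")


def translate_value(value, concept_index, is_proper_name):
    """Return (rule, replacement) for one XML value.

    concept_index maps a lower-cased term to the list of matching
    ontology concepts (hypothetical structure).
    """
    text = value.strip()
    if NUMBER.fullmatch(text):
        return ("SemT 6", text)      # numbers are not searched
    if is_proper_name(text):
        return ("SemT 7", text)      # names are not searched
    matches = concept_index.get(text.lower(), [])
    if len(matches) == 1:
        return ("SemT 1", matches[0])
    if not matches:
        return ("SemT 5", None)      # suggest a new concept to the expert
    return ("SemT 2-4", matches)     # needs the inheritance-based rules


# Illustrative data only.
index = {
    "database": ["cou:database"],
    "programmer": ["emp:programmer"],
    "contact_number": ["stu:contact_number", "emp:contact_number"],
}
known_names = {"John"}

for raw in [" 9876543", "John", "database", "contact_number", "manager"]:
    print(raw.strip(), translate_value(raw, index, known_names.__contains__))
```
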
Structural translation
• Structural Translation (StrT) refers to the translation of an XML file or schema to a file or schema that complies with the RDF structure, i.e. the SVO format.
• Rule StrT 1 (Rule for checking structure). If, on any path of the XML from the root to a leaf, the nesting does not alternate between resource and property (resource, property, resource, ...), the XML does not satisfy the RDF structure.
• Rule StrT 2 (Rule for modifying structure). If resources or properties must be inserted into the XML to satisfy the RDF structure, they are searched for in the ontology hierarchy based on the domain and range of properties (not based on names).

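The structure check can be sketched in a few lines of Python with the standard `xml.etree.ElementTree` module. The function name `striping_violations` and the idea of passing in the set of tags that denote resources are assumptions made for this illustration; the slides do not say how resources and properties are told apart.

```python
import xml.etree.ElementTree as ET

STUDENT_XML = """
<student id="HD1234567">
  <name>John</name>
  <contact_no>9876543</contact_no>
  <course code="CS4321">
    <name>database</name>
    <grade>A</grade>
  </course>
  <part_time>
    <position>programmer</position>
  </part_time>
</student>
"""


def striping_violations(xml_text, resource_tags):
    """List the paths where resource/property nesting fails to alternate."""
    root = ET.fromstring(xml_text)
    violations = []

    def kind(elem):
        return "resource" if elem.tag in resource_tags else "property"

    def walk(elem, path):
        for child in elem:
            child_path = f"{path}/{child.tag}"
            # A child must be of the opposite kind to its parent.
            if kind(child) == kind(elem):
                violations.append(child_path)
            walk(child, child_path)

    if kind(root) != "resource":
        violations.append(f"/{root.tag}")
    walk(root, f"/{root.tag}")
    return violations


# Assumption: "student" and "course" are the elements that denote resources.
print(striping_violations(STUDENT_XML, {"student", "course"}))
```

With these assumed resource tags the check reports `/student/course` (a resource nested directly in a resource) and `/student/part_time/position` (a property nested directly in a property), the two places where a property or a resource has to be inserted.
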
Schematic translation
• Schematic Translation (SchT) means that some features of the XML schema are translated to follow the RDF, RDFS and OWL languages.
• Rule SchT 1 (Rule for ID and ID reference). The object identifier of ORA-SS or the ID attribute of DTD is translated to “rdf:ID” (an identification primitive of RDF), and the value of the primary key is kept unchanged. We use “rdf:resource” to refer to the referenced object.
• Rule SchT 2 (Rule for default and fixed values). If the value of an attribute is a default or fixed value, it is kept unchanged.
• Rule SchT 3 (Rule for order-sensitive, composite and disjunctive attributes). The order-sensitive attribute is translated to “rdf:Seq”, the composite attribute to “rdf:Bag”, and the disjunctive attribute to “rdf:Alt”.
• Rule SchT 4 (Rule for cardinality). The cardinality constraining the objects and attributes is kept unchanged after translation. Thus the structural information of the original XML schema is preserved.

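The attribute-to-container mapping of Rule SchT 3 can be written down as a small lookup table. The Python sketch below uses the standard `xml.etree.ElementTree` module; the function name `to_container`, the attribute-kind labels and the sample members are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ET.register_namespace("rdf", RDF_NS)

# Rule SchT 3: attribute kind -> RDF container element.
CONTAINER_FOR = {"ordered": "Seq", "composite": "Bag", "disjunctive": "Alt"}


def to_container(kind, members):
    """Wrap the members of an attribute in the RDF container for its kind."""
    container = ET.Element(f"{{{RDF_NS}}}{CONTAINER_FOR[kind]}")
    for member in members:
        item = ET.SubElement(container, f"{{{RDF_NS}}}li")
        item.text = member
    return container


# Illustrative values only.
seq = to_container("ordered", ["first", "second"])
print(ET.tostring(seq, encoding="unicode"))
```
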
The ORA-SS after the three-step translations
• “stu_emp”, “per”, “cou” and “emp” are namespaces [11] that refer to the Student_Employee, Person, Course and Employee ontologies
• “rdf” is the namespace that refers to the RDF ontology language
[Figure: The ORA-SS schema diagram after the three-step translations. stu_emp:Student_Employee (rdf:ID, per:name, stu:contact_number) is linked through cou:take (2, 3:8, 4:n) to cou:Course (rdf:ID, cou:name) and through emp:part_time (2, 0:1, 0:n) to emp:Job (emp:position); cou:grade is attached with the label cou:take.]
[11] Namespaces in XML, World Wide Web Consortium, 14 January 1999. http://www.w3.org/TR/REC-xml-names/

The XML file after the three-step translations
    <stu_emp:Student_Employee rdf:ID="HD1234567">
      <per:name>John</per:name>
      <stu:contact_number>9876543</stu:contact_number>
      <cou:take>
        <cou:Course rdf:ID="CS4321">
          <cou:name>cou:database</cou:name>
          <cou:grade>A</cou:grade>
        </cou:Course>
      </cou:take>
      <emp:part_time>
        <emp:Job>
          <emp:position>emp:programmer</emp:position>
        </emp:Job>
      </emp:part_time>
    </stu_emp:Student_Employee>
The XML file after semantic and structural translation

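To see that the translated file follows the resource-property-resource pattern, its triples can be read off with a short Python sketch based on the standard `xml.etree.ElementTree` module. The namespace URIs under `example.org` are placeholders invented for this illustration (the slides do not give them), and the `triples` function is a simplified reader, not a full RDF/XML parser. A resource without `rdf:ID`, such as `emp:Job`, appears as `None`.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical namespace URIs; only the rdf one is real.
NS = {
    "rdf": RDF,
    "stu_emp": "http://example.org/student_employee#",
    "per": "http://example.org/person#",
    "stu": "http://example.org/student#",
    "cou": "http://example.org/course#",
    "emp": "http://example.org/employee#",
}
PREFIX = {uri: prefix for prefix, uri in NS.items()}
DECLS = " ".join(f'xmlns:{p}="{u}"' for p, u in NS.items())

DOC = f"""<stu_emp:Student_Employee {DECLS} rdf:ID="HD1234567">
  <per:name>John</per:name>
  <stu:contact_number>9876543</stu:contact_number>
  <cou:take>
    <cou:Course rdf:ID="CS4321">
      <cou:name>cou:database</cou:name>
      <cou:grade>A</cou:grade>
    </cou:Course>
  </cou:take>
  <emp:part_time>
    <emp:Job>
      <emp:position>emp:programmer</emp:position>
    </emp:Job>
  </emp:part_time>
</stu_emp:Student_Employee>"""


def qname(tag):
    """Turn ElementTree's '{uri}local' tag back into 'prefix:local'."""
    uri, local = tag[1:].split("}")
    return f"{PREFIX[uri]}:{local}"


def triples(resource):
    """Yield (subject, property, object) for a resource element and its descendants."""
    subject = resource.get(f"{{{RDF}}}ID")
    for prop in resource:
        nested = list(prop)
        if nested:
            for obj in nested:  # property pointing at another resource
                yield (subject, qname(prop.tag), obj.get(f"{{{RDF}}}ID"))
                yield from triples(obj)
        else:                   # property with a literal value
            yield (subject, qname(prop.tag), (prop.text or "").strip())


for triple in triples(ET.fromstring(DOC)):
    print(triple)
```
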
Related work – Tools to translate present web to semantic web
• SHOE Knowledge Annotator [5]
– Annotates HTML
– Manual tool
• AeroDAML [7]
– Annotates HTML
– Automatic tool
– Uses a single predefined ontology that includes all the concepts for different domains
[5] Jeff Heflin and James Hendler. A Portrait of the Semantic Web in Action. IEEE Intelligent Systems, 16(2), 2001.
[7] Paul Kogut and William Holmes. AeroDAML: Applying Information Extraction to Generate DAML Annotations from Web Pages. K-CAP 2001 Workshop, October 21, 2001.

Related work – Tools to translate present web to semantic web
• OntoParser [2]
– Translates XML to satisfy the RDF structure
– Only structural translation
• The translation is very simple: it only adds elements such as <terms>, <rdf:Seq>, <rdf:li>, etc. among the existing elements so that resources and properties are interleaved.
• The translation is not based on the semantics of the elements, so the semantics between two elements (between two resources or two properties) is still not clear after translation.
[2] Avigdor Gal, Ami Eyal, Haggai Roitman, Hasan Jamil, Ateret Anaby-Tavor, and Giovanni Modica. OntoParser: an XML2RDF translator of OntoBuilder ontologies, OntoBuilder project. 2004.

Related work – Tools to translate present web to semantic web
• Our translation
– Semantic translation
• OntoParser does not have this translation.
• The search of the ontologies covers only some related paths of the genetic model, so fewer concepts need to be traversed and fewer confusing concepts are returned (compared to AeroDAML).
• Automatic translation (compared to the manual tool SHOE Knowledge Annotator).
– Structural translation
• AeroDAML and SHOE Knowledge Annotator do not have this translation.
• Our structural translation is based on the semantic translation. The inserted resources or properties have clearer semantics, not just <terms>, etc. (compared to OntoParser).
– Schematic translation
• Discusses how to process the default value, fixed value, cardinality and other constraints in XML schemas, which is absent in AeroDAML, SHOE Knowledge Annotator and OntoParser.

Conclusion
• Three translations to translate XML to the semantic web
– Semantic translation
– Structural translation
– Schematic translation
• Schemas are translated first; the XML files conforming to the schemas can then be translated easily, which improves the efficiency of translation.
• Ontologies are organized based on the genetic model.
– The search of the ontologies covers only several related paths of the genetic model, so fewer concepts need to be traversed and fewer confusing concepts are returned. The rules introduced in this paper make the semantics of the returned concepts clearer.

Reference
[1] Dan Brickley and R. V. Guha. Resource Description Framework (RDF) Schema Specification 1.0, W3C Candidate Recommendation, 27 March 2000.
[2] Avigdor Gal, Ami Eyal, Haggai Roitman, Hasan Jamil, Ateret Anaby-Tavor, and Giovanni Modica. OntoParser: an XML2RDF translator of OntoBuilder ontologies, OntoBuilder project. 2004.
[3] Frank van Harmelen, Jim Hendler, Ian Horrocks, Deborah L. McGuinness, Peter F. Patel-Schneider, and Lynn Andrea Stein. OWL Web Ontology Language Reference.
[4] Frank van Harmelen, Peter F. Patel-Schneider, and Ian Horrocks. Reference description of the DAML+OIL (March 2001) ontology markup language.
[5] Jeff Heflin and James Hendler. A Portrait of the Semantic Web in Action. IEEE Intelligent Systems, 16(2), 2001.
[6] I. Horrocks, D. Fensel, J. Broekstra, S. Decker, M. Erdmann, C. Goble, F. van Harmelen, M. Klein, S. Staab, R. Studer, and E. Motta. The Ontology Inference Layer OIL.
[7] Paul Kogut and William Holmes. AeroDAML: Applying Information Extraction to Generate DAML Annotations from Web Pages. K-CAP 2001 Workshop, October 21, 2001.
[8] Ora Lassila and Ralph R. Swick. Resource Description Framework (RDF). 1999.
[9] Tok Wang Ling, Mong Li Lee, and Gillian Dobbie. Semistructured Database Design. Springer, 2005.
[10] Lynn Andrea Stein, Dan Connolly, and Deborah McGuinness. DAML Ontology language specification. October 2000.
[11] Namespaces in XML, World Wide Web Consortium, 14 January 1999. http://www.w3.org/TR/REC-xml-names/