Development and Alignment of a Domain-Specific Ontology

Development and Alignment of a Domain-Specific Ontology for Question Answering
Shiyan Ou (1), Viktor Pekar (1), Constantin Orasan (1), Christian Spurk (2), Matteo Negri (3)
(1) Research Group in Computational Linguistics, University of Wolverhampton, UK
(2) German Research Centre for Artificial Intelligence GmbH (DFKI), Germany
(3) Fondazione Bruno Kessler (FBK), Italy

Structure
1. Introduction to QALL-ME
2. The QALL-ME ontology
3. Alignment to WordNet and SUMO
4. How the ontology is used for data encoding
5. Conclusions

Introduction to QALL-ME
§ QALL-ME (Question Answering Learning technologies in a Multilingual and Multimodal Environment) is an EU-funded project that aims to establish a shared infrastructure for multilingual and multimodal question answering in the tourism domain.
§ Project’s website: http://qallme.fbk.eu/
§ In the QALL-ME system:
  § users pose natural language questions in several languages (in both text and speech modality) using a variety of input devices (e.g. mobile phones), and
  § the system returns a list of specific answers formatted in the most appropriate modality, ranging from short texts to maps, videos and pictures.
§ A domain-specific ontology for the tourism domain was developed and shared among all the partners.

The ontology in the project
§ Work packages involved:
  § WP3: Multilingual question interpretation
  § WP4: QALL-ME ontology
  § WP5: Multilingual answer extraction
  § WP9: Evaluation
§ Uses of the ontology: annotation of entities, indexing of data, retrieval of data
§ See more in O39: Multilingual Resources (Ambassadeurs) at 13:05

Design of the ontology
§ Analysis of data from content providers
§ Analysis of users’ requirements
§ Inspired by similar ontologies such as Harmonise, eTourism, Hi-Touch, TAGA and GETESS:
  § Harmonise and eTourism focus on static information (e.g. accommodation and events/activities) rather than on dynamic information related to the travel business (e.g. customers and itineraries), as the TAGA and Hi-Touch ontologies do.
  § Like eTourism, the QALL-ME ontology is written in OWL rather than RDFS,
  § but it has wider coverage than any individual existing ontology.
§ Introspection

Technical details of the ontology
§ Encoded in OWL DL, since it has more expressive power than OWL Lite and more efficient reasoning support than OWL Full
§ Protégé-OWL was used as the editor and RacerPro 7 as the reasoner
§ The ontology contains:
  § 122 classes (concepts), of which 15 are top-level classes,
  § 55 datatype properties, and
  § 52 object properties, which express the relationships among the 122 classes.
§ The class hierarchy has a maximum depth of 4.
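
Statistics such as the number of top-level classes and the maximum depth can be read off the subclass hierarchy. A minimal sketch, assuming single inheritance and a hypothetical child-to-parent map (SUBCLASS_OF); the class names and links below are illustrative only and are not the actual QALL-ME hierarchy, and counting a top-level class as depth 1 is an assumed convention.

```python
from typing import Dict, Optional, Tuple

# Hypothetical fragment of a class hierarchy: child -> parent (None marks a top-level class).
# Illustrative only; not the real QALL-ME hierarchy. Assumes single inheritance.
SUBCLASS_OF: Dict[str, Optional[str]] = {
    "Site": None,
    "Accommodation": "Site",
    "Chalet": "Accommodation",
    "Event": None,
    "MovieShow": "Event",
}


def depth(cls: str, parents: Dict[str, Optional[str]]) -> int:
    """Depth of a class, counting a top-level class as depth 1 (assumed convention)."""
    d = 1
    while parents[cls] is not None:
        cls = parents[cls]
        d += 1
    return d


def hierarchy_stats(parents: Dict[str, Optional[str]]) -> Tuple[int, int]:
    """Return (number of top-level classes, maximum depth of the hierarchy)."""
    top_level = sum(1 for p in parents.values() if p is None)
    max_depth = max(depth(c, parents) for c in parents)
    return top_level, max_depth


print(hierarchy_stats(SUBCLASS_OF))  # (2, 3)
```

OWL allows a class to have several superclasses; with multiple inheritance the depth would have to be taken as the longest path to a root instead.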

[Figure: part of the ontology (cinema/movies)]

Ontology alignment
§ The QALL-ME ontology was designed as a model of the narrow knowledge domain of tourism.
§ It was complemented with information from WordNet (and implicitly MultiWordNet) and SUMO via alignment.
§ Because the QALL-ME ontology is still changing, fully manual alignment was not a practical solution.
§ Fully automatic alignment is not precise enough, but semi-automatic alignment may be a solution.
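
The slides do not say how the semi-automatic step was organized. One plausible split, suggested by the error analysis below (unambiguous matches were error-free), is to accept single-candidate matches automatically and send everything else to a human. This is a hypothetical sketch; triage and CANDIDATES are illustrative names, and the sense labels are made up.

```python
from typing import Dict, List, Tuple


def triage(candidates: Dict[str, List[str]]) -> Tuple[Dict[str, str], List[str]]:
    """Split concepts into auto-accepted alignments and a manual-review queue.

    `candidates` maps each ontology concept to its candidate WordNet senses.
    """
    accepted: Dict[str, str] = {}
    review: List[str] = []
    for concept, senses in candidates.items():
        if len(senses) == 1:
            accepted[concept] = senses[0]   # single candidate sense: accept automatically
        else:
            review.append(concept)          # ambiguous or unmatched: leave to a human annotator
    return accepted, review


# Hypothetical candidate lists; the sense labels are illustrative, not real WordNet data.
CANDIDATES = {
    "Chalet": ["chalet_1"],
    "Facility": ["facility_1", "facility_2", "facility_3"],
    "GuestHouse": [],
}

accepted, review = triage(CANDIDATES)
print(accepted)  # {'Chalet': 'chalet_1'}
print(review)    # ['Facility', 'GuestHouse']
```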

Ontology alignment (II)
§ The alignment relied on:
1. String similarity of element identifiers (e.g. chalet_1; SiteFacilityForChildren → facility_*)
2. Structural similarity for disambiguation (e.g. the semantic distance to already aligned concepts)
3. Definition similarity for disambiguation (similarity between the comments in the ontology and WordNet glosses)
4. Structural similarity for unmatched concepts, calculated against all the nouns in WordNet
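
Step 1 can be sketched by splitting CamelCase identifiers into tokens and comparing them with WordNet lemmas, falling back to the head noun when the full compound has no match (as in SiteFacilityForChildren → facility_*). This is a hypothetical illustration, not the project's algorithm: LEXICON is a tiny stand-in for WordNet, difflib's SequenceMatcher ratio is a swapped-in similarity measure, and the 0.8 threshold and the "last token before a preposition" head rule are assumptions. Sense disambiguation (steps 2 and 3) is not shown.

```python
import re
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

# Hypothetical stand-in for WordNet noun lemmas; a real run would look them up in WordNet 2.1.
LEXICON = ["chalet", "facility", "accommodation", "post_office", "cinema", "movie"]
PREPOSITIONS = {"for", "of", "in", "with", "at"}


def split_identifier(name: str) -> List[str]:
    """Split a CamelCase ontology identifier into lower-case tokens."""
    return [t.lower() for t in re.findall(r"[A-Z][a-z]*|[a-z]+|\d+", name)]


def candidates(name: str) -> List[str]:
    """Full identifier first, then its head noun (last token before a preposition)."""
    tokens = split_identifier(name)
    full = "_".join(tokens)
    cut = next((i for i, t in enumerate(tokens) if t in PREPOSITIONS), len(tokens))
    head = tokens[cut - 1] if cut > 0 else tokens[-1]
    return [full] if head == full else [full, head]


def best_match(name: str, lexicon: List[str] = LEXICON,
               threshold: float = 0.8) -> Optional[Tuple[str, float]]:
    """Best lemma by SequenceMatcher ratio, or None if nothing reaches the threshold."""
    best: Optional[Tuple[str, float]] = None
    for cand in candidates(name):
        for lemma in lexicon:
            score = SequenceMatcher(None, cand, lemma).ratio()
            if best is None or score > best[1]:
                best = (lemma, score)
        if best is not None and best[1] == 1.0:
            break  # exact match on an earlier candidate: stop before falling back to the head
    return best if best is not None and best[1] >= threshold else None


for concept in ["Chalet", "SiteFacilityForChildren", "PostOffice"]:
    print(concept, best_match(concept))
```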

Ontology alignment (III)
§ The overall accuracy of the fully automatic alignment is clearly suboptimal (precision of 49% and recall of 31%).
§ Error analysis:
  § For concept names with unambiguous matches in WordNet, the algorithm performs without any errors.
  § The poor disambiguation performance is due to the very different depths of the two ontologies.
  § Only a few concepts have comments that are useful for definition similarity.
§ Semi-automatic alignment requires under 30 minutes to obtain a “perfect” alignment.
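
Precision and recall of an alignment can be computed by comparing predicted (concept, sense) pairs with a gold alignment. The slide reports only precision and recall; the F1 value printed at the end is derived here from those two figures and is not stated on the slide. The toy gold and predicted sets are hypothetical.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (ontology concept, WordNet sense)


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


def precision_recall_f1(predicted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of a predicted alignment against a gold alignment."""
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    return p, r, f1_score(p, r)


# Toy alignment with hypothetical concept/sense labels.
gold = {("A", "a_1"), ("B", "b_1"), ("C", "c_1"), ("D", "d_1")}
predicted = {("A", "a_1"), ("B", "b_2"), ("C", "c_1")}
print(precision_recall_f1(predicted, gold))

# F1 implied by the reported figures (P = 0.49, R = 0.31).
print(round(f1_score(0.49, 0.31), 2))  # 0.38
```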

Example of alignment
QALL-ME | SUMO | WN 2.1 | Gloss
Accommodation | @inhabits | =02647858 | living quarters provided for public convenience; "overnight accommodations are available"
Chalet | @Building | =02973228 | a Swiss house with a sloping roof and wide eaves or a house built in this style
PostOffice | @Organization | =08034771 | an independent agency of the federal government responsible for mail delivery

Semantic annotation and database organization
§ The ontology was used to encode the data.
§ Annotated data from the content providers was converted to RDF triples.
§ The RDF documents can be stored in databases or plain-text files.
§ The Jena RDF API was used for these operations.

Semantic annotation and database organization
[Figure: data-processing pipeline. Components: World Wide Web, HTML parser, XML Schema, XML documents, QALL-ME ontology, RDF documents, database. Edge labels: Download, Define, Determine, Transform, Convert.]
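
The Transform step turns provider XML into RDF. The project used the Jena RDF API for this; the sketch below is a standard-library stand-in, not Jena, and it assumes a hypothetical XML layout, a hypothetical instance namespace (http://example.org/qallme/data/) and a hypothetical tag-to-property mapping. Only the qme: namespace and the class and property names come from the query example in these slides. The output is N-Triples.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

QME = "http://qallme.itc.it/ontology/qallme-tourism.owl#"  # ontology namespace from the SPARQL example
BASE = "http://example.org/qallme/data/"                     # hypothetical namespace for instances
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# Hypothetical mappings from XML tags to ontology terms; unmapped tags are skipped.
ROOT_CLASSES = {"cinema": "Cinema"}
OBJECT_PROPS = {  # child tag -> (object property linking parent to child, class of child)
    "postalAddress": ("hasPostalAddress", "PostalAddress"),
    "destination": ("isInDestination", "Destination"),
}
DATATYPE_PROPS = {"name": "name"}  # child tag -> datatype property

# Hypothetical content-provider record.
SAMPLE = """<cinema id="cinema1">
  <name>Example Cinema</name>
  <postalAddress id="addr1">
    <destination id="birmingham"><name>Birmingham</name></destination>
  </postalAddress>
</cinema>"""


def literal(text: str) -> str:
    """xsd:string literal in N-Triples syntax (escapes backslash, quote, newline)."""
    esc = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{esc}"^^<{XSD_STRING}>'


def to_triples(elem: ET.Element, cls: str, out: List[Tuple[str, str, str]]) -> None:
    """Emit an rdf:type triple for elem, then recurse into its mapped children."""
    subj = f"<{BASE}{elem.get('id')}>"
    out.append((subj, f"<{RDF_TYPE}>", f"<{QME}{cls}>"))
    for child in elem:
        if child.tag in DATATYPE_PROPS:
            out.append((subj, f"<{QME}{DATATYPE_PROPS[child.tag]}>", literal((child.text or "").strip())))
        elif child.tag in OBJECT_PROPS:
            prop, child_cls = OBJECT_PROPS[child.tag]
            out.append((subj, f"<{QME}{prop}>", f"<{BASE}{child.get('id')}>"))
            to_triples(child, child_cls, out)


def xml_to_ntriples(xml_text: str) -> str:
    root = ET.fromstring(xml_text)
    triples: List[Tuple[str, str, str]] = []
    to_triples(root, ROOT_CLASSES[root.tag], triples)
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


print(xml_to_ntriples(SAMPLE))
```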

Content retrieval
§ SPARQL is used for retrieval.
§ SPARQL is a query language for accessing RDF graphs, developed by the W3C RDF Data Access Working Group.
§ SPARQL provides interoperability between languages.

What movie starring Halle Berry is on in Birmingham?
Class: MovieShow
  Property: isInSite, Range: Cinema
    Property: hasPostalAddress, Range: PostalAddress
      Property: isInDestination, Range: Destination
        Property: name, Range: string <Birmingham>
  Property: hasEventContent, Range: Movie
    Property: name, Range: string <unknown>
    Property: hasStar, Range: Star
      Property: name, Range: string <Halle Berry>
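
The analysis above is a tree of classes and properties rooted at MovieShow, so a SPARQL query can be generated by walking it. A hypothetical sketch: the nested-tuple encoding, build_query and the variable-naming scheme are assumptions chosen so that the output mirrors the SPARQL query that follows; it also assumes each class occurs at most once in the tree and that literal values contain no quotes.

```python
from typing import List, Tuple

# Node = (class name, [(property, target), ...]); target is a child Node,
# a known string value, or None for the value the question asks for.
Node = Tuple[str, list]

QUESTION: Node = ("MovieShow", [
    ("isInSite", ("Cinema", [
        ("hasPostalAddress", ("PostalAddress", [
            ("isInDestination", ("Destination", [("name", "Birmingham")])),
        ])),
    ])),
    ("hasEventContent", ("Movie", [
        ("name", None),                                   # unknown: the answer
        ("hasStar", ("Star", [("name", "Halle Berry")])),
    ])),
])

PREFIXES = ("PREFIX qme: <http://qallme.itc.it/ontology/qallme-tourism.owl#>\n"
            "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n")


def build_query(root: Node) -> str:
    patterns: List[str] = []
    answers: List[str] = []

    def walk(node: Node) -> None:
        cls, edges = node
        var = "?" + cls                                   # one variable per class (assumes no repeats)
        for prop, target in edges:
            if target is None:                            # unknown value: project it in SELECT
                ans = "?" + cls[0].lower() + cls[1:] + prop[0].upper() + prop[1:]
                patterns.append(f"{var} qme:{prop} {ans} .")
                answers.append(ans)
            elif isinstance(target, str):                 # known value: typed string literal
                patterns.append(f'{var} qme:{prop} "{target}"^^xsd:string .')
            else:                                         # object property: link to range class, recurse
                patterns.append(f"{var} qme:{prop} ?{target[0]} .")
                walk(target)

    walk(root)
    body = "\n".join("  " + p for p in patterns)
    return f"{PREFIXES}SELECT {' '.join(answers)}\nWHERE {{\n{body}\n}}"


print(build_query(QUESTION))
```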

PREFIX qme: <http://qallme.itc.it/ontology/qallme-tourism.owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?movieName
WHERE {
  ?MovieShow     qme:isInSite         ?Cinema .
  ?Cinema        qme:hasPostalAddress ?PostalAddress .
  ?PostalAddress qme:isInDestination  ?Destination .
  ?Destination   qme:name             "Birmingham"^^xsd:string .
  ?MovieShow     qme:hasEventContent  ?Movie .
  ?Movie         qme:name             ?movieName .
  ?Movie         qme:hasStar          ?Star .
  ?Star          qme:name             "Halle Berry"^^xsd:string .
}
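
The WHERE clause above is a basic graph pattern: an answer is any assignment of its variables under which every triple pattern occurs in the data. The project ran such queries through Jena; the toy evaluator below only illustrates that matching semantics with a naive nested-loop join. The triple list is hypothetical, prefixed names stand in for full IRIs, and the ^^xsd:string datatype is ignored.

```python
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy data; prefixed names stand in for full IRIs, plain strings for literals.
TRIPLES: List[Triple] = [
    ("show1", "qme:isInSite", "cinema1"),
    ("cinema1", "qme:hasPostalAddress", "addr1"),
    ("addr1", "qme:isInDestination", "dest1"),
    ("dest1", "qme:name", "Birmingham"),
    ("show1", "qme:hasEventContent", "movie1"),
    ("movie1", "qme:name", "Example Film"),
    ("movie1", "qme:hasStar", "star1"),
    ("star1", "qme:name", "Halle Berry"),
    ("movie2", "qme:name", "Other Film"),  # not shown in Birmingham, so it must not match
]

# WHERE clause of the query above, one tuple per triple pattern.
PATTERN: List[Triple] = [
    ("?MovieShow", "qme:isInSite", "?Cinema"),
    ("?Cinema", "qme:hasPostalAddress", "?PostalAddress"),
    ("?PostalAddress", "qme:isInDestination", "?Destination"),
    ("?Destination", "qme:name", "Birmingham"),
    ("?MovieShow", "qme:hasEventContent", "?Movie"),
    ("?Movie", "qme:name", "?movieName"),
    ("?Movie", "qme:hasStar", "?Star"),
    ("?Star", "qme:name", "Halle Berry"),
]


def match(patterns: List[Triple], data: List[Triple],
          binding: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield every variable binding under which all patterns occur in data."""
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in data:
        new = dict(binding)
        consistent = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                if new.setdefault(term, value) != value:  # variable already bound to another value
                    consistent = False
                    break
            elif term != value:                           # constant term must match exactly
                consistent = False
                break
        if consistent:
            yield from match(rest, data, new)


answers = sorted({b["?movieName"] for b in match(PATTERN, TRIPLES, {})})
print(answers)  # ['Example Film']
```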

Conclusions
§ The QALL-ME ontology was specifically designed for the domain of tourism.
§ The ontology plays an important role in several parts of the project.
§ The ontology went through several revisions before reaching its current stage (and it may change again!).
§ Both the ontology and its alignment to WordNet and SUMO will be made freely available on the project’s website.

Thank you!
Project’s website: http://qallme.fbk.eu/
