Integrating Language Understanding agents into the Semantic Web
Akshay Java, Tim Finin, Sergei Nirenburg 11/04/2005
Outline
• Motivation: Language Understanding Agents
• Ontological Semantics
• Bridging the Knowledge Gap
• Preliminary Evaluation
• SemNews: An Application Testbed
• Conclusion
• Q&A
Motivation
• Intelligent agents need knowledge and information.
• The majority of content on the web remains in natural language (NL) text.
• The Semantic Web (SW) can benefit NLP tools in their language understanding task.
[Figure: the web of documents (WWW: text, images, audio, video) is connected through NLP tools, which extract facts from natural language text, to the web of data (Semantic Web: RDF/OWL ontologies, instances, triples, structured information).]
Motivation: Language Understanding Agents
Provides RDF version of the news.
Ontological Semantics
OntoSem is a natural language processing system that processes text and converts it into facts. It is supported by a constructed world model encoded in a rich ontology.
Ontological Semantics
[Figure: OntoSem processing pipeline. Input text passes through the preprocessor, the syntactic analyzer, and the semantic analyzer to produce a Text Meaning Representation (TMR). Static knowledge resources: grammar (ecology, morphology, syntax), lexicon and onomasticon, ontology and fact repository.]
Mapping OntoSem to web based KR
• The OntoSem ontology is a frame based representation:
  ONTOLOGY ::= CONCEPT+
  CONCEPT ::= ROOT | OBJECT-OR-EVENT | PROPERTY
  SLOT ::= PROPERTY | FACET | FILLER
• Translating the OntoSem ontology means mapping its semantics into the corresponding OWL representation.
• OntoSem's supporting fact repositories are also mapped to OWL.
• The text meaning representations (TMRs) of sentences are then converted to OWL.
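The frame structure described above can be pictured as a small data model. The following is only a minimal sketch in Python, not OntoSem's actual storage format: the class names `Slot` and `Frame` and the helper `fillers` are hypothetical, and the example values are taken from the `patent` frame shown later in the slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Slot:
    """One slot of a frame: a property, a facet, and its fillers."""
    prop: str            # e.g. "is-a"
    facet: str           # e.g. "value" or "sem"
    fillers: List[str]   # e.g. ["intangible-asset", "legal-right"]


@dataclass
class Frame:
    """A concept frame: a name plus a list of slots (hypothetical model)."""
    name: str
    slots: List[Slot] = field(default_factory=list)

    def fillers(self, prop: str, facet: str) -> List[str]:
        """Collect all fillers stored under a given property and facet."""
        found: List[str] = []
        for slot in self.slots:
            if slot.prop == prop and slot.facet == facet:
                found.extend(slot.fillers)
        return found


patent = Frame("patent", [
    Slot("definition", "value",
         ["the exclusive right to make, use or sell an invention, "
          "which is granted to the inventor"]),
    Slot("is-a", "value", ["intangible-asset", "legal-right"]),
])

print(patent.fillers("is-a", "value"))
```

Keeping the facet as an explicit field matters for the translation, because the facet decides which OWL construct a slot is mapped to.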
Mapping OntoSem to web based KR
[Figure: OntoSem uses the lexicon, ontology, and fact repository to turn NL text into TMRs; OntoSem2OWL translates the ontology and the TMRs into OWL.]
Mapping Rules for Classes
OntoSem LISP version:
(make-frame patent
  (definition (value (common "the exclusive right to make, use or sell an invention, which is granted to the inventor")))
  (is-a (value (common intangible-asset legal-right))))
OWL version:
<owl:Class rdf:about="&ontosem;patent">
  <rdfs:subClassOf>
    <owl:Class rdf:about="&ontosem;intangible-asset"/>
  </rdfs:subClassOf>
  <rdfs:subClassOf>
    <owl:Class rdf:about="&ontosem;legal-right"/>
  </rdfs:subClassOf>
  <rdfs:comment>the exclusive right to make, use or sell an invention, which is granted to the inventor</rdfs:comment>
</owl:Class>
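The class rule above (is-a becomes rdfs:subClassOf, definition becomes rdfs:comment) can be sketched as a small generator. This is an illustration only, not the authors' Jena-based tool: the function name `class_to_owl` is hypothetical, and it writes each parent with the shorter `rdf:resource` attribute form rather than the nested `owl:Class` element used on the slide.

```python
from xml.sax.saxutils import escape


def class_to_owl(name, parents, definition, ns="&ontosem;"):
    """Render one frame as an owl:Class element (hypothetical helper).

    ns defaults to the &ontosem; entity used on the slide; the enclosing
    document is assumed to declare that entity and the rdf/rdfs/owl prefixes.
    """
    lines = [f'<owl:Class rdf:about="{ns}{name}">']
    for parent in parents:
        # One rdfs:subClassOf per is-a filler.
        lines.append(f'  <rdfs:subClassOf rdf:resource="{ns}{parent}"/>')
    # The definition text is escaped so that &, < and > stay valid XML.
    lines.append(f"  <rdfs:comment>{escape(definition)}</rdfs:comment>")
    lines.append("</owl:Class>")
    return "\n".join(lines)


print(class_to_owl(
    "patent",
    ["intangible-asset", "legal-right"],
    "the exclusive right to make, use or sell an invention, "
    "which is granted to the inventor",
))
```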
Mapping Rules for Properties
• Properties can be
  • Object properties: owl:ObjectProperty
  • Datatype properties: owl:DatatypeProperty
• The property hierarchy is defined by rdfs:subPropertyOf
• Domain maps to rdfs:domain
• Range maps to rdfs:range
• Restrictions are handled using owl:Restriction
• Numeric datatypes are handled using XSD
Mapping Rules for Properties…
(make-frame controls
  (domain (sem (common physical-event physical-object social-event social-role)))
  (range (sem (common actualize artifact natural-object social-role)))
  (is-a (value (common relation)))
  (inverse (value (common controlled-by)))
  (definition (value (common "A relation which relates concepts to what they can control"))))
Mapping Rules for Properties…
<owl:ObjectProperty rdf:ID="controls">
  <rdfs:domain>
    <owl:Class>
      <owl:unionOf rdf:parseType="Collection">
        <owl:Class rdf:about="#physical-event"/>
        <owl:Class rdf:about="#physical-object"/>
        <owl:Class rdf:about="#social-event"/>
        <owl:Class rdf:about="#social-role"/>
      </owl:unionOf>
    </owl:Class>
  </rdfs:domain>
  <rdfs:range>
    <owl:Class>
      <owl:unionOf rdf:parseType="Collection">
        <owl:Class rdf:about="#actualize"/>
        <owl:Class rdf:about="#artifact"/>
        <owl:Class rdf:about="#natural-object"/>
        <owl:Class rdf:about="#social-role"/>
      </owl:unionOf>
    </owl:Class>
  </rdfs:range>
  <rdfs:subPropertyOf>
    <owl:ObjectProperty rdf:about="#relation"/>
  </rdfs:subPropertyOf>
  <owl:inverseOf rdf:resource="#controlled-by"/>
  <rdfs:label>"A relation which relates concepts to what they can control"</rdfs:label>
</owl:ObjectProperty>
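The property rule can be sketched the same way: a domain or range with several fillers becomes an anonymous class holding an owl:unionOf collection. This is a minimal illustration under assumptions, not the original translator; the helper names `union_block` and `property_to_owl` are hypothetical, and the parent property is written with `rdf:resource` instead of a nested element.

```python
from xml.sax.saxutils import escape


def union_block(tag, classes):
    """Wrap several classes in an anonymous owl:Class with owl:unionOf."""
    members = "\n".join(
        f'        <owl:Class rdf:about="#{c}"/>' for c in classes
    )
    return (
        f"  <rdfs:{tag}>\n"
        "    <owl:Class>\n"
        '      <owl:unionOf rdf:parseType="Collection">\n'
        f"{members}\n"
        "      </owl:unionOf>\n"
        "    </owl:Class>\n"
        f"  </rdfs:{tag}>"
    )


def property_to_owl(name, domain, range_, parent, inverse, label):
    """Render one relation frame as an owl:ObjectProperty (hypothetical helper)."""
    parts = [
        f'<owl:ObjectProperty rdf:ID="{name}">',
        union_block("domain", domain),
        union_block("range", range_),
        f'  <rdfs:subPropertyOf rdf:resource="#{parent}"/>',
        f'  <owl:inverseOf rdf:resource="#{inverse}"/>',
        f"  <rdfs:label>{escape(label)}</rdfs:label>",
        "</owl:ObjectProperty>",
    ]
    return "\n".join(parts)


print(property_to_owl(
    "controls",
    ["physical-event", "physical-object", "social-event", "social-role"],
    ["actualize", "artifact", "natural-object", "social-role"],
    "relation",
    "controlled-by",
    "A relation which relates concepts to what they can control",
))
```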
Mapping Rules for Facets
Facets are a way of restricting the fillers that can be used for a particular slot.
• SEM and VALUE
  • Mapped using owl:Restriction on a particular property.
• RELAXABLE-TO
  • Add this to the classes present in owl:Restriction and record the information in an annotation.
• DEFAULT
  • No clear way to represent non-monotonic reasoning and closed world assumptions in the Semantic Web.
• DEFAULT-MEASURE
  • Similar to the DEFAULT facet; not handled.
• DEFAULT and DEFAULT-MEASURE are used relatively infrequently.
• NOT
  • The NOT facet can be handled using owl:disjointWith.
• INV
  • Need not be handled, since the inverse slot is already mapped to owl:inverseOf.
Mapping Rules: Property Related Constructs
#   Case                    Frequency   Mapped using
1   domain                  617         rdfs:domain
2   domain with not facet   16          owl:disjointWith
3   range                   406         rdfs:range
4   range with not facet    5           owl:disjointWith
5   inverse                 260         owl:inverseOf
Mapping Rules: Facet Related Constructs
#   Case              Frequency   Mapped using
1   value             18217       owl:Restriction
2   sem               5686        owl:Restriction
3   relaxable-to      95          annotation
4   default           350         Not handled
5   default-measure   612         Not handled
6   not               134         owl:disjointWith
7   inv               1941        Not required
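The facet table above can be read as a lookup from facet to OWL construct, and its frequencies show how small the unhandled share is. The sketch below is only a worked calculation over the numbers in the table; the dictionary names are hypothetical.

```python
# Facet -> OWL construct, as listed in the table (None = no OWL mapping emitted).
FACET_MAPPING = {
    "value": "owl:Restriction",
    "sem": "owl:Restriction",
    "relaxable-to": "annotation",
    "default": None,          # not handled
    "default-measure": None,  # not handled
    "not": "owl:disjointWith",
    "inv": None,              # not required
}

# Occurrence counts from the same table.
FACET_FREQUENCY = {
    "value": 18217,
    "sem": 5686,
    "relaxable-to": 95,
    "default": 350,
    "default-measure": 612,
    "not": 134,
    "inv": 1941,
}

total = sum(FACET_FREQUENCY.values())
unhandled = FACET_FREQUENCY["default"] + FACET_FREQUENCY["default-measure"]
share = 100.0 * unhandled / total

print(f"{unhandled} of {total} facet occurrences are not handled ({share:.1f}%)")
```

On these counts, DEFAULT and DEFAULT-MEASURE together account for 962 of 27035 facet occurrences, roughly 3.6%.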
Translating TMR to OWL
Translating TMRs involves instantiating the concepts that were mapped to OWL.
Example:
(COME-1740
  (TIME (VALUE (COMMON (FIND-ANCHOR-TIME))))
  (DESTINATION (VALUE (COMMON CITY-1740)))
  (AGENT (VALUE (COMMON POLITICIAN-1740)))
  (ROOT-WORDS (VALUE (COMMON (ARRIVE))))
  (WORD-NUM (VALUE (COMMON 2)))
  (INSTANCE-OF (VALUE (COMMON COME))))
<ontosem:come rdf:about="COME-1740">
  <ontosem:destination rdf:resource="#CITY-1740"/>
  <ontosem:agent rdf:resource="#POLITICIAN-1740"/>
</ontosem:come>
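The TMR example above can be reproduced with a short sketch: parse the parenthesized frame, read the concept from INSTANCE-OF, and emit one property element per slot that points at another instance. This is an assumption-laden illustration, not the original translator: the function names are hypothetical, and the rule "keep only fillers that look like CONCEPT-number" is a guess that happens to match the slide's output.

```python
import re


def parse_sexp(text):
    """Parse a parenthesized expression into nested Python lists."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # consume the closing parenthesis
            return items
        return tok

    return read()


def tmr_to_rdf(frame):
    """Render one TMR frame as an RDF/XML instance (hypothetical helper)."""
    instance, slots = frame[0], frame[1:]
    # Each slot has the shape (NAME (VALUE (COMMON filler))).
    props = {slot[0]: slot[1][1][1] for slot in slots}
    concept = props.pop("INSTANCE-OF").lower()
    lines = [f'<ontosem:{concept} rdf:about="{instance}">']
    for name, filler in props.items():
        # Assumption: only fillers shaped like CITY-1740 are instance links.
        if isinstance(filler, str) and re.fullmatch(r"[A-Z][A-Z-]*-\d+", filler):
            lines.append(f'  <ontosem:{name.lower()} rdf:resource="#{filler}"/>')
    lines.append(f"</ontosem:{concept}>")
    return "\n".join(lines)


TMR = """
(COME-1740
  (TIME (VALUE (COMMON (FIND-ANCHOR-TIME))))
  (DESTINATION (VALUE (COMMON CITY-1740)))
  (AGENT (VALUE (COMMON POLITICIAN-1740)))
  (ROOT-WORDS (VALUE (COMMON (ARRIVE))))
  (WORD-NUM (VALUE (COMMON 2)))
  (INSTANCE-OF (VALUE (COMMON COME))))
"""

rdf = tmr_to_rdf(parse_sexp(TMR))
print(rdf)
```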
Evaluation
Built an ontology translation tool using the Jena API.
• Total triples generated: ~102189 (including bnodes)
• Time to build the model: ~10-40 sec
• Time to do RDFS inference: ~10 sec
• Time to do OWL Micro inference: ~40 sec
• Time to do OWL Full inference: ~??
Tools: Swoop, Pellet, WonderWeb, http://w3c.org/RDF/Validator/
DL expressivity: ELUIH
• EL: conjunction and full existential quantification
• U: union
• H: role hierarchy
• I: role inverse
OWL Full
After translation:
• Total number of classes: 7747 (defined: 7747, imported: 0)
• Total number of datatype properties: 0 (defined: 0, imported: 0)
• Total number of object properties: 604 (defined: 604, imported: 0)
• Total number of annotation properties: 1 (defined: 1, imported: 0)
• Total number of individuals: 0 (defined: 0, imported: 0)
NOTE: This is using no restrictions.
Evaluation
• Syntactic correctness: checked using OWL/RDF validators.
• Semantic validation: full semantic validation is difficult even for subsets of OWL.
• Meaning preservation: some features of the native representation, such as DEFAULTS, modality, and case roles, may be underrepresented or not handled.
• Feature minimization: complex features can be difficult for reasoners to handle, so the translation can be performed at each of the levels: OWL Lite, OWL DL, OWL Full.
• Translation complexity: OntoSem is a large and extensive ontology (~8000 concepts). The translation itself is done syntactically, but in general translation might require reasoning, which could be an issue.
Reasoning Capabilities
Finding transitive closures (RDFS reasoning)
Buildfile: build.xml
init:
compile:
dist:
    [jar] Building jar: /home/aks1/software/eclipse/workspace/ontojena/dist/lib/ontojena.jar
run:
    [java] MODEL OK
    [java] Resource: http://ontosem.org/#fire-engine
    [java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#fire-engine)
    [java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#all)
    [java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#physical-object)
    [java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#inanimate)
    [java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#wheeled-vehicle)
    [java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#engine-propelled-vehicle)
    [java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#wheeled-engine-vehicle)
    [java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#artifact)
    [java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#object)
    [java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#land-vehicle)
    [java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#truck)
    [java] - (http://ontosem.org/#fire-engine rdfs:label ' "a truck with equipment for fighting fires"')
    [java] - (http://ontosem.org/#fire-engine rdf:type owl:Class)
    [java] fire-engine recognized as subclas of vehicle
BUILD SUCCESSFUL
Total time: 10 seconds
real 0m11.144s
user 0m9.530s
sys 0m0.190s
[aks1@trishuli ontojena]$
[Figure: inferred triples along the class hierarchy vehicle > land-vehicle > engine-propelled-vehicle > wheeled-engine-vehicle > truck > fire-engine]
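The transitive closure that the RDFS reasoner computes over rdfs:subClassOf can be sketched without any RDF library. This is a toy illustration, not the Jena run shown above: the `DIRECT` table is a hypothetical hand-written copy of the hierarchy chain on the slide, and unlike the reasoner output it does not add the reflexive triple (fire-engine subClassOf fire-engine).

```python
# Direct subclass links, copied by hand from the hierarchy on the slide.
DIRECT = {
    "fire-engine": ["truck"],
    "truck": ["wheeled-engine-vehicle"],
    "wheeled-engine-vehicle": ["engine-propelled-vehicle"],
    "engine-propelled-vehicle": ["land-vehicle"],
    "land-vehicle": ["vehicle"],
}


def superclasses(cls, direct):
    """Return every class reachable from cls by following subclass links."""
    seen = set()
    stack = [cls]
    while stack:
        for parent in direct.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


ancestors = superclasses("fire-engine", DIRECT)
print(sorted(ancestors))
print("vehicle" in ancestors)
```

The `seen` set keeps the walk finite even if the input contains a cycle, which a real ontology can have through equivalent classes.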
An Application Testbed: SemNews
• SemNews: semantically search and browse news.
• Aggregators collect the RSS news descriptions from various sources.
• The sentences are processed by OntoSem and converted into Text Meaning Representations (TMRs).
• Provides intelligent agents with the latest news in a machine readable format.
http://semnews.umbc.edu
[Figure: SemNews architecture with numbered data flows (1-15). Component groups: Data Aggregators, Language Processing, Knowledge Editor Environment, Semantic Web Tools, Fact Repository Interface. Components: news feeds, RSS aggregator, OntoSem, TMRs, fact repository (FR), Dekade editor, OntoSem2OWL, OntoSem ontology (OWL), TMRs in OWL, inferred triples, Swoogle index, ontology and instance browser, text search, RDQL query, semantic RSS.]
Agent understandable news
Provides RDF version of the news.
Semanticizing RSS
View a structured representation of the RSS news story. Future versions would enable editing the facts and provide provenance information.
News stories are ontologically linked
Find news stories by browsing through the OntoSem ontology.
Tracking Named Entities
Find stories about a specific named entity.
Browsing Facts
The fact repository explorer for the named entity 'Mexico' shows that it has a relation 'nationality-of' with CITIZEN-235. The fact repository explorer for the instance CITIZEN-235 shows that the citizen is an agent of ESCAPE-EVENT.
Querying the semanticized RSS
RDQL queries provide structured querying over text converted into an RDF representation.
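The idea of structured querying over the extracted facts can be illustrated with a tiny triple-pattern matcher. This is not RDQL and not the SemNews implementation, only a sketch of the underlying idea: the function names `match` and `query` are hypothetical, and the two facts are taken from the Mexico / CITIZEN-235 example on the Browsing Facts slide, with property names simplified.

```python
# Facts as (subject, property, object) triples, from the Browsing Facts slide.
FACTS = [
    ("Mexico", "nationality-of", "CITIZEN-235"),
    ("CITIZEN-235", "agent-of", "ESCAPE-EVENT"),
]


def match(pattern, triple, binding):
    """Extend a variable binding so that pattern equals triple, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # A variable must keep the same value everywhere it appears.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def query(patterns, triples):
    """Return all bindings that satisfy every pattern (a simple nested join)."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                result = match(pattern, triple, binding)
                if result is not None:
                    extended.append(result)
        bindings = extended
    return bindings


# "Which events involve an agent whose nationality is Mexico?"
results = query(
    [("Mexico", "nationality-of", "?who"), ("?who", "agent-of", "?event")],
    FACTS,
)
print(results)
```

Sharing the variable `?who` between the two patterns is what turns two independent lookups into one joined query.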
Semantic Alerts
Alerts can be specified as ontological concepts, keywords, or RDQL queries. Subscribe to the results of structured queries.
Conclusions
• Integrating language processing agents into the SW would publish SW annotations and documents that capture the text's meaning.
• Migrating from a native, non-web based representation to a SW representation may be lossy but is still useful for many applications.
• The SemNews application testbed demonstrates some scenarios that can benefit from language understanding agents.
Q&A
Thank you.
http://ebiquity.umbc.edu
http://semnews.umbc.edu
References
Software Used
[1] OntoSem http://ilit.umbc.edu/
[2] RDF Validation Service http://w3c.org/RDF/Validator
[3] Jena Toolkit http://jena.sourceforge.net/
[4] Swoop Ontology Viewer http://www.mindswap.org/2004/SWOOP/
[5] Pellet OWL DL Reasoner http://www.mindswap.org/2003/pellet/
[6] WonderWeb OWL Validator http://phoebus.cs.man.ac.uk:9999/OWL/Validator
Papers
[1] Sergei Nirenburg and Victor Raskin, Ontological Semantics, Formal Ontology and Ambiguity
[2] Sergei Nirenburg and Victor Raskin, Ontological Semantics, MIT Press, Forthcoming
[3] Sergei Nirenburg, Ontological Semantics: Overview, Presentation CLSP JHU, Spring 2003
[4] Marjorie McShane, Sergei Nirenburg, Stephen Beale, Margalit Zabludowski, The Cross Lingual Reuse and Extension of Knowledge Resources in Ontological Semantics
[5] P. J. Beltran-Ferruz, P. A. Gonzalez-Calero, P. Gervas, Converting Mikrokosmos Frames into Description Logics
[6] Sergei Nirenburg, Ontology Tutorial, ILIT UMBC
Mailing Lists
[1] Jena developers: jena-dev@yahoogroups.com
[2] Pellet users: pellet-users@lists.mindswap.org
[3] Semantic web: semanticweb@yahoogroups.com
[4] W3C RDF Interest: www-rdf-interest@w3.org
[5] W3C Semantic Web: semantic-web@w3.org
Backup slides
Static Knowledge Sources
• Ontology: 8000 concepts, avg. 16 properties each
• English lexicon: 45000 entries
• Spanish lexicon: 40000 entries
• Chinese lexicon: 3000 entries
• Fact repository: 20000 facts
[Sergei Nirenburg, Ontological Semantics: Overview, Presentation CLSP JHU, Spring 2003]
Text Meaning Representation (TMR)
He asked the UN to authorize the war.
REQUEST-ACTION-69
  AGENT HUMAN-72
  THEME ACCEPT-70
  BENEFICIARY ORGANIZATION-71
  SOURCE-ROOT-WORD ask
  TIME (< (FIND-ANCHOR-TIME))
ACCEPT-70
  THEME WAR-73
  THEME-OF REQUEST-ACTION-69
  SOURCE-ROOT-WORD authorize
ORGANIZATION-71
  HAS-NAME United-Nations
  BENEFICIARY-OF REQUEST-ACTION-69
  SOURCE-ROOT-WORD UN
HUMAN-72
  HAS-NAME Colin Powell
  AGENT-OF REQUEST-ACTION-69
  SOURCE-ROOT-WORD he ; reference resolution has been carried out
WAR-73
  THEME-OF ACCEPT-70
  SOURCE-ROOT-WORD war
Example from [Marjorie McShane, Sergei Nirenburg, Stephen Beale, Margalit Zabludowski, The Cross Lingual Reuse and Extension of Knowledge Resources in Ontological Semantics]
The OntoSem Ontology
[Figure: view of the OntoSem ontology with labels PROPERTY, FACET, FILLER]