Human Language Technology in Ontology Engineering
Ontology Learning from Text

Paul Buitelaar
DFKI GmbH, Language Technology Lab
DFKI Competence Center Semantic Web
Saarbrücken, Germany

© Paul Buitelaar: KnowledgeWeb Summer School, Spain - July 2004

Overview

=> HLT and Ontology Engineering
   Automated Linguistic Analysis
   Ontology Learning from Text
   Further Issues: Evaluation
   Conclusions

Ontology Lifecycle

[Figure: lifecycle diagram with the stages Populating, Validating, Creating, Deploying, Evolving, Maintaining]

HLT in the Ontology Lifecycle

HLT for Ontology Learning and Population from Text
Human Language Technology = Automated Linguistic Analysis

Input: Documents (Text)

Ontology Learning: Development & Evolution
   Linguistic Analysis to Extract Classes / Relations
   Output: Classes, Relations/Properties

Ontology Population: Knowledge Base Generation
   Linguistic Analysis to Extract Instances
   Output: Instances

Target: Ontology (Knowledge)

Automated Linguistic Analysis

Linguistic Analysis: Example

The Dell computer with a flat screen had to be rejected because of a failure in the motherboard.

[Figure: semantic graph over the nodes flat screen, Dell computer, reject, animate-entity, failure and motherboard, with edges labelled has-a (twice) and location-of]

Levels of Linguistic Analysis

Lexical Analysis
   Word Class: Part-of-Speech (also Semantic Class)
   Word Structure: Morphology
Phrase Analysis
   Sentence Structure: Phrases (if ‘shallow’: Chunks)
   Semantic Units
Dependency Structure Analysis
   Sentence Meaning: Predicate Argument Structure
   (Clause) Semantic Structure

Part-of-Speech, Morphology

Part-of-Speech
=> e.g.: noun, verb, adjective, preposition, …
   PoS tag sets may have between 10 and 50 (or more) tags

Morphology
   Most languages have inflection and declension, e.g.:
      Singular/Plural: computer, computers
      Present/Past: reject, rejected
   Many languages also have complex (de)composition, e.g.:
      Flachbildschirm (flat screen) > flach + Bildschirm > flach + Bild + Schirm
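The two morphology cases on this slide can be sketched in a few lines of Python. This is only an illustration: the tiny lexicon, the function names split_compound and strip_inflection, and the suffix rules are assumptions made for the example, not part of any tool mentioned in the slides. A real analyzer would need a full lexicon and would handle linking elements and irregular forms.

```python
# Toy lexicon (an assumption for illustration, not a real morphological resource).
LEXICON = {"flach", "bild", "schirm", "bildschirm"}


def split_compound(word, lexicon=LEXICON):
    """Return every segmentation of `word` into entries of the lexicon."""
    word = word.lower()
    if not word:
        return [[]]  # exactly one way to segment the empty string
    segmentations = []
    for end in range(1, len(word) + 1):
        prefix = word[:end]
        if prefix in lexicon:
            # Recurse on the remainder and prepend the matched prefix.
            for rest in split_compound(word[end:], lexicon):
                segmentations.append([prefix] + rest)
    return segmentations


def strip_inflection(word):
    """Very rough English rule: strips only a regular past -ed or plural -s."""
    if word.endswith("ed") and len(word) > 4:
        return word[:-2]
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


if __name__ == "__main__":
    print(split_compound("Flachbildschirm"))
    print(strip_inflection("computers"), strip_inflection("rejected"))
```

With this lexicon, split_compound returns both decompositions shown on the slide: the two-part split and the three-part split.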

Phrases, Terms, Named Entities

Semantic Units
   Phrases (e.g. nominal - NP, prepositional - PP)
      NP: a flat screen
      PP: with a flat screen
      NP (recursive): the Dell computer with a flat screen
      NP (recursive): a failure in the motherboard
   Terms (domain-specific phrases)
      Dell computer with a flat screen
   Named Entities (phrases corresponding to dates, names, …)
      COMPANY: Dell Computer Corporation
      PERSON: Michael Dell
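A shallow chunker of the kind referred to above can be sketched as a single pass over part-of-speech tagged tokens. The coarse tag names (DET, ADJ, NOUN, PREP, VERB) and the function name chunk are assumptions for this example. The sketch produces flat chunks only, so the recursive NP "the Dell computer with a flat screen" comes out as an NP followed by a PP.

```python
def chunk(tagged):
    """Group (word, tag) pairs into flat NP and PP chunks.

    NP = optional DET, any number of ADJ, one or more NOUN.
    PP = PREP followed by an NP.
    Tokens that fit neither pattern are emitted as single-word 'O' chunks.
    """
    chunks = []
    i = 0
    while i < len(tagged):
        start = i
        has_prep = tagged[i][1] == "PREP"
        j = i + 1 if has_prep else i
        if j < len(tagged) and tagged[j][1] == "DET":
            j += 1
        while j < len(tagged) and tagged[j][1] == "ADJ":
            j += 1
        noun_start = j
        while j < len(tagged) and tagged[j][1] == "NOUN":
            j += 1
        if j > noun_start:
            # At least one noun was consumed, so a complete NP was found.
            label = "PP" if has_prep else "NP"
            chunks.append((label, " ".join(word for word, _ in tagged[start:j])))
            i = j
        else:
            chunks.append(("O", tagged[i][0]))
            i += 1
    return chunks


if __name__ == "__main__":
    sentence = [("the", "DET"), ("Dell", "NOUN"), ("computer", "NOUN"),
                ("with", "PREP"), ("a", "DET"), ("flat", "ADJ"),
                ("screen", "NOUN"), ("had", "VERB")]
    print(chunk(sentence))
```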

Dependency Structure (I)

Semantic Structure: Dependencies between Predicates and Arguments

   the Dell computer with a flat screen had to be rejected

   PRED: reject
   ARG1: ENTITY
   ARG2: ‘the Dell computer with a flat screen’

   ‘Logical Form’: reject(x, y) & animate-entity(x) & computer(y) & …

=> Dependency Structure Analysis is based on:
   Sub-categorization Frames
      reject :: Subj: NP, Obj: NP
   Selection Restrictions
      reject :: Subj: NP:ANIMATE-ENTITY, Obj: NP:ENTITY
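How a frame with selection restrictions leads to the logical form above can be sketched as follows. The small type hierarchy IS_A, the dictionary layout of FRAMES and the function names are assumptions for illustration. Only the frame for "reject" and its two restrictions come from the slide. An unexpressed argument, such as the missing subject of the passive sentence, is typed by its restriction.

```python
# Hypothetical mini type hierarchy: class -> parent class.
IS_A = {"ANIMATE-ENTITY": "ENTITY", "COMPUTER": "ENTITY", "ENTITY": None}

# Frame for 'reject' as given on the slide:
# Subj: NP:ANIMATE-ENTITY, Obj: NP:ENTITY
FRAMES = {"reject": {"SUBJ": "ANIMATE-ENTITY", "OBJ": "ENTITY"}}


def is_a(sem_class, required):
    """True if `sem_class` equals `required` or inherits from it."""
    while sem_class is not None:
        if sem_class == required:
            return True
        sem_class = IS_A.get(sem_class)
    return False


def logical_form(pred, args):
    """Build a flat logical form from a predicate and role -> semantic class.

    Roles missing from `args` are treated as unexpressed and typed by the
    frame's selection restriction. A filler that violates its restriction
    raises ValueError.
    """
    variables = ["x", "y", "z"]
    used, conjuncts = [], []
    for var, (role, required) in zip(variables, FRAMES[pred].items()):
        filler = args.get(role)
        if filler is None:
            filler = required
        elif not is_a(filler, required):
            raise ValueError(f"{role} of '{pred}' must be {required}, got {filler}")
        used.append(var)
        conjuncts.append(f"{filler.lower()}({var})")
    return f"{pred}({', '.join(used)}) & " + " & ".join(conjuncts)


if __name__ == "__main__":
    # Passive sentence: only the object is expressed.
    print(logical_form("reject", {"OBJ": "COMPUTER"}))
```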

Dependency Structure (II)

The Dell computer that has been rejected was claimed to have suffered from handling.

   reject(e1, x1, y1) & animate-entity(x1) & Dell_computer(y1) &
   claim(e2, x2, e3) & animate-entity(x2) &
   suffer_from(e3, y1, y2) & handling(y2)

Lexical Functional Grammar (LFG)

   PRED   claim < NULL, XCOMP >
   SUBJ   y1: PRED     computer
              MOD      Dell
              ADJUNCT  PRED  reject < NULL, SUBJ >
                       SUBJ  y1
   XCOMP  PRED      suffer < SUBJ, OBL-from >
          SUBJ      y1
          OBL-from  handling

[Figure: the same analysis drawn as a dependency graph: claim with SUBJ y1 and XCOMP suffer; suffer with SUBJ y1 and OBL-from handling; y1: computer with MOD Dell and ADJUNCT reject]

Ontology Learning from Text

Some History

Lexical Knowledge
   Extraction of lexical semantic representations (word meaning) from Machine Readable Dictionaries - 70's/80's
   Extraction of semantic lexicons from corpora for Information Extraction systems - 80's/90's, e.g. CRYSTAL (Soderland)
   Answer extraction in Question Answering, e.g. Webclopedia (Hovy)
Thesaurus Extraction
   Similar work, (complex, multilingual) term extraction, e.g. Sextant (Grefenstette); DR-Link (Liddy)
Ontology Learning from Text
   Similar work, (domain-specific) term / relation extraction, e.g. TextToOnto (Maedche & Staab), OntoLearn (Velardi et al.)
   Discussed here: OntoLT (Buitelaar, Olejnik & Sintek)

TextToOnto: Association Rules

OntoLearn: Domain-Specific WordNet Tuning and Extension

OntoLT: Some Background

Ontology Learning from Text
=> Taxonomy Extraction, Document Clustering [TextToOnto]
   String-based, Document Level
=> “Unnamed” Relation Extraction, Word Clustering [OntoLearn]
   Stemming & Part-of-Speech, Token Level
=> Extraction of Terms, “Named” Relations
   Pred-Arg & Head-Mod Structure, Term Level
Text in Ontology Engineering [OntoLT]
=> Textual Grounding of Concepts
   Retain Linguistic Contexts and Realizations
=> Text-based Ontology Monitoring
   Compare Language Use over Time

OntoLT

What is it?
   OntoLT provides a middleware solution in ontology development that enables the ontology engineer to bootstrap or extend a domain-specific ontology from a relevant text collection.

How does it work?
   1. automatic linguistic annotation
   2. automatic statistical preprocessing
   3. interactive definition of mapping rules
   4. interactive user validation of candidates
   5. automatic integration into an ontology

OntoLT: Architecture

Linguistic Annotation

An 40 Kniegelenkpräparaten wurden mittlere Patellarsehnendrittel mit einer neuen Knochenverblockungstechnik in einem zweistufigen Bohrkanal femoral fixiert.
(mittlere Patellarsehnendrittel: mid patellar ligament third)

<sentence …>
   …
   <text>
      …
      <token id="t5" pos="ADJA" str="mittlere">
         <lemma id="t5.l1">mittler</lemma>
      </token>
      <token id="t6" pos="NN" str="Patellarsehnendrittel">
         <lemma id="t6.l1">patellar</lemma>
         <lemma id="t6.l2">Sehne</lemma>
         <lemma id="t6.l3">Drittel</lemma>
      </token>
      …
   </text>
   <phrases>
      …
      <phrase id="p2" from="t5" to="t6" type="NP">
         <mod from="t5" to="t5" />
         <head from="t6" to="t6" />
      </phrase>
      …
   </phrases>
   <clauses>
      <clause id="cl1" from="p1" to="p5" pred="p5" type="pass">
         <arg id="a1" type="SUBJ" phrase="none" />
         <arg id="a2" type="IOBJ" phrase="p1" />
         <arg id="a3" type="DOBJ" phrase="p2" />
         …
      </clause>
   </clauses>
</sentence>
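Reading head and modifier information out of such an annotation can be sketched with Python's standard xml.etree.ElementTree module. The fragment below is a simplified, self-contained stand-in modelled on the slide: the sentence id "s1" is made up, the elided parts are left out, and the function name heads_and_modifiers is hypothetical. It also assumes that each head and mod element spans a single token (from equals to), as in the slide's example.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the annotation on the slide (id "s1" is made up).
ANNOTATION = """
<sentence id="s1">
  <text>
    <token id="t5" pos="ADJA" str="mittlere">
      <lemma id="t5.l1">mittler</lemma>
    </token>
    <token id="t6" pos="NN" str="Patellarsehnendrittel">
      <lemma id="t6.l1">patellar</lemma>
      <lemma id="t6.l2">Sehne</lemma>
      <lemma id="t6.l3">Drittel</lemma>
    </token>
  </text>
  <phrases>
    <phrase id="p2" from="t5" to="t6" type="NP">
      <mod from="t5" to="t5"/>
      <head from="t6" to="t6"/>
    </phrase>
  </phrases>
</sentence>
"""


def heads_and_modifiers(xml_text):
    """Return (phrase id, head word, [modifier words]) for each phrase."""
    root = ET.fromstring(xml_text.strip())
    # Map token ids to their surface strings.
    surface = {tok.get("id"): tok.get("str") for tok in root.iter("token")}
    result = []
    for phrase in root.iter("phrase"):
        head = phrase.find("head")
        if head is None:
            continue  # phrase without an annotated head
        mods = [surface[m.get("from")] for m in phrase.findall("mod")]
        result.append((phrase.get("id"), surface[head.get("from")], mods))
    return result


if __name__ == "__main__":
    print(heads_and_modifiers(ANNOTATION))
```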

Mapping Rules

Precondition Language
   Var(Y, XPath(Y)): get all occurrences of element Y, e.g. HeadNoun, Modifier, Subject, …
   ConcatList
   combined through AND, OR, NOT, EQUAL

Operators
   CreateCls: create a new class with super-class
   AddSlot: add a slot with range to a new or existing class
   CreateInst: introduce an instance for a new or existing class
   FillSlot: set the value of a slot of an instance
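The idea of a mapping rule, a precondition over linguistic elements that triggers ontology operators, can be sketched in Python. This is not OntoLT's rule syntax or API, which the slides describe as XPath-based. The Ontology class, the dictionary inputs and both rule functions are hypothetical stand-ins, and only two of the four operators are illustrated.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Minimal in-memory ontology (a hypothetical stand-in)."""
    superclass: dict = field(default_factory=dict)  # class -> super-class
    slots: dict = field(default_factory=dict)       # class -> {slot: range}

    def create_cls(self, name, parent="Thing"):
        """CreateCls: add a class unless it already exists."""
        self.superclass.setdefault(name, parent)

    def add_slot(self, cls, slot, range_cls):
        """AddSlot: add a slot with a range to a new or existing class."""
        self.create_cls(cls)
        self.create_cls(range_cls)
        self.slots.setdefault(cls, {})[slot] = range_cls


def head_modifier_rule(phrase, onto):
    """Precondition: HeadNoun AND Modifier are present.

    Effect: a class for the head noun and a sub-class for modifier + head.
    """
    if phrase.get("head") and phrase.get("mod"):
        onto.create_cls(phrase["head"])
        onto.create_cls(f"{phrase['mod']}_{phrase['head']}", parent=phrase["head"])


def subj_pred_obj_rule(clause, onto):
    """Precondition: Subject AND Predicate AND Object are present.

    Effect: a slot named after the predicate, on the subject's class,
    with the object's class as its range.
    """
    if clause.get("subj") and clause.get("pred") and clause.get("obj"):
        onto.add_slot(clause["subj"], clause["pred"], clause["obj"])


if __name__ == "__main__":
    onto = Ontology()
    head_modifier_rule({"head": "screen", "mod": "flat"}, onto)
    subj_pred_obj_rule({"subj": "computer", "pred": "has", "obj": "screen"}, onto)
    print(onto)
```

In an interactive setting the classes and slots produced by such rules would be candidates for the user to validate before integration, not direct changes to the ontology.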

Example Experiment

Ontology Extraction for Neurology
=> Neurology Section of a Medical Corpus
   Medical Scientific Journal Abstracts - MuchMore Project
=> XML-based Linguistic Annotation
   PoS, Lemmatization, Phrases, Pred-Arg Structure
=> Statistical Preprocessing (chi-square)
   Select Domain-Relevant Linguistic Entities
=> Definition of Mapping Rules
   Define Operators for Selected Linguistic Entities
=> Generate & Validate Class/Slot Candidates
   Select Candidates for Integration in Neurology Ontology
=> Generate “Ontology Fragments” for Neurology
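The slide names chi-square as the statistic for selecting domain-relevant entities but gives no details. A common way to apply it, assumed here, is to compare a term's frequency in the domain corpus with its frequency in a reference corpus using a 2x2 contingency table. The function names, the over-representation filter and the choice of threshold are assumptions. The value 3.841 is the standard chi-square critical value for one degree of freedom at p = 0.05.

```python
CRITICAL_95 = 3.841  # chi-square critical value, 1 degree of freedom, p = 0.05


def chi_square(term_domain, total_domain, term_ref, total_ref):
    """Chi-square statistic for a 2x2 table of term vs. other tokens,
    domain corpus vs. reference corpus."""
    a, b = term_domain, term_ref
    c, d = total_domain - term_domain, total_ref - term_ref
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


def domain_relevant(counts_domain, counts_ref, threshold=CRITICAL_95):
    """Return (term, score) pairs that are over-represented in the domain
    corpus and whose score exceeds the threshold, highest score first."""
    total_domain = sum(counts_domain.values())
    total_ref = sum(counts_ref.values())
    selected = []
    for term, freq in counts_domain.items():
        ref_freq = counts_ref.get(term, 0)
        # Chi-square is symmetric, so check the direction separately.
        over_represented = freq * total_ref > ref_freq * total_domain
        score = chi_square(freq, total_domain, ref_freq, total_ref)
        if over_represented and score > threshold:
            selected.append((term, round(score, 2)))
    return sorted(selected, key=lambda pair: -pair[1])


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    domain = {"neuron": 30, "the": 70}
    reference = {"neuron": 10, "the": 90}
    print(domain_relevant(domain, reference))
```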

Further Issues

Future Development
=> Organization of Class/Slot Candidate List
   Inference & Clustering - “Graph Restructuring”
=> Extend Statistical Preprocessing
   Multiple Reference Corpora
   Extended Frequency Information
=> Include Machine Learning Approach
   Semi-Automatic Definition of Mapping Rules

Performance Evaluation
=> Guidelines
   ECAI 04 Workshop on OLP
=> Benchmark
   Challenge within PASCAL NoE

Evaluation: What? -- Subtasks

=> Classes
   (Multilingual) Term Extraction
   Named-Entity Recognition
   Similarity Thesaurus
   Term, Document Clustering
=> Class-Hierarchy (Taxonomy)
   Thesaurus Extraction
   Term, Document Clustering
=> Class-Properties (Relations)
   Relation Extraction
   ? Formal Properties of Relations (Properties)
=> Class-Instances (Individuals)
   (Multilingual) Term Extraction
   Named-Entity Recognition
   Term, Document Classification

Evaluation: How?

By Sub-Task - Evaluation of:
=> Classes - Term, NE Extraction, Clustering
=> Class-Hierarchy - Thesaurus Extraction
=> Class-Properties - Relation Extraction
=> Class-Instances - Term, NE Extraction, Classification

By Application - Evaluation of:
=> Ontology Learning and Population - Gold Standard
=> IR, QA - Precision/Recall Increase with Ontology?
=> Interactive QA - Increased User Satisfaction?
=> Information Access - Increased User Performance?
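Evaluation against a gold standard can be sketched as set-based precision, recall and F1 over extracted items such as class names or relations. The function name precision_recall_f1 and the exact-match comparison are assumptions. The slides do not specify how extracted items are matched against the gold standard.

```python
def precision_recall_f1(extracted, gold):
    """Compare extracted items with a gold standard by exact match.

    Returns (precision, recall, F1). An empty side yields 0.0 for the
    measure that would otherwise divide by zero.
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up class candidates, for illustration only.
    candidates = {"computer", "screen", "failure", "reject"}
    gold_classes = {"computer", "screen", "motherboard"}
    print(precision_recall_f1(candidates, gold_classes))
```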

Conclusions

Stay Tuned
=> OntoLT Release
   To be Announced on Protégé-Discussion List
   http://protege.stanford.edu/mailing-lists
=> Evaluation
   Ontology Learning & Population (OLP) Challenge within PASCAL NoE - First Task Spring 2005
   ECAI 04 Workshop: Evaluation of Text-based OLP
   http://olp.dfki.de/ECAI04/cfp.htm