Human Language Technology in Ontology Engineering: Ontology Learning from Text
Human Language Technology in Ontology Engineering: Ontology Learning from Text
Paul Buitelaar
DFKI GmbH, Language Technology Lab, DFKI Competence Center Semantic Web
Saarbrücken, Germany
© Paul Buitelaar: KnowledgeWeb Summer School, Spain - July 2004
Overview
- HLT and Ontology Engineering
- Automated Linguistic Analysis
- Ontology Learning from Text
- Further Issues: Evaluation
- Conclusions
Ontology Lifecycle
Stages shown in the cycle diagram (in the order extracted from the slide): Populating, Validating, Creating, Deploying, Evolving, Maintaining
HLT in the Ontology Lifecycle
HLT for Ontology Learning and Population from Text (Human Language Technology = Automated Linguistic Analysis)
- Ontology Learning (development & evolution): linguistic analysis of documents (text) to extract classes and relations/properties for the ontology (knowledge)
- Ontology Population (knowledge base generation): linguistic analysis of documents (text) to extract instances
Automated Linguistic Analysis
Linguistic Analysis: Example
The Dell computer with a flat screen had to be rejected because of a failure in the motherboard.
Labels in the slide diagram: nodes flat screen, Dell computer, reject, animate-entity, failure, motherboard; relations has-a (twice), location-of
Levels of Linguistic Analysis
- Lexical Analysis
  - Word Class: Part-of-Speech (also Semantic Class)
  - Word Structure: Morphology
- Phrase Analysis
  - Sentence Structure: Phrases (if 'shallow': Chunks)
  - Semantic Units
- Dependency Structure Analysis
  - Sentence Meaning: Predicate-Argument Structure (Clause)
  - Semantic Structure
Part-of-Speech, Morphology
- Part-of-Speech
  - e.g. noun, verb, adjective, preposition, …
  - PoS tag sets may have between 10 and 50 (or more) tags
- Morphology
  - Most languages have inflection and declension, e.g. singular/plural (computer, computers) and present/past (reject, rejected)
  - Many languages also have complex (de)composition, e.g. Flachbildschirm (flat screen) > flach + Bildschirm > flach + Bild + Schirm
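
The decomposition step on this slide can be sketched as a small recursive lexicon lookup. This is only an illustration under stated assumptions: the toy lexicon and the function name `split_compound` are hypothetical, and a real morphological analyser would also handle linking elements and inflection.

```python
from typing import List, Optional, Set

# Toy lexicon (an assumption for illustration only).
LEXICON: Set[str] = {"flach", "bild", "schirm", "bildschirm"}


def split_compound(word: str, lexicon: Set[str] = LEXICON) -> Optional[List[str]]:
    """Return the finest split of `word` into lexicon entries, or None if none exists."""
    word = word.lower()
    if not word:
        return []  # nothing left to split: successful end of recursion
    best: Optional[List[str]] = None
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head not in lexicon:
            continue
        rest = split_compound(word[i:], lexicon)
        if rest is None:
            continue  # the remainder cannot be covered by lexicon entries
        candidate = [head] + rest
        # Prefer the split with the most parts (flach + Bild + Schirm).
        if best is None or len(candidate) > len(best):
            best = candidate
    return best


if __name__ == "__main__":
    print(split_compound("Flachbildschirm"))
```

Preferring the split with the most parts mirrors the second decomposition step on the slide; a production system would more likely score competing splits by corpus frequency.
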
Phrases, Terms, Named Entities (Semantic Units)
- Phrases (e.g. nominal - NP, prepositional - PP)
  - NP: a flat screen
  - PP: with a flat screen
  - NP (recursive): the Dell computer with a flat screen; a failure in the motherboard
- Terms (domain-specific phrases): Dell computer with a flat screen
- Named Entities (phrases corresponding to dates, names, …)
  - COMPANY: Dell Computer Corporation
  - PERSON: Michael Dell
Dependency Structure (I): Semantic Structure
- Dependencies between predicates and arguments
  - the Dell computer with a flat screen had to be rejected
  - PRED: reject; ARG1: ENTITY; ARG2: 'the Dell computer with a flat screen'
  - 'Logical Form': reject(x, y) & animate-entity(x) & computer(y) & …
- Dependency structure analysis is based on:
  - Sub-categorization frames, e.g. reject :: Subj: NP, Obj: NP
  - Selection restrictions, e.g. reject :: Subj: NP: ANIMATE-ENTITY, Obj: NP: ENTITY
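
A selection restriction like the one for "reject" can be checked mechanically once arguments carry semantic classes. The sketch below is a hedged illustration: the mini type hierarchy `ISA`, the `Frame` class, and the function names are assumptions made for this example, not part of the slides.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical mini type hierarchy: semantic class -> parent class.
ISA: Dict[str, str] = {
    "ANIMATE-ENTITY": "ENTITY",
    "PERSON": "ANIMATE-ENTITY",
    "COMPUTER": "ENTITY",
}


def is_a(sem_class: Optional[str], target: str) -> bool:
    """True if `sem_class` equals `target` or inherits from it."""
    while sem_class is not None:
        if sem_class == target:
            return True
        sem_class = ISA.get(sem_class)
    return False


@dataclass
class Frame:
    predicate: str
    restrictions: Dict[str, str]  # grammatical role -> required semantic class


# Frame from the slide: reject :: Subj: NP: ANIMATE-ENTITY, Obj: NP: ENTITY
FRAMES = {"reject": Frame("reject", {"Subj": "ANIMATE-ENTITY", "Obj": "ENTITY"})}


def satisfies(predicate: str, args: Dict[str, str]) -> bool:
    """Check that every role required by the frame is filled with a compatible class."""
    frame = FRAMES[predicate]
    return all(
        role in args and is_a(args[role], required)
        for role, required in frame.restrictions.items()
    )


if __name__ == "__main__":
    print(satisfies("reject", {"Subj": "PERSON", "Obj": "COMPUTER"}))
    print(satisfies("reject", {"Subj": "COMPUTER", "Obj": "COMPUTER"}))
```
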
Dependency Structure (II)
The Dell computer that has been rejected was claimed to have suffered from handling.
Logical form: reject(e1, x1, y1) & animate-entity(x1) & Dell_computer(y1) & claim(e2, x2, e3) & animate-entity(x2) & suffer_from(e3, y1, y2) & handling(y2)
Lexical Functional Grammar (LFG) analysis (labels recovered from the slide figure):
- PRED claim <NULL, XCOMP>
  - SUBJ y1: PRED computer, MOD Dell, ADJUNCT (PRED reject <NULL, SUBJ>, SUBJ y1)
  - XCOMP: PRED suffer <SUBJ, OBL-from>, SUBJ y1, OBL-from handling
Ontology Learning from Text
Some History
- Lexical Knowledge Extraction
  - Extraction of lexical semantic representations (word meaning) from Machine Readable Dictionaries - 70's/80's
  - Extraction of semantic lexicons from corpora for Information Extraction systems - 80's/90's, e.g. CRYSTAL (Soderland)
  - Answer extraction in Question Answering, e.g. Webclopedia (Hovy)
- Thesaurus Extraction
  - Similar work, (complex, multilingual) term extraction, e.g. Sextant (Grefenstette); DR-Link (Liddy)
- Ontology Learning from Text
  - Similar work, (domain-specific) term / relation extraction, e.g. TextToOnto (Maedche & Staab), OntoLearn (Velardi et al.)
  - Discussed here: OntoLT (Buitelaar, Olejnik & Sintek)
TextToOnto: Association Rules
OntoLearn: Domain-Specific WordNet Tuning and Extension
OntoLT: Some Background
- Ontology Learning from Text
  - TextToOnto: Taxonomy Extraction, Document Clustering (string-based, document level)
  - OntoLearn: "Unnamed" Relation Extraction, Word Clustering (stemming & part-of-speech, token level)
  - Extraction of Terms, "Named" Relations (pred-arg & head-mod structure, term level)
- Text in Ontology Engineering (OntoLT)
  - Textual Grounding of Concepts: retain linguistic contexts and realizations
  - Text-based Ontology Monitoring: compare language use over time
OntoLT
- What is it? OntoLT provides a middleware solution in ontology development that enables the ontology engineer to bootstrap or extend a domain-specific ontology from a relevant text collection.
- How does it work?
  1. automatic linguistic annotation
  2. automatic statistical preprocessing
  3. interactive definition of mapping rules
  4. interactive user validation of candidates
  5. automatic integration into an ontology
OntoLT: Architecture (figure only)
Linguistic Annotation
Example sentence: An 40 Kniegelenkpräparaten wurden mittlere Patellarsehnendrittel mit einer neuen Knochenverblockungstechnik in einem zweistufigen Bohrkanal femoral fixiert.
(English: In 40 knee joint specimens, mid patellar ligament thirds were fixed femorally in a two-stage drill channel using a new bone-block technique.)
Slide gloss for "mittlere Patellarsehnendrittel": (mid patellar ligament third)
XML annotation (excerpt):
<sentence … >
  <text>
    …
    <token id="t5" pos="ADJA" str="mittlere">
      <lemma id="t5.l1">mittler</lemma>
    </token>
    <token id="t6" pos="NN" str="Patellarsehnendrittel">
      <lemma id="t6.l1">patellar</lemma>
      <lemma id="t6.l2">Sehne</lemma>
      <lemma id="t6.l3">Drittel</lemma>
    </token>
    …
  </text>
  <phrases>
    …
    <phrase id="p2" from="t5" to="t6" type="NP">
      <mod from="t5" to="t5" />
      <head from="t6" to="t6" />
    </phrase>
    …
  </phrases>
  <clauses>
    …
    <clause id="cl1" from="p1" to="p5" pred="p5" type="pass">
      <arg id="a1" type="SUBJ" phrase="none" />
      <arg id="a2" type="IOBJ" phrase="p1" />
      <arg id="a3" type="DOBJ" phrase="p2" />
    </clause>
    …
  </clauses>
</sentence>
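
To make the annotation format concrete, here is a minimal sketch of reading head and modifier tokens from such a phrase element with Python's standard `xml.etree.ElementTree`. The embedded XML is a trimmed, well-formed version of the slide excerpt, and the helper name `head_and_modifier` is an assumption for illustration; it is not how OntoLT itself accesses the annotation.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Trimmed, well-formed version of the excerpt on the slide (elided parts omitted).
ANNOTATION = """
<sentence>
  <text>
    <token id="t5" pos="ADJA" str="mittlere">
      <lemma id="t5.l1">mittler</lemma>
    </token>
    <token id="t6" pos="NN" str="Patellarsehnendrittel">
      <lemma id="t6.l1">patellar</lemma>
      <lemma id="t6.l2">Sehne</lemma>
      <lemma id="t6.l3">Drittel</lemma>
    </token>
  </text>
  <phrases>
    <phrase id="p2" from="t5" to="t6" type="NP">
      <mod from="t5" to="t5" />
      <head from="t6" to="t6" />
    </phrase>
  </phrases>
</sentence>
"""


def head_and_modifier(xml_text: str, phrase_id: str) -> Tuple[str, Optional[str]]:
    """Return (head string, modifier string or None) for the given phrase id."""
    root = ET.fromstring(xml_text.strip())
    # Map token ids to their surface strings.
    tokens = {tok.get("id"): tok.get("str") for tok in root.iter("token")}
    phrase = root.find(f".//phrase[@id='{phrase_id}']")
    if phrase is None:
        raise KeyError(phrase_id)
    head = phrase.find("head")
    mod = phrase.find("mod")
    head_str = tokens[head.get("from")]
    mod_str = tokens[mod.get("from")] if mod is not None else None
    return head_str, mod_str


if __name__ == "__main__":
    print(head_and_modifier(ANNOTATION, "p2"))
```

The sketch assumes single-token heads and modifiers (from and to are equal), as in the excerpt; multi-token spans would need the full from/to range.
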
Mapping Rules
- Precondition Language
  - Var (Y, XPath (Y)): get all occurrences of element Y, e.g. HeadNoun, Modifier, Subject, …
  - ConcatList, combined through AND, OR, NOT, EQUAL
- Operators
  - CreateCls: create a new class with super-class
  - AddSlot: add a slot with range to a new or existing class
  - CreateInst: introduce an instance for a new or existing class
  - FillSlot: set the value of a slot of an instance
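
The pairing of preconditions with operators can be sketched as plain functions over a tiny in-memory ontology. Everything here is an assumption made for illustration: the `Ontology` class, the dictionary-shaped linguistic records, and the two example rules only mimic the CreateCls and AddSlot operators named on the slide and are not OntoLT's actual rule syntax or API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class OntClass:
    name: str
    superclass: Optional[str] = None
    slots: Dict[str, str] = field(default_factory=dict)  # slot name -> range class


class Ontology:
    def __init__(self) -> None:
        self.classes: Dict[str, OntClass] = {}

    def create_cls(self, name: str, superclass: Optional[str] = None) -> OntClass:
        """CreateCls: create a class (with optional super-class) unless it already exists."""
        return self.classes.setdefault(name, OntClass(name, superclass))

    def add_slot(self, cls_name: str, slot: str, range_cls: str) -> None:
        """AddSlot: add a slot with a range to a new or existing class."""
        self.create_cls(cls_name).slots[slot] = range_cls


def head_modifier_rule(onto: Ontology, phrase: Dict[str, str]) -> None:
    """Precondition: phrase has a head and a modifier.
    Operators: class for the head, sub-class for modifier + head."""
    head, mod = phrase.get("head"), phrase.get("mod")
    if head and mod:
        onto.create_cls(head)
        onto.create_cls(f"{mod}_{head}", superclass=head)


def subj_pred_obj_rule(onto: Ontology, clause: Dict[str, str]) -> None:
    """Precondition: clause has subject, predicate and object.
    Operators: subject class gets a slot named after the predicate, ranging over the object."""
    subj, pred, obj = clause.get("subj"), clause.get("pred"), clause.get("obj")
    if subj and pred and obj:
        onto.create_cls(obj)
        onto.add_slot(subj, pred, obj)


if __name__ == "__main__":
    onto = Ontology()
    head_modifier_rule(onto, {"head": "screen", "mod": "flat"})
    subj_pred_obj_rule(onto, {"subj": "computer", "pred": "has", "obj": "screen"})
    print(onto.classes)
```

In the workflow described in the deck, candidates produced by such rules would still pass through interactive user validation before integration.
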
Example Experiment: Ontology Extraction for Neurology
- Neurology section of a medical corpus: medical scientific journal abstracts - MuchMore Project
- XML-based linguistic annotation: PoS, lemmatization, phrases, pred-arg structure
- Statistical preprocessing (chi-square): select domain-relevant linguistic entities
- Definition of mapping rules: define operators for selected linguistic entities
- Generate & validate class/slot candidates: select candidates for integration in the neurology ontology
- Generate "ontology fragments" for neurology
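
The chi-square step can be illustrated with a 2x2 contingency test that compares a term's frequency in the domain corpus against a reference corpus. The slides only name the statistic, so the exact formulation, the threshold of 3.84 (the 5% critical value for one degree of freedom), and the function names below are assumptions.

```python
from typing import Dict, List, Tuple


def chi_square(term_domain: int, total_domain: int, term_ref: int, total_ref: int) -> float:
    """Pearson chi-square for a 2x2 table: term vs. other tokens, domain vs. reference corpus."""
    a = term_domain                 # term occurrences in the domain corpus
    b = total_domain - term_domain  # other tokens in the domain corpus
    c = term_ref                    # term occurrences in the reference corpus
    d = total_ref - term_ref        # other tokens in the reference corpus
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def relevant_terms(
    domain_counts: Dict[str, int],
    ref_counts: Dict[str, int],
    threshold: float = 3.84,
) -> List[Tuple[str, float]]:
    """Terms over-represented in the domain corpus, sorted by decreasing chi-square score."""
    total_d = sum(domain_counts.values())
    total_r = sum(ref_counts.values())
    if total_d == 0 or total_r == 0:
        return []
    scored = []
    for term, a in domain_counts.items():
        c = ref_counts.get(term, 0)
        if a / total_d <= c / total_r:
            continue  # not more frequent in the domain than in the reference corpus
        score = chi_square(a, total_d, c, total_r)
        if score >= threshold:
            scored.append((term, score))
    return sorted(scored, key=lambda pair: -pair[1])


if __name__ == "__main__":
    domain = {"neuron": 50, "the": 50}
    reference = {"neuron": 10, "the": 90}
    print(relevant_terms(domain, reference))
```
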
Further Issues
- Future Development
  - Organization of class/slot candidate list: inference & clustering - "graph restructuring"
  - Extend statistical preprocessing: multiple reference corpora, extended frequency information
  - Include machine learning approach: semi-automatic definition of mapping rules
- Performance Evaluation
  - Guidelines: ECAI 04 Workshop on OLP
  - Benchmark: Challenge within PASCAL NoE
Evaluation: What? -- Subtasks
- Classes: (multilingual) term extraction; named-entity recognition; similarity thesaurus; term, document clustering
- Class-Hierarchy (Taxonomy): thesaurus extraction; term, document clustering
- Class-Properties (Relations): relation extraction; ? formal properties of relations (properties)
- Class-Instances (Individuals): (multilingual) term extraction; named-entity recognition; term, document classification
Evaluation: How?
- By Sub-Task - evaluation of:
  - Classes - term, NE extraction, clustering
  - Class-Hierarchy - thesaurus extraction
  - Class-Properties - relation extraction
  - Class-Instances - term, NE extraction, classification
- By Application - evaluation of:
  - Ontology learning and population - gold standard
  - IR, QA - precision/recall increase with ontology?
  - Interactive QA - increased user satisfaction?
  - Information access - increased user performance?
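
Gold-standard evaluation of extracted classes or terms is commonly reported as precision, recall and F1. The sketch below assumes exact string matching between extracted items and the gold standard, which is a simplification; the function name and the sample terms are made up for illustration.

```python
from typing import Iterable, Tuple


def precision_recall_f1(extracted: Iterable[str], gold: Iterable[str]) -> Tuple[float, float, float]:
    """Compare extracted items against a gold standard using exact matching."""
    extracted_set, gold_set = set(extracted), set(gold)
    true_positives = len(extracted_set & gold_set)
    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    extracted = ["neuron", "axon", "table"]
    gold = ["neuron", "axon", "synapse", "dendrite"]
    print(precision_recall_f1(extracted, gold))
```
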
Conclusions: Stay Tuned
- OntoLT release: to be announced on the Protégé-Discussion list, http://protege.stanford.edu/mailing-lists
- Evaluation
  - Ontology Learning & Population (OLP) Challenge within PASCAL NoE - first task Spring 2005
  - ECAI 04 Workshop: Evaluation of Text-based OLP, http://olp.dfki.de/ECAI04/cfp.htm