Ontology Entity Instance Population: Overview of Current Progress
2011.05.06
Table of Contents
1. Introduction
   - Goal
2. System Architecture
   - Overall Architecture
   - WordNet-based Query Expansion
   - Hierarchical Clustering-based Bootstrapping
   - Demo
3. Known Problems
COPYRIGHT © 2009, Cognitive Informatics LAB, ALL RIGHTS RESERVED. 2020-11-23
1. Introduction
Main Goal
§ Vital component for progress towards semantic LOD
  1. Build a backbone ontology
  2. Align each dataset to the backbone ontology
     - This step requires ontology population from the backbone to other ontologies
Local Goal
§ Building an ontology entity instance population system
[Figure: INPUT (ontology with ISA links) -> SYSTEM -> OUTPUT (ontology with ISA links plus instanceOf links, e.g. Cleric: instanceOf Pope_Stephen_VI, instanceOf Jalaluddin_Hasan, …). The system draws on a Source of Information (Wikipedia/Free text).]
Local Goal: Proposed System Architecture
[Figure components: User inputs ontology / Target concept name; Ontology Populator Using Wikipedia Articles; Intrinsic property-projected token database extracted from Wikipedia; Ontology Populator Using free text; Source of Information (Free text); Populated Ontology. One part of the figure is marked "Current Progress".]
Contributions:
1. Automatic supply of initial seeds (which are already good population results by themselves)
2. “Intrinsic tokens” of those populated results could act as a good context vector for the ontology populator using free text
2. System Architecture
Overall System Architecture
[Figure components]
- User Input: Concept Label
- Concept Label Expander
  - Label head/modifier extractor
  - Synonym/hyponym Retrieval (WordNet)
- Instance Populator
  - Lexical Matching-based Instance Retriever (Intrinsic Property Storage, from Wikipedia)
  - Tree traversing-based bootstrapper for new lexical clues
- System Output: Instances of the given concept
Overall System Architecture: User Input
Query: “soccer player”
Overall System Architecture: Label head/modifier extractor
Query: “soccer player”
Query: {H: “player”, M: {“soccer”}}
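The head/modifier split on this slide can be illustrated with a minimal sketch. The slides do not describe how the real extractor works, so the rule used here (the last token of the label is the head, everything before it is a modifier) and the function name `split_label` are assumptions made for illustration only.

```python
def split_label(label: str) -> dict:
    """Split a concept label into a head word (H) and a set of modifiers (M).

    Assumption: the head is the last whitespace-separated token of an
    English noun phrase. The real extractor may use a different rule.
    """
    tokens = label.lower().split()
    if not tokens:
        raise ValueError("empty concept label")
    return {"H": tokens[-1], "M": set(tokens[:-1])}


query = split_label("soccer player")
print(query)  # {'H': 'player', 'M': {'soccer'}}
```

A last-token heuristic is only a starting point: labels such as "player of soccer" would need a proper noun-phrase analysis.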
Overall System Architecture: Synonym/hyponym Retrieval (WordNet)
Query: “soccer player”
- Soccer == association football
- ‘Participant’ is hyponym of ‘player’
Expanded Query Set:
{{H: “player”, M: {“soccer”}},
 {H: “player”, M: {“association”, “football”}},
 {H: “participant”, M: {“soccer”}},
 {H: “participant”, M: {“association”, “football”}}}
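The expanded query set on this slide is the combination of every head alternative with every modifier alternative. A minimal sketch of that combination step follows; the two lookup tables are hypothetical stand-ins filled with the slide's own example rather than values queried from WordNet, and `expand_query` is an illustrative name.

```python
from itertools import product

# Toy lookup tables standing in for WordNet (hypothetical data copied from
# the slide's example, not retrieved from WordNet itself).
HEAD_ALTERNATIVES = {"player": ["player", "participant"]}
MODIFIER_ALTERNATIVES = {"soccer": [("soccer",), ("association", "football")]}


def expand_query(query):
    """Expand head and modifiers separately, then combine every pairing."""
    heads = HEAD_ALTERNATIVES.get(query["H"], [query["H"]])
    # Each modifier maps to a list of alternative token tuples.
    per_modifier = [
        MODIFIER_ALTERNATIVES.get(m, [(m,)]) for m in sorted(query["M"])
    ]
    expanded = []
    for head in heads:
        for combo in product(*per_modifier):
            tokens = {tok for alt in combo for tok in alt}
            expanded.append({"H": head, "M": tokens})
    return expanded


expanded = expand_query({"H": "player", "M": {"soccer"}})
for q in expanded:
    print(q["H"], sorted(q["M"]))
```

For the example query this yields four entries, matching the four-element set shown on the slide.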
Overall System Architecture: Lexical Matching-based Instance Retriever
The Expanded Query Set is matched against the Intrinsic Property Storage (From Wikipedia).
Query Hit! David Beckham
- Authority: …, Player, …
- Hub: …, Association, Football, …
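One way to read the "Query Hit!" example is that the head word is looked up among an entity's authority tokens and the modifiers among its hub tokens. The slides do not state the matching rule, so the sketch below is an assumption, and the store contents (beyond the Player / Association / Football tokens shown for David Beckham) are invented for illustration.

```python
# Hypothetical miniature of the Intrinsic Property Storage. Only the tokens
# "player", "association" and "football" come from the slide; the rest are
# made-up filler so the example has something to reject.
STORE = {
    "David Beckham": {
        "authority": {"player", "midfielder"},
        "hub": {"association", "football", "england"},
    },
    "Pope Stephen VI": {
        "authority": {"pope", "cleric"},
        "hub": {"papacy", "rome"},
    },
}


def matches(query, entry):
    """Assumed rule: head is an authority token, every modifier a hub token."""
    return query["H"] in entry["authority"] and query["M"] <= entry["hub"]


def retrieve(expanded_queries, store):
    """Return the entities hit by at least one query in the expanded set."""
    return sorted(
        name
        for name, entry in store.items()
        if any(matches(q, entry) for q in expanded_queries)
    )


queries = [
    {"H": "player", "M": {"soccer"}},
    {"H": "player", "M": {"association", "football"}},
]
print(retrieve(queries, STORE))  # ['David Beckham']
```

Here the unexpanded query misses (no "soccer" hub token), while the expanded variant hits, which is the motivation for the expansion step.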
Overall System Architecture: Tree traversing-based bootstrapper
Query: “soccer player”
Gathers previously unknown clues. Ex. “footballer”
Overall System Architecture: System Output and status
Query: “soccer player”
Instance of concept “Soccer Player”:
- David Beckham
- Diego Maradona
- …
Status labels in the figure: "In Progress" and "Implemented" (Demo Available at Lab Homepage: http://semanticweb.kaist.ac.kr/home/index.php/IntrinsicProperty)
Steps
1. User inputs a concept name. Ex. “soccer player”
2. Distinguish the head word and modifiers, and create a query based on the result. Ex. Q = {H: “player”, M: {“soccer”}}
3. Expand the query using WordNet.
   Exp. Q = {{H: “player”, M: {“soccer”}},
             {H: “player”, M: {“association”, “football”}},
             {H: “participant”, M: {“soccer”}},
             {H: “participant”, M: {“association”, “football”}}}
4. Refine the result using the hierarchical clustering-based bootstrapping approach
Expanding Query using WordNet
§ Expand the lexical form of the query using WordNet
§ Its modifiers and heads are expanded separately
§ Expansion is based on the hyponym and synonym relations of synsets
§ When a word has several possible synsets, the system chooses the most frequent sense (the frequency is provided by WordNet)
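The sense-selection rule above can be sketched as follows. The sense inventory is a hypothetical stand-in for WordNet: the counts are invented, the entries only mirror the slides' example (including the slides' statement that "participant" is a hyponym of "player"), and the names `SENSES`, `most_frequent_sense` and `expand_word` are illustrative.

```python
# Hypothetical sense inventory standing in for WordNet. Counts and the
# second "player" sense are made up for illustration.
SENSES = {
    "soccer": [
        {"count": 5, "synonyms": ["association football"], "hyponyms": []},
    ],
    "player": [
        {"count": 40, "synonyms": [], "hyponyms": ["participant"]},
        {"count": 12, "synonyms": ["instrumentalist"], "hyponyms": []},
    ],
}


def most_frequent_sense(word):
    """Pick the sense with the highest usage count, or None if unknown."""
    senses = SENSES.get(word, [])
    return max(senses, key=lambda s: s["count"]) if senses else None


def expand_word(word):
    """Return the word plus synonyms and hyponyms of its most frequent sense."""
    sense = most_frequent_sense(word)
    if sense is None:
        return [word]
    return [word] + sense["synonyms"] + sense["hyponyms"]


print(expand_word("player"))  # ['player', 'participant']
print(expand_word("soccer"))  # ['soccer', 'association football']
```

Choosing a single sense keeps the expanded query set small, at the cost of missing instances when the intended sense is not the most frequent one.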
Hierarchical Clustering-based Bootstrapping
§ Purpose:
  1. Finding more clues for population. Ex. “footballer” for query “soccer player”
  2. Removing wrongly-extracted results
§ Approach: Hierarchical clustering-based approach
  - Why? By analyzing the internal structure of the concept,
    (1) we can find a subconcept, which can be used directly as a clue, and
    (2) wrongly-extracted results will automatically be clustered into one sub-cluster
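To illustrate point (2), the sketch below clusters three instances bottom-up and shows an off-topic instance ending up in its own branch. The slides do not name the linkage or similarity measure, so centroid linkage with cosine similarity is an assumption; the token weights and the instance `Some_Football_Club` are invented for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse token-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Average of a non-empty list of sparse vectors."""
    total = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}


def cluster(instances):
    """Agglomerative clustering; returns a nested (left, right) tree of names."""
    if not instances:
        raise ValueError("no instances to cluster")
    # Each node is (subtree, member vectors).
    nodes = [(name, [vec]) for name, vec in instances.items()]
    while len(nodes) > 1:
        best = None
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                sim = cosine(centroid(nodes[i][1]), centroid(nodes[j][1]))
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        merged = ((nodes[i][0], nodes[j][0]), nodes[i][1] + nodes[j][1])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0][0]


# Made-up token weights; the third instance plays a wrongly-extracted result.
instances = {
    "Pope_Stephen_VI": {"pope": 1.0, "cleric": 0.8},
    "Jalaluddin_Hasan": {"cleric": 1.0, "imam": 0.7},
    "Some_Football_Club": {"club": 1.0, "football": 0.9},
}
print(cluster(instances))
# ('Some_Football_Club', ('Pope_Stephen_VI', 'Jalaluddin_Hasan'))
```

The two cleric instances merge first because they share the "cleric" token, leaving the unrelated instance isolated at the top of the tree where it is easy to prune.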
Steps of hierarchical clustering-based bootstrapping
1. Perform hierarchical clustering of the gathered instances based on their similarity
   - A similarity measure that distinguishes instances of the concept is used
2. Traverse the resultant tree in BFS manner, to remove wrongly-extracted results & populate sub-branches
   - A similarity measure that distinguishes the concept's instances from instances outside the concept is used
[Figure: example cluster tree]
Authority clue: cleric, pope
Cluster #1: <cleric(0.62), pope(0.43), saint(0.38), papacy(0.29), bishop(0.24), archbishop(0.23), …>
Non-Dominant branch
Cluster #2: <papacy(0.52), cleric(0.49), pope(0.48), archbishop(0.3), bishop(0.23), saint(0.15), …>
Cluster #3: <cleric(0.66), saint(0.44), pope(0.41), bishop(0.25), archbishop(0.19), catholic(0.12), …>
New clue: Gather those instances with “papacy”
Cluster #2 - Subtree: Cluster #2 - Subcluster #1, Cluster #2 - Subcluster #2
Compare centroid: Determine whether to populate or remove
• Currently working on the refinement of the system
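The traversal step can be sketched with the three centroids shown on this slide. Several details are assumptions, since the slide does not specify them: each node's centroid is compared with the root centroid using cosine similarity, the threshold is 0.5, a new clue is the top-weighted centroid token that is not already a known clue, and the two subcluster centroids are invented for illustration. The truncated "…" entries of the slide's vectors are left out.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity between two sparse token-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class Node:
    def __init__(self, name, centroid, children=()):
        self.name = name
        self.centroid = centroid
        self.children = list(children)


def traverse(root, known_clues, threshold=0.5):
    """BFS over the cluster tree: prune dissimilar branches, collect new clues."""
    kept, removed, new_clues = [], [], []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if cosine(node.centroid, root.centroid) < threshold:
            removed.append(node.name)  # the whole branch is dropped
            continue
        kept.append(node.name)
        if node.centroid:
            top = max(node.centroid, key=node.centroid.get)
            if top not in known_clues and top not in new_clues:
                new_clues.append(top)
        queue.extend(node.children)
    return kept, removed, new_clues


# Subcluster centroids are made up; clusters #1-#3 are copied from the slide.
sub1 = Node("Cluster #2 - Subcluster #1", {"papacy": 0.6, "pope": 0.5, "cleric": 0.4})
sub2 = Node("Cluster #2 - Subcluster #2", {"film": 0.7, "papacy": 0.2})
c2 = Node("Cluster #2", {"papacy": 0.52, "cleric": 0.49, "pope": 0.48,
                         "archbishop": 0.3, "bishop": 0.23, "saint": 0.15},
          [sub1, sub2])
c3 = Node("Cluster #3", {"cleric": 0.66, "saint": 0.44, "pope": 0.41,
                         "bishop": 0.25, "archbishop": 0.19, "catholic": 0.12})
c1 = Node("Cluster #1", {"cleric": 0.62, "pope": 0.43, "saint": 0.38,
                         "papacy": 0.29, "bishop": 0.24, "archbishop": 0.23},
          [c2, c3])

kept, removed, new_clues = traverse(c1, known_clues={"cleric", "pope"})
print(kept)
print(removed)    # ['Cluster #2 - Subcluster #2']
print(new_clues)  # ['papacy']
```

With these inputs the only new clue is "papacy", the top token of Cluster #2, which agrees with the slide's example; the invented off-topic subcluster is the only branch removed.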
3. Known Problems
Known Problems
(1) How can we map “intrinsic property” to the extracted results?
  - Is the term appropriate? Suggestion: How about defining a new term, “representative property”, meaning that the extracted terms are “representative” of that entity?
(2) Direct evaluation of intrinsic tokens
  - Is there any way to directly evaluate the extracted “intrinsic tokens”?
(3) Developing a system which extracts instances from free text