Semantically Conceptualizing and Annotating Tables
ASWC’08
Stephen Lynn & David W. Embley
Data Extraction Research Group
Department of Computer Science
Brigham Young University
Supported by the
Overview
§ Context
§ WoK: Web of Knowledge
§ TANGO: Table ANalysis for Generating Ontologies
§ MOGO: Mini-Ontology GeneratOr
§ Semantic Enrichment via MOGO
§ Implementation
§ Experimentation
§ Enhancements
§ Challenges & Opportunities
WoK: a Web of Knowledge
TANGO repeatedly turns raw tables into conceptual mini-ontologies and integrates them into a growing ontology.
Figure: a raw table (nonsense words) feeding a Growing Ontology
fleck
velter | gonsity (ld/gg) | hepth (gd)
burlam | 1.2             | 120
falder | 2.3             | 230
multon | 2.5             | 400
MOGO generates mini-ontologies from interpreted tables.
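To make "integrates them into a growing ontology" concrete, here is a toy sketch. It assumes, hypothetically, that a mini-ontology is just a set of concept names plus a set of binary relationship sets, and that two concepts are the same exactly when their names match. TANGO's real integration step is much richer and relies on ontology matching and mapping; the `Ontology` class, `integrate` method, and the concept `plimp` are illustrative inventions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy ontology (assumed model): concept names plus binary relationship sets."""
    concepts: set[str] = field(default_factory=set)
    relationships: set[tuple[str, str]] = field(default_factory=set)

    def integrate(self, mini: Ontology) -> None:
        """Fold a mini-ontology in; identically named concepts collapse into one."""
        self.concepts |= mini.concepts
        self.relationships |= mini.relationships


growing = Ontology()
# Mini-ontology for the nonsense-word table: each velter has a gonsity and a hepth.
growing.integrate(Ontology({"velter", "gonsity", "hepth"},
                           {("velter", "gonsity"), ("velter", "hepth")}))
# A second, hypothetical mini-ontology that also mentions velter.
growing.integrate(Ontology({"velter", "plimp"}, {("plimp", "velter")}))

print(sorted(growing.concepts))  # ['gonsity', 'hepth', 'plimp', 'velter']
```

Because identity here is exact name equality, a table labeled "Velter" would add a second, unrelated concept. That brittleness is why real integration needs ontology matching rather than set union.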
MOGO Overview
§ Table
§ Interpretation
  § Yields a canonical table
§ Canonical Table
§ MOGO
  § Concept/Value Recognition
  § Relationship Discovery
  § Constraint Discovery
  § Yields a semantically enriched conceptual model
§ Mini-ontology
§ Integration into a growing ontology
Sample Input
Region and State Information
Location     | Population (2000) | Latitude | Longitude
Northeast    | 2,122,869         |          |
  Delaware   | 817,376           | 45       | -90
  Maine      | 1,305,493         | 44       | -93
Northwest    | 9,690,665         |          |
  Oregon     | 3,559,547         | 45       | -120
  Washington | 6,131,118         | 43       |
Sample Output
Figure: the mini-ontology generated for this table
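Interpretation is said to yield a canonical table. Below is a minimal sketch of one possible canonical form for the sample input, loosely modeled on Wang notation (the layout-independent representation cited as [Jha 08]). Each dimension is a label tree, and each cell is addressed by one label path per dimension rather than by its row and column position. The names `ROW_PATHS`, `COLUMNS`, `CELLS`, and `labels_by_depth` are illustrative, not TANGO's or MOGO's actual data structures. Only the population column is filled in.

```python
from collections import defaultdict

# Row dimension: the Location label tree, written as root-to-leaf label paths.
ROW_PATHS = [
    ("Northeast",), ("Northeast", "Delaware"), ("Northeast", "Maine"),
    ("Northwest",), ("Northwest", "Oregon"), ("Northwest", "Washington"),
]
# Column dimension: a flat list of labels.
COLUMNS = ["Population (2000)", "Latitude", "Longitude"]

# Cell map keyed by (row path, column label); only population cells are shown.
CELLS = {
    (("Northeast",), "Population (2000)"): "2,122,869",
    (("Northeast", "Delaware"), "Population (2000)"): "817,376",
    (("Northeast", "Maine"), "Population (2000)"): "1,305,493",
    (("Northwest",), "Population (2000)"): "9,690,665",
    (("Northwest", "Oregon"), "Population (2000)"): "3,559,547",
    (("Northwest", "Washington"), "Population (2000)"): "6,131,118",
}


def labels_by_depth(paths):
    """Group row labels by nesting depth (1 = outer labels, 2 = nested labels)."""
    levels = defaultdict(list)
    for path in paths:
        levels[len(path)].append(path[-1])
    return dict(levels)


print(labels_by_depth(ROW_PATHS))
# {1: ['Northeast', 'Northwest'], 2: ['Delaware', 'Maine', 'Oregon', 'Washington']}
print(CELLS[(("Northwest", "Oregon"), "Population (2000)")])  # 3,559,547
```

Addressing cells by label paths is what makes the representation layout-independent: transposing the printed table changes nothing in `CELLS`.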
Concept/Value Recognition
§ Lexical Clues
  § Labels as data values
  § Data value assignment
§ Data Frame Clues
  § Labels as data values
  § Data value assignment
§ Default
  § Recognize concepts and values by syntax and layout
Concepts and Value Assignments (from lexical clues)
§ Location
  § Region: Northeast, Northwest
  § State: Delaware, Maine, Oregon, Washington
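The Region/State split can come from a lexical clue: the title "Region and State Information" names two concepts, and the Location labels sit at two nesting levels. Below is a minimal sketch of that idea. It assumes, hypothetically, that capitalized title words are candidate concepts, that a tiny stop list filters filler words, and that the i-th concept pairs with nesting depth i. This is an illustration, not MOGO's actual rule.

```python
import re

# Title words that name no concept (an assumed, deliberately tiny stop list).
FILLER = {"Information", "Data", "Table"}


def concepts_from_title(title: str) -> list[str]:
    """Hypothetical lexical clue: capitalized title words become candidate concepts."""
    return [w for w in re.findall(r"[A-Z][a-z]+", title) if w not in FILLER]


def assign_by_nesting(names, levels):
    """Pair the i-th candidate concept with the labels at nesting depth i."""
    if len(names) != len(levels):
        return {}  # the clue does not fit this table; fall back to other clues
    return {name: levels[depth] for depth, name in enumerate(names, start=1)}


levels = {1: ["Northeast", "Northwest"], 2: ["Delaware", "Maine", "Oregon", "Washington"]}
names = concepts_from_title("Region and State Information")  # ['Region', 'State']
print(assign_by_nesting(names, levels))
# {'Region': ['Northeast', 'Northwest'], 'State': ['Delaware', 'Maine', 'Oregon', 'Washington']}
```

Returning an empty mapping when counts disagree leaves room for the data-frame and default (syntax and layout) clues to take over.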
Concepts and Value Assignments (from data frames and default recognition)
Figure annotation: Year (2002, 2003)
§ Population: 2,122,869; 817,376; 1,305,493; 9,690,665; 3,559,547; 6,131,118
§ Latitude: 45, 44, 45, 43
§ Longitude: -90, -93, -120
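Data frames, as used in this research group's work, describe how values of a concept look (and more). The sketch below keeps only regular-expression value recognizers and shows the two uses listed on the slide. The first is labels as data values: the label "Population (2000)" embeds a Year value. The second is data value assignment: a column whose cells all fit a frame is assigned to that frame's concept. The frame names (`Year`, `Count`) and their patterns are assumptions for illustration.

```python
import re

# Hypothetical mini data-frame library: concept name -> value recognizer.
DATA_FRAMES = {
    "Year": re.compile(r"\b(?:19|20)\d{2}\b"),
    "Count": re.compile(r"\d{1,3}(?:,\d{3})+"),  # comma-grouped integers
}


def values_in_label(label: str) -> dict:
    """Labels as data values: report data-frame matches embedded in a label."""
    found = {}
    for concept, pattern in DATA_FRAMES.items():
        hits = pattern.findall(label)
        if hits:
            found[concept] = hits
    return found


def column_fits(values, concept: str) -> bool:
    """Data value assignment: does every cell of a column fit the concept's frame?"""
    pattern = DATA_FRAMES[concept]
    return all(pattern.fullmatch(v) for v in values)


print(values_in_label("Population (2000)"))  # {'Year': ['2000']}
print(column_fits(["2,122,869", "817,376", "1,305,493"], "Count"))  # True
print(column_fits(["45", "44", "45", "43"], "Count"))  # False
```

Syntax alone cannot tell Latitude from Longitude here. Both are small signed numbers, which is why the default clue also looks at layout (the column labels).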
Relationship Discovery
§ Dimension Tree Mappings
§ Lexical Clues
  § Generalization/Specialization
  § Aggregation
§ Data Frames
§ Ontology Fragment Merge
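The following is a minimal sketch of how the sample table's label trees and cells might be mapped to populated relationship sets. It assumes the concept names (Region, State, Location) come from concept recognition. The relationship names "is in", "is a", and "has" are illustrative, and population stands in for all value columns. This is not MOGO's actual rule set; whether label nesting yields an aggregation or a generalization/specialization is exactly the kind of decision the lexical clues must settle.

```python
# Row-dimension nesting from the sample table: outer label -> nested labels.
NESTING = {"Northeast": ["Delaware", "Maine"], "Northwest": ["Oregon", "Washington"]}
POPULATION = {"Northeast": "2,122,869", "Delaware": "817,376", "Maine": "1,305,493",
              "Northwest": "9,690,665", "Oregon": "3,559,547", "Washington": "6,131,118"}


def discover(nesting, columns, outer="Region", inner="State", root="Location"):
    """Hypothetical mapping from label trees and cells to populated relationship sets."""
    rel_sets = {}
    # Label nesting: each nested label lies within its outer label.
    rel_sets[(inner, "is in", outer)] = [
        (s, r) for r, states in nesting.items() for s in states]
    # Generalization/specialization: both label levels are kinds of the root concept.
    rel_sets[(outer, "is a", root)] = [(r, root) for r in nesting]
    rel_sets[(inner, "is a", root)] = [
        (s, root) for states in nesting.values() for s in states]
    # Dimension tree mapping: each column concept relates to the row labels with a cell.
    for name, cells in columns.items():
        rel_sets[(root, "has", name)] = sorted(cells.items())
    return rel_sets


rels = discover(NESTING, {"Population": POPULATION})
print(rels[("State", "is in", "Region")])
# [('Delaware', 'Northeast'), ('Maine', 'Northeast'), ('Oregon', 'Northwest'), ('Washington', 'Northwest')]
print(len(rels))  # 4
```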
Constraint Discovery
§ Generalization/Specialization
§ Computed Values
§ Functional Relationships
§ Optional Participation
Example: the Region and State Information sample input table
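Two of these constraints can be checked directly against the sample input's numbers. The sketch below uses hypothetical helper names and values copied from the sample table. It tests whether each region's population equals the sum of its states' populations, which would support a computed-value constraint. It also lists the locations that have no latitude, which signals optional participation (regions carry no coordinates).

```python
NESTING = {"Northeast": ["Delaware", "Maine"], "Northwest": ["Oregon", "Washington"]}
POPULATION = {"Northeast": 2_122_869, "Delaware": 817_376, "Maine": 1_305_493,
              "Northwest": 9_690_665, "Oregon": 3_559_547, "Washington": 6_131_118}
LATITUDE = {"Delaware": 45, "Maine": 44, "Oregon": 45, "Washington": 43}


def is_sum_of_parts(values, nesting):
    """Computed-value check: does every outer value equal the sum of its nested values?"""
    return all(values[outer] == sum(values[inner] for inner in inners)
               for outer, inners in nesting.items())


def non_participants(locations, attribute):
    """Optional participation: locations that have no value for the attribute."""
    return sorted(loc for loc in locations if loc not in attribute)


print(is_sum_of_parts(POPULATION, NESTING))    # True: 817,376 + 1,305,493 = 2,122,869
print(non_participants(POPULATION, LATITUDE))  # ['Northeast', 'Northwest']
```

A discovered sum also explains why a Region-to-Population relationship set can be redundant: its values are computable from State-to-Population and the State-in-Region nesting.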
Validation
§ Concept/Value Recognition
  § Correctly identified concepts
  § Missed concepts
  § False positives
  § Data value assignment
§ Relationship Discovery
  § Valid relationship sets
  § Invalid relationship sets
  § Missed relationship sets
§ Constraint Discovery
  § Valid constraints
  § Invalid constraints
  § Missed constraints

                       | Precision | Recall | F-measure
Concept Recognition    | 87%       | 94%    | 90%
Relationship Discovery | 73%       | 81%    | 77%
Constraint Discovery   | 89%       | 91%    | 90%
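The F-measure column is the harmonic mean of precision and recall, F = 2PR / (P + R). Recomputing it from the reported (rounded) precision and recall reproduces the table's F-measure column. This gives a quick consistency check on the numbers. `f_measure` and `RESULTS` are illustrative names.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


RESULTS = {  # (precision, recall) from the Validation table
    "Concept Recognition": (0.87, 0.94),
    "Relationship Discovery": (0.73, 0.81),
    "Constraint Discovery": (0.89, 0.91),
}

for task, (p, r) in RESULTS.items():
    print(f"{task}: F = {f_measure(p, r):.0%}")
# Concept Recognition: F = 90%
# Relationship Discovery: F = 77%
# Constraint Discovery: F = 90%
```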
Concept Recognition
§ Counted:
  § Correct/Incorrect/Missing concepts
  § Correct/Incorrect/Missing labels
  § Data value assignments
Relationship Discovery
§ Counted:
  § Correct/Incorrect/Missing relationship sets
  § Correct/Incorrect/Missing aggregations and generalization/specializations
Constraint Discovery
§ Counted correct/incorrect/missing:
  § Generalization/Specialization constraints
  § Computed-value constraints
  § Functional constraints
  § Optional-participation constraints
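The slides count correct, incorrect, and missing items but do not spell out how the counts become percentages. Under the standard definitions, which are an assumption about how these counts were combined, precision is correct / (correct + incorrect) and recall is correct / (correct + missing). The F-measure is then their harmonic mean. The tallies below are hypothetical, not the experiment's data.

```python
def precision_recall(correct: int, incorrect: int, missing: int) -> tuple[float, float]:
    """Standard precision and recall from correct/incorrect/missing tallies."""
    proposed = correct + incorrect  # everything the tool produced
    expected = correct + missing    # everything the reference answer contains
    precision = correct / proposed if proposed else 0.0
    recall = correct / expected if expected else 0.0
    return precision, recall


# Hypothetical tallies for one category, e.g. functional constraints.
p, r = precision_recall(correct=45, incorrect=5, missing=15)
print(p, r)  # 0.9 0.75
```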
Concept Recognition
§ Successes
  § 98% of concepts identified
  § Missing labels identified
  § 97% of values assigned to the correct concept
§ Common problems
  § Finding an appropriate label
  § Duplicate concepts
Relationship Discovery
§ Recall of 92% for relationship sets
§ Missed aggregations and generalization/specializations (these were detected only through label nesting)
§ Unnecessary relationship sets generated (their values are computable)
Constraint Discovery
§ F-measure of 98% for functional relationship sets
§ Computed-value discovery
§ Functional/non-functional lists in cells
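Lists inside cells complicate the functional-relationship test. A relationship set from A to B is functional only if no A value is paired with two different B values, and a cell holding a list pairs one A value with several. Below is a minimal sketch that assumes, hypothetically, that list items are separated by semicolons. Commas cannot be the separator here because they also group digits, as in "2,122,869". The "Portland; Salem" cell is an invented example.

```python
def split_cell(cell: str) -> list[str]:
    """Split a cell into list items, assuming ';' as the (hypothetical) separator."""
    return [item.strip() for item in cell.split(";") if item.strip()]


def is_functional(pairs) -> bool:
    """A -> B is functional if no A value maps to more than one distinct B value."""
    images: dict[str, set[str]] = {}
    for a, cell in pairs:
        images.setdefault(a, set()).update(split_cell(cell))
    return all(len(bs) <= 1 for bs in images.values())


population = [("Oregon", "3,559,547"), ("Washington", "6,131,118")]
cities = [("Oregon", "Portland; Salem")]  # hypothetical list-valued cell
print(is_functional(population))  # True
print(is_functional(cities))      # False
```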
MOGO Contributions
§ Tool to generate mini-ontologies
§ Encouraging accuracy (see the Validation table)
Opportunities & Challenges: MOGO
§ Enhancements
  § Check for inter-label relationships
  § Check for more complex computations
  § Check for lists in cells
  § …
§ Wish List
  § Data-frame library
  § Atomic knowledge components
  § Instance recognizers
  § Library of molecular components
  § Semi-automatic construction of a WordNet-like resource for knowledge components
Summary
§ MOGO
  § Semantic Enrichment
  § Encouraging Results
  § But More Possible
§ Broader Implications: Vision & Challenges
  § TANGO
  § WoK
    § Web of Data
    § Semantic Annotation
    § User-friendly Query Answering
www.deg.byu.edu
embley@cs.byu.edu
Opportunities & Challenges: TANGO
§ Table Interpretation
  § Transforming tables to F-logic [Pivk 07]
  § Layout-independent table representation [Jha 08]
  § Table interpretation by sibling tables [Tao 07]
§ Semantic Enhancement / Ontology Generation
  § Naming unnamed table concepts [Pivk 07]
  § MOGO [Lynn 09]
§ Semi-automatic Ontology Integration
  § Ontology Matching [Euzenat 07]
  § Ontology-mapping tools [Falconer 07]
  § Direct and indirect schema mappings for TANGO [Xu 06]
Opportunities & Challenges: WoK
§ Web of Data
  § “The Semantic Web is a web of data.” [W3C]
  § Upcoming special issue of Journal of Web Semantics
  § “Enabling a Web of Knowledge” [Tao 09]
§ Information Extraction
  § Domain-independent IE from web tables [Gatterbauer 07]
  § Open IE [Banko 07]
§ …
Opportunities & Challenges: WoK
§ …
§ Semantic Annotation w.r.t. Ontologies
  § Linking Data to Ontologies [Poggi 08]
  § TISP [Tao 07]
  § FOCIH [Tao 09]
§ Reasoning & Query Answering
  § Description Logics [Baader 03]
  § NLIDB Community
  § AskOntos [Ding 06]
  § SerFR [Al-Muhammed 07]
References
§ [Al-Muhammed 07] Al-Muhammed and Embley, “Ontology-Based Constraint Recognition for Free-Form Service Requests”, Proceedings of the 23rd International Conference on Data Engineering, 2007.
§ [Baader 03] Baader, Calvanese, McGuinness, Nardi and Patel-Schneider, The Description Logic Handbook, Cambridge University Press, 2003.
§ [Banko 07] Banko, Cafarella, Soderland, Broadhead and Etzioni, “Open Information Extraction from the Web”, Proceedings of the International Joint Conference on Artificial Intelligence, 2007.
§ [Ding 06] Ding, Embley and Liddle, “Automatic Creation and Simplified Querying of Semantic Web Content: An Approach Based on Information-Extraction Ontologies”, Proceedings of the First Asian Semantic Web Conference, 2006.
§ [Euzenat 07] Euzenat and Shvaiko, Ontology Matching, Springer-Verlag, 2007.
§ [Falconer 07] Falconer, Noy and Storey, “Ontology Mapping - A User Survey”, Proceedings of the Second International Workshop on Ontology Mapping, 2007.
§ [Gatterbauer 07] Gatterbauer, Bohunsky, Herzog and Pollak, “Towards Domain-Independent Information Extraction from Web Tables”, Proceedings of the Sixteenth International World Wide Web Conference, 2007.
§ [Jha 08] Jha and Nagy, “Wang Notation Tool: Layout Independent Representation of Tables”, Proceedings of the 19th International Conference on Pattern Recognition, 2008.
§ [Pivk 07] Pivk, Sure, Cimiano, Gams, Rajkovič and Studer, “Transforming Arbitrary Tables into Logical Form with TARTAR”, Data & Knowledge Engineering, 2007.
§ [Poggi 08] Poggi, Lembo, Calvanese, De Giacomo, Lenzerini and Rosati, “Linking Data to Ontologies”, Journal on Data Semantics, 2008.
§ [Tao 07] Tao and Embley, “Automatic Hidden-Web Table Interpretation by Sibling Page Comparison”, Proceedings of the 26th International Conference on Conceptual Modeling, 2007.
§ [Tao 09] Tao, Embley and Liddle, “Enabling a Web of Knowledge”, Technical Report, tango.byu.edu/papers, 2009.
§ [Xu 06] Xu and Embley, “A Composite Approach to Automating Direct and Indirect Schema Mappings”, Information Systems, 2006.