YAGO2: Exploring and Querying World Knowledge in Time, Space, Context, and Many Languages

YAGO2: Exploring and Querying World Knowledge in Time, Space, Context, and Many Languages
Published by: Johannes Hoffart, Fabian M. Suchanek, Klaus Berberich, Edwin Lewis-Kelham, Gerard de Melo, Gerhard Weikum (Max Planck Institute for Informatics, Germany)
Presented by: Sumana Venkatesh

ABSTRACT
• YAGO2 is an extension of the YAGO knowledge base with a focus on temporal and spatial knowledge.
• It is built automatically from Wikipedia (e.g., categories, redirects, infoboxes), GeoNames, and WordNet (e.g., synsets).
• It contains nearly 10 million entities and events, as well as 80 million facts representing general world knowledge.
• The wealth of spatio-temporal information in YAGO can be explored either graphically or through a special time- and space-aware query language.

OVERVIEW
• Introduction
• Extraction Architecture
• Temporal Dimension
• Geo-Spatial Dimension
• Textual Dimension
• Demo
• Critique
• Conclusion

INTRODUCTION
• The success of Wikipedia and algorithmic advances in information extraction have revived interest in large-scale knowledge bases.
• Such knowledge bases include DBpedia, KnowItAll, WikiTaxonomy, and YAGO.
• Commercial services include freebase.com, trueknowledge.com, and wolframalpha.com.
• These knowledge bases describe millions of individual entities, their mappings into semantic classes, and the relationships between entities.

Why YAGO2?
• The RDF model can store additional fields; however, current knowledge bases are blind to the temporal dimension.
• They store birth and death dates but give no overview of the time span associated with a person or of the events in which that person can participate.
• Similar problems exist at the spatial level: they store location and locatedIn relations, but events and entities have no geographical dimension.
• What is required is a knowledge base that comprehensively anchors current ontologies in both the geo-spatial and the temporal dimension.
• Ex: “the era of Elizabeth I”

EXTRACTION ARCHITECTURE
• The new YAGO2 architecture is based on declarative rules that are stored in text files.
• The rules take the form of subject-predicate-object triples.
• The rules themselves are part of the YAGO2 knowledge base.

TYPES OF RULES
• Factual rules: declarative translations of all the manually defined exceptions and facts from the YAGO knowledge base. They include the definitions of all relations, their domains and ranges, and the definitions of the classes that make up the YAGO hierarchy of literal types.
• Implication rules: if certain facts appear in the knowledge base, then another fact shall be added. Whenever the YAGO2 extractor detects that it can match facts to the templates in the subject of the rule, it generates the fact that corresponds to the object and adds it to the knowledge base.
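
The behaviour of an implication rule can be sketched as simple forward chaining over triples. The sketch below is an illustration only: the "?x" variable notation, the function names, and the example rule are assumptions made for this example, not the syntax of the YAGO2 rule files.

```python
# Facts and rule templates are subject-predicate-object triples.
# Terms starting with "?" are variables (a notation assumed for this sketch).

def unify(template, fact, bindings):
    """Match one template triple against one fact; return extended bindings or None."""
    extended = dict(bindings)
    for term, value in zip(template, fact):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None  # variable is already bound to a different value
        elif term != value:
            return None      # constant in the template does not match the fact
    return extended


def apply_rule(premises, conclusion, facts):
    """Return every fact that one implication rule derives from `facts`."""
    candidates = [{}]
    for premise in premises:
        candidates = [
            extended
            for bindings in candidates
            for fact in facts
            if (extended := unify(premise, fact, bindings)) is not None
        ]
    # Instantiate the conclusion template with each complete set of bindings.
    return {tuple(b.get(term, term) for term in conclusion) for b in candidates}


def saturate(rules, facts):
    """Apply all rules repeatedly until no new fact is produced."""
    facts = set(facts)
    while True:
        derived = set()
        for premises, conclusion in rules:
            derived |= apply_rule(premises, conclusion, facts)
        if derived <= facts:
            return facts
        facts |= derived


# Hypothetical rule: a fact stated with a sub-property also holds for the super-property.
RULES = [
    ([("?r", "subPropertyOf", "?s"), ("?x", "?r", "?y")], ("?x", "?s", "?y")),
]
FACTS = {
    ("wasBornOnDate", "subPropertyOf", "startsExistingOnDate"),
    ("BobDylan", "wasBornOnDate", "1941-05-24"),
}

if __name__ == "__main__":
    for fact in sorted(saturate(RULES, FACTS)):
        print(fact)
```

Running the rules to a fixed point mirrors the statement that a new fact is added whenever the premises can be matched; a production extractor would need indexing rather than the nested scans used here.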

TYPES OF RULES
• Replacement rules: if a part of the source text matches a specified regular expression, it is replaced by a given string. Uses include cleaning HTML tags, normalizing numbers, and eliminating administrative Wikipedia categories and articles. Ex: “{{GA}}” replace “[[Georgia]]”
• Extraction rules: if a part of the source text matches a specified regular expression, a sequence of facts is generated. These rules apply to patterns found in Wikipedia infoboxes, categories, article titles, headings, links, or references.
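
As a rough illustration of how replacement and extraction rules might operate, the sketch below applies regular-expression rules to a piece of wiki text. Only the “{{GA}}” example comes from the slide (read here as replacing “{{GA}}” with “[[Georgia]]”); the other patterns, the "$0"/"$1" placeholder notation, and the function names are assumptions for this example.

```python
import re

# Replacement rules: (regular expression, replacement string).
REPLACEMENT_RULES = [
    (re.compile(r"\{\{GA\}\}"), "[[Georgia]]"),  # example from the slide
    (re.compile(r"<[^>]+>"), ""),                # hypothetical rule: strip HTML tags
]

# Extraction rules: (regular expression, list of fact templates).
# "$0" stands for the article's entity and "$1" for the first capture group
# (a placeholder notation assumed for this sketch).
EXTRACTION_RULES = [
    (re.compile(r"\[\[Category:(\d{4}) births\]\]"),
     [("$0", "wasBornOnDate", "$1")]),
]


def apply_replacements(text):
    """Run every replacement rule over the source text, in order."""
    for pattern, replacement in REPLACEMENT_RULES:
        text = pattern.sub(replacement, text)
    return text


def fill(term, entity, match):
    """Instantiate one template term from the entity and the regex match."""
    if term == "$0":
        return entity
    if term.startswith("$"):
        return match.group(int(term[1:]))
    return term


def extract_facts(entity, text):
    """Generate one fact per template for every match of every extraction rule."""
    facts = []
    for pattern, templates in EXTRACTION_RULES:
        for match in pattern.finditer(text):
            for template in templates:
                facts.append(tuple(fill(term, entity, match) for term in template))
    return facts


if __name__ == "__main__":
    raw = "<b>Bob Dylan</b> was born in Duluth. [[Category:1941 births]]"
    clean = apply_replacements(raw)
    print(clean)
    print(extract_facts("BobDylan", clean))
```

Replacement rules run before extraction rules in this sketch so that the extraction patterns see normalized text; the extracted value is kept as the raw year string rather than a full date.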

TEMPORAL DIMENSION
• For YAGO2, the temporal properties of objects can be derived from the data in the knowledge base.
• All entities that have a meaningful time of existence are taken into consideration. Ex: people are born and pass away; countries are created and dissolved.
• Four major entity types, with the relations that indicate their time span:
1. People (with wasBornOnDate and diedOnDate)
2. Groups (Ex: music bands, football clubs, universities, or companies; with wasCreatedOnDate and wasDestroyedOnDate)
3. Artifacts (Ex: buildings, songs, or cities; with wasCreatedOnDate and wasDestroyedOnDate)
4. Events (Ex: sports competitions like the Olympics; with startedOnDate and endedOnDate)

Generic Entity-Time Relations
• Two generic entity-time relations:
  startsExistingOnDate - temporal start point of an entity
  endsExistingOnDate - temporal end point of an entity
• The type-specific relations wasBornOnDate, diedOnDate, etc. are sub-properties of the two generic relations.
• Facts can have a temporal dimension. Ex: “BarackObama holdsPoliticalPosition PresidentOfTheUnitedStates”
• The YAGO2 extractors can find the occurrence times of facts from the Wikipedia infoboxes. Ex: birth, death, marriage, and divorce.
• Ex: “BobDylan wasBornIn Duluth”, “ElvisPresley diedIn Memphis”
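
The sub-property mapping above can be sketched as a lookup that turns type-specific relations into an existence time span. The relation names come from the slides; the dictionary layout, the function names, and the sample facts are assumptions for this illustration.

```python
from datetime import date

# Type-specific relations mapped to the two generic entity-time relations.
SUB_PROPERTY_OF = {
    "wasBornOnDate": "startsExistingOnDate",
    "diedOnDate": "endsExistingOnDate",
    "wasCreatedOnDate": "startsExistingOnDate",
    "wasDestroyedOnDate": "endsExistingOnDate",
    "startedOnDate": "startsExistingOnDate",
    "endedOnDate": "endsExistingOnDate",
}


def existence_span(entity, facts):
    """Return (start, end) of an entity's existence; either may be None if unknown."""
    start = end = None
    for subject, predicate, value in facts:
        if subject != entity:
            continue
        generic = SUB_PROPERTY_OF.get(predicate, predicate)
        if generic == "startsExistingOnDate":
            start = value
        elif generic == "endsExistingOnDate":
            end = value
    return start, end


def existed_on(entity, day, facts):
    """True if `day` lies within the entity's known existence span."""
    start, end = existence_span(entity, facts)
    return start is not None and start <= day and (end is None or day <= end)


# Sample facts used only for this sketch.
FACTS = [
    ("ElvisPresley", "wasBornOnDate", date(1935, 1, 8)),
    ("ElvisPresley", "diedOnDate", date(1977, 8, 16)),
    ("BobDylan", "wasBornOnDate", date(1941, 5, 24)),
]

if __name__ == "__main__":
    print(existence_span("ElvisPresley", FACTS))
    print(existed_on("ElvisPresley", date(1960, 1, 1), FACTS))
```

Treating a missing end date as "still existing" is a simplifying choice of this sketch, not something stated in the slides.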

GEO-SPATIAL DIMENSION
• Geo-entities are entities that possess a permanent spatial extent on Earth. Ex: countries, cities, mountains, and rivers.
• yagoGeoEntity -> class that groups together all entities with a permanent physical location on Earth.
• Position of a geo-entity -> described by geographical coordinates, consisting of latitude and longitude.
• yagoGeoCoordinates -> data type that stores geographical coordinates.
• hasGeoCoordinates -> relation that links each geo-entity to its coordinates.

Geo-Entities
• Geo-entities are drawn from Wikipedia and geonames.org.
• GeoNames contains a locatedIn hierarchy. Ex: “Berlin is locatedIn Germany”
• GeoNames assigns a class to each location. Ex: Berlin: “Capital of a political entity”
• Matching algorithms take care of the duplicates that arise when data from Wikipedia and GeoNames are integrated (textual similarity + geographical coordinates).
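
One way to combine textual similarity with geographical coordinates is sketched below. It is not the matching algorithm used by YAGO2: the thresholds, the record layout, the use of `difflib` for name similarity, and the approximate coordinates are all assumptions for this example.

```python
import math
from difflib import SequenceMatcher


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def name_similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_match(wiki_entity, geonames_entries, min_similarity=0.9, max_distance_km=5.0):
    """Return the closest GeoNames entry with a similar name, or None."""
    best = None
    for entry in geonames_entries:
        similarity = name_similarity(wiki_entity["name"], entry["name"])
        distance = haversine_km(wiki_entity["lat"], wiki_entity["lon"],
                                entry["lat"], entry["lon"])
        if similarity >= min_similarity and distance <= max_distance_km:
            if best is None or distance < best[0]:
                best = (distance, entry)
    return best[1] if best else None


# Illustrative records with approximate coordinates.
WIKI_BERLIN = {"name": "Berlin", "lat": 52.5167, "lon": 13.3833}
GEONAMES = [
    {"name": "Berlin", "country": "US", "lat": 44.4687, "lon": -71.1851},
    {"name": "Berlin", "country": "DE", "lat": 52.5244, "lon": 13.4105},
]

if __name__ == "__main__":
    print(find_match(WIKI_BERLIN, GEONAMES))
```

The name check alone cannot separate the two places called Berlin; the coordinate check is what resolves the ambiguity in this example.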

Entities, Facts and Location
• Locations are assigned to entities and facts. Ex: the location of Woodstock is White Lake, NY, which is an instance of yagoGeoEntity.
• Events -> happenedIn relation. Ex: a sports meet.
• Groups/Organizations. Ex: company headquarters or university campus locations.
• Artifacts -> isLocatedIn relation. Ex: the Mona Lisa in the Louvre.

TEXTUAL DIMENSION
For each entity, YAGO2 contains contextual information. It includes the relation hasContext and its sub-properties, such as:
• hasWikipediaAnchorText (links an entity to a string that occurs as anchor text in the entity's article),
• hasWikipediaCategory (holds the name of a category in which Wikipedia places the article),
• hasCitationTitle (holds the title of a reference on the Wikipedia page).
For individual entities, multilingual translations are extracted from inter-language links in Wikipedia, which allows individuals to be queried using non-English names.

DEMO - SPOTLX
• To facilitate querying, each triple is extended with three additional dimensions, giving a 6-tuple representation.
• SPOTLX (a SPARQL-like query form): Subject (S), Predicate (P), Object (O), Time (T), Location (L), and Context (X).
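
The 6-tuple view can be sketched as a record with six fields and a filter over them. This is not the actual SPOTLX syntax or engine: the class, the condition names (`during`, `keyword`), the use of plain place names for location, and the sample data are assumptions for this illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Spotlx:
    """A fact with its time, location, and context (field layout assumed)."""
    subject: str
    predicate: str
    obj: str
    time: Optional[Tuple[int, int]] = None  # (start year, end year)
    location: Optional[str] = None          # a place name, for simplicity
    context: str = ""                       # free text associated with the fact


def matches(fact, subject=None, predicate=None, obj=None,
            during=None, location=None, keyword=None):
    """Check one fact against the given conditions; None means 'any'."""
    if subject is not None and fact.subject != subject:
        return False
    if predicate is not None and fact.predicate != predicate:
        return False
    if obj is not None and fact.obj != obj:
        return False
    if during is not None:
        if fact.time is None:
            return False
        start, end = fact.time
        if end < during[0] or start > during[1]:
            return False  # the two intervals do not overlap
    if location is not None and fact.location != location:
        return False
    if keyword is not None and keyword.lower() not in fact.context.lower():
        return False
    return True


def query(facts, **conditions):
    """Return all facts that satisfy every condition."""
    return [fact for fact in facts if matches(fact, **conditions)]


# Sample data used only for this sketch.
FACTS = [
    Spotlx("Woodstock", "happenedIn", "WhiteLake", (1969, 1969), "WhiteLake", "music festival"),
    Spotlx("ElvisPresley", "diedIn", "Memphis", (1977, 1977), "Memphis", "singer"),
    Spotlx("BobDylan", "wasBornIn", "Duluth", (1941, 1941), "Duluth", "singer songwriter"),
]

if __name__ == "__main__":
    for fact in query(FACTS, during=(1960, 1970)):
        print(fact)
```

A real time- and space-aware query language would also support proximity conditions on coordinates; exact place-name equality is used here only to keep the example short.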

CONCLUSION
• The work presents a methodology for enriching large knowledge bases of entity-relationship-oriented facts along the dimensions of time and space.
• The methodology has been demonstrated by presenting YAGO2, which has 80 million facts of near-human quality.
• Spatio-temporal knowledge is a crucial asset for many applications, including entity linkage across independent sources (e.g., in the Linked Data cloud) and semantic search.

REFERENCES (1)
[1] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC + ASWC, 2007.
[2] M. Banko, M. J. Cafarella, S. Soderland, M. Broadhead, and O. Etzioni. Open Information Extraction from the Web. In IJCAI, 2007.
[3] C. Bizer, T. Heath, and T. Berners-Lee. Linked Data - The Story So Far. International Journal on Semantic Web and Information Systems, 5(3):1-22, 2009.
[4] G. de Melo and G. Weikum. Towards a Universal Wordnet by Learning from Combined Evidence. In CIKM, 2009.
[5] F. Giunchiglia, V. Maltese, F. Farazi, and B. Dutta. GeoWordNet: A Resource for Geo-spatial Applications. In ESWC, 2010.
[6] C. Gutierrez, C. A. Hurtado, and A. Vaisman. Introducing Time into RDF. IEEE TKDE, 19:207-218, 2007.

REFERENCES (2)
[7] J. Hoffart, F. M. Suchanek, K. Berberich, and G. Weikum. YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia. Research report, Max-Planck-Institut für Informatik, 2010.
[8] M. Koubarakis and K. Kyzirakos. Modeling and Querying Metadata in the Semantic Sensor Web: The Model stRDF and the Query Language stSPARQL. In The Semantic Web: Research and Applications, volume 6088 of LNCS, pages 425-439, 2010.
[9] S. P. Ponzetto and M. Strube. Deriving a Large-Scale Taxonomy from Wikipedia. In AAAI, 2007.
[10] F. M. Suchanek, G. Kasneci, and G. Weikum. YAGO: A Core of Semantic Knowledge. In WWW, 2007.
[11] O. Udrea, D. Reforgiato, and V. Subrahmanian. Annotated RDF. ACM Trans. Comput. Log., 11(2), 2010.