Contextualizing Ontologies with Ontolight: A Pragmatic Approach
Marko Grobelnik, Janez Brank, Blaž Fortuna, Igor Mozetič
Outline
- Ontology
- Ontolight: definition, grounding, population
- Applications: integration in OntoGen
- Demo
What is an ontology?
An ontology is a data model that represents a set of concepts within a domain and the relationships between those concepts. Generally, it consists of:
- Classes: sets, collections, or types of objects
- Instances: the basic, "ground-level" objects
- Relations: ways in which objects can be related to one another
It can be used as a schema for a knowledge management system, to reason about the objects within that domain, etc.
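To make the three building blocks concrete, here is a minimal Python sketch of a toy ontology. All class, instance and relation names are hypothetical and do not come from any real ontology. It stores classes (with subclass links), instances and relations as plain data, and shows one simple kind of reasoning: inferring every class an instance belongs to by following subclass links upward.

```python
# Classes, with subclass-of links (child -> parent); names are hypothetical.
SUBCLASS_OF = {
    "Fish": "Animal",
    "Animal": "LivingThing",
}

# Instances ("ground-level" objects) and the class each one is directly typed with.
INSTANCE_OF = {
    "trout_42": "Fish",
}

# Other relations between objects, stored as (subject, relation, object) triples.
RELATIONS = [
    ("trout_42", "livesIn", "lake_bled"),
]


def all_classes(instance):
    """Return the direct class of an instance plus all of its ancestor classes."""
    result = []
    cls = INSTANCE_OF.get(instance)
    while cls is not None:
        result.append(cls)
        cls = SUBCLASS_OF.get(cls)
    return result


def related(subject, relation):
    """Return every object linked to `subject` by the given relation."""
    return [o for s, r, o in RELATIONS if s == subject and r == relation]


print(all_classes("trout_42"))           # ['Fish', 'Animal', 'LivingThing']
print(related("trout_42", "livesIn"))    # ['lake_bled']
```

The inference is deliberately trivial (a walk up the class hierarchy); real reasoners support far richer rules.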
Sample Ontology
Examples of Real-world Ontologies
- AgroVoc: multilingual thesaurus for agriculture, forestry, fisheries, food security and related domains
  - consists of terms in different languages and thesaurus relationships between terms (broader, narrower, related)
- ASFA: thesaurus used for annotating the bibliography of aquatic science literature
- EuroVoc: multilingual thesaurus used by European institutions
  - the Acquis Communautaire corpus is annotated with EuroVoc
- Cyc: knowledge base, a formalization of fundamental human knowledge
- Dmoz (the Open Directory Project): the world's largest directory of the WWW, maintained by volunteer editors
What is Ontolight?
- A simple model covering most of the well-known lightweight ontologies
- Stores an ontology as a rich graph
- Defined as:
  - a list of languages used for lexical terms (covers multilinguality)
  - a list of class types (types of nodes in the graph)
  - a list of classes (nodes in the graph)
  - a list of relation types (types of links in the graph)
  - a list of relations (links in the graph)
  - a grounding model: a function that proposes a set of classes for a given instance (i.e., classification in machine learning)
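The definition above maps naturally onto a few record types. The following Python sketch is one possible encoding, not the authors' implementation: the type names (`OntoClass`, `Relation`, `Ontolight`), the `linked` helper and the placeholder `no_grounding` function are all hypothetical. The toy entries borrow EuroVoc-style labels, but the broader/narrower link between them is invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


def no_grounding(instance: str) -> List[Tuple[str, float]]:
    """Hypothetical placeholder grounding model that proposes no classes."""
    return []


@dataclass
class OntoClass:
    """A node in the graph."""
    class_id: str
    class_type: str          # expected to be one of Ontolight.class_types
    labels: Dict[str, str]   # language code -> lexical term (multilinguality)


@dataclass
class Relation:
    """A typed, directed link between two classes."""
    source: str              # class_id
    relation_type: str       # expected to be one of Ontolight.relation_types
    target: str              # class_id


@dataclass
class Ontolight:
    languages: List[str]
    class_types: List[str]
    classes: Dict[str, OntoClass]
    relation_types: List[str]
    relations: List[Relation]
    # Grounding model: proposes (class_id, score) pairs for a given instance.
    grounding: Callable[[str], List[Tuple[str, float]]] = no_grounding

    def linked(self, class_id: str, relation_type: str) -> List[str]:
        """Class ids reachable from class_id over one link of the given type."""
        return [r.target for r in self.relations
                if r.source == class_id and r.relation_type == relation_type]


onto = Ontolight(
    languages=["en", "de"],
    class_types=["descriptor"],
    classes={
        "c1": OntoClass("c1", "descriptor", {"en": "Fisheries policy", "de": "Fischereipolitik"}),
        "c2": OntoClass("c2", "descriptor", {"en": "Fishing regulations"}),
    },
    relation_types=["broader", "narrower", "related"],
    relations=[Relation("c2", "broader", "c1"), Relation("c1", "narrower", "c2")],
)
print(onto.linked("c2", "broader"))      # ['c1']
print(onto.classes["c1"].labels["de"])   # Fischereipolitik
```

Keeping the grounding model as a plain callable lets any classifier be plugged in without changing the graph part of the model.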
Grounding
- A multi-label classification model trained on the instances of the ontology
  - for Dmoz, the instances are web pages
  - for EuroVoc, the instances are EU legislation documents
- We used a centroid-based classifier:
  - it calculates a centroid vector for each class
  - it uses knowledge of the hierarchy
  - classification is performed by the kNN algorithm
- Highly scalable: can handle hundreds of thousands of classes
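A centroid-based grounding model can be sketched in a few dozen lines. The Python below is a minimal sketch under stated assumptions, not the authors' classifier. It uses plain term-count bag-of-words vectors (no TF-IDF). It also assumes that "uses knowledge of the hierarchy" means a document labelled with a class also contributes to every ancestor's centroid, and it reads "kNN" as returning the k centroids nearest to the instance by cosine similarity. All function names and the toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict

PUNCT = ".,;:!?\"'()"


def tokenize(text):
    return [w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)]


def normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def bow(text):
    """Unit-length bag-of-words vector built from raw term counts."""
    return normalize(Counter(tokenize(text)))


def cosine(a, b):
    # Both vectors are unit length, so the dot product is the cosine similarity.
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def train_centroids(docs, parent):
    """docs: list of (text, class_id); parent: class_id -> parent class_id.

    Assumption: a document also counts towards all ancestors of its class,
    so a parent's centroid summarises its whole subtree.
    """
    sums = defaultdict(Counter)
    for text, cls in docs:
        vec = bow(text)
        while cls is not None:
            sums[cls].update(vec)
            cls = parent.get(cls)
    return {cls: normalize(dict(s)) for cls, s in sums.items()}


def classify(text, centroids, k=3):
    """Return the k classes whose centroids are most similar to the text."""
    vec = bow(text)
    scored = [(cls, cosine(vec, c)) for cls, c in centroids.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


docs = [
    ("fishing fleet quotas and fishing vessels", "Fishing fleet"),
    ("fishing regulations on nets and catch limits", "Fishing regulations"),
    ("border agreement between two states", "Politics"),
]
parent = {"Fishing fleet": "Fisheries", "Fishing regulations": "Fisheries"}
centroids = train_centroids(docs, parent)
print([cls for cls, _ in classify("new rules for the fishing fleet", centroids, k=2)])
# ['Fishing fleet', 'Fisheries']
```

Because each class is summarised by a single vector, scoring one instance costs one sparse dot product per class, which is what makes the centroid approach cheap enough for very large class sets.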
Population
- Takes an instance as input
- Outputs a ranked list of suggested classes
- Example from EuroVoc
  Instance: "Slovenia and Croatia are having a fishing industry"
  Output:
  Rank  Score  Class
     1  0.201  Croatia
     2  0.171  Fisheries policy
     3  0.162  Slovenia
     4  0.161  Fishing area
     5  0.159  National independence
     6  0.159  Fishing regulations
     7  0.156  Fishery management
     8  0.147  Fisheries structure
     9  0.147  Fishing fleet
    10  0.144  Community fisheries
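Turning raw grounding scores into the ranked list shown above is a small sorting step. In this Python sketch, `populate` and its `top_n` and `min_score` parameters are hypothetical knobs, not taken from the slides. The input scores are a subset of the EuroVoc example output.

```python
def populate(scores, top_n=10, min_score=0.0):
    """Turn grounding scores (class -> similarity) into (rank, score, class) rows.

    top_n and min_score are hypothetical parameters, not part of the original system.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    # Rows are sorted by descending score, so filtering keeps ranks consecutive.
    return [(rank, score, cls)
            for rank, (cls, score) in enumerate(ranked[:top_n], start=1)
            if score >= min_score]


scores = {"Slovenia": 0.162, "Croatia": 0.201, "Fishing area": 0.161, "Fisheries policy": 0.171}
for rank, score, cls in populate(scores, top_n=3):
    print(f"{rank:2d} {score:.3f} {cls}")
#  1 0.201 Croatia
#  2 0.171 Fisheries policy
#  3 0.162 Slovenia
```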
OntoGen
- Ontology visualization: concept hierarchy
- Ontology construction and learning
- Semi-automatic:
  - text-mining methods provide suggestions and insights into the domain
  - the user can interact with the parameters of the text-mining methods
  - all final decisions are taken by the user
- Data-driven:
  - most of the aid provided by the system is based on underlying data provided by the user
  - instances are described by features extracted from the data (e.g., bag-of-words vectors)
[Screenshot labels: list of suggested sub-concepts, selected concept, selected instance, concept's details, keywords, concept's instance management]
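As an illustration of the data-driven side, here is a Python sketch that builds TF-IDF bag-of-words vectors and derives a concept's keywords from the centroid of its instances. This is an assumption about how such keywords can be computed, not a description of OntoGen's actual code. The functions `tfidf_vectors` and `concept_keywords` are hypothetical, and the documents are toy data.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Bag-of-words TF-IDF vectors (dict term -> weight) for a list of texts."""
    tokenized = [text.lower().split() for text in docs]
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def concept_keywords(vectors, member_ids, top_k=3):
    """Keywords of a concept: heaviest terms in the summed vector of its instances."""
    centroid = Counter()
    for i in member_ids:
        centroid.update(vectors[i])
    # Sort by weight (descending), then alphabetically so ties are deterministic.
    ranked = sorted(centroid.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, weight in ranked[:top_k] if weight > 0]


docs = [
    "fishing fleet quotas fishing",
    "fishing nets regulations",
    "stock market shares",
    "market shares prices",
]
vecs = tfidf_vectors(docs)
print(concept_keywords(vecs, [0, 1]))   # ['fishing', 'fleet', 'nets']
```

Summing instead of averaging the member vectors does not change the keyword ranking, so the division is skipped.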
Contextualized ontology generation
- Ontolight is integrated with OntoGen
- Helps generate new ontologies by reusing existing ones
- The user loads an Ontolight ontology into OntoGen at the start
- Suggestion methods:
  - concept suggestion: offers concepts from the loaded Ontolight ontology as possible sub-concepts
  - name suggestion: offers names of concepts from Ontolight as possible concept names
- All suggestions are integrated in a semi-automatic manner
Concept suggestion
1. The user selects a concept
2. The user selects an Ontolight ontology
3. OntoGen classifies each document in the selected concept into the context, i.e. the Ontolight ontology
4. The concepts with the most documents are offered to the user as suggestions
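The concept-suggestion steps amount to a vote count over classification results. In this Python sketch, `suggest_concepts` and the `classify` interface (a function returning `(class_name, score)` pairs, best first) are hypothetical. The sketch also assumes each document votes only for its single best class. `toy_classify` is a keyword-lookup stand-in for a real grounding model.

```python
from collections import Counter


def suggest_concepts(documents, classify, top_n=3):
    """Suggest sub-concepts: classes that receive the most documents.

    classify: hypothetical grounding function of the selected Ontolight
    ontology, returning (class_name, score) pairs sorted best first.
    Assumption: each document votes for its single best-scoring class.
    """
    votes = Counter()
    for doc in documents:
        ranked = classify(doc)
        if ranked:
            votes[ranked[0][0]] += 1
    return votes.most_common(top_n)


def toy_classify(text):
    # Stand-in for a real grounding model: simple keyword lookup.
    if "fleet" in text:
        return [("Fishing fleet", 1.0)]
    if "quota" in text:
        return [("Fishing regulations", 1.0)]
    return []


docs = ["new fleet subsidies", "fleet renewal", "quota for cod", "annual report"]
print(suggest_concepts(docs, toy_classify))
# [('Fishing fleet', 2), ('Fishing regulations', 1)]
```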
Name suggestion
1. The user selects a concept
2. OntoGen classifies each document into the context, i.e. all loaded Ontolight ontologies
3. The names of the concepts with the most classified documents are offered to the user as suggestions
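Name suggestion differs mainly in that every loaded ontology is consulted. This Python sketch uses hypothetical names throughout (`suggest_names`, the two stub classifiers, the lower-case labels). It assumes that identical names coming from different ontologies are merged into one count, which is a design choice and not something stated on the slide.

```python
from collections import Counter


def suggest_names(documents, ontologies, top_n=3):
    """Suggest names for the selected concept from all loaded ontologies.

    ontologies: mapping ontology name -> hypothetical classify function that
    returns (class_name, score) pairs, best first. Each ontology's best class
    for each document gets one vote; identical names are merged across ontologies.
    """
    votes = Counter()
    for doc in documents:
        for classify in ontologies.values():
            ranked = classify(doc)
            if ranked:
                votes[ranked[0][0]] += 1
    return [name for name, _ in votes.most_common(top_n)]


def agrovoc_stub(text):
    # Toy stand-in for an AgroVoc grounding model.
    if "fish" in text:
        return [("fisheries", 0.9)]
    return [("finance", 0.4)]


def eurovoc_stub(text):
    # Toy stand-in for a EuroVoc grounding model.
    if "quota" in text:
        return [("fisheries policy", 0.8)]
    if "fish" in text:
        return [("fisheries", 0.6)]
    return []


docs = ["fish stocks decline", "fishing quotas", "fish markets"]
print(suggest_names(docs, {"AgroVoc": agrovoc_stub, "EuroVoc": eurovoc_stub}))
# ['fisheries', 'fisheries policy']
```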
Demo
AgroVoc and EuroVoc applied to Yahoo Finance data