Evaluating Ontology Search
Towards Benchmarking in Ontology Search
Paul Buitelaar, Thomas Eigner
Competence Center Semantic Web & Language Technology Lab, DFKI GmbH, Saarbrücken, Germany
© Paul Buitelaar – EON@ISWC 07, November 2007, Busan, South-Korea
Overview
- Ontology Search
  - Knowledge reuse (integration with Ontology Learning)
- OntoSelect
  - Browse (ontologies, labels, classes, properties)
  - Search by topic
- Evaluating Ontology Search
  - Benchmark (evaluation) data set
  - Experiment (compare SWOOGLE, OntoSelect)
- Conclusions
Ontology Search
- More and more ontologies are published on the (Semantic) Web
  - Available as RDFS or OWL files (also still DAML)
- This opens up possibilities for reuse of knowledge
  - Access through ontology search engines and/or (manual/automatic) organization in ontology libraries
- But: it is increasingly hard to find the right one for your application
  - Increasing research in ontology search/selection (Alani et al., Buitelaar et al., Ding et al., Sabou et al.): SWOOGLE, OntoSelect, Watson
OntoSelect
- Ontology Library and Search Engine: http://olp.dfki.de/OntoSelect
  - Monitors the web for ontologies with automatic harvesting and indexing
  - Browse and search
    - On ontologies, classes, properties and (multilingual) labels
    - Ontology search integrates relevance feedback over Wikipedia for the search term
  - Ontology publishing
    - Submit ontologies; they will be integrated automatically
  - Statistics
    - On formats, languages, labels used, ontology publishing
Paul Buitelaar, Thomas Eigner, Thierry Declerck. OntoSelect: A Dynamic Ontology Library with Support for Ontology Selection. In: Proc. of the Demo Session at the International Semantic Web Conference, Hiroshima, Japan, Nov. 2004.
OntoSelect – Browse
Ontology Search
Keyword as Wikipedia Topic
Keyword Expansion (Extraction): Relevance Feedback from Wikipedia
Ranked Results (Browsable)
Search Criteria
- Relevance criteria address ontology content, structure, status:
  - Coverage: Term Matching
    - How many of the terms in a text collection are covered by labels for classes and properties?
  - Structure: Properties Relative to Classes
    - How detailed is the knowledge structure that the ontology represents?
  - Connectedness: Number of Included Ontologies
    - Is the ontology connected to other ontologies and how well established are these?
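The three criteria above can be illustrated with a small Python sketch. The slides do not give the actual OntoSelect formulas, so everything here is an assumption: the `OntologyStats` record, its field names, the exact form of each score, and the weighted sum are hypothetical. The "how well established" part of connectedness is not modelled.

```python
from dataclasses import dataclass


@dataclass
class OntologyStats:
    """Per-ontology counts. Hypothetical record, not OntoSelect's data model."""
    labels: set[str]              # lower-cased class and property labels
    num_classes: int
    num_properties: int
    num_included_ontologies: int


def coverage(terms: set[str], onto: OntologyStats) -> float:
    """Fraction of the given terms that match a class or property label."""
    if not terms:
        return 0.0
    return len(terms & onto.labels) / len(terms)


def structure(onto: OntologyStats) -> float:
    """Number of properties relative to the number of classes."""
    if onto.num_classes == 0:
        return 0.0
    return onto.num_properties / onto.num_classes


def connectedness(onto: OntologyStats) -> float:
    """Number of included ontologies (how established they are is ignored)."""
    return float(onto.num_included_ontologies)


def relevance(terms: set[str], onto: OntologyStats,
              weights: tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Assumed combination: a weighted sum of the three criteria."""
    w_cov, w_str, w_con = weights
    return (w_cov * coverage(terms, onto)
            + w_str * structure(onto)
            + w_con * connectedness(onto))
```

Changing `weights` corresponds loosely to the "weighting of relevance criteria" configurations mentioned in the experiment; in practice the three scores would probably need normalizing to a common range before being combined.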
Evaluation – Benchmark
- Benchmark: 15 Wikipedia topics and 57 manually assigned ontologies, out of 1056 ontologies cached through OntoSelect
- The 15 Wikipedia topics were selected from the set of all 37284 class/property labels in OntoSelect by:
  - Filtering out labels that did not correspond to a Wikipedia page → 5658 labels/topics
  - Using the 5658 labels as search terms in SWOOGLE and filtering out labels that returned fewer than 10 ontologies (out of the 1056 in OntoSelect) → 3084 labels/topics
  - Manually selecting useful topics from the 3084 labels, e.g. leaving out very short labels (‘v’) and very abstract ones (‘thing’) → 50 topics
  - Randomly selecting 15 topics and manually checking the ontologies retrieved for them from OntoSelect and SWOOGLE → 15 topics with 57 assigned ontologies
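The selection funnel above can be sketched in Python as follows. This is only an illustration under stated assumptions: `has_wikipedia_page` and `count_search_hits` are hypothetical callables standing in for the Wikipedia lookup and the SWOOGLE query, and the manual selection step (3084 → 50) happens outside the code.

```python
import random
from typing import Callable, Iterable


def select_candidate_topics(
    labels: Iterable[str],
    has_wikipedia_page: Callable[[str], bool],   # hypothetical lookup
    count_search_hits: Callable[[str], int],     # hypothetical search wrapper
    min_hits: int = 10,
) -> list[str]:
    """Automatic steps: keep labels that have a Wikipedia page and that
    return at least `min_hits` ontologies when used as a search term."""
    with_page = [label for label in labels if has_wikipedia_page(label)]
    return [label for label in with_page if count_search_hits(label) >= min_hits]


def sample_topics(useful_topics: list[str], k: int = 15, seed: int = 0) -> list[str]:
    """Final step: random draw from the manually curated topics.
    The fixed seed is an assumption made for repeatability."""
    rng = random.Random(seed)
    return rng.sample(useful_topics, min(k, len(useful_topics)))
```

Passing the lookups in as functions keeps the sketch independent of any particular Wikipedia or search-engine API.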
Evaluation – Benchmark by Topic
- 15 (Wikipedia) topics with number of assigned ontologies:
  - Atmosphere (2)
  - Biology (11)
  - City (3)
    - http://www.mindswap.org/2003/owl/geoFeatures.owl
    - http://www.glue.umd.edu/~katyn/CMSC828y/location.daml
    - http://www.daml.org/2001/02/geofile-ont
  - Communication (10)
  - Economy (1)
  - Infrastructure (2)
  - Institution (1)
  - Math (3)
  - Military (5)
  - Newspaper (2)
  - Oil (0)
  - Production (1)
  - Publication (6)
  - Railroad (1)
  - Tourism (9)
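As a consistency check, the per-topic counts above can be written down as a small Python mapping; they add up to the 15 topics and 57 assigned ontologies stated for the benchmark. The variable and function names are hypothetical, and the slides do not say in which format the benchmark is distributed.

```python
# Topic -> number of manually assigned ontologies, as listed on the slide.
ASSIGNED_ONTOLOGIES = {
    "Atmosphere": 2,
    "Biology": 11,
    "City": 3,
    "Communication": 10,
    "Economy": 1,
    "Infrastructure": 2,
    "Institution": 1,
    "Math": 3,
    "Military": 5,
    "Newspaper": 2,
    "Oil": 0,
    "Production": 1,
    "Publication": 6,
    "Railroad": 1,
    "Tourism": 9,
}


def benchmark_summary(counts: dict[str, int]) -> tuple[int, int]:
    """Return (number of topics, total number of assigned ontologies)."""
    return len(counts), sum(counts.values())


def topics_without_relevant(counts: dict[str, int]) -> list[str]:
    """Topics for which no ontology was judged relevant."""
    return [topic for topic, n in counts.items() if n == 0]


if __name__ == "__main__":
    print(benchmark_summary(ASSIGNED_ONTOLOGIES))
    print(topics_without_relevant(ASSIGNED_ONTOLOGIES))
```

The one topic with zero assigned ontologies (Oil) matters for evaluation, since recall is undefined for it.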
Evaluation – Experiment
- Comparison of (average) results between SWOOGLE and OntoSelect
- Use the OntoSelect benchmark
  - 15 topics (queries)
  - 57 assigned ontologies (relevance assessments)
  - 1056 ontologies (data set)
- Use different configurations for OntoSelect
  - With/without keyword expansion/extraction
  - With/without class names (in addition to labels)
  - With/without property labels
  - Weighting of relevance criteria
  - …
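The slides do not state which measure the "(average) results" refer to. As one plausible reading, the Python sketch below computes precision and recall over the top-k results per topic and averages them across topics. The cutoff `k`, the function names, and the choice of measure are all assumptions; ranked lists are assumed to contain no duplicate URIs.

```python
from typing import Optional


def precision_recall_at_k(
    ranked: list[str], relevant: set[str], k: int = 10
) -> tuple[float, Optional[float]]:
    """Precision and recall over the top-k results for one topic.
    Recall is None when the topic has no relevant ontologies."""
    top_k = ranked[:k]
    hits = sum(1 for uri in top_k if uri in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else None
    return precision, recall


def macro_average(
    results: dict[str, list[str]],      # topic -> ranked ontology URIs
    assessments: dict[str, set[str]],   # topic -> relevant ontology URIs
    k: int = 10,
) -> tuple[float, float]:
    """Average precision and recall across topics, skipping undefined recall."""
    precisions: list[float] = []
    recalls: list[float] = []
    for topic, relevant in assessments.items():
        p, r = precision_recall_at_k(results.get(topic, []), relevant, k)
        precisions.append(p)
        if r is not None:
            recalls.append(r)
    avg_p = sum(precisions) / len(precisions) if precisions else 0.0
    avg_r = sum(recalls) / len(recalls) if recalls else 0.0
    return avg_p, avg_r
```

Running `macro_average` once per search engine and per OntoSelect configuration would give comparable averages for the experiment described above.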
Evaluation – Results
Evaluation – Weighting of 'title'
Conclusions
- It is too early to draw conclusions from the evaluation
  - Many more configurations (weights) to compare
  - Extend the benchmark
  - Comparison with other ontology search engines
- Main contribution of the presented work
  - First comprehensive benchmark for topic-driven evaluation of ontology search
  - (Extended) benchmark will be made publicly available: http://olp.dfki.de/OntoSelect