Metadata and the Semantic Web (CS 3352)
• Directories and thesauri
• XML is not enough
• Topic maps
• RDF
Sources of Knowledge for finding documents [DeRose 99]
• “The user, including their current explicit query and any historical or profile information the system may have gained earlier.
• The documents in the library or on the web, including their nominal "content" and whatever metadata has been attached.
  – Text, image; mark-up, links; catalogue database
• The world, about which the system may have certain information, such as dictionaries and thesauri of natural language terms; basic knowledge of object categories ("dog is-a animal"), and much more…”
  – Ontologies, thesauri, knowledge
What is metadata?
• Data cataloguing resources
  – Administrative cataloguing: acquisition history, author…
  – Structural: size, image format…
• Data describing the content and meaning of resources
  – Example image labels: royal, UK, male, trophy presenter; footballer, trophy winner
Metadata Representation
• Expressive, so we can say what we want
• Compositional, so that we can build complex terms out of simple pieces
• Controlled, so we only say consistent and coherent things
• Incremental, so we can keep adding descriptions
Dublin Core
• A standard for metadata defined by the digital library community
  – Others: MARC, VRA…
• 15 elements: Title, Creator, Date, Identifier, Relation, Subject, Publisher, Type, Source, Coverage, Description, Contributor, Format, Language, Rights
• Core elements defined in RFC 2413:
  – http://src.doc.ic.ac.uk/computing/internet/rfc2413.txt
  – http://www.ariadne.ac.uk
  – http://www.ukoln.ac.uk
From: Metadata for images, Michael Day, http://www.ukoln.ac.u
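To make the element set above concrete, here is a minimal sketch that holds a Dublin Core record as a plain Python dictionary and renders it as HTML meta tags. The lowercase `DC.` name prefix follows a common convention for embedding Dublin Core in HTML, and the helper name `to_meta_tags` is an assumption made for this example. The sample values are borrowed from the book example used elsewhere in these slides.

```python
from html import escape

# The 15 Dublin Core elements, in the order listed on the slide.
DC_ELEMENTS = (
    "Title", "Creator", "Date", "Identifier", "Relation",
    "Subject", "Publisher", "Type", "Source", "Coverage",
    "Description", "Contributor", "Format", "Language", "Rights",
)


def to_meta_tags(record):
    """Render an {element: value} record as HTML <meta> tags (hypothetical helper)."""
    unknown = set(record) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    return [
        f'<meta name="DC.{name.lower()}" content="{escape(value, quote=True)}">'
        for name, value in record.items()
    ]


# Illustrative record; a real catalogue entry would usually fill more elements.
record = {
    "Title": "Being a Dog Is a Full-Time Job",
    "Creator": "Charles M. Schulz",
    "Identifier": "ISBN 0836217462",
}
tags = to_meta_tags(record)
for tag in tags:
    print(tag)
```

Rejecting unknown element names is one simple way to keep the vocabulary "controlled" in the sense of the Metadata Representation slide.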
Metadata on the web yesterday
• Meta tags
Metadata on the Web yesterday
<?xml version="1.0" encoding="utf-8"?>
<book isbn="0836217462">
  <title>Being a Dog Is a Full-Time Job</title>
  <author>Charles M. Schulz</author>
  <character>
    <name>Snoopy</name>
    <friend-of>Peppermint Patty</friend-of>
    <since>1950-10-04</since>
    <qualification>extroverted beagle</qualification>
  </character>
  <character>
    <name>Peppermint Patty</name>
    <since>1966-08-22</since>
    <qualification>bold, brash and tomboyish</qualification>
  </character>
</book>
World Wide Web
• Tim Berners-Lee reprise…
“... a goal of the Web was that, if the interaction between person and hypertext could be so intuitive that the machine-readable information space gave an accurate representation of the state of people's thoughts, interactions, and work patterns, then machine analysis could become a very powerful management tool, seeing patterns in our work and facilitating our working together through the typical problems which beset the management of large organizations.” Berners-Lee 1996
Web = Data + Information - Knowledge
• Browse the links
• Search using words: steamer, tank
• Search using experience
• Link structure is content – rhetorical narratives
• Search using indexes: metadata and classifications
Query: “Find a very successful European team-based sports person”
Resources on the web:
• Steve Redgrave’s home page
• Resource describing the Olympic Games
• Resource describing UK soccer players and their careers
• Resource listing sporting competitions including FA Cup and Superbowl
• Resource that lists teams that have won the FA Cup
Answering the query requires:
• Metadata
• Knowledge
• Inference
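The combination of metadata, knowledge and inference needed for this query can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the part-of table, the set of team sports, the reading of "very successful" as more than two wins of one competition, and the sample records (two of which are placeholders rather than real people).

```python
# Background knowledge: part-of links between places (illustrative only).
PART_OF = {"UK": "Europe", "France": "Europe", "USA": "North America"}

# Background knowledge: which sports count as team-based (an assumption).
TEAM_SPORTS = {"Rowing", "Soccer"}

# Metadata attached to resources; values are illustrative.
PEOPLE = [
    {"name": "Steve Redgrave", "nationality": "UK",
     "sport": "Rowing", "wins": {"Olympic Games": 5}},
    {"name": "Example Soccer Player", "nationality": "UK",
     "sport": "Soccer", "wins": {"FA Cup": 1}},
    {"name": "Example Tennis Player", "nationality": "USA",
     "sport": "Tennis", "wins": {"Wimbledon": 3}},
]


def located_in(place, region):
    """Follow part-of links upward until the region is found or the links run out."""
    while place is not None:
        if place == region:
            return True
        place = PART_OF.get(place)
    return False


def very_successful(person, threshold=2):
    """Assumed reading: more than `threshold` wins of a single competition."""
    return any(count > threshold for count in person["wins"].values())


def find(people, region):
    """Return the names of successful team-sport people from the given region."""
    return [
        p["name"] for p in people
        if located_in(p["nationality"], region)
        and p["sport"] in TEAM_SPORTS
        and very_successful(p)
    ]


matches = find(PEOPLE, "Europe")
print(matches)
```

No single record says "European" or "team-based"; those answers only appear once the metadata is combined with the background tables, which is the point the slide is making.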
[Figure: example sports ontology. Concepts: Country, Europe, UK, People, Sports Person, Rower, Soccer player, Sport (Rowing, Soccer, Tennis), Event, Competition, Sports Tournament, Coxless Fours (participants = 4), FA Cup, Wimbledon, Olympic Games. Relations: nationality, participates, part-of, holds, win. Example defined classes: "UK Rower win Olympic Games > 2 times", "Soccer player wins FA Cup once".]
A Shared Understanding
• Metadata
  – Data describing the content and meaning of resources
  – But everyone must speak the same language…
• Terminologies
  – Shared and common vocabularies
  – For search engines, agents, curators, authors and users
  – But everyone must mean the same thing…
• Ontologies
  – Shared and common understanding of a domain
  – Essential for exchange and discovery
Ontologies
“The [reusable] specification of conceptualizations, used to help programs and humans share knowledge” [Gruber 93]
• An ontology will include:
  – a vocabulary of terms, and
  – some specification of their meaning
  – structure on the domain, constraining the possible interpretations of terms [Uschold 99]
  – a precise notion of what meaning means
• Ontologies provide:
  – a shared and common understanding of a domain that can be communicated across people and applications
Ontology: a precise notion of what meaning means
• formal, explicit, rigorous
• unambiguous
• for agents, not just people
• machine computable
• from machine-readable to machine-understandable
• use knowledge representation and reasoning to supply the meaning
What is an Ontology?
A spectrum from informal to formal (from Debbie McGuinness):
• Catalog/ID
• Terms/glossary
• Thesauri, “narrower term” relation
• Informal is-a
• Formal is-a
• Formal instance
• Frames (properties)
• Value restrictions
• Disjointness, inverse, part-of…
• General logical constraints
Ontologies and E-Anything
Simple ontologies provide:
• Controlled shared vocabulary (search engines, authors, users, databases, programs all speak same language)
• Organization (and navigation support)
• Expectation setting (left side of many web pages)
• Browsing support (tagged structures such as Yahoo!)
• Search support (query expansion approaches such as FindUR, e-Cyc)
• Sense disambiguation
• Conflict detection
• Structured, comparative search
• Generalization/specialization
• …
From Debbie McGuinness
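The "search support" item can be illustrated with a small query-expansion sketch: a query term is widened to include its narrower terms before matching documents. The thesaurus contents, the document titles and the function names are all hypothetical, and this is not a description of how FindUR or e-Cyc work internally.

```python
# Hypothetical mini-thesaurus: each term maps to its narrower terms.
NARROWER = {
    "sport": ["soccer", "tennis", "rowing"],
    "rowing": ["coxless fours"],
    "tournament": ["FA Cup", "Wimbledon", "Olympic Games"],
}


def expand(term):
    """Return the term followed by all transitively narrower terms (depth-first)."""
    expanded, stack, seen = [], [term], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        expanded.append(current)
        # Push in reverse so narrower terms are visited in their listed order.
        stack.extend(reversed(NARROWER.get(current, [])))
    return expanded


def search(documents, query):
    """Return documents that mention the query or any of its narrower terms."""
    terms = [t.lower() for t in expand(query)]
    return [doc for doc in documents if any(t in doc.lower() for t in terms)]


documents = [
    "Coxless fours final results",
    "FA Cup winners list",
    "Gardening tips",
]
print(expand("sport"))
print(search(documents, "sport"))
print(search(documents, "tournament"))
```

A plain keyword search for "sport" would miss the rowing page entirely; the expansion finds it because the thesaurus links "sport" to "rowing" to "coxless fours".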
The Semantic Web
• http://www.semanticweb.org
Metadata on the web tomorrow
• Resources annotated with metadata using knowledge as a shared vocabulary
  – Metadata held outside the resource
• Knowledge structures for holding the ontology
  – XML DTDs
  – Product classifications
  – Directories: Home > Recreation > Sports > Events > International Games > Olympic Games >
  – W3C: RDF and RDFS – Resource Description Framework
  – Topic maps
  – DAML+OIL
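As a rough sketch of metadata held outside the resource, the statements below describe a web page as subject-predicate-object triples in the style of RDF, using only the Python standard library. The `http://example.org/` addresses and the `Literal` wrapper are assumptions made for this example; the Dublin Core and RDFS namespace URIs are the published ones. The output follows the general shape of the N-Triples line format, without claiming full conformance.

```python
from typing import NamedTuple

DC = "http://purl.org/dc/elements/1.1/"            # Dublin Core elements
RDFS = "http://www.w3.org/2000/01/rdf-schema#"     # RDF Schema vocabulary
EX = "http://example.org/"                         # placeholder namespace (assumption)


class Literal(NamedTuple):
    """Marks an object as a text value rather than a resource URI."""
    value: str


# Statements about a page, stored separately from the page itself.
triples = [
    (EX + "redgrave.html", DC + "title", Literal("Steve Redgrave's home page")),
    (EX + "redgrave.html", DC + "subject", EX + "terms/Rower"),
    (EX + "terms/Rower", RDFS + "subClassOf", EX + "terms/SportsPerson"),
]


def to_ntriples(statements):
    """Serialise the statements one per line, N-Triples style."""
    lines = []
    for subject, predicate, obj in statements:
        if isinstance(obj, Literal):
            escaped = obj.value.replace("\\", "\\\\").replace('"', '\\"')
            rendered = f'"{escaped}"'
        else:
            rendered = f"<{obj}>"
        lines.append(f"<{subject}> <{predicate}> {rendered} .")
    return "\n".join(lines)


def objects(statements, subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return [o for s, p, o in statements if s == subject and p == predicate]


print(to_ntriples(triples))
print(objects(triples, EX + "redgrave.html", DC + "subject"))
```

Because the page's subject is itself a resource, the third statement can say something about that term, which is how annotations link into a shared vocabulary.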
XML is not good for describing ontologies
• XML defines grammars to verify and structure documents
• The grammar enforces constraints on tags
• Different grammars define the same content
• XML lacks a semantic model – it only has a surface model, which is a tree
<course date="...">
  <title>...</title>
  <teacher>...</teacher>
  <name>...</name>
  <http>...</http>
  <students>...</students>
</course>
Tree: course; title, teacher, students; name, http
• node = label + attr/values + contents
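The point that different grammars can define the same content is easy to show with two small documents. In this sketch both snippets, their tag names and the helper functions are hypothetical; the parsing uses Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Two hypothetical encodings of one fact: who teaches which course.
DOC_A = (
    "<course><title>Metadata</title>"
    "<teacher><name>Example Teacher</name></teacher></course>"
)
DOC_B = '<teacher name="Example Teacher"><teaches course="Metadata"/></teacher>'


def shape(element):
    """Describe the surface tree: tag, attribute names, and child shapes."""
    return (element.tag, sorted(element.attrib), [shape(child) for child in element])


def facts_a(root):
    """Grammar-specific reading of DOC_A as (teacher, 'teaches', course)."""
    return {(root.findtext("teacher/name"), "teaches", root.findtext("title"))}


def facts_b(root):
    """Grammar-specific reading of DOC_B as (teacher, 'teaches', course)."""
    return {
        (root.get("name"), "teaches", item.get("course"))
        for item in root.findall("teaches")
    }


tree_a, tree_b = ET.fromstring(DOC_A), ET.fromstring(DOC_B)
print(shape(tree_a) == shape(tree_b))        # surface trees differ
print(facts_a(tree_a) == facts_b(tree_b))    # intended meaning agrees
```

The two trees only line up after a hand-written reader per grammar, which is the agreement XML itself does not supply.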
XML is not good for describing ontologies
• Meaning of XML documents is intuitively clear
  – “semantic” markup tags are domain terms
• But computers do not have intuition
  – Tag names per se do not provide semantics
  – The semantics are encoded outside the XML specification
• XML makes no commitment on:
  – Domain-specific ontological vocabulary
  – Ontological modelling primitives
  – so it requires pre-arranged agreement on both
• Feasible for closed collaboration
  – agents in a small & stable community
  – pages on a small & stable intranet
XML DTDs and XML Schema
• DTD does not distinguish between objects and relations
• XML Schema’s type extension mechanism is a red herring – it can’t be used to model ontological subtypes
• XML has been used as a serialisation syntax for other markup languages – e.g. SMIL, XOL
<class>
  <name>person</name>
</class>
<slot>
  <name>year-of-birth</name>
  <domain>person</domain>
  <slot-cardinality>1</slot-cardinality>
</slot>
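The class and slot markup above only gains meaning once some program interprets it. As a minimal sketch of such an interpretation, the code below models a slot with a domain and an exact cardinality and checks an instance against it. The `Slot` and `Instance` classes, the `violations` helper and the sample values are hypothetical, and this is not the XOL specification's own processing model.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    name: str
    domain: str        # name of the class the slot applies to
    cardinality: int   # exact number of values an instance must carry


@dataclass
class Instance:
    cls: str
    values: dict = field(default_factory=dict)   # slot name -> list of values


def violations(instance, slots):
    """List the slots whose cardinality the instance does not satisfy."""
    problems = []
    for slot in slots:
        if slot.domain != instance.cls:
            continue  # the slot is not defined for this class
        count = len(instance.values.get(slot.name, []))
        if count != slot.cardinality:
            problems.append(f"{slot.name}: expected {slot.cardinality}, found {count}")
    return problems


year_of_birth = Slot(name="year-of-birth", domain="person", cardinality=1)
complete = Instance(cls="person", values={"year-of-birth": [1970]})
incomplete = Instance(cls="person")

print(violations(complete, [year_of_birth]))
print(violations(incomplete, [year_of_birth]))
```

An XML parser would accept both instances equally, since the cardinality rule lives in the interpreting code rather than in the markup.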
Requirements for an Ontology-language
• Well designed
  – Useful and proven modelling primitives
  – Intuitive to human users
  – Can say simple things simply
  – Expressive enough to capture many ontologies
  – Efficient, sound and complete reasoning support
• Well defined
  – Clear syntax, to read ontologies
  – Formal semantics, to understand (process) ontologies and to facilitate machine interpretation of that semantics
  – Expressive enough to capture many ontologies
• Compatible
  – Easy mapping to/from other ontology languages
  – Maximum compatibility with XML and RDF(S)
Sem Web Research Issues
• Ontology creation
  – Millions of ontologies will be built
  – Ontology engineering is difficult and time-consuming
  – Ontology learning
  – Scalable RDF repositories (all is built on top of the same data model!)
• Infrastructure
  – Scalable reasoning services for different languages
  – Resource-ID management
  – Versioning of ontologies and corresponding metadata
Sem Web Research Issues
• Metadata management
  – Legacy data (HTML, XML, ...) -> legacy data migration:
  – Annotation of Web documents (HTML, PDF, ...)
  – Semi-automation using information extraction
  – XML wrapper / transformer
  – Database converter / exporter
• Maintenance of metadata, ontologies and resources
  – Sources, ontologies, and metadata have to be maintained in a consistent way
  – An organizational process is needed
  – Tools are needed
  – Metadata have to reflect changes of the sources
  – Metadata have to reflect changes of the ontologies
Selected Semantic Web Projects
• COHSE – http://inanna.ecs.soton.ac.uk/cohse/
• Ontobroker – http://ontobroker.aifb.uni-karlsruhe.de/
• SHOE – http://www.cs.umd.edu/projects/plus/SHOE/