Semantic Web: The Story So Far. Ian Horrocks <ian.horrocks@comlab.ox.ac.uk>, Oxford University Computing Laboratory
The Semantic Web
What is it? • Web “invented” by Tim Berners-Lee (amongst others) – (Conceptual) simplicity of web has contributed to success, but is also a limiting factor • Tim has ambitious goals for future of the web – Objective is to overcome existing limitations “… a consistent logical web of data …” “… information is given well-defined meaning …” • This vision of the future of the Web has become known as the Semantic Web
Why do we want it? Many tasks are difficult or impossible using existing web: • Complex queries involving background knowledge – Find information about “animals that use sonar but are neither bats nor dolphins”, e.g., Barn Owl • Locating information in data repositories – Travel enquiries – Prices of goods and services – Results of human genome experiments • Finding and using “web services” – Given DNA sequence, identify genes, determine proteins they produce, and hence biological processes they control
What is the Problem? Consider a typical web page: • Markup consists of: – Rendering information (e.g., font size and colour) – Hyper-links to related content • Semantic content is accessible to humans, but not (easily) to computers…
How Will It Work? • Add semantic annotations to web resources, e.g.: – “Dr. Alan Rector, Professor of Computer Science, University of Manchester” becomes “<Person>Alan Rector</Person>, <Job>Professor of Computer Science</Job>, University of Manchester” – “Rev. Alan M. Gates, Associate Rector of the Church of the Holy Spirit, Lake Forest, Illinois” becomes “<Person>Alan M. Gates</Person>, <Job>Associate Rector</Job> of the Church of the Holy Spirit, Lake Forest, Illinois”
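A minimal sketch of why such annotations help, assuming flat, non-nested tags like the <Person> and <Job> examples on this slide. The helper name extract_annotations and the regular expression are illustrative assumptions, not part of any Semantic Web standard; real annotations would normally be expressed in RDF.

```python
import re

# Hypothetical pattern for flat inline annotations such as <Person>...</Person>.
# It does not handle nesting or attributes.
ANNOTATION = re.compile(r"<(\w+)>(.*?)</\1>")


def extract_annotations(text):
    """Return (tag, annotated text) pairs in document order."""
    return ANNOTATION.findall(text)


pages = [
    "<Person>Alan Rector</Person>, <Job>Professor of Computer Science</Job>, "
    "University of Manchester",
    "<Person>Alan M. Gates</Person>, <Job>Associate Rector</Job> "
    "of the Church of the Holy Spirit, Lake Forest, Illinois",
]

# A keyword search for "Rector" matches both pages. Restricting the match
# to Person annotations keeps only the page about the person named Rector.
people_named_rector = [
    value
    for page in pages
    for tag, value in extract_annotations(page)
    if tag == "Person" and "Rector" in value
]
```

With the annotations in place, the word “Rector” inside a Job annotation no longer produces a false match for a person search.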
Giving Semantics to Annotations • Agree on meaning of a set of annotation tags • E.g., Dublin Core – Limited flexibility and extensibility – Limited number of things can be expressed • Agree on language used to define meanings • E.g., an ontology language – Flexible and extensible • New terms can be formed by combining existing ones – Meaning (semantics) of such terms is formally specified
The Web Ontology Language OWL
Web Ontology Language OWL • Semantic Web led to requirement for a “web ontology language” • W3C set up Web-Ontology (WebOnt) Working Group – WebOnt developed OWL language – OWL based on earlier languages RDF, OIL and DAML+OIL – OWL now a W3C recommendation (i.e., a standard) • OWL is a family of 3 languages: OWL Lite, OWL DL and OWL Full • OIL, DAML+OIL and OWL (DL & Lite) based on Description Logics – Has facilitated development of wide range of high quality tools & infrastructure • OWL now language of choice in many applications
What Are Description Logics? • A family of logic-based Knowledge Representation formalisms – Descendants of semantic networks and KL-ONE – Describe domain in terms of concepts (classes), roles (properties, relationships) and individuals – Operators allow for composition of complex concepts – Names can be given to complex concepts, e.g.: HappyParent ≡ Parent ⊓ ∀hasChild.(Intelligent ⊔ Athletic)
Why (Description) Logic? • OWL exploits results of 15+ years of DL research – Well defined (model theoretic) semantics • Most DLs are subsets of C2, i.e., decidable fragments of FOL – Formal properties well understood (complexity, decidability) • “I can’t find an efficient algorithm, but neither can all these famous people.” [Garey & Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, 1979.] – Known reasoning algorithms – Implemented systems (highly optimised), e.g., KAON2, Pellet, CEL
Class/Concept Constructors • Concept can be thought of as a FOL formula with one free variable
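A small sketch of the “one free variable” reading: given a finite interpretation, a concept denotes the set of individuals that satisfy it. The tuple encoding of concepts, the function name extension, and the sample individuals are assumptions made for illustration only.

```python
# Assumed encoding of concepts (not an OWL syntax):
#   "Parent"                        atomic concept
#   ("and", C, D) / ("or", C, D)    intersection / union
#   ("not", C)                      complement
#   ("all", r, C) / ("some", r, C)  universal / existential restriction


def extension(concept, domain, concepts, roles):
    """Return the set of domain elements x for which concept(x) holds."""
    if isinstance(concept, str):
        return set(concepts.get(concept, ()))
    op = concept[0]
    if op == "and":
        return (extension(concept[1], domain, concepts, roles)
                & extension(concept[2], domain, concepts, roles))
    if op == "or":
        return (extension(concept[1], domain, concepts, roles)
                | extension(concept[2], domain, concepts, roles))
    if op == "not":
        return set(domain) - extension(concept[1], domain, concepts, roles)
    if op in ("all", "some"):
        _, role, filler = concept
        filler_ext = extension(filler, domain, concepts, roles)
        result = set()
        for x in domain:
            successors = {y for (a, y) in roles.get(role, ()) if a == x}
            if op == "all" and successors <= filler_ext:
                result.add(x)  # every role successor satisfies the filler
            if op == "some" and successors & filler_ext:
                result.add(x)  # at least one role successor satisfies it
        return result
    raise ValueError(f"unknown constructor: {op!r}")


# Hypothetical interpretation.
domain = {"mary", "tom", "ann", "bob"}
concepts = {"Parent": {"mary", "tom"}, "Intelligent": {"ann"}, "Athletic": set()}
roles = {"hasChild": {("mary", "ann"), ("tom", "bob")}}

happy_parent = ("and", "Parent", ("all", "hasChild", ("or", "Intelligent", "Athletic")))
happy = extension(happy_parent, domain, concepts, roles)
```

Note that the universal restriction holds vacuously for individuals with no children, which is why the conjunction with Parent matters here. A reasoner works with all possible interpretations rather than one fixed one; this sketch only illustrates the semantics of the constructors.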
Knowledge Base / Ontology Axioms
OWL RDF/XML Exchange Syntax E.g., Parent ⊓ ∀hasChild.(Intelligent ⊔ Athletic):
<owl:Class>
  <owl:intersectionOf rdf:parseType="Collection">
    <owl:Class rdf:about="#Parent"/>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasChild"/>
      <owl:allValuesFrom>
        <owl:Class>
          <owl:unionOf rdf:parseType="Collection">
            <owl:Class rdf:about="#Intelligent"/>
            <owl:Class rdf:about="#Athletic"/>
          </owl:unionOf>
        </owl:Class>
      </owl:allValuesFrom>
    </owl:Restriction>
  </owl:intersectionOf>
</owl:Class>
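To make the correspondence between the exchange syntax and the DL expression concrete, here is a rough sketch that walks the XML tree with Python's standard xml.etree.ElementTree and prints the DL form. The fragment is repeated with namespace declarations and with an owl:Class element around the union, both of which this traversal assumes. The render function is hypothetical and handles only the constructs used in this example; it is not a general OWL parser.

```python
import xml.etree.ElementTree as ET

OWL_NS = "http://www.w3.org/2002/07/owl#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

FRAGMENT = """<owl:Class xmlns:owl="http://www.w3.org/2002/07/owl#"
           xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <owl:intersectionOf rdf:parseType="Collection">
    <owl:Class rdf:about="#Parent"/>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasChild"/>
      <owl:allValuesFrom>
        <owl:Class>
          <owl:unionOf rdf:parseType="Collection">
            <owl:Class rdf:about="#Intelligent"/>
            <owl:Class rdf:about="#Athletic"/>
          </owl:unionOf>
        </owl:Class>
      </owl:allValuesFrom>
    </owl:Restriction>
  </owl:intersectionOf>
</owl:Class>"""


def owl(name):
    """Qualified ElementTree tag name for an element in the OWL namespace."""
    return f"{{{OWL_NS}}}{name}"


def rdf(name):
    """Qualified ElementTree attribute name in the RDF namespace."""
    return f"{{{RDF_NS}}}{name}"


def render(node):
    """Render the small subset of class expressions used in this example."""
    if node.tag == owl("Class"):
        about = node.get(rdf("about"))
        if about is not None:
            return about.lstrip("#")  # named class
        operator = node[0]            # intersectionOf or unionOf
        symbol = " ⊓ " if operator.tag == owl("intersectionOf") else " ⊔ "
        return "(" + symbol.join(render(child) for child in operator) + ")"
    if node.tag == owl("Restriction"):
        prop = node.find(owl("onProperty")).get(rdf("resource")).lstrip("#")
        filler = node.find(owl("allValuesFrom"))[0]
        return f"∀{prop}.{render(filler)}"
    raise ValueError(f"unsupported element: {node.tag}")


expression = render(ET.fromstring(FRAGMENT))
```

The verbosity of the XML compared with the one-line DL expression is the usual argument for treating RDF/XML as an exchange format rather than something people read or write directly.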
Ontology based Information Systems • Similar to relational databases – Ontology ≈ schema; instances ≈ data • Some important (dis)advantages + (Relatively) easy to maintain and update schema • Both schema and data are “self organising” + Query answers reflect both schema and data + Able to answer both intensional and extensional queries – Semantics may be counter-intuitive or even inappropriate • Open vs. closed world; axioms vs. constraints – Query answering (logical entailment) much more difficult • Can lead to scalability problems • Very useful, but don’t expect miracles!
Ontologies and Reasoning
Support for Ontology Engineering • Developing and maintaining quality ontologies is very challenging • Users need tools and services, e.g., to help check if ontology is: – Meaningful: all named classes can have instances – Correct: captures intuitions of domain experts – Minimally redundant: no unintended synonyms (e.g., Banana split and Banana sundae)
Support for Ontology Engineering • Range of new “non-standard” services supporting, e.g.: – Modular design and integration • What is the effect of merging O2 into O1? • In general, check that O1 ∪ O2 ⊨ C iff O1 ⊨ C for any concept C constructed using vocabulary occurring in O1 – Module Extraction • Extract a (small) module from O capturing all “relevant” information about some vocabulary V • In general, find O′ ⊆ O s.t. O′ ⊨ C iff O ⊨ C for any concept C constructed using terms from V – Bottom-up design • Find a (small and specific) concept describing a set of individuals • In general, find most specific C s.t. O ⊨ C(i1) ∧ … ∧ C(in) – Where C may be “small” and/or in a sub-language (of O) – Error diagnosis and repair
Support for Query Answering • In an Ontology based Information System (OIS), query answering ≈ computing logical entailment – Reasoner needed in order to answer queries, e.g.: • C is a sub-class of D iff O ⊨ ∀x. C(x) → D(x) • a is an instance of C iff O ⊨ C(a) • OIS with no reasoner ≈ DBMS with no query engine
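A toy illustration of the two entailment checks on this slide, assuming an ontology that contains only atomic sub-class axioms and class assertions. For that restricted case entailment reduces to reachability in the sub-class graph; real OWL reasoners handle far richer axioms. The class names, the individual olly, and the function names are made up for the sketch.

```python
# Hypothetical ontology: told sub-class axioms (sub, super) and
# class assertions (individual, class).
SUBCLASS_AXIOMS = [("BarnOwl", "Owl"), ("Owl", "Bird"), ("Bird", "Animal")]
ASSERTIONS = [("olly", "BarnOwl")]


def superclasses(cls, subclass_axioms):
    """All classes D such that cls is entailed to be a sub-class of D.

    Computes the reflexive-transitive closure of the told axioms.
    """
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for sub, sup in subclass_axioms:
            if sub == current and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def entails_subclass(c, d, subclass_axioms):
    """Check whether C is a sub-class of D under the told axioms."""
    return d in superclasses(c, subclass_axioms)


def entails_instance(individual, cls, subclass_axioms, assertions):
    """Check whether the ontology entails cls(individual)."""
    return any(
        entails_subclass(told, cls, subclass_axioms)
        for ind, told in assertions
        if ind == individual
    )
```

A plain lookup in ASSERTIONS would not find that olly is a Bird, because that fact is only implied by the axioms; this is the sense in which an OIS without a reasoner resembles a DBMS without a query engine.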
Example Applications
e-Science • E.g., for “in silico” investigations and “hypothesis testing” – Comparing data (e.g., on proteins) to (model of) biological knowledge – Characteristics of proteins captured in an ontology O • Goal is to identify protein instances based on characteristics – Equivalent to answering queries of form: O ⊨ P(i)? for protein P and instance i – Result may be discovery of new kinds of protein • And these may be potential drug targets if unique to a pathogen – Result may also be discovery of errors in model • Which may reflect gaps/errors in existing knowledge
Healthcare • UK NHS has a £6.2 billion “Connecting for Health” IT programme • Key component is Care Records Service (CRS) – “Live, interactive patient record service accessible 24/7” – Patient data distributed across local centres in 5 regional clusters, and a national DB • Detailed records held by local service providers • Diverse applications support radiology, pharmacy, etc. • Applications exchange messages containing “semantically rich clinical information” • Summaries sent to national database – SNOMED-CT ontology provides common vocabulary for data • Clinical data uses terms drawn from ontology
SNOMED • Over 400,000 concepts • Schema only, no instances • Language used is a (well known) fragment of OWL • NHS version extended with 1,000s of additional classes – OWL reasoner (FaCT++) used to classify and check ontology • Currently takes ≈ 4 hours – 180 missing subClass relationships were found, e.g.: • Periocular_dermatitis subClassOf Disease_of_face • Fibrin_measurement subClassOf Coagulation_factor_assay
SNOMED • Vocabulary is extensible at point of use: “post coordination” – Users (e.g., clinicians) may add/define new vocabulary – Terminology service (reasoner) used to insert in ontology • Typical new term: – almond_allergy ≡ “allergy caused_by almond” – OWL reasoner (FaCT++) used to classify new term • Takes <10 ms – Classified as a kind of “nut allergy” • Clearly of crucial importance to recognise patients with allergy caused by almond as kinds of patient with nut allergy
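A rough sketch of how the almond_allergy term could be recognised as a kind of nut allergy, assuming both terms are defined as a conjunction of an atomic class and one existential restriction, and assuming a told axiom that almond is a kind of nut. This is a simple structural comparison for that restricted shape of definition, not the tableau-based procedure FaCT++ uses, and the definition of nut_allergy is an assumption since the slide does not give it.

```python
# Assumed told hierarchy over primitive classes (child -> parent).
PRIMITIVE_PARENT = {"almond": "nut"}

# A defined concept is (atomic conjuncts, existential restrictions), where
# each restriction is a (role, atomic filler) pair. Hypothetical definitions:
NUT_ALLERGY = (frozenset({"allergy"}), frozenset({("caused_by", "nut")}))
ALMOND_ALLERGY = (frozenset({"allergy"}), frozenset({("caused_by", "almond")}))


def is_subclass(sub, sup):
    """Walk up the told primitive hierarchy from sub looking for sup."""
    while sub is not None:
        if sub == sup:
            return True
        sub = PRIMITIVE_PARENT.get(sub)
    return False


def subsumes(general, specific):
    """True if every conjunct of `general` is matched by one in `specific`."""
    general_atoms, general_restrictions = general
    specific_atoms, specific_restrictions = specific
    atoms_ok = all(
        any(is_subclass(s, g) for s in specific_atoms) for g in general_atoms
    )
    restrictions_ok = all(
        any(role == s_role and is_subclass(s_filler, filler)
            for (s_role, s_filler) in specific_restrictions)
        for (role, filler) in general_restrictions
    )
    return atoms_ok and restrictions_ok
```

Here subsumes(NUT_ALLERGY, ALMOND_ALLERGY) holds while the converse does not, so the new term is placed below nut allergy in the hierarchy.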
Recent Developments
Improving Scalability • Optimisation techniques – Improve performance of DL reasoners, e.g., [Tsarkov et al, JAR, 2007] • New reasoning techniques – Reduction to disjunctive Datalog [Motik et al, KR-04] – Hybrid DL-DB systems [Horrocks et al, CADE-05] – Hypertableau based algorithms [Motik et al, CADE-07] • Polynomial time algorithms for sub-ALC logics – Graph based techniques for EL+ [Baader et al, IJCAI-05] – Database techniques for DL-Lite [Calvanese et al, AAAI-05]
Extending Tools and Infrastructure • Editors/environments – OilEd, Protégé, Swoop, TopBraid, Ontotrack, … • Reasoning systems – Cerebra, FaCT++, KAON2, Pellet, Racer, CEL, … • Design methodologies – Modularity, foundational ontologies, etc. [Figure: fragment of a foundational ontology with classes Entity, Endurant, Quality, Substantial, Perdurant, Event, Achievement, Stative, Accomplishment]
Increasing Expressive Power • Database style keys [Lutz et al, JAIR 2004] • Rule language extensions – W3C RIF WG (see http://www.w3.org/2005/rules/) – First order extensions (e.g., SWRL) [Horrocks et al, JWS, 2005] – Hybrid language extensions, e.g., [Eiter et al, KR-04; Motik et al, ISWC-04; Rosati, JoWS, 2005] – LP/F-Logic/Common Logic [Chen et al, JLP, 1993; de Bruijn et al, WWW-05] • Other extensions – Temporal, Fuzzy, … • OWL 1.1 extension to OWL
OWL 1.1 • Is an extension of OWL – Addresses deficiencies identified by users and developers (at OWLED workshop) • Is based on more expressive DL: SROIQ – (OWL is based on SHOIN) • W3C working group now chartered – Will develop recommendation based on existing member submission • Already supported by popular OWL tools – Protégé, Swoop, TopBraid, FaCT++, Pellet
What’s New in OWL 1.1? Four kinds of features: • More expressive logic – qualified cardinality restrictions, e.g.: ObjectMinCardinality(2 friendOf hacker) – property chain inclusion axioms, e.g.: SubObjectPropertyOf(SubObjectPropertyChain(parent brother) uncle) – local reflexivity restrictions, e.g.: ObjectExistsSelf(likes) [for narcissists] – reflexive, irreflexive, symmetric, and antisymmetric properties, e.g.: ReflexiveObjectProperty(knows); IrreflexiveObjectProperty(husbandOf) – disjoint properties, e.g.: DisjointObjectProperties(childOf spouseOf)
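As an illustration of what the property chain axiom on this slide licenses, the sketch below composes two relations over explicitly given facts. The function name apply_chain and the individuals are hypothetical, and a real reasoner would combine this inference with all other axioms rather than apply it in isolation.

```python
def apply_chain(first, second):
    """Compose two binary relations.

    Returns {(x, z) | (x, y) in first and (y, z) in second}, i.e. the pairs
    that a chain axiom "first o second is a sub-property of p" adds to p.
    """
    return {(x, z) for (x, y) in first for (y2, z) in second if y == y2}


# Hypothetical facts: parent(alice, bob) reads "alice has parent bob",
# brother(bob, carl) reads "bob has brother carl".
parent = {("alice", "bob")}
brother = {("bob", "carl"), ("dave", "erin")}

# SubObjectPropertyOf(SubObjectPropertyChain(parent brother) uncle)
uncle = apply_chain(parent, brother)
```

Only the pair linking alice to carl is derived, since dave has no parent fact connecting him to the chain.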
What’s New in OWL 1.1? Four kinds of features: • More expressive datatypes – User-defined datatypes using facets from XML Schema Datatypes, e.g.: SubClassOf(Adult DataSomeValuesFrom(age DatatypeRestriction(xsd:integer minInclusive "18"^^xsd:integer))) – Simple relationships between values of functional data-valued properties, e.g.: DataSomeValuesFrom(shoeSize IQ greaterThan)
What’s New in OWL 1.1? Four kinds of features: • Metamodelling and annotations – Names can be used as any or all of an individual, a class, or a property – Allows for a restricted form of metamodelling (“punning”), e.g.: SubClassOf(SnowLeopard BigCat) ClassAssertion(SnowLeopard EndangeredSpecies) – Annotations of axioms as well as entities, e.g.: ClassAssertion(Comment(“source: WWF”) SnowLeopard EndangeredSpecies)
What’s New in OWL 1.1? Four kinds of features: • Syntactic sugar (make things easier to say) – Disjoint unions, e.g.: DisjointUnion(Element Earth Wind Fire Water) – Negative assertions, e.g.: NegativeObjectPropertyAssertion(Ian hasChild Mary) NegativeDataPropertyAssertion(Ian hasAge 21)
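To show why a disjoint union counts as syntactic sugar, the sketch below expands it into the axioms it abbreviates: one equivalence with a union, plus pairwise disjointness of the parts. The function name is hypothetical and the output strings merely imitate the functional-style syntax used on these slides; they are not produced by any OWL library.

```python
from itertools import combinations


def expand_disjoint_union(cls, parts):
    """Return the plain axioms abbreviated by DisjointUnion(cls parts...)."""
    axioms = [f"EquivalentClasses({cls} ObjectUnionOf({' '.join(parts)}))"]
    # Pairwise disjointness says no individual belongs to two parts at once.
    axioms += [f"DisjointClasses({a} {b})" for a, b in combinations(parts, 2)]
    return axioms


axioms = expand_disjoint_union("Element", ["Earth", "Wind", "Fire", "Water"])
```

For four parts this yields one equivalence axiom and six disjointness axioms, which is the bookkeeping the single DisjointUnion statement saves.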
Tractable Fragments • OWL defines only one fragment (OWL Lite) – And it isn’t very tractable! • OWL 1.1 defines several different fragments with useful computational properties – E.g., reasoning complexity in range LOGSPACE to PTIME – Smaller fragments implementable using RDBs
Summary • Semantic Web aims to make web content more accessible to automated processes – Adds semantic annotations to web resources • OWL Ontologies provide vocabulary for annotations – Terms have well defined meaning • OWL now being used in a wide range of applications – e-Science, medicine, geography, geology, … • Reasoning enabled tools are of crucial importance – For both design and deployment of ontologies • Active research area – Expressive power, scalability, methodologies, tools, …
Thank you for listening
Any questions?