State of the Art for Ontology Repositories
Frank Olken
National Science Foundation, CISE/IIS/III
folken@nsf.gov
Presentation to Ontology Summit, NIST, Gaithersburg, MD
April 28, 2008

Disclaimer
Opinions expressed in this talk are solely those of the author, and do not reflect the positions of the National Science Foundation, CISE, IIS, or Lawrence Berkeley National Laboratory.
April 28, 2008 F. Olken, Ontology Summit 2008

This talk: I will address key issues in the design and implementation of ontology repositories and some of the major technologies being used to address these issues.

Outline
  • What is an ontology repository?
  • Why does one want one?
  • Macro vs. Micro Issues
  • Implementation Issues

Implementation Issues
  • Ontology acquisition, ingestion
  • Macro vs. micro issues
  • Centralized vs. Decentralized
  • Ontology representation
  • Ontology search, query
  • Ontology Integration
  • Auxiliary tools
  • SOA, etc.

What is an Ontology Repository?
  • System for storing, searching, retrieving multiple ontologies
  • Support for ontology integration
  • Variously:
      • Tools for ontology creation, editing, visualization
      • Tools for ontology annotation, curation, ...

Multiple Ontologies
  • This is the source of the hardest problems in building ontology repositories:
      • Scale
      • Diverse ontology representations
      • Ontology integration (mapping)
      • Namespace issues
      • Complex provenance issues

Why would you want an OR?
  • You need to deal with multiple ontologies
  • Usual reasons for ontologies:
      • Natural Language Processing support
      • Data Integration, Exchange
      • Data semantics
      • Support for DB queries
      • DB, application design
      • Classification / Indexing of documents, etc.
      • Creation / maintenance / use of controlled vocabularies

Ontology Acquisition
  • Manual acquisition and loading, e.g., XMDR
      • Useful if ontology representations are very diverse.
  • Spidering the web to find ontologies (e.g., Nutch)
  • Google (etc.) search to find ontologies
  • How does one recognize an ontology?
      • Use of OWL, RDF, CL, etc.
      • Lots of is-a, part-of relations ...
      • Comments that assert file is an ontology
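
The three recognition cues above could be combined into a crude score. This is a minimal sketch only: the marker list, the weights, and the threshold are illustrative assumptions, not values from the talk, and a real crawler would need much richer evidence.

```python
import re

# Hypothetical marker list: strings that suggest an OWL/RDF serialization.
LANGUAGE_MARKERS = ("owl:Ontology", "owl:Class", "rdf:RDF", "rdfs:subClassOf")
RELATION_PATTERN = re.compile(r"\b(is[-_ ]?a|part[-_ ]?of)\b", re.IGNORECASE)
COMMENT_PATTERN = re.compile(r"(#|//|<!--).*\bontology\b", re.IGNORECASE)


def ontology_score(text: str) -> int:
    """Score a document on three cues: language markers, relation density, comments."""
    score = 0
    # Cue 1: use of an ontology language (weight 2 per marker is an assumption).
    score += 2 * sum(marker in text for marker in LANGUAGE_MARKERS)
    # Cue 2: is-a / part-of relations, capped so long files do not dominate.
    score += min(len(RELATION_PATTERN.findall(text)), 5)
    # Cue 3: a comment asserting that the file is an ontology.
    if COMMENT_PATTERN.search(text):
        score += 3
    return score


def looks_like_ontology(text: str, threshold: int = 4) -> bool:
    """Apply an arbitrary cut-off to the score."""
    return ontology_score(text) >= threshold
```

Ordinary prose containing "is a" also scores on the second cue, which is why the cap and the threshold matter; both would need tuning against real crawl data.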

Ontology Ingestion
  • Parsing ontology, syntactic validation
  • Consistency checking (no cycles in partial orders: taxonomies, partonomies)
  • Conversion to common representation (?)
      • Syntactic translation
      • Semantic translation, e.g., closed-world assumption (CWA) vs. open-world assumption (OWA)
  • Indexing, transitive closure computations, ...
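
The cycle check named above can be done with a depth-first search over the is-a (or part-of) edges. A minimal sketch, assuming the ontology has already been parsed into a hypothetical `term -> direct parents` dictionary:

```python
from typing import Dict, List, Optional


def find_cycle(edges: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle as a list of nodes, or None if the graph is acyclic.

    `edges` maps each term to its direct parents (is-a or part-of targets).
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for parent in edges.get(node, []):
            state = color.get(parent, WHITE)
            if state == GRAY:  # back edge: parent is already on the current path
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                cycle = visit(parent)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(edges):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

Returning the offending path, rather than a bare yes/no, gives a curator something to fix. The recursion is fine for a sketch; a very deep taxonomy would call for an explicit stack.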

Centralized vs. Federated Architectures
  • Centralized: collect ontologies into one place
      • High startup, maintenance costs
      • Fast retrieval, facilitates integration
  • Federated: ontologies stay put
      • Low startup, maintenance costs
      • Lower performance, reliability
      • More requirements on ontology sites
  • Hybrid
      • Centralize ontology level metadata, indices
      • Leave individual ontologies in place

Macro vs. Micro-level Issues
  • Macro-level
      • Searching across a collection of ontologies and their metadata
  • Micro-level
      • Searching, inferencing within individual ontologies

Macro & Micro similarities
Most (not all) macro and micro level issues are essentially the same and can use the same technologies for implementation.

Macro-level Support
  • Over collections of ontologies
  • Use an ontology of ontologies
      • e.g., taxonomy of subject matter
  • Ontology of ontology metadata

Ontology Search
  • Text-based search
      • Natural language definitions
      • Symbols
      • E.g., Lucene, UIMA
  • Semantic Search
      • Over ontology representation (RDF, OWL, CL)
      • e.g., SPARQL, etc.
      • e.g., faceted search (e.g., Siderean)
      • e.g., navigation over taxonomies, etc.
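
Text-based search over natural language definitions reduces, at its simplest, to an inverted index. The sketch below is a toy stand-in for what an engine such as Lucene does at scale; it is not Lucene's API, and the term identifiers in the usage note are made up.

```python
import re
from collections import defaultdict
from typing import Dict, Set


def build_index(definitions: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lower-cased word to the set of term ids whose definition contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for term_id, text in definitions.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(term_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the terms whose definitions contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

For example, with definitions for hypothetical terms `ex:Truck` ("A motor vehicle for carrying cargo") and `ex:Van` ("A passenger vehicle with a large cargo area"), the query "cargo vehicle" matches both. Such an index knows nothing about is-a structure, which is the gap semantic search fills.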

Ontology Representations
  • Text
  • Frames (OBO)
  • Graphs (RDF)
  • Logics (OWL-DL, OWL Full, CL)

Text Representation
  • Obvious candidate for ontology representation of informal ontologies, with natural language definitions, etc.
  • A lowest common denominator representation for more formal ontology representations
  • Readily supports handling diverse ontology representations (must add tags for underlying ontology representation language)
  • Only supports text search directly

Frame Representations
  • Each frame is a collection of:
      • (slot, value) pairs or
      • (slot, value list)
  • Originally deployed in Lisp
  • Secondary Storage
      • Each frame is a BLOB
      • Or, decompose into finer grained DB entries
  • Current uses:
      • OBO (open biological ontology) format
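
A frame as a collection of (slot, value list) pairs maps naturally onto a dictionary of lists. The sketch below parses one stanza written in the OBO tag-value style; the term and its identifiers are invented for illustration, and the parser handles only the simple cases shown.

```python
from collections import defaultdict
from typing import Dict, List

# Illustrative stanza in the OBO tag-value style; the term itself is made up.
STANZA = """\
[Term]
id: EX:0000002
name: sedan
is_a: EX:0000001 ! car
synonym: "saloon" EXACT []
synonym: "four-door car" RELATED []
"""


def parse_frame(stanza: str) -> Dict[str, List[str]]:
    """Turn one stanza into a frame: slot name -> list of values."""
    frame: Dict[str, List[str]] = defaultdict(list)
    for line in stanza.splitlines():
        line = line.strip()
        if not line or line.startswith("["):  # skip blanks and the stanza header
            continue
        slot, _, value = line.partition(":")
        # Drop trailing "! comment" text, used for human-readable hints.
        frame[slot.strip()].append(value.split(" ! ")[0].strip())
    return dict(frame)
```

Storing the whole stanza as a BLOB keeps ingestion trivial; storing the parsed (slot, value) rows instead is what makes slot-level queries such as "all terms with is_a EX:0000001" possible.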

Graph Representations
  • a.k.a. semantic networks, semantic graphs
  • Examples: RDF, RDF schemas, XLinks
  • List of edges, each edge:
      • Subject
      • Predicate (relation name, attribute name)
      • Object (or attribute value)
  • Very flexible
  • Only support binary relations directly
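
The "binary relations only" limitation is easiest to see with a relation of higher arity. In this sketch (all `ex:` names are made up), a binary fact is one edge, while a ternary fact has to be encoded through an intermediate node with one edge per role, a common workaround rather than anything prescribed by the talk.

```python
from typing import List, NamedTuple


class Edge(NamedTuple):
    subject: str
    predicate: str
    obj: str


# A binary relation maps directly onto one edge.
graph: List[Edge] = [Edge("ex:Sedan", "ex:isA", "ex:Car")]


def add_nary(graph: List[Edge], relation: str, node_id: str, **roles: str) -> None:
    """Encode an n-ary relation as one intermediate node plus one edge per role."""
    graph.append(Edge(node_id, "ex:type", relation))
    for role, value in roles.items():
        graph.append(Edge(node_id, f"ex:{role}", value))


# A ternary fact (who sold what to whom) needs an extra node and four edges.
add_nary(graph, "ex:Sale", "ex:sale1",
         seller="ex:Alice", buyer="ex:Bob", item="ex:Car42")
```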

Types of Graphs
  • Trees
      • Simple Taxonomies (isa), Partonomies (partof)
  • Multi-faceted Classifications
      • Taxonomies with multiple facets
      • e.g., Vehicles: purpose, propulsion, wheels, axles, color
  • Directed acyclic graphs
      • Multiple inheritance
      • Partial orders

Types of graphs
  • Arbitrary directed graphs
      • Allows arbitrary binary relationships
  • Named graphs
      • Allows separate inclusion hierarchy
      • Allow edges to point to/from subgraphs

Partial Orders
  • Many ontologies are Partial Orders (i.e., directed acyclic graphs), e.g., taxonomies, partonomies, ...
  • Merging ontologies which are partial orders should also yield partial orders
  • See work of Cliff Joslyn (PNNL)
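
Two orderings that are each acyclic can still produce a cycle once their edges are combined, so a merge needs an explicit check. A minimal sketch using the standard-library `graphlib` module (Python 3.9+); the `term -> direct parents` dictionaries are an assumed input shape, and this naive union is not the method from the work cited above.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, Set


def merge_orders(a: Dict[str, Set[str]], b: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Union two 'term -> direct parents' maps; raise ValueError if a cycle results."""
    merged: Dict[str, Set[str]] = {}
    for order in (a, b):
        for term, parents in order.items():
            merged.setdefault(term, set()).update(parents)
    try:
        # static_order() raises CycleError when no topological order exists.
        list(TopologicalSorter(merged).static_order())
    except CycleError as err:
        raise ValueError(f"merged graph is not a partial order: {err.args[1]}") from err
    return merged
```

For instance, one source asserting that `city` is part of `region` and another asserting that `region` is part of `city` are both valid alone, but their union is rejected.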

Note:
  • RDF graphs are collections of edges (triples)
  • No naked nodes allowed

Graph Implementations
  • Represent graph as:
      • Triple store (as on previous slide)
      • Quad store (support named graphs)
  • Standalone system, relational DBMS, column store

Quad stores & Named graphs
  • Quad stores allow named graphs
      • (named graph, subject, predicate, object)
  • Named graphs (quads) allow one to name subgraphs (collections of edges) and to refer to them by name
  • Hence, subjects and objects are no longer just nodes, but may be subgraphs (collections of edges)
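
A small in-memory illustration of the quad layout described above. The graph names and `ex:` identifiers are hypothetical; the point is only that a graph name can appear in the subject or object position of another edge.

```python
from typing import List, NamedTuple, Set


class Quad(NamedTuple):
    graph: str
    subject: str
    predicate: str
    obj: str


store: List[Quad] = [
    # Edges that belong to the named graph ex:vehicles-v1.
    Quad("ex:vehicles-v1", "ex:Sedan", "ex:isA", "ex:Car"),
    Quad("ex:vehicles-v1", "ex:Car", "ex:isA", "ex:Vehicle"),
    # An edge whose subject is itself a named graph (kept in a metadata graph).
    Quad("ex:meta", "ex:vehicles-v1", "ex:curatedBy", "ex:Alice"),
]


def edges_in(store: List[Quad], graph: str) -> List[Quad]:
    """Return all edges that belong to one named graph."""
    return [q for q in store if q.graph == graph]


def graphs_described(store: List[Quad]) -> Set[str]:
    """Return named graphs that also appear as the subject or object of some edge."""
    names = {q.graph for q in store}
    return {n for q in store for n in (q.subject, q.obj) if n in names}
```

This is one way to attach provenance or versioning statements to a whole collection of edges instead of to each edge.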

Secondary storage of graphs
  • Long skinny relations
      • Triples or quads
  • Column stores (MonetDB, Vertica)
  • Multiple indices sorted by:
      • subject, predicate, object, combinations, ...
  • Clusters of edges (Cogito)
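
The idea of multiple sorted indices can be sketched in memory: keep the same triples in several sort orders so that any lookup with a bound prefix becomes a binary search plus a short scan. The choice of three orders (SPO, POS, OSP) is an assumption for illustration; real stores pick their own combinations.

```python
from bisect import bisect_left
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleIndex:
    """Keep the same triples in three sort orders so a bound prefix is a range scan."""

    def __init__(self, triples: List[Triple]) -> None:
        self.spo = sorted(triples)
        self.pos = sorted((p, o, s) for s, p, o in triples)
        self.osp = sorted((o, s, p) for s, p, o in triples)

    @staticmethod
    def _scan(index: List[Triple], prefix: Tuple[str, ...]) -> List[Triple]:
        start = bisect_left(index, prefix)  # first row >= prefix
        rows = []
        for row in index[start:]:
            if row[:len(prefix)] != prefix:
                break
            rows.append(row)
        return rows

    def by_subject(self, s: str) -> List[Triple]:
        return self._scan(self.spo, (s,))

    def by_predicate_object(self, p: str, o: str) -> List[Triple]:
        # Reorder (p, o, s) rows back to (s, p, o).
        return [(s, p_, o_) for p_, o_, s in self._scan(self.pos, (p, o))]

    def by_object(self, o: str) -> List[Triple]:
        # Reorder (o, s, p) rows back to (s, p, o).
        return [(s, p, o_) for o_, s, p in self._scan(self.osp, (o,))]
```

The cost is the one the slide implies: every triple is stored once per index, and every insert touches all of them.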

Semantic graph query languages
  • SPARQL is now the primary candidate
  • Undergoing W3C “standardization”
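
At its core, a SPARQL `SELECT` matches a conjunction of triple patterns containing `?variables` against the graph. The sketch below imitates that core with a nested-loop join; it is not SPARQL and not any library's API, and all `ex:` names are invented.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def unify(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # variable: bind it, or check the earlier binding
            if result.setdefault(term, value) != value:
                return None
        elif term != value:  # constant: must match exactly
            return None
    return result


def select(patterns: List[Triple], graph: List[Triple]) -> List[Binding]:
    """Evaluate a conjunction of triple patterns by nested-loop join."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        extended: List[Binding] = []
        for b in bindings:
            for t in graph:
                b2 = unify(pattern, t, b)
                if b2 is not None:
                    extended.append(b2)
        bindings = extended
    return bindings
```

The patterns `("?x", "ex:isA", "ex:Car")` and `("?x", "ex:doors", "?n")` correspond roughly to `SELECT ?x ?n WHERE { ?x ex:isA ex:Car . ?x ex:doors ?n }`. A real engine would use indices rather than scanning the whole graph for each pattern.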

Logic-based Ontology Representations
  • Description Logic (e.g., OWL-DL)
      • Restricted to make it decidable and computationally tractable
      • Typically, lacks cardinality constraints, arithmetic
  • Datalog (Horn clause logic + recursion)
      • Prolog based
  • First Order Logic (e.g., Common Logic)
  • IKL (FOL + name propositions)

Logic-based representations
  • Precise, formal semantics
  • Expressiveness (esp. FOL)
  • Issues of scaling, decidability, computational tractability
      • Esp. for FOL
  • Description Logics growing usage
  • DL + rules languages to approx. FOL

Materialization of Partial Orders
  • Partial orders = taxonomies, partonomies
  • Typically specified as direct “edges”
      • Immediate is-a, or part-of relations
  • Naïve implementation requires repeated traversal of the partial order graph.
  • Materialization of the transitive closure of the partial order (e.g., taxonomy) can reduce query times
  • However, initialization and maintenance are expensive in time and storage
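
The trade-off above can be made concrete with a small materialization routine. A minimal sketch, assuming the taxonomy is given as a hypothetical `term -> direct parents` dictionary that has already passed a cycle check:

```python
from typing import Dict, Set


def transitive_closure(parents: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Materialize all ancestors of every term from its direct parents.

    Assumes the input is acyclic (a partial order); a cycle would recurse forever.
    """
    closure: Dict[str, Set[str]] = {}

    def ancestors(term: str) -> Set[str]:
        if term not in closure:
            result: Set[str] = set()
            for parent in parents.get(term, ()):
                result.add(parent)
                result |= ancestors(parent)
            closure[term] = result
        return closure[term]

    for term in list(parents):
        ancestors(term)
    return closure
```

After materialization, "is a sedan a vehicle?" becomes a set-membership test, `"vehicle" in closure["sedan"]`, instead of a graph traversal. The price is the stored ancestor sets, which can grow quadratically in the number of terms, and recomputation whenever an edge changes.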

Ontology Constraints
  • Type constraints
      • Range, domain constraints
  • Cardinality constraints on relations
  • DB Integrity constraints
      • Functional dependencies
      • Inclusion dependencies (foreign key constraints)
  • Invertibility
  • Disjointness (of subclasses)
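
Two of these constraints, domain/range and disjointness, can be checked in a few lines when they are read as database-style integrity constraints. The schema, class names, and instance data below are hypothetical. Note the closed-world reading: RDFS and OWL treat domain and range as grounds for inferring types rather than for rejecting data.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical schema: predicate -> (domain class, range class).
SCHEMA: Dict[str, Tuple[str, str]] = {"ex:drives": ("ex:Person", "ex:Vehicle")}
# Hypothetical pairs of classes declared disjoint.
DISJOINT: List[Tuple[str, str]] = [("ex:Person", "ex:Vehicle")]


def violations(triples: List[Triple], types: Dict[str, Set[str]]) -> List[str]:
    """Check domain/range constraints on edges and disjointness on instance types."""
    problems: List[str] = []
    for s, p, o in triples:
        if p in SCHEMA:
            domain, range_ = SCHEMA[p]
            if domain not in types.get(s, set()):
                problems.append(f"domain: {s} is not a {domain}")
            if range_ not in types.get(o, set()):
                problems.append(f"range: {o} is not a {range_}")
    for instance, classes in types.items():
        for a, b in DISJOINT:
            if a in classes and b in classes:
                problems.append(f"disjoint: {instance} is both {a} and {b}")
    return problems
```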

Need for Provenance
  • Fiction: Ontologists write definitions ab initio
  • Reality: Most “definitions” are written by:
      • Administrators (e.g., Code of Federal Regulations)
      • Legislatures (legislation)
      • Judges (court decisions)
      • Professional bodies (accounting regulations)

Implications for Provenance
  • We need to track the provenance of definitions
  • Typically this requires citations to external documents
  • May also require tracking of individual “definition” decisions
  • Varying granularity requirements
      • Individual definitions
      • Collections of axioms, definitions
  • Examples: see ISO 11179, XMDR
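
The two granularities above can be modeled by letting citations attach either to a single definition or to a whole collection. This is a hypothetical data model for illustration, not the ISO 11179 or XMDR metamodel; all class and field names are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Citation:
    source: str   # external document, e.g. a regulation or court decision
    locator: str  # section, paragraph, or page within the source


@dataclass
class Definition:
    term: str
    text: str
    citations: List[Citation] = field(default_factory=list)


@dataclass
class DefinitionSet:
    """Coarser granularity: provenance attached to a whole collection."""
    name: str
    definitions: List[Definition] = field(default_factory=list)
    citations: List[Citation] = field(default_factory=list)

    def provenance_of(self, term: str) -> List[Citation]:
        """Citations on the definition itself, then those inherited from the set."""
        for d in self.definitions:
            if d.term == term:
                return d.citations + self.citations
        raise KeyError(term)
```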

Other Tools
  • Ontology Creation tools
  • Ontology Editors
  • Ontology Differencing tools
  • Ontology modularization tools (clustering, etc.)
  • Ontology Export
  • Ontology Visualization (e.g., graph visualization)
  • Version management
  • Access control

SOA: Service Oriented Architecture
  • Very popular
  • Permits distributed implementations
  • Two major alternatives:
      • REST (Representational State Transfer)
          • Built on HTTP (get, put, delete, post operators)
          • URL/URI addresses for all objects
      • SOAP/WSDL
          • Based on XML Remote Procedure Calls
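
The REST style described above amounts to a small fixed set of verbs applied to resources named by URI paths. The sketch below models that dispatch as a plain function so it runs without a network; the `/ontologies/<name>` path scheme and the status-code choices are assumptions, and a real service would sit behind an HTTP server.

```python
from typing import Dict, Tuple

Response = Tuple[int, bytes]  # (HTTP status code, body)


class OntologyResources:
    """In-memory stand-in for a REST service: one URI path per ontology."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def handle(self, method: str, path: str, body: bytes = b"") -> Response:
        if method == "GET":
            if path in self._store:
                return 200, self._store[path]
            return 404, b""
        if method == "PUT":  # create or replace the resource at this URI
            created = path not in self._store
            self._store[path] = body
            return (201 if created else 204), b""
        if method == "DELETE":
            if self._store.pop(path, None) is None:
                return 404, b""
            return 204, b""
        return 405, b""  # POST and other verbs are left out of this sketch
```

Because every ontology has its own address, clients need nothing beyond an HTTP library and a parser for whatever representation comes back.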

REST vs SOAP
  • REST
      • Simple to implement
      • Requires little more than:
          • HTTP server
          • XML parsers
  • SOAP
      • Much more software complexity
      • Lots of software tooling from commercial vendors
      • Better security?

My advice on REST vs. SOAP: Use REST.

Ontology Repository Related Standards
  • ISO/IEC 11179 Metadata Registries (version 3.0 of Part 3)
  • OMG ODM Ontology Definition Metamodel
  • ISO 13250 Topic Maps
  • XML Topic Maps Specification (topicmaps.org)
  • W3C OWL recommendations
  • W3C RDF recommendations

Ontology Related Standards
  • ISO/IEC 24707 Common Logic
  • ISO TC 37 Terminology Services Standards
  • W3C SKOS Simple Knowledge Organization System Reference
  • ISO/IEC 19763 Metamodel Framework for Interoperability (Ontology metadata)

Recapitulation
  • Ontology Repositories support storage, search, retrieval of multiple ontologies and ontology integration
  • Macro-level & Micro-level support and search pose similar problems
  • A common ontology representation is desirable, but difficult
  • Multiple ontology representations and ontology integration are the most difficult aspects.

Acknowledgements
This work was supported by NSF IPA agreement with LBNL, IRD support. My earlier work on ontology repositories at LBNL was supported by EPA and DOD. The author would like to thank Joel Sachs, Mark Musen, Natasha Noy, Eric Neumann, Bob MacGregor, Cliff Joslyn, Kevin Keck, Elise Kendall, Mala Mehrotra, Dan Abadi, Deb McGuinness, et al. for their remarks to me about knowledge representation, ontology repositories and ontology mappings.

Contact Information
Frank Olken
National Science Foundation
4201 Wilson Blvd., Suite 1125
Arlington, VA 22230
Email: folken@nsf.gov
Tel: 703-292-8930 (receptionist)
Tel: 703-292-7350 (direct)