eXtended Metadata Registries (XMDR)
NKOS Workshop, June 11, 2005

Bruce Bargmeyer, bebargmeyer@lbl.gov
Chair: ISO/IEC JTC 1/SC 32 - Data Management & Interchange
PI: XMDR Project
Lawrence Berkeley National Laboratory, University of California
WWW.XMDR.ORG

XMDR Project Draws Together
[Diagram labels: Users; ISO/IEC 11179 Metadata Registries (Metadata Registry, Structured Metadata, Data Standards; ISO/IEC JTC 1/SC 32); Terminology (Terminology, Thesaurus, Taxonomy, Ontology; ISO TC 37 & …); a meaning triangle with nodes CONCEPT, Referent, and the symbol “Rose” (“ClipArt”), with edges labeled “Symbolizes”, “Refers To”, and “Stands For”]

Align, Coordinate, Integrate Standards/Recommendations
[Diagram labels: Users; ISO/IEC 11179 Metadata Registries (Metadata Registry, Structured Metadata, Data Standards; ISO/IEC JTC 1/SC 32); Terminology (Terminology, Thesaurus, Taxonomy, Ontology; ISO TC 37 & …); Semantic Web (W3C); a meaning triangle with nodes CONCEPT, Referent, and the symbol “Rose” (“ClipArt”), with edges labeled “Symbolizes”, “Refers To”, and “Stands For”]

XMDR Metadata Registries Extensions
• Register (and manage) any semantics that are useful in managing data.
  - Enable users to find and register correspondences between the content (concepts & relationships) of multiple KOSs
  - Enable registration of links between concepts and data in databases
• Provide Semantic Services: lay the foundation for semantics-based computing (Semantic Service Oriented Architecture, Semantic Grids, semantics-based workflows, Semantic Web, …)
• Prepare standards proposals, especially for ISO/IEC 11179 Parts 2 & 3
• Develop a reference implementation

Current 11179 Metadata Registry Content
E.g., Country Identifier

Data Element Concept
  Name: Country Identifiers
  Context:
  Definition:
  Unique ID: 5769
  Conceptual Domain:
  Maintenance Org.:
  Steward:
  Classification:
  Registration Authority:
  Others
  Algeria, Belgium, China, Denmark, Egypt, France, ..., Zimbabwe

Data Elements
  Name:
  Context:
  Definition:
  Unique ID: 4572
  Value Domain:
  Maintenance Org.:
  Steward:
  Classification:
  Registration Authority:
  Others

  ISO 3166        ISO 3166        ISO 3166        ISO 3166        ISO 3166
  English Name    French Name     2-Alpha Code    3-Alpha Code    3-Numeric Code
  Algeria         L`Algérie       DZ              DZA             012
  Belgium         Belgique        BE              BEL             056
  China           Chine           CN              CHN             156
  Denmark         Danemark        DK              DNK             208
  Egypt           Egypte          EG              EGY             818
  France          La France       FR              FRA             250
  ...
  Zimbabwe                        ZW              ZWE             716
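
The relationship between a data element concept and its value domains can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and field names (`DataElementConcept`, `PermissibleValue`, `translate`) are hypothetical and are not taken from the ISO/IEC 11179 metamodel; the IDs and code values are the ones shown on the slide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PermissibleValue:
    """One row of the value domain (hypothetical structure)."""
    english_name: str
    alpha2: str
    alpha3: str
    numeric: str  # kept as text so leading zeros survive, e.g. "012"

@dataclass
class DataElementConcept:
    """Hypothetical stand-in for a registered data element concept."""
    unique_id: str
    name: str
    value_meanings: list

# Rows copied from the slide's ISO 3166 table.
ROWS = [
    ("Algeria", "DZ", "DZA", "012"),
    ("Belgium", "BE", "BEL", "056"),
    ("China", "CN", "CHN", "156"),
    ("Denmark", "DK", "DNK", "208"),
    ("Egypt", "EG", "EGY", "818"),
    ("France", "FR", "FRA", "250"),
    ("Zimbabwe", "ZW", "ZWE", "716"),
]

concept = DataElementConcept("5769", "Country Identifiers", [r[0] for r in ROWS])
values = [PermissibleValue(*row) for row in ROWS]
by_alpha2 = {v.alpha2: v for v in values}

def translate(alpha2_code):
    """Map a 2-alpha code to the numeric code through the shared value meaning."""
    return by_alpha2[alpha2_code].numeric

print(translate("DZ"))
```

Because every representation hangs off the same value meaning, translating between code sets reduces to a lookup.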

Use ISO/IEC 11179 MDR to Create XML Schemas, DBMS Schemas

Data Element List: Address Group 33 c
  Name
  Street Address
  City, State
  Postal Code
  Country

  <?xml version="1.0"?>
  <shipTo>
    <name>Alice Wilson</name>
    <street>161 North Street</street>
    <city>Happy Valley</city>
    <state>MO</state>
    <zip>63105</zip>
    <countryCode>USA</countryCode>
  </shipTo>
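
As a rough sketch of how a registered data element list might drive XML generation, the Python below builds the address instance from a list of (tag, label) pairs. The `ADDRESS_GROUP` list, the `build_ship_to` helper, and the `countryCode` tag name are assumptions made for illustration; they are not part of ISO/IEC 11179 or of the XMDR software.

```python
import xml.etree.ElementTree as ET

# Hypothetical data element list: (XML tag, registered label) pairs.
ADDRESS_GROUP = [
    ("name", "Name"),
    ("street", "Street Address"),
    ("city", "City"),
    ("state", "State"),
    ("zip", "Postal Code"),
    ("countryCode", "Country"),
]

def build_ship_to(values):
    """Create a <shipTo> element with one child per registered data element."""
    root = ET.Element("shipTo")
    for tag, label in ADDRESS_GROUP:
        if label not in values:
            raise KeyError(f"missing value for data element: {label}")
        ET.SubElement(root, tag).text = values[label]
    return root

ship_to = build_ship_to({
    "Name": "Alice Wilson",
    "Street Address": "161 North Street",
    "City": "Happy Valley",
    "State": "MO",
    "Postal Code": "63105",
    "Country": "USA",
})
xml_text = ET.tostring(ship_to, encoding="unicode")
print(xml_text)
```

The element order and names come from the list rather than from hand-written markup, which is the point of generating schemas from a registry.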

XMDR: Register Ontologies
[Diagram of concept nodes: Concept; Geographic Area; Geographic Sub-Area; Country Identifier; Country Name; Short Name; Long Name; Mailing Address Country Name; Distributor Country Name; Country Code; ISO 3166 2-Character Code; ISO 3166 3-Numeric Code; ISO 3166 3-Character Code; FIPS Code]

XMDR: Register Any Knowledge Organization System (KOS)
• Keywords
• Glossaries
• Gazetteers
• Thesauri
• Taxonomies
• Concept systems (ISO TC 37)
• Ontologies
• Axiomatized Ontologies

XMDR: Register Graphs
Graph Taxonomy
[Diagram nodes: Graph; Directed Graph; Undirected Graph; Directed Acyclic Graph; Bipartite Graph; Clique; Partial Order Graph; Faceted Classification; Lattice; Partial Order Tree; Ordered Tree]
Note: not all bipartite graphs are undirected.

Samples of Eco & Bio Graph Data
• Nutrient cycles in microbial ecologies: These are bipartite graphs, with two sets of nodes, microbes and reactants (nutrients), and directed edges indicating input and output relationships. Such nutrient cycle graphs are used to model the flow of nutrients in microbial ecologies, e.g., subsurface microbial ecologies for bioremediation.
• Chemical structure graphs: Here atoms are nodes, and chemical bonds are represented by undirected edges. Multielectron bonds are often represented by multiple edges between nodes (atoms), hence these are multigraphs. Common queries include subgraph isomorphism. Chemical structure graphs are commonly used in chemoinformatics systems, such as Chem Abstracts, MDL Systems, etc.
• Sequence data and multiple sequence alignments: DNA/RNA/protein sequences can be modeled as linear graphs.
• Topological adjacency relationships also arise in anatomy. These relationships differ from partonomies in that adjacency relationships are undirected and not generally transitive.

Eco & Bio Graph Data (Continued)
• Taxonomies of proteins, chemical compounds, and organisms, ...: These taxonomies (classification systems) are usually represented as directed acyclic graphs (partial orders or lattices). They are used when querying the pathways databases. Common queries are subsumption tests between two terms/concepts, i.e., is one concept a subset or instance of another. Note that some phylogenetic tree computations generate unrooted, i.e., undirected, trees.
• Metabolic pathways: chemical reactions used for energy production, synthesis of proteins, carbohydrates, etc. Note that these graphs are usually cyclic.
• Signaling pathways: chemical reactions for information transmission and processing. Often these reactions involve small numbers of molecules. Graph structure is similar to metabolic pathways.
• Partonomies are used in biological settings most often to represent common topological relationships of gross anatomy in multi-cellular organisms. They are also useful in sub-cellular anatomy, and possibly in describing protein complexes. They consist of part-of relationships (in contrast to the is-a relationships of taxonomies). Part-of relationships are represented by directed edges and are transitive. Partonomies are directed acyclic graphs.
• Data provenance relationships are used to record the source and derivation of data. Here, some nodes represent either individual "facts" or "datasets", and other nodes represent "data sources" (either labs or individuals). Edges between "datasets" and "data sources" indicate "contributed by". Other edges, between datasets (or facts), indicate "derived from" (e.g., via inference or computation). Data provenance graphs are usually directed acyclic graphs.
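
The acyclic versus cyclic distinction drawn above (taxonomies, partonomies, and provenance graphs versus metabolic pathways) can be tested mechanically. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); the two small graphs and their node names are invented for illustration and do not come from the slides.

```python
from graphlib import TopologicalSorter, CycleError

def is_acyclic(predecessors):
    """Return True if the directed graph admits a topological order.

    `predecessors` maps each node to the set of nodes it depends on.
    """
    try:
        # static_order() is lazy, so consume it to trigger cycle detection.
        list(TopologicalSorter(predecessors).static_order())
        return True
    except CycleError:
        return False

# Hypothetical provenance graph: each dataset lists what it was derived from.
provenance = {
    "summary_table": {"cleaned_data"},
    "cleaned_data": {"raw_data"},
    "raw_data": set(),
}

# Hypothetical metabolic loop: each compound lists the compound it is made from.
pathway = {"B": {"A"}, "C": {"B"}, "A": {"C"}}

print(is_acyclic(provenance), is_acyclic(pathway))
```

A registry could run such a check as an integrity constraint when a structure is declared to be a directed acyclic graph.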

A graph theoretic characterization
• Readily comprehensible characterization of metadata structures
• Graph structure has implications for:
  - Integrity constraint enforcement
  - Data structures
  - Query languages
  - Combining metadata sets
  - Algorithms for query processing

Example: Tree
Each node is part-of the node it is indented under:

California
  Alameda County
    Oakland
    Berkeley
  Santa Clara County
    Santa Clara
    San Jose
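
A tree like this one can be stored as a child-to-parent map, since every node has at most one parent. The following Python sketch (the names `PART_OF`, `ancestors`, and `is_part_of` are illustrative choices, not from the slides) walks that map to answer transitive part-of questions.

```python
# Child -> parent map for the part-of tree on the slide.
PART_OF = {
    "Alameda County": "California",
    "Santa Clara County": "California",
    "Oakland": "Alameda County",
    "Berkeley": "Alameda County",
    "Santa Clara": "Santa Clara County",
    "San Jose": "Santa Clara County",
}

def ancestors(node):
    """List every whole that `node` is part of, nearest first."""
    chain = []
    while node in PART_OF:
        node = PART_OF[node]
        chain.append(node)
    return chain

def is_part_of(part, whole):
    """part-of is transitive, so any ancestor counts."""
    return whole in ancestors(part)

print(ancestors("Oakland"))
```

The single-parent map is what distinguishes a tree from the more general directed acyclic graph, where a node may have several parents.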

Example: Ordered Tree
Paper, with parts (part-of) in order:
  Title page
  Section
  Bibliography
Note: implicit ordering relation among parts of paper.

Example: Faceted Classification
[Diagram: Vehicle with two facets. Wheeled Vehicle Facet: 2 wheeled, 3 wheeled, 4 wheeled. Propulsion Facet: Human Powered, Internal Combustion. Leaf nodes Bicycle, Tricycle, Auto, and Motorcycle are connected to facet values by is-a edges.]
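
A faceted classification can be approximated by giving each item one value per facet and filtering on any combination of facets. The flattened slide text does not preserve which leaf connects to which facet value, so the assignments in this Python sketch are assumptions based on ordinary usage, and the `FACETS` and `select` names are hypothetical.

```python
# Assumed facet assignments; the slide's edges were lost in extraction.
FACETS = {
    "Bicycle": {"wheels": "2 wheeled", "propulsion": "Human Powered"},
    "Tricycle": {"wheels": "3 wheeled", "propulsion": "Human Powered"},
    "Auto": {"wheels": "4 wheeled", "propulsion": "Internal Combustion"},
    "Motorcycle": {"wheels": "2 wheeled", "propulsion": "Internal Combustion"},
}

def select(**criteria):
    """Return the items whose facet values match every given criterion."""
    return sorted(
        name
        for name, facets in FACETS.items()
        if all(facets.get(facet) == value for facet, value in criteria.items())
    )

print(select(wheels="2 wheeled"))
print(select(wheels="2 wheeled", propulsion="Human Powered"))
```

Each facet is an independent axis, so queries can combine them freely without a fixed hierarchy.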

Example: Directed Acyclic Graph
[Diagram of is-a edges among: Vehicle; Wheeled Vehicle; Propelled Vehicle; 2 Wheeled Vehicle; 3 Wheeled Vehicle; 4 Wheeled Vehicle; Human Powered Vehicle; Internal Combustion Vehicle; Bicycle; Tricycle; Auto; Motorcycle]

Example: Partial Order Graph
[Diagram of is-a edges among: Vehicle; Wheeled Vehicle; Propelled Vehicle; 2 Wheeled Vehicle; 3 Wheeled Vehicle; 4 Wheeled Vehicle; Human Powered Vehicle; Internal Combustion Vehicle; Bicycle; Tricycle; Auto; Motorcycle]
Dashed line = inferred is-a (transitive closure)
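
The inferred is-a edges are the transitive closure of the asserted ones, and a subsumption test is a reachability check over them. The Python sketch below assumes a plausible reading of the diagram's asserted edges (the extraction does not preserve them); `IS_A`, `superconcepts`, and `subsumes` are illustrative names.

```python
# Asserted is-a edges: concept -> direct superconcepts (assumed reading of the slide).
IS_A = {
    "Wheeled Vehicle": {"Vehicle"},
    "Propelled Vehicle": {"Vehicle"},
    "2 Wheeled Vehicle": {"Wheeled Vehicle"},
    "3 Wheeled Vehicle": {"Wheeled Vehicle"},
    "4 Wheeled Vehicle": {"Wheeled Vehicle"},
    "Human Powered Vehicle": {"Propelled Vehicle"},
    "Internal Combustion Vehicle": {"Propelled Vehicle"},
    "Bicycle": {"2 Wheeled Vehicle", "Human Powered Vehicle"},
    "Tricycle": {"3 Wheeled Vehicle", "Human Powered Vehicle"},
    "Auto": {"4 Wheeled Vehicle", "Internal Combustion Vehicle"},
    "Motorcycle": {"2 Wheeled Vehicle", "Internal Combustion Vehicle"},
}

def superconcepts(concept):
    """All concepts reachable through is-a edges (asserted plus inferred)."""
    seen = set()
    stack = list(IS_A.get(concept, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(IS_A.get(parent, ()))
    return seen

def subsumes(general, specific):
    """Subsumption test: is `specific` a kind of `general`?"""
    return general == specific or general in superconcepts(specific)

# The "dashed" edges for Bicycle: reachable superconcepts that were not asserted.
inferred = superconcepts("Bicycle") - IS_A["Bicycle"]
print(sorted(inferred))
```

Storing only asserted edges and computing the closure on demand keeps the registry small; precomputing the closure trades space for faster subsumption queries.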

Example Lattice: Powerset of 3 element set

        {a, b, c}
  {a, b}  {a, c}  {b, c}
    {a}     {b}     {c}
        Empty Set

Each edge denotes subset.
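
The powerset lattice is small enough to generate directly. In this Python sketch the function names `powerset` and `hasse_edges` are illustrative; the edges computed are the covering pairs (a set and a superset with exactly one more element), and meet and join are set intersection and union.

```python
from itertools import combinations

def powerset(elements):
    """All subsets of `elements`, as frozensets, smallest first."""
    items = sorted(elements)
    return [
        frozenset(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    ]

def hasse_edges(sets):
    """Pairs (smaller, larger) where larger adds exactly one element."""
    return [
        (s, t)
        for s in sets
        for t in sets
        if s < t and len(t) == len(s) + 1
    ]

nodes = powerset({"a", "b", "c"})
edges = hasse_edges(nodes)

# In a powerset lattice, meet is intersection and join is union.
meet = frozenset({"a", "b"}) & frozenset({"b", "c"})
join = frozenset({"a"}) | frozenset({"c"})
print(len(nodes), len(edges), set(meet), sorted(join))
```

Every pair of nodes has a unique meet and join, which is the property that separates a lattice from a general partial order.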

Example Bipartite Graph

  States           Two-letter state codes
  California       CA
  Massachusetts    MA
  Oregon           OR
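
Whether a registered graph really is bipartite can be checked by trying to two-color it. The Python sketch below applies breadth-first two-coloring to the state and code pairs from the slide; the `two_color` function is an illustrative helper, not part of any XMDR interface.

```python
from collections import deque

# Edges from the slide: each state is linked to its two-letter code.
EDGES = [("California", "CA"), ("Massachusetts", "MA"), ("Oregon", "OR")]

def two_color(edges):
    """Return a node -> 0/1 coloring if the undirected graph is bipartite, else None."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    color = {}
    for start in adjacency:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in color:
                    color[neighbor] = 1 - color[node]
                    queue.append(neighbor)
                elif color[neighbor] == color[node]:
                    return None  # an odd cycle: not bipartite
    return color

coloring = two_color(EDGES)
print(coloring is not None)
```

A triangle such as a-b, b-c, c-a fails the check, because an odd cycle cannot be split into two sides.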

Example of Clique
Nodes: California, CA, Calif., CAL
Here edges denote synonymy.
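
A clique means every pair of nodes is connected, so a synonym set can be validated by checking all pairs. In this Python sketch the six edges are written out on the assumption that the slide's figure shows the complete graph on the four terms; `is_clique` is an illustrative helper.

```python
from itertools import combinations

TERMS = ["California", "CA", "Calif.", "CAL"]

# Synonymy edges, stored as unordered pairs (assumed complete, as a clique requires).
SYNONYMS = {
    frozenset({"California", "CA"}),
    frozenset({"California", "Calif."}),
    frozenset({"California", "CAL"}),
    frozenset({"CA", "Calif."}),
    frozenset({"CA", "CAL"}),
    frozenset({"Calif.", "CAL"}),
}

def is_clique(nodes, edges):
    """True if every pair of distinct nodes is joined by an edge."""
    return all(frozenset(pair) in edges for pair in combinations(nodes, 2))

print(is_clique(TERMS, SYNONYMS))
```

Dropping any single edge breaks the clique, which is one way to detect an incompletely registered synonym set.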

Example Compound Graph
Colin Powell claimed Iraq had WMDs

Challenges
• How to register & manage the various graph structures?
  - DBMS, file systems, …
• How to query the graph structures?
  - XQuery for XML
  - Poor to non-existent graph query languages
• How to get adequate performance, even in a high performance computing environment
• User interface complexity
• How to manage semantic drift
• Versions
• How to interrelate graphs with other graphs and with data
• Granularity at which to register metadata (then point to greater detail elsewhere?)

ISO/IEC 11179 Metadata Registries + XMDR
• Register and manage semantics that are or can be harmonized and vetted by some Community of Interest (COI)
• Provide Semantic Services
• E.g., the semantics can be referenced by RDF statements (subjects, predicates, objects)
• The semantics can be used to bootstrap the Semantic Web and Semantic Computing
  - A “vocabulary” that is grounded for some COI
• Enable registration using formal logic as well as natural language

Terminological & Formal (Axiomatized) Ontologies
The difference between a terminological ontology and a formal ontology is one of degree: as more axioms are added to a terminological ontology, it may evolve into a formal or axiomatized ontology. Cyc has the most detailed axioms and definitions; it is an example of an axiomatized or formal ontology. WordNet is usually considered a terminological ontology.
Source: Building, Sharing, and Merging Ontologies, John F. Sowa

An Axiom for an Axiomatized Ontology
Definition: The resource_cost_point predicate, cpr, specifies the cost_value, c, (monetary units) of a resource, r, required by an activity, a, up to a certain time point, t. If a resource of the terminal use or consume states, s, for an activity, a, are enabled at time point, t, there must exist a cost_value, c, at time point, t, for the activity, a, that uses or consumes the resource, r. The time interval, ti = [ts, te], during which a resource is used or consumed by an activity is specified in the use or consume specifications as use_spec(r, a, ts, te, q) or consume_spec(r, a, ts, te, q), where activity, a, uses or consumes quantity, q, of resource, r, during the time interval [ts, te]. Hence,

Axiom: ∀ a, s, r, q, ts, te, (use_spec(r, a, ts, te, q) ∧ enabled(s, a, t)) ∨ (consume_spec(r, a, ts, te, q) ∧ enabled(s, a, t)) ≡ ∃c, cpr(a, c, t, r)

Source: Cost Ontology for TOronto Virtual Enterprise (TOVE)

XMDR: Vocabularies – Vetted with a Community of Interest to Bootstrap the Semantic Web and Semantic Computing

  dbA:e0139 --ai:MailingAddress--> dbA:ma344 --ai:StateUSPSCode--> “AB”^^ai:StateCode

  @prefix dbA: “http:/www.epa.gov/databaseA”
  @prefix ai: “http://www.epa/gov/edr/sw/AdministeredItem#”
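
The two statements on this slide can be modeled as subject, predicate, object triples and queried by pattern matching. The Python sketch below uses plain tuples rather than an RDF library; the prefixed identifiers are a de-garbled reading of the slide, the typed literal is represented as a (value, datatype) pair, and the `match` and `state_code_of` helpers are hypothetical.

```python
# Triples read from the slide; the literal is a (value, datatype) pair.
TRIPLES = [
    ("dbA:e0139", "ai:MailingAddress", "dbA:ma344"),
    ("dbA:ma344", "ai:StateUSPSCode", ("AB", "ai:StateCode")),
]

def match(subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t
        for t in TRIPLES
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]

def state_code_of(record):
    """Follow record -> mailing address -> state code, or return None."""
    for _, _, address in match(subject=record, predicate="ai:MailingAddress"):
        for _, _, literal in match(subject=address, predicate="ai:StateUSPSCode"):
            return literal
    return None

print(state_code_of("dbA:e0139"))
```

Because the predicates and the datatype come from a registered vocabulary (the `ai:` prefix), two databases that use them can be queried the same way.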

ISO/IEC 11179 Part 3 Metamodel (Metadata Registry Schema)
Expressed in:
• UML model
XMDR: Also express in:
• OWL Ontology
• Common Logic

XMDR Prototype: Modular Architecture - Initial Implemented Modules
[Architecture diagram labels, in slide order: External Interface; RegistryStore; Registry; Java; WritableRegistryStore; Subversion; AuthenticationService; RetrievalIndex; MetadataValidator; Jena, Xerces; FullTextIndex; Lucene; MappingEngine; 11179 OWL Ontology; LogicBasedIndex; Jena, OWI KS, Racer; Ontology Editor; Protege. Legend: Generalization; Composition (tight ownership); Aggregation (loose ownership)]

XMDR Content Priority List: Phase 1
• (V.A.) National Drug File Reference Terminology
• DTIC Thesaurus (Defense Technical Information Center Thesaurus)
• NCI Thesaurus (National Cancer Institute Thesaurus)
• NCI Data Elements (National Cancer Institute Data Standards Registry)
• UMLS (non-proprietary portions)
• GEMET (General Multilingual Environmental Thesaurus)
• EDR Data Elements (Environmental Data Registry)
• ISO 3166 Country Codes, from EPA EDR
• USGS Geographic Names Information System (GNIS)

XMDR Content Priority List: Phase 2
• LOINC (Logical Observation Identifiers Names and Codes)
• ITIS (Integrated Taxonomic Information System)
• Getty Thesaurus of Geographic Names (TGN)
• SIC (Standard Industrial Classification System)
• NAICS (North American Industry Classification System)
• NAICS-SIC mappings
• UNSPSC (United Nations Standard Products and Services Codes)
• EPA Chemical Substance Registry System
• EPA Terminology Reference System
• ISO Language Identifiers (ISO 639-3, Part 3)
• IETF Language Identifiers (RFC 1766)
• Units Ontology

XMDR Content Priority List: Phase 3
• HL7 Terminology
• HL7 Data Elements
• GO (Gene Ontology)
• NBII Biocomplexity Thesaurus
• EPA Web Registry Controlled Vocabulary
• BioPAX Ontology
• NASA SWEET Ontologies
• NDRTF

Project Status
This year:
• Building XMDR Core (content & system)
Next year:
• Extend XMDR Core (content & system)
• R&D on Semantic Services in a Semantic Service Oriented Architecture
Expected to be a five-year project

XMDR Project Participants
• Collaborative, interagency effort
  - US Environmental Protection Agency
  - United States Geological Survey
  - National Cancer Institute
  - Mayo Clinic
  - US Department of Defense
  - Lawrence Berkeley National Laboratory
  - & others
• Interagency/International Cooperation on Ecoinformatics
  - EPA, European Environment Agency, UNEP
  - Ecoterm

Job Posting
• Vacancy announcement: research position to be posted at LBNL. Looking for a person with education and experience in semantics, terminology, and system development.
WWW.XMDR.ORG

Acknowledgements and References
• Frank Olken, LBNL
• Kevin Keck, LBNL
• John McCarthy, LBNL