Ontology and Its Applications. Barry Smith. http://ontologist.com, www.ifomis.org
OVERVIEW. Part I: A Brief Overview of Developments in Ontology at the Borderlines of Philosophy and Computation. Part II: Ontology and Biomedical Informatics.
IFOMIS is now part of the European Centre for Ontological Research, Saarbrücken, Germany.
Institute for Formal Ontology and Medical Information Science: 16 staff (2 medical informaticians, 1 neurologist, 1 chemist, 1 radiologist, 2 computer scientists, 9 philosophers).
The problem: different communities of researchers use different and often incompatible concepts / categories in expressing the results of their work.
Example: Medicine. "Blood is a tissue" versus "blood is a body fluid". How to integrate competing conceptualizations?
Example: Molecular Biology. GDB: Genome Database of the Human Genome Project. GenBank: National Center for Biotechnology Information, Washington DC.
What is a gene? GDB: a gene is a DNA fragment that can be transcribed and translated into a protein. GenBank: a gene is a DNA region of biological interest with a name and that carries a genetic trait or phenotype.
How to integrate competing conceptualizations, for example across the granular divide between medicine and molecular biology?
Answer: ONTOLOGY! But what does “ontology” mean?
Three senses of ‘ontology’:
1. Philosophical sense. Aristotle: an inventory of the types of entities and relations in reality. Quine: an inventory of ontological commitments.
2. Knowledge engineering sense: an ontology as a consensus representation of the concepts used in a given domain.
3. Gene Ontology sense: a controlled vocabulary for database annotation / indexing.
Two Communities. Reference Ontology Community: an ontology is an inventory of the types of entities and relations which exist in a given domain of reality. KR Community: an ontology is a consensus representation of the concepts used in a given domain of discourse.
“Ontology” as used in KR / AI had its roots in Quine’s doctrine of ontological commitment and in the ‘internal metaphysics’ of Carnap/Putnam.
Quineanism: ontology is the study of the ontological commitments or presuppositions embodied in scientific theories (or in the beliefs of those experts, or in the databases of that company).
Quineanism, too, faces the integration problem. If an ontology is the set of ontological commitments of a theory, how can we cope with questions pertaining to the relations between the objects to which different theories are committed? Quine can tell us what there is, but can he tell us how it is related together?
The problem of the unity of science. The logical positivist solution to this problem addressed a world in which sciences are identified with printed texts. What if sciences are identified with information systems or with the contents of websites?
The Semantic Web Initiative. The Web is a vast edifice of heterogeneous data sources. It needs the ability to query and integrate across different and often incompatible conceptual systems.
How to resolve such incompatibilities and make the various parts of the web interoperable? Enforce conceptual compatibility via standardized taxonomies applied to websites as meta-tags formulated within the framework of a common web language like OWL.
Tim Berners-Lee: hyperlinked vocabularies, called ‘ontologies’, will be used by Web authors ‘to explicitly define their words and concepts as they post their stuff online.’ Such ‘codes would let software "agents" analyze the Web on our behalf, making smart inferences that go far beyond the simple linguistic analyses performed by today's search engines.’
A new silver bullet
Metadata in Web commerce: agree on a metadata standard for washing machines as concerns size, price, etc.; create machine-readable databases and put them on the net; consumers can query multiple sites simultaneously and search for highly specific, reliable, context-sensitive results.
Metadata in science: agree on metadata standards for molecules (genes, proteins, drugs), clinical phenomena, therapies ...; create machine-readable databases and put them on the net; biomedical researchers can query multiple sites simultaneously and search for highly specific, reliable, context-sensitive results.
A world of exhaustive, reliable metadata would be utopia (Cory Doctorow)
Problem 1: People lie. Cheating in assigning meta-tags can confer benefits on the cheaters. Metadata exists in a competitive world. Some people are crooks. Some people are cranks.
Semantic Web effort thus far devoted primarily to developing systems for standardized representation of web pages and web processes (= ontology of web typography), not to the harder task of developing ontologies (reliable taxonomies, term hierarchies) for the content of such web pages.
Problem 2: People are lazy. Half the pages on Geocities are called “Please title this page”.
Problem 3: People are stupid. The vast majority of the Internet's users (even those who are native speakers of English) cannot spell or punctuate. Will internet users learn to accurately tag their information with whatever taxonomy and syntax they're supposed to be using?
even with correct XML syntax:
<BUSINESS-CARD>
  <FIRSTNAME>Jules</FIRSTNAME>
  <LASTNAME>Deryck</LASTNAME>
  <COMPANY>Newco</COMPANY>
  <MEMBEROF>XTC Group</MEMBEROF>
  <JOBTITLE>Business Manager</JOBTITLE>
  <TEL>+32(0)3.471.99.60</TEL>
  <FAX>+32(0)3.891.99.65</FAX>
  <GSM>+32(0)465.23.04.34</GSM>
  <WEBSITE>www.newco.com</WEBSITE>
  <ADDRESS>
    <STREET>Dendersesteenweg 17</STREET>
    <ZIP>2630</ZIP>
    <CITY>Aartselaar</CITY>
    <COUNTRY>Belgium</COUNTRY>
  </ADDRESS>
</BUSINESS-CARD>
errors still abound:
Is "Jules" the first name of the person, or of the business card?
Is Jules or Newco the member of XTC Group?
Do the phone numbers and address belong to Jules or to the business?
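A minimal sketch, using Python's standard xml.etree.ElementTree, of why well-formed markup leaves these questions open. The parser recovers only the nesting of elements, so FIRSTNAME, MEMBEROF and TEL all come back as plain children of BUSINESS-CARD, with nothing saying whom or what each one describes. The trimmed card and the variable names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the business card, kept short for illustration.
CARD = """
<BUSINESS-CARD>
  <FIRSTNAME>Jules</FIRSTNAME>
  <COMPANY>Newco</COMPANY>
  <MEMBEROF>XTC Group</MEMBEROF>
  <TEL>+32(0)3.471.99.60</TEL>
</BUSINESS-CARD>
"""

root = ET.fromstring(CARD.strip())

# The only relation the syntax gives us is "child of the card element".
parent_of = {child.tag: root.tag for child in root}
```

Every tag maps to the same parent, so the syntax alone cannot say whether MEMBEROF applies to Jules or to Newco.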
Problem 4: Building good ontologies / standardized taxonomies is very difficult, and the constraints imposed by OWL and similar languages make the job even harder.
Problem 5: Ontology Impedance = semantic mismatch between ontologies. ‘gene’ as used in websites issued by: biotech companies involved in gene patenting; medical researchers interested in the role of genes in predisposition to smoking; insurance companies.
Problem 6: The Concept Orientation. Tom Gruber: an ontology is a specification of a conceptualization. Semantic Web: specify Tom’s, and Dick’s, and Harry’s conceptualizations carefully, ensure that all are formulated in a common (XML-based) syntax. Presto: conceptualizations will somehow become integrated.
even a world of exhaustive, reliable metadata would not solve the problem of integration
expressing different systems of concepts in a common syntactic environment does not resolve conceptual incompatibilities
different conceptualizations need not interconnect at all
we cannot make incompatible terminology-systems interconnect just by looking at concepts, or knowledge, or language
to decide which of a plurality of competing conceptualizations to accept we need some tertium quid
we need, in other words, to take the world itself into account
Compare the way biologists resolve disagreements as to whether they mean the same thing by different words: by pointing to the objects in their lab.
The Semantic Web is a machine for creating syllogisms (Clay Shirky). Humans are mortal. Greeks are human. Therefore, Greeks are mortal.
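The syllogism above amounts to chaining "every A is a B" statements. A small sketch of that chaining follows; the function name `entails` and the pair encoding of premises are assumptions for illustration, not part of any Semantic Web standard.

```python
def entails(premises, sub, sup):
    """Return True if 'every sub is a sup' follows by chaining the premise pairs."""
    seen, frontier = set(), [sub]
    while frontier:
        term = frontier.pop()
        if term == sup:
            return True
        if term in seen:
            continue
        seen.add(term)
        # Follow every premise of the form (term, broader_term).
        frontier.extend(b for a, b in premises if a == term)
    return False

# "Greeks are human" and "Humans are mortal", encoded as (narrower, broader).
premises = {("Greek", "human"), ("human", "mortal")}
```

Calling `entails(premises, "Greek", "mortal")` follows the two premises in turn; the reverse query finds no chain.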
Lewis Carroll:
No interesting poems are unpopular among people of real taste.
No modern poetry is free from affectation.
All your poems are on the subject of soap-bubbles.
No affected poetry is popular among people of real taste.
No ancient poetry is on the subject of soap-bubbles.
Therefore: All your poems are bad.
the promise of the Semantic Web: it will improve all the areas of your life where you currently use syllogisms
Semantic Web: compatibility problems should be solved automatically (by machine). Hence ontologies must be applications running in real time.
Semantic Web methodology: get syntax right first (Conceptualism; weak expressive resource; weak Description Logics, to ensure computational tractability) and integration of ‘concepts’ will take care of itself, but only at the price of Procrustean simplification.
IFOMIS methodology: get ontology right first (use powerful logic to develop ontology as theory of reality and solve tractability problems later); only thus will we have some hope of genuine integration across different disciplines and data resources.
Belnap: “It is a good thing logicians were around before computer scientists; if computer scientists had got there first, then we wouldn’t have numbers, because arithmetic is undecidable.”
It is a good thing philosophical ontology was around before Description Logics, because otherwise we would have only hierarchies of concepts together with abstract mathematical models, and no universals or instances in reality ...
Recall. GDB: a gene is a DNA fragment that can be transcribed and translated into a protein. GenBank: a gene is a DNA region of biological interest with a name and that carries a genetic trait or phenotype.
Ontology: ‘fragment’, ‘region’, ‘name’, ‘carry’, ‘trait’, ‘type’ ... ‘part’, ‘whole’, ‘function’, ‘inhere’, ‘substance’ ... are ontological terms in the sense of traditional (philosophical) ontology.
The idea of a reference ontology: a theory of the kinds of entities existing in reality and of the relations between them.
The Reference Ontology Community: IFOMIS (Saarbrücken); Laboratories for Applied Ontology (Trento/Rome, Turin); Ontology Works (Baltimore); Department of Biological Structure (Seattle); Medical Ontology Research (Bethesda); The Gene Ontology / Open Biological Ontologies Consortium.
IFOMIS’s long-term goal: build a robust high-level reference ontology, THE WORLD’S FIRST INDUSTRIAL-STRENGTH PHILOSOPHY, as the basis for an ontologically coherent unification of biomedical knowledge and terminology.
Two upper-level reference ontologies: BFO, Basic Formal Ontology (Saarbrücken); DOLCE (Trento/Rome).
Aristotle: first ontologist
Edmund Husserl
Formal Ontology: term coined by Husserl = theory of those ontological structures, such as part-whole and universal-particular, which apply to all domains whatsoever.
Husserl’s Logical Investigations, 1900/01: Aristotelian theory of universals and particulars; theory of part and whole; theory of ontological dependence; theory of boundaries and fusion.
Formal Ontology contrasted with material or regional ontologies (compare the relation between pure and applied mathematics). Husserl’s idea: if we can build a good formal ontology, this should save time and effort in building reference ontologies for each successive material domain.
In formal ontology as in formal logic, we can grasp the properties of given structures in such a way as to establish in one go the properties of all formally similar structures.
Compare:
1) pure mathematics (theories of structures such as order, set, function, mapping) employed in every domain
2) applied mathematics, applications of these theories = re-using the same definitions, theorems, proofs in new application domains
3) physical chemistry, biophysics, etc. = adding detail
Three levels of ontology:
1) formal (top-level) ontology = ??? (biomedical ontology has nothing like the technology of definitions, theorems and proofs provided by pure mathematics)
2) domain ontology = UMLS Semantic Network, GO, GALEN CORE
3) terminology-based ontology = UMLS, SNOMED-CT, GALEN, FMA
The Concept Orientation: an ontology is a consensus representation of concepts.
‘concept’ runs together: a) meaning shared in common by synonymous terms; b) idea shared in common in the minds of those who use these terms; c) universal, type, feature or property shared in common by entities in the world.
There are more word meanings than there are universals / types of entities in reality: unicorn, devil, canceled workshop, prevented pregnancy, imagined mammal, fractured lip ...
[Figure: space of word meanings and space of universals]
if ontological relations are defined across the whole space of word meanings rather than across the space of universals instantiated in reality, then our tools for dealing with such relations are blunted
‘meningitis is_a disease of the nervous system’ is a statement about universals in reality.
A is_a B =def. ‘A’ is narrower in meaning than ‘B’. Example: unicorn is_a one-horned mammal.
The linguistic reading of ‘concept’ yields a smudgy view of reality, built out of relations like ‘synonymous_with’ and ‘associated_to’.
[Figure (Goble & Shadbolt): nodes Fruit, Vegetable, Orange, Apfelsine; edge labels SimilarTo, NarrowerThan, SynonymWith]
The concept-based approach can provide some half-way coherent treatment of is_a relations
but it can’t cope at all with relations like: part_of =def. composes, with one or more other physical units, some larger whole; contains =def. is the receptacle for fluids or other substances
connected_to =def. directly attached to another physical unit, as tendons are connected to muscles. How can a meaning or concept be directly attached to another physical unit as tendons are connected to muscles?
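The point about connected_to can be sketched as a relation whose arguments must be physical units. In the hypothetical model below, every name (`Term`, `kind`, the two-way split into "physical" and "concept") is an assumption introduced for illustration; it is not drawn from UMLS or any existing library.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Term:
    name: str
    kind: str  # "physical" or "concept" (hypothetical two-way split)

def connected_to(a: Term, b: Term) -> tuple:
    """Assert 'a connected_to b', accepting only physical units as arguments."""
    for term in (a, b):
        if term.kind != "physical":
            raise TypeError(f"{term.name!r} is a {term.kind}, not a physical unit")
    return (a.name, "connected_to", b.name)

tendon = Term("tendon", "physical")
muscle = Term("muscle", "physical")
tendon_concept = Term("concept of tendon", "concept")
```

Here `connected_to(tendon, muscle)` succeeds, while `connected_to(tendon_concept, muscle)` raises a TypeError, which is the mismatch the slide's question points at.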
An example of the concept orientation: Unified Medical Language System (UMLS)
UMLS Metathesaurus: 1 million biomedical concepts; 2.8 million concept names from more than 100 controlled vocabularies and classifications; built by the US National Library of Medicine.
UMLS Source Vocabularies: MeSH (Medical Subject Headings) ... ICD (International Classification of Diseases) ... GO (Gene Ontology) ... FMA (Foundational Model of Anatomy) ...
To reap the benefits of standardization we need to make ONE SYSTEM out of many different terminologies = UMLS “Semantic Network”, the nearest thing to an “ontology” in the UMLS.
UMLS SN described by its authors as “An Upper Level Ontology for the Biomedical Domain” (compare the Semantic Web initiative).
UMLS SN: 134 Semantic Types; 54 types of edges (relations); yielding a graph containing more than 6,000 edges.
[Figure: Fragment of UMLS SN]
UMLS SN Top Level: entity, physical object, event, conceptual entity, organism
conceptual entity: Organism Attribute, Finding, Idea or Concept, Occupation or Discipline, Organization, Group Attribute, Intellectual Product, Language
conceptual entity > idea or concept > functional concept > body system
entity: physical object; conceptual entity > idea or concept > functional concept > body system. Confusion of entity and concept.
Functional Concept: Body system is_a Functional Concept. But: concepts do not perform functions or have physical parts.
This: [image] is not a concept
Confusion of Ontology and Epistemology. Physical Object > Substance: Food, Chemical, Body Substance.
Confusion of Ontology and Epistemology. Chemical Viewed Structurally; Chemical Viewed Functionally.
Chemical Viewed Structurally: Inorganic Chemical, Organic Chemical. Chemical Viewed Functionally: Enzyme, Biomedical or Dental Material.
The Hydraulic Equation: BP = CO*PVR. Arterial blood pressure is directly proportional to the product of blood flow (cardiac output, CO) and peripheral vascular resistance (PVR).
Confusion of Ontology and Epistemology: blood pressure is an Organism Function; cardiac output is a Laboratory or Test Result or Diagnostic Procedure. BP = CO*PVR thus asserts that blood pressure is proportional either to a laboratory or test result or to a diagnostic procedure.
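A small sketch of the mismatch just described. The semantic types are the ones reported on the slide; the dictionary, the function names and the numeric inputs are assumptions chosen only to make the arithmetic concrete, not clinical values from the source.

```python
# Semantic types as reported on the slide (not queried from UMLS itself).
SEMANTIC_TYPE = {
    "blood pressure": "Organism Function",
    "cardiac output": "Laboratory or Test Result / Diagnostic Procedure",
}

def blood_pressure(cardiac_output, resistance):
    """Hydraulic equation BP = CO * PVR; consistent units are the caller's job."""
    return cardiac_output * resistance

def same_category(term_a, term_b):
    """True if both terms are filed under the same semantic type."""
    return SEMANTIC_TYPE[term_a] == SEMANTIC_TYPE[term_b]

# Illustrative numbers only: CO = 5.0 and PVR = 18.0 in mutually consistent units.
bp = blood_pressure(5.0, 18.0)
```

The equation treats both quantities as physiological magnitudes, yet `same_category("blood pressure", "cardiac output")` is False under the typing above.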
[Figure: Fragment of UMLS SN]
UMLS Semantic Network: anatomical abnormality associated_with daily or recreational activity; educational activity associated_with pathologic function; bacterium causes experimental model of disease.
GO: the Gene Ontology. 3 large telephone directories of standardized designations for gene functions and products, organized into hierarchies via is_a and part_of.
When a gene is identified, three important types of questions need to be addressed:
1. Where is it located in the cell?
2. What functions does it have on the molecular level?
3. To what biological processes do these functions contribute?
GO’s three ontologies: biological processes, molecular functions, cellular components.
GO is three ontologies: cellular components, molecular functions, biological processes. December 16, 2003: 1372 component terms, 7271 function terms, 8069 process terms.
The Cellular Component Ontology (counterpart of anatomy): flagellum, chromosome, membrane, cell wall, nucleus.
The Molecular Function Ontology: ice nucleation, protein stabilization, kinase activity, binding. The Molecular Function ontology is (roughly) an ontology of actions on the molecular level of granularity.
Biological Process Ontology. Examples: glycolysis, death, adult walking behavior, response to blue light = occurrents on the level of granularity of cells, organs and whole organisms.
Each of GO’s ontologies is organized in a graph-theoretical structure involving two sorts of links or edges: is-a (= is a subtype of), e.g. copulation is-a biological process; part-of, e.g. cell wall part-of cell.
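The two-edge graph structure just described can be sketched as follows. This is an illustrative toy, not GO's actual data model or file format; the class name `TermGraph` and its methods are assumptions, and the only edges loaded are the two examples from the slide.

```python
from collections import defaultdict

class TermGraph:
    """Directed graph whose edges are labelled is_a or part_of."""

    def __init__(self):
        self.edges = defaultdict(set)  # child -> {(relation, parent)}

    def add(self, child, relation, parent):
        if relation not in ("is_a", "part_of"):
            raise ValueError(f"unknown relation: {relation}")
        self.edges[child].add((relation, parent))

    def ancestors(self, term, relation):
        """All terms reachable from term by following only `relation` edges."""
        found, stack = set(), [term]
        while stack:
            for rel, parent in self.edges.get(stack.pop(), ()):
                if rel == relation and parent not in found:
                    found.add(parent)
                    stack.append(parent)
        return found

go = TermGraph()
go.add("copulation", "is_a", "biological process")
go.add("cell wall", "part_of", "cell")
```

Keeping the relation label on each edge lets a query follow is_a and part_of separately, which matters once the two relations are given distinct formal definitions.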
GO is species-independent: an ontology of the unchanging universal building blocks of life (substances and processes) and of the structures they form.
The Gene Ontology is error-prone, in part because of its sloppy treatment of relations. Example: menopause part_of death.
Primary aim of GO: not rigorous definition and principled classification, but rather providing a practically useful framework for keeping track of the biological annotations that are applied to gene products.
Problems with GO Molecular Functions: anti-coagulant activity (defined as: “a substance that retards or prevents coagulation”); enzyme activity (defined as: “a substance that catalyzes”); structural molecule (defined as: “the action of a molecule that contributes to structural integrity”).
GO:0005199: structural constituent of cell wall. Definition: The action of a molecule that contributes to the structural integrity of a cell wall. This confuses actions, which GO includes in its function ontology, with constituents, which GO includes in its cellular component ontology.
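A category slip of the kind quoted above (an activity term defined as "a substance ...") is mechanical enough to check for. The sketch below is a hypothetical lint, not a GO tool; the function name and the simple prefix test are assumptions, and the definitions are the three quoted on the slide.

```python
def misdefined_functions(definitions):
    """Return function terms whose definition describes a substance, not an activity."""
    return sorted(
        term
        for term, text in definitions.items()
        if text.lower().startswith("a substance")
    )

# Definitions as quoted on the slide.
GO_FUNCTION_DEFS = {
    "anti-coagulant activity": "a substance that retards or prevents coagulation",
    "enzyme activity": "a substance that catalyzes",
    "structural molecule": "the action of a molecule that contributes to structural integrity",
}

flagged = misdefined_functions(GO_FUNCTION_DEFS)
```

A prefix test like this catches only one pattern; the third entry shows the opposite slip (a molecule defined as an action) and would need its own rule.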
[Figure: cars; red cars; Cadillacs; cars with radios]
Why do these problems arise? Because GO has no clear formal understanding of the role of relations in organizing an ontology (thus also no clear understanding of the difference between a function and the activity which is the realization of a function; GO runs these two together).
Thesis: GO can realize its goal more adequately (and avoid many coding errors) by taking ontology (especially the logic of classifications and definitions) seriously.
Digital Anatomist Foundational Model of Anatomy (Department of Biological Structure, University of Washington, Seattle): the first crack in the wall of the Concept Orientation.
[Figure: FMA fragment linked by is_a and part_of. Classes shown: Anatomical Structure, Anatomical Space, Organ, Organ Part, Organ Component, Organ Subdivision, Organ Cavity, Organ Cavity Subdivision, Serous Sac, Serous Sac Cavity, Serous Sac Cavity Subdivision, Tissue, Pleural Sac, Pleural Cavity, Pleura (Wall of Sac), Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Mesothelium of Pleura, Interlobar recess]
Reference Ontology for Anatomy at every level of granularity. [Figure: part_of hierarchy with Pleural Sac, Pleural Cavity, Pleura (Wall of Sac), Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Mesothelium of Pleura, Interlobar recess, Tissue, Cell, Organelle]
The Gene Ontology: the second crack in the wall. European Bioinformatics Institute, ... Open source. Transgranular. Cross-species. Components, Processes, Functions.
But: no logical structure; viciously circular definitions; poor rules for coding, definitions, treatment of relations, classifications; so highly error-prone.
New GO / OBO Reform Effort. OBO = Open Biological Ontologies.
OBO Library: Gene Ontology, MGED Ontology, Cell Ontology, Disease Ontology, Sequence Ontology, Fungal Ontology, Plant Ontology, Mouse Anatomy Ontology, Mouse Development Ontology ...
coupled with the Relations Ontology (IFOMIS): a suite of relations for biomedical ontology, to be submitted to CEN as basis for standardization of biomedical ontologies, plus alignment of FMA and GALEN.
END