How to Build an Ontology Barry Smith http://ontology.buffalo.edu/smith 1
Mission of the NCBO To create software and support services for science-based ontology development and use in the biomedical domain Science-based = ontologies for support of scientific research (taken as encompassing evidence-based medicine) Science-based = using the scientific method as part of the process of ontology development and testing 2
Scientific ontologies have special features Every term in a scientific ontology must be such that the developers of the ontology believe it to refer to some entity on the basis of the best current evidence 5
For scientific ontologies: reusability is crucial; compatibility with neighboring scientific ontologies is crucial; it should not be too easy to add new terms to an ontology. We want to introduce these features in clinical medicine... 6
An Ontological Square Upper-level integrating ontologies Domain ontologies 10
An Ontological Square Upper-level integrating ontologies Domain ontologies Ontologies in support of science Administrative ontologies 11
An Ontological Square. Ontologies in support of science: upper-level integrating ontologies BFO (Basic Formal Ontology), DOLCE; domain ontologies SNOMED, SwissProt, FMA. Administrative ontologies (for e-commerce, etc.): upper-level FOAF (top level: person, topic, document, primary topic...); domain ontologies Amazon.com ontology, Library of Congress Catalog 12
Problem of ensuring sensible cooperation in a massively interdisciplinary community: concept, type, instance, model, representation, data 13
from: Handbook of Ontology (Semantic Web approach): RetailPrice hasA Denomination InstanceOf Dollar (p. 101); SI-Unit instanceof System-of-Units (p. 40) 14
from: Ontological Engineering (Semantic Web approach): location =def. a spatial point identified by a name (p. 12); arrivalPlace =def. a journey ends at a location (p. 13); facet =def. ternary relation that holds between a frame, a slot, and the facet (p. 51) 15
Entity =def anything which exists, including things and processes, functions and qualities, beliefs and actions, documents and software (Levels 1, 2 and 3) 16
First basic distinction universal vs. instance (science text vs. diary) (man vs. Maximilian) 17
Instances databases For scientific ontologies it is generalizations that are important = universals, types, kinds, species 18
Catalog vs. inventory. A: 515287, DC 3300 Dust Collector Fan; B: 521683, Gilmer Belt; C: 521682, Motor Drive Belt 19
Catalog vs. inventory 20
Catalog of Universals/Types 21
Ontology Universals Instances 22
Ontology = A Representation of Universals 23
Each node of an ontology consists of: • preferred term (aka term) • term identifier (TUI, aka CUI) • synonyms • definition, glosses, comments Ontology = A representation of universals 24
An ontology is a representation of universals We learn about universals in reality from looking at the results of scientific experiments in the form of scientific theories – which describe not what is particular in reality but what is general 25
universals substance organism animal mammal cat siamese leaf class frog instances 26
Domain =def a portion of reality that forms the subject matter of a single science or technology or mode of study or administrative practice...; examples: proteomics, HIV, epidemiology 27
Representation =def an image, idea, map, picture, name or description. . . of some entity or entities. 28
Ontologies are representational artifacts comparable to science texts 29
The Periodic Table 33
Ontologies are here 34
or here 35
What do ontologies represent? 36
Ontologies do not represent concepts in people’s heads 37
They represent universals in reality 38
“leg” is not the name of a concept; concepts do not stand in part_of, connectedness, causes, treats... relations to each other 39
instances: A: 515287, DC 3300 Dust Collector Fan; B: 521683, Gilmer Belt; C: 521682, Motor Drive Belt. universals 40
Inventory vs. Catalog: Two kinds of composite representational artifacts Databases represent instances Ontologies represent universals 41
How do we know which general terms designate universals? Roughly: terms used by scientists to designate entities about which we have a plurality of different kinds of testable proposition (cell, electron. . . ) 42
Problem: fiat demarcations: ‘male over 30 years of age with family history of diabetes’; ‘abnormal curvature of spine’; ‘participant in trial #2030’ 43
Problem: roles: fist, patient, FDA-approved drug 44
Administrative ontologies often need to go beyond universals: ‘Fall on stairs or ladders in water transport injuring occupant of small boat, unpowered’; ‘Railway accident involving collision with rolling stock and injuring pedal cyclist’; ‘Nontraffic accident involving motor-driven snow vehicle injuring pedestrian’ 45
universals vs. classes universals {a, b, c, . . . } classes 46
Class =def a maximal collection of particulars determined by a general term (‘cell’, ‘electron’), (‘restaurant in Palo Alto’, ‘Italian’). The class A = the collection of all particulars x for which ‘x is A’ is true 47
Problem The same general term can be used to refer both to universals and to collections of particulars. Consider: HIV is an infectious retrovirus HIV is spreading very rapidly through Asia 48
universals vs. classes universals {c, d, e, . . . } classes 49
Extension =def The extension of a universal A is the class: instance of the universal A (it is the class of A’s instances) (the class of all entities to which the term ‘A’ applies) 50
universals vs. classes universals defined classes 51
universals vs. classes universals populations, . . . 52
Defined class =def a class defined by a general term which does not designate a universal the class of all diabetic patients in Leipzig on 4 June 1952 53
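The difference between the extension of a universal and a defined class can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Particular` record, its fields, and the sample individuals are hypothetical and do not come from the slides; a general term that does not designate a universal is modelled simply as a predicate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Particular:
    """An individual (instance); all field names here are illustrative."""
    name: str
    universal: str   # label of the universal this particular instantiates
    city: str
    diabetic: bool


def extension(universal, particulars):
    """The class of all instances of the given universal."""
    return {p.name for p in particulars if p.universal == universal}


def defined_class(condition, particulars):
    """A class picked out by an arbitrary general term, modelled as a predicate."""
    return {p.name for p in particulars if condition(p)}


# Hypothetical sample data.
particulars = [
    Particular("Anna", "human being", "Leipzig", True),
    Particular("Bert", "human being", "Buffalo", True),
    Particular("Carl", "human being", "Leipzig", False),
    Particular("Tibbles", "cat", "Leipzig", False),
]

humans = extension("human being", particulars)
diabetics_in_leipzig = defined_class(
    lambda p: p.universal == "human being" and p.diabetic and p.city == "Leipzig",
    particulars,
)
print(sorted(humans))                # ['Anna', 'Bert', 'Carl']
print(sorted(diabetics_in_leipzig))  # ['Anna']
```

Both results are classes of particulars, but only the first is fixed by a universal; the second depends on a fiat combination of conditions.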
OWL is a good representation of defined classes • sibling of Finnish spy • member of Abba aged > 50 years 54
Terminology =def. a representational artifact whose representational units are natural language terms (with IDs, synonyms, comments, etc. ) which are intended to designate universals together with defined classes. 55
universals, classes, concepts universals defined classes ‘concepts’ 56
universals < defined classes < ‘concepts’ which do not correspond to defined classes: ‘Surgical or other procedure not carried out because of patient's decision’ ‘Absent nipple’ 57
(Scientific) Ontology =def. a representational artifact whose representational units (which may be drawn from a natural or from some formalized language) are intended to represent 1. universals in reality 2. those relations between these universals which obtain universally (= for all instances) lung is_a anatomical structure lobe of lung part_of lung 58
Part II How to Build an Ontology 59
How to build an ontology • work with scientists to create an initial top-level classification • find the ~50 most commonly used terms corresponding to universals in reality • arrange these terms into an informal is_a hierarchy according to the Universality principle: A is_a B means every instance of A is an instance of B • fill in missing terms to give a complete hierarchy (leave it to domain scientists to populate the lower levels of the hierarchy) 60
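The Universality principle lends itself to a simple sanity check on a draft hierarchy. The sketch below assumes a hypothetical child-to-parent dictionary and hypothetical sets of known instances; the term names echo the slides, but the data and the function name `is_a_violations` are invented for illustration. Passing the check is only evidence for an is_a link (known instances may be incomplete); failing it is a counterexample.

```python
# Hypothetical is_a assertions (child -> parent).
IS_A = {
    "siamese": "cat",
    "cat": "mammal",
    "mammal": "animal",
    "frog": "mammal",   # deliberately wrong, to show the check firing
}

# Hypothetical known instances of each universal.
INSTANCES = {
    "siamese": {"Tibbles"},
    "cat": {"Tibbles", "Felix"},
    "mammal": {"Tibbles", "Felix", "Maximilian"},
    "animal": {"Tibbles", "Felix", "Maximilian", "Kermit"},
    "frog": {"Kermit"},
}


def is_a_violations(is_a, instances):
    """Return (child, parent, counterexamples) for each is_a link that has
    a known instance of the child which is not an instance of the parent."""
    problems = []
    for child, parent in is_a.items():
        counterexamples = instances.get(child, set()) - instances.get(parent, set())
        if counterexamples:
            problems.append((child, parent, sorted(counterexamples)))
    return problems


print(is_a_violations(IS_A, INSTANCES))
# [('frog', 'mammal', ['Kermit'])]
```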
Principle of Low Hanging Fruit Include even absolutely trivial assertions (assertions you know to be universally true) pneumococcal virus is_a virus Computers need to be led by the hand 61
MeSH Descriptors: Index Medicus Descriptor; Anthropology, Education, Sociology and Social Phenomena (MeSH Category); Social Sciences; Political Systems; National Socialism is_a Anthropology... 62
Principle Use singular nouns Terms in ontologies represent universals 63
Goal: Each term in an ontology represents exactly one universal there are universals also of collectivities: population complex of cells 64
the use-mention confusion Conceptual Entities =Def. An organizational header for concepts representing mostly abstract entities. swimming is healthy and has eight letters 65
Principle: Avoid confusing words with things. Avoid confusing concepts in our minds with entities in reality. Recommendation: avoid the word ‘concept’ entirely 66
Trialbank ‘information’ = def. ‘a written or spoken designation of a concept’ 67
Trialbank ‘Heparin therapy’ is an instance of ‘written or spoken designation of a concept’ What are the problems here? 1. misuse of quotation marks 2. confusion of instances and universals 3. confusion of concept and reality 68
Plant Ontology cell = def. plant cell, consisting of protoplast and cell wall; . . . 69
Principle For the sake of interoperability with other ontologies, do not give special meanings to terms with established general meanings (Don’t use ‘cell’ when you mean ‘plant cell’) 70
ICNP: International Classification for Nursing Practice. water =def. a type of Nursing Phenomenon of Physical Environment with the specific characteristics: clear liquid compound of hydrogen and oxygen that is essential for most plant and animal life influencing life and development of human beings. 71
Principle Supply definitions wherever possible (both human-understandable natural language definitions, and equivalent formal definitions) 72
Principle Each term should have at most one definition* *which may have both natural-language and formal versions 73
The Problem of Circularity A Person = def. A person with an identity document cell = def. plant cell, consisting of protoplast and cell wall; . . . 74
Principle Avoid circular definitions (The term defined should not appear in its own definition) 75
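A first-pass check for this kind of circularity is easy to automate. The following is a minimal sketch, assuming definitions are available as plain strings; the function name `is_circular` is hypothetical. It only catches the literal reappearance of the term (it would miss plurals such as ‘cells’ and circles that run through several definitions).

```python
import re


def is_circular(term, definition):
    """True if the term being defined occurs as a whole phrase in its own definition."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, definition.lower()) is not None


# The two circular examples from the slides.
print(is_circular("person", "A person with an identity document"))               # True
print(is_circular("cell", "plant cell, consisting of protoplast and cell wall"))  # True

# A made-up non-circular definition, for contrast.
print(is_circular("lobe of lung", "an anatomical structure which is part of a lung"))  # False
```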
HL7 ‘stopping a medication’ = def. change of state in the record of a Substance Administration Act from Active to Aborted 76
Principle A definition should use terms which are easier to understand than the term defined (HL7 creates a topsy-turvy world, in which simple things are made difficult) 77
Principle Use Aristotelian definitions An A is a B which C’s. 78
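The pattern ‘An A is a B which C’s’ can be captured as a small record with a genus (the B) and a differentia (the C). The class and field names below are assumptions made for illustration, and the example definition is adapted from the Plant Ontology wording quoted earlier, moved to the term ‘plant cell’ where it belongs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AristotelianDefinition:
    term: str          # the A being defined
    genus: str         # the B, i.e. the is_a parent of A
    differentia: str   # the C, what marks out the As among the Bs

    def render(self):
        """Natural-language form: 'An A is a B which C.'"""
        article = "An" if self.term[0].lower() in "aeiou" else "A"
        genus_article = "an" if self.genus[0].lower() in "aeiou" else "a"
        return f"{article} {self.term} is {genus_article} {self.genus} which {self.differentia}."

    def is_a(self):
        """The is_a link implied by the definition, as (child, parent)."""
        return (self.term, self.genus)


plant_cell = AristotelianDefinition(
    term="plant cell",
    genus="cell",
    differentia="consists of protoplast and cell wall",
)
print(plant_cell.render())
# A plant cell is a cell which consists of protoplast and cell wall.
print(plant_cell.is_a())
# ('plant cell', 'cell')
```

Because each definition names exactly one genus, writing definitions this way yields the is_a hierarchy as a by-product.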
Principle Do not seek to define everything 79
In every ontology some terms and some relations are primitive = they cannot be defined (on pain of infinite regress) Examples of primitive relations: identity instance_of 80
Rules for formatting terms • Avoid abbreviations even when it is clear in context what they mean (‘breast’ for ‘breast tumor’) • Avoid acronyms • Avoid mass terms (‘tissue’, ‘brain mapping’, ‘clinical research’...) • Treat each term ‘A’ in an ontology as shorthand for a term of the form ‘the universal A’ 83
Univocity Terms should have the same meanings on every occasion of use. (They should refer to the same universals) Basic ontological relations such as is_a and part_of should be used in the same way by all ontologies 84
Universality Ontologies should include only those relational assertions which hold universally pneumococcal virus causes pneumonia 85
Universality Often, order will matter: We can assert adult transformation_of child but not child transforms_into adult 86
Universality: viral pneumonia caused_by virus, but not: virus causes pneumonia; pneumococcal virus causes pneumonia 87
Universality protocol-design earlier_than results analysis but not results analysis later_than protocol-design 88
Positivity: Complements of universals are not themselves universals. Terms such as ‘non-mammal’, ‘non-membrane’, ‘other metalworker in New Zealand’ do not designate universals in reality 89
Ontology of universals ≠ logic of terms. There are no conjunctive and disjunctive universals: ‘anatomic structure, system, or substance’; ‘musculoskeletal and connective tissue disorder’; ‘rheumatism, excluding the back’ 90
Objectivity: Which universals exist in reality is not a function of our knowledge. Terms such as ‘unknown’, ‘unclassified’, ‘unlocalized arthropathies’, ‘not otherwise specified’ do not designate universals in reality. 91
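The Positivity, no-conjunction and Objectivity principles can be approximated by a label lint that flags suspicious terms for human review. This is a rough heuristic sketch, not a definitive test: the rule names, the regular expressions and the function `lint_label` are all assumptions made here, and a match only means a curator should look at the term.

```python
import re

# Hypothetical heuristics, one per principle.
RULES = [
    ("positivity", re.compile(r"^(non-|other\b)", re.IGNORECASE)),
    ("conjunction", re.compile(r"\b(and|or)\b", re.IGNORECASE)),
    ("objectivity", re.compile(
        r"\b(unknown|unclassified|unlocalized|not otherwise specified)\b",
        re.IGNORECASE)),
]


def lint_label(label):
    """Return the names of the principles a term label appears to violate."""
    return [name for name, pattern in RULES if pattern.search(label)]


for label in [
    "non-mammal",
    "other metalworker in New Zealand",
    "anatomic structure, system, or substance",
    "unlocalized arthropathies",
    "lobe of lung",
]:
    print(label, "->", lint_label(label))
```

With these rules, ‘lobe of lung’ produces an empty list, while each of the other labels is flagged under exactly one principle.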
Keep Epistemology Separate from Ontology If you want to say that We do not know where A’s are located do not invent a new class of A’s with unknown locations (A well-constructed ontology should grow linearly; it should not need to delete classes or relations because of increases in knowledge) 92
Keep Sentences Separate from Terms If you want to say I surmise that this is a case of pneumonia do not invent a new class of surmised pneumonias 93
Single Inheritance No kind in a classificatory hierarchy should have more than one is_a parent on the immediate higher level 94
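Single inheritance is straightforward to check mechanically. The sketch below assumes the asserted is_a links are available as (child, parent) pairs; the function name `multiple_parents` and the sample links are illustrative, using the ‘blue car’ case as the offending term.

```python
from collections import defaultdict


def multiple_parents(is_a_links):
    """Map each term with more than one asserted is_a parent to its parents."""
    parents = defaultdict(set)
    for child, parent in is_a_links:
        parents[child].add(parent)
    return {child: sorted(ps) for child, ps in parents.items() if len(ps) > 1}


# Hypothetical asserted links.
links = [
    ("car", "thing"),
    ("blue thing", "thing"),
    ("blue car", "car"),
    ("blue car", "blue thing"),
]
print(multiple_parents(links))
# {'blue car': ['blue thing', 'car']}
```

An empty result means every term has at most one asserted parent, which is what the principle requires of the asserted hierarchy.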
Multiple Inheritance: car is_a thing; blue thing is_a thing; blue car is_a car; blue car is_a blue thing 95
Multiple Inheritance is a source of errors encourages laziness serves as obstacle to integration with neighboring ontologies hampers use of Aristotelian methodology for defining terms hampers use of statistical search tools 96
Multiple Inheritance thing blue thing car is_a 1 is_a 2 blue car 97
is_a Overloading The success of ontology alignment demands that ontological relations (is_a, part_of, . . . ) have the same meanings in the different ontologies to be aligned. 98
Compositionality The meanings of compound terms should be determined 1. by the meanings of component terms together with 2. the rules governing syntax 99
Why do we need rules/standards for good ontology? Ontologies must be intelligible both to humans (for annotation and curation) and to machines (for reasoning and error-checking): the lack of rules for classification leads to human error and blocks automatic reasoning and error-checking Intuitive rules facilitate training of curators and annotators Common rules allow alignment with other ontologies 100
ontologies are legends for cartoons 101
Randomized controlled trials http://rctbank.ucsf.edu/ontology/outline/index.htm 102
Top-Level Class Hierarchy for RCT: Root; Secondary-study; Trial-details; Trial; Concept • Generic-concept • Population-concept • Protocol-concept • Design-concept • Outcome-concept • Administrative-concept • Intervention-concept 103
Trial Details: Root; Secondary-study; Trial-details • Erratum • Publication-details • Trial-entry-details • Administrative-details – Secondary-administrative-details – Primary-administrative-details » Executed-administrative-details » Intended-administrative-details • Conclusion-details • Background-details – Intended-background-details – Executed-background-details • Stopping-details • Retraction-details • Correction-details • Fraud-details 104
Concept • Generic-concept – Term-information – Time-entity – Rule-concept – Situation • Population-concept – Subgroup – Recruitment-flowchart – Population – Recruitment – Site-enrollment • Protocol-concept – Follow-up-compliance – Follow-up-activity – Follow-up – Protocol-change – Treatment-assignment – Protocol – Reason – Outcomes-followup – Secondary-study-protocol 106
Concept • Design-concept – Survival-analysis-and-results – Statistical-analysis-and-results – Sample-size-calculation – Trial-design – Hypothesis-concept – Study-objective – Study-monitoring – Regression-analysis-and-results – Stopping-rule • Outcome-concept – Special-variable-information – Outcome-assessment – Miscellaneous-outcome-entity – Result-entity – Outcome-value-entity – Outcome 107
Concept • Administrative-concept – Publication-concept – Study-site – Person – Ethics – Study-committee – Funder – Institution – Registry-id • Intervention-concept – Blinding-concept – Compliance-details – Intervention-step – Intervention-arm – Co-intervention – Intervention – Compliance-result – Intervention-logic 108
What the top level should look like 110
Two kinds of entities occurrents (processes, events, happenings) continuants (objects, qualities, states. . . ) 111
Continuants (aka endurants) have continuous existence in time preserve their identity through change exist in toto whenever they exist at all Occurrents (aka processes) have temporal parts unfold themselves in successive phases exist only in their phases 112
You are a continuant. Your life is an occurrent. You are 3-dimensional. Your life is 4-dimensional 113
Dependent entities require independent continuants as their bearers There is no run without a runner There is no grin without a cat 114
Dependent vs. independent continuants Independent continuants (organisms, buildings, environments) Dependent continuants (quality, shape, role, propensity, function, status, power, right) 115
All occurrents are dependent entities They are dependent on those independent continuants which are their participants (agents, patients, media. . . ) 116
BFO Top-Level Ontology Continuant Independent Continuant Occurrent (always dependent on one or more independent continuants) Dependent Continuant 117
= A representation of top-level types Continuant Occurrent biological process Independent Continuant Dependent Continuant cell component molecular function 118
Top-Level Ontology Continuant Independent Continuant Occurrent Dependent Continuant Function Side-Effect, Stochastic Process, . . . Functioning 119
Top-Level Ontology Continuant Independent Continuant Dependent Continuant Occurrent Functioning Side-Effect, Stochastic Process, . . . Function 120
Top-Level Ontology Continuant Independent Continuant Quality Dependent Continuant Function Occurrent Functioning Side-Effect, Stochastic Process, . . . Spatial Region instances (in space and time) 121
CTO will be part of OBI (Ontology for Biomedical Investigations), http://obi.sourceforge.net, which is in turn part of the OBO Foundry, http://obofoundry.org 124
Top-Level Class Hierarchy for RCT: Root; Secondary-study; Trial-details; Trial; Concept • Generic-concept • Population-concept • Protocol-concept • Design-concept • Outcome-concept • Administrative-concept • Intervention-concept 132
Amended Top-Level Class Hierarchy for RCT: Entity • Continuant – Population – Protocol – Design • Occurrent – Trial – Secondary-study – Intervention • ?? Trial-details • ?? Outcome-concept • ?? Administrative-concept 133
Concept • Generic-concept – Term-information – Time-entity – Rule-concept » Clinical-rule Exclusion-rule Inclusion-rule » Rule-entity Recursive-rule Base-rule » Ethnicity-language-rule » Age-gender-rule » Situation 134
Concept • Protocol-concept – Follow-up-compliance – Follow-up-activity – Follow-up – Protocol-change – Treatment-assignment – Protocol – Reason – Outcomes-followup – Secondary-study-protocol 137
Amended Top-Level Class Hierarchy for RCT Entity Continuant Protocol • Secondary-study-protocol Reason Occurrent • Treatment-assignment • Follow-up – Follow-up-activity – Outcomes-follow-up • Protocol-change 138
Concept • Population-concept – Subgroup – Recruitment-flowchart – Population – Recruitment – Site-enrollment 139
Amended Top-Level Class Hierarchy for RCT Entity Continuant Protocol • Secondary-study-protocol Recruitment-flowchart Reason Population • Subgroup Occurrent • Priors – Recruitment – Site-enrollment – Treatment-assignment • Follow-up – Follow-up-activity – Outcomes-follow-up • Protocol-change 140
Concept • Administrative-concept – Publication-concept – Study-site – Person – Ethics – Study-committee – Funder – Institution – Registry-id 141
Continuant • Information object – Publication – Registry-ID • Study-site • Person • Institution – Study-committee – Funder • ??? Ethics 142
Concept • Intervention-concept – Blinding-concept – Compliance-details – Intervention-step – Intervention-arm – Co-intervention – Intervention – Compliance-result – Intervention-logic 143
Occurrent • Intervention – Blinding – Intervention-step – Intervention-arm – Co-intervention • ??? Intervention-logic • ??? Compliance-result • ??? Compliance-details 144
END 167