ECOR: European Centre for Ontological Research

Application of Ontology in Cancer Bioinformatics
Dr. Werner Ceusters, MD
Executive Director, European Centre for Ontological Research
Saarland University, Saarbrücken, Germany

11th World Conference on Medical Informatics
San Francisco, 7-11/9/2004
• 759 papers
• 48 contain the word “bioinformatics”
• 124 contain “cancer”
• 1 contains “cancer bioinformatics”
• But: about 50 deal with cancer bioinformatics
• 89 contain “ontology”

Ontology-related Cancer Bioinformatics at MEDINFO 2004
• A Log Likelihood Predictor for Genomic Classification of Oral Cancer using Principle Component Analysis for Feature Selection
• Methods for Multi-Category Cancer Diagnosis from Gene Expression Data: A Comprehensive Evaluation to Inform Decision Support System Development
• A Text Mining Approach to Enable Detection of Candidate Risk Factors
• Cancer-related Complementary and Alternative Medicine Online: Factors Affecting Information Retrieval (by patients)
• Development of the ICNP based cancer nursing information system
• NCI Thesaurus: Using Science-Based Terminology to Integrate Cancer Research Results
• Extraction of Diagnosis Related Terminological Info from Discharge Summary
• Automated Clinical Annotation of Tissue Bank Specimens
• Mining OMIM for Insight into Complex Diseases
• A new parameter enhancing breast cancer detection in computer aided diagnosis of X-ray mammograms
• Tools for the Performance of Clinical Trials Research
• Formal Representation of Medical Goals for Medical Guidelines

Goals of Cancer Bioinformatics
• To integrate molecular, biological and clinical knowledge about cancer with analytic methods from bioinformatics.
• The ultimate aim is to create comprehensive prognostic and predictive models as aids to diagnosis, treatment and the design of new therapeutics.

Task descriptions
• Sequence similarity searching
  – Nucleic acid vs nucleic acid: 28
  – Protein vs protein: 39
  – Translated nucleic acid vs protein: 6
  – Unspecified sequence type: 29
  – Search for non-coding DNA: 9
• Functional motif searching: 35
• Sequence retrieval: 27
• Multiple sequence alignment: 21
• Restriction mapping: 19
• Secondary and tertiary structure prediction: 14
• Other DNA analysis including translation: 14
• Primer design: 12
• ORF analysis: 11
• Literature searching: 10
• Phylogenetic analysis: 9
• Protein analysis: 10
• Sequence assembly: 8
• Location of expression: 7
• Miscellaneous: 7
Stevens R, Goble C, Baker P, and Brass A. A Classification of Tasks in Bioinformatics. Bioinformatics 2001; 17(2): 180-188.

Three major challenges
• Analyse massive amounts of data:
  – E.g. high-throughput technologies based upon cDNA or oligonucleotide microarrays for analysis of gene expression, analysis of sequence polymorphisms and mutations, and sequencing
• Appropriately link clinical histories to molecular or other biomarker data generated by genomic and proteomic technologies.
• Develop user-friendly computer-based platforms
  – that can be accessed and utilized by the average researcher for searching, retrieval, manipulation, and analysis of information from large-scale datasets

Words of Wisdom
• “Ontology” is too often not taken seriously, and only a few people understand that. But there is hope:
  – The promise of Web Services, augmented with the Semantic Web, is to provide THE major solution for integration, the largest IT cost / sector, at $500 BN/year. The Web Services and Semantic Web trends are heading for a major failure (i.e., the most recent Silver Bullet). In reality, Web Services, as a technology, is in its infancy... There is no technical solution (i.e., no basis) other than fantasy for the rest of the Web Services story. Analyst claims of maturity and adoption (...) are already false... Verizon must understand it so as not to invest too heavily in technologies that will fail or that will not produce a reasonable ROI.
    Dr. Michael L. Brodie, Chief Scientist, Verizon IT

Setup of this presentation
• Look at some popular views, statements, claims, systems, beliefs, ... about “ontology”, and indicate where and how they fail to do justice to what ontology is actually about;
• Explain the basics of the principled approach that we use and give examples of practical applications;
• Some comments on the future of ontology in Buffalo and the US.

System Integration approaches: Data Integration
1. Data Warehousing: Data from various data sources are converted, merged and stored in a centralized DBMS. Example: Integrated Genomic Database
2. Hyperlinking approaches: Links are set up between related information and data sources. Examples: SRS, Entrez (NCBI)
3. Standardization: Efforts which address the need for a common metadata model for various application domains.
4. Integration systems: Systems that can gather and integrate information from multiple sources. Some of these systems have a Mediator-Wrapper Architecture; others are language-based systems like Bio-Kleisli.
5. Federated Database: Cooperating, yet autonomous, databases map their individual schemas to a single global schema. Operations are performed against the federated schema.
Steve Brady
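
The federated database approach listed above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two sources, their field names, the mapping tables and the helper functions `to_global` and `federated_query` are hypothetical and do not come from any of the systems named on the slide.

```python
# Hypothetical local records from two autonomous sources (illustrative data only).
SOURCE_A = [{"gene_symbol": "TP53", "chrom": "17"}]
SOURCE_B = [{"symbol": "BRCA1", "chromosome": "17"}]

# Each source maps its own field names onto one global (federated) schema.
MAPPINGS = {
    "A": {"gene_symbol": "gene", "chrom": "chromosome"},
    "B": {"symbol": "gene", "chromosome": "chromosome"},
}


def to_global(record, mapping):
    """Rename a local record's fields to the global schema's field names."""
    return {mapping[field]: value for field, value in record.items()}


def federated_query(predicate):
    """Run one query, phrased against the global schema, across all sources."""
    sources = {"A": SOURCE_A, "B": SOURCE_B}
    results = []
    for name, records in sources.items():
        for record in records:
            row = to_global(record, MAPPINGS[name])
            if predicate(row):
                results.append(row)
    return results


genes_on_17 = federated_query(lambda row: row["chromosome"] == "17")
```

Note that such a mapping only aligns field names. It does not establish that the two sources mean the same thing by "gene", which is the kind of gap the later slides address.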

Data integration approaches: at least, the beginnings of...
• Protein interaction databases
• Small molecule databases
• Genome databases
• Pathway databases
• Protein databases
• Enzyme databases
• Gene Ontology

GO deals with basic ontological notions very haphazardly
• GO’s three main term-hierarchies are:
  – component, function and process
• But GO confuses functions with structures, and also with executions of functions
• and has no clear account of the relation between functions and processes

A flavour of ontology
<!-- ********************************
  Description of a location in a lipid bilayer membrane
  Field description for BIND-membrane
    not-specified = somewhere in membrane
    outer-surface = on the outer surface of the membrane
    within        = within the bilayer
    inner-surface = on the inner surface of the membrane
    lumen         = in the lumen that the membrane surrounds
******************************** -->
<!ELEMENT BIND-membrane %ENUM; >
<!ATTLIST BIND-membrane value ( not-specified | outer-surface | within | inner-surface | lumen ) #REQUIRED >

Mereo-topology
[Diagram: hierarchy of mereo-topological relations, including HAS-OVERLAPPING-REGION, HAS-PARTIAL-SPATIAL-OVERLAP, IS-SPATIAL-PART-OF, IS-PROPER-SPAT.-PART-OF, HAS-DISCRETED-REGION, HAS-SPATIAL-PART, HAS-PROPER-SPATIAL-PART, HAS-SPATIAL-POINT-REFERENCE, HAS-CONNECTING-REGION and HAS-DISCONNECTED-REGION, with further tangential and non-tangential part, spatial equivalence, convex-hull and inside-of relations]

caCORE: The NCICB Cancer Informatics Infrastructure Backbone
• cancer Bioinformatics Infrastructure Objects: biomedical objects to facilitate the communication and integration of information from the various initiatives supported by the NCICB
• cancer Data Standards Repository: meta-data used for cancer research
• NCI Enterprise Vocabulary Services: standard vocabularies for a variety of settings in the life sciences

caBIO architecture
Connectivity at programming interface level, NOT content

CoMeDIAS (France)

GenesTrace™: Biological Knowledge Discovery via Structured Terminology

But... Talking to each other does not mean Understanding each other

Pray your computer isn’t Irish...
X: “Hallo stranger, you appear to be traveling?”
Y: “Yes, I always travel when on a journey.”
X: “And pray, what might your name be?”
Y: “It might be Sam Patch, but it isn't.”
X: “Have you been long in these parts?”
Y: “Never longer than at present, 5 feet 9.”
X: “Do you get anything new?”
Y: “Yes, I bought a new whetstone this morning.”
Copyright © 1996 Electronic Historical Publications

Cancer Data Standards Repository (caDSR)
• One of the problems confronting the biomedical data management community is the panoply of ways that similar or identical concepts are described.
• Amen!?
• But it would be more appropriate to say:
  – THE problem confronting the biomedical data management community is that concepts are described.

Triadic models of meaning: The Semiotic/Semantic triangle
• Reference: Concept / Sense / Model / View
• Sign: Language / Term / Symbol
• Referent: Reality / Object

“Ontology”
• In Information Science:
  – “An ontology is a description (like a formal specification of a program) of the concepts and relationships that can exist for an agent or a community of agents.”
• In Philosophy:
  – “Ontology is the science of what is, of the kinds and structures of objects, properties, events, processes and relations in every area of reality.”

Why are concepts not enough?
• Why must our theory address also the referents in reality?
  – Because referents are observable fixed points in relation to which we can work out how the concepts used by different communities relate to each other;
  – Because only by looking at referents can we establish the degree to which concepts are good for their purpose.

NCI Enterprise Vocabulary Services environment

NCI Thesaurus
• a biomedical thesaurus created specifically to meet the needs of the NCI
• semantically modeled cancer-related terminology built using description logic

Why description logics are not enough
SNOMED-RT (2000), SNOMED-CT (2003)

Underspecification
[Figure labels: new-1, new-2]

Use of description logics does not guarantee correct representations!

It’s not just a problem in Healthcare
Ontologies for Legal Information Serving and Knowledge Management
Joost Breuker, Abdullatif Elhag, Emil Petkov and Radboud Winkels

Ontology versus Description Logics
• In the Description Logic world
  – terms and definitions come first,
  – the job is to validate them and reason with them
• In the realist ontology world
  – robust ontology (with all its reasoning power) comes first
  – and terms and term-hierarchies must be subjected to the constraints of ontological coherence

Search for “cancer”

NCI Thesaurus Root concepts
• Anatomic Structure, Anatomic System, or Anatomic Substance: Or? Does the NCI not know to which category any item classified there belongs?
• Substance? If yes, why is gene product not subsumed by it? If no, why are drugs and chemicals not subsumed by it?

Conceptual entity
• Definition: none
• Semantic type:
  – Conceptual entity
  – Classification
• Subconcepts:
  – Action
    • Definition: action; a thing done
  – And
    • Definition: an article which expresses the relation of connection or addition, used to conjoin a word with a word, ...
  – Classification
    • Definition: the grouping of things into classes or categories

Definition of “cancer gene”

NCI Thesaurus architecture
• “Kinds” restrict the domain and range of associative relationships
• “Associative” relationships providing “differentiae”
• “Formal subsumption” or “inheritance” (ISA)
• Example: Breast neoplasm (Findings-And-Disorders-Kind) ISA Disease; Breast neoplasm Disease-has-associated-anatomy Breast (Anatomy-Kind)
• What diseases have a diameter of over 3 cm?
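
How “kinds” can restrict the domain and range of an associative relationship is sketched below in Python. This is a toy illustration under stated assumptions, not the NCI Thesaurus data model: the dictionaries `KIND_OF` and `RELATIONSHIPS` and the function `is_well_kinded` are hypothetical names, and only the concept, kind and relationship labels are taken from the slide.

```python
# Hypothetical miniature of the slide's example; not the NCI Thesaurus data model.
KIND_OF = {
    "Disease": "Findings-And-Disorders-Kind",
    "Breast neoplasm": "Findings-And-Disorders-Kind",
    "Breast": "Anatomy-Kind",
}

# Each associative relationship declares the kinds allowed as its domain and range.
RELATIONSHIPS = {
    "Disease-has-associated-anatomy": (
        "Findings-And-Disorders-Kind",  # domain
        "Anatomy-Kind",                 # range
    ),
}


def is_well_kinded(subject, relationship, obj):
    """Check that subject and object belong to the kinds the relationship allows."""
    domain, range_ = RELATIONSHIPS[relationship]
    return KIND_OF[subject] == domain and KIND_OF[obj] == range_
```

For example, `is_well_kinded("Breast neoplasm", "Disease-has-associated-anatomy", "Breast")` holds, while swapping subject and object does not. A check of this sort constrains only which statements may be written down; it says nothing about whether a statement is true of the entities concerned.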

Problems with C - rel - C
• Ad hoc readings of statements of the type C1-relationship-C2
  – Human has-part head // Human has-part finger
  – California is-part-of United States // California isa name
  – labial vein isa vein of head // labial vein isa vulval vein
• Concepts do not necessarily correspond to something that exists, has existed or will exist
  – Sorcerer, unicorn, leprechaun, ...
• Definitions set the conditions under which terms may be used, and may not be abused as conditions an entity must satisfy to be what it is
• Language can make strings of words look as if they were terms
  – “Middle lobe of left lung”
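
The ambiguity in statements of the type C1-relationship-C2 can be made concrete with a small Python sketch. It contrasts two possible readings of “Human has-part X” over instance-level data. The data, the individuals and the function names `all_some` and `some_some` are assumptions introduced for illustration only; the slide does not say which reading any given terminology intends.

```python
# Hypothetical instance-level data: which particular parts each particular human has.
INSTANCES = {
    "Human": {"alice", "bob"},
    "Head": {"head_a", "head_b"},
    "Finger": {"finger_a1"},
}
HAS_PART = {
    ("alice", "head_a"),
    ("alice", "finger_a1"),
    ("bob", "head_b"),  # bob has lost all his fingers
}


def all_some(c1, c2):
    """Reading 1: every instance of c1 has some instance of c2 as part."""
    return all(
        any((x, y) in HAS_PART for y in INSTANCES[c2])
        for x in INSTANCES[c1]
    )


def some_some(c1, c2):
    """Reading 2: at least one instance of c1 has some instance of c2 as part."""
    return any(
        (x, y) in HAS_PART
        for x in INSTANCES[c1]
        for y in INSTANCES[c2]
    )
```

With this data, “Human has-part Head” holds under both readings, whereas “Human has-part Finger” holds only under the weaker one. Two statements that look identical at the concept level therefore need different readings to come out true.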

NCI Metathesaurus
• based on NLM's Unified Medical Language System Metathesaurus, supplemented with additional cancer-centric vocabulary
• a database of many biomedical terminologies, mapped where possible to NCI Thesaurus terms and shared conceptual meanings

NCI and Partner Data Sources
• SAGE Data (CGAP) - NCI and Duke University SAGE experiment data
• Expression Measurements (NCICB GEDP) - Probe sets
• Sequence Trace Files (GAI) - EST traces and full-length mRNA clone traces
• Genetic Annotation Initiative (GAI) - SNPs
• Sequence Verified Clones (as of caBIO version 2.0) (NCICB internal pre-processed) - Human and mouse sequence-verified clone information
• Cancer Clinical Trials (NCI CTEP and PDQ) - Trials and drug agent information
• CMAP Annotation Data (CMAP) - Drug targets, anomalies
• Cancer Vocabulary (NCI) - Cancer related terminology and concepts

External Data Sources
• Unigene (NCBI) - Human and mouse genes, sequences, map locations, clones, proteins and protein homologs
• Homologene (NCBI) - Human and mouse gene homologs
• LocusLink (NCBI) - Genes, gene ontologies, gene aliases, taxons
• RefSeq (NCBI) - Reference sequences
• EST Data (NCICB) - Tissue-specific expression level ESTs
• cDNA library information (NCICB) - cDNA libraries for disease and tissue
• Human Genome via UCSC DAS server (UCSC) - Genomic sequences, annotations, and map coordinates
• BioCarta (BioCarta) - Pathways
• Gene Ontology - Hierarchy of gene functions

Metathesaurus traps: UMLS example

IFOMIS: Institute for Formal Ontology and Medical Information Science
The Institute for Formal Ontology and Medical Information Science was founded in April 2002 as part of the Faculty of Medicine of the University of Leipzig, utilizing a grant of the Alexander von Humboldt Foundation. It comprises an interdisciplinary research group with members from Philosophy, Computer and Information Science, Logic, Medicine, and Medical Informatics. IFOMIS established itself as a center of theoretically grounded research in both formal and applied ontology. Its goal is to develop a formal ontology that will be applied and tested in the domain of medical and biomedical information science. In August 2004 IFOMIS moved its base of operations from Leipzig to Saarland University in Saarbrücken.
IFOMIS, Universität des Saarlandes, Postfach 151150, D-66041 Saarbrücken, Germany
Secretariat Tel.: +49 (0)681-302-64770, Fax: +49 (0)681-302-64772

IFOMIS’s long-term goal
• Build a robust high-level BFO-MedO framework
• THE WORLD’S FIRST INDUSTRIAL-STRENGTH PHILOSOPHY
• which can serve as the basis for an ontologically coherent unification of medical knowledge and terminology

IFOMIS’ research in Formal Ontology
• Formal treatment of universals, individuals, endurants, perdurants, scales, functions, collections, ...
• Universals / Concepts
• Mereology and topology
• Vagueness and granularity
• Applicability to domain ontologies, terminologies, ...

Reference Ontology
• a theory of a domain of entities in the world
• based on realizing the goals of maximal expressiveness and adequacy to reality
• sacrificing computational tractability for the sake of representational adequacy

Basic Ontological Notions
• Identity
  – How are instances of a class distinguished from each other?
• Unity
  – How are all the parts of an instance isolated?
• Essence
  – Can a property change over time?
• Dependence
  – Can an entity exist without some others?

(Simplified) Logic of classes
• primitive:
  – entities: particulars versus universals
  – relation inst such that:
    • all classes are universals; all instances are particulars
    • some universals are not classes, hence have no instances: pet, adult, physician
    • some particulars are not instances, e.g. some mereological sums
• subsumption is defined by resorting to instances
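
Defining subsumption by resorting to instances can be sketched in Python as follows. The reading used here, that A is_a B exactly when every instance of A is also an instance of B, is an assumption about the intended definition, since the formula itself is not part of the text. The extensions in `INST` and the function `is_a` are hypothetical.

```python
# Hypothetical extensions: the particulars that instantiate each class.
INST = {
    "lung carcinoma": {"tumour_1"},
    "malignant neoplasm": {"tumour_1", "tumour_2"},
    "vein": {"vein_1"},
}


def is_a(a, b):
    """Assumed reading: a is_a b iff every instance of a is an instance of b."""
    return INST[a] <= INST[b]  # subset test on the two extensions
```

Here `is_a("lung carcinoma", "malignant neoplasm")` holds and the converse does not. Over a finite sample of instances such a test can refute a proposed subsumption but cannot establish it, because the definition quantifies over all instances.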

Basic Formal Ontology consists in a series of sub-ontologies (most properly conceived as a series of perspectives on reality), the most important of which are:
  – SnapBFO, a series of snapshot ontologies (O_ti), indexed by times: continuants
  – SpanBFO, a single videoscopic ontology (O_v): occurrents
Each O_ti is an inventory of all entities existing at a time. O_v is an inventory (processory) of all processes unfolding through time.
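
The difference between the time-indexed snapshot inventories and the single inventory of processes can be pictured with a small data-structure sketch in Python. It is an illustration under assumptions only: the entities, the integer time indices and the names `SNAP`, `SPAN`, `Process`, `continuants_at` and `processes_unfolding_at` are hypothetical and are not part of BFO.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    name: str
    start: int  # first time index at which the process is unfolding
    end: int    # last time index at which the process is unfolding


# Snap: one inventory of continuants per time index (hypothetical data).
SNAP = {
    0: {"patient", "tumour"},
    1: {"patient", "tumour"},
    2: {"patient"},  # the tumour no longer exists after surgery
}

# Span: a single inventory of occurrents, each extended through time.
SPAN = [Process("tumour growth", 0, 1), Process("surgery", 1, 2)]


def continuants_at(t):
    """Return the snapshot inventory for time index t."""
    return SNAP[t]


def processes_unfolding_at(t):
    """Return the names of the processes whose interval covers time index t."""
    return [p.name for p in SPAN if p.start <= t <= p.end]
```

A continuant is looked up in the snapshot for a given time, whereas a process is never wholly present in any one snapshot and is found only in the span inventory.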

Occurrents and continuants
Picture by Vladimir Brajic

Levels of granularity in biomedical ontology
• Population
  – Continuants: environment
  – Occurrents: screening
• Person
  – Continuants: race, age, disease, symptom
  – Occurrents: ADL, working, treatment, prevention
• Organ
  – Continuants: liver, lung, organ part, sign
  – Occurrents: heart beat, digestion, surgery
• Tissue
  – Continuants: elasticity, turgor, strength
  – Occurrents: resorption, protection
• Cell
  – Continuants: bone cell, alveolar cell, cell size, bacterium
  – Occurrents: phagocytosis, cell growth, reparation, hormone production
• Subcellular
  – Continuants: cell membrane, protein, DNA, oncogene, protooncogene, virus, oncogenic molecule
  – Occurrents: transcription, splicing, mutation, gene regulation

Missed subsumption detection in SNOMED-CT
Missing: ISA neoplasm of heart

Correction of MGED’s ontology upper part
[Diagram of the upper part of the MGED Ontology. Classes: MGEDOntology, MGEDCoreOntology, BioMaterialPackage, BioMaterialCharacteristics, OrganismPart, DiseaseLocation, CancerSite. Instances: Primary site, Metastatic site. Links: SubClassOf, InstanceOf, has_cancer_site, has-class, one-of.]
• MGEDOntology: The MGED Ontology is a top level container for the MGEDCoreOntology and the MGEDExtendedOntology. The MGED ontology describes microarray experiments and is split into the MGEDCoreOntology, which supports MAGE-OM v1.0 and is organized consistently with MAGE, and the MGEDExtendedOntology, which expands MAGE v1.0 and contains concepts and relationships which are not included in MAGE.
• DiseaseLocation: Anatomical location(s) of disease.
• Metastatic site: the organism part in which additional tumors are identified remote from the primary site.

Text mining and classification
[Diagram linking Generalised Possession, Human, Healthcare phenomenon, Having a healthcare phenomenon, Patient, Cancer patient, Malignant neoplasm and lung carcinoma through the relations IS-A, Has-possessor, Has-possessed, Is-possessor-of and Has-Healthcare-phenomenon; numbered steps 1 to 3]
Example sentence: Mr. Smith has a pulmonary carcinoma
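
The kind of inference this slide points at, classifying the possessor in “Mr. Smith has a pulmonary carcinoma” as a cancer patient, can be sketched in Python. The IS-A links below are assumptions loosely based on the labels that survive from the diagram; in particular the link from “pulmonary carcinoma” to “lung carcinoma” and the functions `subsumed_by` and `classify_possessor` are hypothetical and do not reproduce the system shown.

```python
# Hypothetical IS-A links, loosely following the labels that survive from the slide.
IS_A = {
    "pulmonary carcinoma": "lung carcinoma",  # assumed synonym-level link
    "lung carcinoma": "malignant neoplasm",
    "malignant neoplasm": "healthcare phenomenon",
    "cancer patient": "patient",
    "patient": "human",
}


def subsumed_by(concept, ancestor):
    """Walk up the IS-A chain from concept and report whether ancestor is reached."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)
    return False


def classify_possessor(phenomenon):
    """Pick the most specific class for whoever 'has' the given phenomenon."""
    if subsumed_by(phenomenon, "malignant neoplasm"):
        return "cancer patient"
    if subsumed_by(phenomenon, "healthcare phenomenon"):
        return "patient"
    return "human"
```

Under these assumptions `classify_possessor("pulmonary carcinoma")` yields "cancer patient", while a possessed item outside the hierarchy, such as a whetstone, leaves its possessor classified simply as "human".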

The near future: International Cancer Ontology Project
• Healthcare Informatics call, 6th FP of the EU
• Applying realist ontology to:
  – Connect relevant databases for combatting cancer,
    • covering all levels of granularity (from molecules to entire patients) at a deep semantic level
    • independent of the data format (text, structured, coded, ...)

Knowledge discovery and use

Towards US-based “X”CORs
• BCOR: Buffalo Centre for Ontological Research
• NCOR: National Centre for Ontological Research
  – Involving Stanford
• Introducing realist ontology (as a sound analytical philosophical discipline) to improve ontologies (as representations).