A merged disease ontology for ClinGen
Chris Mungall, Nicole Vasilevsky, Melissa Haendel

ClinGen needs for a disease ontology
• Curation
  – Need to identify precise disease concept
    • Unambiguous definitions
  – Disease ID must persist
  – Disease ID must not change semantics
  – Disease must be findable
    • Precise synonyms
    • Intuitive hierarchy
  – (not so different from GO curation)
• Query/Analysis
  – Ability to group diseases / roll up
  – Clinically and biologically meaningful hierarchy
  – Map to other systems

Our ideal world…
• A single unified ontology of all diseases
    • Common
    • Rare
    • Cancer
  – Arranged in a meaningful graph
• >> a GO for disease!
• We need to curate relationships to any level in the hierarchy
  – E.g.
    • Diabetes
      – Type II diabetes
        » MODY
          • MODY 1
• Use standard ontology annotation propagation rules to drive page content, user queries
  – Just as we would for GO, HPO, etc.
Thaxton et al.
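
The annotation propagation rule mentioned on this slide can be illustrated with a minimal sketch. It assumes a toy is-a hierarchy copied from the slide's diabetes example; the gene labels and the function names `ancestors` and `propagate` are hypothetical placeholders, not part of any Monarch or ClinGen tool.

```python
from collections import defaultdict

# Toy is-a hierarchy taken from the slide's example (child -> parents).
PARENTS = {
    "MODY 1": ["MODY"],
    "MODY": ["Type II diabetes"],
    "Type II diabetes": ["Diabetes"],
    "Diabetes": [],
}

def ancestors(term, parents):
    """Return the term plus every term reachable by following is-a links upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen

def propagate(direct_annotations, parents):
    """Roll each annotation up to all ancestors of the term it was curated against."""
    rolled_up = defaultdict(set)
    for term, items in direct_annotations.items():
        for ancestor in ancestors(term, parents):
            rolled_up[ancestor].update(items)
    return dict(rolled_up)

# Hypothetical curated annotations (placeholder labels, not real data).
direct = {"MODY 1": {"gene_A"}, "Type II diabetes": {"gene_B"}}
rolled = propagate(direct, PARENTS)
```

With this rule, a query for "Diabetes" returns annotations curated at any level beneath it, which is the behaviour the slide compares to GO and HPO.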

Problem: Lots of options
[Figure labels: OMIM, Orph, NCIT, DO, MESH, UMLS, MedGen, ICD, SNOMED, ICD-O]
• Why don’t we just pick one?
  – Which is best…?

Difficult to summarize in one slide… Many different ways to evaluate...
[Table: OMIM, Orph, NCIT, DO, MESH, UMLS, MedGen, ICD, SNOMED and ICD-O compared on Hierarchy, Rare, Common, Cancer, Open, Tracker, Standard format and Quality; cell values not recoverable]

Each source has a lot of unique content
• DOID (6971): 5254 not mapped to Orphanet; 1717 mapped
• Orphanet (Disease Subset) (8820): 6992 not mapped to DOID; 1828 mapped

Mappings to the rescue…?
• Can we just combine all together?
• Use mappings to traverse
[Figure labels: NCIT, OMIM, MedGen, DO, Decipher, ICD, MESH, SNOMED, Orphanet, UMLS]

Why mappings are not the solution
• Duplication of concepts
  – Which ID to choose when curating?
• Proliferation of mappings
  – Up to (N^2)-N sets of mappings
    • Even more with 3rd party mappings
  – These are frequently mutually inconsistent
• Mapping problems
  – Rarely one-to-one
    • Different meanings and levels of specificity
    • Does this mapping mean equivalent, broader, narrower, or just kinda-similar?
  – Often stale, abandoned
[Figure labels: Ont 1, Ont 2, Ont 3, Ont 4, Ont 5, Ont 6]
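
The (N^2)-N figure counts every ordered pair of distinct sources. A small sketch makes the growth concrete; it uses the ten sources named on the preceding "Mappings to the rescue" slide, and the function name is a hypothetical one chosen for illustration.

```python
from itertools import permutations

def directed_mapping_sets(sources):
    """List every ordered (from, to) pair of distinct sources."""
    return list(permutations(sources, 2))

sources = ["NCIT", "OMIM", "MedGen", "DO", "Decipher",
           "ICD", "MESH", "SNOMED", "Orphanet", "UMLS"]
pairs = directed_mapping_sets(sources)
n = len(sources)
# Each source can map to every other source, so the count is n*n - n.
count = len(pairs)
```

For these ten sources that is 90 directed mapping sets, before any third-party mappings are added.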

4 disease resources plus mappings: Hemolytic anemia
[Figure legend: ORDO/Orphanet (yellow), DOID (blue), OMIM (brown), MESH (grey); SubClassOf (solid line), Xref (dashed grey line)]

Our approach
• MonDO
  – Make high quality precise relationships between ontologies
    • [here we include OMIM etc as “flat ontologies”]
    • Use curation plus Bayesian ontology inference
  – Use these to make a coherent merged ontology
    • Merge concepts we believe are truly equivalent
  – This avoids mega-merges
  – I’ll present on this
• Next steps
  – Lessons learned, next steps, EFO-MonDO

Approach: kBOOM
We start with only a subset of curated xrefs
[Diagram labels: Mondo curator; Curate; Inter-ontology precise axioms (SubClassOf, EquivalentTo); ont 1, ont 2, ont N; Loose mappings; Mapping provider; kboom (OWL Reasoning, Probability Calculation); MonDO]

Virtuous feedback loops
[Diagram labels: Mondo curator; Mondo tracker; community; Refine; Inter-ontology precise axioms (SubClassOf, EquivalentTo); ont 1, ont 2, ont N; External Ont tracker; Ont curator; Loose mappings; Mapping provider; kboom; MonDO vN+1]

Example

Example

Results: merging diseases into MonDO
[Table flattened in extraction. Column headers: “Ontology”; Classes (before, after merge); SubClass axioms; Xrefs. Values are listed per row in their original order; rows with fewer than four values have cells whose column cannot be recovered.]
Inputs:
  DOID: 6878, 6012, 7082, 36656
  MESH (D): 11314, 4152, 19036
  OMIM (D): 7783, 0, 31242
  Orphanet (D): 8740, 4683, 15182, 20326
  OMIA: 4833, 3120, 355
  DC: 209, 208, 310, 316
  Medic: 0, 8630, 3435
Output:
  MonDO: 39757, 27617, 44837
Held back: NCIT, SNOMED, ICD 9, GARD [note: in current version, NCIT is integrated]
https://github.com/monarch-initiative/monarch-disease-ontology

Example: mucopolysaccharidosis III
Input into kBOOM

Resolution

Example failed resolution – due to ontology error
https://github.com/monarch-initiative/monarch-disease-ontology/issues/99
https://github.com/DiseaseOntology/HumanDiseaseOntology/issues/164

Example failed resolution – due to MESH duplicates
https://github.com/monarch-initiative/monarch-disease-ontology/issues/81

Implementation/Availability
• MonDO
  – OBO Page
    • http://obofoundry.org/ontology/mondo.html
  – GitHub project (trackers, source)
    • https://github.com/monarch-initiative/monarch-disease-ontology
• Methods
  – Software
    • https://github.com/monarch-initiative/kboom
  – Paper
    • http://biorxiv.org/content/early/2016/04/15/048843

OLS View

https://monarchinitiative.org/disease/DOID:12798#genes

Moving forward: what worked and what didn’t?
• Positives:
  – The merged ontology was generally correct and useful
    • Iterative improvements
    • Curated axioms could be incorporated into subsequent releases
  – We discovered a lot about different resources and how they approach classification
  – We found lots of issues with input sources, we provided feedback, helping make the world a better place

Moving forward: what worked and what didn’t?
• Negatives:
  – There were a lot more problems with sources than anticipated
    • E.g. DO: cancer has many problems; Orphanet has many oddities
    • Algorithm is resilient to problems to an extent: when noise becomes too high results suffer
  – We were not always successful in getting our proposed fixes back into sources
  – Slow release cycle + attendant issues of an automated process

Beyond merging: a community ontology
• We are ready to move to the next stage
  – An ontology that can be directly edited
  – Well modularized
    • E.g. NCIT as separate module for cancer
  – Engineered using modern ontology engineering techniques
• Plan
  – EFO-MonDO

Acknowledgments
• Nicole Vasilevsky
• Ian Holmes
• Sebastian Kohler
• Jim Balhoff
• Peter Robinson
• Melissa Haendel
FUNDING: NIH Office of Director: 1R24OD011883; NIH-UDP: HHSN268201300036C

Example of incomplete hierarchy
OMIM:125850 ! Maturity-Onset Diabetes of the Young, Type 1
OMIM:125851 ! Maturity-Onset Diabetes of the Young, Type 2
OMIM:600496 ! Maturity-Onset Diabetes of the Young, Type 3
OMIM:606391 ! Maturity-Onset Diabetes of the Young
OMIM:606392 ! Maturity-Onset Diabetes of the Young, Type 4
OMIM:606394 ! Maturity-Onset Diabetes of the Young, Type 6
OMIM:609812 ! Maturity-Onset Diabetes of the Young, Type 8
OMIM:610508 ! Maturity-Onset Diabetes of the Young, Type 7
OMIM:612225 ! Maturity-Onset Diabetes of the Young, Type 9
OMIM:613370 ! Maturity-Onset Diabetes of the Young, Type 10
OMIM:613375 ! Maturity-Onset Diabetes of the Young, Type 11
OMIM:616329 ! Maturity-Onset Diabetes of the Young, Type 13
OMIM:616511 ! Maturity-Onset Diabetes of the Young, Type 14

2. The problem with xrefs
• An xref can mean multiple things
  – Precisely equivalent (two nodes can be safely merged)
  – Nearly equivalent (nodes cannot be safely merged)
  – Narrower than (xref is to more specific concept)
  – Broader than (xref is to more general concept)
  – Vaguely related (e.g. disease shares phenotypic features with)
  – Some other relationship type (complication-of, leads-to-susceptibility-in, etc.)
• A machine doesn’t know which interpretation to choose
• Why this is a problem:
  – We need to specify consistent rules about annotation propagation to the technical group
  – We need to be able to select the right ID to annotate to

Basic idea
• Take all xrefs
  – Existing manually curated
  – Add using different automated methods
• Infer interpretation of xref as OWL axiom, one of:
    • SubClassOf (A SubClassOf B, or B SubClassOf A)
    • EquivalentTo
    • SiblingOf
  – Use kBOOM algorithm to find most likely combination
• Merge equivalence groups
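
The final step, merging equivalence groups, amounts to finding connected components over the chosen EquivalentTo axioms. The following is a minimal union-find sketch of that idea, not the kBOOM implementation; the class IDs (`A:1`, `B:7`, and so on) are hypothetical placeholders rather than real identifiers.

```python
def merge_equivalence_groups(equivalences):
    """Group IDs connected by EquivalentTo axioms using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in equivalences:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    # Sort members and groups so the output order is deterministic.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])

# Hypothetical equivalence axioms between classes of three sources A, B, C.
equivalences = [("A:1", "B:7"), ("B:7", "C:3"), ("A:2", "B:9")]
merged = merge_equivalence_groups(equivalences)
```

Each resulting group would become a single class in the merged ontology, with the member IDs kept as cross-references.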

Objective: Coherent OWL Ontology Merging (OOM)
• Criteria for OOM
  – Merged
    • Combines multiple lists and classifications (terminologies and lists treated as ‘degenerate’ ontologies), presented as a single ontology
    • Equivalent classes merged
  – Logically Connected
    • OWL/Description Logic constructs
      – e.g. SubClassOf, EquivalentClass, SomeValuesFrom
    • Not xrefs
  – Coherent
    • Logically coherent: no unsatisfiable classes
    • Biologically coherent: makes biological and clinical sense

[Workflow diagram]
• Inputs: Ontology 1, Ontology 2, …, Ontology n; inter-ontology mappings (Xrefs) from mapping tools and mapping curation
• Axiom Weight Estimator: Hypothetical Logical Axioms plus Weights (H)
• Probabilistic Ontology: OP = <A, H>
• BOOM (Bayes OWL Ontology Merging), with the Elk Reasoner: finds the set of hypothetical axioms that maximises P(OP)
• Merge equivalent classes: Merged Coherent OWL Ontology
• Next iteration: Weight Curation

Inputs
• Disease vocabularies
  – OMIM (list)
  – NCIT (graph)
  – MGI DC (OMIM groupings)
  – DO (graph)
  – MESH
  – Orphanet
• Cross-ontology hard links
  – MGI DC links (OMIM->DC)
  – MEDIC (OMIM->MESH)
• Xrefs
  – DO->{OMIM, Orphanet, MESH, NCIT}
  – OMIM->{Orphanet}
  – Orphanet->{OMIM}
  – Enhanced with automatic mappings

Generating weighted hypothetical logical axioms: Example
Inter-ontology mappings (input to the Axiom Weight Estimator):
  id: DOID:12554 hemolytic-uremic syndrome
  xref: Orphanet:2134 Atypical hemolytic-uremic syndrome
Example rule: if name of A is a substring of name of B, increase prob that B SubClassOf A (i.e. A SuperClassOf B)
Hypothetical Logical Axioms plus Weights (H):
  Pr(DOID:12554 SubClassOf Orphanet:2134) = 0.04
  Pr(DOID:12554 SuperClassOf Orphanet:2134) = 0.84
  Pr(DOID:12554 EquivalentTo Orphanet:2134) = 0.08
  Pr(none of above) = 0.04
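
A name-based rule of this kind can be sketched as follows. This is an illustrative toy, not the actual Axiom Weight Estimator: the uniform starting weights and the `boost` factor are assumptions, so it does not reproduce the probabilities shown on the slide. It follows the direction shown by the example numbers, where the longer, more specific name is treated as the subclass.

```python
def estimate_weights(name_a, name_b, boost=4.0):
    """Return a probability for each interpretation of an xref from A to B."""
    # Assumed uniform starting weights; the real priors are not given in the slides.
    weights = {
        "SubClassOf": 1.0,
        "SuperClassOf": 1.0,
        "EquivalentTo": 1.0,
        "None": 1.0,
    }
    a, b = name_a.lower(), name_b.lower()
    if a == b:
        weights["EquivalentTo"] *= boost
    elif a in b:
        # A's name is contained in B's longer name: B looks more specific.
        weights["SuperClassOf"] *= boost
    elif b in a:
        weights["SubClassOf"] *= boost
    total = sum(weights.values())
    return {relation: w / total for relation, w in weights.items()}

w = estimate_weights(
    "hemolytic-uremic syndrome",
    "Atypical hemolytic-uremic syndrome",
)
best = max(w, key=w.get)
```

Normalising the weights keeps the four interpretations mutually exclusive and summing to one, as in the slide's example.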

K-BOOM: Algorithm for finding most likely merged ontology
Find values for H that maximise P. Problem: 2^N ontologies
h_i: boolean representing truth value of hypothetical axiom H_i
1. Factorize calculation by dividing combined axioms into k modules (k-BOOM)
   Algorithm: i. Assert all hypothetical axioms to be true; ii. Make module from equivalence clique
2. Use greedy algorithm; start with most likely hypothetical axioms in O_k
3. Test each configuration using OWL Reasoner (Elk) for satisfiability (unsat => Pr=0), calc posterior probability
4. Repeat until number of tests exceeds threshold
5. Return most likely configuration for O_k
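
The search within a single module can be illustrated with a toy sketch. Several simplifications are assumed here and none of them come from the kBOOM code: candidate configurations are enumerated exhaustively in order of decreasing probability rather than searched greedily, the score is a plain product of axiom probabilities, and the `coherent` function is a stand-in for the Elk reasoner that only rejects merging two classes from the same source. All class IDs are hypothetical.

```python
from itertools import product
from math import prod

def coherent(axioms):
    """Toy stand-in for a reasoner: reject merges of two classes from one source."""
    groups = []  # equivalence cliques built from EquivalentTo axioms
    for a, relation, b in axioms:
        if relation != "EquivalentTo":
            continue
        touching = [g for g in groups if a in g or b in g]
        merged = {a, b}.union(*touching)
        groups = [g for g in groups if g not in touching] + [merged]
    for group in groups:
        prefixes = [class_id.split(":")[0] for class_id in group]
        if len(prefixes) != len(set(prefixes)):
            return False
    return True

def most_likely_configuration(hypotheses, max_tests=1000):
    """hypotheses: {(a, b): {relation: probability}}; pick one relation per pair."""
    pairs = list(hypotheses)
    options = [list(hypotheses[p].items()) for p in pairs]
    # Most probable combinations first, so the first coherent one wins.
    candidates = sorted(
        product(*options),
        key=lambda combo: prod(p for _, p in combo),
        reverse=True,
    )
    for tested, combo in enumerate(candidates):
        if tested >= max_tests:
            break
        axioms = [(a, rel, b) for (a, b), (rel, _) in zip(pairs, combo)]
        if coherent(axioms):
            return axioms, prod(p for _, p in combo)
    return None, 0.0

# Hypothetical module: two classes from source X both xref one class in Y.
hypotheses = {
    ("X:1", "Y:1"): {"EquivalentTo": 0.8, "SubClassOf": 0.2},
    ("X:2", "Y:1"): {"EquivalentTo": 0.7, "SubClassOf": 0.3},
}
best_axioms, best_score = most_likely_configuration(hypotheses)
```

In this example the individually most likely choice (both xrefs read as EquivalentTo) would merge X:1 and X:2, so the toy check rejects it and the next most probable configuration is returned instead. This mirrors how an unsatisfiable configuration is given probability zero in step 3.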

Probability guided curator workflow: A little knowledge goes a long way
• Run cycle
• Look at flagged clusters
  – Pr(H_i = true) << threshold
• Apply biological/clinical knowledge
• Override auto-generated hypothetical axiom weights with curated ones
  – Feedback issues to source ontologies
• Repeat
[Diagram labels: dialog; External ontology curator; Mondo curator]
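
The flagging step in this workflow can be sketched as a simple filter and sort. The module names, probabilities, and the 0.5 threshold below are hypothetical values chosen for illustration; the slides do not state the threshold actually used.

```python
def flag_for_review(module_posteriors, threshold=0.5):
    """Return modules whose best configuration falls below threshold, least confident first."""
    flagged = [(p, name) for name, p in module_posteriors.items() if p < threshold]
    return [name for p, name in sorted(flagged)]

# Hypothetical posterior probabilities for three modules.
posteriors = {"module_1": 0.95, "module_2": 0.12, "module_3": 0.40}
review_queue = flag_for_review(posteriors)
```

Ranking by posterior probability puts the clusters the algorithm is least sure about at the front of the curator's queue.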

Evaluation
• No gold standard for multiple ontology merger
  – Partial evaluation using held-back Orphanet NTBT/E calls:
    • 6977/7986 (87% agreement)
    • But these are often wrong! https://github.com/monarch-initiative/monarch-disease-ontology/issues/149
• Ad-hoc evaluation by curator
  – Approach: use posterior probabilities to rank modules requiring attention
  – This is the killer-app feature
  – Iteratively refine curated probabilities
    • https://github.com/monarch-initiative/monarch-disease-ontology/issues/
• Results
  – Manual inspection and use of mondo
  – Detection of errors in source ontologies
    • E.g. duplicates in MESH
    • Incorrect xrefs in DO, e.g.
      – https://github.com/DiseaseOntology/HumanDiseaseOntology/issues - issues #164, #163, #156, #154, #151, #150, #149, #140, #135

Summary
• Multiple disease lists and vocabularies bring different perspectives and plug different gaps
  – But these are only integrated by loose ‘xrefs’
• kBOOM uses a probabilistic algorithm to merge these (hopefully) into a cohesive whole
  – DO+OMIM+Orphanet+DC+MESH+NCIT -> MonDO
• Results
  – Problem areas can be homed in on quickly
  – Fixes can be passed upstream

Discussion
• Ideally, retrospective merging would not be required
  – We would work as a single group on one ontology
  – Use modern ontology practices such as documenting design patterns to ensure coherency
• In practice this is hard…

Implications for AGR
• Which disease sources are a priority for AGR?
  – Is DO+OMIM sufficient?
  – What about other gene-disease sources?
  – How best to integrate MGI DC?
  – How much does AGR have a Mendelian bias?
  – Consult wider community?

Proposal for AGR: Use MonDO for curation and display
• Curation:
  – Curator can choose most meaningful level of specificity and ID
    • E.g. NCIT for cancer, DO for general groupings, OMIM or Orphanet for specific forms of Mendelian disease
• Display/queries
  – Use normal ontology propagation rules