Real-life ontology development: lessons from the Gene Ontology

• What is GO? • Evolution of GO • Mechanisms of updating GO • Tools for ontology development • Lessons learned

Gene Ontology • Built for a very specific purpose: “annotation of genes and proteins in genomic and protein databases” • Applicable to all species

Gene Ontology - scope • Three disjoint axes: – molecular function • molecular role, e.g. catalytic activity, binding – biological process • broad biological phenomena, e.g. mitosis, growth, digestion – cellular component • sub-cellular location, e.g. nucleus, ribosome, origin recognition complex

Gene Ontology • Directed acyclic graph (DAG) • Terms connected by two transitive relations (edges): – is_a – part_of
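
To make the graph structure concrete, here is a minimal Python sketch of a DAG whose edges are labelled is_a or part_of, with an upward traversal that collects every ancestor of a term. The edge list is a small hypothetical example chosen for illustration, not an extract from GO, and the function names are assumptions.

```python
from collections import defaultdict

# Hypothetical edges for illustration only: (child, relation, parent).
EDGES = [
    ("nucleolus", "part_of", "nucleus"),
    ("nucleus", "is_a", "organelle"),
    ("organelle", "is_a", "cellular component"),
]


def build_parents(edges):
    """Map each child term to the set of (relation, parent) pairs above it."""
    parents = defaultdict(set)
    for child, relation, parent in edges:
        parents[child].add((relation, parent))
    return parents


def ancestors(term, parents):
    """Return every term reachable from `term` by following edges upward.

    Both relations are treated as transitive, so any chain of is_a and
    part_of edges counts.
    """
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


PARENTS = build_parents(EDGES)
NUCLEOLUS_ANCESTORS = ancestors("nucleolus", PARENTS)
```

Because a term in a DAG may have several parents, the traversal keeps a `seen` set so that a term reached along two different paths is only visited once.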

Gene Ontology • Developed by an international consortium – about 50 members • Editorial office with roughly 4 full-time editors • Many other part-time editors at databases • Multiple changes made each day – made live immediately

Gene Ontology • Main ontology format: OBO flat file • Changes are live immediately – no releases • Propagated to GO database – monthly snapshots archived
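
As an illustration of what an OBO flat file looks like, the Python sketch below reads [Term] stanzas made of `tag: value` lines. The sample text uses made-up `EX:` identifiers rather than real GO IDs, and the parser is a deliberately simplified assumption: it ignores escapes, qualifiers and non-Term stanzas.

```python
# Hypothetical OBO fragment; the EX: identifiers are invented for this example.
SAMPLE_OBO = """\
[Term]
id: EX:0000001
name: organelle

[Term]
id: EX:0000002
name: nucleus
is_a: EX:0000001 ! organelle
"""


def parse_obo_terms(text):
    """Return one dict per [Term] stanza, mapping each tag to a list of values."""
    terms = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and whole-line comments
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are kept.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            # Drop the trailing "! comment" that repeats the target's name.
            value = value.split(" ! ", 1)[0].strip()
            current.setdefault(tag, []).append(value)
    return terms


TERMS = parse_obo_terms(SAMPLE_OBO)
```

Values are stored as lists because tags such as is_a can occur more than once in a stanza.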

Evolution of GO • Original GO created in 2000 • Three databases involved: – FlyBase (Drosophila) – MGI (Mouse) – SGD (S. cerevisiae) • Used immediately

Evolution of GO • Later databases: – TAIR (Arabidopsis) – TIGR (microbes including prokaryotes) – SWISS-PROT (several thousand species including human) – PSU (P. falciparum) • Recent additions – ZFIN (zebrafish) – PAMGO (plant pathogens)

Evolution of GO • GO development traditionally annotation-driven – development directed by use • Terms added as new species annotated • Terms added on an as-needed basis

Evolution of GO • Resulted in ‘organic’ structure, little formality • Ontological formality added subsequently – philosophical and logical

Growth of GO

Modifying the graph: • Before:

Modifying the graph: • But then I need to annotate VW Beetles, pre-1980 • The graph no longer works, because the engine is in the boot

Modifying the graph: • After:

Mechanisms for ontology change • Small incremental changes • Initially all changes to the ontologies made this way

Mechanisms for ontology change • Suggested changes initially submitted by email • Moved to an online tracking system when this became unmanageable

Requesting changes to GO: curator requests tracker • Web-based tracking system hosted at SourceForge.net • Public • Tracker item for each new request or question

Curator requests tracker

Mechanisms for ontology change • Problems: – Larger questions about the higher ontology structure remain unresolved – Makes some items impossible to close – No sense of the ‘big picture’ – Large areas of the ontologies missing or incomplete because they have no annotations – Massive volume • needed to increase the number of editors

Mechanisms for ontology change • Larger-scale changes: – content meetings – interest groups

Content meetings • Short meetings aimed at developing specific areas of GO ontology content – proposals refined and discussed before meeting – small number of people (10-15) – invited experts – specific topics

Content meetings • Further refinements made following meeting by email • Changes are made once consensus reached • Large number of terms typically added (500+)

Content meetings • Recent meetings: – immunology – interactions between organisms – CNS development

Content meetings • Advantages – Allows a lot of detailed work to be done on a very specific area – Involves external expertise

Content meetings • Problems: – Expensive - everyone has to be in the same location – Only works for very specific topics – Long lag time getting terms into ontologies

Interest groups • Groups of experts for a specific topic – e.g. development, cell cycle, plants • Includes GO curators/annotators and external experts • Don’t typically meet face to face

Interest groups • Communicate via email, desktop sharing etc. • Transporters area of the ontology recently revised this way

Interest groups • Advantages – Cheap, no travel required – Allows a lot of detailed work to be done on a very specific area – Involves external expertise

Interest groups • Disadvantages – Harder to reach consensus when not face to face – Projects tend to drag on

Mechanisms for ontology change • Systematic changes via small working groups

Systematic changes • Projects not directly related to biological content • Systematic changes throughout ontology • Small group of GO consortium members – meets regularly by desktop sharing, voice over IP • Experts recruited to meetings as needed

Systematic changes • Changes either – made on a branch of the ontology and merged in later • always have big problems merging branched file into main file – merged directly into live ontology after session • fast, but people get angry

is_a complete • GO contains both is_a and part_of relations • Typically, graphs were a mixture of incomplete is_a and part_of hierarchies • A result of the ‘organic’ evolution of GO • All graphs now have complete is_a paths to root
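
The "is_a complete" property can be checked mechanically. The Python sketch below lists terms that cannot reach the root by is_a edges alone. The term names and the `IS_A_PARENTS` mapping are hypothetical, and this is an illustration of the idea rather than the check the GO editors actually ran.

```python
def has_is_a_path_to_root(term, is_a_parents, root):
    """True if `term` reaches `root` using is_a edges only."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current == root:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(is_a_parents.get(current, ()))
    return False


def terms_missing_is_a_path(all_terms, is_a_parents, root):
    """Return, sorted, the terms that lack a complete is_a path to root."""
    return sorted(
        term
        for term in all_terms
        if not has_is_a_path_to_root(term, is_a_parents, root)
    )


# Hypothetical example: "nucleolus" has only a part_of parent, so it has
# no entry in the is_a mapping and therefore no is_a path to the root.
ROOT = "cellular component"
ALL_TERMS = {"cellular component", "organelle", "nucleus", "nucleolus"}
IS_A_PARENTS = {
    "organelle": {"cellular component"},
    "nucleus": {"organelle"},
}
MISSING = terms_missing_is_a_path(ALL_TERMS, IS_A_PARENTS, ROOT)
```

Terms reported by such a check are the ones that need an is_a parent added alongside their existing part_of parent.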

partial disjointness • Biological process terms organised by granularity: – cellular process – multicellular organism process – multi-organism process • To avoid massive increase in number of paths to root, these terms are disjoint – no is_a children in common
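
The disjointness rule, that the three granularity terms share no is_a children, can be expressed as a pairwise check on is_a descendants. In the Python sketch below, the three top-level names come from the slide, while the child terms ("example process A" and so on) and the function names are placeholders invented for illustration.

```python
from itertools import combinations


def is_a_descendants(term, is_a_children):
    """Return every term below `term` when following is_a edges downward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for child in is_a_children.get(current, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def shared_is_a_descendants(top_terms, is_a_children):
    """Map each pair of top terms to the descendants they wrongly share."""
    violations = {}
    for first, second in combinations(top_terms, 2):
        common = is_a_descendants(first, is_a_children) & is_a_descendants(
            second, is_a_children
        )
        if common:
            violations[(first, second)] = common
    return violations


TOP_TERMS = [
    "cellular process",
    "multicellular organism process",
    "multi-organism process",
]
# Placeholder children; "example process A" deliberately breaks the rule.
IS_A_CHILDREN = {
    "cellular process": {"example process A"},
    "multicellular organism process": {"example process A", "example process B"},
    "multi-organism process": {"example process C"},
}
VIOLATIONS = shared_is_a_descendants(TOP_TERMS, IS_A_CHILDREN)
```

An empty result means the three branches are disjoint under is_a; any entry names a term that would need to be moved or split.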

sensu • sensu (meaning ‘in the sense of’) used to disambiguate, by taxonomic group, terms with identical strings but different meanings • e.g. sporulation (sensu Viridiplantae) vs. sporulation (sensu Bacteria)

sensu • Current project to remove the sensu term strings • Replace with strings that represent the true differentiae • e.g. – cell wall (sensu Bacteria) -> peptidoglycan-based cell wall – cell wall (sensu Fungi) -> chitin- and beta-glucan-containing cell wall

Systematic changes to GO • Advantages – Fast – Efficient – Small number of people required

Systematic changes to GO • Disadvantages – Difficult to obtain wider consensus – Changes sometimes have to be undone

Useful tools for ontology development • WebEx – desktop sharing, can control each other’s desktops • wiki – mainly internal • Skype – free international calls! • conference calls – not free

Tracking changes to GO • General tracking – files stored in CVS, all differences trackable (in theory) – far from ideal - a frequent discussion: should we track history and date-stamp terms?

Tracking changes to GO • Obsolete terms – formerly stored within the ontology – in OBO format, made a special kind of deprecated term (tag is_obsolete) – ‘replaced_by’ and ‘consider’ tags to be created soon, pointing to live terms
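
One way a consumer of the ontology might use these tags is sketched below in Python. It assumes, purely for illustration, that replaced_by names a single direct replacement while consider lists candidates a curator must choose between. The `EX:` identifiers, the dictionary layout and the function name are all hypothetical.

```python
def resolve_term(term_id, terms):
    """Follow replaced_by links from an obsolete term to a live term.

    Returns (live_id, []) when a live term is found, or
    (None, suggestions) when the chain ends at an obsolete term that
    only offers 'consider' candidates (possibly none).
    """
    visited = set()
    current = term_id
    while True:
        term = terms[current]
        if not term.get("is_obsolete", False):
            return current, []
        if current in visited:
            raise ValueError(f"replaced_by cycle detected at {current}")
        visited.add(current)
        replacement = term.get("replaced_by")
        if replacement is None:
            return None, list(term.get("consider", []))
        current = replacement


# Hypothetical term records keyed by made-up identifiers.
TERMS = {
    "EX:1": {"is_obsolete": True, "replaced_by": "EX:2"},
    "EX:2": {},
    "EX:3": {"is_obsolete": True, "consider": ["EX:2", "EX:4"]},
    "EX:4": {},
}
REPLACED = resolve_term("EX:1", TERMS)
SUGGESTED = resolve_term("EX:3", TERMS)
```

Keeping the two outcomes separate reflects the difference between a replacement that can be applied automatically and suggestions that need human review.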

Tracking changes to GO • Crediting experts – traditionally no mechanism for doing this – creating abstracts for content meetings, adding tag to term – as yet no mechanism for crediting individuals

Useful tools for ontology development • OBO-Edit – ontology editor originally developed for GO – can be used for any OBO format ontology – developed by group of users

Useful tools for ontology development • Reasoner integrated into OBO-Edit – based on OBOL – detects missing links, redundant links – coming soon: misplaced terms, automatic term creation • Validation system – typographical errors, is_a orphans, duplicate synonyms etc.
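
To show what a redundant link is, the Python sketch below flags a direct link from a child to a parent when that parent is already implied by another of the child's direct parents. It handles a single relation type on a hypothetical three-term graph and is not the OBO-Edit reasoner's actual algorithm.

```python
def reachable(start, parents):
    """Return every term above `start`, excluding `start` itself."""
    seen = set()
    stack = [start]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def redundant_links(parents):
    """Return sorted (child, parent) links implied by a longer path."""
    redundant = []
    for child, direct_parents in parents.items():
        for parent in direct_parents:
            other_parents = direct_parents - {parent}
            # The link is redundant if some other direct parent already
            # leads up to the same target.
            if any(parent in reachable(other, parents) for other in other_parents):
                redundant.append((child, parent))
    return sorted(redundant)


# Hypothetical graph: nucleus -> organelle -> cellular component, plus a
# direct nucleus -> cellular component link that adds no information.
PARENTS = {
    "nucleus": {"organelle", "cellular component"},
    "organelle": {"cellular component"},
}
REDUNDANT = redundant_links(PARENTS)
```

Removing the flagged links leaves the set of ancestors of every term unchanged, which is why a reasoner can report them safely.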

Lessons learned • An ontology doesn’t have to be perfect or complete to be used • For domain ontologies, external experts should be involved • Communication is critical • You will never please everyone