- Slides: 47
Real-life ontology development: lessons from the Gene Ontology
• What is GO? • Evolution of GO • Mechanisms of updating GO • Tools for ontology development • Lessons learned
Gene Ontology • Built for a very specific purpose: “annotation of genes and proteins in genomic and protein databases” • Applicable to all species
Gene Ontology - scope • Three disjoint axes: – molecular function • molecular role, e.g. catalytic activity, binding – biological process • broad biological phenomena, e.g. mitosis, growth, digestion – cellular component • sub-cellular location, e.g. nucleus, ribosome, origin recognition complex
Gene Ontology • Directed acyclic graph (DAG) • Terms connected by two transitive relations (edges): – is_a – part_of
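Because both relations are transitive, all of a term's ancestors can be collected by walking the edges upward. The following is a minimal sketch only; the term names and links in `EDGES` are hypothetical toy data, not taken from GO, and `build_parents` and `ancestors` are illustrative helper names.

```python
from collections import defaultdict

# Hypothetical toy edges (child, relation, parent); not real GO content.
EDGES = [
    ("nucleus", "is_a", "organelle"),
    ("organelle", "is_a", "cellular component"),
    ("nucleolus", "part_of", "nucleus"),
]


def build_parents(edges):
    """Map each child term to the set of (relation, parent) pairs above it."""
    parents = defaultdict(set)
    for child, relation, parent in edges:
        parents[child].add((relation, parent))
    return parents


def ancestors(term, parents):
    """Return every term reachable by following is_a/part_of edges upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


PARENTS = build_parents(EDGES)
print(sorted(ancestors("nucleolus", PARENTS)))
```

In this toy graph the ancestors of "nucleolus" include "nucleus" (via part_of) and everything above "nucleus" (via is_a), which is what transitivity of the two relations implies.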
Gene Ontology • Developed by an international consortium – about 50 members • Editorial office, roughly 4 full-time editors • Many other part-time editors at databases • Multiple changes made each day – made live immediately
Gene Ontology • Main ontology format OBO flat file • Changes are live immediately – no releases • Propagated to GO database – monthly snapshots archived
Evolution of GO • Original GO created in 2000 • Three databases involved: – FlyBase (Drosophila) – MGI (Mouse) – SGD (S. cerevisiae) • Used immediately
Evolution of GO • Later databases: – TAIR (Arabidopsis) – TIGR (microbes including prokaryotes) – SWISS-PROT (several thousand species inc. human) – PSU (P. falciparum) • Recent additions: – ZFIN (zebrafish) – PAMGO (plant pathogens)
Evolution of GO • GO development traditionally annotation-driven – development directed by use • Terms added as new species annotated • Terms added on an as-needed basis
Evolution of GO • Resulted in ‘organic’ structure, little formality • Ontological formality added subsequently – philosophical and logical
Growth of GO
Modifying the graph: • Before:
Modifying the graph: • But then I need to annotate VW Beetles, pre-1980 • The graph no longer works, because the engine is in the boot
Modifying the graph: • After:
Mechanisms for ontology change • Small incremental changes • Initially all changes to the ontologies made this way
Mechanisms for ontology change • Suggested changes initially submitted by email • Moved to an online tracking system when this became unmanageable
Requesting changes to GO: curator requests tracker • Web-based tracking system hosted at SourceForge.net • Public • Tracker item for each new request or question
Curator requests tracker
Mechanisms for ontology change • Problems: – Larger questions about the higher ontology structure remain unresolved – Makes some items impossible to close – No sense of the ‘big picture’ – Large areas of the ontologies missing or incomplete because no annotations – Massive volume • needed to increase the number of editors
Mechanisms for ontology change • Larger-scale changes: – content meetings – interest groups
Content meetings • Short meetings aimed at developing specific areas of GO ontology content – proposals refined and discussed before meeting – small number of people (10-15) – invited experts – specific topics
Content meetings • Further refinements made following meeting by email • Changes are made once consensus reached • Large number of terms typically added (500+)
Content meetings • Recent meetings: – immunology – interactions between organisms – CNS development
Content meetings • Advantages – Allows a lot of detailed work to be done on a very specific area – Involves external expertise
Content meetings • Problems: – Expensive - everyone has to be in the same location – Only works for very specific topics – Long lag time getting terms into ontologies
Interest groups • Groups of experts for a specific topic – e.g. development, cell cycle, plants • Includes GO curators/annotators and external experts • Don’t typically meet face to face
Interest groups • Communicate via email, desktop sharing etc • Transporters area of the ontology recently revised this way
Interest groups • Advantages – Cheap, no travel required – Allows a lot of detailed work to be done on a very specific area – Involves external expertise
Interest groups • Disadvantages – Harder to reach consensus when not face to face – Projects tend to drag on
Mechanisms for ontology change • Systematic changes via small working groups
Systematic changes • Projects not directly related to biological content • Systematic changes throughout ontology • Small group of GO consortium members – meets regularly by desktop sharing, voice over IP • Experts recruited to meetings as needed
Systematic changes • Changes either – made on a branch of the ontology and merged in later • always have big problems merging branched file into main file – merged directly into live ontology after session • fast, but people get angry
is_a complete • GO contains both is_a and part_of relations • Typically, graphs were a mixture of incomplete is_a and part_of hierarchies • A result of ‘organic’ evolution of GO • All graphs now have complete is_a paths to root
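One way to picture the "is_a complete" property is as a check that every term can reach the root using is_a edges alone. The sketch below assumes a hypothetical toy graph (the links are illustrative, not GO content) and hypothetical function names; it is not a description of how the GO editors performed the check.

```python
# Hypothetical toy graph: is_a parents only. "nucleolus" is deliberately
# attached by part_of alone, so it has no is_a path to the root.
IS_A = {
    "nucleus": {"organelle"},
    "organelle": {"cellular component"},
}
PART_OF = {"nucleolus": {"nucleus"}}  # ignored by the is_a check
TERMS = {"cellular component", "organelle", "nucleus", "nucleolus"}
ROOT = "cellular component"


def has_is_a_path(term, root, is_a_parents):
    """True if root is reachable from term by following is_a edges only."""
    stack, seen = [term], {term}
    while stack:
        current = stack.pop()
        if current == root:
            return True
        for parent in is_a_parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False


def incomplete_terms(terms, root, is_a_parents):
    """List terms that lack a complete is_a path to the root."""
    return sorted(t for t in terms if not has_is_a_path(t, root, is_a_parents))


print(incomplete_terms(TERMS, ROOT, IS_A))
```

A term that is linked only by part_of, like "nucleolus" here, is exactly the kind of case that an is_a completion effort would need to give an is_a parent.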
partial disjointness • Biological process terms organised by granularity: – cellular process – multicellular organism process – multi-organism process • To avoid massive increase in number of paths to root, these terms are disjoint – no is_a children in common
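The disjointness constraint can be sketched as a check that no term is an is_a descendant of more than one of the three granularity terms. The child terms and their placement below are hypothetical toy data, and the function names are illustrative assumptions.

```python
from itertools import combinations

DISJOINT = [
    "cellular process",
    "multicellular organism process",
    "multi-organism process",
]

# Hypothetical is_a children; placements are illustrative, not GO content.
IS_A_CHILDREN = {
    "cellular process": {"mitosis"},
    "multicellular organism process": {"digestion"},
    "multi-organism process": {"example interaction"},
}


def is_a_descendants(term, children):
    """Collect every term below `term` via is_a edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def shared_descendants(disjoint_terms, children):
    """Return {(a, b): common descendants} for each violating pair."""
    clashes = {}
    for a, b in combinations(disjoint_terms, 2):
        common = is_a_descendants(a, children) & is_a_descendants(b, children)
        if common:
            clashes[(a, b)] = common
    return clashes


print(shared_descendants(DISJOINT, IS_A_CHILDREN))

# Adding a second is_a parent for "digestion" breaks the constraint.
BROKEN = dict(IS_A_CHILDREN)
BROKEN["cellular process"] = {"mitosis", "digestion"}
print(shared_descendants(DISJOINT, BROKEN))
```

The first call reports no clashes; the second reports "digestion" as shared between two of the disjoint terms.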
sensu • sensu (meaning ‘in the sense of’) used to disambiguate, by taxonomic group, terms with identical strings but different meanings • e.g. sporulation (sensu Viridiplantae) vs. sporulation (sensu Bacteria)
sensu • Current project to remove the sensu term strings • Replace with strings that represent the true differentiae • e.g. – cell wall (sensu Bacteria) -> peptidoglycan-based cell wall – cell wall (sensu Fungi) -> chitin- and beta-glucan-containing cell wall
Systematic changes to GO • Advantages – Fast – Efficient – Small number of people required
Systematic changes to GO • Disadvantages – Difficult to obtain wider consensus – Changes sometimes have to be undone
Useful tools for ontology development • WebEx – desktop sharing, can control each other's desktops • wiki – mainly internal • Skype – free international calls! • conference calls – not free
Tracking changes to GO • General tracking – files stored in CVS, all differences trackable (in theory) – far from ideal - a frequent discussion is whether we should track history and date-stamp terms
Tracking changes to GO • Obsolete terms – formerly stored within the ontology – in OBO format, marked as a special kind of deprecated term (tag is_obsolete) – ‘replaced_by’ and ‘consider’ tags to be created soon, pointing to live terms
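To illustrate how an obsolete term might point to live terms, here is a minimal sketch that reads one stanza written in the `tag: value` style of the OBO flat file. The stanza, its `EX:` identifiers, and the helper names are hypothetical; the sketch assumes the new tags follow the same `tag: value` form as `is_obsolete`.

```python
# Hypothetical stanza; the EX: identifiers are made up for illustration.
STANZA = """\
[Term]
id: EX:0000001
name: example obsolete term
is_obsolete: true
replaced_by: EX:0000002
consider: EX:0000003
consider: EX:0000004
"""


def parse_stanza(text):
    """Collect tag values from one stanza; a tag may occur more than once."""
    tags = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("["):
            continue  # skip blank lines and the [Term] header
        tag, _, value = line.partition(": ")
        tags.setdefault(tag, []).append(value)
    return tags


def suggested_replacements(tags):
    """For an obsolete term, list replaced_by targets first, then consider."""
    if tags.get("is_obsolete") != ["true"]:
        return []
    return tags.get("replaced_by", []) + tags.get("consider", [])


print(suggested_replacements(parse_stanza(STANZA)))
```

Listing `replaced_by` before `consider` is a design choice of this sketch: the first reads as a direct substitute, the second as candidates for a curator to review.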
Tracking changes to GO • Crediting experts – traditionally no mechanism for doing this – creating abstracts for content meetings, adding tag to term – as yet no mechanism for crediting individuals
Useful tools for ontology development • OBO-Edit – ontology editor originally developed for GO – can be used for any OBO format ontology – developed by group of users
Useful tools for ontology development • Reasoner integrated into OBO-Edit – based on OBOL – detects missing links, redundant links – soon: misplaced terms, automatic term creation • Validation system – typographical errors, is_a orphans, duplicate synonyms etc.
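As an illustration of what "redundant links" can mean, the sketch below flags a direct link when the same parent is already reachable through another direct parent. This is an assumed definition for illustration only, not a description of how the OBO-Edit reasoner works; the toy graph and function names are hypothetical.

```python
# Hypothetical toy graph: child -> set of direct parents.
PARENT_LINKS = {
    "nucleus": {"organelle", "cellular component"},
    "organelle": {"cellular component"},
}


def reachable(start, parents):
    """Return all terms reachable upward from `start` (excluding start)."""
    seen = set()
    stack = [start]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def redundant_links(parents):
    """Flag (child, parent) links implied by a path through another parent."""
    redundant = []
    for child, direct in parents.items():
        for parent in direct:
            others = direct - {parent}
            if any(parent in reachable(other, parents) for other in others):
                redundant.append((child, parent))
    return sorted(redundant)


print(redundant_links(PARENT_LINKS))
```

Here the direct link from "nucleus" to "cellular component" is flagged, because it is already implied by the path through "organelle".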
Lessons learned • An ontology doesn’t have to be perfect or complete to be used • For domain ontologies, external experts should be involved • Communication is critical • You will never please everyone