Ontologies: GO Workshop, 3-6 August 2010

Ontologies
  • What are ontologies?
  • Why use ontologies?
  • Open Biomedical Ontologies (OBO), National Center for Biomedical Ontology (NCBO)
  • Some useful ontologies…

What Are Ontologies?
“An ontology is an explicit specification of some topic. For our purposes, it is a formal and declarative representation which includes the vocabulary (or names) for referring to the terms in that subject area and the logical statements that describe what the terms are and how they are related to each other… Ontologies therefore provide a vocabulary for representing and communicating knowledge about some topic and a set of relationships that hold among the terms in that vocabulary” (from the Stanford Knowledge Systems Lab).

What Are Ontologies?
“An ontology is a controlled vocabulary of well-defined terms with specified relationships between those terms, capable of interpretation by both humans and computers.”
  • Bio-ontologies can be used to provide structured annotation.
  • Biocurators are biologists who are trained to catalogue biological data (using database structures, bio-ontologies, etc.).

Why use ontologies?
  • New sequencing technologies are increasing the rate at which DNA is sequenced:
      - Jan 2009: 20 billion bases (or letters) of high-quality human DNA sequence, seven times the length of a human genome, in 10 days. Computer analysis of the genome took another 10 days.
  • The complexity of data is also increasing.
  • How do we manage the data?
      - data sharing
      - from data to knowledge

Why use ontologies?
  • Bio-ontologies are used to capture biological information in a way that can be read by both humans and computers:
      - annotate data in a consistent way
      - allows data sharing across databases
      - allows computational analysis of high-throughput “omics” datasets
  • Objects in an ontology (e.g. genes, cell types, tissue types, stages of development) are well defined.
  • The ontology shows how the objects relate to each other.

[Figure: an ontology entry, labelled with its digital identifier (for computers), its description (for humans), and the relationships between terms]

Ontology Relationships
  • Ontologies link terms using relationships.
  • Relations between terms are also categorized and defined.
  • GO:
      - is a (e.g. lyase activity is a catalytic activity)
      - part of (e.g. replication fork is part of chromosome)
      - regulates
      - negatively regulates
      - positively regulates
  • PO:
      - is a
      - part of
      - develops from
http://www.geneontology.org/GO.ontology.relations.shtml
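One way to picture typed relationships is as (child, relation, parent) triples. The sketch below is only an illustration: the two examples are the ones quoted on the slide, while the triple layout and the helper names `build_index` and `parents_of` are assumptions made for this example, not part of any GO tool. Real ontologies key terms by identifier rather than by name.

```python
from collections import defaultdict

# Each edge is (child term, relation type, parent term).
# The two facts come from the slide; the storage format is an assumption.
EDGES = [
    ("lyase activity", "is_a", "catalytic activity"),
    ("replication fork", "part_of", "chromosome"),
]

def build_index(edges):
    """Index edges by child so a term's typed parents can be looked up."""
    index = defaultdict(list)
    for child, relation, parent in edges:
        index[child].append((relation, parent))
    return index

def parents_of(index, term, relation=None):
    """Return the parents of `term`, optionally restricted to one relation type."""
    return [
        parent
        for rel, parent in index.get(term, [])
        if relation is None or rel == relation
    ]

index = build_index(EDGES)
print(parents_of(index, "lyase activity"))               # all relation types
print(parents_of(index, "replication fork", "part_of"))  # only part_of parents
```

Keeping the relation type on every edge is what lets a tool treat "is a" and "part of" links differently when it walks the ontology.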

Relationships: the True Path Rule
  • Why are relationships between terms important?
  • TRUE PATH RULE: all attributes of children must hold for all parents
      - so if a protein is annotated to a term, the annotation must also be true for all the parent terms
      - this enables us to move up the ontology structure from a granular term to a broader term
  • Premise of many GO analysis tools
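The rule above can be sketched as upward propagation of annotations. This is a minimal illustration under stated assumptions: the three-term hierarchy is a toy fragment, `protein_X` is a hypothetical gene product, and the helper names `ancestors` and `propagate` are invented for this example. Only plain parent links are followed here.

```python
# Toy ontology fragment (assumption): child term -> list of parent terms.
PARENTS = {
    "lyase activity": ["catalytic activity"],
    "catalytic activity": ["molecular function"],
    "molecular function": [],
}

def ancestors(term, parents):
    """Collect every term reachable by walking parent links upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def propagate(annotations, parents):
    """Expand direct annotations so each product also carries all ancestor terms."""
    expanded = {}
    for product, terms in annotations.items():
        full = set(terms)
        for term in terms:
            full |= ancestors(term, parents)
        expanded[product] = full
    return expanded

direct = {"protein_X": {"lyase activity"}}  # hypothetical gene product
print(sorted(propagate(direct, PARENTS)["protein_X"]))
# ['catalytic activity', 'lyase activity', 'molecular function']
```

A tool that counts annotations per term can then work at any level of granularity, because a protein annotated to a specific term is also counted under each broader term.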

Bio-ontology requirements
1. Ontology development
      - continual process as new terms are added to support more detailed data
2. Annotate data to the ontology
      - computational annotation (breadth - quick)
      - manual biocuration (depth - slow)
3. Tools that use the ontology data
      - browsing and searching the ontology and its associated data
      - analysis of data annotated to the ontology

Resources for biocuration
  • bio-ontologies (Open Biomedical Ontologies)
  • computational pipelines (‘breadth’)
      - for computational annotations
      - useful for gene products without published information
  • manual biocuration (‘depth’)
      - requires trained biocurators
      - community annotation efforts
      - each species has its own body of literature
  • biocuration co-ordination
      - MODs? Consortium? Community?
      - biocuration prioritization
      - co-ordination with existing DBs, annotation, nomenclature initiatives
      - data updates

Current bio-ontology limitations
  • ontology development
  • annotation strategies to match increasing amount of biological data
      - computational pipelines & biocomputing
      - community annotation/prioritization strategies
      - biocurators
  • tools for dataset analysis (data complexity)
      - cross-ontology data mining
      - data visualization

http://obo.sourceforge.net/
  • Open Biomedical Ontologies (OBO) is an initiative to develop bio-ontologies using common rules/principles and resources
  • aim to develop interoperable ontologies
      - common relationships
      - common evidence codes
      - standardize file sharing
  • develop links between ontologies?

http://obo.sourceforge.net/
  • Gene Ontology
  • Plant Ontology
  • Sequence Ontology
  • Trait Ontology
  • Expression/Tissue Ontologies
  • Infectious Disease Ontology
  • Cell Ontology

Genomic Annotation
  • Genome annotation is the process of attaching biological information to genomic sequences. It consists of two main steps:
    1. identifying functional elements in the genome: “structural annotation”
    2. attaching biological information to these elements: “functional annotation”
  • Biologists often use the term “annotation” when they are referring only to structural annotation.

Structural & Functional Annotation
Structural Annotation:
  • Open reading frames (ORFs) predicted during genome assembly
  • predicted ORFs require experimental confirmation
  • Sequence Ontology Project (SO): provides a structured controlled vocabulary for the description of primary annotations of nucleic acid sequence
Functional Annotation:
  • Gene Ontology (GO): annotation of gene product function
  • initially, predicted ORFs have no functional literature and GO annotation relies on computational methods (rapid)
  • functional literature exists for many genes/proteins prior to genome sequencing
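The two annotation steps can be pictured as two stages of filling in one record. The sketch below is purely illustrative: the `Feature` class, its field names, the coordinates and the assigned function are all hypothetical, and real pipelines record far more (evidence, references, identifiers).

```python
from dataclasses import dataclass, field

@dataclass
class Feature:
    """Structural annotation: where an element lies and what kind of element it is."""
    seq_id: str
    start: int
    end: int
    feature_type: str  # in practice a Sequence Ontology term
    functions: list = field(default_factory=list)  # filled in by functional annotation

def annotate_function(feature, term):
    """Functional annotation: attach a biological term to an identified element."""
    feature.functions.append(term)
    return feature

# Step 1 (structural): a predicted ORF with hypothetical coordinates.
orf = Feature(seq_id="chr1", start=1000, end=2500, feature_type="ORF")

# Step 2 (functional): a hypothetical assignment, e.g. from a computational method.
annotate_function(orf, "lyase activity")
print(orf)
```

Separating the two steps in the record mirrors the slide: an element can be structurally annotated for some time before any function is attached to it.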

Genomic Annotation
  • Other annotations using other bio-ontologies, e.g. Anatomy Ontology
  • Structural annotation, including Sequence Ontology
  • Functional annotation using Gene Ontology
  • Nomenclature (species’ gene nomenclature committees)

Gene Ontology (GO)
  • Not about genes!
      - Gene products: genes, transcripts, ncRNA, proteins
      - The GO describes gene product function
  • Not a single ontology
      - Biological Process (BP or P)
      - Molecular Function (MF or F)
      - Cellular Component (CC or C)
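Because each of the three ontologies goes by two abbreviations, a small lookup table can help when reading annotation data that uses either form. This is a minimal sketch: the abbreviations are those listed on the slide, while the names `ASPECTS`, `LOOKUP` and `aspect_name` are made up for this example.

```python
# Full ontology name -> the abbreviations given on the slide.
ASPECTS = {
    "Biological Process": ("BP", "P"),
    "Molecular Function": ("MF", "F"),
    "Cellular Component": ("CC", "C"),
}

# Invert the table so either abbreviation maps back to the full name.
LOOKUP = {code: name for name, codes in ASPECTS.items() for code in codes}

def aspect_name(code):
    """Return the full ontology name for an abbreviation, ignoring case and padding."""
    return LOOKUP[code.strip().upper()]

print(aspect_name("P"))    # Biological Process
print(aspect_name("mf"))   # Molecular Function
```

An unknown abbreviation raises `KeyError`, which is deliberate here: silently guessing an ontology would mislabel annotations.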

Gene Ontology (GO)
  • de facto method for functional annotation
  • Widely used for functional genomics (high throughput)
  • Many tools available for gene expression analysis using GO
  • The GO Consortium homepage: http://www.geneontology.org

Plant Ontology (PO)
  • describes plant structures and growth and developmental stages
  • Currently used for Arabidopsis, maize, rice; more being added (soybean, tomato, cotton, etc.)
  • Plant Structure: describes morphological and anatomical structures representing organ, tissue and cell types
  • Growth and developmental stages: describes (i) whole plant growth stages and (ii) plant structure developmental stages
  • The PO Consortium homepage: http://www.plantontology.org/

PO Browser – based on the GO Consortium browser, AmiGO

http://www.ebi.ac.uk/ontology-lookup/