Modeling Functional Genomics Datasets CVM 8890 -101 Lesson 2 13 June 2007 Teresia Buza

Lesson 2: Introduction to functional annotation. Orthologs and homologs; clusters of orthologous genes (COGs) and the gene ontology (GO); and how to find what functional annotation is available.

1. Introduction to Functional Annotation

Where are we?
Central Dogma; New technology; Genomic hypothesis
Genome sequencing
mRNA transcript
ATGTCCTATCC ATGTCGTACAG ATTGACGAGAT
Transcriptome: transcript profiling
Protein
Proteome: protein quantification
What is all this? Structural annotation
What next? Functional annotation

Genome Annotation
Biologists use "genome annotation" to refer both to the annotation of the genome itself and to the functional annotation of gene products: “Structural” Annotation & “Functional” Annotation

Structural & Functional Annotation
Structural annotation: identification of genomic elements.
• ORFs predicted during genome assembly
• Location of ORFs
• Gene structure
• Coding regions
• Location of regulatory motifs, etc.
Functional annotation: attaching biological information to genomic elements.
• Biochemical function
• Biological function
• Regulation and interactions involved
• Expression, etc.
These steps may involve both biological experiments and in silico analysis.
http://en.wikipedia.org/wiki/Genome_annotation#Genome_annotation (with modifications)

Why Functional Annotation? Enables you to take large “laundry lists” of genes/proteins and turn them into a biologically useful model

Functional Annotation
• Annotation of gene products = Gene Ontology (GO) annotation
• Initially, predicted ORFs have no functional literature, so GO annotation relies on computational methods (rapid, but quantity vs. quality?)
• Functional literature exists for many genes/proteins prior to genome sequencing (curating it is slow but provides high-quality annotations)
• GO annotation does not rely on a completed genome sequence!

Types of Functional Annotation
Based on direct experimental evidence of function (experiments in the same ORGANISM), for example:
• Enzyme assays
• Binding experiments
• Pathway analysis
• Synthetic lethals
• Functional complementation
• Gene mutations
• RNAi
• 2-hybrid interactions, etc.
Indirect evidence of function:
• Expression analysis
• Structure analysis
• Sequence analysis

Functional Annotation
Problem:
• Many genes/proteins have no annotation
• Some have unknown functions
Challenge:
• We want to get the maximum functional annotation for modeling our data
Solution:
• Read papers (PubMed etc.)
• Search for homologs/orthologs of known function
• Homologs and orthologs help assign function…

2. Finding Function: orthologs and homologs

What are Homologs, Orthologs, Paralogs?
Homologs are genes that descend from a common ancestral gene and were separated by either a speciation event or a gene duplication event.
Orthologs are homologous genes in different species that evolved from a common ancestral gene by speciation. Normally (not always), orthologs retain the same function in the course of evolution. Identification of orthologs is critical for reliable prediction of gene function in newly sequenced genomes.
Paralogs are homologous genes related by duplication within a genome. Paralogs often evolve new functions, even if these are related to the original one.
http://homepage.usask.ca/~ctl271/857/def_homolog.shtml

Orthologs & Paralogs
[Figure: gene tree with orthologs and paralogs labeled]
http://www.ensembl.org/info/data/compara/tree_example1.jpg

How to search for Orthology?
BLAST: http://www.ncbi.nlm.nih.gov/BLAST/
• Sequence alignment search tool
• Utilizes a heuristic algorithm
MPsrch: http://www.ebi.ac.uk/MPsrch/
• Sequence comparison tool
• Implements the Smith & Waterman algorithm
• Utilizes an exhaustive algorithm
Domain analysis: http://www.ncbi.nlm.nih.gov/Structure/cdd.shtml
• Analysis of regions of sequence homology among sets of proteins that are not all full-length homologs
• Homology domains often, but not always, correspond to recognizable protein folding domains
Protein family databases (e.g. COGs & KOGs)
• Superfamily: complete set of proteins having sequence homology over essentially their full length
• Subfamilies: incomplete sets of homologous proteins which yet encompass proteins of diverse function
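To make the "exhaustive" label concrete, here is a minimal Python sketch of Smith & Waterman local alignment scoring. It fills in every cell of the dynamic-programming matrix, which is what a heuristic tool such as BLAST avoids doing. The function name, the scoring values (match 2, mismatch -1, linear gap -2) and the use of plain DNA strings are assumptions for illustration; this is not how MPsrch itself is implemented, and real searches use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Illustrative scoring scheme only (assumed values, linear gap penalty).
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the alignment diagonally with a match or mismatch
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Or open/extend a gap in either sequence
            up = prev[j] + gap
            left = curr[j - 1] + gap
            # A local alignment may restart anywhere, hence the floor at 0
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # Two short example sequences; the score depends on the assumed scheme
    print(smith_waterman_score("ATGTCCTATCC", "ATGTCGTACAG"))
```

The cost grows with the product of the two sequence lengths, which is why exhaustive searches are slower than heuristic ones on large databases.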

Systems for Functional Annotation
1. Clusters of Orthologous Groups (COGs): prokaryotes
2. euKaryote Orthologous Groups (KOGs): eukaryotes
3. Gene Ontology (GO)

COGs & KOGs
• Both are based on orthology
• Genes are assigned to broad categories (A-Z)
• Each cluster (COG or KOG) corresponds to an ancient conserved domain
COGs: prokaryotes
KOGs: eukaryotes

Clusters of Orthologous Groups (COGs)
http://www.ncbi.nlm.nih.gov/COG/
COGs has 25 functional categories (single-letter codes from A to Z) in four broad groups:
1. Information storage and processing
2. Cellular processes and signaling
3. Metabolism
4. Poorly characterized

COGs Categories
INFORMATION STORAGE AND PROCESSING
[J] Translation, ribosomal structure and biogenesis
[A] RNA processing and modification
[K] Transcription
[L] Replication, recombination and repair
[B] Chromatin structure and dynamics
CELLULAR PROCESSES AND SIGNALING
[D] Cell cycle control, cell division, chromosome partitioning
[Y] Nuclear structure
[V] Defense mechanisms
[T] Signal transduction mechanisms
[M] Cell wall/membrane/envelope biogenesis
[N] Cell motility
[Z] Cytoskeleton
[W] Extracellular structures
[U] Intracellular trafficking, secretion, and vesicular transport
[O] Posttranslational modification, protein turnover, chaperones
ftp://ftp.ncbi.nih.gov/pub/COG/fun.txt

COGs Categories
METABOLISM
[C] Energy production and conversion
[G] Carbohydrate transport and metabolism
[E] Amino acid transport and metabolism
[F] Nucleotide transport and metabolism
[H] Coenzyme transport and metabolism
[I] Lipid transport and metabolism
[P] Inorganic ion transport and metabolism
[Q] Secondary metabolites biosynthesis, transport and catabolism
POORLY CHARACTERIZED
[R] General function prediction only
[S] Function unknown
ftp://ftp.ncbi.nih.gov/pub/COG/fun.txt
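One way to use these category letters is to tally a gene list by broad group. The Python sketch below encodes the 25 letters listed on the two category slides and counts assignments per group. The gene names and their category strings are made up for illustration, and treating any other symbol (such as "-") as "Unassigned" is an assumption.

```python
from collections import Counter

# Category letters per broad group, as listed on the COG category slides
COG_GROUPS = {
    "Information storage and processing": "JAKLB",
    "Cellular processes and signaling": "DYVTMNZWUO",
    "Metabolism": "CGEFHIPQ",
    "Poorly characterized": "RS",
}

# Reverse lookup: single letter -> broad group
LETTER_TO_GROUP = {
    letter: group for group, letters in COG_GROUPS.items() for letter in letters
}


def tally_by_group(gene_to_categories):
    """Count category assignments per broad group.

    A gene may carry several letters (e.g. "KT"); each letter is counted once.
    Symbols that are not COG letters are counted as "Unassigned" (assumption).
    """
    counts = Counter()
    for letters in gene_to_categories.values():
        for letter in letters:
            counts[LETTER_TO_GROUP.get(letter, "Unassigned")] += 1
    return counts


# Hypothetical gene list, for illustration only
example = {"geneA": "J", "geneB": "KT", "geneC": "S", "geneD": "-"}

if __name__ == "__main__":
    for group, n in tally_by_group(example).items():
        print(group, n)
```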

Example 1 Classification of COGs by functional categories
Tatusov et al., 2000: The COG database: a tool for genome-scale analysis of protein functions and evolution

Example 2 Effects of Antibiotics on Pasteurella multocida transcriptome
[Figure: bar chart by COG category (-, C to V) showing Decrease and Increase for AMX, CTC and ENR]
Nanduri et al. 2006

The Gene Ontology (GO)
• The Gene Ontology (GO) is the de facto standard for functional annotation
• GO functional annotation is based on orthology AND direct experimental evidence
• GO terms allow much more detailed functional analysis (> 24,000 terms) than COGs & KOGs (25 broad terms)
• GO is a controlled vocabulary of terms split into three related ontologies covering basic areas of molecular biology (GO Report 2007-04):
§ molecular function: 8,123 terms
§ biological process: 13,960 terms
§ cellular component: 2,071 terms

Example 3 Functional Annotation of Chicken Proteomic data Cellular Component

Use GO for…
• Modeling function in high-throughput datasets (arrays!), as started by the fly, yeast and mouse communities (Ashburner et al. 2000, 2001)
• Grouping gene products by biological function
• Determining which classes of gene products are over-represented or under-represented
• Focusing on particular biological pathways and functions (hypothesis-driven)
• Relating a protein’s location to its function
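Over-representation of a GO term in a gene list is commonly assessed with a hypergeometric (one-sided Fisher) test. The slides do not name a specific test, so the Python sketch below is one possible approach rather than the course's method. The function name, parameter names and example numbers are illustrative assumptions.

```python
from math import comb  # requires Python 3.8+


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) when drawing `sample` genes without replacement.

    population: number of genes in the background set
    annotated:  background genes annotated to the GO term
    sample:     size of the gene list of interest
    hits:       genes in the list annotated to the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the upper tail of the hypergeometric distribution exactly
    tail = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Made-up numbers: 1000 background genes, 50 with the term,
    # 40 genes in the list, 8 of them with the term
    print(hypergeom_enrichment_p(1000, 50, 40, 8))
```

A small p-value suggests the term is over-represented in the list. When many GO terms are tested at once, a multiple-testing correction is normally applied as well.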

Annotating to the GO
• Need to show the type of evidence of function
§ Literature curation: read and interpret reviewed literature (IDA, IGI, IMP, IPI, IGC) (TAS, NAS)
§ Computational analysis (RCA, ISS, IEA)
http://www.geneontology.org/GO.evidence.shtml
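When filtering annotations by how they were obtained, a small lookup of evidence codes can help. The Python sketch below follows the grouping exactly as given on this slide; the group labels, the function name and the "other" fallback are illustrative assumptions. The GO evidence-code documentation linked above remains the authority on what each code means.

```python
# Evidence codes grouped as on the slide (labels are illustrative)
EVIDENCE_GROUPS = {
    "literature curation": {"IDA", "IGI", "IMP", "IPI", "IGC", "TAS", "NAS"},
    "computational analysis": {"RCA", "ISS", "IEA"},
}


def evidence_group(code):
    """Return the slide's group for an evidence code, or "other" if unlisted."""
    code = code.upper()
    for group, codes in EVIDENCE_GROUPS.items():
        if code in codes:
            return group
    return "other"


if __name__ == "__main__":
    for code in ("IDA", "IEA", "IEP"):
        print(code, "->", evidence_group(code))
```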

4. How to find functional annotation for your species

How to find functional annotation
• For a quick search you need to know:
§ Name of your species (e.g. Sus scrofa, Aspergillus flavus)
§ Taxonomy ID (e.g. 9823 for S. scrofa, 5059 for A. flavus)
§ Database to look in (e.g. NCBI, UniProt, EBI-GOA, GOC, AgBase)
• Not all functional annotation for a species will be in one database!
• Not very many species have a broad coverage of GO annotation… BUT do not worry:
§ Searching for their homologs might help
§ You may rely on manual annotation from the literature (refer to the manual annotation course by Fiona McCarthy)

Are the genes/proteins in GenBank? Check by Taxon ID
[Flowchart; boxes listed in reading order, Yes/No branches not recoverable]
• Functional annotation known? (Yes / No)
• NM_, NP_
• UniParc/IPI
• Annotate by structural/sequence similarity: ORTHOLOGS (ISS code)
• UniProtKB
• GO manual annotations from literature (IDA, IMP, IPI, IGI, IEP codes)
• GOA make GO annotations (IEA) using automated methods
• Fill in GO association file
• GOA collect all GO annotations & submit to GOC
• Submit to AgBase (agricultural species)
• GOC maintain annotation files (unfiltered, filtered)
• GOA maintain annotation file
• AgBase maintains annotation file
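A GO association file is tab-delimited text, so checking what is available for one species can be as simple as filtering on the taxon column. The Python sketch below assumes the standard gene association layout (15 or more columns, with the object ID in column 2, GO ID in column 5, evidence code in column 7 and `taxon:<ID>` in column 13). The helper names, the database name and the sample rows are made up for illustration.

```python
import csv
import io


def read_annotations(handle, taxon_id):
    """Yield (object_id, go_id, evidence_code) for rows matching taxon_id."""
    wanted = f"taxon:{taxon_id}"
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or row[0].startswith("!"):
            continue  # skip blank lines and "!" comment/header lines
        if len(row) < 15:
            continue  # skip malformed rows
        # The taxon column may hold two IDs separated by "|"
        if wanted in row[12].split("|"):
            yield row[1], row[4], row[6]


def make_row(object_id, go_id, evidence, taxon):
    """Build a 15-column row; columns not needed here are left empty."""
    cols = [""] * 15
    cols[0], cols[1], cols[4], cols[6] = "EXAMPLE_DB", object_id, go_id, evidence
    cols[12] = f"taxon:{taxon}"
    return "\t".join(cols)


# Made-up annotations, for illustration only
SAMPLE = "\n".join([
    "!made-up example file",
    make_row("P00001", "GO:0005634", "IDA", 9823),
    make_row("P00002", "GO:0003677", "IEA", 9823),
    make_row("P00003", "GO:0005737", "ISS", 5059),
])

if __name__ == "__main__":
    for record in read_annotations(io.StringIO(SAMPLE), 9823):
        print(record)
```

With a real file, pass an open file handle instead of `io.StringIO(SAMPLE)`. Counting the distinct object IDs returned gives a rough idea of GO coverage for the species.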

Demonstration
