Comprehensive Microbial Resource: www.tigr.org/CMR
Bioinformatics Visualization Workshop
Owen White, May 30, 2002
Curation
Genome Annotation: Michelle Gwinn, Bob Dodson, Bob DeBoy, James Kolonay, Bill Nelson, Ramana Madupu, Sean Daugherty, Maureen Beanan, Scott Durkin, Lauren Brinkac
Bioinformatics Engineers: Jeremy Peterson, Lowell Umayam, Samual Angiuoli
TIGRFAMs/Groups: Dan Haft, Jeremy Selengut
Maria Ermolaeva (Operons/Terminators)
Erik Ferlanti (All vs. All)
Faculty: Jonathan Eisen (DNA repair), Ian Paulsen (transporters), Steven Salzberg
Collaborators: Swiss-Prot, Monica Riley, the open source crowd, Art Delcher (Glimmer)
Retrieval
Caudal fins: pointed, rounded, truncate, emarginate, lunate, forked, heterocercal.
Source: http://web.pdx.edu/~bowersn/bi399/lecture2.html
Retrieval across data types: dorsal spines, dorsal rays, caudal fins.
Typical annotation datatypes
clone_info: Tracks information related to the parent nucleotide assembly, including its annotation status, the institution from which the sequence was derived, and whether it is part of a larger assembly such as a chromosome.
asm_feature: All major features of the parent assembly are stored here, including annotated genes, predicted genes, repetitive elements, splice sites, and all underlying components of a gene (models, transcript exons, and CDS exons).
phys_ev: Attribute for each gene component within the asm_feature table. For example, each predicted or annotated gene has a model and multiple exons stored in the asm_feature table. Linking the feature to phys_ev identifies the type of feature present, i.e. glimmer, genscan+, genemark.HMM, or working (annotation). This becomes important if a single feature in the asm_feature table is shared by multiple model types.
feat_link: This table is key to the principles behind representing gene models in the database. All parent and child relationships are defined here.
evidence: The main repository for all sequence database search results. It also retains information regarding gene model attributes such as the best BLAST match and all Pfam matches.
ident: Stores attributes for the highest element of the gene component hierarchy, the transcriptional unit. Gene names, loci, EC symbols, and other attributes are available.
role_link: The role category assignments for each gene are available here. Examples of roles include ‘transcription’, ‘DNA synthesis’, ‘translation’, ‘DNA repair’, and ‘amino acid metabolism’.
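The parent/child linkage described for asm_feature, phys_ev, feat_link, and ident can be sketched with an in-memory SQLite database. This is only an illustration: the table names come from the slide, but every column name (feat_name, feat_type, ev_type, parent_feat, child_feat, com_name) and every row is a hypothetical simplification, not the actual CMR schema.

```python
import sqlite3

# Table names are from the slide; all columns are assumed for illustration only.
SCHEMA = """
CREATE TABLE asm_feature (feat_name TEXT PRIMARY KEY, feat_type TEXT,
                          end5 INTEGER, end3 INTEGER);
CREATE TABLE phys_ev     (feat_name TEXT, ev_type TEXT);
CREATE TABLE feat_link   (parent_feat TEXT, child_feat TEXT);
CREATE TABLE ident       (feat_name TEXT, com_name TEXT);
"""


def build_demo_db():
    """Create a tiny database with one transcriptional unit and two models."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO asm_feature VALUES (?, ?, ?, ?)",
        [("TU1", "TU", 100, 1000),
         ("M1", "model", 100, 1000),
         ("M2", "model", 120, 1000)],
    )
    # phys_ev records which kind of model each feature is.
    conn.executemany(
        "INSERT INTO phys_ev VALUES (?, ?)",
        [("M1", "working"), ("M2", "glimmer")],
    )
    # feat_link holds the parent -> child relationships.
    conn.executemany(
        "INSERT INTO feat_link VALUES (?, ?)",
        [("TU1", "M1"), ("TU1", "M2")],
    )
    conn.execute("INSERT INTO ident VALUES (?, ?)", ("TU1", "made-up gene name"))
    conn.commit()
    return conn


def models_for_tu(conn, tu_name):
    """Return (model, evidence type, gene name) for each model under a TU."""
    return conn.execute(
        """
        SELECT child.feat_name, ev.ev_type, i.com_name
        FROM feat_link AS fl
        JOIN asm_feature AS child ON child.feat_name = fl.child_feat
        JOIN phys_ev AS ev ON ev.feat_name = child.feat_name
        JOIN ident AS i ON i.feat_name = fl.parent_feat
        WHERE fl.parent_feat = ? AND child.feat_type = 'model'
        ORDER BY child.feat_name
        """,
        (tu_name,),
    ).fetchall()


if __name__ == "__main__":
    for row in models_for_tu(build_demo_db(), "TU1"):
        print(row)
```

Keeping the model type in phys_ev rather than in asm_feature is what lets one stored feature be shared by several model types, as the slide notes.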
Omniome Content, Genes
Total # of genes: 132,998 from world-wide effort (43,311 TIGR projects).
36,274 w/ genetic names.
15,098 genes placed into 5,451 paralogous families.
413 rRNAs, 1311 tRNAs, 49 sRNAs, 293 IS elements.
Omniome Content, Evidence
1073 distinct EC#s, assigned to 17308 genes
Rows of allVall data: 3,996,851
Rows of HMM TIGRFAM data: 91,550
Rows of HMM Pfam data: 131,963
Rows of COG data: 149,940
Rows of Interpro data: 175,760
Rows of Prosite data: 53,132
Rows of BER data: 91,899
TIGRFAM Matrix
The Genome Browser: Linear Display of DNA Molecules
Genome vs. Genome Protein Hits
MUMmer: The Whole Genome Alignment Tool
Role Category Graph
Multi-Genome Query Tool
Query across all genomes based on different properties:
MW, pI, membrane spanning regions
Taxon, paralogous families, TIGRFAMs, role category
Best match to: organism, locus, kingdom, etc.
Example queries:
“Genes with >5 membrane spanning regions and MW 36,000-51,000 Da.”
“E. coli genes with best match to Archaeoglobus involved in DNA metabolism.”
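The two example queries amount to combining independent property filters over gene records. A minimal sketch of that idea follows; the Gene record, its field names, the query helper, and all of the sample data are hypothetical and do not reflect how the CMR tool is actually implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    # Hypothetical record; field names are assumptions for this sketch.
    locus: str
    organism: str
    mol_weight: float          # daltons
    tm_regions: int            # predicted membrane spanning regions
    role: str
    best_match_organism: str


def query(genes, *predicates):
    """Return the genes that satisfy every predicate."""
    return [g for g in genes if all(p(g) for p in predicates)]


# Made-up data, for illustration only.
GENES = [
    Gene("X0001", "E. coli", 42000.0, 7, "transport", "Salmonella"),
    Gene("X0002", "E. coli", 38500.0, 0, "DNA metabolism", "Archaeoglobus"),
    Gene("Y0001", "B. subtilis", 60000.0, 9, "transport", "E. coli"),
]

# "Genes with >5 membrane spanning regions and MW 36,000-51,000 Da."
membrane_hits = query(
    GENES,
    lambda g: g.tm_regions > 5,
    lambda g: 36_000 <= g.mol_weight <= 51_000,
)

# "E. coli genes with best match to Archaeoglobus involved in DNA metabolism."
archaeal_hits = query(
    GENES,
    lambda g: g.organism == "E. coli",
    lambda g: g.best_match_organism == "Archaeoglobus",
    lambda g: g.role == "DNA metabolism",
)

if __name__ == "__main__":
    print([g.locus for g in membrane_hits])
    print([g.locus for g in archaeal_hits])
```

Passing predicates as separate arguments means any of the listed properties can be mixed into a single query without writing a new function for each combination.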
Pseudo-Restriction Digest and Linear Depiction of Cuts
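The slide gives only the title, so the following is a generic sketch of what an in-silico ("pseudo") restriction digest of a linear molecule involves, not the CMR implementation. It assumes a single enzyme, searches the top strand only, and uses hypothetical function names (digest_linear, depict). The EcoRI site GAATTC, cut after the first G, serves as the example.

```python
def digest_linear(seq, site, cut_offset):
    """Find cut positions and fragment lengths for a linear sequence.

    seq: DNA sequence (top strand only in this sketch).
    site: recognition sequence, e.g. "GAATTC" for EcoRI.
    cut_offset: bases from the start of the site to the cut (1 for G^AATTC).
    """
    seq, site = seq.upper(), site.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)   # step by 1 to allow overlapping sites
    bounds = [0] + cuts + [len(seq)]
    fragments = [bounds[i + 1] - bounds[i] for i in range(len(bounds) - 1)]
    return cuts, fragments


def depict(length, cuts, width=40):
    """Draw the molecule as a line of '-' with '|' at scaled cut positions."""
    line = ["-"] * width
    for c in cuts:
        line[min(width - 1, c * width // length)] = "|"
    return "".join(line)


if __name__ == "__main__":
    demo = "AAGAATTCTTTTGAATTCGG"            # made-up 20 bp sequence
    cuts, fragments = digest_linear(demo, "GAATTC", 1)
    print(cuts, fragments)
    print(depict(len(demo), cuts, width=20))
```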
Position effect: