NCBI Field Guide
NCBI Molecular Biology Resources: A Field Guide, part 1
September 29, 2004, ICGEB
Types of Databases
• Primary Databases
  – Original submissions by experimentalists
  – Database staff review and may organize the data, but do not add or modify information
  – Records are "owned" and updated by their authors
  – Examples: GenBank, SNP, GEO
• Derivative Databases
  – Human-curated (compilation and correction of data). Examples: Gene (LocusLink), Structure and Literature databases
  – Computationally derived. Example: UniGene
  – Combination. Examples: RefSeq, Genome Assembly, Domain databases
NCBI's Derivative Sequence Database
[Diagram labels: GenBank; genomes, transcripts, proteins]
– Forming the "best representative" sequence
– Standardizing nomenclature and record structure
– Adding annotation (references, sequence features)
Release 6 is now available on the FTP site!
RefSeq Curation Processes
• Curated genomic DNA: NC, NT, NW
• Model mRNA: XM, XR
• Curated mRNA: NM, NR
• Model protein: XP
• Protein: NP
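The accession prefixes on this slide can be turned into a small lookup. The sketch below is illustrative only: the function name `classify_refseq` and the accession pattern (two letters, an underscore, digits, optional version) are assumptions, and the table covers only the prefixes listed on the slide. The slide groups XR and NR with the mRNA series; in RefSeq these two prefixes denote non-coding RNA, so the table uses the neutral label "transcript".

```python
import re

# Prefix meanings as grouped on the slide. "transcript" covers the
# mRNA series (NM, XM) and the RNA series (NR, XR).
REFSEQ_PREFIXES = {
    "NC": ("genomic DNA", "curated"),
    "NT": ("genomic DNA", "curated"),
    "NW": ("genomic DNA", "curated"),
    "NM": ("transcript", "curated"),
    "NR": ("transcript", "curated"),
    "XM": ("transcript", "model"),
    "XR": ("transcript", "model"),
    "NP": ("protein", "curated"),
    "XP": ("protein", "model"),
}

# Assumed accession shape: PREFIX_digits with an optional ".version".
ACCESSION_PATTERN = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")


def classify_refseq(accession):
    """Return (molecule, status) for a RefSeq-style accession string."""
    match = ACCESSION_PATTERN.match(accession.strip())
    if match is None:
        raise ValueError(f"not a RefSeq-style accession: {accession!r}")
    prefix = match.group(1)
    if prefix not in REFSEQ_PREFIXES:
        raise ValueError(f"prefix {prefix!r} is not listed on the slide")
    return REFSEQ_PREFIXES[prefix]


if __name__ == "__main__":
    for acc in ("NC_000913.1", "NP_418591.1"):
        print(acc, classify_refseq(acc))
```

Both accessions in the usage example come from the E. coli record shown on the next slide.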
RefSeq Chromosomes: NC_
[Slide callouts: Genome; Annotation of sequence; Gene, CDS, and other features]

LOCUS       NC_000913   4639221 bp   DNA   circular   BCT   30-JUL-2003
DEFINITION  Escherichia coli K12, complete genome.
ACCESSION   NC_000913
VERSION     NC_000913.1  GI:16127994
KEYWORDS    .
SOURCE      Escherichia coli K12.
  ORGANISM  Escherichia coli K12
            Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales;
            Enterobacteriaceae; Escherichia.
REFERENCE   1 (bases 1 to 4639221)
  AUTHORS   Blattner, F.R., Plunkett, G. III, Bloch, C.A., Perna, N.T.,
            Burland, V., Riley, M., Collado-Vides, J., Glasner, J.D.,
            Rode, C.K., Mayhew, G.F., Gregor, J., Davis, N.W.,
            Kirkpatrick, H.A., Goeden, M.A., Rose, D.J., Mau, R. and Shao, Y.
  TITLE     The complete genome sequence of Escherichia coli K12.
  JOURNAL   Science 277 (5331), 1453-1474 (1997)
  MEDLINE   97426617
  PUBMED    9278503
REFERENCE   2 (bases 1 to 4639221)
  AUTHORS   Blattner, F.R.
  TITLE     Direct submission
  JOURNAL   Submitted (16-JAN-1997) Guy Plunkett III, Laboratory of Genetics,
            University of Wisconsin, 445 Henry Mall, Madison, WI 53706, USA.
            E-mail ecoli@genetics.wisc.edu Phone: 608-262-2543
FEATURES
     gene   3954631..3956478
            /gene="mutL"
            /locus_tag="b4170"
            /note="synonym: mut-25"
     CDS    3954631..3956478
            /gene="mutL"
            /locus_tag="b4170"
            /function="methyl-directed mismatch repair"
            /codon_start=1
            /transl_table=11
            /product="MutL"
            /protein_id="NP_418591.1"
            /db_xref="GI:16131992"
            /translation="MPIQVLPPQLANQIAAGEVVERPASVVKELVENSLDAGATRIDI
            DIERGGAKLIRIRDNGCGIKKDELALALARHATSKIASLDDLEAIISLGFRGEALASI
            SSVSRLTLTSRTAEQQEAWQAYAEGRDMNVTVKPAAHPVGTTLEVLDLFYNTPARRKF
            LRTEKTEFNHIDEIIRRIALARFDVTINLSHNGKIVRQYRAVPEGGQKERRLGAICGT
            AFLEQALAIEWQHGDLTLRGWVADPNHTTPALAEIQYCYVNGRMMRDRLINHAIRQAC
            EDKLGADQQPAFVLYLEIDPHQVDVNVHPAKHEVRFHQSRLVHDFIYQGVLSVLQQQL
            ETPLPLDDEPQPAPRSIPENRVAAGRNHFAEPAAREPVAPRYTPAPASGSRPAAPWPN
            AQPGYQKQQGEVYRQLLQTPAPMQKLKAPEPQEPALAANSQSFGRVLTIVHSDCALLE
            RDGNISLLSLPVAERWLRQAQLTPGEAPVCAQPLLIPLRLKVSAEEKSALEKAQSALA
            ELGIDFQSDAQHVTIRAVPLPLRQQNLQILIPELIGYLAKQSVFEPGNIAQWIARNLM
            SEHAQWSMAQAITLLADVERLCPQLVKTPPGGLLQSVDLHPAIKALKDE"
BASE COUNT  978672 a 1011074 c 997153 g 974742 t
ORIGIN
        1 cgtcttcatt gtcagacagc agaatttgta cgcgctgttc ggcttgttgt aatttggcct
[remaining sequence lines of the slide excerpt not shown]
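The mutL feature in the record above gives a location of 3954631..3956478, and that location alone is enough to check the expected protein length. The sketch below is a worked example under simple assumptions: the helper names are hypothetical, and it handles only a plain forward-strand `start..end` location with an inclusive, 1-based range and a single stop codon at the end. Real flat files also use `complement(...)` and `join(...)`, which this does not parse.

```python
from typing import Tuple


def parse_simple_location(location: str) -> Tuple[int, int]:
    """Split a plain 'start..end' feature location into two integers."""
    start_text, end_text = location.split("..")
    return int(start_text), int(end_text)


def cds_summary(location: str) -> Tuple[int, int]:
    """Return (nucleotide length, expected residues) for a simple CDS.

    Coordinates are 1-based and inclusive, so the length is
    end - start + 1. One codon is subtracted for the stop codon.
    """
    start, end = parse_simple_location(location)
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    codons = length // 3
    return length, codons - 1


if __name__ == "__main__":
    # mutL location taken from the slide's NC_000913 record.
    print(cds_summary("3954631..3956478"))  # (1848, 615)
```

The result, 1848 bp and 615 residues, agrees with the length of the /translation qualifier shown for NP_418591.1.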
RefSeq: NCBI's Derivative Sequence Database
RefSeq Benefits
• Non-redundant
• Explicitly linked nucleotide and protein sequences
• Updated to reflect current sequence data and biology
• Validated by hand
• Format consistency
• Distinct accession series
• Stewardship by NCBI staff and collaborators
ftp://ftp.ncbi.nih.gov/refseq/release
Announcing!
Genes: The Gene Summary Database
Summary pages of curated information about genetic loci for organisms in the RefSeq project.
► Graphics
► Gene information
► Bibliography (PubMed links)
► General gene information
► NCBI Reference Sequences
► Related sequences
► Additional Links
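Gene summary pages can also be located programmatically. The sketch below only builds a search URL and makes no network call. It assumes NCBI's E-utilities ESearch endpoint and the `[Gene Name]` and `[Organism]` field tags, none of which appear in these slides; the function name `gene_search_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the public E-utilities ESearch endpoint (not from the slides).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def gene_search_url(symbol, organism, retmax=20):
    """Build an ESearch URL for a gene symbol restricted to one organism."""
    # Entrez field tags narrow each term to a specific index.
    term = f"{symbol}[Gene Name] AND {organism}[Organism]"
    params = {"db": "gene", "term": term, "retmax": retmax}
    # urlencode percent-escapes the brackets and turns spaces into '+'.
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(gene_search_url("PRNP", "Homo sapiens"))
```

PRNP is used here because it is the example gene in the later Map Viewer slides; fetching the URL would return a list of matching Gene identifiers.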
Entrez Gene
NM/NP Records in Entrez Gene
UniGene: Clustering Expressed Sequences
• Records are clusters of mRNAs and ESTs that ideally represent single genes
• Records are created automatically by a modified BLAST algorithm
• UniGene provides a means to identify an EST or unannotated mRNA
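The idea of grouping mRNAs and ESTs into clusters can be illustrated with a toy example. This is not UniGene's actual procedure, which the slide describes only as a modified BLAST algorithm. The sketch swaps in plain single-linkage clustering with a union-find structure and assumes that pairwise similarity hits have already been computed elsewhere. All sequence identifiers are made up.

```python
def cluster_sequences(ids, hits):
    """Group sequence IDs so that any two linked by a hit share a cluster.

    ids  : iterable of sequence identifiers
    hits : iterable of (id_a, id_b) pairs judged similar by some prior
           comparison step (assumed, not computed here)
    """
    ids = list(ids)
    parent = {seq_id: seq_id for seq_id in ids}

    def find(x):
        # Follow parent links to the cluster root, halving paths as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for seq_id in ids:
        clusters.setdefault(find(seq_id), []).append(seq_id)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    ids = ["mRNA1", "EST_a", "EST_b", "EST_c", "EST_d"]
    hits = [("mRNA1", "EST_a"), ("EST_a", "EST_b"), ("EST_c", "EST_d")]
    print(cluster_sequences(ids, hits))
    # [['EST_a', 'EST_b', 'mRNA1'], ['EST_c', 'EST_d']]
```

EST_b never hits mRNA1 directly but joins its cluster through EST_a, which is how an unannotated EST can be tied to a known mRNA.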
A Cluster of ESTs: Arabidopsis serine protease
[Figure labels: query; 5' EST hits; 3' EST hits; Sequence & Expression]
UniGene Collections (as of July 2004)
• Chordata
  – Mammalia: Bos taurus (cow); Canis familiaris (dog); Homo sapiens (human); Mus musculus (mouse); Ovis aries (sheep); Rattus norvegicus (rat); Sus scrofa (pig)
  – Aves: Gallus gallus (chicken)
  – Amphibia: Xenopus laevis (African clawed frog); Xenopus tropicalis (western clawed frog)
  – Actinopterygii: Danio rerio (zebrafish); Oncorhynchus mykiss (rainbow trout); Oryzias latipes (Japanese rice fish); Salmo salar (salmon)
  – Ascidiacea: Ciona intestinalis (sea squirt)
• Arthropoda
  – Insecta: Anopheles gambiae (malaria mosquito); Apis mellifera (honeybee); Drosophila melanogaster (fruit fly); Bombyx mori (silkworm)
• Echinodermata
  – Echinoidea: Strongylocentrotus purpuratus
• Nematoda
  – Chromadorea: Caenorhabditis elegans
• Platyhelminthes
  – Trematoda: Schistosoma mansoni
• Embryophyta
  – Cycadopsida: Pinus taeda (loblolly pine)
  – Bryopsida: Physcomitrella patens
  – Eudicotyledons: Arabidopsis thaliana (thale cress); Glycine max (soybean); Helianthus annuus (sunflower); Lactuca sativa (lettuce); Lotus corniculatus (lotus flower); Lycopersicon esculentum (tomato); Malus x domestica (apple); Medicago truncatula (barrel medic); Populus tremula/tremuloides (poplar); Solanum tuberosum (potato); Vitis vinifera (wine grape)
  – Liliopsida: Hordeum vulgare (barley); Oryza sativa (rice); Saccharum officinarum (noble cane); Sorghum bicolor (sorghum); Triticum aestivum (bread wheat); Zea mays (corn)
• Mycetozoa
  – Dictyosteliida: Dictyostelium discoideum (slime mold)
• Chlorophyta
  – Chlorophyceae: Chlamydomonas reinhardtii
• Apicomplexa
  – Coccidia: Toxoplasma gondii
Finding UniGene Clusters
• By link
• By Entrez search
UniGene Cluster for PRNP
Complete Genomes (as of June 2004)
• Viruses (1923)
• Archaebacteria (44)
• Eubacteria (176)
• Eukaryotes (61)
• Organelles:
  – Mitochondria (558)
  – Plastids (40)
  – Plasmids (626)
  – Nucleomorphs (3)
Simple Genomes
• Full chromosomal sequences are provided
• Genes are annotated
• The annotation can be shown graphically and linked to sequence records
mutL
Complex Genomes
• Sequences are provided complete, or NCBI helps assemble them
• Heavy annotation: genes, transcript regions and ORFs, sequence variations and markers, clones, ESTs, etc.
• The annotation can be shown graphically and linked to other databases using the Map Viewer
Viewing Complex Genomes: NCBI Map Viewer
• Map Viewer Home Page
  – Shows all supported organisms
  – Provides links to genomic BLAST
• Genome Overview Page
  – Provides links to individual chromosomes
  – Shows hits on a genome graphically
• Chromosome Viewing Page
  – Allows interactive views of annotation details
  – Provides numerous maps unique to each genome
Map Viewer Home Page
Genome Overview Page
[Screenshot callouts: Search the maps; Genomic BLAST; Species-specific help!]
Search For Human PRNP
[Search term: PRNP]
Human PRNP on Genome View
Chromosome Viewing Page
[Screenshot callouts: Map Summary; Add or remove maps; Master Map with exploded content; Genes; UniGene; Zooming Controls; Clone]
Zooming in…
[Screenshot callout: Left click]
Map Viewer Analysis Tools
[Screenshot callouts: Link to Protein; Evidence Viewer; HomoloGene; Link to OMIM; Sequence Viewer; Download Sequence; ModelMaker]
HomoloGene
Homology Comparisons on Map Viewer
Intermission