GenBank
• Nucleotide-only sequence database
• Archival in nature
• Data shared nightly among three collaborating databases:
  • GenBank at NCBI
  • DNA Data Bank of Japan (DDBJ)
  • EMBL at EBI
Source: NCBI
NCBI site map, a good place to find resources: http://www.ncbi.nlm.nih.gov/Sitemap/index.html
GenBank Release 131.0 (December 15, 2003)
• 30,968,418 sequences
• 36,553,368,485 bases
• Full release every two months
• Incremental and cumulative updates daily
• Available only over the Internet: ftp://ftp.ncbi.nih.gov/genbank/
GenBank Record
• Header: information that applies to the whole record
• Features: annotations on the record
• Sequence
GenBank Record: Header
Labeled fields: Locus Name, Sequence Length, Molecule Type, GenBank Division, Modification Date, Accession Number, Version Number
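To make the header fields concrete, here is a minimal Python sketch that splits a LOCUS line into the fields named on the slide. The sample line is hypothetical (EXAMPLE01 is not a real record), the whitespace-based parsing assumes the common eight-token layout, and the topology field is not listed on the slide. The accession and version numbers sit on their own ACCESSION and VERSION lines, so they are not parsed here.

```python
from typing import NamedTuple


class LocusLine(NamedTuple):
    locus_name: str
    length_bp: int
    molecule_type: str
    topology: str          # not on the slide; present in the common layout
    division: str
    modification_date: str


def parse_locus_line(line: str) -> LocusLine:
    """Parse a LOCUS line by splitting on whitespace (assumed 8-token layout)."""
    tokens = line.split()
    if len(tokens) != 8 or tokens[0] != "LOCUS" or tokens[3] != "bp":
        raise ValueError(f"unexpected LOCUS line layout: {line!r}")
    return LocusLine(
        locus_name=tokens[1],
        length_bp=int(tokens[2]),
        molecule_type=tokens[4],
        topology=tokens[5],
        division=tokens[6],
        modification_date=tokens[7],
    )


# Hypothetical sample line, for illustration only.
SAMPLE = "LOCUS       EXAMPLE01               1200 bp    mRNA    linear   PRI 15-DEC-2003"

if __name__ == "__main__":
    print(parse_locus_line(SAMPLE))
```

Real records can deviate from this layout, which is why the sketch raises an error rather than guessing.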
GenBank Record: Features (link to sequence)
GenBank Record: Sequence
Entrez
http://www.ncbi.nlm.nih.gov/gquery.fcgi
Select GenBank in Entrez
Find the mRNA sequence for human “epidermal growth factor receptor”
Specify human as the organism: click Preview/Index, then specify “human” by selecting “Organisms” from the “All Fields” dropdown menu
Limit your search
• Select mRNA in the “Molecule” list
• Select “RefSeq” in the database list
• Exclude all technology-generated records
RefSeq
• Database of reference sequences
• Curated
• Non-redundant: one record for each gene, or each splice variant, from each organism represented
• Each record is intended to present an encapsulation of the current understanding of a gene or protein, similar to a review article
Source: RefSeq FAQ
Molecular databases
Find the gene name by searching LocusLink: http://www.ncbi.nlm.nih.gov/LocusLink/
Select organism
LocusLink
Find the mRNA sequence for epidermal growth factor receptor (EGFR)
Start with the gene name EGFR, then limit the search:
1. Search the Gene Name field
2. Exclude all technology-generated records
3. Select mRNA as Molecule
4. Select “RefSeq” as source database
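The same limits can be written as a single Entrez query string. The Python sketch below only builds the query and a request URL; it does not contact NCBI. It assumes the E-utilities `esearch` endpoint and the `nucleotide` database name, neither of which appears in the slides, and the function names are made up for this example. The "exclude technology-generated records" step is not encoded here.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities search endpoint (not mentioned in the slides).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_refseq_mrna_term(gene: str, organism: str) -> str:
    """Combine the slide's limits into one Entrez query string."""
    parts = [
        f"{gene}[Gene Name]",      # limit 1: search the Gene Name field
        f"{organism}[Organism]",   # restrict to one organism
        "biomol_mrna[PROP]",       # limit 3: molecule is mRNA
        "srcdb_refseq[PROP]",      # limit 4: source database is RefSeq
    ]
    return " AND ".join(parts)


def build_esearch_url(term: str, db: str = "nucleotide") -> str:
    """Return a URL-encoded esearch request for the given query term."""
    return ESEARCH_URL + "?" + urlencode({"db": db, "term": term})


if __name__ == "__main__":
    term = build_refseq_mrna_term("EGFR", "human")
    print(term)
    print(build_esearch_url(term))
```

Pasting the printed term into the Entrez search box should behave like the form-based limits, although the web interface has changed since these slides were written.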
Entrez: Neighbors and Hard Links
[Diagram: Entrez databases and the links between them. Labels: PubMed abstracts (word weight), Nucleotide sequences (BLAST), Protein sequences (BLAST), 3-D Structure (VAST), Taxonomy (phylogeny), Genomes]
Source: NCBI
SRS – List of Public SRS Servers
SRS Tutorial
http://srs.ebi.ac.uk
• Permanent session
• Temporary session
• List of public servers
• Database information: which databases are present and when they were indexed
• Documentation
What is SRS?
Central resource for molecular biology data
• Data retrieval system
  – More than 250 databanks have been indexed
  – More than 35 SRS servers on the WWW
• Data analysis applications server
  – 11 protein applications
  – 6 nucleic acid applications
• Uniform query interface on the web
History of SRS
• 1990 – Main author: Dr. Thure Etzold. Development started at EMBL, Heidelberg.
• 1997 – Moved to EBI in Cambridge. Development work was supported by various grants, among others from EMBnet.
• 1998 – Etzold and his group join Lion Biosciences
Why SRS?
• Information retrieval
  – Easy way to retrieve information from sequence and sequence-related databases
  – Possibility to search for multiple words or other criteria
• Linkage between different databases
  – E.g. find all primary structures with a known three-dimensional structure
• ... and much more
Philosophy of SRS
• Original database file (plain text, HTML, XML) is parsed
• Parsing produces an index file
• Data retrieval runs against the index
• Links between database entries are searchable
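The parse, index, retrieve flow can be illustrated with a toy Python sketch. This is not how SRS itself is implemented: the flat-file layout (two-letter tags, `//` as entry terminator) is a simplified stand-in loosely modeled on EMBL-style files, and the entries and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical flat file: one tag per line, "//" ends each entry.
FLAT_FILE = """\
ID E1
DE kinase receptor
//
ID E2
DE growth factor
//
ID E3
DE growth factor receptor
//
"""


def parse_flat_file(text):
    """Split the flat file into {entry_id: {tag: value}} records."""
    entries = {}
    for block in text.strip().split("//"):
        fields = {}
        for line in block.strip().splitlines():
            tag, _, value = line.partition(" ")
            fields[tag] = value.strip()
        if "ID" in fields:  # skips the empty block after the last "//"
            entries[fields["ID"]] = fields
    return entries


def build_index(entries, field="DE"):
    """Map each lower-cased word of one field to the entry IDs containing it."""
    index = defaultdict(set)
    for entry_id, fields in entries.items():
        for word in fields.get(field, "").lower().split():
            index[word].add(entry_id)
    return index


def lookup(index, word):
    """Retrieve matching entry IDs from the index without rescanning the file."""
    return sorted(index.get(word.lower(), set()))


if __name__ == "__main__":
    idx = build_index(parse_flat_file(FLAT_FILE))
    print(lookup(idx, "receptor"))
    print(lookup(idx, "growth"))
```

The point of the design is that queries consult only the index, so the original file is parsed once and never scanned at query time.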
The Library Select Page
[Screenshot labels: Workbenches, Query Forms, Libraries, Library groups]
SRS main toolbar tabs
• Top Page: displays databases in their database groups
• Query: displays either the standard or the extended query form
• Results, or “the query manager”: maintains a history of all the results obtained during a session
• Projects, or “the project manager”: maintains a history of all queries and views used during a session
• Views: lets a user define a user-specific view for one or more databases
• Databanks: lists the databases available in the system, with some facts about each
Search terms in SRS
• SRS indexed fields can be searched using any of the following:
  – Single-word search
  – Multiple-word phrases
  – Numbers and dates
  – Regular expressions
  – Wildcards
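The difference between a wildcard and a regular-expression search can be sketched in Python. The slides do not give SRS's own pattern syntax, so this uses the Python standard library (`fnmatch` and `re`) as a stand-in. The word list and function names are hypothetical.

```python
import fnmatch
import re

# Hypothetical words taken from an indexed field.
INDEXED_WORDS = ["kinase", "kinases", "kinesin", "receptor", "egfr"]


def wildcard_search(pattern, words=INDEXED_WORDS):
    """Shell-style wildcards: '*' matches any run of characters, '?' exactly one."""
    return [w for w in words if fnmatch.fnmatchcase(w, pattern)]


def regex_search(pattern, words=INDEXED_WORDS):
    """Regular expression that must match the whole indexed word."""
    rx = re.compile(pattern)
    return [w for w in words if rx.fullmatch(w)]


if __name__ == "__main__":
    print(wildcard_search("kin*"))     # every word starting with "kin"
    print(wildcard_search("kinas?"))   # "kinas" plus exactly one character
    print(regex_search("kinases?"))    # singular or plural form
```

Wildcards cover simple prefix and suffix cases. Regular expressions are the more expressive option, for instance when only some of several possible endings should match.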