Basics of BLAST
• Basic BLAST Search
  - What is BLAST?
  - The framework of BLAST
  - Different protocols of BLAST
  - The databases you can search
  - Where can I BLAST?
What is BLAST?
• BLAST stands for Basic Local Alignment Search Tool
• NCBI-BLAST vs. WU-BLAST
• BLAST != BLAT
• Why is BLAST popular?
  - Good balance of sensitivity and speed
  - Reliability
  - Flexibility
The Framework of BLAST (1)
• Scoring matrix
  - High BLOSUM (low PAM): for closely related sequences
  - Low BLOSUM (high PAM): for distantly related sequences
  - Default is BLOSUM62
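To make the role of the scoring matrix concrete, here is a minimal Python sketch that scores an ungapped protein alignment. The dictionary holds only a handful of BLOSUM62 entries, chosen for illustration; a real search uses the full 20x20 matrix. The names `BLOSUM62_SUBSET`, `pair_score` and `ungapped_score` are hypothetical helpers and are not part of BLAST.

```python
# A few BLOSUM62 entries only (illustrative subset, not the full matrix).
BLOSUM62_SUBSET = {
    ("L", "L"): 4, ("I", "I"): 4, ("L", "I"): 2,
    ("K", "K"): 5, ("R", "R"): 5, ("K", "R"): 2,
    ("L", "K"): -2, ("W", "W"): 11,
}


def pair_score(a, b, matrix=BLOSUM62_SUBSET):
    """Look up a substitution score; the matrix is symmetric."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]  # raises KeyError for pairs outside the subset


def ungapped_score(seq1, seq2, matrix=BLOSUM62_SUBSET):
    """Sum the substitution scores of two aligned, equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(a, b, matrix) for a, b in zip(seq1, seq2))


# L/I and K/R are conservative substitutions, so they still score positively.
print(ungapped_score("LKW", "IRW"))
```

Identical residues score highest, conservative substitutions score lower but positive, and unlikely substitutions score negative.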
BLOSUM Matrix
• BLOSUM = BLOcks SUbstitution Matrix
  http://helix.biology.mcmaster.ca/721/distance/node10.html
The Framework of BLAST (2)
• Sequence alignment
  - W (word size): blastn 11; others 2 or 3
  - G (gap open penalty): blastn 5; others 11
  - E (gap extension penalty): blastn 2; others 1
• Statistical interpretation
  - e (threshold for expectation value): default 10
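The word size W controls how the query is broken into short seeds before any alignment is extended. The sketch below only lists the overlapping words of a query. It is a simplification: for protein searches BLAST also considers similar "neighborhood" words, which is not shown here. The function name `query_words` is a hypothetical helper.

```python
def query_words(query, word_size):
    """Return every overlapping word of length word_size in the query."""
    return [query[i:i + word_size]
            for i in range(len(query) - word_size + 1)]


# A protein query with W = 3 (one of the word sizes listed for non-blastn programs).
print(query_words("MKVLA", 3))
```

A larger W gives fewer, more specific seeds (faster, less sensitive); a smaller W gives more seeds (slower, more sensitive).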
BLAST Protocols
• The most common BLAST searches use five programs:

  Program   Database          Query
  BLASTN    Nucleotide        Nucleotide
  BLASTP    Protein           Protein
  BLASTX    Protein           Nt. -> Protein
  TBLASTN   Nt. -> Protein    Protein
  TBLASTX   Nt. -> Protein    Nt. -> Protein
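The program table can be read as a lookup from query type and database type to a program. A minimal Python sketch follows; the function `choose_program` and its `translated` flag are hypothetical and exist only to illustrate the table.

```python
def choose_program(query_type, db_type, translated=False):
    """Pick a BLAST program from the query and database sequence types.

    query_type and db_type are "nucleotide" or "protein". For a nucleotide
    query against a nucleotide database, translated=True selects TBLASTX
    (both sides translated) instead of BLASTN.
    """
    table = {
        ("nucleotide", "nucleotide"): "tblastx" if translated else "blastn",
        ("protein", "protein"): "blastp",
        ("nucleotide", "protein"): "blastx",
        ("protein", "nucleotide"): "tblastn",
    }
    try:
        return table[(query_type, db_type)]
    except KeyError:
        raise ValueError("types must be 'nucleotide' or 'protein'") from None


print(choose_program("nucleotide", "protein"))
```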
BLASTN
• BLASTN
  - The query is a nucleotide sequence
  - The database is a nucleotide database
  - No conversion is done on the query or database
• DNA :: DNA homology
  - Mapping oligos to a genome
  - Cross-species sequence exploration
  - Annotating genomic DNA with ESTs
BLASTP
• BLASTP
  - The query is an amino acid sequence
  - The database is an amino acid database
  - No conversion is done on the query or database
• Protein :: Protein homology
  - Protein function exploration
  - For novel genes, make the parameters more sensitive
BLASTX
• BLASTX
  - The query is a nucleotide sequence
  - The database is an amino acid database
  - The query is translated in all six reading frames, and each translation is used to search the database
• Coding nucleotide sequence :: Protein homology
  - Gene finding in genomic DNA
  - Annotating ESTs (and shotgun sequence)
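The six-frame translation used by the translated searches can be sketched in Python as follows. This assumes the standard genetic code and plain A/C/G/T input; codons containing any other character become "X". The helper names are hypothetical, and BLAST's own implementation is not shown here.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna):
    """Translate three offsets on the forward strand and three on the reverse."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for name, protein in six_frame_translations("ATGGCCTAA").items():
    print(name, protein)
```

Each of the six protein strings is then searched like an ordinary protein sequence, which is why translated searches cost more than BLASTN or BLASTP.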
TBLASTN
• TBLASTN
  - The query is an amino acid sequence
  - The database is a nucleotide database
  - The database is translated in all six frames and searched with the protein sequence
• Protein :: Coding nucleotide DB homology
  - Mapping a protein to a genome
  - Mining ESTs (shotgun DNA) for protein similarities
TBLASTX
• TBLASTX
  - The query is a nucleotide sequence
  - The database is a nucleotide database
  - Both the query and the database are translated in all six frames
• Coding :: Coding homology
  - For searching distantly related species
  - Sensitive but expensive
BLAST output
• List of sequences with scores
  - Raw score: higher is better (length dependent)
  - Expect value: smaller is better (corrected for sequence length and database size)
• List of alignments
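The relation between raw score and expect value can be sketched with the Karlin-Altschul formula E = K * m * n * exp(-lambda * S), where m is the query length and n is the database length. K and lambda depend on the scoring matrix and gap penalties, so they are left as arguments here; the values in the usage line are placeholders chosen to make the arithmetic easy, not BLAST's real parameters. The function names are hypothetical.

```python
import math


def expect_value(raw_score, query_length, database_length, k, lam):
    """Expected number of chance hits scoring at least raw_score.

    Real BLAST uses "effective" lengths after edge correction; this
    sketch uses the plain lengths.
    """
    return k * query_length * database_length * math.exp(-lam * raw_score)


def passes_threshold(e_value, threshold=10.0):
    """Keep a hit only if its expect value is within the threshold e."""
    return e_value <= threshold


# Placeholder parameters: k = 1 and lam = ln 2.
e = expect_value(raw_score=10, query_length=32, database_length=32,
                 k=1.0, lam=math.log(2))
print(e, passes_threshold(e))
```

Under this formula, doubling the database size doubles the expect value for the same raw score, so the same alignment is less significant in a larger database.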
The Databases (1)
• GenBank NR (protein and nucleotide versions)
  - Large non-redundant databases (compiled, with duplicates removed)
  - Anyone can submit, and you can call your sequence anything
  - Quality is low; names can be meaningless
• EST databases
  - Short single reads of cDNA clones
  - Other short single reads
  - High error rates
The Databases (2)
• Swiss-Prot
  - Curated from literature
  - REAL proteins; REAL functions; small
• Genomic databases
  - Human, mouse, Drosophila, Arabidopsis, etc.
  - NCBI, species-specific web pages
Where Can I Run BLAST?
• Three choices:
  - NCBI (www.ncbi.nih.gov): databases updated constantly (daily); very slow at times
  - Goose web (goose.wustl.edu/blast.html)
  - Command line (blastall)
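For the command-line option, a small Python sketch that assembles a legacy `blastall` invocation is shown below. The flags `-p` (program), `-d` (database), `-i` (query file), `-o` (output file) and `-e` (expectation threshold) are `blastall` options; the file and database names are hypothetical, and the sketch only builds and prints the command rather than running it.

```python
import shlex


def blastall_command(program, database, query_file, output_file, evalue=10.0):
    """Build the argument list for a legacy blastall run (not executed here)."""
    return [
        "blastall",
        "-p", program,      # blastn, blastp, blastx, tblastn or tblastx
        "-d", database,     # name of a formatted BLAST database
        "-i", query_file,   # FASTA file holding the query
        "-o", output_file,  # where the report is written
        "-e", str(evalue),  # expectation value threshold
    ]


# Hypothetical file and database names, for illustration only.
cmd = blastall_command("blastp", "swissprot", "query.fasta", "hits.txt")
print(shlex.join(cmd))
```

On a machine with `blastall` installed and a formatted database available, the list could be passed to `subprocess.run(cmd, check=True)`.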