FASTA and BLAST
Chitta Baral
FASTA: Basic Steps
• Step 1:
  – Set a word size, ktup (usually 6 for DNA and 2 for proteins).
  – Make a dot plot of the word matches between the two sequences.
  – Find the long diagonals (high-scoring regions).
• Step 2:
  – Score the 10 best diagonal runs using a scoring matrix (allowing mismatches, end extensions, and joining of two diagonals, but no gaps).
  – init1: the score of the single best sub-alignment found in this step.
• Step 3:
  – Merge non-overlapping diagonal runs to allow gaps (insertions/deletions).
  – Score of joined regions = sum of the individual scores - joining (gap) penalty.
  – The score of the highest-scoring region at the end of this step is called initn.
• Step 4:
  – Use a variant of the Smith-Waterman algorithm on a narrow band around initn and construct an optimal alignment of this region.
• Modifications:
  – In Step 4, use a band around init1.
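A minimal Python sketch of Steps 1, 2 and 4 follows. It assumes toy DNA scores (match +5, mismatch -4, linear gap -8) in place of a real substitution matrix and affine gap penalties. It counts exact word hits per diagonal, rescores the top diagonals without gaps to obtain init1, and then runs Smith-Waterman only inside a band around init1's diagonal (the modification listed above). Step 3 (joining runs into initn) is left out, and all function names and defaults are hypothetical rather than FASTA's own.

```python
from collections import defaultdict

# Toy DNA scores (assumption): real FASTA uses a substitution matrix and affine gaps.
MATCH, MISMATCH, GAP = 5, -4, -8


def kmer_index(seq, k):
    """Map every k-letter word of seq to the positions where it starts."""
    index = defaultdict(list)
    for j in range(len(seq) - k + 1):
        index[seq[j:j + k]].append(j)
    return index


def diagonal_hits(query, target, k):
    """Step 1: count exact word hits on each diagonal d = j - i of the dot plot."""
    index = kmer_index(target, k)
    hits = defaultdict(int)
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], ()):
            hits[j - i] += 1
    return hits


def best_run_on_diagonal(query, target, d):
    """Step 2: best gap-free local score along diagonal d (mismatches allowed)."""
    best = run = 0
    i = max(0, -d)
    while i < len(query) and i + d < len(target):
        run = max(0, run + (MATCH if query[i] == target[i + d] else MISMATCH))
        best = max(best, run)
        i += 1
    return best


def banded_smith_waterman(query, target, d, width):
    """Step 4: Smith-Waterman restricted to cells within `width` of diagonal d."""
    n, m = len(query), len(target)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # out-of-band cells stay 0
    best = 0
    for i in range(1, n + 1):
        for j in range(max(1, i + d - width), min(m, i + d + width) + 1):
            s = MATCH if query[i - 1] == target[j - 1] else MISMATCH
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + GAP, H[i][j - 1] + GAP)
            best = max(best, H[i][j])
    return best


def fasta_like(query, target, k=2, top=10, width=4):
    """Return (init1, diagonal of init1, banded optimal score); Step 3 is omitted."""
    hits = diagonal_hits(query, target, k)
    if not hits:
        return 0, 0, 0
    top_diagonals = sorted(hits, key=hits.get, reverse=True)[:top]
    init1, d1 = max((best_run_on_diagonal(query, target, d), d) for d in top_diagonals)
    return init1, d1, banded_smith_waterman(query, target, d1, width)
```

For example, `fasta_like("ACGTACGTGGA", "TTACGTACGTGGA")` places the shared region on diagonal 2 and returns `(55, 2, 55)`. The band means Step 4 fills only about 2·width+1 cells per query row instead of the whole matrix row.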
BLAST: Basic Steps
• Step 1:
  – Set a word size (3 for proteins and 11 for DNA) and create a word list for the query sequence.
  – E.g., with word size 2, qlnfsagw gives {ql, ln, nf, fs, sa, ag, gw}.
  – Expand the list using a threshold T (say 8): add every word that scores at least T against a query word.
    ql: ql, qm, hl, zl
    ln: ln, lb
    nf: nf, af, ny, df, qf, ef, gf, hf, kf, sf, tf, bf, zf
    fs: fs, fa, fn, fd, fg, fp, ft, fb, ys
    sa: none
    ag: ag
    gw: gw, aw, rw, nw, dw, qw, ew, hw, iw, kw, mw, pw, sw, tw, vw, bw, zw, xw
• Step 2:
  – Scan through the database sequence; whenever a word from the list is found, try to extend the hit in both directions (without gaps) to reach a score above a threshold S. During extension, a parameter L defines how long an extension will be tried to raise the score over S.
• Modifications of Step 2:
  – Original BLAST: extension continues as long as the score keeps increasing.
  – Another version: extension stops when the accumulated score stops increasing and has fallen a certain amount below the best score found.
  – BLAST 2 (gapped BLAST):
    • A lower value of T is used.
    • After extension, try to combine hits (allowing gaps).
    • Find the maximal scoring segment.
    • Use the Smith-Waterman algorithm in a band around this segment (as in FASTA).
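The Python sketch below illustrates Step 1 (word list plus expansion with threshold T) and Step 2 with the "another version" stopping rule, where the parameter X stands in for the "certain amount" the score may fall below the best. It assumes a made-up substitution score (+5 identical, +2 within a hypothetical similarity group, -1 otherwise) instead of a real matrix such as BLOSUM62, so its expanded lists differ from the ones on the slide; sequences are uppercase, and every name and default here is illustrative only.

```python
from itertools import product

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
# Toy substitution scores (assumption, NOT BLOSUM62): +5 identical,
# +2 within a hypothetical similarity group, -1 otherwise.
GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG", "C", "P"]


def score(a, b):
    if a == b:
        return 5
    return 2 if any(a in g and b in g for g in GROUPS) else -1


def word_score(u, v):
    return sum(score(a, b) for a, b in zip(u, v))


def neighborhood(word, T):
    """Step 1 expansion: every word of the same length scoring >= T against `word`."""
    candidates = ("".join(p) for p in product(ALPHABET, repeat=len(word)))
    return {c for c in candidates if word_score(word, c) >= T}


def word_list(query, w, T):
    """Map each expanded list word to the query offsets it came from."""
    words = {}
    for i in range(len(query) - w + 1):
        for nb in neighborhood(query[i:i + w], T):
            words.setdefault(nb, []).append(i)
    return words


def extend(query, target, i, j, w, X):
    """Step 2: ungapped extension of the word hit at query i / target j.
    Each direction stops once its running score drops more than X below its best."""
    total = word_score(query[i:i + w], target[j:j + w])
    for step, qi, tj in ((1, i + w, j + w), (-1, i - 1, j - 1)):
        best = run = 0
        while 0 <= qi < len(query) and 0 <= tj < len(target):
            run += score(query[qi], target[tj])
            if run > best:
                best = run
            elif best - run > X:
                break
            qi += step
            tj += step
        total += best
    return total


def blast_like(query, target, w=2, T=7, X=5, S=20):
    """Scan target for list words; keep the best extended score >= S per diagonal j - i."""
    words = word_list(query, w, T)
    best = {}
    for j in range(len(target) - w + 1):
        for i in words.get(target[j:j + w], ()):
            s = extend(query, target, i, j, w, X)
            if s >= S and s > best.get(j - i, 0):
                best[j - i] = s
    return best
```

With these toy scores, `neighborhood("QL", 7)` is `{"QL", "NL", "QI", "QM", "QV"}`, and `blast_like("QLNFSAGW", "MKQLNFSAGWPP")` reports a score of 40 on diagonal 2, where all eight query residues match. Lowering T grows the word list (more hits, slower scan), which is the trade-off BLAST 2 makes before its gapped stage.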
Homework (due 3/31/03)
• Compare BLAST and FASTA. (Hint: Read the external pointers on the class notes page.)