Using Dynamic Programming To Align Sequences Cédric Notredame (25/09/2021)
Our Scope
-Understanding the DP concept
-Coding a Global and a Local Algorithm
-Aligning with Affine gap penalties
-Saving memory
-Sophisticated variants…

Outline
-Coding Dynamic Programming with Non-affine Penalties
-Turning a global algorithm into a local Algorithm
-Adding affine penalties
-Using A Divide and conquer Strategy
-Tailoring DP to your needs:
  -The repeated Matches Algorithm
  -Double Dynamic Programming

Global Alignments Without Affine Gap penalties
Dynamic Programming

How To Align Two Sequences With a Gap Penalty, a Substitution Matrix and Not Too Much Time
Dynamic Programming

A bit of History…
-DP invented in the 50s by Bellman
-"Programming" here means tabulation
-Re-invented in 1970 by Needleman and Wunsch
-It took 10 years to find out…

The Foolish Assumption
The score of each column of the alignment is independent from the rest of the alignment.
It is possible to model the relationship between two sequences with:
-A substitution matrix
-A simple gap penalty

The Principle of DP
If you optimally extend an optimal alignment of two sub-sequences, the result remains an optimal alignment.
[Figure: an optimal alignment (X-XX over XXXX) extended by one column in three possible ways: Deletion, Alignment, Insertion]

Finding the score of i, j
-Sequence 1: [1…i]
-Sequence 2: [1…j]
-The optimal alignment of [1…i] vs [1…j] can finish in three different manners:
  -X over X (two residues aligned)
  -X over a gap
  -a gap over X
-Three ways to build the alignment of 1…i vs 1…j:
  -1…i vs 1…j-1, extended with a gap over residue j
  -1…i-1 vs 1…j-1, extended with residue i over residue j
  -1…i-1 vs 1…j, extended with residue i over a gap
-In order to compute the score of 1…i vs 1…j, all we need are the scores of:
  -1…i-1 vs 1…j
  -1…i-1 vs 1…j-1
  -1…i vs 1…j-1

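Formalizing the algorithm
$$
F(i, j) = \mathrm{best}
\begin{cases}
F(i-1, j-1) + \mathrm{Mat}[i, j] \\
F(i, j-1) + \mathrm{Gep} \\
F(i-1, j) + \mathrm{Gep}
\end{cases}
$$
-F(i-1, j-1) + Mat[i, j]: 1…i-1 vs 1…j-1, extended with X over X
-F(i, j-1) + Gep: 1…i vs 1…j-1, extended with a gap over X
-F(i-1, j) + Gep: 1…i-1 vs 1…j, extended with X over a gap
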
Arranging Everything in a Table
[Figure: DP table for FAST vs FAT. The cell for 1…I vs 1…J is computed from its three neighbours: 1…I-1 vs 1…J-1, 1…I-1 vs 1…J and 1…I vs 1…J-1]

Taking Care of the Limits
The DP strategy relies on the idea that ALL the cells in your table have the same environment…
This is NOT true of ALL the cells!
In a Dynamic Programming strategy, the most delicate part is to take care of the limits:
-what happens when you start
-what happens when you finish

Taking Care of the Limits
Match=2, MisMatch=-1, Gap=-1
The borders hold prefixes aligned against gaps only:
-First row (-, F, FA, FAT): 0 -1 -2 -3
-First column (-, F, FA, FAS, FAST): 0 -1 -2 -3 -4

Filling Up The Matrix
Each cell keeps the best of its three candidate scores (FAST in rows, FAT in columns):

        -    F    A    T
   -    0   -1   -2   -3
   F   -1   +2   +1    0
   A   -2   +1   +4   +3
   S   -3    0   +3   +3
   T   -4   -1   +2   +5

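The recurrence and the border rule can be tried out on the FAST vs FAT example with Match=2, MisMatch=-1 and Gap=-1. The following is a minimal sketch, not the author's code: `nw_matrix` and its argument names are made up for this illustration, and the substitution matrix is reduced to a match/mismatch test.

```perl
use strict;
use warnings;

# Fill the global (Needleman and Wunsch) score table with a linear gap penalty.
# nw_matrix() and its parameter names are illustrative, not from the slides.
sub nw_matrix {
    my ($seq_i, $seq_j, $match, $mismatch, $gep) = @_;
    my @I = split //, $seq_i;
    my @J = split //, $seq_j;
    my ($n, $m) = (scalar @I, scalar @J);
    my @F;

    # Limits: a prefix aligned against nothing costs one gap per residue.
    $F[$_][0] = $_ * $gep for 0 .. $n;
    $F[0][$_] = $_ * $gep for 0 .. $m;

    for my $i (1 .. $n) {
        for my $j (1 .. $m) {
            my $s   = $I[$i-1] eq $J[$j-1] ? $match : $mismatch;
            my $sub = $F[$i-1][$j-1] + $s;      # residue i over residue j
            my $del = $F[$i][$j-1]   + $gep;    # gap over residue j
            my $ins = $F[$i-1][$j]   + $gep;    # residue i over gap
            my $best = $sub;
            $best = $del if $del > $best;
            $best = $ins if $ins > $best;
            $F[$i][$j] = $best;
        }
    }
    return \@F;
}

my $F = nw_matrix('FAST', 'FAT', 2, -1, -1);
for my $row (@$F) {
    print join("\t", map { sprintf '%+d', $_ } @$row), "\n";
}
```

The bottom-right cell, `$F->[4][3]`, holds the score of the optimal global alignment, which is 5 for this example.
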
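Delivering the alignment: Trace-back
[Figure: trace-back path through the table. Each cell holds the score of a pair of prefixes (e.g. the score of 1…3 vs 1…4); the last cell holds the optimal alignment score. The alignment is read from its last column to its first:]
T S A F
T - A F
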
Trace-back: possible implementation
    # $tb[$i][$j] records the move that produced cell (i, j).
    # The aligned columns are collected from the last one to the first one.
    while (!($i == 0 && $j == 0)) {
        if ($tb[$i][$j] == $sub) {          # SUBSTITUTION
            $alnI[$aln_len] = $seqI[--$i];
            $alnJ[$aln_len] = $seqJ[--$j];
        }
        elsif ($tb[$i][$j] == $del) {       # DELETION
            $alnI[$aln_len] = '-';
            $alnJ[$aln_len] = $seqJ[--$j];
        }
        elsif ($tb[$i][$j] == $ins) {       # INSERTION
            $alnI[$aln_len] = $seqI[--$i];
            $alnJ[$aln_len] = '-';
        }
        $aln_len++;
    }

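To see the trace-back run end to end, the fragment needs a filled pointer table. The sketch below is one possible self-contained version under simple assumptions: `nw_align`, the pointer codes and the tie-breaking order (substitution, then deletion, then insertion) are choices made for this illustration.

```perl
use strict;
use warnings;

# Global alignment with a trace-back table. nw_align() and the pointer
# codes are illustrative names, not taken from the slides.
my ($SUB, $DEL, $INS) = (1, 2, 3);

sub nw_align {
    my ($seq_i, $seq_j, $match, $mismatch, $gep) = @_;
    my @I = split //, $seq_i;
    my @J = split //, $seq_j;
    my ($n, $m) = (scalar @I, scalar @J);
    my (@F, @tb);

    # Limits: the borders can only be reached through gaps.
    $F[0][0] = 0;
    for my $i (1 .. $n) { $F[$i][0] = $i * $gep; $tb[$i][0] = $INS; }
    for my $j (1 .. $m) { $F[0][$j] = $j * $gep; $tb[0][$j] = $DEL; }

    for my $i (1 .. $n) {
        for my $j (1 .. $m) {
            my $s = $I[$i-1] eq $J[$j-1] ? $match : $mismatch;
            my ($best, $code) = ($F[$i-1][$j-1] + $s, $SUB);
            my $del = $F[$i][$j-1] + $gep;
            my $ins = $F[$i-1][$j] + $gep;
            ($best, $code) = ($del, $DEL) if $del > $best;
            ($best, $code) = ($ins, $INS) if $ins > $best;
            ($F[$i][$j], $tb[$i][$j]) = ($best, $code);
        }
    }

    # Walk back from the last cell; columns come out in reverse order.
    my ($i, $j) = ($n, $m);
    my (@aln_i, @aln_j);
    while ($i > 0 || $j > 0) {
        if    ($tb[$i][$j] == $SUB) { push @aln_i, $I[--$i]; push @aln_j, $J[--$j]; }
        elsif ($tb[$i][$j] == $DEL) { push @aln_i, '-';      push @aln_j, $J[--$j]; }
        else                        { push @aln_i, $I[--$i]; push @aln_j, '-';      }
    }
    return ($F[$n][$m], join('', reverse @aln_i), join('', reverse @aln_j));
}

my ($score, $top, $bottom) = nw_align('FAST', 'FAT', 2, -1, -1);
print "$score\n$top\n$bottom\n";
```

With Match=2, MisMatch=-1 and Gap=-1 this yields FAST over FA-T with a score of 5.
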
Local Alignments Without Affine Gap penalties
Smith and Waterman

Getting rid of the pieces of Junk between the interesting bits
Smith and Waterman

The Smith and Waterman Algorithm
$$
F(i, j) = \mathrm{best}
\begin{cases}
F(i-1, j) + \mathrm{Gep} \\
F(i-1, j-1) + \mathrm{Mat}[i, j] \\
F(i, j-1) + \mathrm{Gep} \\
0
\end{cases}
$$
The extra option 0 means:
-Ignore the rest of the Matrix
-Terminate a local alignment

Filling up a SW matrix: borders
The first row and the first column are all set to 0.
Easy: Local alignments NEVER start/end with a gap…

Filling up a SW matrix
[Figure: filled SW matrix for CATANDOG (columns) vs ANICECAT (rows). The best local score, 6, sits in the cell where the last T of ANICECAT meets the T of CATANDOG: this is the beginning of the trace-back.]

Turning NW into SW
    for ($i = 1; $i <= $len0; $i++) {
        for ($j = 1; $j <= $len1; $j++) {
            if ($res0[0][$i-1] eq $res1[0][$j-1]) { $s = 2; }
            else                                  { $s = -1; }

            $sub = $smat[$i-1][$j-1] + $s;
            $del = $smat[$i  ][$j-1] + $gep;
            $ins = $smat[$i-1][$j  ] + $gep;

            # Turning NW into SW: a move is kept only if it also beats 0
            if    ($sub > $del && $sub > $ins && $sub > 0) { $smat[$i][$j] = $sub; $tb[$i][$j] = $subcode; }
            elsif ($del > $ins && $del > 0)                { $smat[$i][$j] = $del; $tb[$i][$j] = $delcode; }
            elsif ($ins > 0)                               { $smat[$i][$j] = $ins; $tb[$i][$j] = $inscode; }
            else                                           { $smat[$i][$j] = 0;    $tb[$i][$j] = $stopcode; }

            # Prepare the trace-back: remember the best scoring cell
            if ($smat[$i][$j] > $best_score) {
                $best_score = $smat[$i][$j];
                $best_i = $i;
                $best_j = $j;
            }
        }
    }

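The local fill only becomes an alignment once the trace-back starts from the best cell and stops at the first 0. A minimal self-contained sketch of that follows; `sw_align`, the pointer codes and the gap penalty of -2 are assumptions made for this example (the slides do not state the gap penalty used for the ANICECAT vs CATANDOG figure).

```perl
use strict;
use warnings;

# Local alignment: fill as in the global case but never drop below 0,
# remember the best cell, and trace back until a stop cell is reached.
# sw_align() and its argument names are illustrative.
sub sw_align {
    my ($seq_i, $seq_j, $match, $mismatch, $gep) = @_;
    my @I = split //, $seq_i;
    my @J = split //, $seq_j;
    my ($n, $m) = (scalar @I, scalar @J);
    my ($STOP, $SUB, $DEL, $INS) = (0, 1, 2, 3);
    my (@S, @tb);
    my ($best, $bi, $bj) = (0, 0, 0);

    # Borders: a local alignment never starts with a gap, so they are all 0.
    for my $i (0 .. $n) { $S[$i][0] = 0; $tb[$i][0] = $STOP; }
    for my $j (0 .. $m) { $S[0][$j] = 0; $tb[0][$j] = $STOP; }

    for my $i (1 .. $n) {
        for my $j (1 .. $m) {
            my $s = $I[$i-1] eq $J[$j-1] ? $match : $mismatch;
            my ($score, $code) = (0, $STOP);
            my $sub = $S[$i-1][$j-1] + $s;
            my $del = $S[$i][$j-1]   + $gep;
            my $ins = $S[$i-1][$j]   + $gep;
            ($score, $code) = ($sub, $SUB) if $sub > $score;
            ($score, $code) = ($del, $DEL) if $del > $score;
            ($score, $code) = ($ins, $INS) if $ins > $score;
            ($S[$i][$j], $tb[$i][$j]) = ($score, $code);
            ($best, $bi, $bj) = ($score, $i, $j) if $score > $best;
        }
    }

    # Trace back from the best cell until a stop code is met.
    my ($i, $j) = ($bi, $bj);
    my (@aln_i, @aln_j);
    while ($tb[$i][$j] != $STOP) {
        if    ($tb[$i][$j] == $SUB) { push @aln_i, $I[--$i]; push @aln_j, $J[--$j]; }
        elsif ($tb[$i][$j] == $DEL) { push @aln_i, '-';      push @aln_j, $J[--$j]; }
        else                        { push @aln_i, $I[--$i]; push @aln_j, '-';      }
    }
    return ($best, join('', reverse @aln_i), join('', reverse @aln_j));
}

my ($score, $top, $bottom) = sw_align('ANICECAT', 'CATANDOG', 2, -1, -2);
print "$score\n$top\n$bottom\n";
```

On this pair the best local alignment is CAT over CAT, with a score of 6.
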
A few things to remember
SW only works if the substitution matrix has been normalized to give a negative score to a random alignment.
Chance should not pay when it comes to local alignments!

More than One match…
-SW delivers only the best scoring Match
-If you need more than one match:
  -SIM (Huang and Miller)
  -Or Waterman and Eggert (Durbin, p 91)

Waterman and Eggert
-Iterative algorithm:
  -1-identify the best match
  -2-redo SW with used pairs forbidden
  -3-finish when the last interesting local alignment has been extracted
-Delivers a collection of non-overlapping local alignments
-Avoids trivial variations of the optimal.

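The three steps listed for Waterman and Eggert can be sketched as a loop around a local fill. This is a simplified illustration, not the published algorithm: it recomputes the whole table at every round instead of updating it, `best_local` and `local_matches` are made-up names, and the scores (match 2, mismatch -1, gap -2) and the stopping threshold are assumptions.

```perl
use strict;
use warnings;

# One round: local fill where already used residue pairs are forced to 0,
# followed by a trace-back that returns the aligned pairs (1-based).
sub best_local {
    my ($I, $J, $used, $match, $mismatch, $gep) = @_;
    my ($n, $m) = (scalar @$I, scalar @$J);
    my @S  = map { [ (0) x ($m + 1) ] } 0 .. $n;   # scores, borders stay 0
    my @tb = map { [ (0) x ($m + 1) ] } 0 .. $n;   # 0 stop, 1 sub, 2 del, 3 ins
    my ($best, $bi, $bj) = (0, 0, 0);

    for my $i (1 .. $n) {
        for my $j (1 .. $m) {
            next if $used->{"$i,$j"};               # forbidden pair: cell stays 0
            my $s = $I->[$i-1] eq $J->[$j-1] ? $match : $mismatch;
            my @cand = ([$S[$i-1][$j-1] + $s, 1],
                        [$S[$i][$j-1] + $gep, 2],
                        [$S[$i-1][$j] + $gep, 3]);
            for my $c (@cand) {
                ($S[$i][$j], $tb[$i][$j]) = @$c if $c->[0] > $S[$i][$j];
            }
            ($best, $bi, $bj) = ($S[$i][$j], $i, $j) if $S[$i][$j] > $best;
        }
    }

    my @pairs;
    my ($i, $j) = ($bi, $bj);
    while ($tb[$i][$j]) {
        my $code = $tb[$i][$j];
        unshift @pairs, [$i, $j] if $code == 1;
        $i-- if $code != 2;                         # sub and ins consume residue i
        $j-- if $code != 3;                         # sub and del consume residue j
    }
    return ($best, \@pairs);
}

# Repeat until the best remaining match scores below $min_score.
sub local_matches {
    my ($seq_i, $seq_j, $min_score) = @_;
    my @I = split //, $seq_i;
    my @J = split //, $seq_j;
    my (%used, @matches);
    while (1) {
        my ($score, $pairs) = best_local(\@I, \@J, \%used, 2, -1, -2);
        last if $score < $min_score || !@$pairs;
        $used{"$_->[0],$_->[1]"} = 1 for @$pairs;   # forbid the used pairs
        push @matches, [$score, join('', map { $I[$_->[0] - 1] } @$pairs)];
    }
    return @matches;
}

my @matches = local_matches('ANICECAT', 'CATANDOG', 3);
print "$_->[0]\t$_->[1]\n" for @matches;
```

With a threshold of 3, ANICECAT vs CATANDOG gives two non-overlapping matches: CAT (score 6), then AN (score 4).
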
Adding Affine Gap Penalties
The Gotoh Algorithm

Forcing a bit of Biology into your alignment
The Gotoh Formulation

Why Affine gap Penalties are Biologically better
[Figure: Affine Gap Penalty. Gap cost plotted against gap length L, with an opening cost GOP and an extension cost GEP]
Cost=gop+L*gep
Or
Cost=gop+(L-1)*gep
Parsimony: Evolution takes the simplest path (So We Think…)

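A few lines of arithmetic show what the two cost conventions give, and why an affine penalty favours one long gap over several short ones. The sub names and the penalties (gop=-10, gep=-1) are assumptions chosen for this illustration.

```perl
use strict;
use warnings;

# Cost of a gap of length $len under the two conventions.
# Sub names and example penalties are illustrative.
sub gap_cost_l       { my ($len, $gop, $gep) = @_; return $gop + $len * $gep; }
sub gap_cost_l_minus { my ($len, $gop, $gep) = @_; return $gop + ($len - 1) * $gep; }

for my $len (1, 2, 5) {
    printf "L=%d  gop+L*gep=%d  gop+(L-1)*gep=%d\n",
        $len, gap_cost_l($len, -10, -1), gap_cost_l_minus($len, -10, -1);
}

# Parsimony: one gap of length 4 costs less than two gaps of length 2.
my $one_gap  = gap_cost_l(4, -10, -1);
my $two_gaps = 2 * gap_cost_l(2, -10, -1);
print "one gap of 4: $one_gap, two gaps of 2: $two_gaps\n";
```
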
But Harder To compute…
More Than 3 Ways to extend an Alignment:
-Alignment
-Deletion: Opening or Extension
-Insertion: Opening or Extension

More Questions Need to be asked
For instance, what is the cost of an insertion?
[Figure: extending an alignment with a gap costs GEP or GOP, depending on how the alignment of the preceding columns ends]

Solution: Maintain 3 Tables
-Ix: Table that contains the score of every optimal alignment 1…i vs 1…j that finishes with an Insertion in sequence X.
-Iy: Table that contains the score of every optimal alignment 1…i vs 1…j that finishes with an Insertion in sequence Y.
-M: Table that contains the score of every optimal alignment 1…i vs 1…j that finishes with an alignment between sequence X and Y.

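The Algorithm
$$
M(i, j) = \mathrm{best}
\begin{cases}
M(i-1, j-1) + \mathrm{Mat}(i, j) \\
I_x(i-1, j-1) + \mathrm{Mat}(i, j) \\
I_y(i-1, j-1) + \mathrm{Mat}(i, j)
\end{cases}
$$
$$
I_x(i, j) = \mathrm{best}
\begin{cases}
M(i-1, j) + \mathrm{gop} \\
I_x(i-1, j) + \mathrm{gep}
\end{cases}
\qquad
I_y(i, j) = \mathrm{best}
\begin{cases}
M(i, j-1) + \mathrm{gop} \\
I_y(i, j-1) + \mathrm{gep}
\end{cases}
$$
-M: 1…i-1 vs 1…j-1, extended with X over X
-Ix: 1…i-1 vs 1…j, extended with X over a gap (opening after an aligned column, extension after a gap)
-Iy: 1…i vs 1…j-1, extended with a gap over X (opening after an aligned column, extension after a gap)
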
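The three recurrences can be written down almost literally. The sketch below computes the score only (no trace-back) and rests on several assumptions: `gotoh_score`, `max_of` and `$NEG` are made-up names, the substitution matrix is reduced to match/mismatch, and the borders are initialised as a single gap costing gop+(L-1)*gep.

```perl
use strict;
use warnings;

# Score-only version of the three-table recurrence.
# gotoh_score(), max_of(), $NEG and the border values are assumptions.
sub max_of { my $best = shift; for (@_) { $best = $_ if $_ > $best } return $best; }

sub gotoh_score {
    my ($seq_x, $seq_y, $match, $mismatch, $gop, $gep) = @_;
    my @X = split //, $seq_x;
    my @Y = split //, $seq_y;
    my ($n, $m) = (scalar @X, scalar @Y);
    my $NEG = -1e9;                      # stands in for "impossible"
    my (@M, @Ix, @Iy);

    for my $i (0 .. $n) {
        for my $j (0 .. $m) {
            ($M[$i][$j], $Ix[$i][$j], $Iy[$i][$j]) = ($NEG, $NEG, $NEG);
        }
    }
    $M[0][0] = 0;
    # Borders: one gap, opened once and then extended.
    $Ix[$_][0] = $gop + ($_ - 1) * $gep for 1 .. $n;
    $Iy[0][$_] = $gop + ($_ - 1) * $gep for 1 .. $m;

    for my $i (1 .. $n) {
        for my $j (1 .. $m) {
            my $s = $X[$i-1] eq $Y[$j-1] ? $match : $mismatch;
            $M[$i][$j]  = $s + max_of($M[$i-1][$j-1], $Ix[$i-1][$j-1], $Iy[$i-1][$j-1]);
            $Ix[$i][$j] = max_of($M[$i-1][$j] + $gop, $Ix[$i-1][$j] + $gep);
            $Iy[$i][$j] = max_of($M[$i][$j-1] + $gop, $Iy[$i][$j-1] + $gep);
        }
    }
    # The optimal alignment may finish in any of the three states.
    return max_of($M[$n][$m], $Ix[$n][$m], $Iy[$n][$m]);
}

my $score = gotoh_score('FAST', 'FAT', 2, -1, -3, -1);
print "$score\n";
```

With match 2, mismatch -1, gop=-3 and gep=-1, FAST vs FAT scores 3: three matches (6) and one opened gap (-3), as in FAST over FA-T.
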
Trace-back?
-Start from the BEST of M(i, j), Ix(i, j) and Iy(i, j)
-Navigate from one table to the next, knowing that a gap always finishes with an aligned column…

Going Further?
With the affine gap penalties, we have increased the number of possibilities when building our alignment.
Computer scientists talk of states and represent this as a Finite State Automaton (FSAs are HMM cousins).
In theory, there is no limit on the number of states one may consider when doing such a computation.
Imagine a pairwise alignment algorithm where the gap penalty depends on the length of the gap. Can you simplify it realistically so that it can be efficiently implemented?
[Figure: diagram with labels Lx and Ly]

A Divide and Conquer Strategy
The Myers and Miller Strategy

Remember Not To Run Out of Memory
The Myers and Miller Strategy

A Score in Linear Space
You never need more than the previous row to compute the optimal score.

A Score in Linear Space
R1: previous row, R2: current row
    For i
        For j
            R2[j] = best( R2[j-1] + gep,
                          R1[j-1] + mat,
                          R1[j]   + gep )
        For j
            R1[j] = R2[j]

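The two-row pseudocode translates directly into a score-only routine. This is a sketch under the same simple scoring as the FAST vs FAT example (Match=2, MisMatch=-1, Gap=-1); `nw_score_two_rows` is a made-up name.

```perl
use strict;
use warnings;

# Optimal global score keeping only two rows in memory.
# nw_score_two_rows() is an illustrative name.
sub nw_score_two_rows {
    my ($seq_i, $seq_j, $match, $mismatch, $gep) = @_;
    my @I = split //, $seq_i;
    my @J = split //, $seq_j;
    my $m = scalar @J;
    my @r1 = map { $_ * $gep } 0 .. $m;    # previous row, initially row 0

    for my $i (1 .. scalar @I) {
        my @r2 = ($i * $gep);               # current row, border cell
        for my $j (1 .. $m) {
            my $s = $I[$i-1] eq $J[$j-1] ? $match : $mismatch;
            my $best = $r1[$j-1] + $s;
            $best = $r2[$j-1] + $gep if $r2[$j-1] + $gep > $best;
            $best = $r1[$j]   + $gep if $r1[$j]   + $gep > $best;
            $r2[$j] = $best;
        }
        @r1 = @r2;                          # the current row becomes the previous one
    }
    return $r1[$m];
}

my $score = nw_score_two_rows('FAST', 'FAT', 2, -1, -1);
print "$score\n";
```

The result, 5, is the same as the bottom-right cell of the full table, but memory grows with the length of one sequence only.
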
A Score in Linear Space
You only need the matrix for the Trace-Back. Or do you?

An Alignment in Linear Space
-Forward algorithm: F(i, j) = optimal score of 0…i vs 0…j
-Backward algorithm: B(i, j) = optimal score of i…M vs j…N (M and N being the sequence lengths)
-B(i, j) + F(i, j) = optimal score of the alignment that passes through pair i, j
-The optimal alignment passes through the cells where B(i, j) + F(i, j) is optimal

An Alignment in Linear Space
Combining the forward and the backward algorithms gives a recursive divide and conquer strategy: Myers and Miller (Durbin p 35)

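One division step of such a strategy can be sketched with two linear-space passes: a forward pass over the first half of one sequence, and a backward pass computed as a forward pass over the reversed second half. This is only the split step, with a linear gap penalty; the recursion on the two halves is left out, and `last_row` and `split_point` are made-up names.

```perl
use strict;
use warnings;

# Last row of the global DP table, computed with two rows only.
sub last_row {
    my ($seq_i, $seq_j, $match, $mismatch, $gep) = @_;
    my @I = split //, $seq_i;
    my @J = split //, $seq_j;
    my $m = scalar @J;
    my @r1 = map { $_ * $gep } 0 .. $m;
    for my $i (1 .. scalar @I) {
        my @r2 = ($i * $gep);
        for my $j (1 .. $m) {
            my $s = $I[$i-1] eq $J[$j-1] ? $match : $mismatch;
            my $best = $r1[$j-1] + $s;
            $best = $r2[$j-1] + $gep if $r2[$j-1] + $gep > $best;
            $best = $r1[$j]   + $gep if $r1[$j]   + $gep > $best;
            $r2[$j] = $best;
        }
        @r1 = @r2;
    }
    return @r1;
}

# Split seq_i in the middle and find the column of seq_j where
# forward score + backward score is maximal.
sub split_point {
    my ($seq_i, $seq_j, @scoring) = @_;
    my $mid = int(length($seq_i) / 2);
    my $m   = length $seq_j;
    my @fwd = last_row(substr($seq_i, 0, $mid), $seq_j, @scoring);
    my @bwd = last_row(scalar reverse(substr($seq_i, $mid)),
                       scalar reverse($seq_j), @scoring);
    my ($best, $best_j);
    for my $j (0 .. $m) {
        # prefix 1…j on the forward side, suffix j+1…m on the backward side
        my $total = $fwd[$j] + $bwd[$m - $j];
        ($best, $best_j) = ($total, $j) if !defined $best || $total > $best;
    }
    return ($mid, $best_j, $best);
}

my ($mid, $col, $score) = split_point('FAST', 'FAT', 2, -1, -1);
print "split after residue $mid of FAST and residue $col of FAT, score $score\n";
```

For FAST vs FAT the split falls after FA in both sequences, and the summed score, 5, equals the optimal global score: the two halves (FA vs FA, ST vs T) can then be aligned independently.
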
A Forward-only Strategy (Durbin, p 35)
Forward algorithm:
-Keep Row M in memory
-Keep track of which Cell in Row M leads to the optimal score
-Divide on this cell

An interesting application: finding sub-optimal alignments
Sum over the Forward/Backward matrices and identify the score of the best alignment going through cell i, j

Application: Non-local models
Double Dynamic Programming

Outline
The main limitation of DP: Context independent measure

Double Dynamic Programming
-High level: Smith and Waterman Dynamic Programming, Score = Max { S(i-1, j-1) + RMSd score, S(i, j-1) + gp, … }
-RMSd score: obtained from a Rigid Body Superposition where i and j are forced together

Application: Repeats
The Durbin Algorithm

In The End: Wrapping it Up

Dynamic Programming
-Needleman and Wunsch: Delivers the best scoring global alignment
-Smith and Waterman: NW with an extra state 0
-Affine Gap Penalties: Making DP more realistic
-Linear space: Using Divide and Conquer Strategies Not to run out of memory
-Double Dynamic Programming, repeat extraction: DP can easily be adapted to a special need