Sequence Alignment Using Dynamic Programming Saurabh Sinha


Dynamic Programming
• Is not a type of programming language
• Is a type of algorithm, used to solve many different computational problems
• Sequence alignment is one of these problems
• We will see the algorithm in its general sense first

Manhattan Tourist Problem
[Figure: grid graph with weighted edges between a source vertex and a sink vertex]
Find the maximum-weight path from source to sink.

Manhattan Tourist Problem
[Figure: the weighted grid again, with a path from source to sink and a total of 22 shown at the sink]

MTP: Greedy Algorithm Is Not Optimal
[Figure: the weighted grid with paths from source to sink; the totals shown at the sink are 18 and 22]
The greedy path makes a promising start, but leads to bad choices!
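The point of this slide can be checked on a small grid. The sketch below is an illustration only: the 2 x 2 grid weights are made up (they are not the weights from the slide figure), and the names `greedy_score` and `exhaustive_best` are hypothetical. It assumes the greedy rule is "always take the heavier of the two outgoing edges", and it compares the result against a brute-force search over every source-to-sink path.

```python
from itertools import combinations

# Hypothetical weights, NOT the grid from the slide.
# down[i][j]  = weight of the edge from vertex (i, j) to (i + 1, j)
# right[i][j] = weight of the edge from vertex (i, j) to (i, j + 1)
down = [[1, 0, 2],
        [4, 6, 5]]
right = [[3, 2],
         [0, 7],
         [3, 3]]


def greedy_score(down, right):
    """Walk from source to sink, always taking the heavier outgoing edge."""
    n, m = len(down), len(right[0])
    i = j = total = 0
    while i < n or j < m:
        can_down, can_right = i < n, j < m
        if can_down and (not can_right or down[i][j] > right[i][j]):
            total += down[i][j]
            i += 1
        else:
            total += right[i][j]
            j += 1
    return total


def exhaustive_best(down, right):
    """Score every path (every placement of the n down-moves) and keep the best."""
    n, m = len(down), len(right[0])
    best = None
    for down_steps in combinations(range(n + m), n):
        i = j = total = 0
        for step in range(n + m):
            if step in down_steps:
                total += down[i][j]
                i += 1
            else:
                total += right[i][j]
                j += 1
        best = total if best is None else max(best, total)
    return best


print(greedy_score(down, right))     # 12
print(exhaustive_best(down, right))  # 15
```

On this grid the greedy walk grabs the heavy first edge and ends at 12, while the best path scores 15. Exhaustive search is only practical for tiny grids, since the number of paths grows very quickly with grid size.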

MTP: Dynamic Programming
[Figure: top-left corner of the grid, rows indexed by i and columns by j; $s_{0,1} = 1$, $s_{1,0} = 5$]
• Calculate the optimal path score for each vertex in the graph
• Each vertex's score is the maximum, over its predecessor vertices, of the predecessor's score plus the weight of the edge connecting them

MTP: Dynamic Programming (cont’d)
[Figure: the grid filled in along the next diagonal; $s_{2,0} = 8$, $s_{1,1} = 4$, $s_{0,2} = 3$]

MTP: Dynamic Programming (cont’d)
[Figure: the grid filled in along the next diagonal; $s_{3,0} = 8$, $s_{2,1} = 9$, $s_{1,2} = 13$]

MTP: Dynamic Programming (cont’d)
[Figure: the grid filled in along the next diagonal; $s_{3,1} = 9$, $s_{2,2} = 12$, $s_{1,3} = 8$]

MTP: Dynamic Programming (cont’d)
[Figure: the grid filled in along the next diagonal; $s_{3,2} = 9$, $s_{2,3} = 15$]

MTP: Dynamic Programming (cont’d)
[Figure: the grid with the last vertex being filled in; $s_{3,3} = 16$]
Almost done

MTP: Dynamic Programming (cont’d)
[Figure: the completed grid; $s_{3,3} = 16$]
Done!

MTP Dynamic Programming: Formal Description
Compute the score for a point (i, j) by the recurrence relation:
$$s_{i,j} = \max \begin{cases} s_{i-1,j} + \text{weight of the edge between } (i-1,j) \text{ and } (i,j) \\ s_{i,j-1} + \text{weight of the edge between } (i,j-1) \text{ and } (i,j) \end{cases}$$
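The recurrence can be sketched in a few lines of Python. This is a minimal illustration rather than the lecture's own code: the function name `manhattan_tourist`, the `down`/`right` weight layout, and the example weights are all assumptions. Vertices in row 0 and column 0 have only one predecessor, so they are filled first by simple running sums.

```python
def manhattan_tourist(down, right):
    """Return the table s, where s[i][j] is the best path score from (0, 0) to (i, j).

    down[i][j]  = weight of the edge from (i, j) to (i + 1, j)
    right[i][j] = weight of the edge from (i, j) to (i, j + 1)
    """
    n, m = len(down), len(right[0])
    s = [[0] * (m + 1) for _ in range(n + 1)]

    # First column: the only way in is from above.
    for i in range(1, n + 1):
        s[i][0] = s[i - 1][0] + down[i - 1][0]
    # First row: the only way in is from the left.
    for j in range(1, m + 1):
        s[0][j] = s[0][j - 1] + right[0][j - 1]

    # Interior vertices: best of "arrive from above" and "arrive from the left".
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(s[i - 1][j] + down[i - 1][j],
                          s[i][j - 1] + right[i][j - 1])
    return s


# Hypothetical 2 x 2 grid (not the grid from the slides).
down = [[1, 0, 2],
        [4, 6, 5]]
right = [[3, 2],
         [0, 7],
         [3, 3]]

table = manhattan_tourist(down, right)
print(table[-1][-1])  # 15, the score at the sink
```

Each vertex is visited once and looks at no more than two predecessors, so the running time is proportional to the number of vertices in the grid.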

Applying Dynamic Programming to Sequence Alignment

Representing alignments
Alignment: 2 × k matrix (k ≥ m, n)
V = ACCTGGTAAA (n = 10)
W = ACATGCGTATA (m = 11)
V: A C C T G - G T A A A
W: A C A T G C G T A T A
8 matches, 2 mismatches, 0 deletions, 1 insertion

Scoring functions
• A simple scoring function: if an alignment has $n_m$ matches, $n_{mis}$ substitutions and $n_g$ gaps, the alignment score is
$$\text{score} = w_m n_m + w_{mis} n_{mis} + w_g n_g$$
• where $w_m$, $w_{mis}$, $w_g$ are the match score, mismatch score and gap score (penalty) respectively
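As a small worked example of this scoring function, the sketch below counts the column types of an alignment and combines them linearly. The helper names `count_columns` and `alignment_score` are hypothetical, the two aligned rows are one alignment of V = ACCTGGTAAA and W = ACATGCGTATA chosen for illustration, and the weights (match 20, mismatch -10, gap -20) are assumed values.

```python
def count_columns(top, bottom):
    """Count (matches, mismatches, gaps) over the columns of an alignment."""
    matches = mismatches = gaps = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            gaps += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, gaps


def alignment_score(n_m, n_mis, n_g, w_m, w_mis, w_g):
    """Linear score: w_m * n_m + w_mis * n_mis + w_g * n_g."""
    return w_m * n_m + w_mis * n_mis + w_g * n_g


# Illustrative alignment of V and W, with "-" marking a gap.
top = "ACCTG-GTAAA"
bottom = "ACATGCGTATA"

counts = count_columns(top, bottom)
print(counts)                                     # (8, 2, 1)
print(alignment_score(*counts, 20, -10, -20))     # 120
```

With these assumed weights, 8 matches, 2 mismatches and 1 gap give 8(20) + 2(-10) + 1(-20) = 120.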

Sequence Alignment as an MTP-like problem
[Figure: alignment grid with the letters of the two sequences labelling its two axes]

Sequence Alignment as an MTP-like problem
[Figure: the alignment grid with a path drawn through it]
Match = 20, Mismatch = -10, Gap = -20
Score of path = 8 matches + 2 mismatches + 1 gap = 8(20) + 2(-10) + 1(-20) = 120

What alignment is this?
[Figure: the alignment grid with a different path drawn through it, and the corresponding alignment of V and W]
Match = 20, Mismatch = -10, Gap = -20
Score of path = 5 matches + 2 mismatches + 7 gaps = 5(20) + 2(-10) + 7(-20) = -60

Sequence Alignment, formally
Find the best alignment between two strings under a given scoring scheme
Input: strings v and w and a scoring scheme
Output: alignment of maximum score
Dynamic programming recurrence:
$$s_{i,j} = \max \begin{cases} s_{i-1,j-1} + \text{score}(v_i, w_j) \\ s_{i-1,j} + \text{gapscore} \\ s_{i,j-1} + \text{gapscore} \end{cases}$$

Sequence Alignment: Example
Calculate and show the dynamic programming matrix and an optimal alignment for the DNA sequences GCTTAGC and GCATTGC, scoring +3 for a match, -2 for a mismatch, and -3 for a gap.
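One way to check a hand-computed answer to this exercise is the sketch below. It is an illustration under stated assumptions, not the lecture's own solution: the function name `global_align` is hypothetical, and it assumes the usual global-alignment boundary condition (row 0 and column 0 hold multiples of the gap score), which the slides do not spell out. The traceback walks back from the bottom-right cell, at each step following a predecessor that explains the current score.

```python
def global_align(v, w, match=3, mismatch=-2, gap=-3):
    """Fill the DP matrix for v and w, then trace back one optimal alignment."""
    n, m = len(v), len(w)

    def sub(i, j):
        # Score for aligning v[i-1] with w[j-1] (1-based indices into the matrix).
        return match if v[i - 1] == w[j - 1] else mismatch

    # Assumed boundary: a prefix aligned against nothing costs one gap per letter.
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = i * gap
    for j in range(1, m + 1):
        s[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(s[i - 1][j - 1] + sub(i, j),
                          s[i - 1][j] + gap,
                          s[i][j - 1] + gap)

    # Traceback from (n, m) to (0, 0).
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and s[i][j] == s[i - 1][j - 1] + sub(i, j):
            top.append(v[i - 1]); bottom.append(w[j - 1])
            i -= 1; j -= 1
        elif i > 0 and s[i][j] == s[i - 1][j] + gap:
            top.append(v[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(w[j - 1])
            j -= 1
    return s, "".join(reversed(top)), "".join(reversed(bottom))


matrix, row_v, row_w = global_align("GCTTAGC", "GCATTGC")
for row in matrix:
    print(" ".join(f"{x:4d}" for x in row))
print(row_v)           # GC-TTAGC
print(row_w)           # GCATT-GC
print(matrix[-1][-1])  # 12
```

The alignment found has 6 matches and 2 gaps, giving 6(3) + 2(-3) = 12, which beats the gap-free alignment (5 matches and 2 mismatches, score 11).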