Approximate String Matching Optimal Sequence Alignment Global alignment


Global alignment
Structural formula: [figure]
Initialization: V(0, 0) = 0; the first row and the first column hold 0, -1, -2, -3, -4, -5.
The value of the optimal alignment: V* = V(m, n)
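The structural formula itself did not survive on the slide, so the following is only a minimal sketch of the standard global alignment recurrence, assuming the scores used later in the deck (M = 1, R/I/D = -1). The function name and parameter names are hypothetical.

```python
def global_alignment_score(s1, s2, match=1, penalty=-1):
    """Fill the global alignment table V and return V* = V(m, n).

    Assumption: replace, insert and delete all cost `penalty`.
    """
    m, n = len(s1), len(s2)
    V = [[0] * (n + 1) for _ in range(m + 1)]
    # Initialization: 0, -1, -2, ... along the first column and first row.
    for i in range(1, m + 1):
        V[i][0] = i * penalty
    for j in range(1, n + 1):
        V[0][j] = j * penalty
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if s1[i - 1] == s2[j - 1] else penalty
            V[i][j] = max(V[i - 1][j - 1] + sub,    # match / replace
                          V[i - 1][j] + penalty,    # s1[i-1] against a gap
                          V[i][j - 1] + penalty)    # s2[j-1] against a gap
    return V[m][n]
```

For instance, aligning ABCD with ABD needs one gap, so the value is 3 matches minus 1.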

Free ends alignment
Structural formula: [figure]
Initialization: the first row and the first column are all 0.
What does the alignment look like?

Finding approximate occurrences of P in T
Structural formula: [figure]
Initialization: the first row (along T) is all 0; the first column (along P) holds 0, -1, -2, -3, -4, -5.
V* is the maximal value over all cells in row n
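A minimal sketch of this variant, assuming the standard recurrence with M = 1 and R/I/D = -1 (the slide's formula is not recoverable). Only two rows are kept, since just the last row is needed for V*. The function name is hypothetical.

```python
def best_occurrence_score(P, T, match=1, penalty=-1):
    """Return (V*, end columns) for approximate occurrences of P in T.

    The first row is all zeros, so an occurrence may start anywhere in T;
    V* is the maximum over the last row (row n = len(P)).
    """
    n, m = len(P), len(T)
    prev = [0] * (m + 1)                    # row 0: free start in T
    for i in range(1, n + 1):
        cur = [i * penalty] + [0] * m       # first column: 0, -1, -2, ...
        for j in range(1, m + 1):
            sub = match if P[i - 1] == T[j - 1] else penalty
            cur[j] = max(prev[j - 1] + sub,
                         prev[j] + penalty,
                         cur[j - 1] + penalty)
        prev = cur
    best = max(prev)
    ends = [j for j, v in enumerate(prev) if v == best]
    return best, ends
```

With P = ABCD and T = XXAXXBCDXX this gives V* = 2, reached only in column 8 (just after the D of T).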

Problem: Find the optimal approximate occurrence of P in T that is aligned to the shortest substring of T.
Example: P = ABCD, T = XXAXXBCDXX, M = 1, I, D, R = -1. Two occurrences have the same value:
XXAXXBCDXX
    ABCD        v* = 2
XXAXXBCDXX
  A__BCD        v* = 2

Solution: When back-tracing the green alignment graph, always choose the right-most arrow.
[Figure: back-trace in the table of P against T]
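The tie-breaking rule can be sketched as follows. This is one reading of "right-most arrow", not necessarily the slide's exact figure: the vertical arrow (which stays in the same column of T) is preferred over the diagonal one, and the diagonal over the horizontal one. Scores M = 1 and R/I/D = -1 and the function name are assumptions.

```python
def shortest_best_occurrence(P, T, match=1, penalty=-1):
    """Return (V*, start, end) so that T[start:end] is the shortest
    substring found for an optimal occurrence of P in T."""
    n, m = len(P), len(T)
    V = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        V[i][0] = i * penalty
        for j in range(1, m + 1):
            sub = match if P[i - 1] == T[j - 1] else penalty
            V[i][j] = max(V[i - 1][j - 1] + sub,
                          V[i - 1][j] + penalty,
                          V[i][j - 1] + penalty)
    best = max(V[n])
    result = None
    for end in range(m + 1):
        if V[n][end] != best:
            continue
        i, j = n, end
        while i > 0:
            if V[i][j] == V[i - 1][j] + penalty:
                i -= 1                      # vertical: consumes no character of T
            elif V[i][j] == V[i - 1][j - 1] + (
                    match if P[i - 1] == T[j - 1] else penalty):
                i, j = i - 1, j - 1         # diagonal
            else:
                j -= 1                      # horizontal: only remaining option
        if result is None or end - j < result[2] - result[1]:
            result = (best, j, end)
    return result
```

On the example P = ABCD, T = XXAXXBCDXX this sketch returns (2, 5, 8), that is the substring BCD with A aligned to a gap, which also has value 2.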

Local alignment
Structural formula: [figure]
Initialization: the first row and the first column are all 0.

Local alignment - example
Scores: M = 1, R/D/I = -1
[Figure: local alignment table for the example strings]
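Since the example table is not readable, here is a small sketch of local alignment with the same scores (M = 1, R/D/I = -1). It assumes the standard local recurrence, in which every cell is additionally bounded below by 0. The function name and the test strings are hypothetical, not the ones from the slide.

```python
def local_alignment_score(s1, s2, match=1, penalty=-1):
    """Return (V*, (i, j)) where (i, j) is the first cell holding V*."""
    m, n = len(s1), len(s2)
    V = [[0] * (n + 1) for _ in range(m + 1)]   # first row and column stay 0
    best, best_cell = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if s1[i - 1] == s2[j - 1] else penalty
            V[i][j] = max(0,                       # start a new alignment here
                          V[i - 1][j - 1] + sub,
                          V[i - 1][j] + penalty,
                          V[i][j - 1] + penalty)
            if V[i][j] > best:
                best, best_cell = V[i][j], (i, j)
    return best, best_cell
```

For XXABCXX and YYABCYY the best local alignment is ABC against ABC, ending at cell (5, 5) with value 3.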

Problem 1

Solution
1. Run the algorithm for optimal local alignment with the 2-rows method and save the value v*. Running-time: O(nm). Space: O(n+m).
2. Run the algorithm again using the q-rows method. Each time V(i, j) = v*, try to build the optimal alignment by back-tracing the green pointers within the q rows, and report it if this succeeds. Running-time: O(mn + rq). Space: O(q min(n, m)).

Solution: use a q-rows method
[Figure: table in which only a window of q rows is kept]

Problem 2
Assume we know that there exists an optimal local alignment whose two substrings have length at most q << |S1|, |S2|. Find an optimal local alignment between S1 and S2, using
• space O(|S1| + |S2| + r + q^2)
• time O(|S1|*|S2| + r q^2),
where r is the number of distinct pairs of end-indexes of optimal local alignments.

Solution
1) Run the algorithm with the two-rows method to find V* and the list of all r pairs (i, j) such that V(i, j) = V*. Running-time: O(nm). Space: O(n+m+r).
2) For each pair (i, j) in the list, compute the local optimal alignment on the table of S1[i-q+1 … i] and S2[j-q+1 … j]. If the optimal value is V*, restore and report that alignment. Running-time: O(rq^2). Space: O(r+q^2).
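The two steps above can be sketched as follows, assuming scores M = 1 and R/I/D = -1. All function names are hypothetical. Step 1 scans the table twice with two rows (once for V*, once to collect the end pairs), so that no more than r pairs are ever stored; step 2 fills one small table per end pair.

```python
def scan_cells(s1, s2, match=1, penalty=-1):
    """Yield (i, j, V(i, j)) of the local table, keeping only two rows."""
    prev = [0] * (len(s2) + 1)
    for i in range(1, len(s1) + 1):
        cur = [0] * (len(s2) + 1)
        for j in range(1, len(s2) + 1):
            sub = match if s1[i - 1] == s2[j - 1] else penalty
            cur[j] = max(0, prev[j - 1] + sub, prev[j] + penalty,
                         cur[j - 1] + penalty)
            yield i, j, cur[j]
        prev = cur

def align_window(a, b, match=1, penalty=-1):
    """Local alignment with back-trace on two short strings (full table)."""
    V = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best, i, j = 0, 0, 0
    for r in range(1, len(a) + 1):
        for c in range(1, len(b) + 1):
            sub = match if a[r - 1] == b[c - 1] else penalty
            V[r][c] = max(0, V[r - 1][c - 1] + sub, V[r - 1][c] + penalty,
                          V[r][c - 1] + penalty)
            if V[r][c] > best:
                best, i, j = V[r][c], r, c
    top, bottom = [], []
    while V[i][j] > 0:                      # stop at the first 0 cell
        sub = match if a[i - 1] == b[j - 1] else penalty
        if V[i][j] == V[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif V[i][j] == V[i - 1][j] + penalty:
            top.append(a[i - 1]); bottom.append("_"); i -= 1
        else:
            top.append("_"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))

def bounded_local_alignment(s1, s2, q):
    """Return (V*, aligned s1 part, aligned s2 part), or None if no
    window of side q reproduces V*."""
    v_star = max((v for _, _, v in scan_cells(s1, s2)), default=0)
    if v_star == 0:
        return 0, "", ""
    ends = [(i, j) for i, j, v in scan_cells(s1, s2) if v == v_star]
    for i, j in ends:
        # Python slices s1[i-q:i] correspond to S1[i-q+1 … i] on the slide.
        score, x, y = align_window(s1[max(0, i - q):i], s2[max(0, j - q):j])
        if score == v_star:
            return v_star, x, y
    return None
```

For example, `bounded_local_alignment("XXXABCDEXXX", "YYABDEYY", 5)` recovers ABCDE against AB_DE with value 3, while q = 3 is too small for this pair and yields None.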

Solution
[Figure: example table with V* = 5]

Solution
[Figure: table of (q+1) x (q+1) cells for one end pair]