Longest common subsequence (LCS)
The longest common subsequence (LCS) problem
A string: A = b a c a d
A subsequence of A is obtained by deleting 0 or more symbols from A (the deleted symbols need not be consecutive), e.g. ad, ac, bac, acad, bcd.
Common subsequences of A = b a c a d and B = a c c b a d c b: ad, ac, bac, acad.
The longest common subsequence (LCS) of A and B: a c a d.
A subsequence need not be consecutive, but its symbols must appear in order.
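The definition above can be checked mechanically. The following is a minimal sketch in Python; the helper name is_subsequence is an assumption made for illustration and does not come from the slides.

```python
def is_subsequence(s: str, t: str) -> bool:
    """Return True if s can be obtained from t by deleting zero or more symbols."""
    it = iter(t)
    # "ch in it" consumes the iterator up to the first match,
    # so the symbols of s must be found in t in the same order.
    return all(ch in it for ch in s)


A = "bacad"
B = "accbadcb"

# Candidates taken from the slide: all are subsequences of A,
# but only some of them are also subsequences of B.
for candidate in ["ad", "ac", "bac", "acad", "bcd"]:
    common = is_subsequence(candidate, A) and is_subsequence(candidate, B)
    print(candidate, "common" if common else "not common")
```

Note that bcd is a subsequence of A but not of B, which is why it is missing from the list of common subsequences.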
Longest common subsequence
INPUT: two strings
OUTPUT: longest common subsequence
ACTGAACTCTGTGCACT
TGACTCAGCACAAAAAC
Naïve / Brute-Force Algorithm
For every subsequence of X, check whether it is a subsequence of Y (m = |X|, n = |Y|).
◦ X has 2^m subsequences.
◦ Each subsequence takes Θ(n) time to check: scan Y for the first letter, then for the second, and so on.
Time: Θ(n 2^m).
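A sketch of the brute-force idea in Python, for illustration only. The function names are assumptions; the enumeration goes from the longest candidates to the shortest so that the first hit is an LCS, which is a small convenience over checking all 2^m subsequences and keeping the best.

```python
from itertools import combinations


def is_subsequence(s: str, t: str) -> bool:
    """Theta(len(t)) scan: find each symbol of s in t, in order."""
    it = iter(t)
    return all(ch in it for ch in s)


def lcs_brute_force(x: str, y: str) -> str:
    """Enumerate subsequences of x, longest first; exponential in len(x)."""
    for length in range(len(x), -1, -1):
        # Each index tuple picks one subsequence of x of the given length.
        for idx in combinations(range(len(x)), length):
            candidate = "".join(x[i] for i in idx)
            if is_subsequence(candidate, y):
                return candidate
    return ""  # not reached: the empty string is always a common subsequence


print(lcs_brute_force("bacad", "accbadcb"))
```

This is only practical for very short X, which is the motivation for the dynamic-programming approach.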
Optimal Substructure
Theorem: Let Z = z_1, ..., z_k be any LCS of X = x_1, ..., x_m and Y = y_1, ..., y_n. (X_i denotes the prefix x_1, ..., x_i; likewise Y_j and Z_l.)
1. If x_m = y_n, then z_k = x_m = y_n and Z_{k-1} is an LCS of X_{m-1} and Y_{n-1}.
2. If x_m ≠ y_n and z_k ≠ x_m, then Z is an LCS of X_{m-1} and Y.
3. If x_m ≠ y_n and z_k ≠ y_n, then Z is an LCS of X and Y_{n-1}.
Optimal Substructure
Theorem: Let Z = z_1, ..., z_k be any LCS of X and Y.
1. If x_m = y_n, then z_k = x_m = y_n and Z_{k-1} is an LCS of X_{m-1} and Y_{n-1}.
Proof (case 1: x_m = y_n): Any common subsequence Z' that does not end in x_m = y_n can be made longer by appending x_m = y_n. Therefore,
(1) the LCS Z must end in x_m = y_n,
(2) Z_{k-1} is a common subsequence of X_{m-1} and Y_{n-1}, and
(3) there is no longer common subsequence of X_{m-1} and Y_{n-1}, or Z would not be an LCS.
Optimal Substructure
Theorem: Let Z = z_1, ..., z_k be any LCS of X and Y.
2. If x_m ≠ y_n and z_k ≠ x_m, then Z is an LCS of X_{m-1} and Y.
Proof (case 2: x_m ≠ y_n and z_k ≠ x_m): Since Z does not end in x_m,
(1) Z is a common subsequence of X_{m-1} and Y, and
(2) there is no longer common subsequence of X_{m-1} and Y, or Z would not be an LCS.
Optimal Substructure
Theorem: Let Z = z_1, ..., z_k be any LCS of X and Y.
3. If x_m ≠ y_n and z_k ≠ y_n, then Z is an LCS of X and Y_{n-1}.
Proof (case 3: x_m ≠ y_n and z_k ≠ y_n): Since Z does not end in y_n,
(1) Z is a common subsequence of X and Y_{n-1}, and
(2) there is no longer common subsequence of X and Y_{n-1}, or Z would not be an LCS.
Recursive Solution
Sequences x_1, …, x_m and y_1, …, y_n
LCS(i, j) = length of a longest common subsequence of x_1, …, x_i and y_1, …, y_j
if i = 0 or j = 0 then LCS(i, j) = 0
if x_i = y_j then LCS(i, j) = 1 + LCS(i-1, j-1)
if x_i ≠ y_j then LCS(i, j) = max(LCS(i-1, j), LCS(i, j-1)), because x_i and y_j cannot both be in the LCS
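The recurrence translates almost line for line into a memoized recursive function. This is a sketch under the assumption that only the length is wanted; the name lcs_length and the use of functools.lru_cache for memoization are choices made here, not taken from the slides.

```python
from functools import lru_cache


def lcs_length(x: str, y: str) -> int:
    """Length of an LCS of x and y via the recurrence LCS(i, j)."""

    @lru_cache(maxsize=None)
    def lcs(i: int, j: int) -> int:
        # Base case: an empty prefix has an empty LCS with anything.
        if i == 0 or j == 0:
            return 0
        # Python strings are 0-indexed, so x_i is x[i - 1].
        if x[i - 1] == y[j - 1]:
            return 1 + lcs(i - 1, j - 1)
        return max(lcs(i - 1, j), lcs(i, j - 1))

    return lcs(len(x), len(y))


print(lcs_length("ABCB", "BDCAB"))
```

Without the cache the same subproblems would be recomputed many times; with it, each pair (i, j) is evaluated once. For long inputs the recursion depth can become a practical limit, which the table-based version avoids.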
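Recursive Solution
Define c[i, j] = length of an LCS of X_i and Y_j. We want c[m, n].
c[i, j] = 0                              if i = 0 or j = 0
c[i, j] = c[i-1, j-1] + 1                if i, j > 0 and x_i = y_j
c[i, j] = max(c[i-1, j], c[i, j-1])      if i, j > 0 and x_i ≠ y_j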
Constructing an LCS
LCS Example
We'll see how the LCS algorithm works on the following example:
X = ABCB
Y = BDCAB
What is the longest common subsequence of X and Y?
LCS(X, Y) = BCB (positions 2, 3, 4 of X; positions 1, 3, 5 of Y)
X = ABCB; m = |X| = 4
Y = BDCAB; n = |Y| = 5
Allocate array c[0..4, 0..5] (rows i = 0..m for X, columns j = 0..n for Y)

        j    0    1    2    3    4    5
   i        Yj    B    D    C    A    B
   0   Xi
   1   A
   2   B
   3   C
   4   B
Initialize the first column and the first row:
for i = 0 to m: c[i, 0] = 0
for j = 0 to n: c[0, j] = 0

        j    0    1    2    3    4    5
   i        Yj    B    D    C    A    B
   0   Xi    0    0    0    0    0    0
   1   A     0
   2   B     0
   3   C     0
   4   B     0
Fill the table row by row, left to right, applying this rule to each cell:

if ( Xi == Yj )
    c[i, j] = c[i-1, j-1] + 1
    b[i, j] = "↖"
else if ( c[i-1, j] >= c[i, j-1] )
    c[i, j] = c[i-1, j]
    b[i, j] = "↑"
else
    c[i, j] = c[i, j-1]
    b[i, j] = "←"

Row 1 (A): only Y4 = A matches, so c[1, 4] = c[0, 3] + 1 = 1; c[1, 5] = 1 is copied from the left.
Row 2 (B): Y1 = B matches, so c[2, 1] = 1; c[2, 2], c[2, 3], c[2, 4] stay at 1; Y5 = B matches, so c[2, 5] = c[1, 4] + 1 = 2.
Row 3 (C): c[3, 1] = c[3, 2] = 1 from above; Y3 = C matches, so c[3, 3] = c[2, 2] + 1 = 2; c[3, 4] = 2 from the left; c[3, 5] = 2 from above.
Row 4 (B): Y1 = B matches, so c[4, 1] = c[3, 0] + 1 = 1; c[4, 2] = 1, c[4, 3] = 2, c[4, 4] = 2 from above; Y5 = B matches, so c[4, 5] = c[3, 4] + 1 = 3.

Completed table:

        j    0    1    2    3    4    5
   i        Yj    B    D    C    A    B
   0   Xi    0    0    0    0    0    0
   1   A     0    0    0    0    1    1
   2   B     0    1    1    1    1    2
   3   C     0    1    1    2    2    2
   4   B     0    1    1    2    2    3
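The table fill walked through above can be sketched in Python as follows. The function name lcs_tables and the string labels "diag", "up" and "left" (standing in for the three arrows stored in b) are assumptions made for this illustration.

```python
def lcs_tables(x: str, y: str):
    """Build the length table c and the direction table b, bottom-up."""
    m, n = len(x), len(y)
    # Row 0 and column 0 stay at 0: an empty prefix has an empty LCS.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[""] * (n + 1) for _ in range(m + 1)]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:          # x_i == y_j
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = "diag"
            elif c[i - 1][j] >= c[i][j - 1]:  # ties go to the cell above
                c[i][j] = c[i - 1][j]
                b[i][j] = "up"
            else:
                c[i][j] = c[i][j - 1]
                b[i][j] = "left"
    return c, b


c, b = lcs_tables("ABCB", "BDCAB")
for row in c:
    print(row)
```

Each cell depends only on its upper, left and upper-left neighbours, so any fill order that visits those three first would work; row by row is simply the most convenient.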
Computing the Length of an LCS
Analysis of the LCS algorithm
Running time: O(mn), since each c[i, j] is calculated in constant time and the table has (m+1)(n+1) entries.
Memory: O(mn) for the table.
How to find the actual LCS
Start at c[m, n] and move back through the completed table toward row 0 or column 0.

        j    0    1    2    3    4    5
   i        Yj    B    D    C    A    B
   0   Xi    0    0    0    0    0    0
   1   A     0    0    0    0    1    1
   2   B     0    1    1    1    1    2
   3   C     0    1    1    2    2    2
   4   B     0    1    1    2    2    3
How to find the actual LCS
• Initial call is PRINT-LCS(b, X, m, n).
• When b[i, j] = ↖, we have extended the LCS by one character. So the LCS consists of the entries on the traced path with ↖ in them.
• Time: O(m + n) (the number of characters to be checked), since every step decreases i or j.
Tracing back from c[4, 5] in the example (ties resolved toward the cell above):
(4, 5) ↖ B, (3, 4) ←, (3, 3) ↖ C, (2, 2) ←, (2, 1) ↖ B, stop at (1, 0).
LCS (reversed order): B C B
LCS (straight order): B C B (this string turned out to be a palindrome)
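The traceback can be sketched in Python as below. This is an iterative variant written for illustration, not the recursive PRINT-LCS named on the slide; the function name lcs_string and the labels "diag", "up" and "left" are assumptions. The table fill is repeated so that the block runs on its own.

```python
def lcs_string(x: str, y: str) -> str:
    """Fill c and b, then follow b back from (m, n) to recover one LCS."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[""] * (n + 1) for _ in range(m + 1)]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j], b[i][j] = c[i - 1][j - 1] + 1, "diag"
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j], b[i][j] = c[i - 1][j], "up"
            else:
                c[i][j], b[i][j] = c[i][j - 1], "left"

    # Traceback: each step decreases i or j, so at most m + n steps.
    reversed_lcs = []
    i, j = m, n
    while i > 0 and j > 0:
        if b[i][j] == "diag":
            reversed_lcs.append(x[i - 1])  # this character is part of the LCS
            i -= 1
            j -= 1
        elif b[i][j] == "up":
            i -= 1
        else:
            j -= 1

    # Characters were collected last-to-first, so reverse them.
    return "".join(reversed(reversed_lcs))


print(lcs_string("ABCB", "BDCAB"))
```

When several LCSs of the same length exist, the tie-breaking rule in the fill step decides which one is returned.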
LCS Example
We'll see how the LCS algorithm works on the following example:
X = PRESIDENT
Y = PROVIDENCE
What is the longest common subsequence of X and Y?
Output: PRIDEN
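The stated output can be reproduced with a compact sketch. Unlike the slides, this variant does not store the arrow table b; it re-derives each direction from c during the traceback, which is an assumption made here to keep the example short. The function name lcs is hypothetical.

```python
def lcs(x: str, y: str) -> str:
    """Return one LCS of x and y using only the length table c."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])

    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            # A match corresponds to a diagonal arrow in the b table.
            out.append(x[i - 1])
            i -= 1
            j -= 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1  # corresponds to an up arrow
        else:
            j -= 1  # corresponds to a left arrow
    return "".join(reversed(out))


print(lcs("PRESIDENT", "PROVIDENCE"))  # PRIDEN
```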