The Longest Common Subsequence Problem CSE 373 Data Structures CSE 373 AU 04 -- Longest Common Subsequences
Reading
Goodrich and Tamassia, 3rd ed., Chapter 12, section 11.5, pp. 570-574.
12/28/2021 CSE 373 AU 04 -- Longest Common Subsequences
Motivation
• Two problems and methods for string comparison:
  › The substring problem
  › The longest common subsequence problem
• In both cases, good algorithms do substantially better than the brute-force methods.
String Matching Problem
• Given two strings TEXT and PATTERN, find the first occurrence of PATTERN in TEXT.
• Useful in text editing, document analysis, genome analysis, etc.
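String Matching Problem: Brute-Force Algorithm
(n is the length of TEXT, m is the length of PATTERN.)
For i = 0 to n - m {
    For j = 0 to m - 1 {
        If TEXT[i + j] ≠ PATTERN[j] then break
        If j = m - 1 then return i
    }
}
return -1
Suppose TEXT = 0000001, PATTERN = 0000001.
This type of problem has Θ(n²) behavior: on inputs made of long runs of a repeated character with a mismatch only at the end, nearly every alignment is compared almost to the end of PATTERN before it fails. The worst case takes Θ(nm) comparisons, which is Θ(n²) when m is proportional to n.
A more efficient algorithm is the Boyer-Moore algorithm. (We will not be covering it in this course.)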
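The brute-force search can be sketched in Python as follows. This is a minimal illustration rather than the course's reference code; the function name `brute_force_match` is an assumption introduced here.

```python
def brute_force_match(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    # Try every alignment i at which pattern could still fit inside text.
    for i in range(n - m + 1):
        j = 0
        # Compare text[i + j] with pattern[j] until a mismatch or the end.
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:  # every character of pattern matched at alignment i
            return i
    return -1  # no alignment matched


if __name__ == "__main__":
    print(brute_force_match("0000001", "0000001"))
    print(brute_force_match("000000000001", "0001"))
```

Each alignment costs up to m comparisons and there are n - m + 1 alignments, which is where the Θ(nm) worst case comes from.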
Longest Common Subsequence Problem
• A Longest Common Subsequence (LCS) of two strings S1 and S2 is a longest string that can be obtained from S1 and from S2 by deleting elements.
• For example, S1 = “thoughtful” and S2 = “shuffle” have an LCS: “hufl”.
• Useful in spelling correction, document comparison, etc.
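To make "obtained by deleting elements" concrete, here is a small Python check that one string is a subsequence of another. It is only a sketch; the helper name `is_subsequence` is hypothetical and not from the slides.

```python
def is_subsequence(candidate, s):
    """True if candidate can be obtained from s by deleting characters."""
    i = 0  # position of the next character of candidate still to be matched
    for ch in s:
        if i < len(candidate) and ch == candidate[i]:
            i += 1
    return i == len(candidate)


if __name__ == "__main__":
    # "hufl" is a subsequence of both example strings from the slide.
    print(is_subsequence("hufl", "thoughtful"), is_subsequence("hufl", "shuffle"))
```

This confirms only that “hufl” is a common subsequence; showing that no longer one exists is the job of the LCS algorithm itself.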
Dynamic Programming
• Analyze the problem in terms of a number of smaller subproblems.
• Solve the subproblems and keep their answers in a table.
• Each subproblem’s answer is easily computed from the answers to its own subproblems.
Longest Common Subsequence: Algorithm using Dynamic Programming
• For every prefix of S1 and prefix of S2 we’ll compute the length L of an LCS.
• In the end, we’ll get the length of an LCS for S1 and S2 themselves.
• The subsequence can be recovered from the matrix of L values.
• (see demonstration)
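The prefix-by-prefix computation described above can be sketched in Python. This uses the standard textbook recurrence and is not necessarily the exact form shown in the demonstration; the names `lcs_table` and `lcs` are assumptions made for this example. Here L[i][j] holds the LCS length of the first i characters of s1 and the first j characters of s2.

```python
def lcs_table(s1, s2):
    """Build the table L of LCS lengths for every pair of prefixes."""
    n, m = len(s1), len(s2)
    # Row 0 and column 0 stay 0: an empty prefix has an empty LCS.
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if s1[i - 1] == s2[j - 1]:
                # Matching last characters extend the LCS of the shorter prefixes.
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                # Otherwise drop the last character of one prefix or the other.
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L


def lcs(s1, s2):
    """Recover one LCS by walking back through the table of L values."""
    L = lcs_table(s1, s2)
    i, j = len(s1), len(s2)
    out = []
    while i > 0 and j > 0:
        if s1[i - 1] == s2[j - 1]:
            out.append(s1[i - 1])  # this character belongs to the LCS
            i -= 1
            j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


if __name__ == "__main__":
    print(lcs("thoughtful", "shuffle"))  # prints "hufl"
```

Filling the table takes time proportional to the product of the two string lengths, and the walk back takes at most the sum of the two lengths in steps.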