Strings and Pattern Matching Algorithms


Brute Force Pattern Matching Algorithm

Pattern P[0..m-1], text T[0..n-1].

Algorithm BruteForceMatch(T, P):
    Input: Strings T with n characters and P with m characters
    Output: String index of the first substring of T matching P, or an indication that P is not a substring of T

    for i := 0 to n-m do    // for each candidate index in T
    {
        j := 0
        while (j < m and T[i+j] = P[j]) do
            j := j+1
        if j = m then
            return i
    }
    return "there is no substring of T matching P."

Time complexity: O(mn)
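As a rough illustration, the brute-force pseudocode can be sketched in Python as follows. The function name `brute_force_match` and the choice to return -1 when there is no match are assumptions made for this sketch; the slide only says that "an indication" is returned.

```python
def brute_force_match(text: str, pattern: str) -> int:
    """Return the first index where pattern occurs in text, or -1 (assumed sentinel)."""
    n, m = len(text), len(pattern)
    # Try every candidate start index i = 0 .. n-m.
    for i in range(n - m + 1):
        j = 0
        # Compare left to right until a mismatch or the end of the pattern.
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:
            return i  # all m characters matched
    return -1  # no substring of text matches pattern
```

For example, `brute_force_match("abacaabaccabacabaabb", "abacab")` returns 10. Each of the up to n-m+1 candidate positions may cost up to m comparisons, which is where the O(mn) bound comes from.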


Boyer-Moore Algorithm

Improve the running time of the brute-force algorithm by adding two potentially time-saving heuristics:

Looking-Glass Heuristic: When testing a possible placement of P[0..m-1] against T[0..n-1], begin the comparisons from the end of P and move backward to the front of P.

Character-Jump Heuristic: Suppose that T[i] does not match P[j] and T[i] = c. If c is not contained anywhere in P, then shift P completely past T[i]; otherwise, shift P until an occurrence of character c in P is aligned with T[i].

last(c): if c is in P, last(c) is the index of the last (rightmost) occurrence of c in P. Otherwise, define last(c) = -1.

Compute-Last-Occurrence(P, m, Σ)
    for each character c in Σ do
        last(c) := -1
    for j := 0 to m-1 do
        last(P[j]) := j

Example: P[0..5] = abacab

    c         a   b   c   d
    last(c)   4   5   3  -1

Time complexity: O(m + |Σ|)
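The last-occurrence table can be sketched in Python with a dictionary. The name `compute_last_occurrence` and the representation of the alphabet Σ as a string are assumptions of this sketch. Characters that do not occur in P get -1, as in the pseudocode and the example table.

```python
def compute_last_occurrence(pattern: str, alphabet: str) -> dict:
    """Map each character of the alphabet to its rightmost index in pattern, or -1."""
    # Initialise every character of the alphabet to -1 (not in the pattern).
    last = {c: -1 for c in alphabet}
    # Later positions overwrite earlier ones, so the rightmost index survives.
    for j, c in enumerate(pattern):
        last[c] = j
    return last
```

With the slide's example, `compute_last_occurrence("abacab", "abcd")` gives `{'a': 4, 'b': 5, 'c': 3, 'd': -1}`.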


Algorithm BMMatch(T, P)
    Input: Strings T with n characters and P with m characters
    Output: String index of the first substring of T matching P, or an indication that P is not a substring of T

    Compute-Last-Occurrence(P, m, Σ)
    i := m-1
    j := m-1
    repeat {
        if P[j] = T[i] then
            if j = 0 then
                return i    // a match!
            else
                i := i-1
                j := j-1
        else
            i := i+(m-1)-min(j-1, last(T[i]))    // jump step
            j := m-1
    } until i > n-1
    return "there is no substring of T matching P."

[Figure: the two cases of the jump step, with distances labeled m-j-1 and m-last(T[i])-1]

Time complexity (worst case): O(nm + |Σ|)
Example: T = aaaa…aaaa, P = baa…a
Usually it runs much faster.
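A minimal Python sketch of BMMatch, using only the looking-glass and character-jump heuristics described on these slides. The name `bm_match`, the -1 return value for "no match", and the dictionary lookup with a default of -1 (which replaces the explicit loop over Σ) are assumptions of this sketch.

```python
def bm_match(text: str, pattern: str) -> int:
    """Return the first index where pattern occurs in text, or -1 (assumed sentinel)."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0  # assumption: the empty pattern matches at index 0
    # Last-occurrence table; characters absent from the pattern default to -1 below.
    last = {}
    for k, c in enumerate(pattern):
        last[c] = k
    i = j = m - 1  # start comparing at the end of the pattern
    while i <= n - 1:
        if pattern[j] == text[i]:
            if j == 0:
                return i  # a match
            i -= 1
            j -= 1
        else:
            # Jump step from the slide: i := i + (m-1) - min(j-1, last(T[i]))
            i += (m - 1) - min(j - 1, last.get(text[i], -1))
            j = m - 1
    return -1
```

For example, `bm_match("abacaabaccabacabaabb", "abacab")` returns 10. On inputs shaped like T = aaaa…a and P = baa…a, every alignment is compared almost fully before shifting by one, which gives the O(nm) worst case.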


Knuth-Morris-Pratt Algorithm

    T: b a c b a b a a a b c b a b …
    P: a b a b a c a

In general:

    T: xxxxxxxxxxxxxxxxxxxxxxxxx
    P: xxxx…………xxxxxxxx
       prefix     suffix


Algorithm KMPPrefixFunction(P)
    Input: String P[1..m] with m characters
    Output: The prefix function pre for P, which maps j to the length of the longest prefix of P that is a proper suffix of P[1..j].

    k := 0
    pre(1) := 0
    for q := 2 to m do
        while k > 0 and P[k+1] ≠ P[q] do
            k := pre(k)
        if P[k+1] = P[q] then
            k := k+1
        pre(q) := k
    return pre

k: index of the last character in the prefix

Example:

    i        1  2  3  4  5  6  7  8  9  10
    P[i]     a  b  a  b  a  b  a  b  c  a
    pre(i)   0  0  1  2  3  4  5  6  0  1

Time complexity: O(m)
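The prefix function can be sketched in Python as follows. This sketch uses 0-based indexing, so `pre[q]` here corresponds to pre(q+1) on the slide, and the function name `kmp_prefix_function` is an assumption.

```python
def kmp_prefix_function(pattern: str) -> list:
    """pre[q] = length of the longest prefix of pattern that is a proper suffix of pattern[:q+1]."""
    m = len(pattern)
    pre = [0] * m
    k = 0  # length of the currently matched prefix
    for q in range(1, m):
        # Fall back to shorter borders until the next character can extend one.
        while k > 0 and pattern[k] != pattern[q]:
            k = pre[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pre[q] = k
    return pre
```

The pattern ababababca yields `[0, 0, 1, 2, 3, 4, 5, 6, 0, 1]`, which agrees with the pre(i) row of the slide's example. The variable k only grows by one per iteration of the outer loop and every pass of the inner loop decreases it, so the total work is O(m).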


Algorithm KMPMatch(T, P)
    Input: Strings T[1..n] with n characters and P[1..m] with m characters
    Output: String index of the first substring of T matching P, or an indication that P is not a substring of T

    pre := KMPPrefixFunction(P)
    j := 0
    for i := 1 to n do
        while j > 0 and P[j+1] ≠ T[i] do
            j := pre(j)
        if P[j+1] = T[i] then
            j := j+1
        if j = m then
            print "Pattern occurs with shift" i-m    // a match!
            j := pre(j)    // look for the next match

Time complexity: O(m+n)
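A self-contained Python sketch of the matcher, again 0-based. Like the pseudocode, it keeps searching after a match, so it collects every shift rather than only the first; returning them as a list (instead of printing) and the name `kmp_match_all` are assumptions of this sketch.

```python
def kmp_match_all(text: str, pattern: str) -> list:
    """Return all 0-based shifts at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []  # assumption: an empty pattern is treated as having no matches
    # Prefix function: pre[q] = longest proper border of pattern[:q+1].
    pre = [0] * m
    k = 0
    for q in range(1, m):
        while k > 0 and pattern[k] != pattern[q]:
            k = pre[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pre[q] = k
    shifts = []
    j = 0  # number of pattern characters currently matched
    for i in range(n):
        while j > 0 and pattern[j] != text[i]:
            j = pre[j - 1]  # reuse the longest border instead of restarting
        if pattern[j] == text[i]:
            j += 1
        if j == m:
            shifts.append(i - m + 1)  # pattern ends at text[i]
            j = pre[j - 1]            # look for the next match
    return shifts
```

For instance, `kmp_match_all("abababa", "aba")` returns `[0, 2, 4]`. The text index i never moves backward, which is why the scan costs O(n) on top of the O(m) preprocessing.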


Assignment

(1) How many character comparisons will the Boyer-Moore algorithm make in searching for each of the following patterns in the binary text?
    Text: repeat "01110" 20 times
    Pattern: (a) 01111, (b) 01110

(2) (i) Compute the prefix function in the KMP pattern matching algorithm for the pattern ababbababbabb when the alphabet is Σ = {a, b}.
    (ii) How many character comparisons will the KMP pattern matching algorithm make in searching for each of the following patterns in the binary text?
    Text: repeat "010011" 20 times
    Pattern: (a) 010010, (b) 010110