UMass Lowell Computer Science 91.503 Analysis of Algorithms, Prof. Karen Daniels, Fall 2002. Tuesday, 12/3/02. String Matching Algorithms, Chapter 32
Chapter Dependencies: Automata; Ch 32 String Matching. You’re responsible for material in Sections 32.1-32.4 of this chapter.
String Matching Algorithms Motivation & Basics
String Matching Problem (Figure 32.1)
- Motivations: text-editing, pattern matching in DNA sequences
- Text: array T[1..n]
- Pattern: array P[1..m]
- Array element: character from finite alphabet Σ
- Pattern P occurs with shift s in T if P[1..m] = T[s+1..s+m]
source: 91.503 textbook Cormen et al.
String Matching Algorithms
- Naive Algorithm
  - Worst-case running time in O((n-m+1)m)
- Rabin-Karp
  - Worst-case running time in O((n-m+1)m)
  - Better than this on average and in practice
- Finite Automaton-Based
  - Worst-case running time in O(n + m|Σ|)
- Knuth-Morris-Pratt
  - Worst-case running time in O(n + m)
Notation & Terminology
- Σ* = set of all finite-length strings formed using characters from alphabet Σ
- Empty string: ε
- |x| = length of string x
- w is a prefix of x: w ⊏ x (example: ab ⊏ abcca)
- w is a suffix of x: w ⊐ x (example: cca ⊐ abcca)
- The prefix and suffix relations are transitive
Overlapping Suffix Lemma (Lemma 32.1, illustrated in Figure 32.3) source: 91.503 textbook Cormen et al.
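The prefix and suffix relations above map directly onto Python's built-in string methods. The following is a minimal sketch; the helper names `is_prefix` and `is_suffix` are illustrative and do not come from the slides or the textbook.

```python
def is_prefix(w: str, x: str) -> bool:
    """True when w is a prefix of x, i.e. x = w + y for some string y."""
    return x.startswith(w)


def is_suffix(w: str, x: str) -> bool:
    """True when w is a suffix of x, i.e. x = y + w for some string y."""
    return x.endswith(w)
```

With the slide's examples, `is_prefix("ab", "abcca")` and `is_suffix("cca", "abcca")` both hold, and the empty string is a prefix and a suffix of every string.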
String Matching Algorithms Naive Algorithm
Naive String Matching (Figure 32.4): worst-case running time is in Θ((n-m+1)m). source: 91.503 textbook Cormen et al.
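A minimal sketch of the naive approach may help make the Θ((n-m+1)m) bound concrete. It assumes 0-based Python strings, so shift s corresponds to the slice `text[s:s+m]`; the function name is illustrative.

```python
def naive_string_matcher(text: str, pattern: str) -> list[int]:
    """Return every shift s such that text[s:s+m] equals pattern."""
    n, m = len(text), len(pattern)
    shifts = []
    for s in range(n - m + 1):        # n-m+1 candidate shifts
        if text[s:s + m] == pattern:  # up to m character comparisons each
            shifts.append(s)
    return shifts
```

The worst case arises on inputs such as a text of all `a` characters with a pattern of all `a` characters, where every shift requires all m comparisons.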
String Matching Algorithms Rabin-Karp
Rabin-Karp Algorithm
- Assume each character is a digit in radix-d notation (e.g. d = 10)
- p = decimal value of pattern
- t_s = decimal value of substring T[s+1..s+m] for s = 0, 1, ..., n-m
- Strategy:
  - compute p in O(m) time (which is in O(n))
  - compute all t_s values in a total of O(n) time
  - find all valid shifts s in O(n) time by comparing p with each t_s
- Compute p in O(m) time using Horner’s rule:
  - p = P[m] + d(P[m-1] + d(P[m-2] + ... + d(P[2] + d P[1]) ...))
- Compute t_0 similarly from T[1..m] in O(m) time
- Compute the remaining t_s values in O(n-m) time:
  - t_{s+1} = d(t_s - d^{m-1} T[s+1]) + T[s+m+1]
source: 91.503 textbook Cormen et al.
Rabin-Karp Algorithm (Figure 32.5): p and t_s may be large, so compute them modulo q. source: 91.503 textbook Cormen et al.
Rabin-Karp Algorithm (continued) [Figure: example with p = 31415 showing the update t_{s+1} = d(t_s - d^{m-1} T[s+1]) + T[s+m+1] and a spurious hit] source: 91.503 textbook Cormen et al.
Rabin-Karp Algorithm (continued) source: 91.503 textbook Cormen et al.
Rabin-Karp Algorithm (continued)
- d is radix, q is modulus
- Preprocessing: Θ(m), which is in Θ(n)
  - high-order digit position for m-digit window
- Matching: try all possible shifts, Θ((n-m+1)m)
  - loop invariant: when line 10 is executed, t_s = T[s+1..s+m] mod q
  - rule out spurious hit: Θ(m)
- Worst-case running time is in Θ((n-m+1)m)
source: 91.503 textbook Cormen et al.
Rabin-Karp Algorithm (continued)
- d is radix, q is modulus
- Preprocessing: Θ(m), which is in Θ(n)
  - high-order digit position for m-digit window
- Matching: try all possible shifts, Θ((n-m+1)m)
  - loop invariant: when line 10 is executed, t_s = T[s+1..s+m] mod q
  - rule out spurious hit: Θ(m)
- Average case:
  - Assume reducing mod q is like a random mapping from Σ* to Z_q
  - Estimate: (chance that t_s ≡ p mod q) = 1/q
  - # spurious hits is in O(n/q)
  - Expected matching time = O(n) + O(m(v + n/q)), where v = # valid shifts
  - If v is in O(1) and q >= m, average-case running time is in O(n+m)
source: 91.503 textbook Cormen et al.
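The Horner's-rule evaluation and the constant-time window update can be sketched as follows, before any modulus is introduced. This is an illustration under stated assumptions: characters are decimal digit characters (so d is at most 10), indices are 0-based, and the names `horner_value` and `window_values` are hypothetical.

```python
def horner_value(digits: str, d: int = 10) -> int:
    """Evaluate a digit string in radix d with Horner's rule, O(m) time."""
    value = 0
    for ch in digits:
        value = d * value + int(ch)
    return value


def window_values(text: str, m: int, d: int = 10) -> list[int]:
    """Return t_0, t_1, ..., t_{n-m} for all length-m windows of text."""
    n = len(text)
    if m == 0 or m > n:
        return []
    high = d ** (m - 1)               # weight of the high-order digit
    t = horner_value(text[:m], d)     # t_0 in O(m) time
    values = [t]
    for s in range(n - m):
        # Drop the high-order digit text[s], shift left, append text[s+m].
        t = d * (t - high * int(text[s])) + int(text[s + m])
        values.append(t)
    return values
```

For the text `2359023141` with m = 5, the windows evaluate to 23590, 35902, 59023, 90231, 2314 and 23141, each obtained from its predecessor in constant time.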
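A minimal sketch of the full matcher with a modulus is shown below. It is not the textbook pseudocode: it assumes 0-based indexing, uses `ord` for character values rather than digit values, and the default radix and modulus are arbitrary illustrative choices.

```python
def rabin_karp_matcher(text: str, pattern: str,
                       d: int = 256, q: int = 1_000_003) -> list[int]:
    """Return all shifts where pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    h = pow(d, m - 1, q)              # weight of the high-order digit, mod q
    p = t = 0
    for i in range(m):                # preprocessing, O(m)
        p = (d * p + ord(pattern[i])) % q
        t = (d * t + ord(text[i])) % q
    shifts = []
    for s in range(n - m + 1):        # try all possible shifts
        # Equal fingerprints may be a spurious hit, so verify explicitly.
        if p == t and text[s:s + m] == pattern:
            shifts.append(s)
        if s < n - m:                 # roll the window one position right
            t = (d * (t - ord(text[s]) * h) + ord(text[s + m])) % q
    return shifts
```

The explicit comparison after a fingerprint match is what rules out spurious hits; it costs up to m comparisons per hit, which is why the worst case matches the naive bound.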
String Matching Algorithms Finite Automata
Finite Automata (Figure 32.6). Strategy: build automaton for pattern, then examine each text character once. Worst-case running time is in Θ(n) + automaton creation time. source: 91.503 textbook Cormen et al.
Finite Automata source: 91.503 textbook Cormen et al.
String-Matching Automaton (Figure 32.7). Pattern P = ababaca. Automaton accepts strings ending in P. source: 91.503 textbook Cormen et al.
String-Matching Automaton. Suffix function for P: σ(x) = length of longest prefix of P that is a suffix of x. (32.3) Automaton’s operational invariant (32.4): at each step, it keeps track of the longest pattern prefix that is a suffix of what has been read so far. source: 91.503 textbook Cormen et al.
String-Matching Automaton. Simulate behavior of string-matching automaton that finds occurrences of pattern P of length m in T[1..n], assuming automaton has already been created. Worst-case running time of matching is in Θ(n). source: 91.503 textbook Cormen et al.
String-Matching Automaton (continued) Correctness of matching procedure... (textbook references: 32.2, 32.8) source: 91.503 textbook Cormen et al.
String-Matching Automaton (continued) Correctness of matching procedure... (textbook references: 32.3, 32.9, 32.2, 32.1) source: 91.503 textbook Cormen et al.
String-Matching Automaton (continued) Correctness of matching procedure... (textbook references: 32.4, 32.3) source: 91.503 textbook Cormen et al.
String-Matching Automaton (continued)
- Worst-case running time of automaton creation is in O(m^3 |Σ|)
  - can be improved to O(m |Σ|)
- Worst-case running time of entire string-matching strategy is in O(m |Σ|) + O(n)
  - O(m |Σ|): automaton creation time
  - O(n): pattern matching time
source: 91.503 textbook Cormen et al.
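The suffix function can be evaluated directly from its definition by trying prefix lengths from longest to shortest. This is a brute-force sketch for illustration only; the function name is an assumption.

```python
def suffix_function(pattern: str, x: str) -> int:
    """Length of the longest prefix of pattern that is a suffix of x."""
    for k in range(min(len(pattern), len(x)), -1, -1):
        if x.endswith(pattern[:k]):   # the empty prefix (k = 0) always matches
            return k
    return 0
```

For the pattern `ab`, the empty string maps to 0, `ccaca` maps to 1, and `ccab` maps to 2.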
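Simulating an already-built automaton takes one table lookup per text character. The sketch below assumes the transition function is stored as a dictionary keyed by (state, character), with missing entries treated as a transition to state 0; both that representation and the hand-written table `DELTA_AB` for the pattern `ab` are illustrative assumptions.

```python
def finite_automaton_matcher(text: str,
                             delta: dict[tuple[int, str], int],
                             m: int) -> list[int]:
    """Return all shifts at which the automaton reaches accepting state m."""
    q = 0                                # start state
    shifts = []
    for i, ch in enumerate(text):        # each text character examined once
        q = delta.get((q, ch), 0)        # assumption: missing entry means state 0
        if q == m:
            shifts.append(i - m + 1)     # 0-based shift of this occurrence
    return shifts


# Hand-written transitions for the pattern "ab" (states 0, 1, 2).
DELTA_AB = {(0, "a"): 1, (1, "a"): 1, (1, "b"): 2, (2, "a"): 1}
```

Running `finite_automaton_matcher("abcabab", DELTA_AB, 2)` visits seven characters and reports shifts 0, 3 and 5.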
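The O(m^3 |Σ|) bound corresponds to building the transition table straight from the suffix function: for each state and character, try candidate prefix lengths from longest to shortest. The following is a sketch of that straightforward construction, with an assumed dictionary representation and 0-based Python slicing.

```python
def compute_transition_function(pattern: str,
                                alphabet: str) -> dict[tuple[int, str], int]:
    """Build delta(q, a) = length of longest prefix of pattern that is a suffix of P_q + a."""
    m = len(pattern)
    delta = {}
    for q in range(m + 1):                    # m+1 states
        for a in alphabet:                    # |alphabet| characters
            k = min(m, q + 1)                 # longest possible next state
            # Each endswith test costs O(m); k drops at most m+1 times.
            while not (pattern[:q] + a).endswith(pattern[:k]):
                k -= 1
            delta[(q, a)] = k
    return delta
```

For the pattern `ababaca` over the alphabet `abc`, this gives for example a transition from state 5 on `b` to state 4, and from state 5 on `c` to state 6.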
String Matching Algorithms Knuth-Morris-Pratt
Knuth-Morris-Pratt Overview
- Achieve Θ(n+m) time by shortening automaton preprocessing time below O(m |Σ|)
- Approach:
  - don’t precompute automaton’s transition function
  - calculate enough transition data “on-the-fly”
  - obtain data via “alphabet-independent” pattern preprocessing
  - pattern preprocessing compares pattern against shifts of itself
Knuth-Morris-Pratt Algorithm (Figure 32.10): determine how pattern matches against itself. source: 91.503 textbook Cormen et al.
Knuth-Morris-Pratt Algorithm (32.5). Prefix function π shows how pattern matches against itself: π(q) is the length of the longest prefix of P that is a proper suffix of P_q. Equivalently, what is the largest k < q such that P_k ⊐ P_q? Example: [figure omitted] source: 91.503 textbook Cormen et al.
Knuth-Morris-Pratt Algorithm
- Prefix-function computation: Θ(m), which is in Θ(n)
- Matching loop: Θ(n) using amortized analysis
- Overall: Θ(m+n) using amortized analysis
- Annotations on the matcher:
  - # characters matched
  - scan text left-to-right
  - next character does not match
  - next character matches
  - Is all of P matched?
  - Look for next match
source: 91.503 textbook Cormen et al.
Knuth-Morris-Pratt Algorithm: Amortized Analysis (Potential Method)
- k = current state of algorithm
- Potential is never negative since π(k) >= 0 for all k
- Annotations on the prefix-function computation:
  - initial potential value
  - potential decreases
  - potential increases by <= 1 in each execution of for loop body
- Amortized cost of loop body is in O(1)
- Θ(m) loop iterations, so the total is Θ(m), which is in Θ(n)
source: 91.503 textbook Cormen et al.
Knuth-Morris-Pratt Algorithm Correctness... source: 91.503 textbook Cormen et al.
Knuth-Morris-Pratt Algorithm Correctness... (textbook references: 32.5, 32.6, 32.1) source: 91.503 textbook Cormen et al.
Knuth-Morris-Pratt Algorithm Correctness... (textbook references: 32.11, 32.5) source: 91.503 textbook Cormen et al.
Knuth-Morris-Pratt Algorithm Correctness... (textbook references: 32.6, 32.5, 32.7, 32.6) source: 91.503 textbook Cormen et al.
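The prefix function can be computed in Θ(m) time by reusing values already computed for shorter prefixes. The sketch below assumes a list indexed 1..m (entry 0 is unused padding) so that `pi[q]` lines up with π(q); the function name is illustrative.

```python
def compute_prefix_function(pattern: str) -> list[int]:
    """Return pi where pi[q] is the longest prefix of pattern that is a proper suffix of pattern[:q]."""
    m = len(pattern)
    pi = [0] * (m + 1)                # pi[0] is unused padding
    k = 0                             # length of the current matched prefix
    for q in range(2, m + 1):
        # Fall back through shorter candidate prefixes until one can be extended.
        while k > 0 and pattern[k] != pattern[q - 1]:
            k = pi[k]
        if pattern[k] == pattern[q - 1]:
            k += 1
        pi[q] = k
    return pi
```

For the pattern `ababaca`, the values π(1) through π(7) come out as 0, 0, 1, 2, 3, 0, 1.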
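The matcher annotated above can be sketched as a single left-to-right scan that falls back through the prefix function on a mismatch. This is a self-contained illustration with 0-based shifts; the prefix function is computed inline so the block runs on its own, and the function name is an assumption.

```python
def kmp_matcher(text: str, pattern: str) -> list[int]:
    """Return all shifts where pattern occurs in text, in O(n + m) time."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []
    # Prefix function, indexed 1..m (pi[0] unused).
    pi = [0] * (m + 1)
    k = 0
    for j in range(2, m + 1):
        while k > 0 and pattern[k] != pattern[j - 1]:
            k = pi[k]
        if pattern[k] == pattern[j - 1]:
            k += 1
        pi[j] = k
    shifts = []
    q = 0                                 # number of characters matched
    for i in range(n):                    # scan text left-to-right
        while q > 0 and pattern[q] != text[i]:
            q = pi[q]                     # next character does not match
        if pattern[q] == text[i]:
            q += 1                        # next character matches
        if q == m:                        # all of the pattern is matched
            shifts.append(i - m + 1)
            q = pi[q]                     # look for the next match
    return shifts
```

Because q rises by at most one per text character and every fallback lowers it, the total number of fallbacks over the whole scan is at most n, which is the amortized argument behind the Θ(n) matching time.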