String Matching L 2 CS209 Design and Analysis

String Matching L 2 CS-209: Design and Analysis of Algorithm Instructor: Dr. Maria Anjum

Contents • Naïve Algorithm • Knuth-Morris-Pratt (KMP) Algorithm • Rabin-Karp Algorithm • Finite Automata

Rabin-Karp Algorithm • The Rabin-Karp algorithm is a string-searching algorithm created by Richard M. Karp and Michael O. Rabin in 1987. • It uses hashing to find occurrences of one or more pattern strings in a text. • It is another application of hashing. • It is widely used for multiple-pattern search.

Rabin-Karp Algorithm • Generate a hash of the pattern we are looking for in the text. • Compare it with the rolling hash of each pattern-length window of the text. • If the hashes don't match, the pattern does not occur at that window. • If they do match, the pattern may be present there. Compare the window with the pattern letter by letter, because different strings can have the same hash (a spurious hit).

Rabin-Karp Algorithm Text: c c a c c a a e d b a (n = 11) Pattern: d b a (m = 3)
Hash code of the pattern: $4 \times 10^2 + 2 \times 10^1 + 1 \times 10^0 = 400 + 20 + 1 = 421$
Hash code of the first text window (c c a): $3 \times 10^2 + 3 \times 10^1 + 1 \times 10^0 = 331$
• The pattern has 3 letters and the code table has 10 letters, so we use base 10: $P[1] \times 10^{m-1} + P[2] \times 10^{m-2} + P[3] \times 10^{m-3}$
• If there are more than 10 letters, use that number as the base instead of 10.
Note: the letter codes below are assumed values. You can use actual character codes instead.
Codes: a – 1, b – 2, c – 3, d – 4, e – 5, f – 6, g – 7, h – 8, i – 9, j – 10
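The hash code is a base-10 number whose digits are the letter codes. Below is a minimal Python sketch that assumes the slides' codes (a – 1 through j – 10) and base 10. `CODES` and `slide_hash` are hypothetical names. The sketch evaluates the sum with Horner's rule, which gives the same value as adding the powers of 10 term by term.

```python
# Assumed letter codes from the slides: a=1, b=2, ..., j=10.
CODES = {ch: i + 1 for i, ch in enumerate("abcdefghij")}

def slide_hash(s: str, base: int = 10) -> int:
    """Return s[0]*base^(m-1) + s[1]*base^(m-2) + ... + s[m-1]*base^0."""
    h = 0
    for ch in s:
        # Horner's rule: multiply the running value by the base, then add the next code.
        h = h * base + CODES[ch]
    return h

print(slide_hash("dba"))  # 421
print(slide_hash("cca"))  # 331
```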

Rabin-Karp Algorithm Text: c c a c c a a e d b a Pattern: dba = 421
Rolling hash: subtract the leading letter's term, multiply by 10, then add the next letter.
• c c a → c a c: $331 - 3 \times 10^2 = 31$; $31 \times 10 = (3 \times 10^1 + 1 \times 10^0) \times 10 = 310$; $310 + 3 \times 10^0 = 313 = 3 \times 10^2 + 1 \times 10^1 + 3 \times 10^0$
• c a c → a c c: $313 - 3 \times 10^2 = 13$; $(1 \times 10^1 + 3 \times 10^0) \times 10 = 130$; $130 + 3 \times 10^0 = 133 = 1 \times 10^2 + 3 \times 10^1 + 3 \times 10^0$
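Each rolling step above performs the same three operations. The sketch below assumes the slides' letter codes and base 10, and `roll` is a hypothetical helper name. It reproduces the first two steps.

```python
CODES = {ch: i + 1 for i, ch in enumerate("abcdefghij")}  # assumed slide codes

def roll(h: int, old: str, new: str, m: int, base: int = 10) -> int:
    """Slide an m-letter window one position: drop `old` at the front, append `new`."""
    h -= CODES[old] * base ** (m - 1)  # remove the leading term, e.g. 331 - 3*10^2 = 31
    h *= base                          # shift the remaining digits left, e.g. 31*10 = 310
    h += CODES[new]                    # add the incoming letter, e.g. 310 + 3 = 313
    return h

print(roll(331, "c", "c", m=3))  # 313  (c c a -> c a c)
print(roll(313, "c", "c", m=3))  # 133  (c a c -> a c c)
```

Each update costs O(1) arithmetic operations no matter how long the pattern is. Recomputing each window's hash from scratch would cost O(m) instead.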

Rabin-Karp Algorithm Text: c c a c c a a e d b a Pattern: dba = 421
• a c c → c c a: $133 - 1 \times 10^2 = 33$; $(3 \times 10^1 + 3 \times 10^0) \times 10 = 330$; $330 + 1 \times 10^0 = 331 = 3 \times 10^2 + 3 \times 10^1 + 1 \times 10^0$
• c c a → c a a: $331 - 3 \times 10^2 = 31$; $(3 \times 10^1 + 1 \times 10^0) \times 10 = 310$; $310 + 1 \times 10^0 = 311 = 3 \times 10^2 + 1 \times 10^1 + 1 \times 10^0$
• c a a → a a e: $311 - 3 \times 10^2 = 11$; $(1 \times 10^1 + 1 \times 10^0) \times 10 = 110$; $110 + 5 \times 10^0 = 115 = 1 \times 10^2 + 1 \times 10^1 + 5 \times 10^0$

Rabin-Karp Algorithm Text: c c a c c a a e d b a Pattern: dba = 421
• a a e → a e d: $115 - 1 \times 10^2 = 15$; $(1 \times 10^1 + 5 \times 10^0) \times 10 = 150$; $150 + 4 \times 10^0 = 154 = 1 \times 10^2 + 5 \times 10^1 + 4 \times 10^0$
• a e d → e d b: $154 - 1 \times 10^2 = 54$; $(5 \times 10^1 + 4 \times 10^0) \times 10 = 540$; $540 + 2 \times 10^0 = 542 = 5 \times 10^2 + 4 \times 10^1 + 2 \times 10^0$

Rabin-Karp Algorithm Text: c c a c c a a e d b a Pattern: dba = 421
• e d b → d b a: $542 - 5 \times 10^2 = 42$; $(4 \times 10^1 + 2 \times 10^0) \times 10 = 420$; $420 + 1 \times 10^0 = 421 = 4 \times 10^2 + 2 \times 10^1 + 1 \times 10^0$
• 421 equals the pattern's hash, and a letter-by-letter check confirms a pattern match at the last window (d b a, shift 8). These window-by-window calculations are called the rolling hash.
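The whole walk-through can be combined into one scan. This is a minimal sketch with several assumptions: it uses the slides' letter codes and base 10, and the 11-letter text c c a c c a a e d b a, which is the text consistent with the window hashes 331, 313, 133, 331, 311, 115, 154, 542, 421 computed above. `rabin_karp_slides` is a hypothetical name. Shifts are 0-based, so shift s means the match starts after s letters of the text.

```python
CODES = {ch: i + 1 for i, ch in enumerate("abcdefghij")}  # assumed slide codes

def rabin_karp_slides(text: str, pattern: str, base: int = 10) -> tuple[list[int], list[int]]:
    """Return (valid shifts, hash of every window) using the slides' base-10 rolling hash."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return [], []
    high = base ** (m - 1)          # weight of the leading letter in a window
    p_hash = t_hash = 0
    for i in range(m):              # hash the pattern and the first window
        p_hash = p_hash * base + CODES[pattern[i]]
        t_hash = t_hash * base + CODES[text[i]]
    shifts, hashes = [], []
    for s in range(n - m + 1):
        hashes.append(t_hash)
        # Equal hashes only make s a candidate; confirm letter by letter.
        if t_hash == p_hash and text[s:s + m] == pattern:
            shifts.append(s)
        if s < n - m:               # roll to the next window
            t_hash = (t_hash - CODES[text[s]] * high) * base + CODES[text[s + m]]
    return shifts, hashes

shifts, hashes = rabin_karp_slides("ccaccaaedba", "dba")
print(hashes)  # [331, 313, 133, 331, 311, 115, 154, 542, 421]
print(shifts)  # [8]
```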

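Rabin-Karp Algorithm • Preprocessing (hashing the pattern and the first window) takes Θ(m) time. • The expected matching time is O(n − m + 1) window comparisons, i.e. O(n + m) overall when spurious hits are rare. • The worst-case time is O((n − m + 1)m) = O(mn). This happens when many windows produce spurious (fake) hits, or valid matches, that must be verified letter by letter.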
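The slides use exact base-10 values. A common variant, used for example in the textbook's RABIN-KARP-MATCHER, keeps every hash reduced modulo a prime q so the numbers stay small. This reduction is what makes spurious hits possible. The sketch below is an illustration under assumptions: `rabin_karp_mod` and its `code` parameter are hypothetical names, and the digit example (T = 2359023141526739921, P = 31415, d = 10, q = 13) was chosen because it produces one spurious hit.

```python
from typing import Callable

def rabin_karp_mod(text: str, pattern: str, d: int, q: int,
                   code: Callable[[str], int] = ord) -> tuple[list[int], int]:
    """Rabin-Karp with hashes reduced mod q; return (valid shifts, number of spurious hits)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return [], 0
    h = pow(d, m - 1, q)                # d^(m-1) mod q, weight of the leading symbol
    p = t = 0
    for i in range(m):                  # preprocessing: Theta(m)
        p = (d * p + code(pattern[i])) % q
        t = (d * t + code(text[i])) % q
    shifts, spurious = [], 0
    for s in range(n - m + 1):
        if p == t:                      # hash hit: verify with Theta(m) comparisons
            if text[s:s + m] == pattern:
                shifts.append(s)
            else:
                spurious += 1           # collision: equal hashes, different strings
        if s < n - m:                   # rolling update; Python's % keeps t non-negative
            t = (d * (t - code(text[s]) * h) + code(text[s + m])) % q
    return shifts, spurious

print(rabin_karp_mod("2359023141526739921", "31415", d=10, q=13, code=int))
# ([6], 1): one valid match at shift 6 and one spurious hit (window 67399)
```

If every window is a hit (for example, text "aaaa…a" and pattern "aa…a"), each one triggers a Θ(m) check. That is the O((n − m + 1)m) worst case.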

Finite Automata • The string-matching automaton is very efficient: it examines each character in the text exactly once and reports all valid shifts in O(n) time. Basic idea: • State q means that the first q characters of the pattern have been matched (states 0 to m). • Each matching character sends the automaton to the next state. • If all characters of the pattern have been matched, the automaton enters the accepting state. • Otherwise, the automaton returns to a suitable earlier state, determined by the current state and the input character, so that it keeps the maximum advantage from the previous matching. • Matching takes O(n) time because each character is examined once.
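The scanning loop described above can be sketched in the spirit of the textbook's FINITE-AUTOMATON-MATCHER. Assumptions: the transition table is the one these slides give for P = a b a b a c a over {a, b, c}, and `TABLE`, `DELTA`, and `fa_match` are hypothetical names. The sketch also treats any character outside the alphabet as sending the automaton to state 0.

```python
# Transition table for P = ababaca over {a, b, c}, copied from the slides:
# row q lists delta(q, a), delta(q, b), delta(q, c).
TABLE = {0: (1, 0, 0), 1: (1, 2, 0), 2: (3, 0, 0), 3: (1, 4, 0),
         4: (5, 0, 0), 5: (1, 4, 6), 6: (7, 0, 0), 7: (1, 2, 0)}
DELTA = {(q, sym): nxt for q, row in TABLE.items() for sym, nxt in zip("abc", row)}

def fa_match(text: str, delta: dict[tuple[int, str], int], m: int) -> list[int]:
    """Scan text once with a string-matching automaton; return every valid shift."""
    q = 0  # start state: nothing matched yet
    shifts = []
    for i, ch in enumerate(text, start=1):  # i is the 1-based position of ch in text
        # One table lookup per character, so the scan is O(n).
        # Assumption: a symbol outside the alphabet cannot extend any prefix -> state 0.
        q = delta.get((q, ch), 0)
        if q == m:  # accepting state: the whole pattern ends at position i
            shifts.append(i - m)
    return shifts

print(fa_match("abababacaba", DELTA, m=7))  # [2]
```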

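Finite Automata • The transition function δ maps the Cartesian product Q × Σ (set of states × input alphabet) to Q, i.e. δ: Q × Σ → Q. • The mapping is represented by a transition table or a transition function. Transition table (fragment): columns are states, a, b; in state 0, input a leads to state 1 and input b leads to state 0.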

Finite Automata Text = a b a b a b a c a b a Pattern = a b a b a c a • The first step is to build the finite automaton for the given pattern. • Building it involves two concepts: • Prefix: a substring taken from the left end of the pattern, e.g. a, ab, aba, abab, ababa, ababac, … • Suffix: a substring taken from the right end of the pattern, e.g. a, ca, aca, baca, abaca, babaca, … • While building the automaton, it is important to note where a prefix matches a suffix.
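The prefix and suffix lists, and the "where a prefix matches a suffix" check, can be computed directly. This is a minimal sketch. `prefixes`, `suffixes`, and `sigma` are hypothetical names, and `sigma` follows the textbook's suffix function: the length of the longest prefix of the pattern that is also a suffix of a given string.

```python
def prefixes(p: str) -> list[str]:
    """Non-empty prefixes of p, shortest first."""
    return [p[:k] for k in range(1, len(p) + 1)]

def suffixes(p: str) -> list[str]:
    """Non-empty suffixes of p, shortest first."""
    return [p[-k:] for k in range(1, len(p) + 1)]

def sigma(pattern: str, x: str) -> int:
    """Length of the longest prefix of pattern that is also a suffix of x."""
    for k in range(min(len(pattern), len(x)), -1, -1):
        if x.endswith(pattern[:k]):  # pattern[:0] == "" always matches, so k = 0 is the fallback
            return k
    return 0  # not reached; kept so every path returns an int

print(prefixes("ababaca"))          # ['a', 'ab', 'aba', 'abab', 'ababa', 'ababac', 'ababaca']
print(suffixes("ababaca"))          # ['a', 'ca', 'aca', 'baca', 'abaca', 'babaca', 'ababaca']
print(sigma("ababaca", "ababab"))   # 4
```

For example, after reading "ababa" (state 5) followed by b, the longest prefix that is also a suffix of "ababab" is "abab", so the automaton moves to state 4.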

Finite Automata Text = a b a b a b a c a b a Pattern = a b a b a c a • The pattern has 7 letters, so the states run from 0 to 7. • Σ (sigma) is the finite input alphabet = {a, b, c}. • We check every symbol of Σ in each state. • For each letter, check which prefixes of the pattern are suffixes of the input read so far. • The length of the longest such prefix determines the next state number.
[Figure: automaton with initial state 0 and forward transitions 0 -a-> 1 -b-> 2 -a-> 3 -b-> 4 -a-> 5 -c-> 6 -a-> 7 (end state), plus back edges labelled a and b.]

Finite Automata • T = a b a b a b a c a b a P = a b a b a c a • While building the machine: • First check whether the input letter continues the match with the pattern. If it does, move to the next state. • If it does not, compare prefixes of the pattern with suffixes of the input read so far. • If a prefix matches a suffix, its number of letters determines the state number (use the longest such match). • If no prefix matches a suffix, ignore it and move on: the transition goes back to state 0. • After building the finite automaton, convert it into a transition table.
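The construction steps above can be sketched in the style of the textbook's straightforward COMPUTE-TRANSITION-FUNCTION. For each state q and symbol a, the sketch starts from the longest possible next state and shrinks it until that prefix is a suffix of P[1..q] followed by a. `transition_table` is a hypothetical name. This simple version runs in O(m³|Σ|) time, not the faster Θ(m|Σ|) construction. For P = a b a b a c a it reproduces the slides' transition table.

```python
def transition_table(pattern: str, alphabet: str) -> list[dict[str, int]]:
    """delta[q][a] = length of the longest prefix of pattern that is a suffix of pattern[:q] + a."""
    m = len(pattern)
    delta = []
    for q in range(m + 1):                # states 0..m
        row = {}
        for a in alphabet:
            k = min(m, q + 1)             # the next state can be at most q + 1
            # Shrink k until prefix pattern[:k] is a suffix of pattern[:q] + a (k = 0 always is).
            while not (pattern[:q] + a).endswith(pattern[:k]):
                k -= 1
            row[a] = k
        delta.append(row)
    return delta

table = transition_table("ababaca", "abc")
for q, row in enumerate(table):
    print(q, row["a"], row["b"], row["c"])  # first row printed: 0 1 0 0
```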

Finite Automata
[Figure: the automaton for P, states 0 to 7.]
• T = a b a b a b a c a b a • P = a b a b a c a
Transition Table (row q lists δ(q, a), δ(q, b), δ(q, c)):
State    a  b  c
0 (q0)   1  0  0
1 (q1)   1  2  0
2 (q2)   3  0  0
3 (q3)   1  4  0
4 (q4)   5  0  0
5 (q5)   1  4  6
6 (q6)   7  0  0
7 (q7)   1  2  0

Finite Automata • T = a b a b a b a c a b a (length n) • P = a b a b a c a (length m) • Feed the text T to the automaton one character at a time, looking up the next state in the transition table. • When the state equals the pattern length m after reading T[i], the pattern occurs with shift i − m (index i of T minus the pattern length).

Finite Automata • T = a b a b a b a c a b a (length n = 11) • P = a b a b a c a (length m = 7)
Trace (state after reading T[i]):
i       -  1  2  3  4  5  6  7  8  9  10  11
T[i]    -  a  b  a  b  a  b  a  c  a  b   a
state   0  1  2  3  4  5  4  5  6  7  2   3
• State 7 = m is reached at i = 9, so 9 − 7 = 2: the pattern occurs with shift 2.
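The trace table can be regenerated by recording the state after every character. This sketch assumes the slides' transition table for P = a b a b a c a, and `TABLE`, `COLUMN`, and `trace` are hypothetical names. The first entry of the state list is the start state 0, which matches the "-" column above.

```python
# Slide transition table: row q lists delta(q, a), delta(q, b), delta(q, c).
TABLE = {0: (1, 0, 0), 1: (1, 2, 0), 2: (3, 0, 0), 3: (1, 4, 0),
         4: (5, 0, 0), 5: (1, 4, 6), 6: (7, 0, 0), 7: (1, 2, 0)}
COLUMN = {"a": 0, "b": 1, "c": 2}

def trace(text: str, m: int) -> tuple[list[int], list[int]]:
    """Return the state after each character (states[0] is the start state) and the match shifts."""
    states, shifts = [0], []
    for i, ch in enumerate(text, start=1):
        states.append(TABLE[states[-1]][COLUMN[ch]])
        if states[-1] == m:
            shifts.append(i - m)  # e.g. i = 9, m = 7 gives shift 2
    return states, shifts

states, shifts = trace("abababacaba", 7)
print(states)  # [0, 1, 2, 3, 4, 5, 4, 5, 6, 7, 2, 3]
print(shifts)  # [2]
```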

Finite Automata • Preprocessing (building the transition table) can be done in Θ(m|Σ|) time, and matching takes Θ(n) time.

Home Assignment • What is the collision situation in the Rabin-Karp algorithm? • Consult the book for Exercise 32.3-1 (finite automata for string matching). • What is the big-O running time of string matching with finite automata?

References • Book: Introduction to Algorithms, 3rd edition, Chapter 32: String Matching • https://www.youtube.com/watch?v=qQ8vS2btsxI (checking for spurious hits) • https://www.youtube.com/watch?v=M_XpGQyyqIQ • http://cs.bc.edu/~alvarez/Algorithms/Notes/stringMatching2.html • http://web.cs.mun.ca/~wang/courses/cs6783-13f/n2-string-1.pdf • https://www.youtube.com/watch?v=-ZeP4KHibkU (finite automata machine)