Amihood Amir, Dina Sokol, Shoshana Neuburger. UWSL 2006
2-Dimensional Pattern Matching
Perform pattern matching on images. Examples: MRI, FAX.
Searching Aerial Photographs
Historic Two-Dimensional Model:
2D Pattern Matching - Example
Input: Text over alphabet {A, B}: A A B A B A B A A B B B A A A A B B A A B B B B A B
Pattern:
A B A
A A B
Output: {(1, 4), (2, 2), (4, 3)}
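To make the input/output convention of the example concrete, here is a minimal brute-force sketch. It assumes occurrences are reported as the 1-based (row, column) of the pattern's top-left corner, which the slide does not state explicitly. The function name `naive_match_2d` and the small text are hypothetical; only the pattern rows are taken from the slides.

```python
def naive_match_2d(text, pattern):
    """Brute-force baseline: try every top-left corner and compare row by row.

    Returns 1-based (row, col) pairs (an assumed convention).
    """
    n_rows, n_cols = len(text), len(text[0])
    m_rows, m_cols = len(pattern), len(pattern[0])
    hits = []
    for r in range(n_rows - m_rows + 1):
        for c in range(n_cols - m_cols + 1):
            if all(text[r + i][c:c + m_cols] == pattern[i] for i in range(m_rows)):
                hits.append((r + 1, c + 1))
    return hits


# Hypothetical text; the pattern rows are the ones used in the slides.
text = ["ABAAB",
        "AABAB",
        "ABABA",
        "AABAA"]
pattern = ["ABA",
           "AAB"]
print(naive_match_2d(text, pattern))  # [(1, 1), (3, 1)]
```

This baseline takes time proportional to text size times pattern size; the algorithms on the following slides avoid that blow-up.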
Bird-Baker Algorithm (1976)
- Time: linear for bounded fixed alphabets; an extra logarithmic factor for infinite alphabets.
- Technique: linearization.
Bird / Baker
- First linear-time 2D pattern matching algorithm.
- View each pattern row as a metacharacter to linearize the problem.
- Convert 2D pattern matching to 1D.
Linearization
- Concatenate rows of the text and use string matching tools.
- In this case: the Aho-Corasick algorithm for a dictionary of patterns.
- The dictionary consists of all pattern rows.
Find all pattern rows… then align them.
Bird / Baker
Preprocess pattern:
- Name rows of pattern using AC automaton.
- Using names, pattern has a 1D representation.
- Construct KMP automaton of pattern.
Identical rows receive identical names.
Bird / Baker - Example
Preprocess pattern:
- Name rows of pattern using AC automaton.
- Using names, pattern has a 1D representation.
- Construct KMP automaton of pattern.
A B A → 1
A A B → 2
Bird / Baker
Scan text:
- Name positions of text that match a row of pattern, using AC automaton within each row.
- Run KMP on named columns of text.
Since the 1D names are unique, only one name can be given to a text location.
Bird / Baker - Example
Scan text:
- Name positions of text that match a row of pattern, using AC automaton within each row.
- Run KMP on named columns of text.
Text symbols and names from the figure: A A B A A 0 0 2 1 0 B A B A B 0 0 0 1 0 A A B B 0 0 2 1 0 2 0 B A A A 0 0 0 2 1 0 0 A B A A A 0 0 1 0 0 B B A A B 0 0 2 1 0 B B B A B 0 0 0 1 0
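The preprocessing and scanning steps above can be sketched as follows. This is an illustrative sketch, not the original authors' code. It assumes all pattern rows have the same width, so only one row can end at any text position and no output-link traversal is needed in the automaton. It also assumes name 0 means "no pattern row ends here" and 1-based top-left reporting. All function names are hypothetical.

```python
from collections import deque


def build_ac(rows):
    """Aho-Corasick automaton over the distinct pattern rows (equal width assumed)."""
    goto, fail, name = [{}], [0], [0]
    names = {}                              # row string -> name (1, 2, ...)
    for row in rows:
        if row in names:
            continue                        # identical rows receive identical names
        names[row] = len(names) + 1
        s = 0
        for ch in row:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                name.append(0)
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        name[s] = names[row]
    queue = deque(goto[0].values())         # breadth-first failure links
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            queue.append(t)
    return goto, fail, name, names


def kmp_failure(p):
    """Standard KMP failure function for a sequence p."""
    f, k = [0] * len(p), 0
    for i in range(1, len(p)):
        while k and p[i] != p[k]:
            k = f[k - 1]
        if p[i] == p[k]:
            k += 1
        f[i] = k
    return f


def bird_baker(text, pattern):
    goto, fail, name, names = build_ac(pattern)
    p1d = [names[row] for row in pattern]   # 1D representation of the pattern
    f = kmp_failure(p1d)
    m_rows, m_cols = len(pattern), len(pattern[0])
    kmp_state = [0] * len(text[0])          # one KMP state per text column
    hits = []
    for r, row in enumerate(text):
        s = 0
        for c, ch in enumerate(row):
            while s and ch not in goto[s]:  # AC step within the row
                s = fail[s]
            s = goto[s].get(ch, 0)
            x = name[s]                     # name of the row ending here, or 0
            k = kmp_state[c]
            while k and x != p1d[k]:        # KMP step down column c
                k = f[k - 1]
            if x == p1d[k]:
                k += 1
            if k == m_rows:
                hits.append((r - m_rows + 2, c - m_cols + 2))  # 1-based top-left
                k = f[k - 1]
            kmp_state[c] = k
    return hits


# Hypothetical text; pattern rows as in the slides (A B A -> 1, A A B -> 2).
print(bird_baker(["ABAAB", "AABAB", "ABABA", "AABAA"], ["ABA", "AAB"]))  # [(1, 1), (3, 1)]
```

Keeping one KMP state per column lets the text be scanned once in row-major order, so the named text never has to be stored in full.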
Another linearization: pad each pattern row (width m) with n-m "don't cares".
Time: that of Fischer-Paterson (1972) matching with "don't cares".
Witnesses
Popular paradigm in pattern matching:
1. Find consistent candidates.
2. Verify candidates.
Consistent candidates → verification is linear.
Dueling Algorithm
Data Structure
- List of potential candidates
- R = rightmost element of that list
- N = new element
Case 1: N dies.
Case 2: R dies.
Case 3: no one dies; add N to the list of consistent candidates.
- Since N is consistent with R, and R is consistent with the rest of the list,
- by transitivity, N is consistent with the list.
Witnesses
- Vishkin introduced the duel to choose between two candidates by checking the value of a witness.
- Alphabet-independent method.
Dueling Paradigm [Vishkin 1985]
A duel chooses between two possible candidates by checking the value of a "witness."
Figure: P aligned at text positions i and j; at the witness location the two copies expect different symbols (a vs. b).
Witness Table
- A witness table is a table of size |P|, which stores a location of a conflict for each location of P (with respect to the left candidate).
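The candidate list, the three duel cases, and the witness table can be sketched together for the 1D case. This is a sketch under stated assumptions: the witness table is built naively in O(|P|^2) time rather than by a linear-time preprocessing, positions are 0-based, and the function names are hypothetical. The duel also covers the situation where the text symbol disagrees with both candidates, in which case both are dropped.

```python
def witness_table(p):
    """wit[d] = index w with p[w] != p[w - d], or None if shift d has no conflict.

    Naive quadratic construction, for illustration only.
    """
    m = len(p)
    wit = [None] * m
    for d in range(1, m):
        for w in range(d, m):
            if p[w] != p[w - d]:
                wit[d] = w
                break
    return wit


def duel_candidates(t, p):
    """Phase 1: keep a list of pairwise consistent candidate starts."""
    m, wit = len(p), witness_table(p)
    survivors = []
    for new in range(len(t) - m + 1):       # N = new element
        alive = True
        while survivors and alive:
            right = survivors[-1]           # R = rightmost element of the list
            d = new - right
            if d >= m or wit[d] is None:
                break                       # Case 3: no conflict, no one dies
            w = wit[d]
            ch = t[right + w]               # a single text check settles the duel
            if ch == p[w]:
                alive = False               # Case 1: N dies
            else:
                survivors.pop()             # Case 2: R dies
                if ch != p[w - d]:
                    alive = False           # text disagrees with both candidates
        if alive:
            survivors.append(new)
    return survivors


def verify(t, p, candidates):
    """Phase 2: compare each text position once, then drop candidates covering a mismatch."""
    m, n = len(p), len(t)
    bad = [0] * (n + 1)                     # bad[i] = mismatches among t[0:i]
    idx = 0
    for pos in range(n):
        while idx + 1 < len(candidates) and candidates[idx + 1] <= pos:
            idx += 1                        # last candidate starting at or before pos
        mismatch = 0
        if candidates and candidates[idx] <= pos < candidates[idx] + m:
            mismatch = int(t[pos] != p[pos - candidates[idx]])
        bad[pos + 1] = bad[pos] + mismatch
    return [c for c in candidates if bad[c + m] == bad[c]]


def dueling_match(t, p):
    return verify(t, p, duel_candidates(t, p))


print(dueling_match("abaababaab", "abaab"))  # [0, 5]
```

Because surviving candidates agree on every overlap, checking a text position against any one candidate that covers it is enough, which is what keeps verification linear.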
Dueling Method in 2D
How do we arrange for candidates to agree on overlap? Duel!
When there is a conflict between two candidates, a single text check eliminates at least one candidate. The text location can be precomputed because of transitivity. The dueling phase is thus linear time.
A duel in 2 dimensions
Witness[3, 3] = (4, 3)
Figure: pattern over {a, b} with rows and columns indexed 1 to 4.
2-D Witness Table
- A 2-D witness table is a table of size m^2, storing a witness for each location of P.
Witness table entries from the figure: * * (4, 3) (4, 1) * (4, 3) (4, 2) (4, 3) *
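As an illustration of the data structure, a 2-D witness table can be filled in naively. This sketch makes several assumptions the slides do not spell out: indices are 0-based (the slides use 1-based), only shifts down and to the right are covered, `None` plays the role of the "*" entries (no conflict for that shift), and the construction is brute force rather than the preprocessing the linear-time result relies on. The function name is hypothetical.

```python
def witness_table_2d(p):
    """wit[i][j] = (r, c) with p[r][c] != p[r - i][c - j], or None if no conflict.

    Entry (i, j) describes the pattern overlapped with a copy of itself
    shifted i rows down and j columns right. Brute force, for illustration only.
    """
    rows, cols = len(p), len(p[0])
    wit = [[None] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if i == 0 and j == 0:
                continue                    # a pattern never conflicts with itself
            wit[i][j] = next(
                ((r, c)
                 for r in range(i, rows)
                 for c in range(j, cols)
                 if p[r][c] != p[r - i][c - j]),
                None,
            )
    return wit


# Hypothetical 2 x 2 pattern.
print(witness_table_2d(["ab", "ba"]))  # [[None, (0, 1)], [(1, 0), None]]
```

A duel between two candidates at offset (i, j) then reads the single text symbol under `wit[i][j]`, exactly as in the 1D case.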
2D Witnesses
- Amir et al.: a 2D witness table can be used for linear time and space alphabet-independent 2D matching.
- The order of duels is significant. It is done in 2 waves:
- 1: duel within each column, bottom to top.
- 2: duel between columns, from right to left.
First Truly 2D Algorithm: The Dueling Method (Amir-Benson-Farach 1991)
Once duels are over, the situation is: all potential pattern "starts" agree on overlap, i.e., all want to see the same symbol in every text location.
Verification
- Do a forward wave down the columns to label starts of pattern rows.
- Do a forward wave on each row, beginning anew for each new row. Label positions of mismatch.
- Kill all candidates that contain a mismatch (using 2 similar backward waves).
Dueling Method …
- Time for checking every text element's correctness: linear.
- Every candidate with an incorrect element in its range is eliminated.
- Method: the "wave".
- Total time: linear.
2D Dictionary Matching
- Suppose we are given a set of 2D patterns, called a dictionary.
- Goal: search for all patterns in the text simultaneously, in linear time.
- Bird/Baker can be extended if all patterns have uniform width. (How?)