Less Than Matching
Orgad Keller, modified by Ariel Rosenfeld
Not Exact-Matching
- Parametrized matching
  int i=1; int j=2; int x = i * j;
  int firstVar=1; int secondVar=2; int result = firstVar * secondVar;
Algorithms 2
Not Exact-Matching
- Order-preserving matching
Less Than Matching
- Input: a text $T = t_0 \dots t_{n-1}$ and a pattern $P = p_0 \dots p_{m-1}$ over an alphabet $\Sigma$ with an order relation $\le$.
- Output: all locations $i$ where $p_j \le t_{i+j}$ for all $j = 0, \dots, m-1$.
- Can we use the regular methods?
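
As a baseline, the definition can be checked directly in $O(nm)$ time. The sketch below is only a reference implementation; the function name and the use of 0-indexed Python lists are assumptions made for illustration.

```python
def less_than_match_naive(text, pattern):
    """Return every i such that pattern[j] <= text[i + j] for all j."""
    n, m = len(text), len(pattern)
    return [
        i
        for i in range(n - m + 1)
        if all(pattern[j] <= text[i + j] for j in range(m))
    ]


# Pattern [2, 4] fits "under" the text at positions 0, 2 and 3.
print(less_than_match_naive([3, 5, 2, 7, 4], [2, 4]))  # [0, 2, 3]
```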
Transitivity
- Less Than Matching is in fact transitive, but that is not enough for us: $a \le c$ and $b \le c$ does not imply anything about the relation between $a$ and $b$.
Approach
- A good approach for solving Pattern Matching problems is sometimes solving, in that order:
  - The problem for a binary alphabet.
  - The problem for a bounded alphabet.
  - The problem for an unbounded alphabet.
Binary Alphabet
- The only case that prevents a match at location $i$ is the case where, for some $j$: $p_j = 1$ and $t_{i+j} = 0$.
- This is equivalent to: $p_j \cdot \overline{t_{i+j}} = 1$.
- So how can we solve this case?
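Binary Alphabet
- So if $\sum_{j=0}^{m-1} p_j \cdot \overline{t_{i+j}} > 0$, there is no match at $i$.
  - We can calculate $\overline{T}$ (the bitwise complement of $T$).
  - Then we’ll calculate $\overline{T} \otimes P^R$ ($P$ reverse) using FFT.
  - We’ll return all locations $i$ where the result is $0$.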
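
The binary algorithm can be sketched in pure Python as follows. This is a minimal illustration, not the slides' own code: the helper names (`fft`, `convolve`, `binary_less_than_match`) are hypothetical, and a small recursive radix-2 FFT built on the standard `cmath` module stands in for a production FFT library.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def convolve(x, y):
    """Integer convolution of two integer sequences via FFT."""
    out_len = len(x) + len(y) - 1
    size = 1
    while size < out_len:
        size *= 2
    fx = fft([complex(v) for v in x] + [0j] * (size - len(x)))
    fy = fft([complex(v) for v in y] + [0j] * (size - len(y)))
    prod = fft([p * q for p, q in zip(fx, fy)], invert=True)
    # The unnormalised inverse transform needs a division by `size`.
    return [round(v.real / size) for v in prod[:out_len]]


def binary_less_than_match(text, pattern):
    """Locations i with no j such that pattern[j] == 1 and text[i + j] == 0."""
    n, m = len(text), len(pattern)
    if not 1 <= m <= n:
        return []
    t_bar = [1 - t for t in text]            # complement of the text
    conv = convolve(t_bar, pattern[::-1])    # convolution with reversed pattern
    # conv[i + m - 1] == sum_j pattern[j] * t_bar[i + j], the mismatch count at i
    return [i for i in range(n - m + 1) if conv[i + m - 1] == 0]


print(binary_less_than_match([0, 1, 0, 1, 0, 0, 1, 1, 1, 0], [0, 1, 0, 1]))  # [0, 5]
```

One FFT over the whole text costs $O(n \log n)$; running the same routine on overlapping segments of length $2m$ brings this down to $O(n \log m)$.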
Example
- $P = 0101$, $T = 0101001110$
What just happened?
- $\overline{T} = 1010110001$
- $P^R = 1010$
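
For $P = 0101$ and $T = 0101001110$, the computation can be worked through by hand. The block below assumes 0-indexed positions and writes $c_i$ (a symbol introduced here for illustration) for the number of mismatches at location $i$; since only $p_1$ and $p_3$ equal 1, each $c_i$ has two terms.

```latex
\begin{align*}
\overline{T} &= 1010110001 \\
c_i &= \sum_{j=0}^{3} p_j \cdot \overline{t_{i+j}}
     = \overline{t_{i+1}} + \overline{t_{i+3}} \\
(c_0, c_1, \dots, c_6) &= (0, 2, 1, 1, 1, 0, 1)
\end{align*}
```

The zeros are at $i = 0$ and $i = 5$, so these are the only locations where $P$ (less than) matches $T$.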
Complexity
- Time: $O(n \log m)$
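Bounded Alphabet
- We need reductions to the binary alphabet.
- For each $\sigma \in \Sigma$ we’ll define:
  - $T_\sigma[i] = 1$ if $t_i \ge \sigma$, and $0$ otherwise.
  - $P_\sigma[j] = 1$ if $p_j = \sigma$, and $0$ otherwise.
- We notice that $T_\sigma$, $P_\sigma$ are binary.
Bounded Alphabet
- Theorem: $P$ (less than) matches $T$ at location $i$ if and only if, for all $\sigma \in \Sigma$, $P_\sigma$ (less than) matches $T_\sigma$ at location $i$.
- Proof: $P$ does not match at $i$ iff there is a $j$ such that $p_j > t_{i+j}$. That is true iff there are $\sigma$ and $j$ such that $P_\sigma[j] = 1$ and $T_\sigma[i+j] = 0$ (take $\sigma = p_j$), meaning that $P_\sigma$ does not (less than) match $T_\sigma$ at location $i$.
Bounded Alphabet
- So for each $\sigma \in \Sigma$, we’ll run the binary alphabet algorithm on $T_\sigma$, $P_\sigma$.
- We’ll return only the locations that matched in all iterations.
- Time: $O(|\Sigma| \, n \log m)$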
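
The reduction can be sketched as below. Two assumptions are made for illustration: the binary strings are taken to be $T_\sigma[i] = 1$ iff $t_i \ge \sigma$ and $P_\sigma[j] = 1$ iff $p_j = \sigma$, and a naive mismatch counter (`binary_mismatches`, a hypothetical helper) stands in for the FFT-based binary algorithm to keep the example short.

```python
def binary_mismatches(text_bits, pattern_bits):
    """For every alignment i, count the j with pattern bit 1 over text bit 0.

    Naive stand-in for the FFT-based correlation.
    """
    n, m = len(text_bits), len(pattern_bits)
    return [
        sum(pattern_bits[j] * (1 - text_bits[i + j]) for j in range(m))
        for i in range(n - m + 1)
    ]


def bounded_less_than_match(text, pattern):
    """Less-than matching by one binary run per pattern symbol."""
    n, m = len(text), len(pattern)
    alive = [True] * max(n - m + 1, 0)
    # Symbols absent from the pattern give an all-zero P_sigma, which
    # matches everywhere, so only the pattern's symbols need a run.
    for sigma in set(pattern):
        t_sigma = [1 if t >= sigma else 0 for t in text]
        p_sigma = [1 if p == sigma else 0 for p in pattern]
        for i, count in enumerate(binary_mismatches(t_sigma, p_sigma)):
            if count:
                alive[i] = False
    # Keep only the locations that matched in all iterations.
    return [i for i, ok in enumerate(alive) if ok]


print(bounded_less_than_match([3, 5, 2, 7, 4], [2, 4]))  # [0, 2, 3]
```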
Problem
- Can be worse than the naïve $O(nm)$ algorithm when $|\Sigma|$ is large.
- What about unbounded alphabet?
- We present an improvement on the next slides.
The Trick
- We’ll split the text into $O(n/m)$ overlapping segments of size $2m$, each starting $m$ positions after the previous one.
  - So every match in the text must appear in whole in one of the segments.
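
A minimal sketch of the splitting trick, assuming segments of length $2m$ that start every $m$ positions (the last one may be shorter). The names `overlapping_segments` and `match_by_segments` are hypothetical, and `matcher` is any function that returns match locations for a text and a pattern.

```python
def overlapping_segments(n, m):
    """Half-open (start, end) ranges of length at most 2m, stepping by m."""
    if m <= 0 or m > n:
        return []
    return [(s, min(s + 2 * m, n)) for s in range(0, n - m + 1, m)]


def match_by_segments(text, pattern, matcher):
    """Run `matcher` on each segment and merge the shifted results."""
    found = set()
    for start, end in overlapping_segments(len(text), len(pattern)):
        found.update(start + i for i in matcher(text[start:end], pattern))
    return sorted(found)


print(overlapping_segments(10, 4))  # [(0, 8), (4, 10)]
```

A match starting at $i$ lies entirely inside the segment that starts at $m \lfloor i/m \rfloor$, which is why no match is lost.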
Abrahamson-Kosaraju Method
- First, use the segment splitting trick. Therefore we can assume $n = 2m$.
- For each location $i$ in the text, we’ll produce a triplet $\langle \sigma, i, \mathrm{T} \rangle$, where $\sigma = t_i$.
- For each location $j$ in the pattern, we’ll produce a triplet $\langle \sigma, j, \mathrm{P} \rangle$, where $\sigma = p_j$.
- We now have $3m$ triplets all together.
Abrahamson-Kosaraju Method
- We’ll hold all triplets together.
- Sort all triplets according to symbol.
- We’ll define a symbol that has more than $\sqrt{m}$ triplets as a “frequent symbol”.
- There are $O(\sqrt{m})$ frequent symbols.
- Put all frequent symbols’ triplets aside.
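
The triplet construction and the frequent-symbol split might look like the sketch below. It assumes triplets of the form (symbol, location, "T" or "P") and a frequency threshold of $\sqrt{m}$; the function names are hypothetical.

```python
from collections import Counter
from math import isqrt


def build_triplets(text, pattern):
    """One (symbol, location, kind) triplet per text and pattern position."""
    triplets = [(sym, i, "T") for i, sym in enumerate(text)]
    triplets += [(sym, j, "P") for j, sym in enumerate(pattern)]
    triplets.sort(key=lambda tr: tr[0])  # sort according to symbol
    return triplets


def split_frequent(triplets, m):
    """Separate triplets of symbols that occur more than sqrt(m) times."""
    # For an integer count c, c > sqrt(m) holds exactly when c > isqrt(m).
    threshold = isqrt(m)
    counts = Counter(sym for sym, _, _ in triplets)
    frequent = {sym for sym, c in counts.items() if c > threshold}
    set_aside = [tr for tr in triplets if tr[0] in frequent]
    rest = [tr for tr in triplets if tr[0] not in frequent]
    return frequent, set_aside, rest


text = [5, 1, 5, 2, 5, 3, 5, 4]
pattern = [5, 1, 2, 3]
frequent, set_aside, rest = split_frequent(build_triplets(text, pattern), len(pattern))
print(frequent)  # {5}
```

Since there are $3m$ triplets and each frequent symbol owns more than $\sqrt{m}$ of them, at most $3\sqrt{m}$ symbols can be frequent.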
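Abrahamson-Kosaraju Method
- Divide the remaining (non-frequent) triplets, in sorted order, into $O(\sqrt{m})$ groups of $O(\sqrt{m})$ triplets each, so that all triplets of the same symbol fall in the same group.
- For each such group, choose the symbol of the first triplet in the group as the group’s representative.
- For instance, in the previous example, group 1’s representative is [...] and group 2’s representative is [...].
- There are $O(\sqrt{m})$ representatives all together.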
Abrahamson-Kosaraju Method
- To sum up:
  - $O(\sqrt{m})$ frequent symbols.
  - $O(\sqrt{m})$ representatives of non-frequent symbols.
- We’ll swap each non-frequent symbol in the pattern and text with its representative.
- Now our text and pattern are over an $O(\sqrt{m})$-sized alphabet.
Abrahamson-Kosaraju Method
- We can now run our bounded alphabet algorithm over the new text and pattern in $O(m \sqrt{m} \log m)$.
- But we still haven’t handled comparisons between two non-frequent symbols that are in the same group.
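Abrahamson-Kosaraju Method
- We’ll do so naively in each group:
  - For each triplet $\langle \sigma, i, \mathrm{T} \rangle$ in the group:
    - For each triplet of the form $\langle \sigma', j, \mathrm{P} \rangle$ in the group, if $\sigma' > \sigma$, then add an error at location $i - j$.
- Time: $O(\sqrt{m} \cdot (\sqrt{m})^2) = O(m \sqrt{m})$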
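
The naive in-group correction can be sketched as follows, assuming each group is a list of (symbol, location, "T" or "P") triplets and that `errors` (a hypothetical name) is a per-location mismatch counter of length $n - m + 1$. Locations outside that range are skipped because the pattern cannot be aligned there.

```python
def add_in_group_errors(group, errors):
    """Add an error at i - j for every pattern symbol exceeding a text symbol."""
    text_side = [(sym, i) for sym, i, kind in group if kind == "T"]
    pattern_side = [(sym, j) for sym, j, kind in group if kind == "P"]
    for t_sym, i in text_side:
        for p_sym, j in pattern_side:
            location = i - j
            if p_sym > t_sym and 0 <= location < len(errors):
                errors[location] += 1


group = [(1, 0, "T"), (2, 1, "P"), (2, 3, "T"), (3, 0, "P")]
errors = [0, 0, 0, 0]
add_in_group_errors(group, errors)
print(errors)  # [1, 0, 0, 1]
```

Each group holds $O(\sqrt{m})$ triplets, so the double loop costs $O(m)$ per group.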
Running Time
- For one segment:
  - Sorting the triplets and representatives: $O(m \log m)$.
  - Running the algorithm: $O(m \sqrt{m} \log m)$.
  - Correcting results (adding in-group errors): $O(m \sqrt{m})$.
- Overall for one segment: $O(m \sqrt{m} \log m)$.
- Overall for all $O(n/m)$ segments: $O(n \sqrt{m} \log m)$.
Running Time
- We can improve to $O(n \sqrt{m \log m})$.
  - Left as an exercise.