Less Than Matching Orgad Keller

Less Than Matching
  • Input: A text $T = t_0 \cdots t_{n-1}$ and a pattern $P = p_0 \cdots p_{m-1}$ over an alphabet $\Sigma$ with an order relation.
  • Output: All locations $i$ where $p_j \le t_{i+j}$ for every $j = 0, \ldots, m-1$.
  • Can we use the regular methods?
Orgad Keller - Algorithms 2 - Recitation 12
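As a baseline for the question above, here is a minimal sketch of the naive $O(nm)$ check. It assumes that a match at location $i$ means every pattern symbol is no greater than the text symbol aligned with it; the function name is illustrative and not taken from the slides.

```python
def less_than_matches_naive(text, pattern):
    """Return every location i where pattern[j] <= text[i + j] for all j.

    Assumption: "less than" is read as "no greater than" (<=).
    """
    n, m = len(text), len(pattern)
    return [
        i
        for i in range(n - m + 1)
        # One violated comparison is enough to rule out location i.
        if all(pattern[j] <= text[i + j] for j in range(m))
    ]
```

Any faster method discussed later should return the same locations as this routine, which makes it a convenient reference for testing.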

Transitivity
  • Less Than Matching is in fact transitive, but that is not enough for us: $a \le c$ and $b \le c$ does not imply anything about the relation between $a$ and $b$.

Approach
  • A good approach for solving Pattern Matching problems is sometimes to solve, in this order:
      ◦ The problem for a binary alphabet.
      ◦ The problem for a bounded alphabet.
      ◦ The problem for an unbounded alphabet.

Binary Alphabet
  • The only case that prevents a match at location $i$ is the case where, for some $j$: $p_j = 1$ and $t_{i+j} = 0$.
  • This is equivalent to: $p_j \cdot \overline{t_{i+j}} = 1$, where $\overline{t}$ denotes the complement of the bit $t$.
  • So how can we solve this case?

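Binary Alphabet
  • So if $\sum_{j=0}^{m-1} p_j \cdot \overline{t_{i+j}} > 0$, where $\overline{t}$ is the complement of the bit $t$, there is no match at $i$.
      ◦ We can calculate $\overline{T}$.
      ◦ Then we'll calculate the correlation $P \otimes \overline{T}$, whose entry $i$ is the sum above.
      ◦ We'll return all locations $i$ where $(P \otimes \overline{T})[i] = 0$.
  • Time: $O(n \log m)$ using FFT.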
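The binary case can be sketched as follows: complement the text, correlate it with the pattern, and keep the locations whose count is zero. This is an illustration under assumptions, not the slides' own code: the helper names are hypothetical, the FFT is a plain recursive radix-2 version written with the standard library, and a single transform over the whole text costs $O(n \log n)$ (reaching $O(n \log m)$ needs the text to be cut into segments of length $2m$ first).

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def correlate(pattern, text):
    """Return c with c[i] = sum_j pattern[j] * text[i + j], 0 <= i <= n - m."""
    n, m = len(text), len(pattern)
    size = 1
    while size < n + m:
        size *= 2
    a = fft([complex(x) for x in text] + [0j] * (size - n))
    b = fft([complex(x) for x in reversed(pattern)] + [0j] * (size - m))
    prod = fft([x * y for x, y in zip(a, b)], invert=True)
    # Convolving with the reversed pattern puts c[i] at index m - 1 + i.
    # The inverse transform above is unnormalised, hence the division by size.
    return [round(prod[m - 1 + i].real / size) for i in range(n - m + 1)]


def binary_less_than_matches(text, pattern):
    """Locations where no pattern 1 is aligned with a text 0."""
    complemented = [1 - t for t in text]
    blocked = correlate(pattern, complemented)
    return [i for i, count in enumerate(blocked) if count == 0]
```

The correlation counts, for each location, how many pattern 1s sit over text 0s, so the same routine also reports the number of mismatches and not only whether one exists.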

Bounded Alphabet
  • We need reductions to binary alphabet.
  • For each $\sigma \in \Sigma$ we'll define a text $T_\sigma$ and a pattern $P_\sigma$.
  • We notice $T_\sigma, P_\sigma$ are binary.

Bounded Alphabet
  • Theorem: $P$ (less than) matches $T$ at location $i$ if and only if, for every $\sigma \in \Sigma$, the binary pattern $P_\sigma$ (less than) matches the binary text $T_\sigma$ at location $i$.
  • Proof: $P$ does not match $T$ at $i$ iff $p_j > t_{i+j}$ for some $j$. That is true iff, for some $\sigma$, $P_\sigma[j] = 1$ and $T_\sigma[i+j] = 0$, meaning that $P_\sigma$ does not (less than) match $T_\sigma$ at location $i$.

Bounded Alphabet
  • So for each $\sigma \in \Sigma$, we'll run the binary alphabet algorithm on $T_\sigma, P_\sigma$.
  • We'll return only the locations that matched in all iterations.
  • Time: $O(|\Sigma| \, n \log m)$.
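The reduction can be sketched as below. The slide's definition of the binary strings is not reproduced here, so the sketch assumes one choice that satisfies the theorem: a threshold indicator that maps a symbol to 1 when it is at least $\sigma$ and to 0 otherwise. The binary routine is a short quadratic stand-in for the FFT-based one, and all names are illustrative.

```python
def binary_matches(text_bits, pattern_bits):
    """Binary less-than matching: a location fails only on pattern 1 over text 0.

    Quadratic stand-in for an FFT-based routine, kept short for illustration.
    """
    n, m = len(text_bits), len(pattern_bits)
    return {
        i
        for i in range(n - m + 1)
        if not any(pattern_bits[j] == 1 and text_bits[i + j] == 0 for j in range(m))
    }


def bounded_less_than_matches(text, pattern):
    """Intersect the binary answers over every symbol of the alphabet."""
    surviving = set(range(len(text) - len(pattern) + 1))
    alphabet = set(text) | set(pattern)  # assumption: symbols actually in use
    for sigma in sorted(alphabet):
        # Assumed threshold indicators: 1 when the symbol is at least sigma.
        text_bits = [1 if t >= sigma else 0 for t in text]
        pattern_bits = [1 if p >= sigma else 0 for p in pattern]
        surviving &= binary_matches(text_bits, pattern_bits)
    return sorted(surviving)
```

With this choice, a pattern 1 over a text 0 for threshold $\sigma$ means $p_j \ge \sigma > t_{i+j}$, which is exactly a violated comparison in the original strings.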

Unbounded Alphabet
  • Running the bounded alphabet algorithm could result in an $O(nm \log m)$ time algorithm (we'll run it only for alphabet symbols which actually appear in the pattern).
  • This can be worse than the naïve algorithm.
  • We present an improvement on the next slides.

Abrahamson-Kosaraju Method
  • First, use the segment splitting trick. Therefore we can assume $n = 2m$.
  • For each location $i$ in the text, we'll produce a triplet $(\sigma, i, T)$, where $\sigma = t_i$.
  • For each location $j$ in the pattern, we'll produce a triplet $(\sigma, j, P)$, where $\sigma = p_j$.
  • We now have $3m$ triplets all together.
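A minimal sketch of these two preparation steps, under assumptions: segments are taken as windows of length $2m$ starting at every multiple of $m$ (so every possible match start lies in some window), and a triplet is stored as a `(symbol, position, source)` tuple with the source tagged `"T"` or `"P"`. The names and the tuple layout are illustrative.

```python
def split_into_segments(text, m):
    """Yield (offset, segment): windows of length 2m starting every m symbols.

    The last window may be shorter than 2m when the text runs out.
    """
    for offset in range(0, len(text) - m + 1, m):
        yield offset, text[offset:offset + 2 * m]


def build_triplets(segment, pattern):
    """One triplet per text location and one per pattern location."""
    text_triplets = [(symbol, i, "T") for i, symbol in enumerate(segment)]
    pattern_triplets = [(symbol, j, "P") for j, symbol in enumerate(pattern)]
    return text_triplets + pattern_triplets
```

A match found at location $i$ inside the window at `offset` corresponds to location `offset + i` in the full text; starts that appear in two overlapping windows should be reported once.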

Abrahamson-Kosaraju Method
  • We'll hold all triplets together.
  • Sort all triplets according to symbol.
  • We'll define a symbol that has more than $\sqrt{m}$ triplets as a "frequent symbol".
  • There are $O(\sqrt{m})$ frequent symbols.
  • Put all frequent symbols' triplets aside.
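The sorting and the frequent/non-frequent separation can be sketched as follows, assuming triplets are `(symbol, position, source)` tuples and that the frequency threshold is $\sqrt{m}$. The function name is hypothetical.

```python
import math
from collections import Counter


def split_by_frequency(triplets, m):
    """Sort triplets by symbol and set the frequent symbols' triplets aside."""
    # For integer counts, "more than sqrt(m)" equals "more than isqrt(m)".
    threshold = math.isqrt(m)
    counts = Counter(symbol for symbol, _, _ in triplets)
    frequent = {s for s, c in counts.items() if c > threshold}
    ordered = sorted(triplets, key=lambda triplet: triplet[0])
    frequent_triplets = [t for t in ordered if t[0] in frequent]
    rare_triplets = [t for t in ordered if t[0] not in frequent]
    return frequent, frequent_triplets, rare_triplets
```

Since there are $3m$ triplets and each frequent symbol owns more than $\sqrt{m}$ of them, at most $3\sqrt{m}$ symbols can be frequent.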

Abrahamson-Kosaraju Method
  • Split non-frequent symbols' triplets into groups of size $\Theta(\sqrt{m})$ in the following manner:
[Figure: example of the sorted triplets divided into groups]

Abrahamson-Kosaraju Method
  • The rule is that there can't be two triplets of the same symbol in different groups.

Abrahamson-Kosaraju Method
  • For each such group, choose the symbol of the first triplet in the group as the group's representative.
  • For instance, in the previous example, group 1 and group 2 are each represented by the symbol of their first triplet.
  • There are $O(\sqrt{m})$ representatives all together.

Abrahamson-Kosaraju Method
  • To sum up:
      ◦ $O(\sqrt{m})$ frequent symbols.
      ◦ $O(\sqrt{m})$ representatives of non-frequent symbols.
  • We'll swap each non-frequent symbol in pattern and text with its representative.
  • Now our text and pattern are over an $O(\sqrt{m})$-sized alphabet.

Abrahamson-Kosaraju Method
  • We want to run our algorithm over the new text and pattern to count the mismatches between symbols of different groups.
  • But we have a problem:
      ◦ Let's say $\sigma$ is a frequent symbol, but it falls between the symbols of one group:
[Figure: a frequent symbol lying inside the symbol range of group 2]

Abrahamson-Kosaraju Method
  • The representative of group 2 is a symbol $a$, which is smaller than the frequent symbol $\sigma$, but the group also contains a symbol $b$ which is greater than $\sigma$.

Abrahamson-Kosaraju Method
  • In that case we'll split group 2 into two groups with their own representatives.
  • Since we performed at most $O(\sqrt{m})$ such splits, we still have $O(\sqrt{m})$ representatives.
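The grouping, the choice of representatives and the extra split around frequent symbols can be sketched in one pass over the sorted symbols. This is an illustration under assumptions: triplets are `(symbol, position, source)` tuples, the threshold is $\sqrt{m}$, and a group is closed once it holds at least $\sqrt{m}$ triplets, which keeps every group below $2\sqrt{m}$ triplets. The names are hypothetical.

```python
import math
from collections import Counter


def choose_representatives(triplets, m):
    """Map every non-frequent symbol to the representative of its group."""
    threshold = math.isqrt(m)
    counts = Counter(symbol for symbol, _, _ in triplets)
    representative = {}
    current, size = None, 0
    for symbol in sorted(counts):
        if counts[symbol] > threshold:
            # Frequent symbol: close the open group so no group straddles it.
            current = None
            continue
        if current is None or size >= threshold:
            # Open a new group; its first (smallest) symbol represents it.
            current, size = symbol, 0
        representative[symbol] = current
        size += counts[symbol]
    return representative


def relabel(symbols, representative):
    """Swap each non-frequent symbol with its representative."""
    return [representative.get(s, s) for s in symbols]
```

Because symbols are processed whole, two triplets of the same symbol never land in different groups, and because groups are contiguous in sorted order and never straddle a frequent symbol, comparisons between symbols of different groups keep their outcome after relabeling.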

Abrahamson-Kosaraju Method
  • We can now run our algorithm over the new text and pattern in $O(m\sqrt{m} \log m)$ time.
  • But we still haven't handled comparisons between two non-frequent symbols that are in the same group.

Abrahamson-Kosaraju Method
  • We'll do so naively in each group:
      ◦ For each triplet of the form $(a, i, T)$ in the group:
          ▪ For each triplet of the form $(b, j, P)$ in the group, if $b > a$, then add an error at location $i - j$.
  • Time: $O(m\sqrt{m})$.
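The in-group correction can be sketched as below, assuming triplets are `(symbol, position, source)` tuples, that `representative` maps each non-frequent symbol to its group's representative, and that `errors` is a list holding one mismatch counter per candidate location. All names are illustrative.

```python
from collections import defaultdict


def add_in_group_errors(triplets, representative, errors):
    """Add one error at location i - j for each same-group pair where the
    pattern symbol at j is greater than the text symbol at i."""
    groups = defaultdict(list)
    for symbol, position, source in triplets:
        if symbol in representative:  # non-frequent symbols only
            groups[representative[symbol]].append((symbol, position, source))
    for members in groups.values():
        text_side = [(s, p) for s, p, src in members if src == "T"]
        pattern_side = [(s, p) for s, p, src in members if src == "P"]
        for a, i in text_side:
            for b, j in pattern_side:
                # Only alignments that fit inside the text are valid locations.
                if b > a and 0 <= i - j < len(errors):
                    errors[i - j] += 1
    return errors
```

Each triplet is compared only against the triplets of its own group, so with groups of $O(\sqrt{m})$ triplets the total work is $O(m\sqrt{m})$ per segment.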

Running Time
  • For one segment:
      ◦ Sorting the triplets and representatives: $O(m \log m)$.
      ◦ Running the algorithm: $O(m\sqrt{m} \log m)$.
      ◦ Correcting results (adding in-group errors): $O(m\sqrt{m})$.
  • Overall for one segment: $O(m\sqrt{m} \log m)$.
  • Overall for all segments: $O(n\sqrt{m} \log m)$.
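The totals can be checked with a short calculation. It assumes a frequency threshold of $\sqrt{m}$, an alphabet of $O(\sqrt{m})$ symbols after relabeling, and $O(n/m)$ segments of length $2m$; these figures are reconstructed from the method rather than quoted from the slide.

```latex
\begin{align*}
\text{one segment} &= \underbrace{O(m \log m)}_{\text{sorting}}
  + \underbrace{O(\sqrt{m} \cdot m \log m)}_{\text{one FFT pass per symbol}}
  + \underbrace{O(m \cdot \sqrt{m})}_{\text{in-group errors}}
  = O(m\sqrt{m} \log m), \\
\text{all segments} &= O\!\left(\frac{n}{m}\right) \cdot O(m\sqrt{m} \log m)
  = O(n\sqrt{m} \log m).
\end{align*}
```

The FFT term dominates the in-group term by a factor of $\log m$, which suggests the two are not balanced at this threshold.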

Running Time
  • We can improve to $O(n\sqrt{m \log m})$.
      ◦ Left as an exercise.