Faster 2-Dimensional Scaled Matching
Amihood Amir and Eran Chencinski

Real Scaling
Given an n x n text T and an m x m pattern P, find all occurrences of P in T, scaled to any real scale.
Best known algorithm [Amir et al.]:
  Time: O(nm^3 + n^2 m log m)
  Space: O(nm^3 + n^2)
Our algorithm:
  Time: O(n^2 m)
  Space: O(n^2)

Scaling – Geometric Definition

Scaling – Algebraic Definition Rounding Function:

Scaling – Algebraic Definition
Given a pattern P of size m x m and a scale r:
  • The first row is scaled to ||1*r|| rows
  • The first 2 rows are scaled to ||2*r|| rows
  • …
  • The first m rows are scaled to ||m*r|| rows
The columns are scaled in the same way.
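The algebraic definition above can be sketched in a few lines of Python. This is only an illustration: the slides do not show the rounding function, so `round_half_up` (round to the nearest integer, halves going up) is an assumption, and the names `scale_1d` and `scale_pattern` are hypothetical.

```python
import math


def round_half_up(x: float) -> int:
    """ASSUMED rounding function ||x||: nearest integer, halves rounded up."""
    return math.floor(x + 0.5)


def scale_1d(m: int, r: float) -> list[int]:
    """Number of copies of each of the m rows (or columns) after scaling by r.

    The first i rows together occupy ||i*r|| rows, so row i on its own
    receives ||i*r|| - ||(i-1)*r|| copies.
    """
    return [round_half_up(i * r) - round_half_up((i - 1) * r)
            for i in range(1, m + 1)]


def scale_pattern(pattern, r: float):
    """Scale a square m x m pattern (list of rows) by r in both dimensions."""
    copies = scale_1d(len(pattern), r)
    scaled = []
    for row, row_copies in zip(pattern, copies):
        # Stretch the row horizontally, then repeat it vertically.
        wide = [sym for sym, c in zip(row, copies) for _ in range(c)]
        scaled.extend(list(wide) for _ in range(row_copies))
    return scaled
```

For example, under this assumed rounding, `scale_1d(3, 1.5)` gives 2, 1 and 2 copies of the three rows, for a total of ||3*1.5|| = 5 rows.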

Scaling – Algebraic Definition
Rounding Function:
Inverse Rounding Function: suppose we know that K rows were scaled to L rows:
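The formula for the inverse rounding function is not preserved in this transcript. As a hedged illustration of the idea, assume the rounding ||x|| is round-half-up; then ||K*r|| = L exactly when L - 0.5 <= K*r < L + 0.5, which confines r to a half-open interval. The function names below are hypothetical.

```python
from fractions import Fraction


def scale_interval(k: int, l: int) -> tuple[Fraction, Fraction]:
    """Half-open interval [lo, hi) of scales r with ||k*r|| == l.

    ASSUMPTION: ||x|| rounds to the nearest integer with halves going up,
    so ||k*r|| == l  <=>  (l - 1/2)/k <= r < (l + 1/2)/k.
    """
    if k <= 0 or l <= 0:
        raise ValueError("k and l must be positive")
    return Fraction(2 * l - 1, 2 * k), Fraction(2 * l + 1, 2 * k)


def consistent(k: int, l: int, r) -> bool:
    """True if scaling k rows by r could have produced exactly l rows."""
    lo, hi = scale_interval(k, l)
    return lo <= r < hi
```

Exact fractions are used here only to avoid floating-point noise at the interval endpoints.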

Subrow/column Repetition
Query time: O(1); preprocessing time: O(n^2)
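One way to reach these bounds, sketched here under an assumption about what the query means: a subrow repetition query asks whether the k symbols starting at position (i, j) of row i are all the same symbol. Precomputing the length of the run of equal symbols starting at every cell takes time linear in the text size, after which each query is a single comparison. The names are hypothetical, and the slides' own data structure may differ.

```python
def build_row_runs(text):
    """For each cell, length of the maximal run of equal symbols starting
    there and extending to the right. Linear in the size of the text."""
    runs = []
    for row in text:
        n = len(row)
        run = [1] * n
        # Scan right to left so run[j + 1] is known when computing run[j].
        for j in range(n - 2, -1, -1):
            if row[j] == row[j + 1]:
                run[j] = run[j + 1] + 1
        runs.append(run)
    return runs


def is_subrow_repetition(runs, i: int, j: int, k: int) -> bool:
    """O(1) query: are the k symbols starting at (i, j) in row i all equal?"""
    return runs[i][j] >= k
```

Subcolumn queries can be answered the same way by running the preprocessing on the transposed text.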

Algorithm Layout
The algorithm consists of 4 stages:
1. Scale Elimination
2. Candidate Consistency
3. Candidate Verification
4. Occurrence Recognition
Each stage takes O(n^2 m) time and O(n^2) space.

Scale Elimination Stage Pivot

Scale Elimination Stage (i, j)

Scale Elimination Stage (i, j)
O(m) time for each location, O(n^2 m) total, O(n^2) space

Candidate Consistency Stage

Candidate Consistency Stage Case (a) Case (b)

Witness Table Construction
For each suffix: O(m^2) time and O(m) space

Pre-Dueling Step
For each candidate c in T:
  For each suffix s of P:
    Compare c's borders with the witness table borders of suffix s
    If the borders are not the same, c is eliminated
Can be done in O(m) time for each candidate

Performing a Duel

The Dueling Order
Each candidate performs at most O(m) successful duels

Candidate Consistency Stage
Witness table construction: O(m^3) time, O(m^2) space
Pre-dueling step: O(n^2 m) time, O(m^2) space
Number of duels: at most O(n) unsuccessful and at most O(n^2 m) successful, where each duel takes O(1) time
Total: O(n^2 m) time, O(n^2) space

Candidate Verification Stage

Candidate Verification Stage
For each location, find the maximal containing interval
Can be solved in O(n) time per row using a solution to the Maximal Interval Problem

Candidate Verification Stage
Once we find the largest interval, we:
  • Verify each row in O(m) time, using subcolumn repetition queries
  • Save the longest matching length
For each candidate, run a Range Minimum Query (RMQ) on the lengths
The pattern appears iff pattern size >= RMQ
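A Range Minimum Query over the saved lengths can be sketched with a sparse table. Note that this is a simpler stand-in, not the structure the slides rely on: a sparse table needs O(n log n) preprocessing per array rather than the O(n) stated in the slides, though queries are still O(1). The function names are hypothetical.

```python
def build_sparse_table(values):
    """table[k][i] holds the minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(values)]
    span = 1
    while 2 * span <= n:
        prev = table[-1]
        # Combine two adjacent blocks of length `span` into one of 2 * span.
        table.append([min(prev[i], prev[i + span])
                      for i in range(n - 2 * span + 1)])
        span *= 2
    return table


def range_min(table, lo: int, hi: int):
    """Minimum of values[lo..hi], both ends inclusive, in O(1) time."""
    level = (hi - lo + 1).bit_length() - 1  # floor(log2(range length))
    width = 1 << level
    # Two overlapping blocks of length `width` cover the whole range.
    return min(table[level][lo], table[level][hi - width + 1])
```

In the verification stage, `values` would be the longest matching lengths saved for one row or column, and each candidate would issue one `range_min` call over the span it covers.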

Candidate Verification Stage
Finding largest intervals: O(n) time per row, O(n^2) total
Verifying columns: O(nm) time per row, O(n^2 m) total
RMQ:
  • Preprocessing: O(n) time per row, O(n^2) total
  • Querying: O(1) time per candidate, O(n^2) total
Total: O(n^2 m) time, O(n^2) space

Occurrence Recognition Stage
Recall: Scale elimination stage returned
At most O(m) steps per candidate
Total: O(n^2 m) time

Conclusions
The algorithm consists of 4 stages:
1. Scale Elimination
2. Candidate Consistency
3. Candidate Verification
4. Occurrence Recognition
Each stage takes O(n^2 m) time and O(n^2) space.