An Improved Algorithm to Accelerate Regular Expression Evaluation

Authors: Michela Becchi, Patrick Crowley
Publisher: ANCS '07
Presenter: Wen-Tse Liang
Date: 2010/11/17

Outline: Introduction, D2FA, Improved Algorithm, Experiment Evaluation

Introduction
Kumar et al. [9] observe that many states in DFAs have similar sets of outgoing transitions. Space savings in excess of 90% are achievable in current rule-sets when this redundancy is exploited. The proposed automaton, called a Delayed Input DFA (D2FA), replaces redundant transitions common to a pair of states with a single default transition.

Introduction
In this paper, we propose an improved yet simplified algorithm for building default transitions that addresses these problems. On practical data sets, the level of compression achieved is similar to that of the original D2FA scheme, while the worst-case memory bandwidth bound is superior.

D2FA
Consider two states u and v, where both u and v have a transition labeled by the symbol a to a common third state w, and no default transition. If we introduce a default transition from u to v, we can eliminate the a-transition from u without affecting the destination state function δ(x).
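To make the lookup rule concrete, here is a minimal sketch of how an input string could be processed once default transitions exist. The dictionary layout and the names `run_d2fa`, `labeled`, and `default` are illustrative assumptions, not taken from the paper. A state that has no labeled edge for the current symbol follows its default transition without consuming input.

```python
def run_d2fa(labeled, default, start, text):
    """Process text on a D2FA; return (final_state, state_traversals).

    labeled[s] maps a symbol to the next state (hypothetical layout);
    default[s] is the default-transition target of s, if it has one.
    """
    state, traversals = start, 0
    for symbol in text:
        # No labeled edge for this symbol: follow the default transition.
        # The input symbol is not consumed by this move.
        while symbol not in labeled[state]:
            state = default[state]
            traversals += 1
        # A labeled edge exists: consume the symbol.
        state = labeled[state][symbol]
        traversals += 1
    return state, traversals


# u and v both go to w on "a", so u keeps only its "b" edge and defaults to v.
labeled = {
    "u": {"b": "u"},
    "v": {"a": "w", "b": "v"},
    "w": {"a": "w", "b": "v"},
}
default = {"u": "v"}

final_state, traversals = run_d2fa(labeled, default, "u", "ba")
```

In this toy automaton the symbol a read in state u costs two traversals (u to v by default, then v to w), which is the price paid for the saved edge.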

D2FA
Figure: the two automata on the input string aabdbc.

D2FA
Note that by the same reasoning, if there are multiple symbols a for which u has a labeled outgoing edge and for which δ(a, u) = δ(a, v), the introduction of a default edge from u to v allows us to eliminate all these edges.
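The elimination step can be sketched in a few lines. This assumes a complete DFA stored as nested dictionaries, where `delta[u][a]` plays the role of δ(a, u). The function name `edges_after_default` and the sample automaton are hypothetical.

```python
def edges_after_default(delta, u, v):
    """Labeled edges that u must keep once it has a default transition to v.

    Every symbol on which u and v agree is dropped from u, because the
    default edge to v resolves it.
    """
    return {a: t for a, t in delta[u].items() if delta[v].get(a) != t}


# Sample DFA (an assumption for illustration): u and v agree on "a" and "b".
delta = {
    "u": {"a": "w", "b": "w", "c": "u"},
    "v": {"a": "w", "b": "w", "c": "v"},
    "w": {"a": "w", "b": "v", "c": "v"},
}
remaining = edges_after_default(delta, "u", "v")
```

Here u keeps only its c-edge. Two labeled edges are replaced by one default edge.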

D2FA
The edge joining a pair of vertices (states) u and v is assigned a weight w(u, v) that is one less than the number of symbols a for which δ(a, u) = δ(a, v).
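As a small illustration of the weight definition, the sketch below counts the symbols on which two states agree and subtracts one, which accounts for the default edge that replaces them. The nested-dictionary DFA layout and the sample automaton are assumptions made for this example.

```python
def weight(delta, u, v):
    """w(u, v): number of symbols a with delta(a, u) == delta(a, v), minus one."""
    shared = sum(1 for a, t in delta[u].items() if delta[v].get(a) == t)
    return shared - 1


# Sample complete DFA over {a, b, c} (hypothetical).
delta = {
    "u": {"a": "w", "b": "w", "c": "u"},
    "v": {"a": "w", "b": "w", "c": "v"},
    "w": {"a": "w", "b": "v", "c": "v"},
}
w_uv = weight(delta, "u", "v")  # u and v agree on "a" and "b"
w_uw = weight(delta, "u", "w")  # u and w agree on "a" only
```

A weight of zero means a default edge between the two states would save nothing: one labeled edge is removed and one default edge is added.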

D2FA
The natural way to avoid long default paths is to construct a maximum weight spanning tree with a specified bounded diameter.

D2FA
Figure: diameter bound of 4.

D2FA
Figure: diameter bound of 2.

Improved Algorithm
The goal is a more general compression algorithm whose traversal time bound is independent of the maximum default transition path length. For each state s, we define its depth as the minimum number of states visited when moving from the initial state s0 to s in the DFA, not counting s0 itself. In other words, s0 has depth 0, the set of states S1 directly reachable from s0 has depth 1, the set of states S2 directly reachable from any state in S1 (but not from s0) has depth 2, and so on.
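The depth defined above is the breadth-first distance from the initial state, so it can be computed with a standard BFS. The sketch below assumes the DFA is stored as nested dictionaries. The function name `state_depths` and the sample automaton are illustrative, not from the paper.

```python
from collections import deque


def state_depths(delta, s0):
    """Depth of every state reachable from s0 (breadth-first distance)."""
    depth = {s0: 0}
    queue = deque([s0])
    while queue:
        s = queue.popleft()
        for t in delta[s].values():
            if t not in depth:  # first visit gives the minimum depth
                depth[t] = depth[s] + 1
                queue.append(t)
    return depth


# Sample DFA over {a, b} (hypothetical).
delta = {
    "s0": {"a": "s1", "b": "s0"},
    "s1": {"a": "s1", "b": "s2"},
    "s2": {"a": "s1", "b": "s0"},
}
depth = state_depths(delta, "s0")
```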

Improved Algorithm
Lemma: If no default transition in a D2FA leads from a state of depth d_i to a state of depth d_j with d_j ≥ d_i, then any string of length N requires at most 2N state traversals to be processed. In other words, a 2N time bound is guaranteed for every D2FA that has only “backwards” default transitions. In a sense, this can be thought of as a generalization of […] to regular expressions.
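The intuition behind the lemma: each labeled transition raises the depth by at most 1, each default transition lowers it by at least 1, and the depth never goes below 0, so a string of length N can trigger at most N default transitions on top of its N labeled ones. The sketch below checks this on a toy automaton. It uses a simplified greedy choice of default targets, restricted to states of strictly smaller depth. This is an assumption made for illustration and not necessarily the construction used in the paper. All function names and the sample DFA are hypothetical, and every state is assumed reachable from s0.

```python
from collections import deque
from itertools import product


def state_depths(delta, s0):
    """Breadth-first distance of every state from s0."""
    depth, queue = {s0: 0}, deque([s0])
    while queue:
        s = queue.popleft()
        for t in delta[s].values():
            if t not in depth:
                depth[t] = depth[s] + 1
                queue.append(t)
    return depth


def build_backward_d2fa(delta, s0):
    """Greedy sketch: each state defaults to the strictly shallower state
    sharing the most transitions, if at least two are shared."""
    depth = state_depths(delta, s0)
    labeled, default = {}, {}
    for u in delta:
        best, best_shared = None, 1
        for v in delta:
            if depth[v] < depth[u]:
                shared = sum(1 for a, t in delta[u].items() if delta[v][a] == t)
                if shared > best_shared:
                    best, best_shared = v, shared
        if best is None:
            labeled[u] = dict(delta[u])
        else:
            default[u] = best
            labeled[u] = {a: t for a, t in delta[u].items() if delta[best][a] != t}
    return labeled, default


def run_d2fa(labeled, default, s0, text):
    """Return (final_state, number_of_state_traversals)."""
    state, count = s0, 0
    for symbol in text:
        while symbol not in labeled[state]:
            state = default[state]  # default move, no input consumed
            count += 1
        state = labeled[state][symbol]
        count += 1
    return state, count


def check_bound(delta, s0, alphabet, max_len):
    """True if every string up to max_len ends in the DFA's state and
    needs at most 2N traversals on the compressed automaton."""
    labeled, default = build_backward_d2fa(delta, s0)
    for n in range(max_len + 1):
        for text in product(alphabet, repeat=n):
            expected = s0
            for symbol in text:
                expected = delta[expected][symbol]
            state, count = run_d2fa(labeled, default, s0, text)
            if state != expected or count > 2 * n:
                return False
    return True


# Toy DFA that tracks progress through the pattern "abc" (hypothetical).
delta = {
    "s0": {"a": "s1", "b": "s0", "c": "s0"},
    "s1": {"a": "s1", "b": "s2", "c": "s0"},
    "s2": {"a": "s1", "b": "s0", "c": "s3"},
    "s3": {"a": "s1", "b": "s0", "c": "s0"},
}
labeled, default = build_backward_d2fa(delta, "s0")
ok = check_bound(delta, "s0", "abc", 6)
```

On this example the 12 labeled transitions of the DFA shrink to 5 labeled plus 3 default transitions, and every default edge points to the depth-0 state.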

Experiment Evaluation
