Regular Expressions Hopcroft, Motwani, Ullman, Chap 3

Machines versus expressions

  • Finite automata are machine-like descriptions of languages
  • Alternative: declarative description
      • Specifying a language using expressions and operations
      • Example: 01* + 10* defines the language containing strings such as 01111, 100, 0, 1000000; * and + are operators in this "algebra"
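The example language can be checked mechanically. The sketch below assumes Python's re module as a stand-in for the slide's notation: the slide writes union as +, which Python writes as |. The helper name in_language is made up for this illustration.

```python
import re

# Slide notation 01* + 10* becomes 01*|10* in Python's syntax,
# where "|" plays the role of the union operator "+".
PATTERN = re.compile(r"01*|10*")

def in_language(s):
    """Return True if the whole string s belongs to 01* + 10*."""
    return PATTERN.fullmatch(s) is not None

# The four strings from the slide, plus two strings outside the language.
for s in ["01111", "100", "0", "1000000", "0110", ""]:
    print(repr(s), in_language(s))
```

fullmatch is used rather than search because a regular expression in the slides' sense describes whole strings, not substrings.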

Regular expressions defined

  • The simplest regular expressions are
      • The empty string ε
      • A symbol a from an alphabet
  • Given that R1 and R2 are regular expressions, regular expressions are built from the following operations
      • Union: R1 + R2
      • Concatenation: R1 R2
      • Closure: R1*
      • Parentheses (to enforce precedence): (R1)
  • Nothing else is a regular expression unless it is built from the above rules
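The inductive definition maps naturally onto a small tree data type. The following is a minimal sketch, not part of the slides: the class names (Epsilon, Symbol, Union, Concat, Star) and the show function are invented here, and the precedence order used (closure binds tightest, then concatenation, then union) is the usual convention.

```python
from dataclasses import dataclass

class Regex:
    """Base class for the regular-expression tree (hypothetical names)."""

@dataclass(frozen=True)
class Epsilon(Regex):      # the empty string
    pass

@dataclass(frozen=True)
class Symbol(Regex):       # a symbol a from the alphabet
    char: str

@dataclass(frozen=True)
class Union(Regex):        # R1 + R2
    left: Regex
    right: Regex

@dataclass(frozen=True)
class Concat(Regex):       # R1 R2
    left: Regex
    right: Regex

@dataclass(frozen=True)
class Star(Regex):         # R1*
    inner: Regex

def show(r, parent_prec=0):
    """Render r in slide notation, adding parentheses only where needed."""
    if isinstance(r, Epsilon):
        return "ε"
    if isinstance(r, Symbol):
        return r.char
    if isinstance(r, Union):
        text, prec = show(r.left, 1) + "+" + show(r.right, 1), 1
    elif isinstance(r, Concat):
        text, prec = show(r.left, 2) + show(r.right, 2), 2
    else:  # Star
        text, prec = show(r.inner, 3) + "*", 3
    # Parenthesize when this operator binds looser than its context.
    return "(" + text + ")" if prec < parent_prec else text

zero, one = Symbol("0"), Symbol("1")
print(show(Concat(zero, Star(one))))   # closure applies to 1 only
print(show(Star(Concat(zero, one))))   # parentheses are required here
```

Parentheses never appear as a node type: they only record grouping, which the tree shape already captures.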

Examples

  • 1+ (ab)* + (ba)*
  • (0+1+2+3+4+5+6+7+8+9)*(0+5)
  • (x+y)*x(x+y)*
  • (01)*
  • 0(1*)
  • 01* equivalent to 0(1*)
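A few of these examples can be explored by brute force. The sketch below again assumes Python's re module, with the slide's + rewritten as |; the helper name language and the length bounds are arbitrary choices for this illustration, and comparing languages only up to a bounded length is evidence of equivalence, not a proof.

```python
import re
from itertools import product

def language(pattern, alphabet, max_len):
    """All strings over `alphabet` of length <= max_len that fully match `pattern`."""
    regex = re.compile(pattern)
    words = ("".join(chars)
             for n in range(max_len + 1)
             for chars in product(alphabet, repeat=n))
    return {w for w in words if regex.fullmatch(w)}

# 01* and 0(1*) describe the same strings because closure binds tighter
# than concatenation; (01)* is a different language.
same = language(r"01*", "01", 6) == language(r"0(1*)", "01", 6)
different = language(r"(01)*", "01", 6) != language(r"01*", "01", 6)
print(same, different)

# (x+y)*x(x+y)* in slide notation: the strings over {x, y} that contain an x.
contains_x = language(r"(x|y)*x(x|y)*", "xy", 5)
print(all("x" in w for w in contains_x))

# (0+1+...+9)*(0+5): digit strings ending in 0 or 5.
ends_in_0_or_5 = re.compile(r"(0|1|2|3|4|5|6|7|8|9)*(0|5)")
print(bool(ends_in_0_or_5.fullmatch("1235")), bool(ends_in_0_or_5.fullmatch("1234")))
```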

Equivalence between regular expressions and finite automata

Strategy:
  • Convert regular expression to an ε-NFA
  • Convert a DFA to a regular expression

Regular Expression to ε-NFA

  • Recursive construction
  • Base cases follow base case definitions of regular expressions
      • ε: ε-NFA that accepts the empty string – a single state that is the start and end state
      • a: ε-NFA that accepts {a} – two-state machine (start and final state) with an a-transition
      • Note: technically, we also need an ε-NFA for the empty language {} – easy

Regular Expression to ε-NFA

  • Recursive step: build ε-NFA from smaller ε-NFAs that correspond to the operand regular expressions
  • To simplify construction, we may ensure the following characteristics for the automata we build
      • Only one final state, with no outgoing transitions
      • No transitions into the start state
      • Note: the base cases satisfy these characteristics

Regular Expression to ε-NFA

  • Suppose ε-NFA1 and ε-NFA2 are the automata for R1 and R2
  • Three operations to worry about: union (R1 + R2), concatenation (R1 R2), closure (R1*)
  • With ε-transitions, construction is straightforward
      • Union: create a new start state, with ε-transitions into the start states of ε-NFA1 and ε-NFA2; create a new final state, with ε-transitions from the two final states of ε-NFA1 and ε-NFA2
      • Concatenation: ε-transition from final state of ε-NFA1 to the start state of ε-NFA2
      • Closure: closure can be supported by an ε-transition from final to start state; need a few more ε-transitions (why?)
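The recursive construction can be sketched in code. This is one common way to realize it (a Thompson-style construction), not necessarily the exact machine the slides draw. Assumptions made for this illustration: regular expressions are nested tuples such as ("union", r1, r2), the class name EpsilonNFA and the function accepts are invented, and closure is built with a fresh start and final state, which is one possible answer to the "few more ε-transitions" hint.

```python
from itertools import count

EPS = None  # label for ε-transitions (an arbitrary marker chosen for this sketch)

class EpsilonNFA:
    def __init__(self):
        self.edges = {}          # state -> list of (label, target state)
        self._ids = count()

    def new_state(self):
        state = next(self._ids)
        self.edges[state] = []
        return state

    def link(self, src, label, dst):
        self.edges[src].append((label, dst))

    def build(self, r):
        """Add a fragment for regex r; return its (start, final) states."""
        kind = r[0]
        if kind == "eps":        # base case: one state, both start and final
            s = self.new_state()
            return s, s
        if kind == "sym":        # base case: two states and an a-transition
            s, f = self.new_state(), self.new_state()
            self.link(s, r[1], f)
            return s, f
        s1, f1 = self.build(r[1])
        if kind == "concat":     # final of fragment 1 feeds start of fragment 2
            s2, f2 = self.build(r[2])
            self.link(f1, EPS, s2)
            return s1, f2
        s, f = self.new_state(), self.new_state()
        if kind == "union":      # new start and final around both fragments
            s2, f2 = self.build(r[2])
            for a, b in [(s, s1), (s, s2), (f1, f), (f2, f)]:
                self.link(a, EPS, b)
            return s, f
        if kind == "star":
            self.link(f1, EPS, s1)   # loop back for another repetition
            self.link(s, EPS, s1)    # enter the operand
            self.link(f1, EPS, f)    # leave after one or more repetitions
            self.link(s, EPS, f)     # zero repetitions: accept ε
            return s, f
        raise ValueError(f"unknown regex node: {kind!r}")

    def closure(self, states):
        """All states reachable from `states` using only ε-transitions."""
        seen, stack = set(states), list(states)
        while stack:
            for label, target in self.edges[stack.pop()]:
                if label is EPS and target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen

def accepts(r, text):
    """Build the ε-NFA for r and simulate it on text."""
    nfa = EpsilonNFA()
    start, final = nfa.build(r)
    current = nfa.closure({start})
    for ch in text:
        moved = {t for s in current for label, t in nfa.edges[s] if label == ch}
        current = nfa.closure(moved)
    return final in current

# 01* + 10* from the earlier slide, written as nested tuples.
R = ("union",
     ("concat", ("sym", "0"), ("star", ("sym", "1"))),
     ("concat", ("sym", "1"), ("star", ("sym", "0"))))
print([accepts(R, w) for w in ["01111", "100", "0", "0110", ""]])
```

The fresh start and final states in the closure case keep both simplifying properties intact: nothing enters the new start state and nothing leaves the new final state, even though the operand now has a loop from its old final state back to its old start state.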

DFA to Regular Expression

  • More difficult construction
  • Build the regular expression "bottom up", starting with simpler strings that are acceptable using a subset of states in the DFA
  • Define R^k_{i,j} as the expression for strings that have an admissible state sequence from state i to state j with no intermediate states greater than k
      • Assume no states are numbered 0, but k can be 0

R^0_{i,j}

  • Observe that R^0_{i,j} describes strings of length 1 or 0, particularly:
      • {a1, a2, a3, …}, where, for each ax, δ(i, ax) = j
      • Add ε to the set if i = j
  • The 0 in R^0_{i,j} means no intermediate states are allowed, so either no transition is made (just stay in state i to accept if i = j) or a single transition is made from state i to state j
  • These are the base cases in our construction

R^k_{i,j}

  • Recursive step: for each k, we can build R^k_{i,j} as follows:
      R^k_{i,j} = R^{k-1}_{i,j} + R^{k-1}_{i,k} (R^{k-1}_{k,k})* R^{k-1}_{k,j}
  • Intuition: an admissible path from i to j either never visits state k (the first term, R^{k-1}_{i,j}) or contains one or more visits to state k; in the latter case, break the path into pieces that
      • first go from i to the first k-visit (R^{k-1}_{i,k})
      • followed by zero or more revisits to k ((R^{k-1}_{k,k})*)
      • followed by a path from k to j (R^{k-1}_{k,j})
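A small worked instance may make the recursion concrete. The DFA below is a hypothetical example chosen for this illustration, not taken from the slides: states 1 and 2, start state 1, final state 2, with δ(1,0) = 1, δ(1,1) = 2, δ(2,0) = 2, δ(2,1) = 2, so it accepts the strings containing at least one 1. The symbol ∅ denotes the empty language, used where no transition exists.

```latex
\begin{align*}
R^{0}_{1,1} &= \varepsilon + 0, \qquad
R^{0}_{1,2} = 1, \qquad
R^{0}_{2,1} = \emptyset, \qquad
R^{0}_{2,2} = \varepsilon + 0 + 1 \\[4pt]
R^{1}_{1,2} &= R^{0}_{1,2} + R^{0}_{1,1}\,(R^{0}_{1,1})^{*}\,R^{0}_{1,2}
             = 1 + (\varepsilon + 0)(\varepsilon + 0)^{*}\,1
             = 0^{*}1 \\
R^{1}_{2,2} &= R^{0}_{2,2} + R^{0}_{2,1}\,(R^{0}_{1,1})^{*}\,R^{0}_{1,2}
             = \varepsilon + 0 + 1 + \emptyset
             = \varepsilon + 0 + 1 \\[4pt]
R^{2}_{1,2} &= R^{1}_{1,2} + R^{1}_{1,2}\,(R^{1}_{2,2})^{*}\,R^{1}_{2,2}
             = 0^{*}1 + 0^{*}1\,(\varepsilon + 0 + 1)^{*}(\varepsilon + 0 + 1)
             = 0^{*}1\,(0 + 1)^{*}
\end{align*}
```

With k = 2 every state is allowed as an intermediate state, so the last line describes all strings that take this DFA from state 1 to state 2: any number of 0s, a 1, then anything.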

And finally…

  • We get the regular expression(s) that represent all strings with admissible sequences that start with the initial state (state 1) and end with a final state
  • Resulting regular expression built from the DFA: the union of all R^n_{1,f} where f is a final state
      • Note: n is the number of states in the DFA, meaning there are no more restrictions for intermediate states in the accepting sequence
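The whole DFA-to-expression construction (base cases, recursive step, final union) fits in a short program. What follows is a minimal sketch under stated assumptions: the function name dfa_to_regex and its parameters are invented, the output uses Python's re syntax (| for union, (?:…) for grouping) so the result can be tested, None stands for the empty language, and the empty Python string stands for ε. No simplification is attempted, so the expressions grow quickly with the number of states.

```python
import re
from itertools import product

def union(a, b):
    if a is None:
        return b
    if b is None:
        return a
    if a == b:
        return a
    if a == "":
        return f"(?:{b})?"          # ε + b
    if b == "":
        return f"(?:{a})?"          # a + ε
    return f"(?:{a}|{b})"

def concat(a, b):
    return None if a is None or b is None else a + b

def star(a):
    return "" if a in (None, "") else f"(?:{a})*"

def dfa_to_regex(n, delta, finals, start=1):
    """States are 1..n; delta maps (state, symbol) -> state."""
    # Base cases R^0_{i,j}: single symbols, plus ε when i = j.
    R = [[None] * (n + 1) for _ in range(n + 1)]   # index 0 is unused
    for (i, symbol), j in delta.items():
        R[i][j] = union(R[i][j], re.escape(symbol))
    for i in range(1, n + 1):
        R[i][i] = union(R[i][i], "")
    # Recursive step: allow state k as an intermediate state.
    for k in range(1, n + 1):
        new = [[None] * (n + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                via_k = concat(concat(R[i][k], star(R[k][k])), R[k][j])
                new[i][j] = union(R[i][j], via_k)
        R = new
    # Final answer: union of R^n_{start,f} over all final states f.
    result = None
    for f in sorted(finals):
        result = union(result, R[start][f])
    return result

# Hypothetical two-state DFA that accepts strings containing at least one 1.
DELTA = {(1, "0"): 1, (1, "1"): 2, (2, "0"): 2, (2, "1"): 2}
PATTERN = dfa_to_regex(2, DELTA, finals={2})

# Compare the generated expression with the DFA's language on short strings.
words = ["".join(c) for n in range(6) for c in product("01", repeat=n)]
print(all((re.fullmatch(PATTERN, w) is not None) == ("1" in w) for w in words))
```

A fresh table is allocated for each k so that every entry of R^k is computed from R^{k-1} only, exactly as the recurrence requires.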