3. Regular Expressions and Languages
CIS 5513 - Automata and Formal Languages – Pei Wang

Regular languages
L is a regular language if and only if it is accepted by a DFA or an NFA (or an ε-NFA).
Regular languages can also be specified without automata, using regular expressions.
Regular expressions specify a language declaratively, focusing on the symbolic patterns in its strings.

Operators of regular languages
A language is a set of strings, therefore:
• Union: the union of two languages L and M, $L \cup M$, is the set union of the two
• Dot: the concatenation of two languages L and M, $L \cdot M$ or $LM$, is the set of strings formed by a string of L followed by a string of M (similar to the Cartesian product $L \times M$)
• Star: the closure of a language, $L^*$, is defined as $L^0 \cup L^1 \cup L^2 \cup L^3 \cup \cdots$, where $L^0 = \{\varepsilon\}$ and $L^i = L L^{i-1}$
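To make the three operations concrete, here is a minimal Python sketch on finite languages, assuming strings are Python `str` values and ε is the empty string `""`. Because L* is infinite whenever L contains a non-empty string, the illustrative `star` helper (a name introduced here, not from the slides) keeps only strings up to a chosen length `max_len`.

```python
from itertools import product

def union(L, M):
    """L ∪ M: plain set union."""
    return L | M

def concat(L, M):
    """LM: every string of L followed by every string of M."""
    return {x + y for x, y in product(L, M)}

def star(L, max_len):
    """L* truncated to strings of length <= max_len.

    Starts from L^0 = {ε} and repeatedly appends one more factor from L,
    stopping when no new short strings appear.
    """
    result = {""}      # L^0 = {ε}
    frontier = {""}    # strings not yet extended by another factor
    while frontier:
        nxt = {w for w in concat(frontier, L) if len(w) <= max_len}
        frontier = nxt - result
        result |= nxt
    return result

L = {"0", "01"}
M = {"1"}
print(sorted(union(L, M)))    # ['0', '01', '1']
print(sorted(concat(L, M)))   # ['01', '011']
print(sorted(star(L, 3)))     # ['', '0', '00', '000', '001', '01', '010']
```

Note that `star(set(), n)` returns `{""}`, matching Ø* = {ε}.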

Operators of regular expressions
Regular expressions are formed recursively:
• Constant or symbol: $L(\emptyset) = \emptyset$, $L(\varepsilon) = \{\varepsilon\}$, $L(a) = \{a\}$
• Union, concatenation, and star: $L(E+F) = L(E) \cup L(F)$, $L(EF) = L(E)L(F)$, $L(E^*) = (L(E))^*$
• Parenthesized expression: $L((E)) = L(E)$
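The recursive definition translates almost line by line into a recursive function. The sketch below assumes a hypothetical tuple encoding of regular expressions (`("sym", "a")`, `("union", E, F)`, and so on) and, since L(E) may be infinite, returns only the strings of length at most `n`.

```python
from itertools import product

# Hypothetical AST encoding used only in this sketch:
#   ("empty",)       Ø        ("eps",)          ε
#   ("sym", "a")     a        ("union", E, F)   E + F
#   ("cat", E, F)    EF       ("star", E)       E*
# Parentheses need no node, because L((E)) = L(E).

def lang(e, n):
    """Return the strings of L(e) whose length is at most n."""
    op = e[0]
    if op == "empty":
        return set()
    if op == "eps":
        return {""}
    if op == "sym":
        return {e[1]} if len(e[1]) <= n else set()
    if op == "union":
        return lang(e[1], n) | lang(e[2], n)
    if op == "cat":
        # both factors of a string of length <= n are themselves <= n
        left, right = lang(e[1], n), lang(e[2], n)
        return {x + y for x, y in product(left, right) if len(x + y) <= n}
    if op == "star":
        base = lang(e[1], n)
        result, frontier = {""}, {""}
        while frontier:
            nxt = {x + y for x, y in product(frontier, base) if len(x + y) <= n}
            frontier = nxt - result
            result |= nxt
        return result
    raise ValueError(f"unknown operator {op!r}")

# (0+1)*1 : binary strings ending in 1
e = ("cat", ("star", ("union", ("sym", "0"), ("sym", "1"))), ("sym", "1"))
print(sorted(lang(e, 2)))   # ['01', '1', '11']
```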

Order of precedence of regular-expression operators, from high to low: star > dot > union
Example: a regular expression for the strings that consist of alternating 0’s and 1’s:
• (01)*+(10)*+0(10)*+1(01)*
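One way to gain confidence in such an expression is to compare it with the definition on all short strings. This sketch uses Python's `re` module, which writes union as `|` rather than `+`; it is a bounded check over strings of length at most 8, not a proof.

```python
import re
from itertools import product

# The slide's expression, with + (union) written as | for Python's re.
ALTERNATING = re.compile(r"(01)*|(10)*|0(10)*|1(01)*")

def is_alternating(w):
    """Direct definition: no two adjacent characters are equal."""
    return all(a != b for a, b in zip(w, w[1:]))

# Compare the expression with the definition on every binary string of length <= 8.
for n in range(9):
    for chars in product("01", repeat=n):
        w = "".join(chars)
        assert (ALTERNATING.fullmatch(w) is not None) == is_alternating(w), w
print("agree on all binary strings up to length 8")
```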

Exercises
Write regular expressions for the following languages:
• 3.1.1(a) The set of strings over alphabet {a, b, c} containing at least one a and at least one b
• 3.1.2(a) Binary strings where every pair of adjacent 0’s appears before any pair of adjacent 1’s
Solution: http://infolab.stanford.edu/~ullman/ialcsols/sol3.html#sol31
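A brute-force check can test a proposed answer to these exercises. The helper `check` and the candidate expression for 3.1.1(a) below are illustrative assumptions, not taken from the linked solutions; Python's `re` syntax is used, with `|` for union and `[abc]` for a+b+c.

```python
import re
from itertools import product

def check(pattern, alphabet, spec, max_len=6):
    """Return the first string (shortest first) on which the regular expression
    and the specification disagree, or None if they agree on every string of
    length <= max_len. A bounded test, not a proof."""
    rx = re.compile(pattern)
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            w = "".join(chars)
            if (rx.fullmatch(w) is not None) != spec(w):
                return w
    return None

def has_a_and_b(w):
    """Specification of 3.1.1(a)."""
    return "a" in w and "b" in w

# One candidate answer: an a somewhere before a b, or a b somewhere before an a.
candidate = r"[abc]*a[abc]*b[abc]*|[abc]*b[abc]*a[abc]*"
print(repr(check(candidate, "abc", has_a_and_b)))                # None

# A plausible but wrong attempt: it forces the a to come before the b.
print(repr(check(r"[abc]*a[abc]*b[abc]*", "abc", has_a_and_b)))  # 'ba'
```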

DFA to R.E. by inclusion (1)
A regular expression can be built from a DFA for the same language.
Proof: mathematical induction on the set of states allowed as intermediate nodes on paths in the DFA D
(1) Name the states of D from 1 to n, with the start state numbered 1
(2) Write $R_{ij}^{(k)}$ for the regular expression where $L(R_{ij}^{(k)})$ is the set of labels w of paths in D from state i to state j that have no intermediate node numbered greater than k

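DFA to R.E. by inclusion (2)
(3) The basis is k = 0: $R_{ij}^{(0)}$ is the union of the symbols on the direct edges from i to j ($\emptyset$ if there are none), plus ε when i = j, for the path of length 0
(4) Given $R_{ij}^{(k-1)}$ for all pairs of states: a path from i to j either never passes through state k, or goes from i to k, loops at k zero or more times, and then goes from k to j, so
$R_{ij}^{(k)} = R_{ij}^{(k-1)} + R_{ik}^{(k-1)} \left(R_{kk}^{(k-1)}\right)^* R_{kj}^{(k-1)}$
(5) When k = n, the regular expression for L(D) is the union of all $R_{1j}^{(n)}$ where j is a final state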
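The construction can be sketched in Python by building each $R_{ij}^{(k)}$ as a Python regular-expression string. Assumptions of this sketch: `None` encodes Ø and the empty string encodes ε, the helper names are hypothetical, the two-state example DFA (strings containing at least one 0) is made up for illustration, and correctness is checked only by comparing with the DFA on short strings. The output is correct but far from minimal.

```python
import re
from itertools import product

# None stands for Ø and "" for ε (an encoding chosen for this sketch).

def union(r, s):
    """R + S, simplifying Ø + S = S and R + Ø = R."""
    if r is None:
        return s
    if s is None:
        return r
    return f"(?:{r}|{s})"

def concat(*parts):
    """Concatenation; any Ø factor makes the whole product Ø, ε factors vanish."""
    if any(p is None for p in parts):
        return None
    return "".join(f"(?:{p})" for p in parts if p)

def star(r):
    """R*, simplifying Ø* = ε* = ε."""
    if r is None or r == "":
        return ""
    return f"(?:{r})*"

def dfa_to_regex(n, delta, finals, alphabet):
    """States are 1..n (1 = start); delta maps (state, symbol) -> state."""
    R = {}
    for i in range(1, n + 1):                      # basis k = 0
        for j in range(1, n + 1):
            r = "" if i == j else None              # ε for the empty path i -> i
            for a in alphabet:
                if delta.get((i, a)) == j:
                    r = union(r, re.escape(a))      # direct edge i -a-> j
            R[i, j] = r
    for k in range(1, n + 1):                      # now allow k as intermediate
        R = {(i, j): union(R[i, j], concat(R[i, k], star(R[k, k]), R[k, j]))
             for i in range(1, n + 1) for j in range(1, n + 1)}
    result = None
    for j in finals:                               # union of R_1j^(n), j final
        result = union(result, R[1, j])
    return result

# Hypothetical example DFA: strings over {0, 1} containing at least one 0.
delta = {(1, "1"): 1, (1, "0"): 2, (2, "0"): 2, (2, "1"): 2}
regex = dfa_to_regex(2, delta, finals=[2], alphabet="01")

def dfa_accepts(w):
    state = 1
    for a in w:
        state = delta[(state, a)]
    return state == 2

# Bounded check: the regex and the DFA agree on all strings of length <= 8.
for n in range(9):
    for chars in product("01", repeat=n):
        w = "".join(chars)
        assert (re.fullmatch(regex, w) is not None) == dfa_accepts(w)
```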

DFA to R.E. by inclusion: example

DFA to R.E. by elimination (1)
[Figure: eliminating the state s]

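DFA to R.E. by elimination (2)
Overall process:
(1) Label the edges with regular expressions
(2) For each final state, repeatedly eliminate every state other than the start state and that final state. If the two remaining states differ, the result is (R+SU*T)*SU*, where R labels the loop on the start state, S the edge from the start state to the final state, U the loop on the final state, and T the edge from the final state back to the start state; if the start state is itself the final state, the result is R*
(3) Take the union of all the expressions obtained for the final states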
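A compact Python sketch of the elimination process follows, assuming edge labels are Python regular-expression strings (`|` for +), a missing edge stands for Ø, and the automaton is given as a dictionary of labelled edges; all function names are hypothetical. The example is an assumed NFA chosen so that its language is the one described by the resulting R.E. on the example slide (strings whose second or third symbol from the end is 1); elimination works the same way on NFAs as on DFAs.

```python
import re
from itertools import product

# Edge labels are regex strings; a missing key means Ø, "" means ε.

def union(r, s):
    if r is None:
        return s
    if s is None:
        return r
    return f"(?:{r}|{s})"

def cat(*parts):
    # Parenthesize each non-empty factor so a "|" inside a label stays local.
    # Callers never pass None (Ø) here.
    return "".join(f"(?:{p})" for p in parts if p)

def star(r):
    return "" if r in (None, "") else f"(?:{r})*"

def eliminate(edges, s):
    """Remove state s: each path q -> s -> p becomes an edge labelled Q S* P."""
    loop = star(edges.get((s, s)))
    preds = [q for (q, t) in edges if t == s and q != s]
    succs = [p for (t, p) in edges if t == s and p != s]
    new = {(a, b): r for (a, b), r in edges.items() if s not in (a, b)}
    for q in preds:
        for p in succs:
            new[q, p] = union(new.get((q, p)), cat(edges[q, s], loop, edges[s, p]))
    return new

def to_regex(states, edges, start, finals):
    result = None
    for f in finals:
        e = edges
        for s in states:
            if s not in (start, f):       # keep only the start and this final state
                e = eliminate(e, s)
        R = e.get((start, start))
        if f == start:
            term = star(R)                                   # R*
        else:
            S, U, T = e.get((start, f)), e.get((f, f)), e.get((f, start))
            if S is None:
                continue                                     # f unreachable
            inner = R if T is None else union(R, cat(S, star(U), T))
            term = cat(star(inner), S, star(U))              # (R + SU*T)*SU*
        result = union(result, term)
    return result

# Assumed NFA: A loops on 0 and 1, A -1-> B, B -0,1-> C, C -0,1-> D; C, D final.
edges = {("A", "A"): "0|1", ("A", "B"): "1", ("B", "C"): "0|1", ("C", "D"): "0|1"}
regex = to_regex(["A", "B", "C", "D"], edges, "A", ["C", "D"])

def spec(w):
    return "1" in w[-3:-1]     # second or third symbol from the end is 1

for n in range(9):
    for chars in product("01", repeat=n):
        w = "".join(chars)
        assert (re.fullmatch(regex, w) is not None) == spec(w)
```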

DFA to R.E. by elimination: example
Resulting R.E.: (0+1)*1(0+1) + (0+1)*1(0+1)(0+1), or equivalently (0+1)*1(0+1)(ε+0+1)
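The two forms are equivalent because concatenation distributes over union: factoring the common prefix (0+1)*1(0+1) out of (0+1)*1(0+1) + (0+1)*1(0+1)(0+1) leaves (ε+0+1). A quick bounded check in Python's `re` syntax, where the empty alternative in `(|0|1)` plays the role of ε (an encoding assumed for this sketch):

```python
import re
from itertools import product

sum_form = r"(0|1)*1(0|1)|(0|1)*1(0|1)(0|1)"
factored = r"(0|1)*1(0|1)(|0|1)"

def second_or_third_from_end_is_1(w):
    return "1" in w[-3:-1]

# Both forms agree with the description on every binary string of length <= 8.
for n in range(9):
    for chars in product("01", repeat=n):
        w = "".join(chars)
        expected = second_or_third_from_end_is_1(w)
        assert (re.fullmatch(sum_form, w) is not None) == expected
        assert (re.fullmatch(factored, w) is not None) == expected
print("both forms agree on all binary strings up to length 8")
```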

Regular expression to ε-NFA

Regular expression to ε-NFA (2)
Regular expression: (0+1)*1(0+1)
[Figure: ε-NFAs for (a) 0+1, (b) (0+1)*, (c) (0+1)*1(0+1)]
How about a simpler one?
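The construction can be sketched in Python: a small recursive-descent parser for the slide notation (single-character symbols, +, *, juxtaposition, parentheses, with star > dot > union) builds one ε-NFA fragment per sub-expression, each with a single start and a single accepting state, and a simulator follows ε-closures. Class and function names are hypothetical, and ε itself is not supported as an input symbol in this sketch.

```python
from itertools import count

EPS = None  # label used for ε-transitions

class ENFA:
    """ε-NFA stored as: state -> list of (label, target)."""
    def __init__(self):
        self._ids = count()
        self.edges = {}

    def new_state(self):
        s = next(self._ids)
        self.edges[s] = []
        return s

    def add(self, src, label, dst):
        self.edges[src].append((label, dst))

def build(regex):
    """Return (enfa, start, accept) for a regex in slide notation."""
    nfa, pos = ENFA(), 0

    def peek():
        return regex[pos] if pos < len(regex) else None

    def union():                    # lowest precedence: E + F
        nonlocal pos
        s, f = concat()
        while peek() == "+":
            pos += 1
            s2, f2 = concat()
            start, acc = nfa.new_state(), nfa.new_state()
            for a, b in ((start, s), (start, s2), (f, acc), (f2, acc)):
                nfa.add(a, EPS, b)
            s, f = start, acc
        return s, f

    def concat():                   # EF: ε from accept of E to start of F
        s, f = closure()
        while peek() not in (None, "+", ")"):
            s2, f2 = closure()
            nfa.add(f, EPS, s2)
            f = f2
        return s, f

    def closure():                  # highest precedence: E*
        nonlocal pos
        s, f = atom()
        while peek() == "*":
            pos += 1
            start, acc = nfa.new_state(), nfa.new_state()
            for a, b in ((start, s), (start, acc), (f, s), (f, acc)):
                nfa.add(a, EPS, b)
            s, f = start, acc
        return s, f

    def atom():                     # a single symbol or (E)
        nonlocal pos
        c = peek()
        if c is None or c in "+*)":
            raise ValueError(f"unexpected {c!r} at position {pos}")
        pos += 1
        if c == "(":
            s, f = union()
            if peek() != ")":
                raise ValueError("missing ')'")
            pos += 1
            return s, f
        s, f = nfa.new_state(), nfa.new_state()
        nfa.add(s, c, f)
        return s, f

    start, accept = union()
    if pos != len(regex):
        raise ValueError(f"unexpected {regex[pos]!r} at position {pos}")
    return nfa, start, accept

def eclose(nfa, states):
    """All states reachable from `states` using ε-transitions only."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for label, r in nfa.edges[q]:
            if label is EPS and r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def accepts(regex, w):
    nfa, start, accept = build(regex)
    current = eclose(nfa, {start})
    for c in w:
        current = eclose(nfa, {r for q in current for label, r in nfa.edges[q] if label == c})
    return accept in current

print(accepts("(0+1)*1(0+1)", "0110"))  # True: second symbol from the end is 1
print(accepts("(0+1)*1(0+1)", "0101"))  # False
```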

Regular expressions in UNIX
In UNIX, regular expressions are widely used to represent patterns in text:
• “.” for any character
• “[abc]” for a+b+c
• “[a-z]” for any character between a and z
• “[:digit:]” for any digit, as [0-9]
• “[:alpha:]” for any letter, as [A-Za-z]

Regular expressions in UNIX (cont.)
• Infix “|” for union
• Suffix “?” for “zero or one of”
• Suffix “+” for “one or more of”
• Suffix “{n}” for “n copies of”
Note that this usage is not identical to how regular expressions are defined in our context: for example, our “+” is union, while the UNIX suffix “+” means “one or more of”.
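Python's `re` module follows these UNIX-style conventions closely enough for a quick demonstration, though it is not identical to any particular UNIX tool. In particular it does not support POSIX classes such as [:digit:] (which POSIX tools expect inside a bracket expression, as in [[:digit:]]), so the sketch uses [0-9] instead. The patterns and test strings are illustrative choices.

```python
import re

# Each entry: (pattern, a string it fully matches, a string it rejects)
examples = [
    (r"a.c",     "abc",    "ac"),        # "." : any single character
    (r"[abc]",   "b",      "d"),         # [abc] is a+b+c
    (r"[a-z]",   "q",      "Q"),         # a range of characters
    (r"[0-9]+",  "2024",   ""),          # suffix + : one or more (here, digits)
    (r"cat|dog", "dog",    "cow"),       # infix | : union
    (r"colou?r", "color",  "colouur"),   # suffix ? : zero or one
    (r"(ab){3}", "ababab", "abab"),      # suffix {n} : n copies
]
for pattern, good, bad in examples:
    assert re.fullmatch(pattern, good) is not None
    assert re.fullmatch(pattern, bad) is None
print("all examples behave as described")
```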

Laws for regular expressions
Two regular expressions are equivalent if and only if they specify the same language
• L + M = M + L
• (L + M) + N = L + (M + N)
• (LM)N = L(MN)
• Ø + L = L + Ø = L
• εL = Lε = L
• ØL = LØ = Ø

Laws for regular expressions (cont.)
• L(M + N) = LM + LN
• L + L = L
• (L*)* = L*
• Ø* = ε
• ε* = ε
• L⁺ = LL* = L*L
• L* = L⁺ + ε
• L? = ε + L

Proving a law of regular expressions
To prove a law of regular expressions:
1. Convert each regular expression into a concrete one by replacing each variable with a distinct symbol
2. Check that the two concrete regular expressions specify the same language
Theorem 3.14 proves that this procedure is correct, using the substitutability of languages for the variables
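Step 2 requires deciding whether two concrete regular expressions denote the same language; an exact test would convert both to automata and compare them. The Python sketch below (hypothetical helper, Python `re` syntax with `|` for +) only compares the two languages on all strings up to a length bound, so it can refute a proposed law with a concrete counterexample but can only lend support to a true one.

```python
import re
from itertools import product

def distinguishing_string(e1, e2, alphabet, max_len=8):
    """Return the shortest string of length <= max_len that is in exactly one
    of the two languages, or None if there is no such string."""
    r1, r2 = re.compile(e1), re.compile(e2)
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            w = "".join(chars)
            if (r1.fullmatch(w) is None) != (r2.fullmatch(w) is None):
                return w
    return None

# Step 1: replace each variable by its own symbol (L -> a, M -> b, N -> c).
# Step 2: compare the resulting concrete languages.
print(distinguishing_string(r"a(b|c)", r"ab|ac", "abc"))   # L(M + N) = LM + LN: None
print(distinguishing_string(r"aa*", r"a*a", "a"))          # LL* = L*L: None
print(distinguishing_string(r"a|a", r"a", "a"))            # L + L = L: None
print(repr(distinguishing_string(r"a*", r"aa*", "a")))     # L* is not L⁺: ''
```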

Proving the laws: examples
Check the following identities:
• Exercise 3.4.1(a): R + S = S + R
• Exercise 3.4.1(f): (R*)* = R*
• Exercise 3.4.2(a): (R + S)* = R* + S*
• Exercise 3.4.2(c): (RS + R)*RS = (RR*S)*
Solution: http://infolab.stanford.edu/~ullman/ialcsols/sol3.html#sol34
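Applying the substitute-and-compare idea to these four identities (R becomes a, S becomes b), a bounded comparison in Python separates the true identities from the false ones. This is a sketch: the helper name is hypothetical, the patterns use Python's `re` syntax with `|` for +, and a missing counterexample is evidence rather than proof.

```python
import re
from itertools import product

def counterexample(e1, e2, alphabet, max_len=8):
    """Shortest string (length <= max_len) in exactly one of the two languages, or None."""
    r1, r2 = re.compile(e1), re.compile(e2)
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            w = "".join(chars)
            if (r1.fullmatch(w) is None) != (r2.fullmatch(w) is None):
                return w
    return None

checks = {
    "3.4.1(a) R + S = S + R":         (r"a|b", r"b|a"),
    "3.4.1(f) (R*)* = R*":            (r"(a*)*", r"a*"),
    "3.4.2(a) (R + S)* = R* + S*":    (r"(a|b)*", r"a*|b*"),
    "3.4.2(c) (RS + R)*RS = (RR*S)*": (r"(ab|a)*ab", r"(aa*b)*"),
}
for name, (lhs, rhs) in checks.items():
    print(name, "->", repr(counterexample(lhs, rhs, "ab")))
# 3.4.1(a) and 3.4.1(f) -> None; 3.4.2(a) -> 'ab'; 3.4.2(c) -> ''
```

The string ab is in (R+S)* but not in R*+S*, and the empty string is in (RR*S)* but not in (RS+R)*RS, so both identities in 3.4.2 fail, while the identities in 3.4.1 produce no counterexample.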