CS 154, Lecture 3: DFA, NFA, Regular Expressions


Homework 1 is coming out …

Deterministic Finite Automata: Computation with finite memory

Non-Deterministic Finite Automata: Computation with finite memory and “verified guessing”

From NFAs to DFAs
Input: NFA $N = (Q, \Sigma, \delta, Q_0, F)$
Output: DFA $M = (Q', \Sigma, \delta', q_0', F')$
To learn if an NFA accepts, we could do the computation in parallel, maintaining the set of all possible states that can be reached.
Idea: Set $Q' = 2^Q$.

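From NFAs to DFAs: Subset Construction
Input: NFA $N = (Q, \Sigma, \delta, Q_0, F)$
Output: DFA $M = (Q', \Sigma, \delta', q_0', F')$
$Q' = 2^Q$
$\delta' : Q' \times \Sigma \to Q'$, where $\delta'(R, \sigma) = \bigcup_{r \in R} \varepsilon(\delta(r, \sigma))$ *
$q_0' = \varepsilon(Q_0)$
$F' = \{ R \subseteq Q \mid f \in R \text{ for some } f \in F \}$
* For $S \subseteq Q$, the ε-closure of $S$ is $\varepsilon(S) = \{ q \mid q \text{ is reachable from some } s \in S \text{ by taking 0 or more } \varepsilon\text{-transitions} \}$.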

Example of the ε-closure: $\varepsilon(\{q_0\}) = \{q_0, q_1, q_2\}$, $\varepsilon(\{q_1\}) = \{q_1, q_2\}$, $\varepsilon(\{q_2\}) = \{q_2\}$
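To make the ε-closure concrete, here is a minimal Python sketch of $\varepsilon(S)$ as a breadth-first search over ε-arrows. The slide's figure is not reproduced, so the ε-arrows $q_0 \to q_1$ and $q_1 \to q_2$ are an assumption chosen to agree with the three closures listed; `eps_closure` and `EPS_EDGES` are hypothetical names.

```python
from collections import deque

def eps_closure(states, eps_edges):
    """ε(S): all states reachable from S by taking 0 or more ε-transitions.

    eps_edges maps a state to the set of states one ε-transition away.
    """
    closure = set(states)
    queue = deque(closure)
    while queue:
        state = queue.popleft()
        for nxt in eps_edges.get(state, set()):
            if nxt not in closure:      # visit each state at most once
                closure.add(nxt)
                queue.append(nxt)
    return closure

# Assumed ε-arrows (the slide's figure is not reproduced): q0 -ε-> q1 -ε-> q2
EPS_EDGES = {"q0": {"q1"}, "q1": {"q2"}}

for q in ["q0", "q1", "q2"]:
    print(q, sorted(eps_closure({q}, EPS_EDGES)))
```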

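Given: NFA $N = (\{1, 2, 3\}, \{a, b\}, \delta, \{1\})$
Construct: Equivalent DFA $M = (2^{\{1, 2, 3\}}, \{a, b\}, \delta', \{1, 3\}, \ldots)$
[Figure: N (states 1, 2, 3, with arrows labeled a, b, and ε) beside M, whose states are subsets of {1, 2, 3} such as {1, 3}, {2}, {3}, {2, 3}, and {1, 2, 3}.]
The start state of M is $\varepsilon(\{1\}) = \{1, 3\}$. What about $\{1\}$ and $\{1, 2\}$? Every state M can reach is ε-closed, and any ε-closed set containing 1 also contains 3, so neither can be reached.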
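A sketch of the subset construction in Python that builds only the DFA states reachable from $q_0' = \varepsilon(Q_0)$. The transition relation below is reconstructed from the arrow labels in the figure and should be treated as an assumption, as should the accept set $F = \{1\}$ (the slide's tuple does not show it). `subset_construction` and the other names are hypothetical.

```python
from collections import deque

EPS = ""            # label used for ε-transitions
SIGMA = ["a", "b"]

# Assumed transition relation of N (reconstructed from the figure labels):
# 1 -b-> 2, 1 -ε-> 3, 2 -a-> {2, 3}, 2 -b-> 3, 3 -a-> 1
DELTA = {
    (1, "b"): {2}, (1, EPS): {3},
    (2, "a"): {2, 3}, (2, "b"): {3},
    (3, "a"): {1},
}

def eps_closure(states):
    closure, stack = set(states), list(states)
    while stack:
        for nxt in DELTA.get((stack.pop(), EPS), set()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return frozenset(closure)

def subset_construction(start_states, accept_states):
    """Return q0', the reachable DFA states, δ', and F'."""
    q0 = eps_closure(start_states)
    delta, seen, queue = {}, {q0}, deque([q0])
    while queue:
        R = queue.popleft()
        for sym in SIGMA:
            # δ'(R, σ) = union of ε(δ(r, σ)) over r in R;
            # taking ε of the combined set once gives the same result
            moved = set()
            for r in R:
                moved |= DELTA.get((r, sym), set())
            target = eps_closure(moved)
            delta[(R, sym)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # F' = all reachable R containing some f in F
    accepting = {R for R in seen if R & accept_states}
    return q0, seen, delta, accepting

# F = {1} is an assumption (not shown in the slide's tuple)
q0, dfa_states, dfa_delta, dfa_accept = subset_construction({1}, {1})
print(sorted(sorted(R) for R in dfa_states))  # [[], [1, 2, 3], [1, 3], [2], [2, 3], [3]]
```

Under these assumptions, six states are reachable: {1, 3}, {2}, {3}, {2, 3}, {1, 2, 3}, and ∅. Neither {1} nor {1, 2} appears.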

Reverse Theorem for Regular Languages
Theorem: The reverse of a regular language is also a regular language.
If a language can be recognized by a DFA that reads strings from right to left, then there is a “normal” DFA that accepts the same language.
Proof? Given a DFA for a language L, “reverse” its arrows and flip its start and accept states, getting an NFA (its set of start states is the DFA's set of accept states). Convert that NFA back to a DFA!
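The proof idea can be checked mechanically. This minimal Python sketch reverses a DFA's arrows and swaps its start and accept states, giving an NFA with a set of start states. It then compares that NFA against the original DFA run on reversed strings. The example DFA (strings ending in 01) and all function names are hypothetical, not taken from the lecture.

```python
from itertools import product

# Hypothetical example DFA over {0, 1}: accepts strings ending in "01".
DFA_DELTA = {
    ("s0", "0"): "s1", ("s0", "1"): "s0",
    ("s1", "0"): "s1", ("s1", "1"): "s2",
    ("s2", "0"): "s1", ("s2", "1"): "s0",
}
DFA_START, DFA_ACCEPT = "s0", {"s2"}

def reverse_dfa(delta, start, accept):
    """Reverse every arrow and swap start/accept. The result is an NFA whose
    set of start states is the old accept set; no ε-arrows are needed."""
    nfa_delta = {}
    for (p, sym), q in delta.items():
        # old arrow p -sym-> q becomes q -sym-> p
        nfa_delta.setdefault((q, sym), set()).add(p)
    return nfa_delta, set(accept), {start}

def nfa_accepts(nfa_delta, starts, accepts, w):
    current = set(starts)
    for sym in w:
        current = {p for q in current for p in nfa_delta.get((q, sym), set())}
    return bool(current & accepts)

def dfa_accepts(w):
    state = DFA_START
    for sym in w:
        state = DFA_DELTA[(state, sym)]
    return state in DFA_ACCEPT

REVERSED = reverse_dfa(DFA_DELTA, DFA_START, DFA_ACCEPT)

# Brute-force check up to length 6: the NFA accepts w iff the DFA accepts w reversed.
for n in range(7):
    for letters in product("01", repeat=n):
        w = "".join(letters)
        assert nfa_accepts(*REVERSED, w) == dfa_accepts(w[::-1])
```

Here the reversed language is the set of strings starting with 10. Running the subset construction on this NFA would produce the left-to-right DFA that the theorem promises.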

Using NFAs in place of DFAs can make proofs about regular languages much easier! Remember this on homework/exams!

Union Theorem using NFAs?
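One standard answer to the question is to add a new start state with ε-arrows to the start states of both machines. Below is a minimal Python sketch under that assumption. The two example NFAs (strings ending in 1, strings of even length) and every function name are hypothetical.

```python
EPS = ""  # label for ε-arrows

def close(delta, states):
    """ε-closure of a set of states."""
    result, stack = set(states), list(states)
    while stack:
        for p in delta.get((stack.pop(), EPS), set()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result

def accepts(nfa, w):
    delta, start, finals = nfa
    current = close(delta, {start})
    for sym in w:
        current = close(delta, {p for q in current for p in delta.get((q, sym), set())})
    return bool(current & finals)

def tag(nfa, t):
    """Rename every state q to (t, q) so the two machines share no states."""
    delta, start, finals = nfa
    new_delta = {((t, q), a): {(t, p) for p in ps} for (q, a), ps in delta.items()}
    return new_delta, (t, start), {(t, q) for q in finals}

def union_nfa(n1, n2):
    d1, s1, f1 = tag(n1, 1)
    d2, s2, f2 = tag(n2, 2)
    # new start state with ε-arrows to both old start states
    delta = {**d1, **d2, ("start", EPS): {s1, s2}}
    return delta, "start", f1 | f2

# Hypothetical example NFAs over {0, 1}: (transitions, start, accept states)
ENDS_IN_1 = ({("a", "0"): {"a"}, ("a", "1"): {"a", "b"}}, "a", {"b"})
EVEN_LENGTH = ({("e", "0"): {"o"}, ("e", "1"): {"o"},
                ("o", "0"): {"e"}, ("o", "1"): {"e"}}, "e", {"e"})

U = union_nfa(ENDS_IN_1, EVEN_LENGTH)
```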

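Regular Languages are closed under concatenation
Concatenation: $A \cdot B = \{ vw \mid v \in A \text{ and } w \in B \}$
Given DFAs $M_1$ and $M_2$, connect the accept states of $M_1$ to the start state of $M_2$ with ε-arrows. The accept states of the resulting NFA N are exactly those of $M_2$.
[Figure: example DFAs $M_1$ and $M_2$ over {0, 1}, joined by ε-arrows into an NFA N.]
$L(N) = L(M_1) \cdot L(M_2)$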
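Here is a minimal Python sketch of the concatenation construction. It adds ε-arrows from every accept state of $M_1$ to the start state of $M_2$, and only $M_2$'s accept states accept. The example DFAs ($M_1$: strings ending in 1; $M_2$: strings of 0s only) and the function names are hypothetical.

```python
from itertools import product

EPS = ""  # label for ε-arrows

def close(delta, states):
    result, stack = set(states), list(states)
    while stack:
        for p in delta.get((stack.pop(), EPS), set()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result

def accepts(nfa, w):
    delta, start, finals = nfa
    current = close(delta, {start})
    for sym in w:
        current = close(delta, {p for q in current for p in delta.get((q, sym), set())})
    return bool(current & finals)

def dfa_as_nfa(delta, start, finals, tag):
    """Wrap a DFA table as an NFA, renaming state q to (tag, q) to keep machines disjoint."""
    nfa_delta = {((tag, q), a): {(tag, p)} for (q, a), p in delta.items()}
    return nfa_delta, (tag, start), {(tag, q) for q in finals}

def concat_nfa(n1, n2):
    d1, s1, f1 = n1
    d2, s2, f2 = n2
    delta = {**d1, **d2}
    for f in f1:
        # ε-arrow from each accept state of M1 to the start state of M2
        delta.setdefault((f, EPS), set()).add(s2)
    return delta, s1, f2            # only M2's accept states remain accepting

# Hypothetical DFAs over {0, 1}: M1 = "ends in 1", M2 = "only 0s" (D is a trap state)
M1 = dfa_as_nfa({("A", "0"): "A", ("A", "1"): "B", ("B", "0"): "A", ("B", "1"): "B"},
                "A", {"B"}, 1)
M2 = dfa_as_nfa({("Z", "0"): "Z", ("Z", "1"): "D", ("D", "0"): "D", ("D", "1"): "D"},
                "Z", {"Z"}, 2)
N = concat_nfa(M1, M2)

def all_strings(max_len):
    for n in range(max_len + 1):
        for letters in product("01", repeat=n):
            yield "".join(letters)
```

In this example, $L(M_1) \cdot L(M_2)$ is the set of strings containing at least one 1. Such a string splits right after its last 1 into a part ending in 1 and a part of 0s only.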

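Regular Languages are closed under star
$A^* = \{ s_1 \ldots s_k \mid k \geq 0 \text{ and each } s_i \in A \}$
Let M be a DFA, and let $L = L(M)$. We can construct an NFA N that recognizes $L^*$.
[Figure: N is M plus a new accepting start state, with ε-arrows to M's start state from the new start state and from each accept state of M.]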

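Formally, the construction is:
Input: DFA $M = (Q, \Sigma, \delta, q_1, F)$
Output: NFA $N = (Q', \Sigma, \delta', q_0, F')$
$Q' = Q \cup \{q_0\}$
$F' = F \cup \{q_0\}$
$$\delta'(q, a) = \begin{cases} \{\delta(q, a)\} & \text{if } q \in Q \text{ and } a \neq \varepsilon \\ \{q_1\} & \text{if } q \in F \text{ and } a = \varepsilon \\ \{q_1\} & \text{if } q = q_0 \text{ and } a = \varepsilon \\ \emptyset & \text{if } q = q_0 \text{ and } a \neq \varepsilon \\ \emptyset & \text{else} \end{cases}$$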
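The formal construction translates almost line for line into code. Below is a Python sketch. It assumes states are strings and that the name `"q0"` is not already used by M. The example DFA, which recognizes exactly the string ab, is hypothetical. A missing dictionary entry plays the role of ∅ in the definition of $\delta'$. The final loop checks $L(N) = L(M)^* = (ab)^*$ on all short strings. That is evidence for the two inclusions, not a proof of them.

```python
import re
from itertools import product

EPS = ""  # label for ε-transitions

def star_nfa(Q, sigma, delta, q1, F):
    """Build N = (Q', Σ, δ', q0, F') from DFA M = (Q, Σ, δ, q1, F)."""
    q0 = "q0"                                   # assumed fresh: not already a state of M
    delta_n = {}
    for q in Q:
        for a in sigma:
            delta_n[(q, a)] = {delta[(q, a)]}   # {δ(q, a)} if q ∈ Q and a ≠ ε
    for q in F:
        delta_n[(q, EPS)] = {q1}                # {q1} if q ∈ F and a = ε
    delta_n[(q0, EPS)] = {q1}                   # {q1} if q = q0 and a = ε
    # every other (state, label) pair is absent, i.e. maps to ∅
    return Q | {q0}, sigma, delta_n, q0, F | {q0}

def accepts(nfa, w):
    _, _, delta, start, finals = nfa

    def close(states):
        result, stack = set(states), list(states)
        while stack:
            for p in delta.get((stack.pop(), EPS), set()):
                if p not in result:
                    result.add(p)
                    stack.append(p)
        return result

    current = close({start})
    for sym in w:
        current = close({p for q in current for p in delta.get((q, sym), set())})
    return bool(current & finals)

# Hypothetical DFA M over {a, b} with L(M) = {"ab"}; "dead" is a trap state.
Q = {"q1", "q2", "q3", "dead"}
M_DELTA = {
    ("q1", "a"): "q2", ("q1", "b"): "dead",
    ("q2", "a"): "dead", ("q2", "b"): "q3",
    ("q3", "a"): "dead", ("q3", "b"): "dead",
    ("dead", "a"): "dead", ("dead", "b"): "dead",
}
N = star_nfa(Q, {"a", "b"}, M_DELTA, "q1", {"q3"})

# Check L(N) = L(M)* = (ab)* on every string of length <= 6.
for n in range(7):
    for letters in product("ab", repeat=n):
        w = "".join(letters)
        assert accepts(N, w) == (re.fullmatch("(ab)*", w) is not None)
```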

Regular Languages are Closed Under Star
How would we prove that this NFA construction works?
Want to show: $L(N) = L^*$
1. $L(N) \supseteq L^*$
2. $L(N) \subseteq L^*$

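1. $L(N) \supseteq L^*$
Assume $w = w_1 \ldots w_k$ is in $L^*$, where $w_1, \ldots, w_k \in L$. We show N accepts w by induction on k.
Base cases: k = 0 ($w = \varepsilon$): N accepts, since its start state $q_0$ is accepting. k = 1 ($w \in L$): N takes the ε-arrow from $q_0$ to $q_1$, then runs M on w and ends in an accept state.
Inductive step: Assume N accepts all strings $v = v_1 \ldots v_k \in L^*$ with $v_i \in L$. Let $u = u_1 \ldots u_k u_{k+1} \in L^*$ with $u_j \in L$. Since N accepts $u_1 \ldots u_k$ (by induction) and M accepts $u_{k+1}$, N also accepts u (by construction: from the accept state reached after $u_1 \ldots u_k$, N takes an ε-arrow to $q_1$ and then runs M on $u_{k+1}$).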

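2. $L(N) \subseteq L^*$
Assume w is accepted by N; we want to show $w \in L^*$. We induct on the number of ε-transitions N takes while accepting w.
Base case: if N takes no ε-transitions, it never leaves $q_0$, so $w = \varepsilon$, and $\varepsilon \in L^*$.
I.H.: If N accepts u while taking at most k ε-transitions, then $u \in L^*$.
Inductive step: Let w be accepted by N with k+1 ε-transitions. Write w as $w = uv$, where v is the substring read after the last ε-transition. That transition starts from an accept state of N and goes to $q_1$, so N accepts u with at most k ε-transitions, and $u \in L^*$ by the I.H. After it, N reads v using only M's arrows from $q_1$ and ends in F, so $v \in L$. Therefore $w = uv \in L^*$.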

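Closure Properties for Regular Languages
Union: $A \cup B = \{ w \mid w \in A \text{ or } w \in B \}$
Intersection: $A \cap B = \{ w \mid w \in A \text{ and } w \in B \}$
Complement: $\neg A = \{ w \in \Sigma^* \mid w \notin A \}$
Reverse: $A^R = \{ w_1 \ldots w_k \mid w_k \ldots w_1 \in A,\ w_i \in \Sigma \}$
Concatenation: $A \cdot B = \{ vw \mid v \in A \text{ and } w \in B \}$
Star: $A^* = \{ s_1 \ldots s_k \mid k \geq 0 \text{ and each } s_i \in A \}$
Theorem: if A and B are regular, then so are $A \cup B$, $A \cap B$, $\neg A$, $A^R$, $A \cdot B$, and $A^*$.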
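For two of the closure properties, a DFA-level construction is very short. For complement, swap the accepting and rejecting states of a complete DFA. For intersection, use the product construction, which runs both DFAs in lockstep on pairs of states. Below is a Python sketch of both; the example DFAs and all names are hypothetical.

```python
from itertools import product

SIGMA = "01"

# A DFA is (states, delta, start, accepts) with a complete transition table.
# Two hypothetical example DFAs over {0, 1}:
EVEN_ZEROS = ({"e", "o"},
              {("e", "0"): "o", ("e", "1"): "e", ("o", "0"): "e", ("o", "1"): "o"},
              "e", {"e"})
ENDS_IN_1 = ({"x", "y"},
             {("x", "0"): "x", ("x", "1"): "y", ("y", "0"): "x", ("y", "1"): "y"},
             "x", {"y"})

def run(dfa, w):
    _, delta, state, accepts = dfa
    for a in w:
        state = delta[(state, a)]
    return state in accepts

def complement(dfa):
    states, delta, start, accepts = dfa
    return states, delta, start, states - accepts   # swap accepting and rejecting states

def intersection(d1, d2):
    """Product construction: a state is a pair (state of d1, state of d2)."""
    s1, t1, q1, f1 = d1
    s2, t2, q2, f2 = d2
    states = set(product(s1, s2))
    delta = {((p, q), a): (t1[(p, a)], t2[(q, a)]) for (p, q) in states for a in SIGMA}
    return states, delta, (q1, q2), set(product(f1, f2))

def all_strings(max_len):
    for n in range(max_len + 1):
        for letters in product(SIGMA, repeat=n):
            yield "".join(letters)

NOT_EVEN = complement(EVEN_ZEROS)
BOTH = intersection(EVEN_ZEROS, ENDS_IN_1)
```

Union then follows from these two by De Morgan's law, $A \cup B = \neg(\neg A \cap \neg B)$.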

Regular Expressions: Computation as simple, logical description
A totally different way of thinking about computation: What is the complexity of describing the strings in the language?

Inductive Definition of Regexp
Let Σ be an alphabet. We define the regular expressions over Σ inductively:
For all $\sigma \in \Sigma$, $\sigma$ is a regexp.
ε is a regexp.
∅ is a regexp.
If $R_1$ and $R_2$ are both regexps, then $(R_1 R_2)$, $(R_1 + R_2)$, and $(R_1)^*$ are regexps.
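The inductive definition maps directly onto a recursive check. This minimal Python sketch assumes a tuple encoding chosen for illustration: `("sym", σ)`, `("eps",)`, `("empty",)`, `("cat", R1, R2)`, `("plus", R1, R2)`, and `("star", R1)`. The names `is_regexp` and `show` are hypothetical.

```python
def is_regexp(r, sigma):
    """True iff r is built by the inductive rules over alphabet sigma."""
    if not isinstance(r, tuple) or not r:
        return False
    kind, *args = r
    if kind == "sym":                       # σ for σ ∈ Σ
        return len(args) == 1 and args[0] in sigma
    if kind in ("eps", "empty"):            # ε and ∅
        return not args
    if kind in ("cat", "plus"):             # (R1 R2) and (R1 + R2)
        return len(args) == 2 and all(is_regexp(a, sigma) for a in args)
    if kind == "star":                      # (R1)*
        return len(args) == 1 and is_regexp(args[0], sigma)
    return False

def show(r):
    """Fully parenthesized rendering, one case per rule of the definition."""
    kind, *args = r
    if kind == "sym":
        return args[0]
    if kind == "eps":
        return "ε"
    if kind == "empty":
        return "∅"
    if kind == "cat":
        return f"({show(args[0])}{show(args[1])})"
    if kind == "plus":
        return f"({show(args[0])} + {show(args[1])})"
    return f"({show(args[0])})*"            # kind == "star"

R = ("plus", ("cat", ("star", ("sym", "a")), ("sym", "b")), ("sym", "c"))
print(show(R))  # (((a)*b) + c)
```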

Precedence Order: * binds tightest, then concatenation, then +
Example: $R_1^* R_2 + R_3 = ((R_1^*) \cdot R_2) + R_3$
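One way to see the precedence order is a small recursive-descent parser with one method per level: `union` for +, `concat` for juxtaposition, and `starred` for *. Below is a Python sketch. It assumes an illustrative tuple encoding (`"plus"`, `"cat"`, `"star"`, `"sym"`). It treats every other single character as an alphabet symbol and does not handle ε or ∅ specially.

```python
class Parser:
    """Recursive-descent parser: '+' binds loosest, then concatenation, then '*'."""

    def __init__(self, text):
        self.text = text.replace(" ", "")
        self.pos = 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else None

    def expect(self, ch):
        if self.peek() != ch:
            raise SyntaxError(f"expected {ch!r} at position {self.pos}")
        self.pos += 1

    def parse(self):
        tree = self.union()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r} at position {self.pos}")
        return tree

    def union(self):        # R1 + R2 + ...   (lowest precedence)
        tree = self.concat()
        while self.peek() == "+":
            self.expect("+")
            tree = ("plus", tree, self.concat())
        return tree

    def concat(self):       # R1 R2 ...       (middle precedence)
        tree = self.starred()
        while self.peek() is not None and self.peek() not in "+)":
            tree = ("cat", tree, self.starred())
        return tree

    def starred(self):      # R*              (highest precedence)
        tree = self.atom()
        while self.peek() == "*":
            self.expect("*")
            tree = ("star", tree)
        return tree

    def atom(self):
        ch = self.peek()
        if ch == "(":
            self.expect("(")
            tree = self.union()
            self.expect(")")
            return tree
        if ch is None or ch in "+*)":
            raise SyntaxError(f"unexpected {ch!r} at position {self.pos}")
        self.pos += 1
        return ("sym", ch)

print(Parser("a*b + c").parse())
# ('plus', ('cat', ('star', ('sym', 'a')), ('sym', 'b')), ('sym', 'c'))
```

The printed tree has the shape $((a^*) \cdot b) + c$, matching the example's grouping.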

Definition: Regexps Represent Languages
The regexp $\sigma \in \Sigma$ represents the language $\{\sigma\}$.
The regexp ε represents $\{\varepsilon\}$.
The regexp ∅ represents ∅.
If $R_1$ and $R_2$ are regular expressions representing $L_1$ and $L_2$, then:
$(R_1 R_2)$ represents $L_1 \cdot L_2$
$(R_1 + R_2)$ represents $L_1 \cup L_2$
$(R_1)^*$ represents $L_1^*$
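Since L(R) is usually infinite, a program can only list part of it. This Python sketch computes the strings of L(R) of length at most n, with one case per clause of the definition. It assumes a tuple encoding for regexps chosen for illustration: `("sym", σ)`, `("eps",)`, `("empty",)`, `("cat", R1, R2)`, `("plus", R1, R2)`, `("star", R1)`. The function name `lang` is hypothetical.

```python
def lang(r, n):
    """The strings of L(r) of length at most n."""
    kind = r[0]
    if kind == "empty":
        return set()                                    # ∅ represents ∅
    if kind == "eps":
        return {""}                                     # ε represents {ε}
    if kind == "sym":
        return {r[1]} if n >= 1 else set()              # σ represents {σ}
    if kind == "plus":
        return lang(r[1], n) | lang(r[2], n)            # L1 ∪ L2
    if kind == "cat":                                   # L1 · L2
        left, right = lang(r[1], n), lang(r[2], n)
        return {u + v for u in left for v in right if len(u + v) <= n}
    if kind == "star":                                  # L1* = {ε} ∪ L1 ∪ L1L1 ∪ ...
        base = lang(r[1], n)
        result, frontier = {""}, {""}
        while frontier:                                 # stops once no new short strings appear
            frontier = {u + v for u in frontier for v in base if len(u + v) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"not a regexp: {r!r}")

print(lang(("star", ("empty",)), 5))   # {''}: the regexp ∅* represents {ε}
```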

Regexps Represent Languages
For every regexp R, define L(R) to be the language that R represents.
A string $w \in \Sigma^*$ is accepted by R (or, w matches R) if $w \in L(R)$.
Examples: 0, 010, and 01010 match (01)*0; 110101110100100 matches (0+1)*0.
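To experiment with matching, Python's standard `re` module can stand in for these regexps after a small translation. In Python, `+` means "one or more", so the lecture's union `+` must become `|`, and ε becomes the empty string. The mapping in the hypothetical helper `to_python` is an assumption that covers only the notation used here; ∅ is not handled. The code uses `re.fullmatch` because w must match the whole regexp.

```python
import re

def to_python(regexp):
    """Assumed mapping from lecture notation to Python `re` syntax:
    '+' (union) -> '|', 'ε' -> empty string, spaces dropped. ∅ is not handled."""
    return regexp.replace(" ", "").replace("+", "|").replace("ε", "")

def matches(w, regexp):
    # fullmatch: the whole string w must be in L(R), not just a prefix or substring
    return re.fullmatch(to_python(regexp), w) is not None

for w in ["0", "010", "01010"]:
    print(w, matches(w, "(01)*0"))
print(matches("110101110100100", "(0+1)*0"))
```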

Assume Σ = {0, 1}
{ w | w has exactly a single 1 }: 0*10*
{ w | w contains 001 }: (0+1)*001(0+1)*

Assume Σ = {0, 1}
What language does the regexp ∅* represent? {ε}

Assume Σ = {0, 1}
{ w | w has length ≥ 3 and its 3rd symbol is 0 }: (0+1)(0+1)0(0+1)*

Assume Σ = {0, 1}
{ w | every odd position in w is a 1 }: (1(0 + 1))*(1 + ε)

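Assume Σ = {0, 1}
{ w | w has equal number of occurrences of 01 and 10 }
= { w | w = 1, w = 0, or w = ε, or w starts with a 0 and ends with a 0, or w starts with a 1 and ends with a 1 }
Claim: A nonempty string w has equal occurrences of 01 and 10 ⇔ w starts and ends with the same bit. (Reading left to right, each 01 or 10 is a change of bit, and these changes alternate, so the two counts are equal exactly when the first and last bits agree.)
Regexp: 1 + 0 + ε + 0(0+1)*0 + 1(0+1)*1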

L can be represented by some regexp ⇔ L is regular

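L can be represented by some regexp ⇒ L is regular
Given any regexp R, we will construct an NFA N such that N accepts exactly the strings accepted by R.
Proof by induction on the length of the regexp R.
Base Cases (R has length 1): $R = \sigma$ for some $\sigma \in \Sigma$, $R = \varepsilon$, or $R = \emptyset$.
[Figures: for $R = \sigma$, a start state with a σ-arrow to an accept state; for $R = \varepsilon$, a single state that is both start and accept state; for $R = \emptyset$, a single non-accepting start state.]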


Induction Step: Suppose every regexp of length < k represents some regular language. Consider a regexp R of length k > 1. There are three possibilities for R: $R = R_1 + R_2$, $R = R_1 R_2$, or $R = (R_1)^*$.
Case $R = R_1 + R_2$: By induction, $R_1$ and $R_2$ represent some regular languages, $L_1$ and $L_2$. But $L(R) = L(R_1 + R_2) = L_1 \cup L_2$, so L(R) is regular, by the union theorem!

Case $R = R_1 R_2$: By induction, $R_1$ and $R_2$ represent some regular languages, $L_1$ and $L_2$. But $L(R) = L(R_1 \cdot R_2) = L_1 \cdot L_2$, so L(R) is regular, by the concatenation theorem.

Case $R = (R_1)^*$: By induction, $R_1$ represents some regular language $L_1$. But $L(R) = L(R_1^*) = L_1^*$, so L(R) is regular, by the star theorem.
Therefore: If L is represented by a regexp, then L is regular.
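The induction is effectively a recursive algorithm. This Python sketch builds an NFA from a regexp by recursion on its structure. It assumes a tuple encoding chosen for illustration: `("sym", σ)`, `("eps",)`, `("empty",)`, `("cat", R1, R2)`, `("plus", R1, R2)`, `("star", R1)`. The sub-machines are glued with ε-arrows in the style of Thompson's construction. This is a standard variant, not the exact union, concatenation, and star constructions from these slides. Every machine here has one start state with no incoming arrows and one accept state with no outgoing arrows, which keeps the gluing correct. The example run uses (1(0+1))*. `to_nfa` and `accepts` are hypothetical names.

```python
from itertools import count, product

EPS = ""  # label for ε-arrows

def to_nfa(r, fresh=None):
    """Return (delta, start, accept) with L(N) = L(r), by recursion on r."""
    if fresh is None:
        fresh = count()                       # supplies unused integer state names
    start, accept = next(fresh), next(fresh)
    delta = {}

    def arrow(p, label, q):
        delta.setdefault((p, label), set()).add(q)

    kind = r[0]
    if kind == "sym":                         # base case σ
        arrow(start, r[1], accept)
    elif kind == "eps":                       # base case ε
        arrow(start, EPS, accept)
    elif kind == "empty":                     # base case ∅: no arrows at all
        pass
    elif kind in ("plus", "cat", "star"):
        parts = [to_nfa(sub, fresh) for sub in r[1:]]
        for sub_delta, _, _ in parts:         # copy the sub-machines' arrows
            for (p, label), targets in sub_delta.items():
                for q in targets:
                    arrow(p, label, q)
        if kind == "plus":                    # L1 ∪ L2
            for _, s, f in parts:
                arrow(start, EPS, s)
                arrow(f, EPS, accept)
        elif kind == "cat":                   # L1 · L2
            (_, s1, f1), (_, s2, f2) = parts
            arrow(start, EPS, s1)
            arrow(f1, EPS, s2)
            arrow(f2, EPS, accept)
        else:                                 # L1*
            ((_, s1, f1),) = parts
            arrow(start, EPS, accept)         # zero copies
            arrow(start, EPS, s1)
            arrow(f1, EPS, s1)                # one more copy
            arrow(f1, EPS, accept)
    else:
        raise ValueError(f"not a regexp: {r!r}")
    return delta, start, accept

def accepts(nfa, w):
    delta, start, accept = nfa

    def close(states):
        result, stack = set(states), list(states)
        while stack:
            for q in delta.get((stack.pop(), EPS), set()):
                if q not in result:
                    result.add(q)
                    stack.append(q)
        return result

    current = close({start})
    for sym in w:
        current = close({q for p in current for q in delta.get((p, sym), set())})
    return accept in current

# (1(0+1))* in the tuple encoding
R = ("star", ("cat", ("sym", "1"), ("plus", ("sym", "0"), ("sym", "1"))))
N = to_nfa(R)
```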

Give an NFA that accepts the language represented by (1(0 + 1))*
[Figure: an NFA for this regexp, with arrows labeled ε, 1, and 0, 1.]
Regular expression: (1(0+1))*

Generalized NFAs (GNFA)
L can be represented by a regexp ⇐ L is a regular language
Idea: Transform an NFA for L into a regular expression by removing states and re-labeling the arcs with regular expressions.
Rather than reading in just letters from the string on a step, we can read in entire substrings.

Generalized NFA (GNFA)
Is aaabcbcba accepted or rejected? Is bcba accepted or rejected?
This GNFA recognizes L(a*b(cb)*a).
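The two questions can be answered with the regexp the slide gives for this GNFA. The Python sketch below uses the standard `re` module. The GNFA's own figure is not reproduced, so this checks membership in L(a*b(cb)*a) rather than simulating the machine. `PATTERN` is a hypothetical name.

```python
import re

# L(a*b(cb)*a); the operators used here mean the same thing in Python's `re` syntax
PATTERN = re.compile(r"a*b(cb)*a")

for w in ["aaabcbcba", "bcba"]:
    print(w, "accepted" if PATTERN.fullmatch(w) else "rejected")
```

Both strings are accepted: aaabcbcba splits as aaa · b · cbcb · a, and bcba splits as b · cb · a.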

NFA: Add unique start and accept states.

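NFA: While the machine has more than 2 states: Pick an internal state, rip it out, and re-label the arrows with regexps, to account for paths through the missing state.
[Figure: an internal state with an incoming arrow labeled 0, a self-loop labeled 1, and an outgoing arrow labeled 0 is ripped out; the new arrow is labeled 01*0.]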

GNFA: While the machine has more than 2 states:
In general, when internal state $q_2$ is ripped out, the arrow from $q_1$ to $q_3$ is re-labeled
$R(q_1, q_3) \leftarrow R(q_1, q_2)\, R(q_2, q_2)^*\, R(q_2, q_3) + R(q_1, q_3)$

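Example: In N, state $q_1$ has a self-loop labeled a and an arrow labeled b to $q_2$, and $q_2$ has a self-loop labeled a, b. Add a new start state $q_0$ with an ε-arrow to $q_1$, and a new accept state $q_3$ with an ε-arrow from $q_2$. Ripping out $q_1$ gives $R(q_0, q_2) = a^*b$; ripping out $q_2$ then gives
$R(q_0, q_3) = (a^*b)(a+b)^*$, which represents L(N).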
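The ripping loop fits in a few lines of Python. This minimal sketch assumes arrow labels are regexp strings in the lecture's notation and that a missing arrow means ∅. The example's arrows are reconstructed from the figure labels and are an assumption. The result is not simplified, so it carries redundant parentheses and ε's. `rip` and `gnfa_to_regexp` are hypothetical names.

```python
import re
from itertools import product

def rip(states, edges, r):
    """Remove internal state r. For every remaining pair (p, q):
    R(p, q) <- R(p, r) R(r, r)* R(r, q) + R(p, q)
    A missing key in `edges` means the label ∅."""
    loop = edges.get((r, r))
    loop_part = f"({loop})*" if loop is not None else ""   # ∅* = ε, so it can be dropped
    rest = [s for s in states if s != r]
    new_edges = {}
    for p in rest:
        for q in rest:
            direct = edges.get((p, q))
            into, out = edges.get((p, r)), edges.get((r, q))
            label = direct
            if into is not None and out is not None:
                through = f"({into}){loop_part}({out})"
                label = through if direct is None else f"{through}+{direct}"
            if label is not None:
                new_edges[(p, q)] = label
    return rest, new_edges

def gnfa_to_regexp(states, edges, start, accept):
    """Rip out internal states until only start and accept remain."""
    for r in [s for s in states if s not in (start, accept)]:
        states, edges = rip(states, edges, r)
    return edges.get((start, accept))      # None would mean the language is ∅

# Assumed arrows of the example, with unique start q0 and accept q3 added:
# q0 -ε-> q1, q1 -a-> q1, q1 -b-> q2, q2 -a,b-> q2, q2 -ε-> q3
STATES = ["q0", "q1", "q2", "q3"]
EDGES = {("q0", "q1"): "ε", ("q1", "q1"): "a", ("q1", "q2"): "b",
         ("q2", "q2"): "a+b", ("q2", "q3"): "ε"}

regexp = gnfa_to_regexp(STATES, EDGES, "q0", "q3")
print(regexp)  # ((ε)(a)*(b))(a+b)*(ε), which simplifies to (a*b)(a+b)*

# Sanity check in Python `re` syntax ('+' -> '|', 'ε' -> empty):
# the language should be all strings over {a, b} containing at least one b.
python_pattern = regexp.replace("+", "|").replace("ε", "")
for n in range(7):
    for letters in product("ab", repeat=n):
        w = "".join(letters)
        assert (re.fullmatch(python_pattern, w) is not None) == ("b" in w)
```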

[Figure: DFAs, NFAs, and Regular Expressions are equivalent ways of describing the Regular Languages; DFAs give the DEFINITION.]

Parting thoughts:
Regular languages can be defined by their closure properties.
NFA = DFA: does it mean that non-determinism is free for finite automata?
Questions?