CS 154, Lecture 3: DFAs, NFAs, Regular Expressions

Homework 1 is coming out …

Deterministic Finite Automata
Computation with finite memory

Non-Deterministic Finite Automata
Computation with finite memory and “verified guessing”

From NFAs to DFAs
Input: NFA N = (Q, Σ, δ, Q_0, F)
Output: DFA M = (Q′, Σ, δ′, q_0′, F′)
To learn whether an NFA accepts, we could do the computation in parallel, maintaining the set of all possible states that can be reached.
Idea: Set Q′ = 2^Q

From NFAs to DFAs: Subset Construction
Input: NFA N = (Q, Σ, δ, Q_0, F)
Output: DFA M = (Q′, Σ, δ′, q_0′, F′)
Q′ = 2^Q
δ′ : Q′ × Σ → Q′
δ′(R, σ) = ∪_{r ∈ R} ε(δ(r, σ))   (*)
q_0′ = ε(Q_0)
F′ = { R ∈ Q′ | f ∈ R for some f ∈ F }
(*) For S ⊆ Q, the ε-closure of S is ε(S) = { q | q reachable from some s ∈ S by taking 0 or more ε-transitions }
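The subset construction can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the lecture's own code: the NFA's transition function is a dict from (state, symbol) to a set of states, the empty string stands for ε, and only the subsets reachable from the start state are built (the slide takes all of 2^Q, which gives an equivalent but larger machine). The example NFA, for strings over {0, 1} ending in 01, is hypothetical.

```python
EPS = ""  # assumed label for ε-transitions in this sketch

def eps_closure(states, delta):
    """Return ε(S): every state reachable from S by 0 or more ε-transitions."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)

def subset_construction(sigma, delta, starts, accepts):
    """Build the reachable part of the DFA M from the NFA N = (Q, Σ, δ, Q_0, F)."""
    start = eps_closure(starts, delta)            # q_0' = ε(Q_0)
    dfa_delta, seen, todo = {}, {start}, [start]
    while todo:
        R = todo.pop()
        for a in sigma:
            moved = set()
            for r in R:                           # union over r in R of δ(r, a)
                moved |= delta.get((r, a), set())
            target = eps_closure(moved, delta)    # then take the ε-closure
            dfa_delta[(R, a)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    final = {R for R in seen if R & set(accepts)}  # F': subsets containing some f in F
    return seen, dfa_delta, start, final

def dfa_accepts_string(dfa, w):
    """Run the DFA produced above on the string w."""
    _, dfa_delta, state, final = dfa
    for a in w:
        state = dfa_delta[(state, a)]
    return state in final

# Hypothetical NFA: strings over {0, 1} that end in 01.
NFA_DELTA = {("s", "0"): {"s", "t"}, ("s", "1"): {"s"}, ("t", "1"): {"u"}}
EXAMPLE_DFA = subset_construction("01", NFA_DELTA, {"s"}, {"u"})
```

Building only reachable subsets is a design choice for the sketch; the unreachable subsets never affect which strings are accepted.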

Example of the ε-closure
ε({q_0}) = {q_0, q_1, q_2}
ε({q_1}) = {q_1, q_2}
ε({q_2}) = {q_2}

Given: NFA N = ({1, 2, 3}, {a, b}, δ, {1})
Construct: Equivalent DFA M = (2^{1, 2, 3}, {a, b}, δ′, {1, 3}, …)
ε({1}) = {1, 3}
[Figure: state diagrams of N (states 1, 2, 3) and M (states {1, 3}, {2}, {3}, {2, 3}, {1, 2, 3}); the slide asks about the states {1} and {1, 2}]

Reverse Theorem for Regular Languages
Theorem: The reverse of a regular language is also a regular language.
If a language can be recognized by a DFA that reads strings from right to left, then there is a “normal” DFA that accepts the same language.
Proof? Given a DFA for a language L, “reverse” its arrows and flip its start and accept states, getting an NFA. Convert that NFA back to a DFA!
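The arrow-reversal step of this proof can be sketched in Python. The sketch assumes a DFA given as a dict from (state, symbol) to a state, and produces an NFA with a set of start states, matching the Q_0 convention used in this lecture. The example DFA (strings over {0, 1} that start with 1) and all function names are hypothetical.

```python
def reverse_dfa(delta, start, accepts):
    """Reverse every arrow of a DFA and swap the roles of start and accept states.

    Returns (reversed transitions, set of start states, set of accept states).
    """
    rev = {}
    for (q, a), r in delta.items():
        rev.setdefault((r, a), set()).add(q)   # arrow q --a--> r becomes r --a--> q
    return rev, set(accepts), {start}

def nfa_accepts(nfa, w):
    """Simulate the NFA by tracking the set of states it could be in."""
    rev, current, final = nfa
    current = set(current)
    for a in w:
        nxt = set()
        for q in current:
            nxt |= rev.get((q, a), set())
        current = nxt
    return bool(current & final)

# Hypothetical DFA: accepts strings that start with 1 ("n" is a dead state).
DFA_DELTA = {("i", "1"): "y", ("i", "0"): "n",
             ("y", "0"): "y", ("y", "1"): "y",
             ("n", "0"): "n", ("n", "1"): "n"}
REVERSED = reverse_dfa(DFA_DELTA, "i", {"y"})
```

Here the reversed machine accepts exactly the strings that end with 1; it is nondeterministic because several arrows may now leave a state on the same symbol.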

Using NFAs in place of DFAs can make proofs about regular languages much easier!
Remember this on homework/exams!

Union Theorem using NFAs?

Regular Languages are closed under concatenation
Concatenation: A · B = { vw | v ∈ A and w ∈ B }
Given DFAs M_1 and M_2, connect the accept states of M_1 to the start state of M_2 with ε-transitions.
[Figure: state diagrams of M_1 and M_2, joined by ε-transitions into an NFA N]
L(N) = L(M_1) · L(M_2)

Regular Languages are closed under star
A* = { s_1 … s_k | k ≥ 0 and each s_i ∈ A }
Let M be a DFA, and let L = L(M).
We can construct an NFA N that recognizes L*.
[Figure: state diagram of M with a new start state and added ε-transitions]

Formally, the construction is:
Input: DFA M = (Q, Σ, δ, q_1, F)
Output: NFA N = (Q′, Σ, δ′, q_0, F′)
Q′ = Q ∪ {q_0}
F′ = F ∪ {q_0}
δ′(q, a) =
  {δ(q, a)}   if q ∈ Q and a ≠ ε
  {q_1}       if q ∈ F and a = ε
  {q_1}       if q = q_0 and a = ε
  ∅           if q = q_0 and a ≠ ε
  ∅           else
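The star construction can be written out in Python along these lines. This is a sketch under assumptions: the DFA's δ is a dict from (state, symbol) to a state, the empty string stands for ε, and missing dict entries play the role of ∅. The example DFA M with L(M) = {01} is hypothetical, so N should recognize (01)*.

```python
EPS = ""  # assumed label for ε-transitions

def star_nfa(states, delta, q1, final, q0="q0"):
    """Build N = (Q', Σ, δ', q0, F') from the DFA M = (Q, Σ, δ, q1, F)."""
    assert q0 not in states, "q0 must be a fresh state"
    new_delta = {(q, a): {r} for (q, a), r in delta.items()}  # δ'(q, a) = {δ(q, a)}
    for q in set(final) | {q0}:
        new_delta[(q, EPS)] = {q1}        # ε-moves from accept states and q0 to q1
    return set(states) | {q0}, new_delta, q0, set(final) | {q0}

def eps_closure(S, delta):
    """All states reachable from S by 0 or more ε-transitions."""
    closure, stack = set(S), list(S)
    while stack:
        for r in delta.get((stack.pop(), EPS), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

def nfa_accepts(nfa, w):
    """Simulate N on w by tracking the set of reachable states."""
    _, delta, q0, final = nfa
    current = eps_closure({q0}, delta)
    for a in w:
        moved = set()
        for q in current:
            moved |= delta.get((q, a), set())
        current = eps_closure(moved, delta)
    return bool(current & final)

# Hypothetical DFA M with L(M) = {01}; state D is a dead state.
M_STATES = {"A", "B", "C", "D"}
M_DELTA = {("A", "0"): "B", ("A", "1"): "D", ("B", "0"): "D", ("B", "1"): "C",
           ("C", "0"): "D", ("C", "1"): "D", ("D", "0"): "D", ("D", "1"): "D"}
N = star_nfa(M_STATES, M_DELTA, "A", {"C"})
```

The fresh accepting start state q0 is what lets N accept ε without making q_1 itself accepting, which could otherwise add unwanted strings.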

Regular Languages are Closed Under Star
How would we prove that this NFA construction works?
Want to show: L(N) = L*
1. L(N) ⊇ L*
2. L(N) ⊆ L*

1. L(N) ⊇ L*
Assume w = w_1 … w_k is in L*, where w_1, …, w_k ∈ L.
We show N accepts w by induction on k.
Base Cases: k = 0 (w = ε) and k = 1 (w ∈ L).
Inductive Step: Assume N accepts all strings v = v_1 … v_k ∈ L*, v_i ∈ L.
Let u = u_1 … u_k u_{k+1} ∈ L*, u_j ∈ L.
Since N accepts u_1 … u_k (by induction) and M accepts u_{k+1}, N also accepts u (by construction).

2. L(N) ⊆ L*
Assume w is accepted by N; we want to show w ∈ L*.
If w = ε, then w ∈ L*.
I.H.: If N accepts u and takes at most k ε-transitions, then u ∈ L*.
Let w be accepted by N with k+1 ε-transitions.
Write w as w = uv, where v is the substring read after the last ε-transition.
Then u ∈ L(N), so by the I.H., u ∈ L*. Also v ∈ L, since N reads v from q_1 to an accept state without ε-transitions.
Therefore w = uv ∈ L*.

Closure Properties for Regular Languages
Union: A ∪ B = { w | w ∈ A or w ∈ B }
Intersection: A ∩ B = { w | w ∈ A and w ∈ B }
Complement: Ā = { w ∈ Σ* | w ∉ A }
Reverse: A^R = { w_1 … w_k | w_k … w_1 ∈ A, w_i ∈ Σ }
Concatenation: A · B = { vw | v ∈ A and w ∈ B }
Star: A* = { s_1 … s_k | k ≥ 0 and each s_i ∈ A }
Theorem: if A and B are regular then so are: A ∪ B, A ∩ B, Ā, A^R, A · B, and A*

Regular Expressions
Computation as simple, logical description
A totally different way of thinking about computation: What is the complexity of describing the strings in the language?

Inductive Definition of Regexp
Let Σ be an alphabet. We define the regular expressions over Σ inductively:
For all σ ∈ Σ, σ is a regexp
ε is a regexp
∅ is a regexp
If R_1 and R_2 are both regexps, then (R_1 R_2), (R_1 + R_2), and (R_1)* are regexps

Precedence Order: * then · then +
Example: R_1*R_2 + R_3 = ((R_1*) · R_2) + R_3

Definition: Regexps Represent Languages
The regexp σ ∈ Σ represents the language {σ}
The regexp ε represents {ε}
The regexp ∅ represents ∅
If R_1 and R_2 are regular expressions representing L_1 and L_2 then:
(R_1 R_2) represents L_1 · L_2
(R_1 + R_2) represents L_1 ∪ L_2
(R_1)* represents L_1*

Regexps Represent Languages
For every regexp R, define L(R) to be the language that R represents.
A string w ∈ Σ* is accepted by R (or, w matches R) if w ∈ L(R).
Examples: 0, 010, and 01010 match (01)*0
110101110100100 matches (0+1)*0
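These examples can be checked with Python's standard re module. This is only a sketch: the helper names are hypothetical, and the translation assumes the lecture's notation uses + for union and ε for the empty string, which Python writes as | and an empty alternative. re.fullmatch is used because "w matches R" here means the whole string is in L(R), not just a substring.

```python
import re

def to_python(regexp):
    """Translate lecture notation to Python syntax: '+' becomes '|', 'ε' becomes empty."""
    return regexp.replace(" ", "").replace("+", "|").replace("ε", "")

def matches(regexp, w):
    """True if the whole string w is in the language of the lecture-style regexp."""
    return re.fullmatch(to_python(regexp), w) is not None
```

For instance, matches("(01)*0", "010") holds, while matches("(01)*0", "01") does not, since the string must end with the final 0.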

Assume Σ = {0, 1}
{ w | w has exactly a single 1 }: 0*10*
{ w | w contains 001 }: (0+1)*001(0+1)*

Assume Σ = {0, 1}
What language does the regexp ∅* represent? {ε}

Assume Σ = {0, 1}
{ w | w has length ≥ 3 and its 3rd symbol is 0 }: (0+1)(0+1)0(0+1)*

Assume Σ = {0, 1}
{ w | every odd position in w is a 1 }: (1(0 + 1))*(1 + ε)
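A brute-force comparison is one way to gain confidence in a regexp like this. The sketch below assumes positions are counted from 1 and writes the regexp in Python syntax (| for +, an empty alternative for ε); the function names are hypothetical. It checks agreement on every binary string up to a given length, which is evidence rather than a proof.

```python
import re
from itertools import product

PATTERN = re.compile(r"(1(0|1))*(1|)")  # (1(0 + 1))*(1 + ε) in Python syntax

def odd_positions_all_one(w):
    """Direct check of the language: positions 1, 3, 5, ... (1-indexed) hold a 1."""
    return all(c == "1" for c in w[0::2])

def agree_up_to(n):
    """True if the regexp and the direct check agree on all strings of length <= n."""
    for length in range(n + 1):
        for bits in product("01", repeat=length):
            w = "".join(bits)
            if (PATTERN.fullmatch(w) is not None) != odd_positions_all_one(w):
                return False
    return True
```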

Assume Σ = {0, 1}
{ w | w has equal number of occurrences of 01 and 10 }
= { w | w = 1, w = 0, or w = ε, or w starts with a 0 and ends with a 0, or w starts with a 1 and ends with a 1 }
Claim: A string w has equal occurrences of 01 and 10 ⇔ w starts and ends with the same bit.
1 + 0 + ε + 0(0+1)*0 + 1(0+1)*1
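The claim can be tested exhaustively on short strings. This is a small sketch with hypothetical function names; it treats ε as trivially starting and ending with the same bit, as the regexp above does.

```python
from itertools import product

def count_occurrences(w, pair):
    """Number of positions where the two-symbol substring `pair` occurs in w."""
    return sum(1 for i in range(len(w) - 1) if w[i:i + 2] == pair)

def same_first_and_last(w):
    """True for ε, and for strings whose first and last bits agree."""
    return w == "" or w[0] == w[-1]

def claim_holds_up_to(n):
    """Check the claim on every binary string of length <= n."""
    for length in range(n + 1):
        for bits in product("01", repeat=length):
            w = "".join(bits)
            equal = count_occurrences(w, "01") == count_occurrences(w, "10")
            if equal != same_first_and_last(w):
                return False
    return True
```

The intuition behind the check: each 01 or 10 marks a switch between bits, and the two kinds of switches must alternate, so their counts differ by at most one and are equal exactly when the string ends on the bit it started with.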

L can be represented by some regexp ⇔ L is regular

L can be represented by some regexp ⇒ L is regular
Given any regexp R, we will construct an NFA N s.t. N accepts exactly the strings accepted by R.
Proof by induction on the length of the regexp R.
Base Cases (R has length 1): R = σ, R = ε, R = ∅

Induction Step: Suppose every regexp of length < k represents some regular language. Consider a regexp R of length k > 1.
Three possibilities for R: R = R_1 + R_2, R = R_1 R_2, or R = (R_1)*
By induction, R_1 and R_2 represent some regular languages, L_1 and L_2.
Case R = R_1 + R_2: L(R) = L(R_1 + R_2) = L_1 ∪ L_2, so L(R) is regular by the union theorem.
Case R = R_1 R_2: L(R) = L(R_1 · R_2) = L_1 · L_2, so L(R) is regular by the concatenation theorem.
Case R = (R_1)*: L(R) = L(R_1*) = L_1*, so L(R) is regular by the star theorem.
Therefore: If L is represented by a regexp, then L is regular.

Give an NFA that accepts the language represented by (1(0 + 1))*
[Figure: NFA with ε-transitions and arrows labeled 1 and 0, 1 for the regular expression (1(0+1))*]

Generalized NFAs (GNFA)
L can be represented by a regexp ⇐ L is a regular language
Idea: Transform an NFA for L into a regular expression by removing states and re-labeling the arcs with regular expressions.
Rather than reading in just letters from the string on a step, we can read in entire substrings.

Generalized NFA (GNFA)
Is aaabcbcba accepted or rejected?
Is bcba accepted or rejected?
This GNFA recognizes L(a*b(cb)*a)
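Since the GNFA is stated to recognize L(a*b(cb)*a), the two questions can be answered by matching against that regexp. The sketch below uses Python's standard re module; the function name is hypothetical, and it tests membership in the stated language rather than simulating the GNFA diagram itself.

```python
import re

GNFA_LANGUAGE = re.compile(r"a*b(cb)*a")  # the regexp the slide gives for this GNFA

def accepted(w):
    """True if the whole string w is in L(a*b(cb)*a)."""
    return GNFA_LANGUAGE.fullmatch(w) is not None
```

Both aaabcbcba (split as aaa, b, cb, cb, a) and bcba (split as b, cb, a) are accepted.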

NFA
Add unique start and accept states

NFA
While the machine has more than 2 states: Pick an internal state, rip it out, and re-label the arrows with regexps to account for paths through the missing state.
[Figure: arrows labeled 0, 1, and 0 through the removed state are replaced by a single arrow labeled 01*0]

GNFA
While the machine has more than 2 states:
In general, when ripping out q_2, the arrow from q_1 to q_3 is re-labeled
R(q_1, q_2) R(q_2, q_2)* R(q_2, q_3) + R(q_1, q_3)
[Figure: states q_1, q_2, q_3 with arrows labeled R(q_1, q_2), R(q_2, q_2), R(q_2, q_3), and R(q_1, q_3)]
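One step of this state-removal rule can be sketched in Python. The representation is an assumption of the sketch: a GNFA is a dict from (source, target) pairs to regexp strings in the lecture's notation, and a missing pair means there is no arrow (∅). The function name and the example labels are hypothetical, and the output is parenthesized conservatively rather than simplified.

```python
def rip(R, q_rip):
    """Remove state q_rip and re-label the remaining arrows.

    New label for p -> q:  R(p, q_rip) R(q_rip, q_rip)* R(q_rip, q) + R(p, q)
    """
    states = {s for pair in R for s in pair} - {q_rip}
    loop = R.get((q_rip, q_rip))
    star = f"({loop})*" if loop is not None else ""   # no self-loop: nothing to repeat
    new_R = {}
    for p in states:
        for q in states:
            direct = R.get((p, q))
            into, out = R.get((p, q_rip)), R.get((q_rip, q))
            # A path through q_rip exists only if both arrows are present.
            via = f"({into}){star}({out})" if into is not None and out is not None else None
            if via is not None and direct is not None:
                new_R[(p, q)] = f"{via}+({direct})"
            elif via is not None:
                new_R[(p, q)] = via
            elif direct is not None:
                new_R[(p, q)] = direct
    return new_R

# Arrows labeled 0, 1 (self-loop), 0 through the middle state "m".
EXAMPLE = {("s", "m"): "0", ("m", "m"): "1", ("m", "t"): "0"}
RIPPED = rip(EXAMPLE, "m")
```

Repeating rip until only the start and accept states remain leaves a single arrow whose label is a regexp for the machine's language.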

[Figure: GNFA on states q_0, q_1, q_2, q_3 with arrows labeled a, b, ε, and a, b; ripping out states produces the labels a*b and (a*b)(a+b)*]
R(q_0, q_3) = (a*b)(a+b)* represents L(N)

[Figure: summary diagram relating DFAs, NFAs, Regular Expressions, and Regular Languages (labeled DEFINITION)]

Parting thoughts:
Regular Languages can be defined by their closure properties.
NFA = DFA: does it mean that non-determinism is free for Finite Automata?
Questions?