15-453 FORMAL LANGUAGES, AUTOMATA, AND COMPUTABILITY
PUSH-DOWN AUTOMATA AND CONTEXT-FREE GRAMMARS
NONE OF THESE ARE REGULAR
Σ = {0, 1}, L = { 0^n 1^n | n ≥ 0 }
Σ = {a, b, c, …, z}, L = { w | w = w^R } (L = { palindromes }): madamimadam ∈ L, zeus sees suez ∈ L
Σ = { (, ) }, L = { balanced strings of parens }: (), ()(), (()()) are in L; (, ())(() are not in L
PUSHDOWN AUTOMATA (PDA)
[Figure: a FINITE STATE CONTROL that reads the INPUT and has access to a STACK (last in, first out)]
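[Figure: an arrow from state q_i to state q_j labeled a, b → c, where a is the input symbol, b is popped and c is pushed]
In state q_i, reading symbol a from the input and seeing b on top of the stack, replace b (pop) by c (push) and go to state q_j
If a = ε, make the transition without reading input
If b = ε, make the transition without popping from the stack
If c = ε, make the transition without writing on the stack
Non-deterministic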
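[Figure: a run of the PDA on input 0011. Transitions: ε, ε → $; 0, ε → 0; 1, 0 → ε; 1, 0 → ε; ε, $ → ε. The STACK holds $, then $0, then $00 while the 0s are read, and shrinks again as the 1s are read. Non-deterministic.]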
[Figure: a run of the same PDA on input 001. Transitions: ε, ε → $; 0, ε → 0; 1, 0 → ε; 1, 0 → ε; ε, $ → ε. The STACK holds $, then $0, then $00.]
PDA that recognizes L = { 0^n 1^n | n ≥ 0 }
Definition: A (non-deterministic) PDA is a tuple P = (Q, Σ, Γ, δ, q0, F), where:
Q is a finite set of states
Σ is the input alphabet
Γ is the stack alphabet
δ : Q × Σε × Γε → 2^(Q × Γε) is the transition function
q0 ∈ Q is the start state
F ⊆ Q is the set of accept states
2^(Q × Γε) is the set of subsets of Q × Γε, where Γε = Γ ∪ {ε}
Let w ∈ Σ* and suppose w can be written as w_1 ... w_n where each w_i ∈ Σε (recall Σε = Σ ∪ {ε})
Then P accepts w if there are r_0, r_1, ..., r_n ∈ Q and s_0, s_1, ..., s_n ∈ Γ* (a sequence of stacks) such that:
1. r_0 = q0 and s_0 = ε (P starts in q0 with an empty stack)
2. For i = 0, ..., n-1: (r_{i+1}, b) ∈ δ(r_i, w_{i+1}, a), where s_i = at and s_{i+1} = bt for some a, b ∈ Γε and t ∈ Γ* (P moves correctly according to state, stack and symbol read)
3. r_n ∈ F (P is in an accept state at the end of its input)
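The acceptance condition above can be checked mechanically by searching over configurations (state, input position, stack). The following is a minimal sketch, not part of the original slides. It assumes the 0^n 1^n machine from the earlier slides with state names q0 to q3 and accept states {q0, q3} (both are assumptions made here so that ε is accepted), single-character stack symbols, and a stack-size cap that is only there to keep the search finite.

```python
from collections import deque

EPS = ""  # stands for ε

# Key: (state, input symbol or EPS, popped symbol or EPS)
# Value: set of (next state, pushed symbol or EPS)
# State names q0..q3 are an assumption of this sketch.
DELTA = {
    ("q0", EPS, EPS): {("q1", "$")},   # ε, ε → $
    ("q1", "0", EPS): {("q1", "0")},   # 0, ε → 0
    ("q1", "1", "0"): {("q2", EPS)},   # 1, 0 → ε
    ("q2", "1", "0"): {("q2", EPS)},   # 1, 0 → ε
    ("q2", EPS, "$"): {("q3", EPS)},   # ε, $ → ε
}
START = "q0"
ACCEPT = {"q0", "q3"}                  # assumed accept states


def accepts(w, delta=DELTA, start=START, accept=ACCEPT, max_stack=None):
    """Breadth-first search over all non-deterministic runs of the PDA on w."""
    if max_stack is None:
        max_stack = len(w) + 2         # cap chosen for this example machine
    first = (start, 0, "")             # state, input position, stack (top = last char)
    seen = {first}
    queue = deque([first])
    while queue:
        state, i, stack = queue.popleft()
        if i == len(w) and state in accept:
            return True                # accept state at the end of the input
        for (q, a, b), targets in delta.items():
            if q != state:
                continue
            if a != EPS and (i == len(w) or w[i] != a):
                continue               # transition needs an input symbol we do not have
            if b != EPS and not stack.endswith(b):
                continue               # transition needs b on top of the stack
            rest = stack[:-1] if b != EPS else stack
            for nxt_state, c in targets:
                nxt = (nxt_state, i + (1 if a != EPS else 0), rest + c)
                if len(nxt[2]) <= max_stack and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False


print(accepts("0011"))  # True
print(accepts("001"))   # False
```

The search accepts as soon as any branch reaches an accept state with the whole input read, which mirrors condition 3; the stack does not have to be empty.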
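EVEN-LENGTH PALINDROMES
Σ = {a, b, c, …, z}
[Figure: PDA with states q0, q1, q2, q3. q0 → q1 on ε, ε → $; q1 loops on σ, ε → σ; q1 → q2 on ε, ε → ε; q2 loops on σ, σ → ε; q2 → q3 on ε, $ → ε. Here σ stands for any symbol of Σ.]
Note: Also accepts the empty string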
CONTEXT-FREE GRAMMARS
“Colorless green ideas sleep furiously.”
CONTEXT-FREE GRAMMARS
Production rules:
A → 0A1
A → B
B → #
A is the start variable; A and B are variables; 0, 1 and # are terminals
Derivation (non-deterministic): A ⇒ 0A1 ⇒ 00A11 ⇒ 00B11 ⇒ 00#11
Each step ⇒ is read "yields"; A ⇒* 00#11 is read "A derives 00#11"
We say: 00#11 is generated by the grammar
CONTEXT-FREE GRAMMARS
The rules A → 0A1, A → B, B → # can be abbreviated as:
A → 0A1 | B
B → #
CONTEXT-FREE GRAMMARS
A context-free grammar (CFG) is a tuple G = (V, Σ, R, S), where:
V is a finite set of variables
Σ is a finite set of terminals (disjoint from V)
R is a set of production rules of the form A → W, where A ∈ V and W ∈ (V ∪ Σ)*
S ∈ V is the start variable
L(G) = { w ∈ Σ* | S ⇒* w } is the set of strings generated by G
CONTEXT-FREE LANGUAGES
Example: G = ( {S}, {0, 1}, R, S ) with R = { S → 0S1, S → ε }
L(G) = { 0^n 1^n | n ≥ 0 } (the strings generated by G)
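To see L(G) emerge from the rules, one can enumerate derivations up to a length bound. This is a small sketch, not from the slides. It assumes variables are single characters that appear as keys of `rules`, every other character is a terminal, and ε is written as the empty string. The helper name `generate` is hypothetical. The search only terminates for grammars in which sentential forms cannot grow forever without gaining terminals, which holds for the example.

```python
from collections import deque


def generate(rules, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start.

    Always expands the leftmost variable; sentential forms that already
    contain more than max_len terminals are discarded.
    """
    results = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # index of the leftmost variable, or None if the form is all terminals
        idx = next((k for k, ch in enumerate(form) if ch in rules), None)
        if idx is None:
            results.add(form)
            continue
        for rhs in rules[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            terminals = sum(1 for ch in new if ch not in rules)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(results, key=lambda s: (len(s), s))


# G = ({S}, {0, 1}, R, S) with R = { S → 0S1, S → ε }
print(generate({"S": ["0S1", ""]}, "S", 6))
# ['', '01', '0011', '000111']
```

Every string printed has the form 0^n 1^n, which matches the stated L(G) up to the chosen bound; it is a sanity check, not a proof.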
WRITE A CFG FOR EVEN-LENGTH PALINDROMES
S → σSσ for all σ ∈ Σ
S → ε
WRITE A CFG FOR THE EMPTY SET
G = ( {S}, Σ, ∅, S )
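PARSE TREES
[Figure: parse tree for 00#11. The root A has children 0, A, 1; that inner A has children 0, A, 1; the innermost A has the single child B, whose child is #.]
A ⇒ 0A1 ⇒ 00A11 ⇒ 00B11 ⇒ 00#11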
<EXPR> → <EXPR> + <EXPR>
<EXPR> → <EXPR> x <EXPR>
<EXPR> → ( <EXPR> )
<EXPR> → a
Build a parse tree for a + a x a
[Figure: two different parse trees for a + a x a, both rooted at <EXPR>]
Definition: a string is derived ambiguously in a context-free grammar if it has more than one parse tree
Definition: a grammar is ambiguous if it generates some string ambiguously
Can you give an unambiguous grammar with standard arithmetic precedence?
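Ambiguity of the <EXPR> grammar can be confirmed by counting parse trees. The sketch below is an illustration added here, not part of the slides. It assumes the input is already split into tokens and hard-codes the four <EXPR> rules; the name `count_parse_trees` is hypothetical.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of the token sequence under
    <EXPR> → <EXPR> + <EXPR> | <EXPR> x <EXPR> | ( <EXPR> ) | a
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        """Number of parse trees deriving tokens[i:j] from <EXPR>."""
        n = 0
        if j - i == 1 and tokens[i] == "a":
            n += 1                                   # <EXPR> → a
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            n += count(i + 1, j - 1)                 # <EXPR> → ( <EXPR> )
        for k in range(i + 1, j - 1):                # k = position of the top operator
            if tokens[k] in ("+", "x"):
                n += count(i, k) * count(k + 1, j)   # <EXPR> → <EXPR> op <EXPR>
        return n

    return count(0, len(tokens))


print(count_parse_trees("a + a x a".split()))      # 2
print(count_parse_trees("( a + a ) x a".split()))  # 1
```

A count above 1 means the string is derived ambiguously. One standard answer to the question on the slide, stated here as background rather than as the lecture's own solution, is the layered grammar E → E + T | T, T → T x F | F, F → ( E ) | a.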
NOT REGULAR
Σ = {0, 1}, L = { 0^n 1^n | n ≥ 0 }
But L is CONTEXT FREE: A → 0A1, A → ε
WHAT ABOUT?
Σ = {0, 1}, L1 = { 0^n 1^n 0^m | m, n ≥ 0 }
Σ = {0, 1}, L2 = { 0^n 1^m 0^n | m, n ≥ 0 }
Σ = {0, 1}, L3 = { 0^m 1^n 0^n | m = n ≥ 0 }
THE PUMPING LEMMA FOR CFGs
Let L be a context-free language
Then there is a P such that if w ∈ L and |w| ≥ P, then we can write w = uvxyz, where:
1. |vy| > 0
2. |vxy| ≤ P
3. For every i ≥ 0, u v^i x y^i z ∈ L
WHAT ABOUT?
Σ = {0, 1}, L3 = { 0^m 1^n 0^n | m = n ≥ 0 }
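For L3, the usual argument picks w = 0^P 1^P 0^P and shows that no split w = uvxyz survives pumping. The sketch below brute-forces that claim for small P. It is a finite sanity check added for illustration, not a proof and not from the slides. The function names and the choice to test only the exponents i = 0 and i = 2 are assumptions of this sketch.

```python
import re


def in_L3(s):
    """Membership in L3 = { 0^n 1^n 0^n | n >= 0 }."""
    m = re.fullmatch(r"(0*)(1*)(0*)", s)
    return bool(m) and len(m.group(1)) == len(m.group(2)) == len(m.group(3))


def in_0n1n(s):
    """Membership in { 0^n 1^n | n >= 0 }, used for contrast."""
    m = re.fullmatch(r"(0*)(1*)", s)
    return bool(m) and len(m.group(1)) == len(m.group(2))


def surviving_split(w, p, member, exponents=(0, 2)):
    """Return some (u, v, x, y, z) with |vy| > 0 and |vxy| <= p such that
    u v^i x y^i z is in the language for every tested exponent i, else None."""
    n = len(w)
    for a in range(n + 1):                      # u = w[:a]
        hi = min(a + p, n)                      # enforces |vxy| <= p
        for b in range(a, hi + 1):              # v = w[a:b]
            for c in range(b, hi + 1):          # x = w[b:c]
                for d in range(c, hi + 1):      # y = w[c:d]
                    u, v, x, y, z = w[:a], w[a:b], w[b:c], w[c:d], w[d:]
                    if len(v) + len(y) == 0:
                        continue                # violates |vy| > 0
                    if all(member(u + v * i + x + y * i + z) for i in exponents):
                        return (u, v, x, y, z)
    return None


for p in range(1, 5):
    w = "0" * p + "1" * p + "0" * p
    print(p, surviving_split(w, p, in_L3))      # prints None for each p
```

For contrast, `surviving_split("000111", 3, in_0n1n)` does find a split, as expected for a context-free language; passing the two tested exponents is of course weaker than condition 3 of the lemma.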
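Idea of Proof: If w is long enough, then any parse tree for w must have a path that contains a variable more than once
[Figure: parse tree T for w = uvxyz in which a variable R appears twice on one path. The upper R generates vxy and the lower R generates x. Replacing the lower R's subtree by a copy of the upper R's subtree gives a parse tree for u v v x y y z.]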
Formal Proof:
Let b be the maximum number of symbols on the right-hand side of any rule
If the height of a parse tree is h, the length of the string generated by that tree is at most b^h
Let |V| be the number of variables in G
Define P = b^(|V|+1)
Let w be a string of length at least P
Let T be a parse tree for w with a minimum number of nodes. T must have height at least |V|+1
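The height bound in the last line follows from a short counting step, written out here as a sketch. It assumes b ≥ 2, which the slide leaves implicit.

```latex
\[
  \operatorname{height}(T) \le |V|
  \;\Longrightarrow\;
  |w| \le b^{|V|} < b^{|V|+1} = P ,
\]
\[
  \text{so}\qquad
  |w| \ge P
  \;\Longrightarrow\;
  \operatorname{height}(T) \ge |V| + 1 .
\]
```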
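The longest path in T must have ≥ |V|+1 variables
Select R to be the variable that repeats among the lowest |V|+1 variables (in the path)
[Figure: the parse tree T for w = uvxyz with the repeated variable R, next to the pumped tree for u v v x y y z]
1. |vy| > 0: T is a parse tree for w with a minimum number of nodes, so v and y cannot both be ε (otherwise replacing the upper R's subtree by the lower R's subtree would give a smaller parse tree for w)
2. |vxy| ≤ P: R repeats among the lowest |V|+1 variables, so the subtree rooted at the upper R has height at most |V|+1 and generates a string of length at most b^(|V|+1) = P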
PDAs ARE EQUIVALENT TO CFGs
A language L is generated by a CFG ⟺ L is recognized by a PDA
A language L is generated by a CFG ⇒ L is recognized by a PDA
Suppose L is generated by a CFG G = (V, Σ, R, S)
Construct P = (Q, Σ, Γ, δ, q, F) that recognizes L
[Figure: the PDA P. The first transition is ε, ε → S$. The middle state loops on ε, A → w^R for each rule 'A → w' ∈ R and on a, a → ε for each terminal a ∈ Σ. The last transition is ε, $ → ε.]
Suppose L is generated by a CFG G = (V, Σ, R, S)
Describe P = (Q, Σ, Γ, δ, q, F) that recognizes L:
(1) Push $ and then S on the stack
(2) Repeat the following steps forever:
(a) Pop the stack, call the result X.
(b) If X is a variable A, guess a rule that matches A and push the right-hand side of that rule onto the stack
(c) If X is a terminal, read the next symbol from the input and compare it to the terminal. If they're different, reject.
(d) If X is $: then accept iff no more input
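Steps (1) and (2a) to (2d) can be simulated directly by exploring every guess in step (b). The following is a minimal sketch under stated assumptions, not the construction's official form: grammar symbols are single characters, variables are the keys of `rules`, "$" is not a grammar symbol, and the function name `cfg_pda_accepts` is hypothetical. The `max_stack` cap is a heuristic added only to guarantee termination; with a small cap the sketch could wrongly reject for grammars that need a deeper stack.

```python
def cfg_pda_accepts(rules, start, w, max_stack=None):
    """Simulate the PDA obtained from a CFG on input w.

    rules maps each variable to a list of right-hand sides (strings, "" for ε).
    The stack is a string whose first character is the top.
    """
    if max_stack is None:
        max_stack = 2 * len(w) + 4              # heuristic cap (assumption)
    first = (0, start + "$")                    # (1) push $ and then S
    seen = {first}
    todo = [first]
    while todo:
        i, stack = todo.pop()
        top, rest = stack[0], stack[1:]         # (a) pop the stack, call it X
        if top == "$":                          # (d) accept iff no more input
            if i == len(w):
                return True
            continue
        if top in rules:                        # (b) guess a rule for the variable
            successors = [(i, rhs + rest) for rhs in rules[top]]
        elif i < len(w) and w[i] == top:        # (c) terminal matches the input
            successors = [(i + 1, rest)]
        else:
            successors = []                     # (c) mismatch: this branch rejects
        for pos, new_stack in successors:
            terminals = sum(1 for ch in new_stack if ch not in rules and ch != "$")
            if terminals > len(w) - pos:        # more terminals than input left
                continue
            if len(new_stack) > max_stack:
                continue
            if (pos, new_stack) not in seen:
                seen.add((pos, new_stack))
                todo.append((pos, new_stack))
    return False


G = {"S": ["0S1", ""]}
print(cfg_pda_accepts(G, "S", "0011"))  # True
print(cfg_pda_accepts(G, "S", "001"))   # False
```

Writing the right-hand side with its first symbol on top is what the ε, A → w^R label expresses: the symbols are pushed in reverse so that the leftmost one is popped first.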
A language L is generated by a CFG ⇐ L is recognized by a PDA
Given PDA P = (Q, Σ, Γ, δ, q, F)
Construct a CFG G = (V, Σ, R, S) such that L(G) = L(P)
First, simplify P to have the following form:
(1) It has a unique accept state, q_acc
(2) It empties the stack before accepting
(3) Each transition either pushes a symbol or pops a symbol, but not both at the same time
Idea For Our Grammar G: For every pair of states p and q in PDA P, G will have a variable A_pq whose production rules will generate all strings x that can take P from p with an empty stack to q with an empty stack
V = { A_pq | p, q ∈ Q }
S = A_{q0 q_acc}
[Figure: a PDA with states Q, q0, q1, q2, q3, q4, q5, q'0 and q'3, and transitions labeled ε, ε → E; ε, ε → $; ε, ε → D; 0, ε → 0; 1, 0 → ε; ε, $ → ε; ε, D → ε; ε, E → ε; ε, σ → ε]
A_{q0 q1} generates?
A_{q1 q2} generates? { 0^n 1^n | n > 0 }
A_{q1 q3} generates?
A_{Q q5} generates?
WANT: A_pq generates all strings that take P from p with an empty stack to q with an empty stack
WANT: A_pq generates all strings that take P from p with an empty stack to q with an empty stack
Let x be such a string
• P's first move on x must be a push
• P's last move on x must be a pop
Two possibilities:
1. The symbol popped at the end is exactly the one pushed at the beginning
2. The symbol popped at the end is not the one pushed at the beginning
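1. The symbol t popped at the end is exactly the one pushed at the beginning
x = ayb takes P from p with an empty stack to q with an empty stack
[Figure: stack height plotted against the input string x. P reads a and pushes t while moving from p to r, reads y while moving from r to s, then reads b and pops t while moving from s to q.]
(r, t) ∈ δ(p, a, ε) and (q, ε) ∈ δ(s, b, t)
A_pq → a A_rs b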
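2. The symbol popped at the end is not the one pushed at the beginning
[Figure: stack height plotted against the input string. The stack height returns to its starting level at an intermediate state r between p and q.]
A_pq → A_pr A_rq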
Formally:
V = { A_pq | p, q ∈ Q }
S = A_{q0 q_acc}
For every p, q, r, s ∈ Q, t ∈ Γ and a, b ∈ Σε: if (r, t) ∈ δ(p, a, ε) and (q, ε) ∈ δ(s, b, t), then add the rule A_pq → a A_rs b
For every p, q, r ∈ Q, add the rule A_pq → A_pr A_rq
For every p ∈ Q, add the rule A_pp → ε
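The three rule schemas translate almost line by line into code. The sketch below is an illustration added here, not from the slides. It assumes the PDA is already in the simplified form (every transition pushes exactly one symbol or pops exactly one symbol), represents the variable A_pq as the pair (p, q), and uses a hypothetical example machine with states q0 to q3 and unique accept state q3, which recognizes { 0^n 1^n | n ≥ 1 }.

```python
from itertools import product

EPS = ""  # stands for ε


def pda_to_cfg(states, delta, start, accept):
    """Build the rules of G from a simplified PDA.

    delta maps (state, input or EPS, pop or EPS) to a set of (state, push or EPS).
    A rule is a pair (head, body): head is a variable (p, q), body is a tuple
    whose items are terminals (strings, possibly EPS) or variables (pairs).
    """
    rules = set()
    # push moves (p, a, r, t): (r, t) ∈ δ(p, a, ε)
    pushes = [(p, a, r, t)
              for (p, a, pop), outs in delta.items() if pop == EPS
              for (r, t) in outs if t != EPS]
    # pop moves (s, b, t, q): (q, ε) ∈ δ(s, b, t)
    pops = [(s, b, t, q)
            for (s, b, t), outs in delta.items() if t != EPS
            for (q, push) in outs if push == EPS]
    for (p, a, r, t), (s, b, t2, q) in product(pushes, pops):
        if t == t2:
            rules.add(((p, q), (a, (r, s), b)))      # A_pq → a A_rs b
    for p, q, r in product(states, repeat=3):
        rules.add(((p, q), ((p, r), (r, q))))        # A_pq → A_pr A_rq
    for p in states:
        rules.add(((p, p), ()))                      # A_pp → ε
    return rules, (start, accept)                    # S = A_{q0 q_acc}


STATES = ["q0", "q1", "q2", "q3"]
DELTA = {
    ("q0", EPS, EPS): {("q1", "$")},
    ("q1", "0", EPS): {("q1", "0")},
    ("q1", "1", "0"): {("q2", EPS)},
    ("q2", "1", "0"): {("q2", EPS)},
    ("q2", EPS, "$"): {("q3", EPS)},
}
rules, start_variable = pda_to_cfg(STATES, DELTA, "q0", "q3")
print(len(rules))  # 71
```

For this machine the only rules of the first kind are A_{q0 q3} → A_{q1 q2}, A_{q1 q2} → 0 A_{q1 q1} 1 and A_{q1 q2} → 0 A_{q1 q2} 1, which together with A_{q1 q1} → ε derive exactly the strings 0^n 1^n with n ≥ 1.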
For all x, A_pq generates x ⇒ x can bring P from p with an empty stack to q with an empty stack
Proof (by induction on the number of steps in the derivation of x from A_pq):
Base Case: The derivation has 1 step: A_pp ⇒ ε
Inductive Step: Assume true for derivations of length ≤ k and prove true for derivations of length k+1: A_pq ⇒* x in k+1 steps
Case 1: The first step in the derivation is A_pq ⇒ A_pr A_rq, and A_pr A_rq ⇒* x, so x = yz where A_pr ⇒* y and A_rq ⇒* z
Now use the induction hypothesis: y brings P from p to r and z brings P from r to q, each starting and ending with an empty stack, so x = yz brings P from p to q with empty stacks
Case 2: The first step in the derivation is A_pq ⇒ a A_rs b, so x = ayb where A_rs ⇒* y
By induction, y can bring P from r with an empty stack to s with an empty stack (or from any symbol t on top of the stack to the same t on top of the stack)
Because A_pq → a A_rs b is a rule, there is a t such that (r, t) ∈ δ(p, a, ε) and (q, ε) ∈ δ(s, b, t). In each pair the first component is a state and the second is the symbol pushed; in each triple the components are a state, an input symbol and the symbol popped.
So P can read a and push t while moving from p to r, read y while moving from r to s with t left on the stack, then read b and pop t while moving from s to q. Hence x = ayb brings P from p with an empty stack to q with an empty stack.
For all x, A_pq generates x ⇐ x can bring P from p with an empty stack to q with an empty stack
Proof (by induction on the number of steps in the computation of P from p to q with empty stacks, on input x):
Base Case: The computation has 0 steps
So it starts and ends in the same state p, and we must show that A_pp ⇒* x
But it must be that x = ε, and A_pp → ε is a rule, so we are done
Inductive Step: Assume true for computations of length ≤ k, we'll prove true for computations of length k+1
Suppose that P has a computation where x brings p to q with empty stacks in k+1 steps
Two cases:
1. The stack is empty somewhere in the middle of the computation, say at state r. Write x as yz (where the stack is empty after reading y from p to r and after reading z from r to q). So A_pr ⇒* y and A_rq ⇒* z by the I.H. Recall that A_pq → A_pr A_rq is a rule (by construction). So A_pq ⇒ A_pr A_rq ⇒* yz. So A_pq ⇒* x
2. The stack is empty only at the beginning and the end of this computation. Write x as ayb. There must be states r (immediately following p) and s (immediately before q) where y goes from r on empty stack to s on empty stack: p → r … s → q (why?). So A_rs ⇒* y by the I.H. Now, P must push some t at p and pop t at s (why?). So the grammar must have the rule A_pq → a A_rs b. So A_pq ⇒* x
A language L is generated by a CFG ⟺ L is recognized by a PDA
Corollary: Every regular language is context-free