Chomsky Normal Form CNF of CFGs Purpose Definition

Chomsky Normal Form (CNF) of CFGs: Purpose, Definition, Method of Construction


CNF: Purpose
• A construct used to establish properties of context-free languages (CFLs).
• Every CFL without ε can be generated by a CFG in Chomsky normal form.
• To show that a language without ε is a CFL, it is sufficient to show that it has a CFG in Chomsky normal form.
• Typical approach to proving closure properties.

CNF: Definition
A context-free grammar (CFG) in which all productions are of the form A -> BC or A -> a, where A, B, and C are variables and a is a terminal.

CNF construction: three elimination tasks
• Eliminate "useless" symbols: variables or terminals that do not appear in any derivation of a terminal string from the start symbol.
• Eliminate ε-productions A -> ε.
• Eliminate unit productions A -> B, for variables A and B.

Generating and Reachable Symbols
• X is generating if X =>* w for some terminal string w.
• If X is a terminal, then it generates itself in zero steps.
• X is reachable if S =>* αXβ for some α and β (S is the start symbol).
• Any symbol that is not both generating and reachable is useless.

Induction to find generating variables
Basis: If there is a production A -> w, where w is a terminal string, then A is generating.
Induction: If there is a production A -> α, where α consists only of terminals and variables known to derive a terminal string, then A derives a terminal string and hence is generating.

Algorithm to eliminate nongenerating variables
1. Discover all variables that derive terminal strings.
2. For all other variables, remove all productions in which they appear on either the LHS or the RHS of ->.
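
The two steps above can be sketched in Python. This is only an illustration under assumptions that are not in the slides: the grammar is encoded as a dict from each variable to a list of bodies (tuples of symbols), any symbol that is not a key of the dict is treated as a terminal, and the function names are hypothetical.

```python
def generating_variables(grammar):
    """Return the set of variables that derive some terminal string.

    Assumed encoding: grammar maps a variable to a list of bodies; each body
    is a tuple of symbols, and a symbol that is not a key is a terminal.
    """
    generating = set()
    changed = True
    while changed:  # repeat the induction step until nothing new is found
        changed = False
        for head, bodies in grammar.items():
            if head in generating:
                continue
            for body in bodies:
                # Terminals generate themselves; variables must already be known.
                if all(s not in grammar or s in generating for s in body):
                    generating.add(head)
                    changed = True
                    break
    return generating


def remove_nongenerating(grammar):
    """Drop every production that mentions a nongenerating variable."""
    keep = generating_variables(grammar)
    return {
        head: [b for b in bodies
               if all(s not in grammar or s in keep for s in b)]
        for head, bodies in grammar.items()
        if head in keep
    }


# Grammar of Exercise 7.1.1: S->AB|CA, A->a, B->BC|AB, C->aB|b
exercise = {
    "S": [("A", "B"), ("C", "A")],
    "A": [("a",)],
    "B": [("B", "C"), ("A", "B")],
    "C": [("a", "B"), ("b",)],
}
cleaned = remove_nongenerating(exercise)
print(cleaned)  # {'S': [('C', 'A')], 'A': [('a',)], 'C': [('b',)]}
```

The loop is a plain fixed-point iteration; a worklist would be faster on large grammars, but the simple version mirrors the basis and induction steps directly.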

Exercise 7.1.1 (text p. 275)
S -> AB | CA
A -> a
B -> BC | AB
C -> aB | b
Eliminate non-generating variables. Do on board.

Exercise 7.1.1 (text p. 275): solution
S -> AB | CA   generating, because A and C are generating (via S -> CA)
A -> a         generating
B -> BC | AB   not generating
C -> aB | b    generating (via C -> b)
B is a non-generating variable: every B production has B in its body, so B can never be used to generate a terminal string. Remove all productions that involve B on either side of ->.
New CFG with only useful variables:
A -> a
C -> b
S -> CA

Eliminating non-generating variables may lead to unreachable variables.
Example: S -> AB | C, A -> aA | a, B -> bB, C -> c
• A and C are generating. Why?
• S is generating. Why?
• B is not generating. Why?
• What remains after eliminating B?

Eliminating non-generating variables may lead to unreachable variables: answers
Example: S -> AB | C, A -> aA | a, B -> bB, C -> c
• A and C are generating: A -> a and C -> c.
• S is generating: S -> C.
• B is not generating: its only production, B -> bB, cannot be used to generate a terminal string.
• What remains after eliminating the productions with B? S -> C, C -> c, and A -> aA | a, but A is now unreachable from S.

Finding reachable symbols
Basis: The start symbol is reachable.
Induction: If we can reach A, and there is a production A -> α, then we can reach all the symbols in α.
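
A possible worklist sketch of this induction in Python follows. The dict-of-tuples grammar encoding and the name reachable_symbols are assumptions made for illustration; they do not come from the slides.

```python
def reachable_symbols(grammar, start):
    """Return all symbols (variables and terminals) reachable from start.

    Assumed encoding: grammar maps a variable to a list of bodies, and each
    body is a tuple of symbols.
    """
    reachable = {start}          # basis: the start symbol is reachable
    worklist = [start]
    while worklist:
        head = worklist.pop()
        # Terminals have no productions, so .get() returns an empty list.
        for body in grammar.get(head, []):
            for symbol in body:  # induction: everything in a reached body
                if symbol not in reachable:
                    reachable.add(symbol)
                    worklist.append(symbol)
    return reachable


# What is left of S->AB|C, A->aA|a, B->bB, C->c once B has been removed.
after_removing_b = {
    "S": [("C",)],
    "A": [("a", "A"), ("a",)],
    "C": [("c",)],
}
print(sorted(reachable_symbols(after_removing_b, "S")))  # ['C', 'S', 'c']
```

A and a are missing from the result, which matches the earlier observation that A becomes unreachable once B is eliminated.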

Epsilon Productions
Theorem: If L is a CFL that does not contain the empty string, then it has a CFG with no ε-productions that can be put in CNF.
• A -> ε is clearly an ε-production.
• To eliminate all ε-productions, we must first discover the nullable variables, i.e., variables B such that B =>* ε.

Inductive definition of nullable symbols
Basis: If there is a production A -> ε, then A is nullable.
Induction: If there is a production A -> α, and all symbols in α are nullable, then A is nullable.

Example: Nullable Symbols
S -> AB, A -> aA | ε, B -> bB | A
• A is nullable because of A -> ε.
• B is nullable because of B -> A.
• S is nullable because of S -> AB.
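
The inductive definition can be checked against this example with a short Python sketch. The encoding is an assumption made here: bodies are tuples of symbols, and the empty tuple () stands for ε; the function name is hypothetical.

```python
def nullable_variables(grammar):
    """Return the set of variables A with A =>* ε.

    Assumed encoding: grammar maps a variable to a list of bodies (tuples of
    symbols); the empty tuple () represents an ε-body.
    """
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            # all(...) over an empty body is True, which covers the basis A -> ε.
            if any(all(s in nullable for s in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable


# S->AB, A->aA|ε, B->bB|A
example = {
    "S": [("A", "B")],
    "A": [("a", "A"), ()],
    "B": [("b", "B"), ("A",)],
}
print(sorted(nullable_variables(example)))  # ['A', 'B', 'S']
```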

Algorithm to eliminate ε-productions
• Identify all nullable symbols.
• Consider each production A -> X1…Xn that contains nullable symbols.
• If only m of the n symbols are nullable (m less than n), construct a family of 2^m productions covering all combinations of the nullable symbols being present or absent.
• If m = n, exclude the case with all symbols absent (it would be A -> ε), which leaves 2^m - 1 productions.

Eliminating ε-productions
The new CFG with no ε-productions consists of:
• all families of productions derived from productions with nullable symbols, plus
• all productions from the original CFG that did not contain nullable symbols, except the ε-productions themselves.
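
One way to sketch the family construction in Python is shown below. It assumes the same hypothetical encoding as before (bodies as tuples, () for ε) and uses itertools.product to enumerate the present/absent choices; the function names are illustrative only.

```python
from itertools import product


def nullable_variables(grammar):
    """Variables that derive ε (fixed-point over the inductive definition)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in nullable and any(
                all(s in nullable for s in body) for body in bodies
            ):
                nullable.add(head)
                changed = True
    return nullable


def eliminate_epsilon(grammar):
    """Build the families of productions and drop every empty body."""
    nullable = nullable_variables(grammar)
    result = {}
    for head, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            # A nullable symbol may be kept or dropped; others are always kept.
            options = [((s,), ()) if s in nullable else ((s,),) for s in body]
            for choice in product(*options):
                candidate = tuple(s for part in choice for s in part)
                # Skip the all-absent case and avoid duplicates.
                if candidate and candidate not in new_bodies:
                    new_bodies.append(candidate)
        result[head] = new_bodies
    return result


# S->ABC, A->aA|ε, B->bB|ε, C->ε
example = {
    "S": [("A", "B", "C")],
    "A": [("a", "A"), ()],
    "B": [("b", "B"), ()],
    "C": [()],
}
no_epsilon = eliminate_epsilon(example)
print(no_epsilon["A"])  # [('a', 'A'), ('a',)]
print(no_epsilon["C"])  # []
```

In this sketch a variable such as C, whose only production was C -> ε, ends up with an empty list of bodies, so it becomes nongenerating and has to be removed in a later cleanup step.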

Example: Eliminating ε-Productions
S -> ABC, A -> aA | ε, B -> bB | ε, C -> ε
• Which variables are nullable and why?
• What family of productions comes from S -> ABC?
• What family comes from A -> aA?
• What family comes from B -> bB?
Do on board.

Example: Eliminating ε-Productions (answers)
S -> ABC, A -> aA | ε, B -> bB | ε, C -> ε
• A, B, C, and S are all nullable.
• Productions S -> ABC | AB | AC | BC | A | B | C come from S -> ABC.
• Productions A -> aA | a come from A -> aA.
• Productions B -> bB | b come from B -> bB.

Example: Eliminating ε-Productions (remaining productions)
S -> ABC, A -> aA | ε, B -> bB | ε, C -> ε
Any productions from the original CFG? Yes: A -> ε, B -> ε, and C -> ε. Remove these, which leaves:
S -> ABC | AB | AC | BC | A | B | C
A -> aA | a
B -> bB | b
What is the effect of eliminating C -> ε?

Eliminating ε-Productions continued
With C -> ε removed, C has no productions left, so C is not generating. Eliminate every production of the new CFG that contains C:
Before: S -> ABC | AB | AC | BC | A | B | C
After:  S -> AB | A | B
A -> aA | a
B -> bB | b

Define Unit Productions
• A unit production is a production whose right side consists of exactly one variable.
• A -> a is not a unit production because a is a terminal.
• Eliminating unit productions by expansion is the most common approach.

Eliminate by expansion
In the CFG defined by
E -> T | E+T
T -> F | T*F
F -> I | (E)
I -> a | Ia
what are the unit productions?

Eliminate by expansion
In the CFG defined by
E -> T | E+T
T -> F | T*F
F -> I | (E)
I -> a | Ia
In a sequence of unit productions, elimination by expansion starts at the bottom. Do on board.

Eliminate by expansion: solution
In the CFG defined by
E -> T | E+T
T -> F | T*F
F -> I | (E)
I -> a | Ia
Keep I -> a | Ia
Expand F -> I | (E):  F -> a | Ia | (E)
Expand T -> F | T*F:  T -> a | Ia | (E) | T*F
Expand E -> T | E+T:  E -> a | Ia | (E) | T*F | E+T

Cleaning up a CFG
Theorem: If L is a CFL, then there is a CFG for L – {ε} that has:
• No ε-productions.
• No unit productions.
• No useless symbols.

Proof by construction
Start with a CFG for L. Perform the following steps in order:
1. Eliminate ε-productions. (This must be first because it can create unit productions and useless variables.)
2. Eliminate unit productions.
3. Eliminate variables that derive no terminal strings.
4. Eliminate variables not reachable from the start symbol.

Chomsky Normal Form
In addition to being cleaned up, a CFG is said to be in Chomsky Normal Form if every production is of one of two forms:
• A -> BC (right side is two variables).
• A -> a (right side is a single terminal).
Theorem: If L is a CFL, then L – {ε} has a CFG in CNF.

Proof by construction
Step 1: "Clean" the CFG, so every production has a right side that is either a single terminal or a combination of terminals and variables of length ≥ 2.
Step 2: For each right side that is not a single terminal, make the right side all variables. If a terminal a prevents the RHS from being all variables, create a new variable A_a and the production A_a -> a. Replace a by A_a in the right sides of productions.

Example: Step 2
Consider the production A -> BcDe. We need variables A_c and A_e, with productions A_c -> c and A_e -> e. Replace A -> BcDe by A -> B A_c D A_e. If c and/or e occur in other productions, replace them by A_c and/or A_e.
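
Step 2 can be sketched in Python as follows. Several things here are assumptions for illustration rather than part of the slides: the dict-of-tuples grammar encoding, the function name lift_terminals, the "A_" naming scheme for new variables (assumed not to clash with existing variables), and the productions given to B and D so that they count as variables.

```python
def lift_terminals(grammar):
    """Replace terminals inside bodies of length >= 2 by fresh variables.

    Assumed encoding: grammar maps a variable to a list of bodies (tuples of
    symbols); symbols that are not keys are terminals. A terminal x gets the
    hypothetical new variable "A_x" with the single production A_x -> x.
    """
    result = {}
    new_rules = {}
    for head, bodies in grammar.items():
        result[head] = []
        for body in bodies:
            if len(body) >= 2:
                lifted = []
                for symbol in body:
                    if symbol in grammar:        # already a variable
                        lifted.append(symbol)
                    else:                        # terminal: use its variable
                        name = "A_" + symbol
                        new_rules[name] = [(symbol,)]
                        lifted.append(name)
                body = tuple(lifted)
            result[head].append(body)            # single terminals stay as-is
    result.update(new_rules)
    return result


# A -> BcDe; the B and D productions are made up so B and D are variables.
example = {
    "A": [("B", "c", "D", "e")],
    "B": [("b",)],
    "D": [("d",)],
}
lifted = lift_terminals(example)
print(lifted["A"])  # [('B', 'A_c', 'D', 'A_e')]
```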

Clean to CNF
Step 2: For each right side that is not a single terminal, make the right side all variables.
Step 3: Break right sides longer than 2 into a chain of productions with right sides of two variables, using a "cascade of productions" (text p. 273).
• Do not combine steps 2 and 3. Show all strings of variables before applying cascading productions.
• Example of cascading productions: A -> B1 B2 B3 B4 is replaced by A -> B1 C1, C1 -> B2 C2, and C2 -> B3 B4.

Cascade of productions is required
• There are many ways to get right sides with two variables; the "cascade of productions" gives a unique result.
• Note that in the previous example, where A -> B1 B2 B3 B4 is replaced by A -> B1 C1, C1 -> B2 C2, and C2 -> B3 B4, the first variables on the right sides of the new productions appear in the same order as in the original production.
Example: A -> B1 B2 B3 B4 B5 is replaced by? Do on board.

Example: A -> B1 B2 B3 B4 B5
A -> B1 C1
C1 -> B2 C2
C2 -> B3 C3
C3 -> B4 B5
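
The cascade for a single production can be written as a small Python function. The function name, the (head, body) pair representation, and the C1, C2, ... naming via a prefix argument are assumptions for this sketch; a full converter would also need fresh names that are unique across all productions.

```python
def cascade(head, body, prefix="C"):
    """Split head -> body (all variables) into productions with two variables.

    Returns a list of (head, body) pairs. New variables are named
    prefix + "1", prefix + "2", ... which is an assumed naming scheme.
    """
    rules = []
    current = head
    # Each step peels off the first remaining variable and chains to a new one.
    for i in range(len(body) - 2):
        new_variable = f"{prefix}{i + 1}"
        rules.append((current, (body[i], new_variable)))
        current = new_variable
    # The last production keeps the final two variables of the original body.
    rules.append((current, tuple(body[-2:])))
    return rules


for rule_head, rule_body in cascade("A", ("B1", "B2", "B3", "B4", "B5")):
    print(rule_head, "->", " ".join(rule_body))
# A -> B1 C1
# C1 -> B2 C2
# C2 -> B3 C3
# C3 -> B4 B5
```

A body that already has exactly two variables comes back unchanged as a single production.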

Assignment 13: Exercise 7.1.2 (text pp. 275 and 277)
S -> ASB | ε
A -> aAS | a
B -> SbS | A | bb
Clean and convert to CNF.

Example: Ex. 7.8 (text p. 266)
Clean the following CFG:
S -> AB
A -> aAA | ε
B -> bBB | ε
Perform the following steps in order:
1. Eliminate ε-productions.
2. Eliminate unit productions.
3. Eliminate variables that derive no terminal strings.
4. Eliminate variables not reachable from the start symbol.
Do on board.

Convert cleaned version of CFG to CNF
• For each right side that is not a single terminal, make the right side all variables.
• Break right sides longer than 2 into a chain of productions with right sides of two variables, using a "cascade of productions".
Do on board.

Sometimes elimination of unit productions by expansion does not work
• It will not work on cycles of unit productions such as A -> B, B -> C, and C -> A.
• Alternative: find all pairs (A, B) such that A =>* B by a sequence of unit productions.
• If B -> α is a non-unit production, then add the production A -> α and drop all the unit productions in the sequence A =>* B (i.e., A derives α directly instead of through B via unit productions).

Pair search defined by induction
Find all pairs (A, B) such that A =>* B by a sequence of unit productions only.
Basis: A =>* A, therefore (A, A) is selected.
Induction: If we have found (A, B), and B -> C is a unit production, then add (A, C) to the pair list.

Example of pair search for the CFG E -> T | E+T, T -> F | T*F, F -> I | (E), I -> a | Ia
(E, E)  basis
(T, T)  basis
(F, F)  basis
(I, I)  basis
(E, T)  from E -> T
(T, F)  from T -> F
(F, I)  from F -> I
(E, F)  from (E, T) and T -> F
(T, I)  from (T, F) and F -> I
(E, I)  from (E, F) and F -> I
Associate each pair with a non-unit production.

Combine pair search with non-unit productions
Pair     Productions added
(E, E)   E -> E+T
(E, T)   E -> T*F
(E, F)   E -> (E)
(E, I)   E -> a | Ia
(T, T)   T -> T*F
(T, F)   T -> (E)
(T, I)   T -> a | Ia
(F, F)   F -> (E)
(F, I)   F -> a | Ia
(I, I)   I -> a | Ia
Original CFG with unit productions:
E -> T | E+T
T -> F | T*F
F -> I | (E)
I -> a | Ia
New CFG with no unit productions:
E -> E+T | T*F | (E) | a | Ia
T -> T*F | (E) | a | Ia
F -> (E) | a | Ia
I -> a | Ia
This is the same result as elimination by expansion.
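
The pair search and the combination step can be sketched together in Python. As before, the dict-of-tuples grammar encoding and the function names are assumptions for illustration. Because the pair set is computed as a fixed point, the sketch also terminates on cycles of unit productions.

```python
def unit_pairs(grammar):
    """All pairs (A, B) with A =>* B using unit productions only."""
    pairs = {(v, v) for v in grammar}            # basis: (A, A)
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for body in grammar[b]:
                # A unit production has a body of exactly one variable.
                if len(body) == 1 and body[0] in grammar:
                    if (a, body[0]) not in pairs:
                        pairs.add((a, body[0]))  # induction: (A, B), B -> C
                        changed = True
    return pairs


def eliminate_unit_productions(grammar):
    """For each pair (A, B), give A every non-unit body of B."""
    result = {v: [] for v in grammar}
    for a, b in sorted(unit_pairs(grammar)):
        for body in grammar[b]:
            is_unit = len(body) == 1 and body[0] in grammar
            if not is_unit and body not in result[a]:
                result[a].append(body)
    return result


# E->T|E+T, T->F|T*F, F->I|(E), I->a|Ia
expr = {
    "E": [("T",), ("E", "+", "T")],
    "T": [("F",), ("T", "*", "F")],
    "F": [("I",), ("(", "E", ")")],
    "I": [("a",), ("I", "a")],
}
no_units = eliminate_unit_productions(expr)
for head, bodies in no_units.items():
    print(head, "->", " | ".join("".join(b) for b in bodies))
```

The order of alternatives within each line depends on the sorted pair order, so it may differ from the slide even though the set of productions is the same.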

Quiz 5: Wednesday 11/10/21
