Context Free Grammars
Xiaoyin Wang
CS 5363 Spring 2019

Today’s Class
• Derivation Trees
• Ambiguity
• Normal Forms
• CYK Algorithm

Derivation Trees
• Illustrate the derivation of a certain sentence from a grammar
• Different derivation orders may generate the same derivation tree

Derivation Order
• Consider the following example grammar with 5 productions: [figure: grammar not preserved]
• Leftmost derivation order of a string: at each step, we substitute the leftmost variable
• Rightmost derivation order of a string: at each step, we substitute the rightmost variable
• [figure: a leftmost and a rightmost derivation of an example string]

Derivation Trees
• Consider the same example grammar and a derivation of a string [figure: the tree is built step by step, one production at a time]
• Derivation tree (parse tree): its yield is the string read off the leaves from left to right

Sometimes, derivation order doesn’t matter
• A leftmost derivation and a rightmost derivation of the same string can give the same derivation tree

Ambiguity
• A grammar can have multiple derivation trees that derive a certain sentence
• Classification: inherent ambiguity and non-inherent ambiguity

Grammar for mathematical expressions
• [figure: expression grammar and example strings; one terminal denotes any number]
• The same string can have two different leftmost derivations, and therefore two derivation trees
• Compute the expression result using the tree: the "good tree" gives the intended value, the "bad tree" gives a different one

Two different derivation trees may cause problems in applications which use the derivation trees:
• Evaluating expressions
• In general, in compilers for programming languages
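A minimal sketch of the problem, assuming the classic ambiguous grammar E → E + E | E * E | number and the string 2 + 2 * 2. Both are assumptions chosen for illustration, since the grammar on the original slides is not preserved in this text. The tuple encoding of trees and the `evaluate` helper are hypothetical.

```python
# Parse trees as nested tuples: (operator, left, right) or a bare number.
def evaluate(tree):
    """Evaluate an expression tree bottom-up."""
    if isinstance(tree, (int, float)):
        return tree
    op, left, right = tree
    l, r = evaluate(left), evaluate(right)
    return l + r if op == "+" else l * r

# Two derivation trees for the same string 2 + 2 * 2.
good_tree = ("+", 2, ("*", 2, 2))   # multiplication grouped first: 2 + (2 * 2)
bad_tree = ("*", ("+", 2, 2), 2)    # addition grouped first: (2 + 2) * 2

print(evaluate(good_tree))  # 6
print(evaluate(bad_tree))   # 8
```

The two trees have the same yield but different values, which is why a compiler needs a grammar that produces only one of them.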

Ambiguous Grammar:
A context-free grammar G is ambiguous if there is a string w ∈ L(G) which has:
• two different derivation trees, or
• two leftmost derivations
(Two different derivation trees give two different leftmost derivations and vice-versa)

Example: [figure: grammar and string not preserved]
• This grammar is ambiguous since the string has two derivation trees
• This grammar is ambiguous also because the string has two leftmost derivations

Another ambiguous grammar:
  IF_STMT → if EXPR then STMT
          | if EXPR then STMT else STMT
• Variables: IF_STMT, EXPR, STMT
• Terminals: if, then, else
• Very common piece of grammar in programming languages

Two derivation trees for: if expr1 then if expr2 then stmt1 else stmt2
• Tree 1: the else belongs to the inner if
  if expr1 then (if expr2 then stmt1 else stmt2)
• Tree 2: the else belongs to the outer if
  if expr1 then (if expr2 then stmt1) else stmt2

In general, ambiguity is bad and we want to remove it
• Sometimes it is possible to find a non-ambiguous grammar for a language
• But, in general, we cannot do so

A successful example:
• [figure: an ambiguous grammar and an equivalent non-ambiguous grammar that generates the same language]
• With the non-ambiguous grammar, the example string has a unique derivation tree

An unsuccessful example:
• [figure: language not preserved] is inherently ambiguous: every grammar that generates this language is ambiguous

Ambiguity: Summary
• A grammar can have multiple parse trees that derive a certain sentence
• Inherently ambiguous language
  – All grammars for the language are ambiguous
• Non-inherently ambiguous language
  – There exists at least one grammar that is not ambiguous
• Checking ambiguity of a grammar or a language: undecidable problem

Normal Forms
• Chomsky Normal Form
• Backus Normal Form (BNF)

Chomsky Normal Form
A context free grammar is said to be in Chomsky Normal Form if all productions are in one of the following forms:
  A → BC
  A → α
• A, B and C are nonterminal symbols
• α is a terminal symbol

There are three preliminary simplifications
1. Eliminate useless symbols
2. Eliminate ε productions
3. Eliminate unit productions

Eliminate Useless Symbols
We determine whether a symbol is useful by checking that it is both generating and reachable
• X is generating if X ⇒* ω for some terminal string ω
• X is reachable if there is a derivation S ⇒* αXβ for some α and β

Example: Removing non-generating symbols
• Initial grammar: S → AB | a, A → b
• Identify generating symbols: S, A, a and b are generating; B is not
• Remove non-generating symbols: S → a, A → b

Example: Removing non-reachable symbols
• Grammar after the previous step: S → a, A → b
• Identify reachable symbols: only S and a are reachable
• Eliminate non-reachable symbols: S → a

Is the order important?
• Yes. Looking first for non-reachable symbols and then for non-generating symbols can still leave some useless symbols
• Starting from S → AB | a, A → b, every symbol is reachable, so the first pass removes nothing
• Removing the non-generating symbol B then leaves S → a, A → b, where A is no longer reachable but stays in the grammar

Finding generating symbols
• Every terminal is generating
• If there is a production A → α and every symbol of α is already known to be generating, then A is generating
• Example: S → AB | a, A → b
• We cannot use S → AB because B has not been established to be generating; S is generating through S → a
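The rule above can be sketched as a fixed-point iteration. This is a minimal illustration under assumptions: the representation of productions as `(head, body)` pairs and the function name `generating_symbols` are hypothetical, not something the slides prescribe.

```python
def generating_symbols(productions, terminals):
    """Return the set of generating symbols by fixed-point iteration.

    productions: list of (head, body) pairs, where body is a tuple of symbols.
    """
    generating = set(terminals)  # basis: every terminal generates itself
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            # A head becomes generating once its whole body is generating.
            if head not in generating and all(s in generating for s in body):
                generating.add(head)
                changed = True
    return generating

# Grammar from the slide: S -> AB | a, A -> b
productions = [("S", ("A", "B")), ("S", ("a",)), ("A", ("b",))]
gen = generating_symbols(productions, {"a", "b"})
print(sorted(gen))  # ['A', 'S', 'a', 'b']
```

B never enters the set, so the production S → AB is the one that gets removed.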

Finding reachable symbols
• S is surely reachable
• If the head of a production is reachable, all symbols in its body are reachable
• Example: S → AB | a, A → b
• In this example the symbols {S, A, B, a, b} are reachable
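Reachability can be sketched as a simple graph search from the start symbol. As before, the `(head, body)` representation and the name `reachable_symbols` are assumptions made for this illustration.

```python
def reachable_symbols(productions, start):
    """Return all symbols reachable from the start symbol."""
    reachable = {start}
    worklist = [start]
    while worklist:
        head = worklist.pop()
        for h, body in productions:
            if h == head:
                # Every symbol in the body of a reachable head is reachable.
                for s in body:
                    if s not in reachable:
                        reachable.add(s)
                        worklist.append(s)
    return reachable

# Grammar from the slide: S -> AB | a, A -> b
original = [("S", ("A", "B")), ("S", ("a",)), ("A", ("b",))]
print(sorted(reachable_symbols(original, "S")))  # ['A', 'B', 'S', 'a', 'b']

# After removing the non-generating symbol B (and with it S -> AB):
pruned = [("S", ("a",)), ("A", ("b",))]
print(sorted(reachable_symbols(pruned, "S")))    # ['S', 'a']
```

The second call shows why the generating pass should come first: A only becomes unreachable once S → AB has been removed.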

Eliminate ε Productions
• In a grammar, ε productions are convenient but not essential
• If L has a CFG, then L – {ε} has a CFG without ε productions
• Nullable variable: A is nullable if A ⇒* ε

If A is a nullable variable
• Whenever A appears in the body of a production, A might or might not derive ε
• Example: S → ASA | aB, A → B | S, B → b | ε
• Nullable: {A, B}

Eliminate ε Productions
• For each nullable variable in a body, create two versions of the production: one with the nullable variable and one without it
• Eliminate productions with ε bodies
• Before: S → ASA | aB, A → B | S, B → b | ε
• After: S → ASA | aB | AS | SA | S | a, A → B | S, B → b
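A minimal sketch of the construction, assuming productions are `(head, body)` pairs with the empty tuple standing for an ε body. The function names `nullable_variables` and `eliminate_epsilon` are hypothetical.

```python
from itertools import product

def nullable_variables(productions):
    """Variables A with A =>* epsilon (an empty body stands for epsilon)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            if head not in nullable and all(s in nullable for s in body):
                nullable.add(head)
                changed = True
    return nullable

def eliminate_epsilon(productions):
    """Expand each body over kept/dropped nullable symbols; drop empty bodies."""
    nullable = nullable_variables(productions)
    result = set()
    for head, body in productions:
        # A nullable symbol may be kept or dropped; any other symbol is kept.
        choices = [((s,), ()) if s in nullable else ((s,),) for s in body]
        for pick in product(*choices):
            new_body = tuple(sym for part in pick for sym in part)
            if new_body:
                result.add((head, new_body))
    return result

# Grammar from the slide: S -> ASA | aB, A -> B | S, B -> b | epsilon
grammar = [("S", ("A", "S", "A")), ("S", ("a", "B")),
           ("A", ("B",)), ("A", ("S",)), ("B", ("b",)), ("B", ())]
for head, body in sorted(eliminate_epsilon(grammar)):
    print(head, "->", " ".join(body))
```

The output keeps the trivial production S → S, exactly as the slide does; the unit-production step removes it later.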

Eliminate unit productions
• A unit production is one of the form A → B where both A and B are variables
• Identify unit pairs: (A, B) is a unit pair if A ⇒* B using only unit productions
• If (A, B) is a unit pair and B → ω is a non-unit production, then add A → ω

Example: T = {*, +, (, ), a, b, 0, 1}
  I → a | b | Ia | Ib | I0 | I1
  F → I | (E)
  T → F | T * F
  E → T | E + T
Basis: (A, A) is a unit pair for any variable A, since A ⇒* A in 0 steps.

  Pair      Productions
  (E, E)    E → E + T
  (E, T)    E → T * F
  (E, F)    E → (E)
  (E, I)    E → a | b | Ia | Ib | I0 | I1
  (T, T)    T → T * F
  (T, F)    T → (E)
  (T, I)    T → a | b | Ia | Ib | I0 | I1
  (F, F)    F → (E)
  (F, I)    F → a | b | Ia | Ib | I0 | I1
  (I, I)    I → a | b | Ia | Ib | I0 | I1

Example: the resulting grammar
The productions for each pair are collected by head. For example, the pairs (T, T), (T, F) and (T, I) together give T → T * F | (E) | a | b | Ia | Ib | I0 | I1.
  E → E + T | T * F | (E) | a | b | Ia | Ib | I0 | I1
  T → T * F | (E) | a | b | Ia | Ib | I0 | I1
  F → (E) | a | b | Ia | Ib | I0 | I1
  I → a | b | Ia | Ib | I0 | I1
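The unit-pair construction can be sketched as follows. The `(head, body)` representation and the names `unit_pairs` and `eliminate_unit` are assumptions made for this illustration.

```python
def unit_pairs(productions, variables):
    """All pairs (A, B) with A =>* B using only unit productions."""
    pairs = {(v, v) for v in variables}  # basis: zero steps
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for head, body in productions:
                if head == b and len(body) == 1 and body[0] in variables:
                    if (a, body[0]) not in pairs:
                        pairs.add((a, body[0]))
                        changed = True
    return pairs

def eliminate_unit(productions, variables):
    """For each unit pair (A, B), add A -> body for every non-unit B -> body."""
    result = set()
    for a, b in unit_pairs(productions, variables):
        for head, body in productions:
            is_unit = len(body) == 1 and body[0] in variables
            if head == b and not is_unit:
                result.add((a, body))
    return result

# Expression grammar from the slide.
variables = {"E", "T", "F", "I"}
grammar = [("I", (s,)) for s in "ab"] + [("I", ("I", s)) for s in "ab01"] + [
    ("F", ("I",)), ("F", ("(", "E", ")")),
    ("T", ("F",)), ("T", ("T", "*", "F")),
    ("E", ("T",)), ("E", ("E", "+", "T")),
]
print(len(unit_pairs(grammar, variables)))  # 10, one per row of the table
```

Calling `eliminate_unit(grammar, variables)` yields the productions of the resulting grammar, for instance nine bodies for E.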

Chomsky Normal Form (CNF)
Starting with a CFL grammar with the preliminary simplifications performed:
1. Arrange that all bodies of length 2 or more consist only of variables.
2. Break bodies of length 3 or more into a cascade of productions, each with a body consisting of two variables.

Step 1: For every terminal α that appears in a body of length 2 or more, create a new variable that has only one production.
Before:
  E → E + T | T * F | (E) | a | b | Ia | Ib | I0 | I1
  T → T * F | (E) | a | b | Ia | Ib | I0 | I1
  F → (E) | a | b | Ia | Ib | I0 | I1
  I → a | b | Ia | Ib | I0 | I1
After:
  E → EPT | TMF | LER | a | b | IA | IB | IZ | IO
  T → TMF | LER | a | b | IA | IB | IZ | IO
  F → LER | a | b | IA | IB | IZ | IO
  I → a | b | IA | IB | IZ | IO
  A → a    B → b    Z → 0    O → 1
  P → +    M → *    L → (    R → )

Step 2: Break bodies of length 3 or more by adding more variables
  E → EC1 | TC2 | LC3 | a | b | IA | IB | IZ | IO
  T → TC2 | LC3 | a | b | IA | IB | IZ | IO
  F → LC3 | a | b | IA | IB | IZ | IO
  I → a | b | IA | IB | IZ | IO
  C1 → PT    C2 → MF    C3 → ER
  A → a    B → b    Z → 0    O → 1
  P → +    M → *    L → (    R → )
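Step 2 can be sketched as a small function that replaces the tail of every long body with a fresh variable. The names `binarize`, `variable_for` and `shorten`, and the scheme of reusing one new variable per distinct tail, are assumptions for this illustration; the generated names C1, C2, ... happen to match the slide.

```python
def binarize(productions):
    """Break bodies of length >= 3 into a cascade of two-variable bodies."""
    tails = {}     # maps a body tail (tuple) to the new variable deriving it
    result = []

    def variable_for(tail):
        # Reuse one new variable per distinct tail (hypothetical naming C1, C2, ...).
        if tail not in tails:
            name = f"C{len(tails) + 1}"
            tails[tail] = name
            result.append((name, shorten(tail)))
        return tails[tail]

    def shorten(body):
        if len(body) <= 2:
            return body
        # Keep the first symbol, delegate the rest to a new variable.
        return (body[0], variable_for(body[1:]))

    for head, body in productions:
        result.append((head, shorten(body)))
    return result

# Long bodies from the slide after Step 1, plus one short body.
after_step1 = [("E", ("E", "P", "T")), ("E", ("T", "M", "F")), ("E", ("L", "E", "R")),
               ("T", ("T", "M", "F")), ("F", ("L", "E", "R")), ("E", ("a",))]
for head, body in binarize(after_step1):
    print(head, "->", " ".join(body))
```

Bodies longer than three symbols are handled by the recursion in `shorten`, which produces the cascade described in the slide.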

BNF
• BNF stands for either Backus-Naur Form or Backus Normal Form
• BNF is used to describe the grammar of a programming language
• BNF is formal and precise
  – BNF is a notation for context-free grammars
• BNF is essential in compiler construction

BNF
• < > indicate a nonterminal that needs to be further expanded, e.g. <variable>
• Symbols not enclosed in < > are terminals; they represent themselves, e.g. if, while, (
• The symbol ::= means "is defined as"
• The symbol | means "or"; it separates alternatives, e.g. <addop> ::= + | -
• This is all there is to “plain” BNF; but we will discuss extended BNF (EBNF) later in this lecture

BNF uses recursion
• <integer> ::= <digit> | <integer> <digit>
  or
  <integer> ::= <digit> | <digit> <integer>
• Recursion is all that is needed (at least, in a formal sense)
• "Extended BNF" allows repetition as well as recursion
• Repetition is usually better when using BNF to construct a compiler

BNF Examples I
• <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• <if statement> ::= if ( <condition> ) <statement>
                   | if ( <condition> ) <statement> else <statement>

BNF Examples II
• <unsigned integer> ::= <digit> | <unsigned integer> <digit>
• <integer> ::= <unsigned integer> | + <unsigned integer> | - <unsigned integer>

BNF Examples III
• <identifier> ::= <letter> | <identifier> <letter> | <identifier> <digit>
• <block> ::= { <statement list> }
• <statement list> ::= <statement> | <statement list> <statement>

BNF Examples IV
• <statement> ::= <block> | <assignment statement> | <break statement> | <continue statement> | <do statement> | <for loop> | <goto statement> | <if statement> | . . .

Extended BNF
• The following are pretty standard:
  – [ ] enclose an optional part of the rule
    Example: <if statement> ::= if ( <condition> ) <statement> [ else <statement> ]
  – { } mean the enclosed part can be repeated any number of times (including zero)
    Example: <parameter list> ::= ( ) | ( { <parameter> , } <parameter> )
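A sketch of how EBNF repetition maps onto a loop in a hand-written recognizer for the `<parameter list>` rule. It assumes, purely for illustration, that the input is already a list of tokens and that a `<parameter>` is a single identifier token; the name `parse_parameter_list` is hypothetical.

```python
def parse_parameter_list(tokens):
    """Recognize  ( ) | ( { <parameter> , } <parameter> )  and return the parameters.

    Assumes, for illustration only, that a <parameter> is one identifier token.
    """
    pos = 0

    def expect(tok):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != tok:
            raise SyntaxError(f"expected {tok!r} at position {pos}")
        pos += 1

    def parameter():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isidentifier():
            raise SyntaxError(f"expected a parameter at position {pos}")
        pos += 1
        return tokens[pos - 1]

    expect("(")
    params = []
    if pos < len(tokens) and tokens[pos] != ")":
        params.append(parameter())
        # The EBNF repetition becomes a loop instead of a recursive rule.
        while pos < len(tokens) and tokens[pos] == ",":
            pos += 1
            params.append(parameter())
    expect(")")
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing tokens")
    return params

print(parse_parameter_list(["(", ")"]))                 # []
print(parse_parameter_list(["(", "x", ",", "y", ")"]))  # ['x', 'y']
```

The loop reads the list as `<parameter> { , <parameter> }`, which accepts the same token sequences as the rule on the slide and is easier to parse from left to right.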

Variations
• The preceding notation is the original and most common notation
  – BNF was designed before we had boldface, color, more than one font, etc.
• A typical modern variation might:
  – Use boldface to indicate multi-character terminals
  – Quote single-character terminals (because boldface isn’t so obvious in this case)
• Example:
  – if_statement ::= if "(" condition ")" statement [ else statement ]

Limitations of BNF
• No easy way to impose length limitations, such as the maximum length of variable names
• No easy way to describe ranges, such as 1 to 31
• No way at all to impose distributed requirements, such as: a variable must be declared before it is used
• Describes only syntax, not semantics

The CYK Algorithm
• The membership problem:
  – Problem: given a context-free grammar G and a string w
    » G = (V, ∑, P, S) where
      V is a finite set of variables
      ∑ (the alphabet) is a finite set of terminal symbols
      P is a finite set of rules
      S is the start symbol (a distinguished element of V)
      V and ∑ are assumed to be disjoint
    » G is used to generate the strings of a language
  – Question: is w in L(G)?

The CYK Algorithm
• J. Cocke, D. Younger and T. Kasami independently developed an algorithm to answer this question.

The CYK Algorithm Basics
• Relies on the structure of the rules in a Chomsky Normal Form grammar
• Uses a “dynamic programming” or “table-filling” algorithm

Chomsky Normal Form
• A normal form is described by a set of conditions that each rule in the grammar must satisfy
• A context-free grammar is in CNF if each rule has one of the following forms:
  – A → BC (two variables on the right side)
  – A → a (a single terminal symbol)
  – S → ε (the null string)
  where B, C ∈ V – {S}

Construct a Triangular Table
• Each row corresponds to one length of substrings
  – Bottom row: strings of length 1
  – Second row from the bottom: strings of length 2
  – . . .
  – Top row: the string w

Construct a Triangular Table
• Xi,j is the set of variables A such that A ⇒* wi … wj
• Xi,i is the set of variables A such that A → wi is a production of G
• To compute Xi,j, compare at most n pairs of previously computed sets:
  (Xi,i , Xi+1,j), (Xi,i+1 , Xi+2,j), …, (Xi,j-1 , Xj,j)

Construct a Triangular Table
  X1,5
  X1,4   X2,5
  X1,3   X2,4   X3,5
  X1,2   X2,3   X3,4   X4,5
  X1,1   X2,2   X3,3   X4,4   X5,5
  w1     w2     w3     w4     w5
• Table for a string w that has length 5
• Each cell is filled by looking for pairs of previously computed cells to compare

Example CYK Algorithm
• Show the CYK Algorithm with the following example:
  – CNF grammar G
      S → AB | BC
      A → BA | a
      B → CC | b
      C → AB | a
  – w is baaba
  – Question: is baaba in L(G)?

Constructing The Triangular Table
• Grammar: S → AB | BC, A → BA | a, B → CC | b, C → AB | a
• Calculating the bottom row: each cell Xi,i holds the variables that produce wi directly
    {B}   {A, C}   {A, C}   {B}   {A, C}
    b     a        a        b     a

Constructing The Triangular Table
• X1,2 is computed from the pair (Xi,i , Xi+1,j) = (X1,1 , X2,2)
• {B}{A, C} = {BA, BC}
• Steps:
  – Look for production rules that generate BA or BC
  – There are two: S → BC and A → BA
  – X1,2 = {S, A}

Constructing The Triangular Table
• X2,3 is computed from the pair (Xi,i , Xi+1,j) = (X2,2 , X3,3)
• {A, C}{A, C} = {AA, AC, CA, CC} = Y
• Steps:
  – Look for production rules that generate Y
  – There is one: B → CC
  – X2,3 = {B}

Constructing The Triangular Table
• X3,4 is computed from the pair (Xi,i , Xi+1,j) = (X3,3 , X4,4)
• {A, C}{B} = {AB, CB} = Y
• Steps:
  – Look for production rules that generate Y
  – There are two: S → AB and C → AB
  – X3,4 = {S, C}

Constructing The Triangular Table
• X4,5 is computed from the pair (Xi,i , Xi+1,j) = (X4,4 , X5,5)
• {B}{A, C} = {BA, BC} = Y
• Steps:
  – Look for production rules that generate Y
  – There are two: S → BC and A → BA
  – X4,5 = {S, A}
• The second row is now complete: {S, A}  {B}  {S, C}  {S, A}

Constructing The Triangular Table
• X1,3 is computed from the pairs (Xi,i , Xi+1,j), (Xi,i+1 , Xi+2,j) = (X1,1 , X2,3), (X1,2 , X3,3)
• {B}{B} ∪ {S, A}{A, C} = {BB, SA, SC, AA, AC} = Y
• Steps:
  – Look for production rules that generate Y
  – There are none
  – X1,3 = Ø (the empty set)

Constructing The Triangular Table
• X2,4 is computed from the pairs (Xi,i , Xi+1,j), (Xi,i+1 , Xi+2,j) = (X2,2 , X3,4), (X2,3 , X4,4)
• {A, C}{S, C} ∪ {B}{B} = {AS, AC, CS, CC, BB} = Y
• Steps:
  – Look for production rules that generate Y
  – There is one: B → CC
  – X2,4 = {B}

Constructing The Triangular Table
• X3,5 is computed from the pairs (Xi,i , Xi+1,j), (Xi,i+1 , Xi+2,j) = (X3,3 , X4,5), (X3,4 , X5,5)
• {A, C}{S, A} ∪ {S, C}{A, C} = {AS, AA, CS, CA, SA, SC, CC} = Y
• Steps:
  – Look for production rules that generate Y
  – There is one: B → CC
  – X3,5 = {B}
• The third row is now complete: Ø  {B}  {B}

Final Triangular Table
• Grammar: S → AB | BC, A → BA | a, B → CC | b, C → AB | a
    {S, A, C}                                        ← X1,5
    Ø           {S, A, C}
    Ø           {B}         {B}
    {S, A}      {B}         {S, C}     {S, A}
    {B}         {A, C}      {A, C}     {B}      {A, C}
    b           a           a          b        a
• Table for the string w = baaba, which has length 5
• The algorithm populates the triangular table from the bottom row up

Example (Result)
• Is baaba in L(G)? Yes
• S is in the set X1,n where n = 5: the cell X1,5 = {S, A, C}
• Since S ∈ X1,5, baaba ∈ L(G)

Theorem
• The CYK Algorithm correctly computes Xi,j for all i and j; thus w is in L(G) if and only if S is in X1,n.
• The running time of the algorithm is O(n³).
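A compact sketch of the table-filling procedure, run on the example grammar and the string baaba. The function name `cyk` and the split of the grammar into `unit_rules` (A → a) and `binary_rules` (A → BC) are assumptions made for this illustration; it also assumes the grammar is already in Chomsky Normal Form.

```python
def cyk(word, unit_rules, binary_rules, start="S"):
    """Fill the CYK table X[i][j] (1-based, i <= j) for a CNF grammar.

    unit_rules:   list of (A, a) for productions A -> a
    binary_rules: list of (A, B, C) for productions A -> BC
    Returns (is_member, X).
    """
    n = len(word)
    X = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    if n == 0:
        return False, X  # the empty string is not handled in this sketch
    for i in range(1, n + 1):              # bottom row: substrings of length 1
        X[i][i] = {A for A, a in unit_rules if a == word[i - 1]}
    for length in range(2, n + 1):         # one row per substring length
        for i in range(1, n - length + 2):
            j = i + length - 1
            for k in range(i, j):          # split into w_i..w_k and w_{k+1}..w_j
                for A, B, C in binary_rules:
                    if B in X[i][k] and C in X[k + 1][j]:
                        X[i][j].add(A)
    return start in X[1][n], X

# Example grammar: S -> AB | BC, A -> BA | a, B -> CC | b, C -> AB | a
unit_rules = [("A", "a"), ("B", "b"), ("C", "a")]
binary_rules = [("S", "A", "B"), ("S", "B", "C"), ("A", "B", "A"),
                ("B", "C", "C"), ("C", "A", "B")]

member, X = cyk("baaba", unit_rules, binary_rules)
print(member)           # True
print(sorted(X[1][5]))  # ['A', 'C', 'S']
```

The three nested loops over the substring length, the start position i and the split point k are where the O(n³) bound comes from, for a fixed grammar.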
