Lecture 8 Chapter 4 Syntax Analysis Practice Context

  • Slides: 57
Download presentation
Lecture # 8 Chapter # 4: Syntax Analysis

Lecture # 8 Chapter # 4: Syntax Analysis

Practice Context Free Grammars a) CFG generating alternating sequence of 0’s and 1’s b)

Practice Context Free Grammars a) CFG generating alternating sequence of 0’s and 1’s b) CFG in which no consecutive b’s can occur but consecutive a’s can occur c) CFG for the following language: L(G)= {an b 2 n | n>=0}

Practice Answers a) S 0 A | 1 B A 1 B | 0

Practice Answers a) S 0 A | 1 B A 1 B | 0 B 0 A | 1 b) S a. S | b. T |a |b T a. S | a c) S a. Sbb | є

Example • Design a CFG for the language L(G)= {0 n 1 m |

Example • Design a CFG for the language L(G)= {0 n 1 m | n <> m} There are two cases: – For n>m – For n<m – Write two separate set of rules and combine them

Example • For n>m S 1 AB B 0 A 1 | Є A

Example • For n>m S 1 AB B 0 A 1 | Є A 0 A | 0 For n<m S 2 XY X 0 X 1 | Є Y 1 Y | 1 Combining both: S S 1 | S 2

Practice CFG • Design a CFG for the language L(G)= {0 i 1 j

Practice CFG • Design a CFG for the language L(G)= {0 i 1 j 2 k| i=j or j=k} There are two cases: – For i=j – For j=k – Write two separate set of rules and combine them

Solution of Practice • For i=j S 1 AB A 0 A 1 |

Solution of Practice • For i=j S 1 AB A 0 A 1 | ε B 2 B | ε For j=k S 2 XY X 0 X | ε Y 1 Y 2 | ε Combining both: S S 1 | S 2

Derivations • The one-step derivation is defined by A where A is a production

Derivations • The one-step derivation is defined by A where A is a production in the grammar • In addition, we define – – is leftmost lm if does not contain a nonterminal is rightmost rm if does not contain a nonterminal Transitive closure * (zero or more steps) Positive closure + (one or more steps) • The language generated by G is defined by L(G) = {w T* | S + w}

Derivation (Example) Grammar G = ({E}, {+, *, (, ), -, id}, P, E)

Derivation (Example) Grammar G = ({E}, {+, *, (, ), -, id}, P, E) with productions P = E E+E E E*E E (E) E -E E id Example derivations: E - id E rm E + id rm id + id E * E E * id + id E + id * id + id 9

Example • For the given grammar derive the string 9 -5+2 using Left most

Example • For the given grammar derive the string 9 -5+2 using Left most derivation Then derive the same string using Right most derivation

Example Grammar Context-free grammar for simple expressions: G = <{list, digit}, {+, -, 0,

Example Grammar Context-free grammar for simple expressions: G = <{list, digit}, {+, -, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, P, list> with productions P = list + digit list - digit list digit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 11

Derivation for the Example Grammar list + digit list - digit + digit 9

Derivation for the Example Grammar list + digit list - digit + digit 9 - 5 + digit 9 -5+2 This is an example leftmost derivation, because we replaced the leftmost nonterminal (underlined) in each step. Likewise, a rightmost derivation replaces the rightmost nonterminal in each step 12

Chomsky Hierarchy: Language Classification • A grammar G is said to be – Regular

Chomsky Hierarchy: Language Classification • A grammar G is said to be – Regular if it is right linear where each production is of the form A w. B or A w or left linear where each production is of the form A Bw or A w – Context free if each production is of the form A where A N and (N T)* – Context sensitive if each production is of the form A where A N, , , (N T)*, | | > 0 – Unrestricted 13

Chomsky Hierarchy L(regular) L(context free) L(context sensitive) L(unrestricted) Where L(T) = { L(G) |

Chomsky Hierarchy L(regular) L(context free) L(context sensitive) L(unrestricted) Where L(T) = { L(G) | G is of type T } That is: the set of all languages generated by grammars G of type T Examples: Every finite language is regular! (construct a FSA for strings in L(G)) L 1 = { anbn | n 1 } is context free L 2 = { anbncn | n 1 } is context sensitive 14

Parse Trees • The root of the tree is labeled by the start symbol

Parse Trees • The root of the tree is labeled by the start symbol • Each leaf of the tree is labeled by a terminal (=token) or • Each interior node is labeled by a nonterminal • If A X 1 X 2 … Xn is a production, then node A has immediate children X 1, X 2, …, Xn where Xi is a (non)terminal or ( denotes the empty string) 15

Parse Tree for the Example Grammar Parse tree of the string 9 -5+2 using

Parse Tree for the Example Grammar Parse tree of the string 9 -5+2 using grammar G list digit 9 - 5 + 2 The sequence of leafs is called the yield of the parse tree 16

Example of Parse Tree • Suppose we have the following grammar E→E+E E→E*E E→(E)

Example of Parse Tree • Suppose we have the following grammar E→E+E E→E*E E→(E) E→-E E → id Perform Left most derivation, right most derivation and construct a parse tree for the string id+id*id

Two possible Parse Trees using Leftmost derivation

Two possible Parse Trees using Leftmost derivation

Parse Tree via Right most derivation

Parse Tree via Right most derivation

Ambiguity • Grammar is ambiguous if more than one parse tree is possible for

Ambiguity • Grammar is ambiguous if more than one parse tree is possible for some string as shown in the previous example. If there are more than one left most derivations or more than one right most derivations. • Ambiguity is not acceptable – Unfortunately, it’s undecidable to check whether a given CFG is ambiguous – Some CFLs are inherently ambiguous (do not have an unambiguous CFG)

Ambiguity (cont’d) Consider the following context-free grammar: G = <{string}, {+, -, 0, 1,

Ambiguity (cont’d) Consider the following context-free grammar: G = <{string}, {+, -, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, P, string> with production P = string + string | string - string | 0 | 1 | … | 9 This grammar is ambiguous, because more than one parse tree represents the string 9 -5+2 21

Two Parse Trees for the same string 9 string - 5 string + 2

Two Parse Trees for the same string 9 string - 5 string + 2 9 string - 5 string + 2 22

Practice • Show that the following grammar is ambiguous: (Find out strings and two

Practice • Show that the following grammar is ambiguous: (Find out strings and two parse trees) 1) S AB | aa. B A a | Aa B b 3) S a. Sb | SS | ε 2) S a | ab. Sb |a. Ab A b. S | a. AAb

Example Predictive Parser (Execution Step 5) type() match(‘array’) match(‘[’) match(‘num’) Input: simple() match(‘dotdot’) array

Example Predictive Parser (Execution Step 5) type() match(‘array’) match(‘[’) match(‘num’) Input: simple() match(‘dotdot’) array [ num match(‘num’) dotdot num lookahead ] of integer 24

Example Predictive Parser (Execution Step 6) type() match(‘array’) match(‘[’) match(‘num’) Input: simple() match(‘dotdot’) array

Example Predictive Parser (Execution Step 6) type() match(‘array’) match(‘[’) match(‘num’) Input: simple() match(‘dotdot’) array [ num match(‘]’) match(‘num’) dotdot num ] lookahead of integer 25

Example Predictive Parser (Execution Step 7) type() match(‘array’) match(‘[’) match(‘num’) Input: simple() match(‘dotdot’) array

Example Predictive Parser (Execution Step 7) type() match(‘array’) match(‘[’) match(‘num’) Input: simple() match(‘dotdot’) array [ num match(‘]’) match(‘of’) match(‘num’) dotdot num ] of lookahead integer 26

Example Predictive Parser (Execution Step 8) type() match(‘array’) match(‘[’) match(‘num’) simple() match(‘dotdot’) match(‘]’) match(‘of’)

Example Predictive Parser (Execution Step 8) type() match(‘array’) match(‘[’) match(‘num’) simple() match(‘dotdot’) match(‘]’) match(‘of’) match(‘num’) type() simple() match(‘integer’) Input: array [ num dotdot num ] of integer lookahead 27

FIRST( ) is the set of terminals that appear as the first symbols of

FIRST( ) is the set of terminals that appear as the first symbols of one or more strings generated from type simple | ^ id | array [ simple ] of type simple integer | char | num dotdot num FIRST(simple) = { integer, char, num } FIRST(^ id) = { ^ } FIRST(type) = { integer, char, num, ^, array } 28

How to use FIRST We use FIRST to write a predictive parser as follows

How to use FIRST We use FIRST to write a predictive parser as follows expr term rest + term rest | - term rest | procedure rest(); begin if lookahead in FIRST(+ term rest) then match(‘+’); term(); rest() else if lookahead in FIRST(- term rest) then match(‘-’); term(); rest() else return end; When a nonterminal A has two (or more) productions as in A | Then FIRST ( ) and FIRST( ) must be disjoint for predictive parsing to work 29

Predictive Parsing • • Eliminate left recursion from grammar Left factor the grammar Compute

Predictive Parsing • • Eliminate left recursion from grammar Left factor the grammar Compute FIRST and FOLLOW Two variants: – Recursive (recursive-descent parsing) – Non-recursive (table-driven parsing) 30

Predictive Parsing Program (Driver) push($) push(S) a : = lookahead repeat X : =

Predictive Parsing Program (Driver) push($) push(S) a : = lookahead repeat X : = pop() if X is a terminal or X = $ then match(X) // moves to next token and a : = lookahead else if M[X, a] = X Y 1 Y 2…Yk then push(Yk, Yk-1, …, Y 2, Y 1) // such that Y 1 is on top … invoke actions and/or produce IR output … else error() endif until X = $ 31

Non-Recursive Predictive Parsing: Table-Driven Parsing • Given an LL(1) grammar G = (N, T,

Non-Recursive Predictive Parsing: Table-Driven Parsing • Given an LL(1) grammar G = (N, T, P, S) construct a table M[A, a] for A N, a T and use a driver program with a stack input a + b $ stack X Y Z $ Predictive parsing program (driver) Parsing table M output 32

’ E TE E’ + T E’ | T F T’ T’ * F

’ E TE E’ + T E’ | T F T’ T’ * F T’ | F ( E ) | id Example id+id*id id E * E T E’ ( T F T’ F push($) push(S) repeat $ E’ E’ T’ T’ T F T’ T’ T’ ) E T E’ E’ + T E’ E’ T + F id T’ * F T’ F (E) a : = lookahead X : = pop() if X is a terminal or X = $ then match(X) // moves to next token and a : = lookahead else if M[X, a] = X Y 1 Y 2…Yk then push(Yk, Yk-1, …, Y 2, Y 1) endif

Example Table-Driven Parsing Stack $E $E ’ T ’ F $E ’ T ’

Example Table-Driven Parsing Stack $E $E ’ T ’ F $E ’ T ’ id $E ’ T ’ $E ’ T+ $E ’ T ’ F $E ’ T ’ id $E ’ T ’ F* $E ’ T ’ F $E ’ T ’ id $E ’ T ’ $E ’ $ Input Production applied id+id*id$ E TE’ id+id*id$ T FT’ id+id*id$ F id id+id*id$ T’ +id*id$ E’ +TE’ +id*id$ T FT’ id*id$ F id id*id$ T’ *FT’ *id$ F id id$ $ T ’ $ E ’ $ 34

FIRST (Revisited) • FIRST( ) = { the set of terminals that begin all

FIRST (Revisited) • FIRST( ) = { the set of terminals that begin all strings derived from } • FIRST(a) = {a} if a T • FIRST( ) = { } • FIRST(A) = A FIRST( ) for A P • FIRST(X 1 X 2…Xk) = if for all j = 1, …, i-1 : FIRST(Xj) then add non- in FIRST(Xi) to FIRST(X 1 X 2…Xk) if for all j = 1, …, k : FIRST(Xj) then add to FIRST(X 1 X 2…Xk) 35

FOLLOW • FOLLOW(A) = { the set of terminals that can immediately follow nonterminal

FOLLOW • FOLLOW(A) = { the set of terminals that can immediately follow nonterminal A } FOLLOW(A) = • for all (B A ) P do add FIRST( ){ } to FOLLOW(A) • for all (B A ) P and FIRST( ) do add FOLLOW(B) to FOLLOW(A) • for all (B A) P do add FOLLOW(B) to FOLLOW(A) • if A is the start symbol S then add $ to FOLLOW(A) 36

Constructing an LL(1) Predictive Parsing Table for each production A do for each a

Constructing an LL(1) Predictive Parsing Table for each production A do for each a FIRST( ) do add A to M[A, a] enddo if FIRST( ) then for each b FOLLOW(A) do endif enddo add A to M[A, b] enddo Mark each undefined entry in M error 37

LL(1) Grammar • A grammar G is LL(1) if it is not left recursive

LL(1) Grammar • A grammar G is LL(1) if it is not left recursive and for each collection of productions A 1 | 2 | … | n for nonterminal A the following holds: 1. FIRST( i) FIRST( j) = for all i j 2. if i * then 2. a. j * for all i j 2. b. FIRST( j) FOLLOW(A) = for all i j 39

Non-LL(1) Examples Grammar S Sa|a S a. S|a S a. R| R S| Not

Non-LL(1) Examples Grammar S Sa|a S a. S|a S a. R| R S| Not LL(1) because: Left recursive FIRST(a S) FIRST(a) S a. Ra R S| For R: FIRST(S) FOLLOW(R) For R: S * and * 40

Recursive-Descent Parsing (Recap) • Grammar must be LL(1) • Every nonterminal has one (recursive)

Recursive-Descent Parsing (Recap) • Grammar must be LL(1) • Every nonterminal has one (recursive) procedure responsible for parsing the nonterminal’s syntactic category of input tokens • When a nonterminal has multiple productions, each production is implemented in a branch of a selection statement based on input look-ahead information 41

LL(1) Grammars are Unambiguous Ambiguous grammar S i E t S SR | a

LL(1) Grammars are Unambiguous Ambiguous grammar S i E t S SR | a SR e S | E b A S i E t S SR i e$ S a a e$ SR e S e e$ SR e$ E b b t Error: duplicate table entry a S b e S a i t $ S i E t S SR SR SR e S SR E FIRST( ) FOLLOW(A) E b SR 42

Panic Mode Recovery Add synchronizing actions to undefined entries based on FOLLOW Pro: Cons:

Panic Mode Recovery Add synchronizing actions to undefined entries based on FOLLOW Pro: Cons: Can be automated Error messages are needed id E synch: * ( ) $ E T ER synch ER + T ER T F TR TR F + E T ER ER T FOLLOW(E) = { ) $ } FOLLOW(ER) = { ) $ } FOLLOW(T) = { + ) $ } FOLLOW(TR) = { + ) $ } FOLLOW(F) = { + * ) $ } F id ER ER T F TR synch TR TR * F TR synch TR TR F (E) the driver pops current nonterminal A and skips input till synch token or skips input until one of FIRST(A) is found synch 43

Phrase-Level Recovery Change input stream by inserting missing tokens For example: id id is

Phrase-Level Recovery Change input stream by inserting missing tokens For example: id id is changed into id * id Pro: Cons: Can be automated Recovery not always intuitive Can then continue here id E + * E T ER ( ) $ E T ER synch ER + T ER ER T T F TR synch T F TR TR insert * TR TR * F TR F F id synch TR TR F (E) insert *: driver inserts missing * and retries the production synch 44

Error Productions Add “error production”: TR F T R to ignore missing *, e.

Error Productions Add “error production”: TR F T R to ignore missing *, e. g. : id id E T ER ER + T ER | T F TR TR * F TR | F ( E ) | id id E Pro: Cons: + Powerful recovery method Cannot be automated * E T ER ( ) $ E T ER synch ER + T ER ER T T F TR synch T F TR TR TR F T R TR TR * F TR F F id synch TR TR F (E) synch 45

Bottom-Up Parsing • LR methods (Left-to-right, Rightmost derivation) – SLR, Canonical LR, LALR •

Bottom-Up Parsing • LR methods (Left-to-right, Rightmost derivation) – SLR, Canonical LR, LALR • Other special cases: – Shift-reduce parsing – Operator-precedence parsing 46

Operator-Precedence Parsing • Special case of shift-reduce parsing • We will not further discuss

Operator-Precedence Parsing • Special case of shift-reduce parsing • We will not further discuss (you can skip textbook section 4. 6) • We will not further discuss SLR, Canonical LR, LALR 47

Shift-Reduce Parsing Grammar: S a. ABe A Abc|b B d Shift-reduce corresponds to a

Shift-Reduce Parsing Grammar: S a. ABe A Abc|b B d Shift-reduce corresponds to a rightmost derivation: S rm a A B e rm a A d e rm a A b c d e rm a b b c d e Reducing a sentence: abbcde a. Ade a. ABe S These match production’s right-hand sides S A A a b b c d e A B a b b c d e 48

Handles A handle is a substring of grammar symbols in a right-sentential form that

Handles A handle is a substring of grammar symbols in a right-sentential form that matches a right-hand side of a production Grammar: S a. ABe A Abc|b B d abbcde a. Ade a. ABe S abbcde a. AAe …? Handle NOT a handle, because further reductions will fail (result is not a sentential form) 49

Stack Implementation of Shift-Reduce Parsing Grammar: E E+E E E*E E (E) E id

Stack Implementation of Shift-Reduce Parsing Grammar: E E+E E E*E E (E) E id Find handles to reduce Stack $ $id $E $E+id $E+E*id $E+E*E $E+E $E Input id+id*id$ *id$ id$ $ $ Action shift reduce E id shift (or reduce? ) shift reduce E id reduce E E * E reduce E E + E accept How to resolve conflicts? 50

Conflicts • Shift-reduce and reduce-reduce conflicts are caused by – The limitations of the

Conflicts • Shift-reduce and reduce-reduce conflicts are caused by – The limitations of the LR parsing method (even when the grammar is unambiguous) – Ambiguity of the grammar 51

Shift-Reduce Parsing: Shift-Reduce Conflicts Ambiguous grammar: S if E then S | if E

Shift-Reduce Parsing: Shift-Reduce Conflicts Ambiguous grammar: S if E then S | if E then S else S | other Stack $… $…if E then S Input Action …$ … else…$ shift or reduce? Resolve in favor of shift, so else matches closest if 52

Shift-Reduce Parsing: Reduce-Reduce Conflicts Grammar: C AB A a B a Stack $ $a

Shift-Reduce Parsing: Reduce-Reduce Conflicts Grammar: C AB A a B a Stack $ $a Input aa$ a$ Action shift reduce A a or B a ? Resolve in favor of reduce A a, otherwise we’re stuck! 53

Model of an LR Parser input a 1 a 2 … ai … an

Model of an LR Parser input a 1 a 2 … ai … an $ stack sm Xm sm-1 Xm-1 … s 0 LR Parsing Program (driver) action goto shift reduce accept error DFA output Constructed with LR(0) method, SLR method, LR(1) method, or LALR(1) method 54

LR Parsing (Driver) Configuration ( = LR parser state): (s 0 X 1 s

LR Parsing (Driver) Configuration ( = LR parser state): (s 0 X 1 s 1 X 2 s 2 … Xm sm, ai ai+1 … an $) stack input If action[sm, ai] = shift s then push ai, push s, and advance input: (s 0 X 1 s 1 X 2 s 2 … Xm sm ai s, ai+1 … an $) If action[sm, ai] = reduce A and goto[sm-r, A] = s with r=| | then pop 2 r symbols, push A, and push s: (s 0 X 1 s 1 X 2 s 2 … Xm-r sm-r A s, ai ai+1 … an $) If action[sm, ai] = accept then stop If action[sm, ai] = error then attempt recovery 55

Example LR Parse Table Grammar: 1. E E + T 2. E T 3.

Example LR Parse Table Grammar: 1. E E + T 2. E T 3. T T * F 4. T F 5. F ( E ) 6. F id Shift & goto 5 Reduce by production #1 56

LL(k) and LR(k) • LL(k) parse tables computed using FIRST/FOLLOW • LR(k) parsing tables

LL(k) and LR(k) • LL(k) parse tables computed using FIRST/FOLLOW • LR(k) parsing tables computed using closure/goto • For a grammar to be LR (k) we must be able to recognize the occurrence of right hand side of the production having seen k input lookaheads 57