



















































- Slides: 55
1 Syntax Analysis Part I Chapter 4 COP 5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2007-2011
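2 Position of a Parser in the Compiler Model
(Diagram) Source program → Lexical analyzer → Parser and rest of front-end → Intermediate representation
• The parser issues "get next token"; the lexical analyzer returns (token, tokenval)
• The lexical analyzer reports lexical errors; the parser and rest of the front-end report syntax errors and semantic errors
• The symbol table is shared between the phases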
3 The Parser
• A parser implements a context-free grammar
• The role of the parser is twofold:
1. To check syntax (= string recognizer)
– And to report syntax errors accurately
2. To invoke semantic actions
– For static semantics checking, e.g. type checking of expressions, functions, etc.
– For syntax-directed translation of the source code to an intermediate representation
4 Error Handling • A good compiler should assist in identifying and locating errors – Lexical errors: important, compiler can easily recover and continue – Syntax errors: most important for compiler, can almost always recover – Static semantic errors: important, can sometimes recover – Dynamic semantic errors: hard or impossible to detect at compile time, runtime checks are required – Logical errors: hard or impossible to detect
5 Error Recovery Strategies • Panic mode – Discard input until a token in a set of designated synchronizing tokens is found • Phrase-level recovery – Perform local correction on the input to repair the error • Error productions – Augment grammar with productions for erroneous constructs – Edo |do stmt while (expr) • Global correction – Choose a minimal sequence of changes to obtain a global least-cost correction
6 Grammars (Recap)
• A context-free grammar is a 4-tuple G = (N, T, P, S) where
– T is a finite set of tokens (terminal symbols)
– N is a finite set of nonterminals
– P is a finite set of productions of the form α → β where α ∈ (N ∪ T)* N (N ∪ T)* and β ∈ (N ∪ T)*; in a context-free grammar the left-hand side α is a single nonterminal A ∈ N
– S ∈ N is a designated start symbol
7 Derivation (Example)
Grammar G = ({E}, {+, *, (, ), -, id}, P, E) with productions P =
E → E + E
E → E * E
E → ( E )
E → - E
E → id
Example derivations:
E ⇒ - E ⇒ - id
E ⇒rm E + E ⇒rm E + id ⇒rm id + id
E ⇒* E
E ⇒* id + id
E ⇒+ id * id + id
8 Left Recursion (Recap) • Productions of the form A A |b A B | b A A | b
9 Left Recursion (Recap)
• Productions of the form A → A α | β | γ are left recursive
• When one of the productions in a grammar is left recursive, a predictive parser loops forever on certain inputs
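10 Left Recursion (Recap) A → A α | β (figure: parse tree in which A expands to A α and the inner A expands to β)
11 Left Recursion (Recap) A → A α | β is rewritten as A → β B, B → α B | ε (figure: the corresponding parse trees, with β as the leftmost leaf)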
12 A General Systematic Left Recursion Elimination Method
Input: Grammar G with no cycles or ε-productions
Arrange the nonterminals in some order A1, A2, …, An
for i = 1, …, n do
    for j = 1, …, i-1 do
        replace each Ai → Aj γ with Ai → δ1 γ | δ2 γ | … | δk γ
        where Aj → δ1 | δ2 | … | δk
    enddo
    eliminate the immediate left recursion in Ai
enddo
13 Immediate Left-Recursion Elimination
Rewrite every left-recursive production
A → A α | β | γ | A δ
into a right-recursive production:
A → β AR | γ AR
AR → α AR | δ AR | ε
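The rewrite rule above can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the list-of-lists grammar encoding, the function name `eliminate_immediate_left_recursion`, and the `_R` suffix for the new nonterminal are all assumptions made for the example.

```python
EPS = "ε"  # marker for the empty string (assumed encoding)

def eliminate_immediate_left_recursion(nt, alternatives):
    """Split A -> A α | β into A -> β A_R and A_R -> α A_R | ε.

    `alternatives` is a list of right-hand sides, each a non-empty list of
    symbols. Returns a dict from nonterminal name to its new alternatives.
    """
    # Tails α of the left-recursive alternatives A -> A α
    recursive = [alt[1:] for alt in alternatives if alt[0] == nt]
    # Alternatives β that do not start with A
    others = [alt for alt in alternatives if alt[0] != nt]
    if not recursive:
        return {nt: alternatives}  # nothing to do
    new_nt = nt + "_R"  # hypothetical naming scheme for the new nonterminal
    return {
        nt: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[EPS]],
    }

result = eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]])
for head, alts in result.items():
    print(head, "->", " | ".join(" ".join(alt) for alt in alts))
```

Applied to E → E + T | T, the sketch yields E → T E_R and E_R → + T E_R | ε, which is the shape of the right-recursive productions on the slide.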
14 Example Left Recursion Elim.
A → B C | a
B → C A | A b
C → A B | C C | a
Choose arrangement: A, B, C
i = 1: nothing to do
i = 2, j = 1: B → C A | A b
    ⇒ B → C A | B C b | a b
    ⇒(imm) B → C A BR | a b BR
           BR → C b BR | ε
i = 3, j = 1: C → A B | C C | a
    ⇒ C → B C B | a B | C C | a
i = 3, j = 2: C → B C B | a B | C C | a
    ⇒ C → C A BR C B | a b BR C B | a B | C C | a
    ⇒(imm) C → a b BR C B CR | a B CR | a CR
           CR → A BR C B CR | C CR | ε
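The worked example can be checked mechanically. The sketch below follows the general method from slide 12 under a few assumptions: the grammar is a dict of lists of symbol lists, new nonterminals are named by appending `R` (as the slides do with BR and CR), and the helper names are invented for illustration.

```python
EPS = "ε"  # marker for the empty string (assumed encoding)

def eliminate_immediate(nt, alts):
    """A -> A α | β becomes A -> β AR, AR -> α AR | ε."""
    recursive = [alt[1:] for alt in alts if alt[0] == nt]
    if not recursive:
        return {nt: alts}
    new_nt = nt + "R"
    others = [alt for alt in alts if alt[0] != nt]
    return {
        nt: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[EPS]],
    }

def eliminate_left_recursion(grammar, order):
    """General method: substitute earlier nonterminals, then fix Ai."""
    g = {nt: [list(alt) for alt in alts] for nt, alts in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            expanded = []
            for alt in g[ai]:
                if alt[0] == aj:
                    # Replace Ai -> Aj γ by Ai -> δ γ for every Aj -> δ
                    expanded += [delta + alt[1:] for delta in g[aj]]
                else:
                    expanded.append(alt)
            g[ai] = expanded
        g.update(eliminate_immediate(ai, g[ai]))
    return g

grammar = {
    "A": [["B", "C"], ["a"]],
    "B": [["C", "A"], ["A", "b"]],
    "C": [["A", "B"], ["C", "C"], ["a"]],
}
result = eliminate_left_recursion(grammar, ["A", "B", "C"])
for head, alts in result.items():
    print(head, "->", " | ".join(" ".join(alt) for alt in alts))
```

With the arrangement A, B, C the sketch produces the same productions for B, BR, C and CR as the hand derivation.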
15 Left Factoring
When more than one production for nonterminal A starts with the same symbols, the FIRST sets are not disjoint:
stmt → if expr then stmt | if expr then stmt else stmt
We can use left factoring to fix the problem:
stmt → if expr then stmt opt_else
opt_else → else stmt | ε
16 Left Factoring
• When a nonterminal has two or more productions whose right-hand sides start with the same grammar symbols, the grammar is not LL(1) and cannot be used for predictive parsing
• Replace productions
A → α β1 | α β2 | … | α βn | γ
with
A → α AR | γ
AR → β1 | β2 | … | βn
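A single left-factoring step might look like the following Python sketch. It is only an illustration of the replacement rule above: the grammar encoding, the function names, and the `_R` suffix are assumptions, and the function factors just the first group of alternatives that share a leading symbol.

```python
EPS = "ε"  # marker for the empty string (assumed encoding)

def common_prefix(sequences):
    """Longest common prefix of several symbol lists."""
    prefix = []
    for symbols in zip(*sequences):
        if any(s != symbols[0] for s in symbols):
            break
        prefix.append(symbols[0])
    return prefix

def left_factor(nt, alternatives):
    """Replace A -> α β1 | ... | α βn | γ by A -> α A_R | γ, A_R -> β1 | ... | βn."""
    groups = {}
    for alt in alternatives:
        groups.setdefault(alt[0], []).append(alt)
    for group in groups.values():
        if len(group) < 2:
            continue  # no shared prefix for this leading symbol
        alpha = common_prefix(group)
        new_nt = nt + "_R"  # hypothetical name for the new nonterminal
        rest = [alt for alt in alternatives if alt not in group]
        # An alternative equal to the prefix leaves an empty tail, i.e. ε
        tails = [alt[len(alpha):] or [EPS] for alt in group]
        return {nt: [alpha + [new_nt]] + rest, new_nt: tails}
    return {nt: alternatives}

result = left_factor("stmt", [
    ["if", "expr", "then", "stmt"],
    ["if", "expr", "then", "stmt", "else", "stmt"],
])
for head, alts in result.items():
    print(head, "->", " | ".join(" ".join(alt) for alt in alts))
```

For the if-then-else productions of slide 15 this gives stmt → if expr then stmt stmt_R and stmt_R → ε | else stmt, where stmt_R plays the role of opt_else.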
17 Parsing
• Universal (any context-free grammar)
– Cocke-Younger-Kasami
– Earley
• Top-down (context-free grammar with restrictions)
– Recursive descent (predictive parsing)
– LL (Left-to-right, Leftmost derivation) methods
• Bottom-up (context-free grammar with restrictions)
– Operator precedence parsing
– LR (Left-to-right, Rightmost derivation) methods
    • SLR, canonical LR, LALR
18 Top-Down Parsing
• LL methods (Left-to-right, Leftmost derivation) and recursive-descent parsing
Grammar:
E → T + T
T → ( E )
T → - E
T → id
Leftmost derivation:
E ⇒lm T + T ⇒lm id + T ⇒lm id + id
(figure: the parse tree growing top-down after each derivation step)
19 Top-Down Parsing
E → E + T | T
T → T * F | F
F → id | ( E )
Derivation: id+id*id
20 Top-Down Parsing
21 Top-Down Parsing • At each step of a top-down parse, the key problem is that of determining the production to be applied for a nonterminal, say A • Recursive Descent: may require backtracking to find the correct A-production to be applied. • Predictive Parsing: a special case of recursive descent parsing, where no backtracking is required – chooses the correct A-production by looking ahead at the input a fixed number of symbols – typically one symbol
22 Top-Down Parsing • The class of grammars for which we can construct predictive parsers looking k symbols ahead in the input is sometimes called the LL(k) class. • LL(1) class: only 1 symbol ahead
23 Recursive-Descent Parsing • Recursive descent parsing is a top-down parsing method – Every nonterminal has one (recursive) procedure responsible for parsing the nonterminal’s syntactic category of input tokens – When a nonterminal has multiple productions, each production is implemented in a branch of a selection statement based on input look-ahead information • Execution begins with the procedure for the start symbol, which halts and announces success if its procedure body scans the entire input string. • General recursive-descent may require backtracking
24 Recursive-Descent Parsing
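25 Recursive-Descent Parsing (figure: call tree for input id+id*id; E() calls T() and E’(), T() calls F(), and F() matches id)
26 Recursive-Descent Parsing (figure: the same call tree for input id+id*id one step later, with T() also calling T’())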
27 Predictive Parsing • A special form of recursive descent parsing where we use one lookahead token to unambiguously determine the operation • Two variants: – Recursive (recursive-descent parsing) – Non-recursive (table-driven parsing)
28 Example Predictive Parser (Grammar)
type → simple | ^ id | array [ simple ] of type
simple → integer | char | num dotdot num
29 Example Predictive Parser (Program Code)
procedure match(t : token);
begin
    if lookahead = t then
        lookahead := nexttoken()
    else error()
end;

procedure type();
begin
    if lookahead in { ‘integer’, ‘char’, ‘num’ } then
        simple()
    else if lookahead = ‘^’ then
        begin match(‘^’); match(id) end
    else if lookahead = ‘array’ then
        begin match(‘array’); match(‘[’); simple(); match(‘]’); match(‘of’); type() end
    else error()
end;

procedure simple();
begin
    if lookahead = ‘integer’ then
        match(‘integer’)
    else if lookahead = ‘char’ then
        match(‘char’)
    else if lookahead = ‘num’ then
        begin match(‘num’); match(‘dotdot’); match(‘num’) end
    else error()
end;
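For readers who want to run the parser, here is a Python transcription of the slide's pseudocode. It is a sketch under stated assumptions: tokens arrive as a pre-split list of strings, `$` is appended as an end marker, and the names `PredictiveParser`, `ParseError`, `parse_type` and `parse_simple` are chosen for this example rather than taken from the slides.

```python
class ParseError(Exception):
    """Raised when the lookahead token does not fit the grammar."""

class PredictiveParser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks end of input (assumption)
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, t):
        # Advance only if the lookahead is the expected token
        if self.lookahead != t:
            raise ParseError(f"expected {t!r}, found {self.lookahead!r}")
        self.pos += 1

    def parse_type(self):
        # type -> simple | ^ id | array [ simple ] of type
        if self.lookahead in ("integer", "char", "num"):
            self.parse_simple()
        elif self.lookahead == "^":
            self.match("^")
            self.match("id")
        elif self.lookahead == "array":
            self.match("array")
            self.match("[")
            self.parse_simple()
            self.match("]")
            self.match("of")
            self.parse_type()
        else:
            raise ParseError(f"unexpected {self.lookahead!r} at start of type")

    def parse_simple(self):
        # simple -> integer | char | num dotdot num
        if self.lookahead == "integer":
            self.match("integer")
        elif self.lookahead == "char":
            self.match("char")
        elif self.lookahead == "num":
            self.match("num")
            self.match("dotdot")
            self.match("num")
        else:
            raise ParseError(f"unexpected {self.lookahead!r} at start of simple")

def parse(tokens):
    """Return True if the whole token list is a valid `type`."""
    parser = PredictiveParser(tokens)
    parser.parse_type()
    parser.match("$")  # the entire input must be consumed
    return True

print(parse("array [ num dotdot num ] of integer".split()))
```

Each branch is selected by inspecting one lookahead token, so no backtracking is needed; an input such as `array [ num ] of integer` raises `ParseError` at the `]`.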
30 Example Predictive Parser (Execution Step 1) Calls: type(). type() checks lookahead and calls match(‘array’). Input: array [ num dotdot num ] of integer; lookahead = array
31 Example Predictive Parser (Execution Step 2) Calls: type() { match(‘array’); match(‘[’) }. Input: array [ num dotdot num ] of integer; lookahead on entry to match(‘[’) = [
32 Example Predictive Parser (Execution Step 3) Calls: type() { match(‘array’); match(‘[’); simple() { match(‘num’) } }. Input: array [ num dotdot num ] of integer; lookahead on entry to match(‘num’) = first num
33 Example Predictive Parser (Execution Step 4) Calls: type() { match(‘array’); match(‘[’); simple() { match(‘num’); match(‘dotdot’) } }. Input: array [ num dotdot num ] of integer; lookahead on entry to match(‘dotdot’) = dotdot
34 Example Predictive Parser (Execution Step 5) Calls: type() { match(‘array’); match(‘[’); simple() { match(‘num’); match(‘dotdot’); match(‘num’) } }. Input: array [ num dotdot num ] of integer; lookahead on entry to the last match(‘num’) = second num
35 Example Predictive Parser (Execution Step 6) Calls: type() { match(‘array’); match(‘[’); simple() { match(‘num’); match(‘dotdot’); match(‘num’) }; match(‘]’) }. Input: array [ num dotdot num ] of integer; lookahead on entry to match(‘]’) = ]
36 Example Predictive Parser (Execution Step 7) Calls: type() { match(‘array’); match(‘[’); simple() { match(‘num’); match(‘dotdot’); match(‘num’) }; match(‘]’); match(‘of’) }. Input: array [ num dotdot num ] of integer; lookahead on entry to match(‘of’) = of
37 Example Predictive Parser (Execution Step 8) Calls: type() { match(‘array’); match(‘[’); simple() { match(‘num’); match(‘dotdot’); match(‘num’) }; match(‘]’); match(‘of’); type() { simple() { match(‘integer’) } } }. Input: array [ num dotdot num ] of integer; lookahead on entry to match(‘integer’) = integer
38 FIRST
FIRST(α) is the set of terminals that appear as the first symbols of one or more strings generated from α
type → simple | ^ id | array [ simple ] of type
simple → integer | char | num dotdot num
FIRST(simple) = { integer, char, num }
FIRST(^ id) = { ^ }
FIRST(type) = { integer, char, num, ^, array }
39 FIRST
• FIRST(α) = { the set of terminals that begin the strings derived from α }
FIRST(a) = { a } if a ∈ T
FIRST(ε) = { ε }
FIRST(A) = ∪ FIRST(α) over all productions A → α ∈ P
FIRST(X1 X2 … Xk) =
    if for all j = 1, …, i-1 : ε ∈ FIRST(Xj) then add the non-ε symbols in FIRST(Xi) to FIRST(X1 X2 … Xk)
    if for all j = 1, …, k : ε ∈ FIRST(Xj) then add ε to FIRST(X1 X2 … Xk)
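The FIRST rules can be turned into a small fixed-point computation. The Python sketch below is one possible rendering, assuming a grammar stored as a dict of lists of symbol lists and the string `"ε"` as the empty-string marker; the function names are illustrative only. It uses the type grammar from slide 38 so the result can be compared with that slide.

```python
EPS = "ε"  # marker for the empty string (assumed encoding)

# Grammar from the predictive-parser example (slide 28/38)
GRAMMAR = {
    "type": [["simple"], ["^", "id"], ["array", "[", "simple", "]", "of", "type"]],
    "simple": [["integer"], ["char"], ["num", "dotdot", "num"]],
}

def first_of_string(symbols, first):
    """FIRST(X1 X2 ... Xk), given the FIRST sets of the nonterminals."""
    result = set()
    for x in symbols:
        # For a terminal (or ε itself) FIRST is just the symbol
        fx = first[x] if x in first else {x}
        result |= fx - {EPS}
        if EPS not in fx:
            return result  # Xi cannot vanish, so later symbols do not matter
    result.add(EPS)  # every Xi can derive ε
    return result

def compute_first(grammar):
    """Iterate FIRST(A) = union of FIRST(α) for A -> α until nothing changes."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_string(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

first = compute_first(GRAMMAR)
for nt in GRAMMAR:
    print(f"FIRST({nt}) =", sorted(first[nt]))
```

The computed sets agree with slide 38: FIRST(simple) contains integer, char and num, and FIRST(type) adds ^ and array.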
40 FOLLOW
• FOLLOW(A) = { the set of terminals that can immediately follow nonterminal A }
FOLLOW(A) =
    for all (B → α A β) ∈ P do add FIRST(β) \ { ε } to FOLLOW(A)
    for all (B → α A β) ∈ P and ε ∈ FIRST(β) do add FOLLOW(B) to FOLLOW(A)
    for all (B → α A) ∈ P do add FOLLOW(B) to FOLLOW(A)
    if A is the start symbol S then add $ to FOLLOW(A)
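The FOLLOW rules can be sketched the same way. To keep the example short, the FIRST sets of the nonterminals are written out by hand here (an assumption of the sketch, not something the rules require), for the expression grammar E → T ER, ER → + T ER | ε, T → F TR, TR → * F TR | ε, F → ( E ) | id that the deck uses in its parsing-table example. The function names are illustrative.

```python
EPS, END = "ε", "$"  # empty-string and end-of-input markers (assumed encoding)

GRAMMAR = {
    "E": [["T", "ER"]],
    "ER": [["+", "T", "ER"], [EPS]],
    "T": [["F", "TR"]],
    "TR": [["*", "F", "TR"], [EPS]],
    "F": [["(", "E", ")"], ["id"]],
}

# FIRST sets of the nonterminals, supplied by hand for this sketch
FIRST = {
    "E": {"(", "id"}, "ER": {"+", EPS},
    "T": {"(", "id"}, "TR": {"*", EPS},
    "F": {"(", "id"},
}

def first_of_string(symbols):
    """FIRST(X1 ... Xk); a terminal's FIRST set is the terminal itself."""
    result = set()
    for x in symbols:
        fx = FIRST.get(x, {x})
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    return result | {EPS}

def compute_follow(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)  # $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for alt in alternatives:
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue  # FOLLOW is defined for nonterminals only
                    beta = first_of_string(alt[i + 1:])
                    new = beta - {EPS}
                    if EPS in beta:  # β is empty or can derive ε
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

follow = compute_follow(GRAMMAR, "E")
for nt in GRAMMAR:
    print(f"FOLLOW({nt}) =", sorted(follow[nt]))
```

The second and third rules of the slide collapse into one test here, because FIRST of an empty β is { ε }.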
41 LL(1) Grammar
• A grammar G is LL(1) if it is not left recursive and for each collection of productions A → α1 | α2 | … | αn for nonterminal A the following holds:
1. FIRST(αi) ∩ FIRST(αj) = ∅ for all i ≠ j
2. if αi ⇒* ε then
2.a. αj ⇏* ε for all i ≠ j
2.b. FIRST(αj) ∩ FOLLOW(A) = ∅ for all i ≠ j
42 LL(1) Grammar • LL(1) – L: scanning the input from left to right – L: producing a leftmost derivation – 1: using one input symbol of lookahead • The class of LL(1) grammars is rich enough to cover most programming constructs • Care is needed in writing a suitable grammar for the source language – Eliminate ambiguity, left recursion – Left factor the grammar
43 Non-LL(1) Examples

| Grammar | Not LL(1) because: |
|---|---|
| S → S a \| a | Left recursive |
| S → a S \| a | FIRST(a S) ∩ FIRST(a) ≠ ∅ |
| S → a R \| ε, R → S \| ε | For R: S ⇒* ε and ε ⇒* ε |
| S → a R a, R → S \| ε | For R: FIRST(S) ∩ FOLLOW(R) ≠ ∅ |
44 Non-Recursive Predictive Parsing: Table-Driven Parsing
• Given an LL(1) grammar G = (N, T, P, S), construct a table M[A, a] for A ∈ N, a ∈ T and use a driver program with a stack
(figure: the predictive parsing program (driver) reads the input a + b $, maintains the stack X Y Z $, consults the parsing table M, and produces the output)
45 Constructing an LL(1) Predictive Parsing Table
for each production A → α do
    for each a ∈ FIRST(α) do
        add A → α to M[A, a]
    enddo
    if ε ∈ FIRST(α) then
        for each b ∈ FOLLOW(A) do
            add A → α to M[A, b]
        enddo
    endif
enddo
Mark each undefined entry in M error
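The construction loop translates almost line for line into Python. In this sketch the FIRST and FOLLOW sets of the expression grammar are supplied by hand, the table is a dict keyed by (nonterminal, terminal), and each cell holds a list so that duplicate entries show up as conflicts; all of these representation choices and names are assumptions made for the example. Terminals in FIRST(α) are added directly, skipping the ε marker itself.

```python
EPS, END = "ε", "$"  # empty-string and end-of-input markers (assumed encoding)

GRAMMAR = {
    "E": [["T", "ER"]],
    "ER": [["+", "T", "ER"], [EPS]],
    "T": [["F", "TR"]],
    "TR": [["*", "F", "TR"], [EPS]],
    "F": [["(", "E", ")"], ["id"]],
}
FIRST = {"E": {"(", "id"}, "ER": {"+", EPS}, "T": {"(", "id"},
         "TR": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {")", END}, "ER": {")", END}, "T": {"+", ")", END},
          "TR": {"+", ")", END}, "F": {"+", "*", ")", END}}

def first_of_string(symbols, first):
    """FIRST(X1 ... Xk); a terminal's FIRST set is the terminal itself."""
    result = set()
    for x in symbols:
        fx = first.get(x, {x})
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    return result | {EPS}

def build_table(grammar, first, follow):
    """Return M as {(A, a): [right-hand sides]}; absent keys mean error."""
    table = {}
    for head, alternatives in grammar.items():
        for alt in alternatives:
            f = first_of_string(alt, first)
            targets = f - {EPS}          # each a in FIRST(α)
            if EPS in f:
                targets |= follow[head]  # each b in FOLLOW(A)
            for a in targets:
                table.setdefault((head, a), []).append(alt)
    return table

table = build_table(GRAMMAR, FIRST, FOLLOW)
conflicts = {key: rhs for key, rhs in table.items() if len(rhs) > 1}
print(len(table), "defined entries,", len(conflicts), "conflicts")
```

A cell with more than one production means the grammar is not LL(1); for the expression grammar no cell is filled twice.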
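46 Example Table
E → T ER
ER → + T ER | ε
T → F TR
TR → * F TR | ε
F → ( E ) | id

| A → α | FIRST(α) | FOLLOW(A) |
|---|---|---|
| E → T ER | ( id | $ ) |
| ER → + T ER | + | $ ) |
| ER → ε | ε | $ ) |
| T → F TR | ( id | + $ ) |
| TR → * F TR | * | + $ ) |
| TR → ε | ε | + $ ) |
| F → ( E ) | ( | * + $ ) |
| F → id | id | * + $ ) |

Parsing table M:

| | id | + | * | ( | ) | $ |
|---|---|---|---|---|---|---|
| E | E → T ER | | | E → T ER | | |
| ER | | ER → + T ER | | | ER → ε | ER → ε |
| T | T → F TR | | | T → F TR | | |
| TR | | TR → ε | TR → * F TR | | TR → ε | TR → ε |
| F | F → id | | | F → ( E ) | | |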
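47 LL(1) Grammars are Unambiguous
Ambiguous grammar:
S → i E t S SR | a
SR → e S | ε
E → b

| A → α | FIRST(α) | FOLLOW(A) |
|---|---|---|
| S → i E t S SR | i | e $ |
| S → a | a | e $ |
| SR → e S | e | e $ |
| SR → ε | ε | e $ |
| E → b | b | t |

Parsing table M:

| | a | b | e | i | t | $ |
|---|---|---|---|---|---|---|
| S | S → a | | | S → i E t S SR | | |
| SR | | | SR → e S and SR → ε | | | SR → ε |
| E | | E → b | | | | |

Error: duplicate table entry in M[SR, e]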
48 LL(1) Grammars are Unambiguous
Ambiguous grammar:
S → i E t S SR | a
SR → e S | ε
E → b
(figure: parse-tree illustration of the ambiguity; surviving label text: ibta ea)
49 Predictive Parsing Program (Driver)
push($)
push(S)
a := lookahead
repeat
    X := pop()
    if X is a terminal or X = $ then
        match(X) // moves to next token and a := lookahead
    else if M[X, a] = X → Y1 Y2 … Yk then
        push(Yk, Yk-1, …, Y2, Y1) // such that Y1 is on top
        … invoke actions and/or produce IR output …
    else error()
    endif
until X = $
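A runnable version of the driver helps when stepping through the stack by hand. The Python sketch below hard-codes the parsing table of the expression grammar from the example table; the dict-based table, the `ParseError` class, and the decision to return the list of applied productions as the "output" are assumptions of this example.

```python
EPS, END = "ε", "$"  # empty-string and end-of-input markers (assumed encoding)

class ParseError(Exception):
    pass

# M[A, a] for E -> T ER, ER -> + T ER | ε, T -> F TR, TR -> * F TR | ε, F -> ( E ) | id
TABLE = {
    ("E", "id"): ["T", "ER"], ("E", "("): ["T", "ER"],
    ("ER", "+"): ["+", "T", "ER"], ("ER", ")"): [EPS], ("ER", END): [EPS],
    ("T", "id"): ["F", "TR"], ("T", "("): ["F", "TR"],
    ("TR", "+"): [EPS], ("TR", "*"): ["*", "F", "TR"],
    ("TR", ")"): [EPS], ("TR", END): [EPS],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "ER", "T", "TR", "F"}

def parse(tokens, start="E"):
    """Return the productions applied, in order, or raise ParseError."""
    tokens = list(tokens) + [END]
    pos = 0
    stack = [END, start]          # top of the stack is the end of the list
    applied = []
    while True:
        x = stack.pop()
        a = tokens[pos]           # current lookahead
        if x not in NONTERMINALS:  # X is a terminal or $
            if x != a:
                raise ParseError(f"expected {x!r}, found {a!r}")
            if x == END:
                return applied    # stack and input exhausted together
            pos += 1              # match(X): move to the next token
        else:
            rhs = TABLE.get((x, a))
            if rhs is None:       # undefined entry means error
                raise ParseError(f"no entry M[{x}, {a}]")
            applied.append(f"{x} -> {' '.join(rhs)}")
            # Push Yk ... Y1 so that Y1 ends up on top; ε pushes nothing
            stack.extend(s for s in reversed(rhs) if s != EPS)

for production in parse("id + id * id".split()):
    print(production)
```

The productions are emitted in leftmost-derivation order, which is what a syntax-directed translator would hook its actions onto.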
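50 Example Table-Driven Parsing

| Stack | Input | Production applied |
|---|---|---|
| $ E | id+id*id$ | E → T ER |
| $ ER T | id+id*id$ | T → F TR |
| $ ER TR F | id+id*id$ | F → id |
| $ ER TR id | id+id*id$ | |
| $ ER TR | +id*id$ | TR → ε |
| $ ER | +id*id$ | ER → + T ER |
| $ ER T + | +id*id$ | |
| $ ER T | id*id$ | T → F TR |
| $ ER TR F | id*id$ | F → id |
| $ ER TR id | id*id$ | |
| $ ER TR | *id$ | TR → * F TR |
| $ ER TR F * | *id$ | |
| $ ER TR F | id$ | F → id |
| $ ER TR id | id$ | |
| $ ER TR | $ | TR → ε |
| $ ER | $ | ER → ε |
| $ | $ | |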
51 Panic Mode Recovery • Place all symbols in FOLLOW(A) into the synchronizing set for nonterminal A. If we skip tokens until an element of FOLLOW(A) is seen and pop A from the stack, it is likely that parsing can continue. (figure: stack A B … $ with input abbcbbkccabc)
52 Panic Mode Recovery
• Place all symbols in FOLLOW(A) into the synchronizing set for nonterminal A. If we skip tokens until an element of FOLLOW(A) is seen and pop A from the stack, it is likely that parsing can continue.
• If a nonterminal can generate the empty string, then the production deriving ε can be used as a default.
• If a terminal on top of the stack cannot be matched, a simple idea is to pop the terminal, and continue parsing.
53 Panic Mode Recovery
Add synchronizing actions to undefined entries based on FOLLOW
Pro: Can be automated
Cons: Error messages are needed
FOLLOW(E) = { ) $ }
FOLLOW(ER) = { ) $ }
FOLLOW(T) = { + ) $ }
FOLLOW(TR) = { + ) $ }
FOLLOW(F) = { + * ) $ }

| | id | + | * | ( | ) | $ |
|---|---|---|---|---|---|---|
| E | E → T ER | | | E → T ER | synch | synch |
| ER | | ER → + T ER | | | ER → ε | ER → ε |
| T | T → F TR | synch | | T → F TR | synch | synch |
| TR | | TR → ε | TR → * F TR | | TR → ε | TR → ε |
| F | F → id | synch | synch | F → ( E ) | synch | synch |

synch: the driver pops current nonterminal A and skips input till synch token or skips input until one of FIRST(A) is found
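One way to wire synch entries into a table-driven driver is sketched below in Python. It is a simplified variant, not necessarily the exact policy on the slide: a synch entry pops the nonterminal, a blank entry skips one input token, and an unmatched terminal on top of the stack is popped. The table encoding, the error messages, and the function name are assumptions of the sketch.

```python
EPS, END, SYNCH = "ε", "$", "synch"  # assumed markers

TABLE = {
    ("E", "id"): ["T", "ER"], ("E", "("): ["T", "ER"],
    ("ER", "+"): ["+", "T", "ER"], ("ER", ")"): [EPS], ("ER", END): [EPS],
    ("T", "id"): ["F", "TR"], ("T", "("): ["F", "TR"],
    ("TR", "+"): [EPS], ("TR", "*"): ["*", "F", "TR"],
    ("TR", ")"): [EPS], ("TR", END): [EPS],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
FOLLOW = {"E": {")", END}, "ER": {")", END}, "T": {"+", ")", END},
          "TR": {"+", ")", END}, "F": {"+", "*", ")", END}}

# Add synchronizing actions to the undefined entries, based on FOLLOW
for nt, follow_set in FOLLOW.items():
    for a in follow_set:
        TABLE.setdefault((nt, a), SYNCH)

def parse_with_recovery(tokens, start="E"):
    """Parse to the end of the input and return the list of error messages."""
    tokens = list(tokens) + [END]
    pos, stack, errors = 0, [END, start], []
    while stack:
        x, a = stack[-1], tokens[pos]
        if x == a:                        # terminal (or $) matches
            stack.pop()
            if x != END:
                pos += 1
        elif x not in FOLLOW:             # terminal or $ that does not match
            if x == END:
                errors.append(f"skipped unexpected {a!r}")
                pos += 1                  # input left over: skip it
            else:
                errors.append(f"missing {x!r}")
                stack.pop()               # pop the unmatched terminal
        else:
            entry = TABLE.get((x, a))
            if entry is None and a != END:
                errors.append(f"skipped unexpected {a!r}")
                pos += 1                  # blank entry: skip the token
            elif entry is None or entry == SYNCH:
                errors.append(f"gave up on {x} at {a!r}")
                stack.pop()               # synch entry: pop the nonterminal
            else:
                stack.pop()
                stack.extend(s for s in reversed(entry) if s != EPS)
    return errors

print(parse_with_recovery("id * + id".split()))
print(parse_with_recovery(["id", "id"]))
```

For `id * + id` the driver hits M[F, +] = synch, pops F, and finishes the parse with one reported error; for `id id` it skips the second `id` and likewise continues to the end of the input.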
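54 Phrase-Level Recovery
Change input stream by inserting missing tokens
For example: id id is changed into id * id
Pro: Can be automated
Cons: Recovery not always intuitive
(figure label pointing into the table: "Can then continue here")

| | id | + | * | ( | ) | $ |
|---|---|---|---|---|---|---|
| E | E → T ER | | | E → T ER | synch | synch |
| ER | | ER → + T ER | | | ER → ε | ER → ε |
| T | T → F TR | synch | | T → F TR | synch | synch |
| TR | insert * | TR → ε | TR → * F TR | | TR → ε | TR → ε |
| F | F → id | synch | synch | F → ( E ) | synch | synch |

insert *: driver inserts missing * and retries the production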
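55 Error Productions
Add “error production”: TR → F TR to ignore missing *, e.g.: id id
E → T ER
ER → + T ER | ε
T → F TR
TR → * F TR | ε
F → ( E ) | id
Pro: Powerful recovery method
Cons: Cannot be automated

| | id | + | * | ( | ) | $ |
|---|---|---|---|---|---|---|
| E | E → T ER | | | E → T ER | synch | synch |
| ER | | ER → + T ER | | | ER → ε | ER → ε |
| T | T → F TR | synch | | T → F TR | synch | synch |
| TR | TR → F TR | TR → ε | TR → * F TR | | TR → ε | TR → ε |
| F | F → id | synch | synch | F → ( E ) | synch | synch |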