Winter 2012-2013 Compiler Principles
Syntax Analysis (Parsing) – Part 1
Mayer Goldberg and Roman Manevich
Ben-Gurion University

Books
• Compilers: Principles, Techniques, and Tools (Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman)
• Modern Compiler Implementation in Java (Andrew W. Appel)
• Modern Compiler Design (D. Grune, H. Bal, C. Jacobs, K. Langendoen)
• Advanced Compiler Design and Implementation (Steven Muchnick)

Today
• Understand the role of syntax analysis
• Context-free grammars
  – Basic definitions
  – Ambiguities
• Top-down parsing
  – Predictive parsing
• Next time: bottom-up parsing methods

The bigger picture
• Compilers include different kinds of program analyses, and each one further constrains the set of legal programs
  – Lexical constraints: the program consists of legal tokens
  – Syntax constraints: the program is included in a given context-free language
  – Semantic constraints: type checking, legal inheritance graph, variables initialized before use
  – "Logical" constraints (Verifying Compiler grand challenge): memory safety (null dereference, array-out-of-bounds access), data races, assertion violations

Role of syntax analysis
[Figure: compiler pipeline: High-level Language (scheme) → Lexical Analysis → Syntax Analysis (Parsing) → AST, Symbol Table, etc. → Intermediate Representation (IR) → Code Generation → Executable Code]
• Recover structure from a stream of tokens
  – Parse tree / abstract syntax tree
• Error reporting (recovery)
• Other possible tasks
  – Syntax-directed translation (one-pass compilers)
  – Create the symbol table
  – Create a pretty-printed version of the program, e.g., the Auto Formatting function in Eclipse

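From tokens to abstract syntax trees
• Program text: 5 + (7 * x)
• The lexical analyzer (regular expressions, finite automata) produces the token stream num + ( num * id )
• The parser (context-free grammars, push-down automata) checks the token stream against the grammar E → id | num | E + E | E * E | ( E ). It reports a syntax error or produces a valid abstract syntax tree
[Figure: the AST has root + with children num and *. The * node has children num and id]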

Example grammar
S → S ; S
S → id := E
S → print ( L )
E → id
E → num
E → E + E
L → L , E
• S: shorthand for Statement
• E: shorthand for Expression
• L: shorthand for List (of expressions)

CFG terminology
Grammar: S → S ; S | id := E | print ( L ),  E → id | num | E + E,  L → L , E
Symbols:
• Terminals (tokens): ; := ( ) , + id num print
• Non-terminals: S E L
• Start non-terminal: S
  – Convention: the non-terminal appearing in the first derivation rule
• Grammar productions (rules): N → α

Language of a CFG
• A sentence ω is in L(G) (it is a valid program) if there exists a corresponding derivation
• Equivalently, ω is in L(G) if there exists a corresponding parse tree

Derivations
• Show that a sentence ω is in the language of a grammar G
  – Start with the start symbol
  – Repeatedly replace one of the non-terminals by the right-hand side of one of its productions
  – Stop when the sentence contains only terminals
• Given a sentential form αNβ and a rule N → µ, we write αNβ => αµβ
• ω is in L(G) if S =>* ω
  – Rightmost derivation: always replace the rightmost non-terminal
  – Leftmost derivation: always replace the leftmost non-terminal

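Leftmost derivation
Sentence: a := 56 ; b := 7 + 3, i.e., the tokens id := num ; id := num + num
Grammar: S → S ; S | id := E | print ( L ),  E → id | num | E + E,  L → L , E
S => S ; S
  => id := E ; S
  => id := num ; S
  => id := num ; id := E
  => id := num ; id := E + E
  => id := num ; id := num + E
  => id := num ; id := num + num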

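Rightmost derivation
Sentence: a := 56 ; b := 7 + 3, i.e., the tokens id := num ; id := num + num
Grammar: S → S ; S | id := E | print ( L ),  E → id | num | E + E,  L → L , E
S => S ; S
  => S ; id := E
  => S ; id := E + E
  => S ; id := E + num
  => S ; id := num + num
  => id := E ; id := num + num
  => id := num ; id := num + num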
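The two derivations can be replayed mechanically. The following is a minimal Python sketch. It assumes that a sentential form is a list of grammar symbols and that the caller supplies the sequence of right-hand sides to apply. The `derive` helper and the `LEFTMOST`/`RIGHTMOST` lists are hypothetical illustrations, not part of the course material.

```python
# Grammar from the slides; the print ( L ) alternative is omitted because this example never uses it.
GRAMMAR = {
    "S": [["S", ";", "S"], ["id", ":=", "E"]],
    "E": [["id"], ["num"], ["E", "+", "E"]],
}


def derive(choices, leftmost=True, start="S"):
    """Replay a derivation: apply each right-hand side in `choices` to the
    leftmost (or rightmost) non-terminal and return every sentential form."""
    form = [start]
    steps = [" ".join(form)]
    for rhs in choices:
        positions = [i for i, sym in enumerate(form) if sym in GRAMMAR]
        i = positions[0] if leftmost else positions[-1]
        if rhs not in GRAMMAR[form[i]]:
            raise ValueError(f"{form[i]} -> {' '.join(rhs)} is not a production")
        form = form[:i] + rhs + form[i + 1:]
        steps.append(" ".join(form))
    return steps


# a := 56 ; b := 7 + 3, as tokens: id := num ; id := num + num
LEFTMOST = [["S", ";", "S"], ["id", ":=", "E"], ["num"],
            ["id", ":=", "E"], ["E", "+", "E"], ["num"], ["num"]]
RIGHTMOST = [["S", ";", "S"], ["id", ":=", "E"], ["E", "+", "E"],
             ["num"], ["num"], ["id", ":=", "E"], ["num"]]

for line in derive(RIGHTMOST, leftmost=False):
    print(line)
```

Both choice sequences end in the same sentence. Only the order in which non-terminals are rewritten differs.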

Parse trees
• Tree nodes are symbols, and children are ordered left-to-right
• Each internal node is a non-terminal, and its children correspond to one of its productions: for N → µ1 … µk, node N has children µ1 … µk
• The root is the start non-terminal
• Leaves are tokens
• Yield of a parse tree: the left-to-right walk over its leaves

Parse tree example
Grammar: S → S ; S | id := E | print ( L ),  E → id | num | E + E,  L → L , E
• Draw the parse tree for the sentence id := num ; id := num + num

Parse tree example
Grammar: S → S ; S | id := E | print ( L ),  E → id | num | E + E,  L → L , E
• Order-independent representation
[Figure: parse tree for id := num ; id := num + num]
• Equivalently, add parentheses labeled by non-terminal names:
  (S (S a := (E 56 )E )S ; (S b := (E (E 7 )E + (E 3 )E )E )S )S
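The parse-tree definitions and the labeled-parentheses form can be written down directly. The following is a minimal Python sketch. It assumes that a tree node is a symbol plus an ordered list of children, and that leaves hold the lexemes (a, 56, …) so that the output matches the bracketed example. The `Node`, `tree_yield` and `bracketed` names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    symbol: str                                   # non-terminal name or token
    children: list = field(default_factory=list)  # ordered left to right; empty for leaves


def tree_yield(node):
    """Yield of the tree: a left-to-right walk over its leaves."""
    if not node.children:
        return [node.symbol]
    return [tok for child in node.children for tok in tree_yield(child)]


def bracketed(node):
    """Order-independent text form: (N ... )N around every internal node N."""
    if not node.children:
        return node.symbol
    inner = " ".join(bracketed(child) for child in node.children)
    return f"({node.symbol} {inner} ){node.symbol}"


# Parse tree of a := 56 ; b := 7 + 3
tree = Node("S", [
    Node("S", [Node("a"), Node(":="), Node("E", [Node("56")])]),
    Node(";"),
    Node("S", [Node("b"), Node(":="),
               Node("E", [Node("E", [Node("7")]), Node("+"), Node("E", [Node("3")])])]),
])

print(" ".join(tree_yield(tree)))
print(bracketed(tree))
```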

Capabilities and limitations of CFGs
• CFGs naturally express
  – Hierarchical structure
    • A program is a list of classes, a class is a list of definitions, a definition is either …
  – Beginning-end types of constraints
    • Balanced parentheses: S → ( S ) S | ε (p. 173)
• CFGs cannot express
  – Correlations between unbounded strings (identifiers)
  – For example, that variables are declared before use. Abstractly, these are strings of the form ω S ω, where the same identifier ω must appear both at the declaration and at the use
  – Such constraints are handled by semantic analysis

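Sometimes there are two parse trees
Arithmetic expressions: E → id | num | E + E | E * E | ( E )
• The sentence 1 + 2 + 3, i.e., num(1) + num(2) + num(3), has two parse trees: one for 1 + (2 + 3) and one for (1 + 2) + 3
• Leftmost derivation (tree for 1 + (2 + 3)):
  E => E + E => num + E => num + E + E => num + num + E => num + num + num
• Rightmost derivation (tree for (1 + 2) + 3):
  E => E + E => E + num => E + E + num => E + num + num => num + num + num
[Figure: the two parse trees]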

Is ambiguity a problem?
Arithmetic expressions: E → id | num | E + E | E * E | ( E )
• 1 + 2 + 3 has two parse trees: (1 + 2) + 3 and 1 + (2 + 3)
• Whether this matters depends on the semantics. Here both trees evaluate to 6
  – Leftmost derivation: E => E + E => num + E => num + E + E => num + num + E => num + num + num
  – Rightmost derivation: E => E + E => E + num => E + E + num => E + num + num => num + num + num

Problematic ambiguity example
Arithmetic expressions: E → id | num | E + E | E * E | ( E )
• 1 + 2 * 3 has two parse trees, and they evaluate differently:
  – 1 + (2 * 3) = 7. This is what we usually want: * has precedence over +
    Leftmost derivation: E => E + E => num + E => num + E * E => num + num * E => num + num * num
  – (1 + 2) * 3 = 9
    Rightmost derivation: E => E * E => E * num => E + E * num => E + num * num => num + num * num

Ambiguous grammars
• A grammar is ambiguous if there exists a sentence for which there are (equivalently)
  – two different leftmost derivations, or
  – two different rightmost derivations, or
  – two different parse trees
• Ambiguity is a property of grammars, not of languages
• Some languages are inherently ambiguous: no unambiguous grammar exists for them
• No algorithm can decide whether an arbitrary grammar is ambiguous (the problem is undecidable)
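Ambiguity can be observed directly by counting parse trees. The following is a minimal Python sketch. It assumes the input is a list of token strings (num, id, +, * and parentheses) for the ambiguous grammar E → E + E | E * E | ( E ) | id | num. It tries every operator as the root of a subtree and memoizes counts per token span. The `count_parse_trees` name is hypothetical.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Number of distinct parse trees of the ambiguous grammar
    E -> E + E | E * E | ( E ) | id | num   for the token list `tokens`."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(i, j):
        """Parse trees deriving tokens[i:j] from E."""
        if j <= i:
            return 0
        if j - i == 1:
            return 1 if tokens[i] in ("id", "num") else 0
        total = 0
        # E -> E op E: every operator position may be the root of the subtree
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += trees(i, k) * trees(k + 1, j)
        # E -> ( E )
        if tokens[i] == "(" and tokens[j - 1] == ")":
            total += trees(i + 1, j - 1)
        return total

    return trees(0, len(tokens))


print(count_parse_trees("num + num * num".split()))
```

A sentence with four operands joined by + already has five parse trees. Adding parentheses, as in ( num + num ) * num, leaves exactly one.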

Drawbacks of ambiguous grammars
• Ambiguous semantics
• Parsing complexity
• May affect other phases
Solutions
• Transform the grammar into a non-ambiguous one
• Handle ambiguity as part of the parsing method
  – Using a special form of "precedence"
  – Wait for the bottom-up parsing lecture

Transforming ambiguous grammars to non-ambiguous by layering
Ambiguous grammar: E → E + E | E * E | id | num | ( E )
Unambiguous grammar:
  E → E + T | T (layer 1)
  T → T * F | F (layer 2)
  F → id | num | ( E ) (layer 3)
• Each layer takes care of one way of composing substrings to form a string: layer 1 composes by +, layer 2 by *, and layer 3 handles atoms
• Let's derive 1 + 2 * 3

Transformed grammar: * precedes +
Ambiguous grammar: E → E + E | E * E | id | num | ( E )
Unambiguous grammar: E → E + T | T,  T → T * F | F,  F → id | num | ( E )
Derivation:
E => E + T => T + T => F + T => 1 + T => 1 + T * F => 1 + F * F => 1 + 2 * F => 1 + 2 * 3
Parse tree: (E (E (T (F 1 )F )T )E + (T (T (F 2 )F )T * (F 3 )F )T )E

Transformed grammar: + precedes *
Ambiguous grammar: E → E + E | E * E | id | num | ( E )
Unambiguous grammar: E → E * T | T,  T → T + F | F,  F → id | num | ( E )
Derivation:
E => E * T => T * T => T + F * T => F + F * T => 1 + F * T => 1 + 2 * T => 1 + 2 * F => 1 + 2 * 3
Parse tree: (E (E (T (T (F 1 )F )T + (F 2 )F )T )E * (T (F 3 )F )T )E
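The effect of layering on meaning can be checked by evaluating 1 + 2 * 3 under both layered grammars. The following is a minimal Python sketch, assuming space-separated integer tokens. The layered grammars are left-recursive, so this sketch runs each layer as a loop (E → T { op T }). That is a standard substitute which builds the same left-associative tree; it is not a literal transcription of the rules. The `evaluate` function and its `outer`/`inner` parameters are hypothetical.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def evaluate(tokens, outer="+", inner="*"):
    """Evaluate integer tokens with the layered grammar
         E -> E outer T | T,   T -> T inner F | F,   F -> num | ( E )
    Each left-recursive layer is run as a loop, E -> T { outer T },
    which builds the same left-associative tree without left recursion."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def layer_E():                       # layer 1: composition by `outer`
        value = layer_T()
        while peek() == outer:
            advance()
            value = OPS[outer](value, layer_T())
        return value

    def layer_T():                       # layer 2: composition by `inner`
        value = layer_F()
        while peek() == inner:
            advance()
            value = OPS[inner](value, layer_F())
        return value

    def layer_F():                       # layer 3: atoms
        tok = advance()
        if tok == "(":
            value = layer_E()
            if advance() != ")":
                raise SyntaxError("expected )")
            return value
        return int(tok)

    result = layer_E()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result


print(evaluate("1 + 2 * 3".split()))                         # * precedes +
print(evaluate("1 + 2 * 3".split(), outer="*", inner="+"))   # + precedes *
```

The first call prints 7 and the second prints 9. These are the two readings of the ambiguous grammar, now each fixed by a different layering.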

Another example for layering
Ambiguous grammar: P → ε | P P | ( P )
[Figure: two different parse trees for the same string of balanced parentheses, built with P → P P, P → ( P ) and P → ε]

Another example for layering
Ambiguous grammar: P → ε | P P | ( P )
Unambiguous grammar:
  S → P S | ε (takes care of "concatenation")
  P → ( S ) (takes care of nesting)
[Figure: the single parse tree of a parenthesis string under the unambiguous grammar]
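The unambiguous grammar can also be parsed with one function per non-terminal and one character of lookahead: S → P S applies exactly when the next character is (, and S → ε applies otherwise. The following is a minimal Python sketch, assuming the input string contains only ( and ). The `balanced` name is hypothetical.

```python
def balanced(text):
    """Recognizer for  S -> P S | epsilon,  P -> ( S ),  one function per non-terminal."""
    pos = 0

    def S():
        # S -> P S when the next character can start P, i.e. '('; otherwise S -> epsilon
        if pos < len(text) and text[pos] == "(":
            P()
            S()

    def P():
        nonlocal pos
        pos += 1                                   # match '('
        S()
        if pos < len(text) and text[pos] == ")":
            pos += 1                               # match ')'
        else:
            raise SyntaxError(f"expected ')' at position {pos}")

    try:
        S()
    except SyntaxError:
        return False
    return pos == len(text)                        # all input must be consumed


print(balanced("(())()"), balanced("(()"))
```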

"Dangling-else" example
Ambiguous grammar (p. 174): S → if E then S | if E then S else S | other
• The statement if E1 then if E2 then S1 else S2 has two parse trees:
  – if E1 then (if E2 then S1 else S2). This is what we usually want: match each else to the closest unmatched then
  – if E1 then (if E2 then S1) else S2
[Figure: the two parse trees]

"Dangling-else" example
Ambiguous grammar: S → if E then S | if E then S else S | other
Unambiguous grammar (p. 174):
  S → M | U
  M → if E then M else M | other (matched statements)
  U → if E then S | if E then M else U (unmatched statements)
• Under this grammar, if E1 then if E2 then S1 else S2 has only the parse if E1 then (if E2 then S1 else S2)
[Figure: the two candidate parse trees, of which only the first is derivable]
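Hand-written parsers often resolve the dangling else without rewriting the grammar: when an else follows, they attach it to the innermost if that is still being parsed. The following is a minimal Python sketch of that greedy rule. It assumes space-separated tokens in which conditions and other statements are single words such as E1 or S1. The `parse_stmt` and `parse` names are hypothetical. The output bracketing agrees with the unambiguous grammar above.

```python
def parse_stmt(tokens, pos):
    """S -> if E then S [else S] | other, where an else is bound greedily to the
    nearest unmatched then. Returns (bracketed_text, next_position)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] != "if":
        return tokens[pos], pos + 1                   # 'other': a single atom such as S1
    if pos + 2 >= len(tokens) or tokens[pos + 2] != "then":
        raise SyntaxError("expected: if <cond> then ...")
    cond = tokens[pos + 1]
    body, pos = parse_stmt(tokens, pos + 3)
    if pos < len(tokens) and tokens[pos] == "else":   # greedy: the innermost if takes the else
        alt, pos = parse_stmt(tokens, pos + 1)
        return f"(if {cond} then {body} else {alt})", pos
    return f"(if {cond} then {body})", pos


def parse(text):
    tokens = text.split()
    tree, pos = parse_stmt(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {tokens[pos]!r}")
    return tree


print(parse("if E1 then if E2 then S1 else S2"))
```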

Broad kinds of parsers
• Parsers for arbitrary grammars
  – Earley's method, CYK method: O(n³)
  – Not used in practice
• Top-down
  – Construct the parse tree in a top-down manner
  – Find the leftmost derivation
  – Predictive: for every non-terminal and k tokens of lookahead, predict the next production: LL(k)
  – Preorder tree traversal
• Bottom-up
  – Construct the parse tree in a bottom-up manner
  – Find the rightmost derivation in reverse order
  – For every potential right-hand side and k tokens of lookahead, decide when a production is found: LR(k)
  – Postorder tree traversal

Top-down vs. bottom-up
• Top-down parsing
  – Beginning with the start symbol, try to guess the productions to apply to end up at the user's program
• Bottom-up parsing
  – Beginning with the user's program, try to apply productions in reverse to convert the program back into the start symbol

Top-down parsing
Unambiguous grammar: E → E * T | T,  T → T + F | F,  F → id | num | ( E )
Input: 1 + 2 * 3
• Target parse tree: (E (E (T (T (F 1 )F )T + (F 2 )F )T )E * (T (F 3 )F )T )E
• We start from the root E. To get the * at the top level, the first rule applied must be E → E * T
[Figure sequence: the parse tree grows from the root E downward, expanding one non-terminal at a time, until its leaves spell 1 + 2 * 3]

Bottom-up parsing
Unambiguous grammar: E → E * T | T,  T → T + F | F,  F → id | num | ( E )
Input: 1 + 2 * 3
[Figure sequence: starting from the tokens 1 + 2 * 3, productions are applied in reverse. Nodes are added from the leaves upward (F over each number, then T, then E) until the single root E is reached, giving the same tree as the top-down parse]

Challenges in top-down parsing
• Top-down parsing begins with virtually no information
  – It begins with just the start symbol, which matches every program
• How can we know which productions to apply?
• In general, we can't
  – There are some grammars for which the best we can do is guess and backtrack if we're wrong
• If we have to guess, how do we do it?
  – Parsing as a search algorithm
  – Too expensive in theory (exponential worst-case time) and in practice

Predictive parsing
• Given a grammar G and a word w, attempt to derive w using G
• Idea
  – Apply a production to the leftmost non-terminal
  – Pick the production rule based on the next input token
• General grammar
  – More than one option for choosing the next production based on a token
• Restricted grammars (LL)
  – Know exactly which single rule to apply
  – May require some lookahead to decide

Boolean expressions example
E → LIT | ( E OP E ) | not E
LIT → true | false
OP → and | or | xor
• The production to apply is known from the next token
• Example input: not ( not true or false )
E => not E
  => not ( E OP E )
  => not ( not E OP E )
  => not ( not LIT OP E )
  => not ( not true OP E )
  => not ( not true or E )
  => not ( not true or LIT )
  => not ( not true or false )
[Figure: the corresponding parse tree]

Recursive descent parsing
• Define a function for every non-terminal
• Every function works as follows
  – Find the applicable production rule
  – For each terminal in the rule, check that it matches the next input token
  – For each non-terminal in the rule, call (recursively) its function
• If there are several applicable productions for a non-terminal, use lookahead

Matching tokens
E → LIT | ( E OP E ) | not E
LIT → true | false
OP → and | or | xor
match(token t) {
  if (current == t) current = next_token();
  else error;
}
• Variable current holds the current input token

Functions for nonterminals
E → LIT | ( E OP E ) | not E
LIT → true | false
OP → and | or | xor
E() {
  if (current ∈ {TRUE, FALSE})      // E → LIT
    LIT();
  else if (current == LPAREN) {     // E → ( E OP E )
    match(LPAREN); E(); OP(); E(); match(RPAREN);
  }
  else if (current == NOT) {        // E → not E
    match(NOT); E();
  }
  else error;
}
LIT() {
  if (current == TRUE) match(TRUE);
  else if (current == FALSE) match(FALSE);
  else error;
}

Implementation via recursion
E → LIT | ( E OP E ) | not E
LIT → true | false
OP → and | or | xor
E() {
  if (current ∈ {TRUE, FALSE}) LIT();
  else if (current == LPAREN) { match(LPAREN); E(); OP(); E(); match(RPAREN); }
  else if (current == NOT) { match(NOT); E(); }
  else error;
}
LIT() {
  if (current == TRUE) match(TRUE);
  else if (current == FALSE) match(FALSE);
  else error;
}
OP() {
  if (current == AND) match(AND);
  else if (current == OR) match(OR);
  else if (current == XOR) match(XOR);
  else error;
}
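The pseudocode above translates almost line for line into a runnable recognizer. The following is a minimal Python sketch. It assumes tokens are whitespace-separated words, with parentheses split off automatically, and it uses "$" as the end-of-input lookahead. The `BoolParser` class and the `accepts` helper are hypothetical names.

```python
class BoolParser:
    """Recursive-descent recognizer, one method per non-terminal:
         E   -> LIT | ( E OP E ) | not E
         LIT -> true | false
         OP  -> and | or | xor
    `current` is the lookahead token; '$' marks the end of the input."""

    def __init__(self, text):
        self.tokens = text.replace("(", " ( ").replace(")", " ) ").split()
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else "$"

    def match(self, t):
        if self.current != t:
            raise SyntaxError(f"expected {t!r}, found {self.current!r}")
        self.pos += 1

    def E(self):
        if self.current in ("true", "false"):      # E -> LIT
            self.LIT()
        elif self.current == "(":                  # E -> ( E OP E )
            self.match("(")
            self.E()
            self.OP()
            self.E()
            self.match(")")
        elif self.current == "not":                # E -> not E
            self.match("not")
            self.E()
        else:
            raise SyntaxError(f"unexpected {self.current!r}")

    def LIT(self):
        if self.current not in ("true", "false"):
            raise SyntaxError(f"expected true or false, found {self.current!r}")
        self.match(self.current)

    def OP(self):
        if self.current not in ("and", "or", "xor"):
            raise SyntaxError(f"expected and, or or xor, found {self.current!r}")
        self.match(self.current)


def accepts(text):
    parser = BoolParser(text)
    try:
        parser.E()
    except SyntaxError:
        return False
    return parser.current == "$"        # reject trailing tokens


print(accepts("not ( not true or false )"))
```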

Adding semantic actions
• Can add an action to perform on each production rule
• Can build the parse tree
  – Every function returns an object of type Node
  – Every Node maintains a list of children
  – Function calls can add new children

Building the parse tree
Node E() {
  result = new Node();
  result.name = "E";
  if (current ∈ {TRUE, FALSE})      // E → LIT
    result.addChild(LIT());
  else if (current == LPAREN) {     // E → ( E OP E )
    result.addChild(match(LPAREN));   // here match() returns a leaf Node for the token
    result.addChild(E());
    result.addChild(OP());
    result.addChild(E());
    result.addChild(match(RPAREN));
  }
  else if (current == NOT) {        // E → not E
    result.addChild(match(NOT));
    result.addChild(E());
  }
  else error;
  return result;
}
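The same recursive descent can return the parse tree instead of only accepting or rejecting. The following is a minimal Python sketch. It uses a plain (name, children) tuple for internal nodes and the matched token string for leaves, instead of the slides' Node class with addChild. The `parse_tree` name is hypothetical.

```python
def parse_tree(text):
    """Recursive descent that returns the parse tree for
         E -> LIT | ( E OP E ) | not E,  LIT -> true | false,  OP -> and | or | xor.
    An internal node is (name, children); a leaf is the matched token string."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split() + ["$"]
    pos = 0

    def current():
        return tokens[pos]

    def match(t):
        nonlocal pos
        if tokens[pos] != t:
            raise SyntaxError(f"expected {t!r}, found {tokens[pos]!r}")
        pos += 1
        return t                                   # leaf for the matched token

    def E():
        if current() in ("true", "false"):         # E -> LIT
            children = [LIT()]
        elif current() == "(":                     # E -> ( E OP E ), children built left to right
            children = [match("("), E(), OP(), E(), match(")")]
        elif current() == "not":                   # E -> not E
            children = [match("not"), E()]
        else:
            raise SyntaxError(f"unexpected {current()!r}")
        return ("E", children)

    def LIT():
        if current() not in ("true", "false"):
            raise SyntaxError(f"expected true or false, found {current()!r}")
        return ("LIT", [match(current())])

    def OP():
        if current() not in ("and", "or", "xor"):
            raise SyntaxError(f"expected and, or or xor, found {current()!r}")
        return ("OP", [match(current())])

    tree = E()
    match("$")                                     # the whole input must be consumed
    return tree


print(parse_tree("not true"))
```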

Recursive descent
void A() {
  choose an A-production, A → X1 X2 … Xk;
  for (i = 1; i ≤ k; i++) {
    if (Xi is a nonterminal) call procedure Xi();
    else if (Xi == current) advance input;
    else report error;
  }
}
• How do you pick the right A-production?
• Generally: try them all and use backtracking
• In our case: use lookahead
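"Try them all" can be made concrete by letting each non-terminal return every input position at which it could finish. This is a set-based equivalent of backtracking. The following is a minimal Python sketch, assuming a grammar is a dict from non-terminal to a list of right-hand sides, with [] for ε, and that any symbol which is not a key is a terminal. The `ends` and `recognizes` names are hypothetical. It copes with common prefixes and ε-productions, but it still recurses forever on left-recursive grammars and can take exponential time.

```python
def ends(grammar, symbol, tokens, pos):
    """Every position at which `symbol` can finish when started at `pos`.
    Exploring every alternative is the set-based equivalent of backtracking;
    it still loops forever on a left-recursive grammar."""
    if symbol not in grammar:                               # terminal: match one token
        return {pos + 1} if pos < len(tokens) and tokens[pos] == symbol else set()
    result = set()
    for rhs in grammar[symbol]:                             # try every A-production
        positions = {pos}
        for sym in rhs:                                     # X1 X2 ... Xk in order
            positions = {q for p in positions for q in ends(grammar, sym, tokens, p)}
        result |= positions
    return result


def recognizes(grammar, start, tokens):
    return len(tokens) in ends(grammar, start, tokens, 0)


# 'expr' is treated as a single terminal here, for illustration only
PROBLEM_1 = {"term": [["ID"], ["indexed_elem"]],
             "indexed_elem": [["ID", "[", "expr", "]"]]}
PROBLEM_2 = {"S": [["A", "a", "b"]], "A": [["a"], []]}

print(recognizes(PROBLEM_1, "term", ["ID", "[", "expr", "]"]))
```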

Problem 1: productions with common prefix
term → ID | indexed_elem
indexed_elem → ID [ expr ]
• If alternatives are tried in order and the first one that succeeds is kept, the function for indexed_elem will never be tried
  – What happens for input of the form ID [ expr ]?

Problem 2: null productions
S → A a b
A → a | ε
int S() { return A() && match(token('a')) && match(token('b')); }
int A() { return match(token('a')) || 1; }
• What happens for input "ab"?
• What happens if you flip the order of alternatives and try "aab"?
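The two questions can be answered by running a literal translation of S() and A(). The following is a minimal Python sketch, assuming single-character tokens and the same short-circuit logic with no backtracking: once A() returns, its choice is final. The `accepts_naive` name and its `epsilon_first` flag (for the flipped order of alternatives) are hypothetical.

```python
def accepts_naive(text, epsilon_first=False):
    """Literal translation of the slide's S() and A(): short-circuit boolean
    logic and no backtracking, so once A() returns, its choice is final."""
    pos = 0

    def match(c):
        nonlocal pos
        if pos < len(text) and text[pos] == c:
            pos += 1
            return True
        return False

    def A():
        if epsilon_first:
            return True                    # A -> epsilon tried first: never consumes input
        return match("a") or True          # A -> a tried first, epsilon as the fallback

    def S():
        return A() and match("a") and match("b")

    return S() and pos == len(text)


for word in ["ab", "aab"]:
    print(word, accepts_naive(word), accepts_naive(word, epsilon_first=True))
```

Both "ab" and "aab" are in the language, yet each order of alternatives rejects one of them. This is the failure that FIRST/FOLLOW analysis is meant to predict.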

Problem 3: left recursion (p. 127)
E → E - term | term
int E() { return E() && match(token('-')) && term(); }
• What happens with this procedure?
• Recursive descent parsers cannot handle left-recursive grammars
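A direct translation shows what happens: E() calls itself at the same input position before consuming any token, so the recursion never bottoms out. The following is a minimal Python sketch, assuming a hypothetical stand-in rule term → id. In Python, the unbounded recursion surfaces as a RecursionError.

```python
def E(tokens, pos=0):
    """Literal translation of the slide's E(): the E -> E - term alternative
    calls E again at the same position before consuming any token."""
    end = E(tokens, pos)                           # no progress: infinite recursion
    if end < len(tokens) and tokens[end] == "-":   # never reached
        return term(tokens, end + 1)
    return end


def term(tokens, pos):
    """term -> id (a hypothetical stand-in for the real term rule)."""
    if pos < len(tokens) and tokens[pos] == "id":
        return pos + 1
    raise SyntaxError(f"expected id at position {pos}")


def left_recursion_overflows(tokens):
    try:
        E(tokens)
    except RecursionError:
        return True
    return False


print(left_recursion_overflows(["id", "-", "id"]))  # prints True
```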

FIRST sets
• For every production rule A → α
  – FIRST(α) = all terminals that α can start with
  – That is, every token that can appear first in some string derived from α
• In our Boolean expressions example
  – FIRST( LIT ) = { true, false }
  – FIRST( ( E OP E ) ) = { ( }
  – FIRST( not E ) = { not }
• If the FIRST sets of a non-terminal's alternatives do not intersect, we can always pick a single rule
• If the FIRST sets intersect, we may need a longer lookahead
  – LL(k) = the class of grammars in which the production rule can be determined using a lookahead of k tokens
  – LL(1) is an important and useful class

Computing FIRST sets
• Assume there are no null productions A → ε
1. Initially, for all non-terminals A, set FIRST(A) = { t | A → t ω for some ω }
2. Repeat the following until no changes occur:
   for each non-terminal A
     for each production A → B ω
       set FIRST(A) = FIRST(A) ∪ FIRST(B)
• This is known as a fixed-point computation

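FIRST sets computation example
STMT → if EXPR then STMT | while EXPR do STMT | EXPR ;
EXPR → TERM -> id | zero? TERM | not EXPR | ++ id | -- id
TERM → id | constant
• In EXPR → TERM -> id, the symbol -> is a token, not a production arrow
• We compute one set per non-terminal: FIRST(STMT), FIRST(EXPR), FIRST(TERM)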

1. Initialization
• FIRST(STMT) = { if, while }
• FIRST(EXPR) = { zero?, not, ++, -- }
• FIRST(TERM) = { id, constant }

2. Iterate 1
• FIRST(STMT) = { if, while, zero?, not, ++, -- } (adds FIRST(EXPR), via STMT → EXPR ;)
• FIRST(EXPR) = { zero?, not, ++, --, id, constant } (adds FIRST(TERM), via EXPR → TERM -> id)
• FIRST(TERM) = { id, constant }

2. Iterate 2
• FIRST(STMT) = { if, while, zero?, not, ++, --, id, constant } (id and constant, just added to FIRST(EXPR), now propagate)
• FIRST(EXPR) = { zero?, not, ++, --, id, constant } (unchanged)
• FIRST(TERM) = { id, constant } (unchanged)

2. Iterate 3 – fixed-point
• No set changes, so the computation stops:
  – FIRST(STMT) = { if, while, zero?, not, ++, --, id, constant }
  – FIRST(EXPR) = { zero?, not, ++, --, id, constant }
  – FIRST(TERM) = { id, constant }
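The fixed-point iteration above can be reproduced in a few lines. The following is a minimal Python sketch, assuming a grammar without ε-productions given as a dict of right-hand-side lists. It updates the sets in place, visiting STMT before EXPR, and with that order it needs the same three passes as the example. The `first_sets` name is hypothetical.

```python
GRAMMAR = {
    "STMT": [["if", "EXPR", "then", "STMT"], ["while", "EXPR", "do", "STMT"], ["EXPR", ";"]],
    "EXPR": [["TERM", "->", "id"], ["zero?", "TERM"], ["not", "EXPR"], ["++", "id"], ["--", "id"]],
    "TERM": [["id"], ["constant"]],
}


def first_sets(grammar):
    """FIRST sets of a grammar without epsilon-productions, by fixed-point iteration.
    Returns the sets and the number of passes, including the final unchanged one."""
    # 1. Initialization: FIRST(A) = { t | A -> t w } for terminals t
    first = {A: {rhs[0] for rhs in alts if rhs[0] not in grammar}
             for A, alts in grammar.items()}
    passes = 0
    changed = True
    while changed:                                  # 2. repeat until no changes occur
        changed = False
        passes += 1
        for A, alts in grammar.items():
            for rhs in alts:
                B = rhs[0]
                if B in grammar and not first[B] <= first[A]:   # A -> B w
                    first[A] |= first[B]
                    changed = True
    return first, passes


first, passes = first_sets(GRAMMAR)
for A in GRAMMAR:
    print(A, sorted(first[A]))
```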

FOLLOW sets (p. 189)
• What do we do with nullable (ε) productions?
  – For example: A → B C D, B → ε, C → ε
  – Use what comes afterwards to predict the right production
• For every non-terminal A
  – FOLLOW(A) = the set of tokens that can immediately follow A in some sentential form
• We can predict the alternative Ak for a non-terminal N when the lookahead token is in the set
  – FIRST(Ak) ∪ (if Ak is nullable then FOLLOW(N))

LL(k) grammars
• A grammar is in the class LL(k) when it can be parsed by:
  – Top-down derivation
  – Scanning the input from left to right (L)
  – Producing the leftmost derivation (L)
  – With a lookahead of k tokens (k)
• For k = 1: for every two productions A → α and A → β we need FIRST(α) ∩ FIRST(β) = {}, and, if β is nullable, FIRST(α) ∩ FOLLOW(A) = {}
• A language is said to be LL(k) when it has an LL(k) grammar

Back to problem 1
term → ID | indexed_elem
indexed_elem → ID [ expr ]
• FIRST(ID) = { ID }
• FIRST(indexed_elem) = { ID }
• Both alternatives of term start with ID: a FIRST/FIRST conflict

Solution: left factoring
• Rewrite the grammar to be in LL(1)
term → ID | indexed_elem
indexed_elem → ID [ expr ]
becomes
term → ID after_ID
after_ID → [ expr ] | ε
• Intuition: just like factoring x*y + x*z into x*(y+z)
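After left factoring, one token of lookahead is enough: after ID, the next token is either [ or it is not. The following is a minimal Python sketch, assuming a token list in which expr is a single placeholder token. The `parse_term` name and its return values (which report the original alternative) are hypothetical.

```python
def parse_term(tokens):
    """Recursive descent for the left-factored grammar
         term     -> ID after_ID
         after_ID -> [ expr ] | epsilon
    `expr` is a single placeholder token here. Returns which original
    alternative (ID or indexed_elem) the input corresponds to."""
    pos = 0

    def current():
        return tokens[pos] if pos < len(tokens) else "$"

    def match(t):
        nonlocal pos
        if current() != t:
            raise SyntaxError(f"expected {t!r}, found {current()!r}")
        pos += 1

    def after_ID():
        if current() == "[":            # after_ID -> [ expr ]
            match("[")
            match("expr")
            match("]")
            return "indexed_elem"
        return "ID"                     # after_ID -> epsilon: the lookahead is not '['

    match("ID")                         # the common prefix is matched once
    kind = after_ID()
    if current() != "$":
        raise SyntaxError(f"unexpected {current()!r}")
    return kind


print(parse_term(["ID", "[", "expr", "]"]))
```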

Left factoring – another example
S → if E then S else S | if E then S | T
becomes
S → if E then S S' | T
S' → else S | ε

Back to problem 2
S → A a b
A → a | ε
• FIRST(S) = { a }, FOLLOW(S) = { }
• FIRST(A) = { a }, FOLLOW(A) = { a }
• A is nullable and FIRST(A) ∩ FOLLOW(A) = { a }: a FIRST/FOLLOW conflict
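The FIRST/FOLLOW conflict can be detected mechanically by computing nullable, FIRST, FOLLOW and the predict set of each alternative, then checking for overlaps. The following is a minimal Python sketch of the standard fixed-point computation, assuming [] denotes an ε right-hand side. Unlike the slide, it adds the end marker $ to FOLLOW of the start symbol. The `analyze` and `ll1_conflicts` names and the `FIXED` grammar (the substituted and left-factored version from the next slide) are hypothetical.

```python
END = "$"  # end-of-input marker


def analyze(grammar, start):
    """Fixed-point computation of nullable non-terminals, FIRST, FOLLOW and
    the predict set of every alternative. [] stands for an epsilon right-hand side."""
    nullable = set()
    first = {A: set() for A in grammar}
    follow = {A: set() for A in grammar}
    follow[start].add(END)

    def first_of(seq):
        """FIRST of a symbol string, and whether the whole string is nullable."""
        out = set()
        for sym in seq:
            if sym not in grammar:                  # terminal
                out.add(sym)
                return out, False
            out |= first[sym]
            if sym not in nullable:
                return out, False
        return out, True

    changed = True
    while changed:
        changed = False
        for A, alternatives in grammar.items():
            for rhs in alternatives:
                f, rhs_nullable = first_of(rhs)
                if not f <= first[A]:
                    first[A] |= f
                    changed = True
                if rhs_nullable and A not in nullable:
                    nullable.add(A)
                    changed = True
                # A -> alpha B beta: FIRST(beta) and, if beta is nullable, FOLLOW(A) go into FOLLOW(B)
                for i, B in enumerate(rhs):
                    if B in grammar:
                        f_beta, beta_nullable = first_of(rhs[i + 1:])
                        new = f_beta | (follow[A] if beta_nullable else set())
                        if not new <= follow[B]:
                            follow[B] |= new
                            changed = True

    # predict(A -> alpha) = FIRST(alpha), plus FOLLOW(A) if alpha is nullable
    predict = {}
    for A, alternatives in grammar.items():
        predict[A] = []
        for rhs in alternatives:
            f, rhs_nullable = first_of(rhs)
            predict[A].append(f | (follow[A] if rhs_nullable else set()))
    return nullable, first, follow, predict


def ll1_conflicts(grammar, start):
    """Pairs of alternatives of the same non-terminal whose predict sets overlap."""
    *_, predict = analyze(grammar, start)
    conflicts = []
    for A, sets in predict.items():
        for i in range(len(sets)):
            for j in range(i + 1, len(sets)):
                if sets[i] & sets[j]:
                    conflicts.append((A, i, j, sets[i] & sets[j]))
    return conflicts


PROBLEM_2 = {"S": [["A", "a", "b"]], "A": [["a"], []]}
FIXED = {"S": [["a", "after_A"]], "after_A": [["a", "b"], ["b"]]}

print(ll1_conflicts(PROBLEM_2, "S"))
print(ll1_conflicts(FIXED, "S"))
```

The conflict is reported between alternative 0 (A → a) and alternative 1 (A → ε) of A on token a. The rewritten grammar has none.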

Solution: substitution
S → A a b
A → a | ε
• Substitute A in S:
  S → a a b | a b
• Left factoring:
  S → a after_A
  after_A → a b | b

Back to problem 3
E → E - term | term
• Left recursion cannot be handled with a bounded lookahead
• What can we do?

Left recursion removal (p. 130)
G1: N → N α | β
G2: N → β N'
    N' → α N' | ε
• L(G1) = { β, βα, βαα, βααα, … }
• L(G2) = the same language
• Can be done algorithmically
  – Problem: the grammar becomes mangled beyond recognition
• For our 3rd example:
  E → E - term | term
  becomes
  E → term TE
  TE → - term TE | ε
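The rewrite of G1 into G2 is mechanical for immediate left recursion. The following is a minimal Python sketch, assuming right-hand sides are lists of symbols with [] for ε. It names the new non-terminal with a prime (E') where the slide writes TE. Indirect left recursion, which needs an ordering of the non-terminals, is not handled. The `remove_immediate_left_recursion` name is hypothetical.

```python
def remove_immediate_left_recursion(nt, alternatives):
    """Rewrite  N -> N a1 | ... | N am | b1 | ... | bn  as
         N  -> b1 N' | ... | bn N'
         N' -> a1 N' | ... | am N' | epsilon
    Right-hand sides are lists of symbols; [] stands for epsilon."""
    alphas = [rhs[1:] for rhs in alternatives if rhs and rhs[0] == nt]
    betas = [rhs for rhs in alternatives if not rhs or rhs[0] != nt]
    if not alphas:                       # no immediate left recursion: nothing to do
        return {nt: alternatives}
    fresh = nt + "'"                     # the new non-terminal, e.g. E'
    return {
        nt: [beta + [fresh] for beta in betas],
        fresh: [alpha + [fresh] for alpha in alphas] + [[]],
    }


print(remove_immediate_left_recursion("E", [["E", "-", "term"], ["term"]]))
```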

LL(k) parsers
• Recursive descent
  – Manual construction
  – Uses recursion
• Wanted
  – A parser that can be generated automatically
  – Does not use recursion

LL(k) parsing via pushdown automata
• A pushdown automaton uses
  – A prediction stack
  – An input stream
  – A transition table
    • non-terminals × tokens → production alternative
    • The entry indexed by non-terminal N and token t contains the alternative of N that must be predicted when the current input starts with t

LL(k) parsing via pushdown automata
• Two possible moves
  – Prediction
    • When the top of the stack is a non-terminal N, pop N and look up table[N, t]. If table[N, t] is not empty, push the right-hand side of table[N, t] onto the prediction stack, with its leftmost symbol on top. Otherwise, report a syntax error
  – Match
    • When the top of the prediction stack is a terminal T, it must equal the next input token t. If t == T, pop T and consume t. If t ≠ T, report a syntax error
• Parsing terminates when the prediction stack is empty
  – If the input is empty at that point, parsing succeeds. Otherwise, report a syntax error

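Model of non-recursive predictive parser
[Figure: an input buffer holding a + b $ and a stack holding X Y Z $. A predictive parsing program reads the next input token and the stack top, consults the parsing table, and produces output]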

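Example transition table
(1) E → LIT
(2) E → ( E OP E )
(3) E → not E
(4) LIT → true
(5) LIT → false
(6) OP → and
(7) OP → or
(8) OP → xor
Which rule should be used (rows: non-terminals, columns: input tokens):
• E: on ( use rule 2; on not use rule 3; on true use rule 1; on false use rule 1
• LIT: on true use rule 4; on false use rule 5
• OP: on and use rule 6; on or use rule 7; on xor use rule 8
• All other entries, including every entry for ) and $, are empty (syntax error)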

Running parser example
Grammar: A → a A b | c
Transition table: table[A, a] = A → a A b, table[A, c] = A → c
Input: aacbb$
Input suffix | Stack content | Move
aacbb$ | A$ | predict(A, a) = A → a A b
aacbb$ | aAb$ | match(a, a)
acbb$ | Ab$ | predict(A, a) = A → a A b
acbb$ | aAbb$ | match(a, a)
cbb$ | Abb$ | predict(A, c) = A → c
cbb$ | cbb$ | match(c, c)
bb$ | bb$ | match(b, b)
b$ | b$ | match(b, b)
$ | $ | match($, $): success

Illegal input example
Grammar: A → a A b | c
Transition table: table[A, a] = A → a A b, table[A, c] = A → c
Input: abcbb$
Input suffix | Stack content | Move
abcbb$ | A$ | predict(A, a) = A → a A b
abcbb$ | aAb$ | match(a, a)
bcbb$ | Ab$ | predict(A, b) = ERROR
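The predict/match loop can be written as a small table-driven driver that reproduces both traces above. The following is a minimal Python sketch, assuming every character of the input is one token, that $ is appended automatically, and that the table is a dict keyed by (non-terminal, token). The `TABLE`, `NONTERMINALS` and `ll1_parse` names are hypothetical.

```python
# Transition table for A -> a A b | c, indexed by (non-terminal, lookahead token)
TABLE = {
    ("A", "a"): ["a", "A", "b"],   # A -> a A b
    ("A", "c"): ["c"],             # A -> c
}
NONTERMINALS = {"A"}


def ll1_parse(text, start="A"):
    """Table-driven predictive parser: returns the list of moves, or raises
    SyntaxError. Every character of `text` is one token; '$' is appended."""
    tokens = list(text) + ["$"]
    stack = ["$", start]            # the top of the stack is the end of the list
    pos = 0
    moves = []
    while stack:
        top, t = stack.pop(), tokens[pos]
        if top in NONTERMINALS:     # prediction move
            rhs = TABLE.get((top, t))
            if rhs is None:
                raise SyntaxError(f"predict({top}, {t}) = ERROR")
            moves.append(f"predict({top}, {t}) = {top} -> {' '.join(rhs)}")
            stack.extend(reversed(rhs))   # push so that the leftmost symbol is on top
        elif top == t:              # match move
            moves.append(f"match({t}, {t})")
            pos += 1
        else:
            raise SyntaxError(f"expected {top}, found {t}")
    return moves


for move in ll1_parse("aacbb"):
    print(move)
```

Because $ sits at the bottom of the stack and at the end of the input, the stack becomes empty only after the whole input has been matched. A separate "is the input empty" check is therefore not needed in this sketch.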

Error handling and recovery
x = a * (p+q * ( -b * (r-s);
• Where should we report the error?
• The valid prefix property
• Recovery is tricky
  – Heuristics for dropping tokens, skipping to a semicolon, etc.

Error handling in LL parsers
Grammar: S → a c | b S
Transition table: table[S, a] = S → a c, table[S, b] = S → b S
Input suffix | Stack content | Move
c$ | S$ | predict(S, c) = ERROR
• Now what?
  – Predict b S anyway: "missing token b inserted in line XXX"

Error handling in LL parsers
Grammar: S → a c | b S
Transition table: table[S, a] = S → a c, table[S, b] = S → b S
Input suffix | Stack content | Move
bc$ | S$ | predict(S, b) = S → b S
bc$ | bS$ | match(b, b)
c$ | S$ | Looks familiar?
• Result: infinite loop

Error handling
• Requires more systematic treatment
• Enrichment
  – Acceptable-set method
  – Not part of the course material

Summary
• Parsing
  – Top-down or bottom-up
• Top-down parsing
  – Recursive descent
  – LL(k) grammars
  – LL(k) parsing with pushdown automata
• LL(k) parsers
  – Cannot deal with left recursion
  – Left-recursion removal might result in a complicated grammar

See you next time