Where Are We?
Source code: if (b==0) a = “Hi”;
  ↓ Lexical Analysis
Token stream: if ( b == 0 ) a = “Hi” ;
  ↓ Syntactic Analysis: do tokens conform to the language syntax?
Abstract Syntax Tree (AST):
  if
  ├─ ==
  │   ├─ b
  │   └─ 0
  └─ =
      ├─ a
      └─ “Hi”
  ↓ Semantic Analysis
CS 671 – Spring 2008

Last Time
• Parse trees vs. ASTs
• Derivations: leftmost vs. rightmost
• Grammar ambiguity

Parsing
What is parsing?
• Discovering the derivation of a string, if one exists
• Harder than generating strings
Two major approaches
• Top-down parsing
• Bottom-up parsing
Neither works on all context-free grammars
• Properties of the grammar determine parse-ability
• We may be able to transform a grammar

Two Approaches
Top-down parsers
• LL(1), recursive descent
• Start at the root of the parse tree and grow toward the leaves
• Pick a production and try to match the input
• A bad “pick” may force the parser to backtrack
Bottom-up parsers
• LR(1), operator precedence
• Start at the leaves and grow toward the root
• As input is consumed, encode possible parse trees in an internal state
• Bottom-up parsers handle a large class of grammars

Grammars and Parsers
LL(1) parsers
• Left-to-right input
• Leftmost derivation
• 1 symbol of look-ahead
• Grammars that these parsers can handle are called LL(1) grammars
LR(1) parsers
• Left-to-right input
• Rightmost derivation
• 1 symbol of look-ahead
• Grammars that these parsers can handle are called LR(1) grammars
Also: LL(k), LR(k), SLR, LALR, …

Top-Down Parsing
Start with the root of the parse tree
• Root of the tree: node labeled with the start symbol
Algorithm: repeat until the fringe of the parse tree matches the input string
• At a node A, select a production for A
• Add a child node for each symbol on the rhs
• If a terminal symbol is added that doesn’t match, backtrack
• Find the next node to be expanded (a non-terminal)
Done when:
• Leaves of the parse tree match the input string (success)
• All productions are exhausted in backtracking (failure)

Example
Expression grammar (with precedence)
#   Production rule
1   expr   → expr + term
2          | expr - term
3          | term
4   term   → term * factor
5          | term / factor
6          | factor
7   factor → number
8          | identifier
Input string: x - 2 * y

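Example
Rule numbers refer to the expression grammar above; ↑ marks the current position in the input stream.
Rule  Sentential form    Input
-     expr               ↑ x - 2 * y
1     expr + term        ↑ x - 2 * y
3     term + term        ↑ x - 2 * y
6     factor + term      ↑ x - 2 * y
8     <id> + term        ↑ x - 2 * y
-     <id, x> + term     x ↑ - 2 * y
Problem:
• Can’t match the next terminal: the sentential form expects +, but the input has -
• We guessed wrong at step 2 (expanding expr to expr + term)
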
Backtracking
• Undo all of these productions: expr + term, term + term, factor + term, <id> + term
• Roll the sentential form back to expr and the input position back to ↑ x - 2 * y
• Choose a different production for expr
• Continue

Retrying
Rule  Sentential form      Input
2     expr - term          ↑ x - 2 * y
3     term - term          ↑ x - 2 * y
6     factor - term        ↑ x - 2 * y
8     <id> - term          ↑ x - 2 * y
-     <id, x> - term       x - ↑ 2 * y
6     <id, x> - factor     x - ↑ 2 * y
7     <id, x> - <num>      x - ↑ 2 * y
-     <id, x> - <num, 2>   x - 2 ↑ * y
Problem:
• More input to read: every terminal in the sentential form is matched, but * y remains
• Another cause of backtracking

Successful Parse
Rule  Sentential form               Input
2     expr - term                   ↑ x - 2 * y
3     term - term                   ↑ x - 2 * y
6     factor - term                 ↑ x - 2 * y
8     <id> - term                   ↑ x - 2 * y
-     <id, x> - term                x - ↑ 2 * y
4     <id, x> - term * fact         x - ↑ 2 * y
6     <id, x> - fact * fact         x - ↑ 2 * y
7     <id, x> - <num> * fact        x - ↑ 2 * y
-     <id, x> - <num, 2> * fact     x - 2 * ↑ y
8     <id, x> - <num, 2> * <id>     x - 2 * ↑ y
-     <id, x> - <num, 2> * <id, y>  x - 2 * y ↑
All terminals match: we’re finished

Other Possible Parses
Rule  Sentential form       Input
-     expr                  ↑ x - 2 * y
1     expr + term           ↑ x - 2 * y
1     expr + term + term    ↑ x - 2 * y
…
Problem: termination
• A wrong choice leads to infinite expansion (more importantly: without consuming any input!)
• May not be as obvious as this
• Our grammar is left recursive

Left Recursion
Formally, a grammar is left recursive if there exists a non-terminal A such that A →* A α (for some string of symbols α)
What does →* mean?
• “Derives through a sequence of steps” (here, at least one step), so the recursion may be indirect
• Example: A → B x and B → A y give A → B x → A y x
Bad news: top-down parsers cannot handle left recursion
Good news: we can systematically eliminate left recursion

Removing Left Recursion
Two cases of left recursion:
#   Production rule
1   expr → expr + term
2        | expr - term
3        | term
4   term → term * factor
5        | term / factor
6        | factor
Transform as follows:
#   Production rule
1   expr  → term expr2
2   expr2 → + term expr2
3         | - term expr2
4         | ε
5   term  → factor term2
6   term2 → * factor term2
7         | / factor term2
8         | ε

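The transformation above can be sketched in a few lines of Python. This is an illustrative helper, not code from the course: the function name, the list-of-lists grammar encoding, and the "2" suffix for the new non-terminal (mirroring expr2 on the slide) are all assumptions, and it handles only immediate left recursion.

```python
EPS = "ε"  # marker for the empty production (assumed encoding)

def remove_immediate_left_recursion(nt, productions):
    """Rewrite A -> A a | b as A -> b A2 and A2 -> a A2 | ε.

    `productions` is a list of right-hand sides, each a list of symbols.
    Returns a dict from non-terminal name to its new right-hand sides.
    """
    new_nt = nt + "2"  # hypothetical naming, following expr -> expr2
    # Tails of the left-recursive alternatives (the "a" in A -> A a).
    recursive = [rhs[1:] for rhs in productions if rhs and rhs[0] == nt]
    # Alternatives that do not start with A (the "b" in A -> b).
    others = [rhs for rhs in productions if not rhs or rhs[0] != nt]
    if not recursive:
        return {nt: productions}  # nothing to transform
    return {
        nt: [rhs + [new_nt] for rhs in others],
        new_nt: [tail + [new_nt] for tail in recursive] + [[EPS]],
    }

expr = [["expr", "+", "term"], ["expr", "-", "term"], ["term"]]
result = remove_immediate_left_recursion("expr", expr)
for head, bodies in result.items():
    for body in bodies:
        print(head, "->", " ".join(body))
```

Running it on the expr rules prints the four right-recursive productions shown on the slide. Indirect left recursion (A → B x, B → A y) needs the more general algorithm and is not covered by this sketch.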
Right-Recursive Grammar
#   Production rule
1   expr   → term expr2
2   expr2  → + term expr2
3          | - term expr2
4          | ε
5   term   → factor term2
6   term2  → * factor term2
7          | / factor term2
8          | ε
9   factor → number
10         | identifier
• Two productions with no choice at all (rules 1 and 5)
• All other productions are uniquely identified by a terminal symbol at the start of the RHS
We can choose the right production by looking at the next input symbol
• This is called lookahead
• BUT, this can be tricky…

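Before lookahead enters the picture, it may help to see that the right-recursive grammar already makes the earlier backtracking search terminate. The sketch below is an assumption-laden illustration, not the course's code: it tries productions in order, treats a failed match as a backtrack, and encodes ε as an empty list. The names derive and accepts and the token kinds "num" and "id" are hypothetical.

```python
GRAMMAR = {
    "expr":   [["term", "expr2"]],
    "expr2":  [["+", "term", "expr2"], ["-", "term", "expr2"], []],  # [] is ε
    "term":   [["factor", "term2"]],
    "term2":  [["*", "factor", "term2"], ["/", "factor", "term2"], []],
    "factor": [["num"], ["id"]],
}

def derive(symbols, tokens, pos):
    """Yield every input position reachable by matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # Non-terminal: try each production in turn. An alternative that
        # yields nothing has failed, and the loop moves on (backtracking).
        for body in GRAMMAR[head]:
            yield from derive(body + rest, tokens, pos)
    elif pos < len(tokens) and tokens[pos] == head:
        # Terminal that matches the next token: consume it and continue.
        yield from derive(rest, tokens, pos + 1)

def accepts(tokens):
    """True if some derivation from expr consumes the whole token list."""
    return any(end == len(tokens) for end in derive(["expr"], tokens, 0))

print(accepts(["id", "-", "num", "*", "id"]))  # x - 2 * y
print(accepts(["id", "-"]))                    # incomplete input
```

With the original left-recursive grammar the same search would recurse on expr forever without consuming input, which is the termination problem described earlier.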
Top-Down Parsing
Goal:
• Given productions A → α | β, the parser should be able to choose between α and β
How can the next input token help us decide?
Solution: FIRST sets
• Informally: FIRST(α) is the set of tokens that could appear as the first symbol in a string derived from α
• Def: x in FIRST(α) iff α →* x γ, for some γ

The LL(1) Property
• Given A → α and A → β, we would like: FIRST(α) ∩ FIRST(β) = ∅
• The parser can then make the right choice by looking at one lookahead token
• … almost …

Example: Calculating FIRST Sets
#   Production rule
1   goal   → expr
2   expr   → term expr2
3   expr2  → + term expr2
4          | - term expr2
5          | ε
6   term   → factor term2
7   term2  → * factor term2
8          | / factor term2
9          | ε
10  factor → number
11         | identifier
FIRST(3) = { + }
FIRST(4) = { - }
FIRST(5) = { ε }
FIRST(7) = { * }
FIRST(8) = { / }
FIRST(9) = { ε }
FIRST(1) = ?
FIRST(1) = FIRST(2) = FIRST(6) = FIRST(10) ∪ FIRST(11) = { number, identifier }

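The hand calculation above can be checked with a small fixed-point computation. This is a sketch under assumptions: the dictionary encoding of the grammar, the "ε" marker, and the function names are illustrative choices, not part of the lecture.

```python
EPS = "ε"  # marker for the empty string (assumed encoding)

GRAMMAR = {
    "goal":   [["expr"]],
    "expr":   [["term", "expr2"]],
    "expr2":  [["+", "term", "expr2"], ["-", "term", "expr2"], [EPS]],
    "term":   [["factor", "term2"]],
    "term2":  [["*", "factor", "term2"], ["/", "factor", "term2"], [EPS]],
    "factor": [["number"], ["identifier"]],
}

def first_of_string(symbols, first):
    """FIRST of a symbol string, given the current FIRST sets of non-terminals."""
    out = set()
    for sym in symbols:
        if sym == EPS:
            continue
        # A terminal's FIRST set is just the terminal itself.
        sym_first = first[sym] if sym in first else {sym}
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out  # this symbol cannot vanish, so stop here
    out.add(EPS)  # every symbol in the string can derive ε
    return out

def compute_first(grammar):
    """Iterate until no FIRST set grows (a fixed point)."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                new = first_of_string(body, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

FIRST = compute_first(GRAMMAR)
for nt in GRAMMAR:
    print(nt, sorted(FIRST[nt]))
```

The result agrees with the slide: goal, expr, term and factor all get { number, identifier }, while expr2 and term2 get their operators plus ε.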
Top-Down Parsing
What about ε productions?
• They complicate the definition of LL(1)
• Consider A → α and A → β, where α may be empty
• In this case there is no symbol to identify α
Example:
#   Production rule
1   A → x B
2     | y C
3     | ε
• What is FIRST(3)? = { ε }
• What lookahead symbol tells us we are matching production 3?
Solution
• Build a FOLLOW set for each production with ε

FIRST and FOLLOW Sets
FIRST(α)
For some α ∈ (T ∪ NT)*, define FIRST(α) as the set of tokens that appear as the first symbol in some string that derives from α.
That is, x ∈ FIRST(α) iff α ⇒* x γ, for some γ.
FOLLOW(A)
For some A ∈ NT, define FOLLOW(A) as the set of symbols that can occur immediately after A in a valid sentence.
FOLLOW(G) = { EOF }, where G is the start symbol (assuming G never appears on a right-hand side; in general EOF ∈ FOLLOW(G)).

Example: Calculating FOLLOW Sets (1)
#   Production rule
1   goal   → expr
2   expr   → term expr2
3   expr2  → + term expr2
4          | - term expr2
5          | ε
6   term   → factor term2
7   term2  → * factor term2
8          | / factor term2
9          | ε
10  factor → number
11         | identifier
FOLLOW(goal) = { EOF }
FOLLOW(expr) = FOLLOW(goal) = { EOF }
FOLLOW(expr2) = FOLLOW(expr) = { EOF }
FOLLOW(term) = ?
FOLLOW(term) += FIRST(expr2)
             += { +, -, ε }
             += { +, -, FOLLOW(expr) }
             += { +, -, EOF }

Example: Calculating FOLLOW Sets (2)
(Same grammar as in part 1.)
FOLLOW(term2) += FOLLOW(term)
FOLLOW(factor) = ?
FOLLOW(factor) += FIRST(term2)
               += { *, /, ε }
               += { *, /, FOLLOW(term) }
               += { *, /, +, -, EOF }

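The FOLLOW calculation can be sketched the same way. To keep the example short, the FIRST sets are hard-coded from the FIRST example rather than recomputed; the grammar encoding, the "ε" and "EOF" markers, and the function name are assumptions made for illustration.

```python
EPS, EOF = "ε", "EOF"  # assumed markers

GRAMMAR = {
    "goal":   [["expr"]],
    "expr":   [["term", "expr2"]],
    "expr2":  [["+", "term", "expr2"], ["-", "term", "expr2"], [EPS]],
    "term":   [["factor", "term2"]],
    "term2":  [["*", "factor", "term2"], ["/", "factor", "term2"], [EPS]],
    "factor": [["number"], ["identifier"]],
}

# FIRST sets of the non-terminals, copied from the FIRST example.
OPERANDS = {"number", "identifier"}
FIRST = {
    "goal": OPERANDS, "expr": OPERANDS, "term": OPERANDS, "factor": OPERANDS,
    "expr2": {"+", "-", EPS}, "term2": {"*", "/", EPS},
}

def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(EOF)
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                # `trailer` is what can follow the part of the body
                # still to the left; it starts as FOLLOW(head).
                trailer = set(follow[head])
                for sym in reversed(body):
                    if sym in grammar:  # non-terminal
                        if not trailer <= follow[sym]:
                            follow[sym] |= trailer
                            changed = True
                        if EPS in first[sym]:
                            # sym may vanish, so the old trailer stays visible.
                            trailer = trailer | (first[sym] - {EPS})
                        else:
                            trailer = set(first[sym])
                    elif sym != EPS:  # terminal
                        trailer = {sym}
    return follow

FOLLOW = compute_follow(GRAMMAR, FIRST, "goal")
for nt in GRAMMAR:
    print(nt, sorted(FOLLOW[nt]))
```

The output matches the two slides: FOLLOW(term) = { +, -, EOF } and FOLLOW(factor) = { *, /, +, -, EOF }.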
Updated LL(1) Property
Including ε productions
• FOLLOW(A) = the set of terminal symbols that can immediately follow A
• Def: FIRST+(A → α) as
  – FIRST(α) ∪ FOLLOW(A), if ε ∈ FIRST(α)
  – FIRST(α), otherwise
Def: a grammar is LL(1) iff, for every pair of distinct productions A → α and A → β,
FIRST+(A → α) ∩ FIRST+(A → β) = ∅

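As a small illustration of the definition above, the check for one non-terminal can be written directly from the formula. The function names are hypothetical, and the FIRST and FOLLOW sets are typed in by hand from the earlier examples rather than computed.

```python
from itertools import combinations

EPS, EOF = "ε", "EOF"  # assumed markers

def first_plus(first_rhs, follow_lhs):
    """FIRST+(A -> α): add FOLLOW(A) when α can derive ε."""
    if EPS in first_rhs:
        return first_rhs | follow_lhs
    return set(first_rhs)

def alternatives_are_ll1(first_sets, follow_lhs):
    """Check pairwise disjointness of FIRST+ over one non-terminal's alternatives."""
    plus_sets = [first_plus(f, follow_lhs) for f in first_sets]
    return all(a.isdisjoint(b) for a, b in combinations(plus_sets, 2))

# expr2 -> + term expr2 | - term expr2 | ε, with FOLLOW(expr2) = { EOF }
print(alternatives_are_ll1([{"+"}, {"-"}, {EPS}], {EOF}))   # True

# Hypothetical conflict: A -> x | ε where x can also follow A
print(alternatives_are_ll1([{"x"}, {EPS}], {"x"}))          # False
```

A grammar is LL(1) when this check passes for every non-terminal; the second call shows the kind of conflict that one token of lookahead cannot resolve.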
Predictive Parsing
Given an LL(1) grammar
• The parser can “predict” the correct expansion
• Using lookahead and FIRST and FOLLOW sets
Two kinds of predictive parsers
• Recursive descent: often hand-written
• Table-driven: generate tables from FIRST and FOLLOW sets

Recursive Descent
#   Production rule
1   goal   → expr
2   expr   → term expr2
3   expr2  → + term expr2
4          | - term expr2
5          | ε
6   term   → factor term2
7   term2  → * factor term2
8          | / factor term2
9          | ε
10  factor → number
11         | identifier
12         | ( expr )
This produces a parser with six mutually recursive routines:
• Goal
• Expr
• Expr2
• Term
• Term2
• Factor
Each recognizes one NT or T
The term descent refers to the direction in which the parse tree is built.

Example Code
Goal symbol:
  main()
    /* Match goal --> expr */
    tok = nextToken();
    if (expr() && tok == EOF)
      then proceed to next step;
      else return false;
Top-level expression:
  expr()
    /* Match expr --> term expr2 */
    if (term() && expr2())
      return true;
    else return false;

Example Code
Match expr2:
  expr2()
    /* Match expr2 --> + term expr2 */
    /* Match expr2 --> - term expr2 */
    if (tok == ‘+’ or tok == ‘-’)
      tok = nextToken();
      if (term())
        then return expr2();
        else return false;
    /* Match expr2 --> empty */
    return true;
Check FIRST and FOLLOW sets to distinguish between the alternatives

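Example Code
  factor()
    /* Match factor --> ( expr ) */
    if (tok == ‘(’)
      tok = nextToken();
      if (expr() && tok == ‘)’)
        tok = nextToken();   /* consume the ‘)’ */
        return true;
      else
        syntax error: expecting )
        return false;
    /* Match factor --> num */
    if (tok is a num)
      tok = nextToken();     /* consume the number */
      return true;
    /* Match factor --> id */
    if (tok is an id)
      tok = nextToken();     /* consume the identifier */
      return true;
    /* No production of factor matches */
    return false;
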
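The routines on these slides can be assembled into a small runnable recognizer. The Python below is a sketch, not the course's reference code: the tokenizer, the Parser class, and the token kinds "num" and "id" are assumptions made so that the example is self-contained. It gives only a yes or no answer.

```python
import re

EOF = "EOF"

def tokenize(text):
    """Map input text to token kinds: 'num', 'id', or the symbol itself."""
    kinds = []
    for lexeme in re.findall(r"\d+|[A-Za-z_]\w*|\S", text):
        if lexeme.isdigit():
            kinds.append("num")
        elif lexeme[0].isalpha() or lexeme[0] == "_":
            kinds.append("id")
        else:
            kinds.append(lexeme)
    return kinds + [EOF]

class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    @property
    def tok(self):
        return self.tokens[self.pos]

    def advance(self):
        self.pos += 1

    def goal(self):      # goal -> expr
        return self.expr() and self.tok == EOF

    def expr(self):      # expr -> term expr2
        return self.term() and self.expr2()

    def expr2(self):     # expr2 -> + term expr2 | - term expr2 | ε
        if self.tok in ("+", "-"):
            self.advance()
            return self.term() and self.expr2()
        return True      # ε: match without consuming input

    def term(self):      # term -> factor term2
        return self.factor() and self.term2()

    def term2(self):     # term2 -> * factor term2 | / factor term2 | ε
        if self.tok in ("*", "/"):
            self.advance()
            return self.factor() and self.term2()
        return True

    def factor(self):    # factor -> ( expr ) | number | identifier
        if self.tok == "(":
            self.advance()
            if self.expr() and self.tok == ")":
                self.advance()
                return True
            return False
        if self.tok in ("num", "id"):
            self.advance()
            return True
        return False

print(Parser("x - 2 * y").goal())
print(Parser("x - * y").goal())
```

Each method corresponds to one non-terminal, and the single token of lookahead (self.tok) selects the production, so no backtracking is needed.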
Top-Down Parsing
So far:
• The parser gives us a yes or no answer
• We want to build the parse tree
• How?
Add actions to the matching routines
• Create a node for each production
• How do we assemble the tree?

Building a Parse Tree
Notice:
• Recursive calls match the shape of the tree
[Figure: call tree of the routines main, expr, term, factor, expr2, term]
Idea: use a stack
• Each routine:
  – Pops off the children it needs
  – Creates its own node
  – Pushes that node back on the stack

Building a Parse Tree
With stack operations:
  expr()
    /* Match expr --> term expr2 */
    if (term() && expr2())
      expr2_node = pop();
      term_node = pop();
      expr_node = new exprNode(term_node, expr2_node);
      push(expr_node);
      return true;
    else return false;

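The pop, create, push discipline can be sketched as a runnable Python example. Everything here is an illustrative assumption: nodes are plain tuples, the class and method names are invented, the helper named reduce simply performs the three stack steps from the slide, and the ( expr ) alternative is left out to keep the sketch short.

```python
import re

def tokenize(text):
    """Return (kind, lexeme) pairs, ending with an EOF marker."""
    out = []
    for lx in re.findall(r"\d+|[A-Za-z_]\w*|\S", text):
        if lx.isdigit():
            kind = "num"
        elif lx[0].isalpha() or lx[0] == "_":
            kind = "id"
        else:
            kind = lx
        out.append((kind, lx))
    return out + [("EOF", "")]

class TreeParser:
    def __init__(self, text):
        self.tokens, self.pos, self.stack = tokenize(text), 0, []

    def kind(self):
        return self.tokens[self.pos][0]

    def shift(self):
        """Consume the current token and return its lexeme."""
        lexeme = self.tokens[self.pos][1]
        self.pos += 1
        return lexeme

    def reduce(self, label, n_children):
        """Pop the children, create this routine's node, push it back."""
        children = [self.stack.pop() for _ in range(n_children)][::-1]
        self.stack.append((label, *children))
        return True

    def parse(self):
        if self.expr() and self.kind() == "EOF":
            return self.stack.pop()
        return None

    def expr(self):    # expr -> term expr2
        return self.term() and self.expr2() and self.reduce("expr", 2)

    def expr2(self):   # expr2 -> + term expr2 | - term expr2 | ε
        if self.kind() in ("+", "-"):
            self.stack.append(self.shift())  # operator leaf
            return self.term() and self.expr2() and self.reduce("expr2", 3)
        return self.reduce("expr2", 0)       # ε node with no children

    def term(self):    # term -> factor term2
        return self.factor() and self.term2() and self.reduce("term", 2)

    def term2(self):   # term2 -> * factor term2 | / factor term2 | ε
        if self.kind() in ("*", "/"):
            self.stack.append(self.shift())
            return self.factor() and self.term2() and self.reduce("term2", 3)
        return self.reduce("term2", 0)

    def factor(self):  # factor -> number | identifier
        if self.kind() in ("num", "id"):
            self.stack.append((self.kind(), self.shift()))
            return self.reduce("factor", 1)
        return False

tree = TreeParser("x - 2").parse()
print(tree)
```

Because each routine pushes exactly one node when it succeeds, the caller always knows how many children to pop, which is why the recursive calls and the tree have the same shape.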
Recursive Descent Parsing
Massage the grammar to satisfy the LL(1) condition
• Remove left recursion
• Left factor, where possible
Build FIRST (and FOLLOW) sets
Define a procedure for each non-terminal
• Implement a case for each right-hand side
• Call procedures as needed for non-terminals
• Add extra code, as needed
Can we automate this process?

Table-driven approach
Encode the mapping in a table
• Row for each non-terminal
• Column for each terminal symbol
Table[NT, symbol] = rule# if symbol ∈ FIRST+(NT → rhs(#))
         +, -             *, /            id, num
expr2    term expr2       error           error
term2    ε (do nothing)   factor term2    error

Code
  push EOF onto Stack
  push the start symbol, G, onto Stack
  top ← top of Stack
  loop forever
    if top = EOF and token = EOF then
      break & report success
    if top is a terminal then
      if top matches token then
        pop Stack                  // recognized top
        token ← next_token()
    else                           // top is a non-terminal
      if TABLE[top, token] is A → B1 B2 … Bk then
        pop Stack                  // get rid of A
        push Bk, Bk-1, …, B1       // in that order
    top ← top of Stack
Note: missing else conditions for errors

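A runnable version of this skeleton, with the missing error branches filled in, might look as follows. The rule numbering follows the 11-rule grammar used for the FIRST and FOLLOW examples; the table entries were derived by hand from those FIRST+ sets, and the dictionary encoding and function name are assumptions for illustration.

```python
EOF = "EOF"

# Rule number -> (left-hand side, right-hand side); [] encodes ε.
RULES = {
    1: ("goal", ["expr"]),
    2: ("expr", ["term", "expr2"]),
    3: ("expr2", ["+", "term", "expr2"]),
    4: ("expr2", ["-", "term", "expr2"]),
    5: ("expr2", []),
    6: ("term", ["factor", "term2"]),
    7: ("term2", ["*", "factor", "term2"]),
    8: ("term2", ["/", "factor", "term2"]),
    9: ("term2", []),
    10: ("factor", ["number"]),
    11: ("factor", ["identifier"]),
}

# TABLE[non-terminal][lookahead] = rule number; a missing entry is an error.
TABLE = {
    "goal":   {"number": 1, "identifier": 1},
    "expr":   {"number": 2, "identifier": 2},
    "expr2":  {"+": 3, "-": 4, EOF: 5},
    "term":   {"number": 6, "identifier": 6},
    "term2":  {"*": 7, "/": 8, "+": 9, "-": 9, EOF: 9},
    "factor": {"number": 10, "identifier": 11},
}

def parse(token_kinds):
    """Return True if the sequence of token kinds is a sentence of the grammar."""
    tokens = list(token_kinds) + [EOF]
    pos = 0
    stack = [EOF, "goal"]          # EOF at the bottom, start symbol on top
    while True:
        top, token = stack[-1], tokens[pos]
        if top == EOF and token == EOF:
            return True            # success
        if top not in TABLE:       # top is a terminal
            if top != token:
                return False       # error: terminal mismatch
            stack.pop()            # recognized top
            pos += 1
        else:                      # top is a non-terminal
            rule = TABLE[top].get(token)
            if rule is None:
                return False       # error: no table entry
            stack.pop()            # get rid of A
            stack.extend(reversed(RULES[rule][1]))  # push Bk ... B1

print(parse(["identifier", "-", "number", "*", "identifier"]))  # x - 2 * y
print(parse(["identifier", "+"]))
```

The explicit stack replaces the call stack of the recursive-descent version; the parsing loop itself never changes, only the table does.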
Next Time …
Bottom-up Parsers
• More powerful
• Widely used: yacc, bison, JavaCUP
Overview of YACC
• Removing shift/reduce and reduce/reduce conflicts
• Just in case you haven’t started your homework!