Parsing
Outline (2301373, Chapter 4: Parsing)
- Top-down vs. bottom-up parsing
- Top-down parsing
  - Recursive-descent parsing
  - LL(1) parsing
    - LL(1) parsing algorithm
    - First and Follow sets
    - Constructing the LL(1) parsing table
    - Error recovery
- Bottom-up parsing
  - Shift-reduce parsers
  - LR(0) parsing
    - LR(0) items
    - Finite automata of items
    - LR(0) parsing algorithm
    - LR(0) grammar
  - SLR(1) parsing
    - SLR(1) parsing algorithm
    - SLR(1) grammar
    - Parsing conflicts
Introduction
Parsing is the process of constructing a syntactic structure (i.e., a parse tree) from a stream of tokens. We have already learned how to describe the syntactic structure of a language with a context-free grammar. So is this all a parser needs to do?
  Stream of tokens + Context-free grammar → Parser → Parse tree
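Top-Down vs. Bottom-Up Parsing
- Building the parse tree: top-down parsing creates it from the root to the leaves; bottom-up parsing creates it from the leaves to the root.
- Traversal: in top-down parsing the traversal of the parse tree is a preorder traversal; in bottom-up parsing it is the reverse of a postorder traversal.
- Derivation: top-down parsing traces a leftmost derivation; bottom-up parsing traces a rightmost derivation (in reverse).
- Kinds and power: top-down parsers come in two types. A backtracking parser tries different structures and backtracks if one does not match the input; a predictive parser guesses the structure of the parse tree from the next input token. Bottom-up parsing is more powerful than top-down parsing.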
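Parse Trees and Derivations
For the input id + id * id, using the productions E → E + E | E * E | id:
Top-down parsing (leftmost derivation):
  E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
Bottom-up parsing (rightmost derivation, discovered in reverse, from the input back to E):
  E ⇒ E + E ⇒ E + E * E ⇒ E + E * id ⇒ E + id * id ⇒ id + id * id
[Figure: the parse tree for id + id * id, built from the root down (top-down) and from the leaves up (bottom-up)]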
Top-down Parsing
What does a parser need to decide?
- Which production rule to use at each point in time.
How does it guess, and what is the guess based on?
- The next token: a reserved word such as if, an open parenthesis, etc.
- The structure to be built: an if statement, an expression, etc.
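Top-down Parsing
Why is it difficult?
- The parser cannot decide until later.
  Next token: if. Structure to be built: St.
    St → MatchedSt | UnmatchedSt
    UnmatchedSt → if (E) St | if (E) MatchedSt else UnmatchedSt
    MatchedSt → if (E) MatchedSt else MatchedSt | ...
  Both MatchedSt and UnmatchedSt can begin with if, so the right choice depends on whether a matching else appears much later in the input.
- Productions with the empty string.
  Next token: id. Structure to be built: parList.
    parList → exp , parList | exp, together with an ε-production for parList
  The parser must decide whether parList derives ε or begins with an exp.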
Recursive-Descent
- Write one procedure for each set of productions that share the same nonterminal on the left-hand side.
- Each procedure recognizes a structure described by its nonterminal.
- A procedure calls other procedures when it needs to recognize other structures.
- A procedure calls the match procedure when it needs to recognize a terminal.
Recursive-Descent: Example
For the grammar
  E → E O F | F
  O → + | -
  F → ( E ) | id
we cannot decide which rule to use for E, and if we choose E → E O F, procedure E calls itself before consuming any input, which leads to infinite recursion:
  procedure E
  {
    E; O; F;
  }
Rewrite the grammar into EBNF:
  E ::= F {O F}
  O ::= + | -
  F ::= ( E ) | id

  procedure E
  {
    F;
    while (token == '+' or token == '-')
    { O; F; }
  }

  procedure F
  {
    switch (token)
    {
      case '(': match('('); E; match(')');
      case id:  match(id);
      default:  error;
    }
  }
Match Procedure
  procedure match(expTok)
  {
    if (token == expTok) then getToken
    else error
  }
The token is not consumed until getToken is executed.
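The EBNF grammar, the procedures E and F, and the match procedure above can be sketched in Python as follows. This is a minimal illustration, not the slides' own code: the names RecursiveDescentParser, parse_E, parse_O, parse_F and ParseError are hypothetical, tokens are assumed to be plain strings with "id" standing for any identifier, and "$" is an assumed end-of-input marker. Each method returns a small tuple-based parse tree.

```python
class ParseError(Exception):
    """Raised when the input does not match the grammar (hypothetical name)."""


class RecursiveDescentParser:
    """Recursive-descent parser for the EBNF grammar
        E ::= F {O F}
        O ::= + | -
        F ::= ( E ) | id
    Assumption: tokens are strings, "id" stands for any identifier,
    and "$" is appended as the end-of-input marker.
    """

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]
        self.pos = 0
        self.token = self.tokens[0]          # current token, not yet consumed

    def get_token(self):
        # The only place where input is consumed.
        self.pos += 1
        self.token = self.tokens[self.pos]

    def match(self, expected):
        if self.token == expected:
            self.get_token()
        else:
            raise ParseError(f"expected {expected!r}, found {self.token!r}")

    def parse_E(self):
        # E ::= F {O F}; the loop replaces the left-recursive rule E -> E O F.
        tree = self.parse_F()
        while self.token in ("+", "-"):
            op = self.parse_O()
            tree = (op, tree, self.parse_F())   # keeps + and - left-associative
        return tree

    def parse_O(self):
        op = self.token
        if op not in ("+", "-"):
            raise ParseError(f"expected '+' or '-', found {op!r}")
        self.match(op)
        return op

    def parse_F(self):
        if self.token == "(":
            self.match("(")
            tree = self.parse_E()
            self.match(")")
            return tree
        if self.token == "id":
            self.match("id")
            return "id"
        raise ParseError(f"unexpected token {self.token!r}")

    def parse(self):
        tree = self.parse_E()
        if self.token != "$":                   # all input must be consumed
            raise ParseError(f"extra input at {self.token!r}")
        return tree
```

For example, parsing ["id", "+", "(", "id", "-", "id", ")"] yields ("+", "id", ("-", "id", "id")). Building the tree inside the while loop keeps + and - left-associative, even though the EBNF rule itself no longer expresses associativity.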
Problems in Recursive-Descent
- It is difficult to convert grammars into EBNF.
- The parser cannot decide which production to use at each point.
- The parser cannot decide when to use an ε-production A → ε.
LL(1) Parsing
LL(1):
- Read the input from left to right (L).
- Simulate a leftmost derivation (L).
- Use 1 lookahead symbol.
Use a stack to simulate the leftmost derivation:
- The part of the sentential form produced in the leftmost derivation that has not yet been matched is stored in the stack.
- The top of the stack is the leftmost symbol of this fragment of the sentential form; once the terminals before it have been matched, it is the leftmost nonterminal.
Concept of LL(1) Parsing
- Simulate the leftmost derivation of the input.
- Keep part of the sentential form in the stack.
- If the symbol on top of the stack is a terminal, try to match it with the next input token and pop it off the stack.
- If the symbol on top of the stack is a nonterminal X, replace it with Y if there is a production rule X → Y.
  - Which production will be chosen if there are both X → Y and X → Z?
Example of LL(1) Parsing
Grammar:
  E → T X
  X → A T X | ε
  A → + | -
  T → F N
  N → M F N | ε
  M → *
  F → ( E ) | n
Leftmost derivation of the input (n+(n))*n (⇒* marks several steps combined):
  E ⇒ TX ⇒ FNX ⇒ (E)NX ⇒ (TX)NX ⇒ (FNX)NX
    ⇒* (nATX)NX ⇒* (n+FNX)NX ⇒ (n+(E)NX)NX ⇒ (n+(TX)NX)NX ⇒ (n+(FNX)NX)NX
    ⇒* (n+(n)X)NX ⇒* (n+(n))MFNX ⇒* (n+(n))*nX ⇒ (n+(n))*n
[Figure: the parse stack and the remaining input ( n + ( n ) ) * n $ at each step, ending with "Finished"]
LL(1) Parsing Algorithm
  Push $ onto the stack, then push the start symbol
  WHILE the top of the stack is not $
    SWITCH (top of stack, next token)
      CASE (terminal a, a):
        Pop stack; get next token
      CASE (nonterminal A, token a):      (a may be $)
        IF the parsing table entry M[A, a] is not empty THEN
          Get A → X1 X2 ... Xn from the parsing table entry M[A, a]
          Pop stack; push Xn ... X2 X1 onto the stack in that order
        ELSE Error
      OTHER: Error
  IF the next token is $ THEN Accept ELSE Error
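The algorithm above can be sketched in Python for the example grammar with nonterminals E, X, A, T, N, M and F. The table TABLE was filled in by hand from the First and Follow sets of that grammar; it is an assumption of this sketch rather than a table given on the slides. The names TABLE, NONTERMINALS and ll1_parse are hypothetical, [] stands for an ε right-hand side, and the function returns the productions it applies, which together form the leftmost derivation.

```python
# Hand-built LL(1) table M[A, a] for the example grammar (assumption):
#   E -> T X     X -> A T X | ε     A -> + | -
#   T -> F N     N -> M F N | ε     M -> *
#   F -> ( E ) | n
# Right-hand sides are lists of symbols; [] stands for ε.
TABLE = {
    ("E", "("): ["T", "X"], ("E", "n"): ["T", "X"],
    ("X", "+"): ["A", "T", "X"], ("X", "-"): ["A", "T", "X"],
    ("X", ")"): [], ("X", "$"): [],
    ("A", "+"): ["+"], ("A", "-"): ["-"],
    ("T", "("): ["F", "N"], ("T", "n"): ["F", "N"],
    ("N", "*"): ["M", "F", "N"],
    ("N", "+"): [], ("N", "-"): [], ("N", ")"): [], ("N", "$"): [],
    ("M", "*"): ["*"],
    ("F", "("): ["(", "E", ")"], ("F", "n"): ["n"],
}
NONTERMINALS = {"E", "X", "A", "T", "N", "M", "F"}


def ll1_parse(tokens, table=TABLE, start="E"):
    """Return the list of (A, rhs) productions applied (a leftmost derivation).

    Raises SyntaxError when the input is rejected."""
    stream = list(tokens) + ["$"]
    pos = 0
    stack = ["$", start]                 # the end of the list is the top of the stack
    used = []
    while stack[-1] != "$":
        top, tok = stack[-1], stream[pos]
        if top in NONTERMINALS:
            rhs = table.get((top, tok))
            if rhs is None:              # empty table entry
                raise SyntaxError(f"no entry M[{top}, {tok}]")
            stack.pop()
            stack.extend(reversed(rhs))  # push Xn ... X2 X1, leaving X1 on top
            used.append((top, rhs))
        elif top == tok:                 # terminal on top matches the next token
            stack.pop()
            pos += 1
        else:
            raise SyntaxError(f"expected {top!r}, found {tok!r}")
    if stream[pos] != "$":
        raise SyntaxError(f"extra input starting at {stream[pos]!r}")
    return used
```

On the input (n+(n))*n, tokenized one character per token with list("(n+(n))*n"), the parser applies 23 productions, starting with E → T X and ending with X → ε.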
LL(1) Parsing Table
If the nonterminal N is on top of the stack and the next token is t, which production rule should be used? Choose a rule N → X such that
- X ⇒* tY, or
- X ⇒* ε and S ⇒* WNtY.
First Set
Let X be ε or a symbol in V (nonterminals) or T (terminals). First(X) is the set of terminals that can begin a sentential form derived from X; it also contains ε if X ⇒* ε.
- If X is a terminal or ε, then First(X) = {X}.
- If X is a nonterminal and X → X1 X2 ... Xn is a rule, then
  - First(X1) − {ε} is a subset of First(X);
  - First(Xi) − {ε} is a subset of First(X) if First(Xj) contains ε for all j < i;
  - ε is in First(X) if First(Xj) contains ε for all j ≤ n.
Examples of First Set
Grammar 1:
  st → ifst | other
  ifst → if ( exp ) st elsepart
  elsepart → else st | ε
  exp → 0 | 1
First(exp) = {0, 1}
First(elsepart) = {else, ε}
First(ifst) = {if}
First(st) = {if, other}
Grammar 2:
  exp → exp addop term | term
  addop → + | -
  term → term mulop factor | factor
  mulop → *
  factor → ( exp ) | num
First(addop) = {+, -}
First(mulop) = {*}
First(factor) = {(, num}
First(term) = {(, num}
First(exp) = {(, num}
Algorithm for Finding First(A)
  For all terminals a, First(a) = {a}
  For all nonterminals A, First(A) := {}
  While there are changes to any First(A)
    For each rule A → X1 X2 ... Xn
      For each Xi in {X1, X2, ..., Xn}
        If for all j < i First(Xj) contains ε,
        Then add First(Xi) − {ε} to First(A)
      If for all j ≤ n First(Xj) contains ε (in particular, if n = 0),
      Then add ε to First(A)
In words:
- If A is a terminal or ε, then First(A) = {A}.
- If A is a nonterminal, then for each rule A → X1 X2 ... Xn, First(A) contains First(X1) − {ε}.
- If also, for some i < n, First(X1), First(X2), ..., and First(Xi) all contain ε, then First(A) contains First(Xi+1) − {ε}.
- If First(X1), First(X2), ..., and First(Xn) all contain ε, then First(A) also contains ε.
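A minimal Python sketch of the fixed-point algorithm above, assuming a grammar is represented as a dict from each nonterminal to a list of right-hand sides (lists of symbols, with [] for an ε-production), and that every symbol that is not a key is a terminal. The names first_sets, EPS and GRAMMAR are hypothetical.

```python
EPS = "ε"  # marker for the empty string (assumption of this sketch)


def first_sets(grammar):
    """Compute First(A) for every nonterminal A by iterating to a fixed point.

    grammar maps each nonterminal to a list of right-hand sides; each
    right-hand side is a list of symbols and [] is an ε-production.
    Symbols that are not keys of grammar are treated as terminals."""
    first = {A: set() for A in grammar}

    def first_of(symbol):
        return first[symbol] if symbol in grammar else {symbol}

    changed = True
    while changed:                        # repeat while some First(A) grows
        changed = False
        for A, productions in grammar.items():
            for rhs in productions:
                before = len(first[A])
                for X in rhs:
                    first[A] |= first_of(X) - {EPS}
                    if EPS not in first_of(X):
                        break             # X cannot vanish, so later Xi are hidden
                else:
                    first[A].add(EPS)     # every Xi derives ε (or rhs is empty)
                if len(first[A]) != before:
                    changed = True
    return first


# The expression grammar used in the examples (hypothetical encoding).
GRAMMAR = {
    "exp": [["term", "exp'"]],
    "exp'": [["addop", "term", "exp'"], []],
    "addop": [["+"], ["-"]],
    "term": [["factor", "term'"]],
    "term'": [["mulop", "factor", "term'"], []],
    "mulop": [["*"]],
    "factor": [["(", "exp", ")"], ["num"]],
}
```

Calling first_sets(GRAMMAR) reproduces the First column of the example: {(, num} for exp, term and factor, {+, -, ε} for exp', and {*, ε} for term'. Because it iterates until nothing changes, the same function also handles the left-recursive Grammar 2.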
Finding First Set: An Example
  exp → term exp'
  exp' → addop term exp' | ε
  addop → + | -
  term → factor term'
  term' → mulop factor term' | ε
  mulop → *
  factor → ( exp ) | num

  Nonterminal | First
  exp         | (, num
  exp'        | +, -, ε
  addop       | +, -
  term        | (, num
  term'       | *, ε
  mulop       | *
  factor      | (, num
Follow Set
Let $ denote the end of the input tokens.
- If A is the start symbol, then $ is in Follow(A).
- If there is a rule B → X A Y, then First(Y) − {ε} is a subset of Follow(A).
- If there is a production B → X A Y and ε is in First(Y), then Follow(A) contains Follow(B).
Algorithm for Finding Follow(A)
  Follow(S) = {$}
  FOR each A in V − {S}
    Follow(A) = {}
  WHILE changes are made to some Follow set
    FOR each production A → X1 X2 ... Xn
      FOR each nonterminal Xi
        Add First(Xi+1 Xi+2 ... Xn) − {ε} into Follow(Xi)
        (NOTE: if i = n, Xi+1 Xi+2 ... Xn = ε)
        IF ε is in First(Xi+1 Xi+2 ... Xn) THEN
          Add Follow(A) into Follow(Xi)
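The Follow-set loop above might look like this in Python. To keep the block short, it assumes the First sets of the nonterminals are already known (here copied from the example's First column) and computes First of a string of symbols from them. The names follow_sets, first_of_string, EPS, END, GRAMMAR and FIRST are hypothetical.

```python
EPS = "ε"   # marker for the empty string (assumption)
END = "$"   # end-of-input marker


def first_of_string(symbols, first):
    """First(X1 X2 ... Xn), given the First sets of the nonterminals."""
    result = set()
    for X in symbols:
        fx = first.get(X, {X})           # a terminal's First set is itself
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    result.add(EPS)                      # all Xi derive ε (or symbols is empty)
    return result


def follow_sets(grammar, start, first):
    follow = {A: set() for A in grammar}
    follow[start].add(END)               # Follow(S) = {$}
    changed = True
    while changed:                       # repeat while some Follow set grows
        changed = False
        for A, productions in grammar.items():
            for rhs in productions:
                for i, X in enumerate(rhs):
                    if X not in grammar:
                        continue         # only nonterminals get Follow sets
                    before = len(follow[X])
                    rest = first_of_string(rhs[i + 1:], first)
                    follow[X] |= rest - {EPS}
                    if EPS in rest:      # also covers i = n, where rest is ε
                        follow[X] |= follow[A]
                    if len(follow[X]) != before:
                        changed = True
    return follow


GRAMMAR = {
    "exp": [["term", "exp'"]],
    "exp'": [["addop", "term", "exp'"], []],
    "addop": [["+"], ["-"]],
    "term": [["factor", "term'"]],
    "term'": [["mulop", "factor", "term'"], []],
    "mulop": [["*"]],
    "factor": [["(", "exp", ")"], ["num"]],
}
FIRST = {
    "exp": {"(", "num"}, "exp'": {"+", "-", EPS}, "addop": {"+", "-"},
    "term": {"(", "num"}, "term'": {"*", EPS}, "mulop": {"*"},
    "factor": {"(", "num"},
}
```

With "exp" as the start symbol, follow_sets(GRAMMAR, "exp", FIRST) gives Follow(exp) = Follow(exp') = {$, )} and Follow(factor) = {*, +, -, $, )}, matching the example that follows.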
Finding Follow Set: An Example
  exp → term exp'
  exp' → addop term exp' | ε
  addop → + | -
  term → factor term'
  term' → mulop factor term' | ε
  mulop → *
  factor → ( exp ) | num

  Nonterminal | First   | Follow
  exp         | (, num  | $, )
  exp'        | +, -, ε | $, )
  addop       | +, -    | (, num
  term        | (, num  | +, -, $, )
  term'       | *, ε    | +, -, $, )
  mulop       | *       | (, num
  factor      | (, num  | *, +, -, $, )
Constructing LL(1) Parsing Tables
  FOR each nonterminal A and each production A → X
    FOR each token a in First(X)
      Add A → X to M[A, a]
    IF ε is in First(X) THEN
      FOR each element a in Follow(A)
        Add A → X to M[A, a]
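The table-construction loop above can be sketched in Python as follows, assuming the First and Follow sets have already been computed (here copied from the example). Each entry keeps a list of right-hand sides, so a cell holding more than one production, which is a conflict, stays visible instead of being overwritten. The names build_ll1_table, first_of_string, GRAMMAR, FIRST and FOLLOW are hypothetical.

```python
EPS = "ε"  # marker for the empty string (assumption)


def first_of_string(symbols, first):
    """First(X1 X2 ... Xn), given the First sets of the nonterminals."""
    result = set()
    for X in symbols:
        fx = first.get(X, {X})           # a terminal's First set is itself
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    result.add(EPS)
    return result


def build_ll1_table(grammar, first, follow):
    """Return {(A, a): [rhs, ...]}; an entry with two or more right-hand
    sides is a conflict, so the grammar is not LL(1)."""
    table = {}
    for A, productions in grammar.items():
        for rhs in productions:
            f = first_of_string(rhs, first)
            targets = f - {EPS}          # the tokens in First(X)
            if EPS in f:
                targets |= follow[A]     # X can vanish: also use Follow(A)
            for a in targets:
                table.setdefault((A, a), []).append(rhs)
    return table


GRAMMAR = {
    "exp": [["term", "exp'"]],
    "exp'": [["addop", "term", "exp'"], []],
    "addop": [["+"], ["-"]],
    "term": [["factor", "term'"]],
    "term'": [["mulop", "factor", "term'"], []],
    "mulop": [["*"]],
    "factor": [["(", "exp", ")"], ["num"]],
}
FIRST = {
    "exp": {"(", "num"}, "exp'": {"+", "-", EPS}, "addop": {"+", "-"},
    "term": {"(", "num"}, "term'": {"*", EPS}, "mulop": {"*"},
    "factor": {"(", "num"},
}
FOLLOW = {
    "exp": {"$", ")"}, "exp'": {"$", ")"}, "addop": {"(", "num"},
    "term": {"+", "-", "$", ")"}, "term'": {"+", "-", "$", ")"},
    "mulop": {"(", "num"}, "factor": {"*", "+", "-", "$", ")"},
}
```

For this grammar the result has 18 filled entries, each holding exactly one production, which is the table shown in the example that follows (for instance M[exp', )] holds the ε-production).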
Example: Constructing LL(1) Parsing Table
  Nonterminal | First   | Follow
  exp         | (, num  | $, )
  exp'        | +, -, ε | $, )
  addop       | +, -    | (, num
  term        | (, num  | $, ), +, -
  term'       | *, ε    | $, ), +, -
  mulop       | *       | (, num
  factor      | (, num  | $, ), +, -, *

Productions:
  1  exp → term exp'
  2  exp' → addop term exp'
  3  exp' → ε
  4  addop → +
  5  addop → -
  6  term → factor term'
  7  term' → mulop factor term'
  8  term' → ε
  9  mulop → *
  10 factor → ( exp )
  11 factor → num

Parsing table (entries are production numbers):
  Nonterminal | (  | )  | +  | -  | *  | num | $
  exp         | 1  |    |    |    |    | 1   |
  exp'        |    | 3  | 2  | 2  |    |     | 3
  addop       |    |    | 4  | 5  |    |     |
  term        | 6  |    |    |    |    | 6   |
  term'       |    | 8  | 8  | 8  | 7  |     | 8
  mulop       |    |    |    |    | 9  |     |
  factor      | 10 |    |    |    |    | 11  |
LL(1) Grammar
A grammar is an LL(1) grammar if its LL(1) parsing table has at most one production in each table entry.
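LL(1) Parsing Table for a Non-LL(1) Grammar
  1 exp → exp addop term
  2 exp → term
  3 term → term mulop factor
  4 term → factor
  5 factor → ( exp )
  6 factor → num
  7 addop → +
  8 addop → -
  9 mulop → *
First(exp) = {(, num}
First(term) = {(, num}
First(factor) = {(, num}
First(addop) = {+, -}
First(mulop) = {*}

  Nonterminal | (    | )  | +  | -  | *  | num  | $
  exp         | 1, 2 |    |    |    |    | 1, 2 |
  term        | 3, 4 |    |    |    |    | 3, 4 |
  factor      | 5    |    |    |    |    | 6    |
  addop       |      |    | 7  | 8  |    |      |
  mulop       |      |    |    |    | 9  |      |

The entries M[exp, (], M[exp, num], M[term, (] and M[term, num] each contain two productions, so the grammar is not LL(1).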
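The LL(1) test can be checked mechanically for this grammar. The Python sketch below uses the slide's production numbers and First sets; because no production here derives ε, it assumes First of a right-hand side is simply First of its first symbol, so Follow sets are not needed. The names PRODUCTIONS, FIRST, ll1_entries and conflicts are hypothetical.

```python
# Productions of the non-LL(1) grammar, numbered as on the slide.
PRODUCTIONS = {
    1: ("exp", ["exp", "addop", "term"]),
    2: ("exp", ["term"]),
    3: ("term", ["term", "mulop", "factor"]),
    4: ("term", ["factor"]),
    5: ("factor", ["(", "exp", ")"]),
    6: ("factor", ["num"]),
    7: ("addop", ["+"]),
    8: ("addop", ["-"]),
    9: ("mulop", ["*"]),
}
FIRST = {"exp": {"(", "num"}, "term": {"(", "num"}, "factor": {"(", "num"},
         "addop": {"+", "-"}, "mulop": {"*"}}


def ll1_entries(productions, first):
    """Map (A, a) to the list of production numbers placed in M[A, a].

    Assumption: no production derives ε, so First(rhs) is First of the
    first symbol of rhs, and a terminal's First set is itself."""
    table = {}
    for number, (A, rhs) in productions.items():
        for a in first.get(rhs[0], {rhs[0]}):
            table.setdefault((A, a), []).append(number)
    return table


def conflicts(table):
    """Entries with more than one production; non-empty means not LL(1)."""
    return {key: nums for key, nums in table.items() if len(nums) > 1}
```

Running conflicts(ll1_entries(PRODUCTIONS, FIRST)) reports exactly the four doubly filled cells of the table above, each caused by a left-recursive production competing with its non-recursive alternative.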
Causes of Non-LL(1) Grammar
What causes a grammar to be non-LL(1)?
- Left recursion
- Left factors
Left Recursion
- Immediate left recursion
  - Can be removed very easily.
  - A → A X | Y, i.e., A = Y X*, becomes
      A → Y A'
      A' → X A' | ε
  - A → A X1 | A X2 | ... | A Xn | Y1 | Y2 | ... | Ym, i.e., A = {Y1, Y2, ..., Ym} {X1, X2, ..., Xn}*, becomes
      A → Y1 A' | Y2 A' | ... | Ym A'
      A' → X1 A' | X2 A' | ... | Xn A' | ε
- General left recursion
  - A ⇒ X ⇒* A Y
  - Can be removed when there is no empty-string production and no cycle in the grammar.
Removal of Immediate Left Recursion
  exp → exp + term | exp - term | term
  term → term * factor | factor
  factor → ( exp ) | num
Remove left recursion:
  exp → term exp'                         (exp = term ((+ | -) term)*)
  exp' → + term exp' | - term exp' | ε
  term → factor term'                     (term = factor (* factor)*)
  term' → * factor term' | ε
  factor → ( exp ) | num
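The rewriting rule for immediate left recursion, applied to the example above, can be sketched in Python as a function on one nonterminal's alternatives. It assumes each alternative is a list of symbols, [] stands for ε, and the new nonterminal is named by appending a prime, which is assumed not to clash with an existing name. The function name remove_immediate_left_recursion is hypothetical.

```python
def remove_immediate_left_recursion(A, alternatives):
    """Rewrite A -> A X1 | ... | A Xn | Y1 | ... | Ym as
         A  -> Y1 A' | ... | Ym A'
         A' -> X1 A' | ... | Xn A' | ε     ([] stands for ε)
    and return the new rules as a dict."""
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == A]  # the Xi
    others = [alt for alt in alternatives if not alt or alt[0] != A]      # the Yj
    if not recursive:
        return {A: alternatives}          # no immediate left recursion
    A_prime = A + "'"                     # assumption: the name A' is unused
    return {
        A: [Y + [A_prime] for Y in others],
        A_prime: [X + [A_prime] for X in recursive] + [[]],
    }
```

Applied to exp → exp + term | exp - term | term, it produces exp → term exp' and exp' → + term exp' | - term exp' | ε, the same result as the hand derivation above.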
General Left Recursion
- Bad news: it can only be removed when there is no empty-string production and no cycle in the grammar.
- Good news: it is never seen in the grammars of programming languages.
Left Factoring
Left factors cause non-LL(1) grammars:
- Given A → X Y | X Z, both A → X Y and A → X Z can be chosen when A is on top of the stack and a token in First(X) is the next token.
A → X Y | X Z can be left-factored as
  A → X A'
  A' → Y | Z
Example of Left Factor
  ifSt → if ( exp ) st else st | if ( exp ) st
can be left-factored as
  ifSt → if ( exp ) st elsePart
  elsePart → else st | ε

  seq → st ; seq | st
can be left-factored as
  seq → st seq'
  seq' → ; seq | ε
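One round of the left-factoring transformation used in these examples can be sketched in Python. This is an illustrative sketch under stated assumptions: alternatives are lists of symbols, [] stands for ε, alternatives are grouped by their first symbol, and each group is factored on its longest common prefix. The new nonterminals are named A', A'', ... (a hypothetical naming scheme; the slide names one of them elsePart instead), and left_factor and common_prefix are hypothetical names.

```python
def common_prefix(seqs):
    """Longest list of symbols that starts every sequence in seqs."""
    prefix = []
    for symbols in zip(*seqs):
        if all(s == symbols[0] for s in symbols):
            prefix.append(symbols[0])
        else:
            break
    return prefix


def left_factor(A, alternatives):
    """One round of left factoring: A -> X Y | X Z becomes A -> X A', A' -> Y | Z.

    Returns the new rules as a dict; [] stands for ε."""
    groups = {}
    for alt in alternatives:
        groups.setdefault(alt[0] if alt else None, []).append(alt)
    result = {A: []}
    for key, group in groups.items():
        if key is None or len(group) == 1:
            result[A].extend(group)            # nothing to factor in this group
            continue
        prefix = common_prefix(group)
        new_name = A + "'" * len(result)       # hypothetical naming: A', A'', ...
        result[A].append(prefix + [new_name])
        result[new_name] = [alt[len(prefix):] for alt in group]
    return result
```

For seq → st ; seq | st it returns seq → st seq' and seq' → ; seq | ε. For the if statement it returns ifSt → if ( exp ) st ifSt' with ifSt' → else st | ε, which is the elsePart rule under a different name.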
End of Top-Down Parsing
Credit: the creator of the slides, Jaruloj Chongstitvatana, Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University.