Parsing
Outline
  - Top-down vs. bottom-up parsing
  - Top-down parsing
      - Recursive-descent parsing
      - LL(1) parsing: LL(1) parsing algorithm, First and Follow sets, constructing the LL(1) parsing table, error recovery
  - Bottom-up parsing
      - Shift-reduce parsers
      - LR(0) parsing: LR(0) items, finite automata of items, LR(0) parsing algorithm, LR(0) grammar
      - SLR(1) parsing: SLR(1) parsing algorithm, SLR(1) grammar, parsing conflict
2301373 Chapter 4 Parsing
Introduction
  - Parsing is a process that constructs a syntactic structure (i.e. a parse tree) from the stream of tokens.
  - We have already learned how to describe the syntactic structure of a language using a (context-free) grammar.
  - So, a parser only needs to do this:
      stream of tokens + context-free grammar → Parser → parse tree
Top-Down Parsing vs. Bottom-Up Parsing
  Top-down parsing
    - A parse tree is created from root to leaves.
    - The traversal of the parse tree is a preorder traversal.
    - Traces a leftmost derivation.
    - Two types:
        - Backtracking parser: try different structures and backtrack if the structure does not match the input.
        - Predictive parser: guess the structure of the parse tree from the next input.
  Bottom-up parsing
    - A parse tree is created from leaves to root.
    - The traversal of the parse tree is a reversal of postorder traversal.
    - Traces a rightmost derivation.
    - More powerful than top-down parsing.
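Parse Trees and Derivations
  [Figure: the parse tree of id + id * id, built top-down and bottom-up]
  Top-down parsing (leftmost derivation):
    E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
  Bottom-up parsing (rightmost derivation, traced in reverse):
    E ⇒ E + E ⇒ E + E * E ⇒ E + E * id ⇒ E + id * id ⇒ id + id * id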
Top-down Parsing
  What does a parser need to decide?
    - Which production rule is to be used at each point in time?
  How to guess? What is the guess based on?
    - What is the next token? (Reserved word if, open parenthesis, etc.)
    - What is the structure to be built? (If statement, expression, etc.)
Top-down Parsing
  Why is it difficult?
    - Cannot decide until later
        - Next token: if
        - Structure to be built: St
            St → MatchedSt | UnmatchedSt
            UnmatchedSt → if (E) St | if (E) MatchedSt else UnmatchedSt
            MatchedSt → if (E) MatchedSt else MatchedSt | ...
    - Production with empty string
        - Next token: id
        - Structure to be built: parList
            parList → exp , parList | exp | ε
Recursive-Descent
  - Write one procedure for each set of productions with the same nonterminal on the LHS.
  - Each procedure recognizes a structure described by a nonterminal.
  - A procedure calls other procedures if it needs to recognize other structures.
  - A procedure calls the match procedure if it needs to recognize a terminal.
Recursive-Descent: Example
  For this grammar:
    E → E O F | F
    O → + | -
    F → ( E ) | id
  - We cannot decide which rule to use for E, and
  - if we choose E → E O F, it leads to infinitely recursive loops:
      procedure E { E; O; F; }
  Rewrite the grammar into EBNF:
    E ::= F {O F}
    O ::= + | -
    F ::= ( E ) | id
  Procedures for the EBNF grammar:
    procedure E
    { F;
      while (token=+ or token=-)
      { O; F; }
    }
    procedure F
    { switch token
      { case (: match('('); E; match(')');
        case id: match(id);
        default: error;
      }
    }
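The procedures above can be sketched in Python as follows. This is only an illustrative sketch: the class name Parser, the helper accepts, and the use of a list of token strings with "$" as the end marker are assumptions made for the example, not part of the slides.

```python
class ParseError(Exception):
    """Raised when the input does not match the grammar."""


class Parser:
    """Recursive-descent recognizer for E ::= F {O F}, O ::= + | -, F ::= ( E ) | id."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks the end of input (assumption)
        self.pos = 0

    @property
    def token(self):
        return self.tokens[self.pos]

    def match(self, expected):
        # The token is consumed only when it matches the expected one.
        if self.token == expected:
            self.pos += 1
        else:
            raise ParseError(f"expected {expected!r}, found {self.token!r}")

    def parse_E(self):
        self.parse_F()
        while self.token in ("+", "-"):  # the {O F} repetition
            self.parse_O()
            self.parse_F()

    def parse_O(self):
        if self.token in ("+", "-"):
            self.match(self.token)
        else:
            raise ParseError(f"operator expected, found {self.token!r}")

    def parse_F(self):
        if self.token == "(":
            self.match("(")
            self.parse_E()
            self.match(")")
        elif self.token == "id":
            self.match("id")
        else:
            raise ParseError(f"'(' or id expected, found {self.token!r}")


def accepts(tokens):
    """Return True if the whole token list is derivable from E."""
    parser = Parser(tokens)
    try:
        parser.parse_E()
        parser.match("$")
        return True
    except ParseError:
        return False
```

For example, accepts(["id", "+", "(", "id", "-", "id", ")"]) succeeds, while accepts(["id", "+"]) fails because F finds the end marker.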
Match procedure
    procedure match(expTok)
    { if (token==expTok)
      then getToken
      else error
    }
  - The token is not consumed until getToken is executed.
Problems in Recursive-Descent
  - Difficult to convert grammars into EBNF
  - Cannot decide which production to use at each point
  - Cannot decide when to use an ε-production A → ε
LL(1) Parsing
  LL(1)
    - Read input from (L) left to right
    - Simulate (L) leftmost derivation
    - 1 lookahead symbol
  Use a stack to simulate leftmost derivation
    - The part of the sentential form produced in the leftmost derivation that has not been matched yet is stored in the stack.
    - The top of the stack is the leftmost symbol in this fragment of the sentential form.
Concept of LL(1) Parsing
  - Simulate the leftmost derivation of the input.
  - Keep part of the sentential form in the stack.
  - If the symbol on the top of the stack is a terminal, try to match it with the next input token and pop it off the stack.
  - If the symbol on the top of the stack is a nonterminal X, replace it with Y if we have a production rule X → Y.
      - Which production will be chosen if there are both X → Y and X → Z?
Example of LL(1) Parsing
  Grammar:
    E → T X
    X → A T X | ε
    A → + | -
    T → F N
    N → M F N | ε
    M → *
    F → ( E ) | n
  Input: ( n + ( n ) ) * n $
  Sentential forms of the leftmost derivation (some intermediate steps are combined):
    E
    TX
    FNX
    (E)NX
    (TX)NX
    (FNX)NX
    (nATX)NX
    (n+FNX)NX
    (n+(E)NX)NX
    (n+(TX)NX)NX
    (n+(FNX)NX)NX
    (n+(n)X)NX
    (n+(n))MFNX
    (n+(n))*nX
    (n+(n))*n
  Finished
  [Figure: animation of the stack, the input and the parse tree during the derivation]
LL(1) Parsing Algorithm
    Push $ and then the start symbol into the stack
    REPEAT until Accept or Error
      SWITCH (Top of stack, next token)
        CASE (terminal a, a):
          Pop stack; Get next token
        CASE (nonterminal A, token a, where a may be $):
          IF the parsing table entry M[A, a] is not empty THEN
            Get A → X1 X2 ... Xn from the parsing table entry M[A, a]
            Pop stack; Push Xn ... X2 X1 into stack in that order
          ELSE Error
        CASE ($, $): Accept
        OTHER: Error
  - The loop must continue while the next token is $ as long as nonterminals remain on the stack, so that ε-productions can still be applied (e.g. X → ε at the end of the input).
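A minimal sketch of this table-driven loop in Python is given below. It assumes the example grammar E → T X, X → A T X | ε, A → + | -, T → F N, N → M F N | ε, M → *, F → ( E ) | n; the names TABLE, NONTERMINALS and ll1_parse are hypothetical, and the table was filled in by hand from the First and Follow sets of that grammar.

```python
# Hand-built LL(1) table: (nonterminal, lookahead) -> right-hand side.
# An empty list stands for an ε-production.
TABLE = {
    ("E", "("): ["T", "X"], ("E", "n"): ["T", "X"],
    ("X", "+"): ["A", "T", "X"], ("X", "-"): ["A", "T", "X"],
    ("X", ")"): [], ("X", "$"): [],
    ("A", "+"): ["+"], ("A", "-"): ["-"],
    ("T", "("): ["F", "N"], ("T", "n"): ["F", "N"],
    ("N", "*"): ["M", "F", "N"],
    ("N", "+"): [], ("N", "-"): [], ("N", ")"): [], ("N", "$"): [],
    ("M", "*"): ["*"],
    ("F", "("): ["(", "E", ")"], ("F", "n"): ["n"],
}
NONTERMINALS = {"E", "X", "A", "T", "N", "M", "F"}


def ll1_parse(tokens, start="E"):
    """Return True if the token list is accepted, False on a parsing error."""
    tokens = list(tokens) + ["$"]      # "$" marks the end of input
    stack = ["$", start]               # top of stack is the end of the list
    pos = 0
    while True:
        top, look = stack[-1], tokens[pos]
        if top == "$" and look == "$":
            return True                # CASE ($, $): accept
        if top in NONTERMINALS:
            rhs = TABLE.get((top, look))
            if rhs is None:
                return False           # empty table entry: error
            stack.pop()
            stack.extend(reversed(rhs))  # push Xn ... X1 so X1 ends up on top
        elif top == look:
            stack.pop()                # terminal matches the next token
            pos += 1
        else:
            return False               # OTHER: error
```

With this table, ll1_parse(list("(n+(n))*n")) follows the same derivation as the example, including the final N → ε and X → ε steps taken on lookahead $.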
LL(1) Parsing Table
  If the nonterminal N is on the top of the stack and the next token is t, which production rule should be used?
  Choose a rule N → X such that
    - X ⇒* t Y, or
    - X ⇒* ε and S ⇒* W N t Y
  [Figure: the two cases, t at the start of the string derived from X, and t following N when X derives ε]
First Set
  Let X be ε or be in V or T.
  First(X) is the set of the first terminals in any sentential form derived from X.
    - If X is a terminal or ε, then First(X) = {X}.
    - If X is a nonterminal and X → X1 X2 ... Xn is a rule, then
        - First(X1) − {ε} is a subset of First(X)
        - First(Xi) − {ε} is a subset of First(X) if for all j<i First(Xj) contains ε
        - ε is in First(X) if for all j≤n First(Xj) contains ε
Examples of First Set
  Grammar 1:
    st → ifst | other
    ifst → if ( exp ) st elsepart
    elsepart → else st | ε
    exp → 0 | 1
  First(exp) = {0, 1}
  First(elsepart) = {else, ε}
  First(ifst) = {if}
  First(st) = {if, other}
  Grammar 2:
    exp → exp addop term | term
    addop → - | +
    term → term mulop factor | factor
    mulop → *
    factor → ( exp ) | num
  First(addop) = {-, +}
  First(mulop) = {*}
  First(factor) = {(, num}
  First(term) = {(, num}
  First(exp) = {(, num}
Algorithm for finding First(A)
    For all terminals a, First(a) = {a}
    For all nonterminals A, First(A) := {}
    While there are changes to any First(A)
      For each rule A → X1 X2 ... Xn
        For each Xi in {X1, X2, …, Xn}
          If for all j<i First(Xj) contains ε,
          Then add First(Xi) − {ε} to First(A)
        If all of First(X1), ..., First(Xn) contain ε,
        Then add ε to First(A)
  The rules behind the algorithm:
    - If A is a terminal or ε, then First(A) = {A}.
    - If A is a nonterminal, then for each rule A → X1 X2 ... Xn, First(A) contains First(X1) − {ε}.
    - If also for some i<n, First(X1), First(X2), ..., and First(Xi) contain ε, then First(A) contains First(Xi+1) − {ε}.
    - If First(X1), First(X2), ..., and First(Xn) contain ε, then First(A) also contains ε.
Finding First Set: An Example
  Grammar:
    exp → term exp'
    exp' → addop term exp' | ε
    addop → - | +
    term → factor term'
    term' → mulop factor term' | ε
    mulop → *
    factor → ( exp ) | num
  First sets:
    exp      ( num
    exp'     - + ε
    addop    - +
    term     ( num
    term'    * ε
    mulop    *
    factor   ( num
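The fixed-point computation of First sets can be sketched in Python as below, using the expression grammar of this example. The dictionary encoding of the grammar (an empty list for an ε-production) and the names GRAMMAR, EPS and first_sets are assumptions made for illustration.

```python
EPS = "ε"

# Each nonterminal maps to its list of right-hand sides; [] is an ε-production.
GRAMMAR = {
    "exp":    [["term", "exp'"]],
    "exp'":   [["addop", "term", "exp'"], []],
    "addop":  [["-"], ["+"]],
    "term":   [["factor", "term'"]],
    "term'":  [["mulop", "factor", "term'"], []],
    "mulop":  [["*"]],
    "factor": [["(", "exp", ")"], ["num"]],
}


def first_sets(grammar):
    """Iterate until no First set changes any more."""
    first = {a: set() for a in grammar}

    def first_of(symbol):
        # First of a terminal is the terminal itself.
        return first[symbol] if symbol in grammar else {symbol}

    changed = True
    while changed:
        changed = False
        for a, rules in grammar.items():
            for rhs in rules:
                before = len(first[a])
                all_nullable = True
                for x in rhs:
                    fx = first_of(x)
                    first[a] |= fx - {EPS}
                    if EPS not in fx:      # Xi cannot vanish: stop here
                        all_nullable = False
                        break
                if all_nullable:           # every Xi can derive ε (or rhs is empty)
                    first[a].add(EPS)
                if len(first[a]) != before:
                    changed = True
    return first
```

Calling first_sets(GRAMMAR) should reproduce the First column of the table above.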
Follow Set
  Let $ denote the end of input tokens.
    - If A is the start symbol, then $ is in Follow(A).
    - If there is a rule B → X A Y, then First(Y) − {ε} is in Follow(A).
    - If there is a production B → X A Y and ε is in First(Y), then Follow(A) contains Follow(B).
Algorithm for Finding Follow(A)
    Follow(S) = {$}
    FOR each A in V−{S}
      Follow(A) = {}
    WHILE change is made to some Follow sets
      FOR each production A → X1 X2 ... Xn
        FOR each nonterminal Xi
          Add First(Xi+1 Xi+2 ... Xn) − {ε} into Follow(Xi)
          (NOTE: If i=n, Xi+1 Xi+2 ... Xn = ε)
          IF ε is in First(Xi+1 Xi+2 ... Xn) THEN
            Add Follow(A) to Follow(Xi)
  The rules behind the algorithm:
    - If A is the start symbol, then $ is in Follow(A).
    - If there is a rule A → Y X Z, then First(Z) − {ε} is in Follow(X).
    - If there is a production B → X A Y and ε is in First(Y), then Follow(A) contains Follow(B).
Finding Follow Set: An Example
  Grammar:
    exp → term exp'
    exp' → addop term exp' | ε
    addop → - | +
    term → factor term'
    term' → mulop factor term' | ε
    mulop → *
    factor → ( exp ) | num
  First and Follow sets:
             First      Follow
    exp      ( num      $ )
    exp'     - + ε      $ )
    addop    - +        ( num
    term     ( num      - + $ )
    term'    * ε        - + $ )
    mulop    *          ( num
    factor   ( num      * - + $ )
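The Follow-set iteration can be sketched in Python as follows. To keep the sketch short, the First sets of the example are written out by hand instead of being computed; the names GRAMMAR, FIRST, first_of_seq and follow_sets are hypothetical.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "exp":    [["term", "exp'"]],
    "exp'":   [["addop", "term", "exp'"], []],
    "addop":  [["-"], ["+"]],
    "term":   [["factor", "term'"]],
    "term'":  [["mulop", "factor", "term'"], []],
    "mulop":  [["*"]],
    "factor": [["(", "exp", ")"], ["num"]],
}

# First sets of the nonterminals, copied from the example table (not computed here).
FIRST = {
    "exp": {"(", "num"}, "exp'": {"+", "-", EPS}, "addop": {"+", "-"},
    "term": {"(", "num"}, "term'": {"*", EPS}, "mulop": {"*"},
    "factor": {"(", "num"},
}


def first_of_seq(symbols, first):
    """First set of a sequence Xi+1 ... Xn; an empty sequence gives {ε}."""
    out = set()
    for x in symbols:
        fx = first.get(x, {x})   # a terminal is its own First set
        out |= fx - {EPS}
        if EPS not in fx:
            return out
    out.add(EPS)
    return out


def follow_sets(grammar, first, start):
    follow = {a: set() for a in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for a, rules in grammar.items():
            for rhs in rules:
                for i, x in enumerate(rhs):
                    if x not in grammar:       # only nonterminals have Follow sets
                        continue
                    rest = first_of_seq(rhs[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:            # the rest can vanish: inherit Follow(A)
                        new |= follow[a]
                    if not new <= follow[x]:
                        follow[x] |= new
                        changed = True
    return follow
```

follow_sets(GRAMMAR, FIRST, "exp") should give the Follow column of the table above.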
Constructing LL(1) Parsing Tables
    FOR each nonterminal A and a production A → X
      FOR each token a in First(X)
        A → X is in M(A, a)
      IF ε is in First(X) THEN
        FOR each element a in Follow(A)
          Add A → X to M(A, a)
Example: Constructing LL(1) Parsing Table
  Productions:
    1  exp → term exp'
    2  exp' → addop term exp'
    3  exp' → ε
    4  addop → +
    5  addop → -
    6  term → factor term'
    7  term' → mulop factor term'
    8  term' → ε
    9  mulop → *
    10 factor → ( exp )
    11 factor → num
  First and Follow sets:
             First         Follow
    exp      {(, num}      {$, )}
    exp'     {+, -, ε}     {$, )}
    addop    {+, -}        {(, num}
    term     {(, num}      {$, ), +, -}
    term'    {*, ε}        {$, ), +, -}
    mulop    {*}           {(, num}
    factor   {(, num}      {$, ), +, -, *}
  Parsing table (entries are production numbers, n stands for num):
             (    )    +    -    *    n    $
    exp      1                        1
    exp'          3    2    2              3
    addop              4    5
    term     6                        6
    term'         8    8    8    7         8
    mulop                        9
    factor   10                       11
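The table construction can be sketched in Python as below. The First and Follow sets are copied from the example rather than computed, and the names PRODUCTIONS, FIRST, FOLLOW, first_of_seq and build_table are assumptions made for illustration. Each table entry is a list of production numbers, so an entry with more than one number would indicate a non-LL(1) grammar.

```python
EPS = "ε"

# Numbered productions of the example: number -> (LHS, RHS); [] is ε.
PRODUCTIONS = {
    1: ("exp", ["term", "exp'"]),
    2: ("exp'", ["addop", "term", "exp'"]),
    3: ("exp'", []),
    4: ("addop", ["+"]),
    5: ("addop", ["-"]),
    6: ("term", ["factor", "term'"]),
    7: ("term'", ["mulop", "factor", "term'"]),
    8: ("term'", []),
    9: ("mulop", ["*"]),
    10: ("factor", ["(", "exp", ")"]),
    11: ("factor", ["num"]),
}
FIRST = {
    "exp": {"(", "num"}, "exp'": {"+", "-", EPS}, "addop": {"+", "-"},
    "term": {"(", "num"}, "term'": {"*", EPS}, "mulop": {"*"},
    "factor": {"(", "num"},
}
FOLLOW = {
    "exp": {"$", ")"}, "exp'": {"$", ")"}, "addop": {"(", "num"},
    "term": {"$", ")", "+", "-"}, "term'": {"$", ")", "+", "-"},
    "mulop": {"(", "num"}, "factor": {"$", ")", "+", "-", "*"},
}


def first_of_seq(symbols):
    """First set of a right-hand side; an empty RHS gives {ε}."""
    out = set()
    for x in symbols:
        fx = FIRST.get(x, {x})   # a terminal is its own First set
        out |= fx - {EPS}
        if EPS not in fx:
            return out
    out.add(EPS)
    return out


def build_table():
    """Map (nonterminal, token) to the list of applicable production numbers."""
    table = {}
    for number, (lhs, rhs) in PRODUCTIONS.items():
        f = first_of_seq(rhs)
        tokens = f - {EPS}
        if EPS in f:             # RHS can derive ε: use Follow(LHS) as well
            tokens |= FOLLOW[lhs]
        for tok in tokens:
            table.setdefault((lhs, tok), []).append(number)
    return table
```

For this grammar every entry of build_table() holds exactly one production number, which matches the definition of an LL(1) grammar.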
LL(1) Grammar
  - A grammar is an LL(1) grammar if its LL(1) parsing table has at most one production in each table entry.
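LL(1) Parsing Table for non-LL(1) Grammar
  Productions:
    1 exp → exp addop term
    2 exp → term
    3 term → term mulop factor
    4 term → factor
    5 factor → ( exp )
    6 factor → num
    7 addop → +
    8 addop → -
    9 mulop → *
  First sets:
    First(exp) = {(, num}
    First(term) = {(, num}
    First(factor) = {(, num}
    First(addop) = {+, -}
    First(mulop) = {*}
  Because productions 1 and 2 both start with a symbol whose First set is {(, num}, the entries M[exp, (] and M[exp, num] each contain both 1 and 2. In the same way, M[term, (] and M[term, num] each contain both 3 and 4.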
Causes of Non-LL(1) Grammar
  What causes a grammar to be non-LL(1)?
    - Left recursion
    - Left factor
Left Recursion
  Immediate left recursion
    - A → A X | Y, that is, A = Y X*
    - A → A X1 | A X2 | … | A Xn | Y1 | Y2 | ... | Ym, that is, A = {Y1, Y2, …, Ym} {X1, X2, …, Xn}*
    - Can be removed very easily:
        - A → Y A', A' → X A' | ε
        - A → Y1 A' | Y2 A' | ... | Ym A', A' → X1 A' | X2 A' | … | Xn A' | ε
  General left recursion
    - A ⇒ X ⇒* A Y
    - Can be removed when there is no empty-string production and no cycle in the grammar
Removal of Immediate Left Recursion
  Grammar:
    exp → exp + term | exp - term | term
    term → term * factor | factor
    factor → ( exp ) | num
  Remove left recursion:
    exp = term (± term)*
      exp → term exp'
      exp' → + term exp' | - term exp' | ε
    term = factor (* factor)*
      term → factor term'
      term' → * factor term' | ε
    factor → ( exp ) | num
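The transformation for immediate left recursion can be sketched as a small Python function. The function name remove_immediate_left_recursion and the list-of-lists encoding (an empty list for ε) are assumptions made for this example; the sketch also assumes that the new name A' is not already used in the grammar and that there is no rule A → A.

```python
def remove_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite A → A X1 | ... | A Xn | Y1 | ... | Ym as
    A → Y1 A' | ... | Ym A' and A' → X1 A' | ... | Xn A' | ε."""
    # The Xi parts: left-recursive alternatives with the leading A removed.
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # The Yj parts: alternatives that do not start with A.
    others = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not recursive:
        return {nonterminal: alternatives}   # nothing to do
    new = nonterminal + "'"                  # hypothetical fresh name A'
    return {
        nonterminal: [alt + [new] for alt in others],
        new: [alt + [new] for alt in recursive] + [[]],   # [] is the ε alternative
    }
```

Applied to exp → exp + term | exp - term | term, the function returns exp → term exp' and exp' → + term exp' | - term exp' | ε, as in the example.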
General Left Recursion
  Bad news!
    - Can only be removed when there is no empty-string production and no cycle in the grammar.
  Good news!
    - Never seen in the grammar of any programming language.
Left Factoring
  A left factor causes a grammar to be non-LL(1).
    - Given A → X Y | X Z, both A → X Y and A → X Z can be chosen when A is on top of the stack and a token in First(X) is the next token.
  A → X Y | X Z can be left-factored as
    A → X A' and A' → Y | Z
Example of Left Factor
  ifSt → if ( exp ) st else st | if ( exp ) st
  can be left-factored as
    ifSt → if ( exp ) st elsePart
    elsePart → else st | ε
  seq → st ; seq | st
  can be left-factored as
    seq → st seq'
    seq' → ; seq | ε
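One pass of left factoring can be sketched in Python as below. This is an illustrative sketch only: the names common_prefix and left_factor are hypothetical, alternatives are grouped by their first symbol, and the new nonterminals are simply named A', A'' and so on (the slides use a descriptive name such as elsePart instead). A full implementation would repeat the pass until no two alternatives share a prefix.

```python
def common_prefix(sequences):
    """Longest prefix shared by all the given symbol lists."""
    prefix = []
    for column in zip(*sequences):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix


def left_factor(nonterminal, alternatives):
    """Factor out the common prefix of alternatives that start with the same symbol."""
    groups = {}
    for alt in alternatives:
        key = alt[0] if alt else None          # None groups the ε alternative
        groups.setdefault(key, []).append(alt)
    result = {nonterminal: []}
    count = 0
    for key, alts in groups.items():
        if key is None or len(alts) == 1:
            result[nonterminal].extend(alts)   # nothing to factor in this group
            continue
        count += 1
        new = nonterminal + "'" * count        # hypothetical fresh name
        prefix = common_prefix(alts)
        result[nonterminal].append(prefix + [new])
        result[new] = [alt[len(prefix):] for alt in alts]   # [] stands for ε
    return result
```

For seq → st ; seq | st this yields seq → st seq' and seq' → ; seq | ε, matching the second example above.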
Bottom-up Parsing
  - Use an explicit stack to perform a parse.
  - Simulate a rightmost derivation (R) while reading the input from left (L) to right, thus called LR parsing.
  - More powerful than top-down parsing
      - Left recursion does not cause a problem.
  - Two actions
      - Shift: take the next input token into the stack
      - Reduce: replace a string B on top of the stack by a nonterminal A, given a production A → B
Example of Shift-reduce Parsing
  Grammar:
    S' → S
    S → ( S ) S | ε
  Parsing actions for the input ( ( ) ):
        Stack      Input     Action
    1   $          (())$     shift
    2   $(         ())$      shift
    3   $((        ))$       reduce S → ε
    4   $((S       ))$       shift
    5   $((S)      )$        reduce S → ε
    6   $((S)S     )$        reduce S → ( S ) S
    7   $(S        )$        shift
    8   $(S)       $         reduce S → ε
    9   $(S)S      $         reduce S → ( S ) S
    10  $S         $         accept (S' → S)
  The reductions are the reverse of the rightmost derivation, while the input is read from left to right:
    S' ⇒ S ⇒ (S)S ⇒ (S) ⇒ ((S)S) ⇒ ((S)) ⇒ (())
Example of Shift-reduce Parsing (same grammar and input as the previous example)
  - Viable prefix: the content of the stack at each step.
  - Handle: the string on top of the stack that is replaced in a reduce step, together with the production used.
Terminologies
  Right sentential form
    - a sentential form in a rightmost derivation
    - e.g. ( S ) S, ( ( S ) S )
  Viable prefix
    - a sequence of symbols on the parsing stack
    - e.g. ( S ) S, ( S ), ( S, (
    - e.g. ( ( S ) S, ( ( S ), ( ( S, ( (, (
  Handle
    - right sentential form + position where reduction can be performed + production used for reduction
    - e.g. ( S ) S . with S → ( S ) S
    - e.g. ( ( S ) S . ) with S → ( S ) S
  LR(0) item
    - a production with a distinguished position in its RHS
    - e.g. S → ( S ) S .
    - e.g. S → ( S ) . S
    - e.g. S → ( S . ) S
    - e.g. S → ( . S ) S
    - e.g. S → . ( S ) S
Shift-reduce parsers
  - There are two possible actions: shift and reduce.
  - Parsing is completed when
      - the input stream is empty, and
      - the stack contains only the start symbol.
  - The grammar must be augmented:
      - a new start symbol S' is added
      - a production S' → S is added
      - This makes sure that parsing is finished when S' is on top of the stack, because S' never appears on the RHS of any production.
LR(0) parsing
  - Keep track of what is left to be done in the parsing process by using finite automata of items.
  - An item A → w . B y means:
      - A → w B y might be used for a reduction in the future,
      - at this time, we know we have already constructed w in the parsing process,
      - if B is constructed next, we get the new item A → w B . y
LR(0) items
  LR(0) item
    - a production with a distinguished position in the RHS
  Initial item
    - an item with the distinguished position at the leftmost end of the production
  Complete item
    - an item with the distinguished position at the rightmost end of the production
  Closure item of x
    - item x together with the items which can be reached from x via ε-transitions
  Kernel item
    - an original item, not including closure items
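Finite automata of items
  Grammar:
    S' → S
    S → ( S ) S
    S → ε
  Items:
    S' → . S
    S' → S .
    S → . ( S ) S
    S → ( . S ) S
    S → ( S . ) S
    S → ( S ) . S
    S → ( S ) S .
    S → .
  [Figure: NFA whose states are the items above, with transitions on S, ( and ), and ε-transitions from an item with the position before S to the initial items of S]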
DFA of LR(0) Items
  [Figure: the NFA of items converted into a DFA; each DFA state is a set of items]
  States of the DFA:
    { S' → . S,  S → . ( S ) S,  S → . }
    { S' → S . }
    { S → ( . S ) S,  S → . ( S ) S,  S → . }
    { S → ( S . ) S }
    { S → ( S ) . S,  S → . ( S ) S,  S → . }
    { S → ( S ) S . }
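The DFA of item sets can be built directly with closure and goto operations, as sketched below in Python for S' → S, S → ( S ) S | ε. The encoding of an item as a pair (production index, dot position) and the names GRAMMAR, closure, goto and build_dfa are assumptions made for this sketch; the state numbers it produces depend on the order of exploration and need not match the numbering used in the slides.

```python
# Productions of the augmented grammar; () is the ε right-hand side.
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("(", "S", ")", "S")),
    ("S", ()),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add the initial items of every nonterminal that follows a dot."""
    result = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)


def goto(state, symbol):
    """Move the dot over symbol in every item where that is possible."""
    moved = {(p, d + 1) for p, d in state
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved) if moved else None


def build_dfa():
    """Return the list of states and the transition map (state index, symbol) -> state index."""
    states = [closure({(0, 0)})]
    transitions = {}
    for state in states:                      # the list grows while we iterate
        symbols = {GRAMMAR[p][1][d] for p, d in state if d < len(GRAMMAR[p][1])}
        for sym in sorted(symbols):
            target = goto(state, sym)
            if target not in states:
                states.append(target)
            transitions[(states.index(state), sym)] = states.index(target)
    return states, transitions
```

For this grammar build_dfa() produces six states, the same item sets as listed above.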
LR(0) parsing algorithm
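LR(0) Parsing Table
  Grammar:
    A' → A
    A → ( A ) | a
  States of the DFA of items:
    0: A' → . A,  A → . ( A ),  A → . a
    1: A' → A .
    2: A → a .
    3: A → ( . A ),  A → . ( A ),  A → . a
    4: A → ( A . )
    5: A → ( A ) .
  Transitions: 0 -A-> 1, 0 -a-> 2, 0 -(-> 3, 3 -a-> 2, 3 -(-> 3, 3 -A-> 4, 4 -)-> 5
  Parsing table:
    State   Action   Rule          (    a    )    A
    0       shift                  3    2         1
    1       reduce   A' → A
    2       reduce   A → a
    3       shift                  3    2         4
    4       shift                            5
    5       reduce   A → ( A )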
Example of LR(0) Parsing
  Input: ( ( a ) )
    Stack           Input     Action
    $0              ((a))$    shift
    $0(3            (a))$     shift
    $0(3(3          a))$      shift
    $0(3(3a2        ))$       reduce
    $0(3(3A4        ))$       shift
    $0(3(3A4)5      )$        reduce
    $0(3A4          )$        shift
    $0(3A4)5        $         reduce
    $0A1            $         accept
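The trace above can be reproduced with a small table-driven loop, sketched here in Python for the grammar A' → A, A → ( A ) | a. The encoding of the table (ACTION, GOTO, PRODUCTIONS) and the function name lr0_parse are assumptions made for this sketch, and only state numbers are kept on the stack, since the grammar symbols are not needed for the decisions.

```python
# Production number -> (LHS, length of RHS): 1 is A → ( A ), 2 is A → a.
PRODUCTIONS = {1: ("A", 3), 2: ("A", 1)}

# In LR(0) parsing the action depends on the state alone.
ACTION = {
    0: ("shift", {"(": 3, "a": 2}),
    1: ("accept", None),            # state of the complete item A' → A .
    2: ("reduce", 2),
    3: ("shift", {"(": 3, "a": 2}),
    4: ("shift", {")": 5}),
    5: ("reduce", 1),
}
GOTO = {(0, "A"): 1, (3, "A"): 4}


def lr0_parse(tokens):
    """Return True if the token list is accepted, False on a parsing error."""
    tokens = list(tokens) + ["$"]   # "$" marks the end of input
    stack = [0]                     # stack of states, start state at the bottom
    pos = 0
    while True:
        kind, arg = ACTION[stack[-1]]
        if kind == "accept":
            return tokens[pos] == "$"      # accept only if all input was read
        if kind == "shift":
            target = arg.get(tokens[pos])
            if target is None:
                return False               # no transition on this token: error
            stack.append(target)
            pos += 1
        else:                              # reduce by production number arg
            lhs, length = PRODUCTIONS[arg]
            del stack[len(stack) - length:]            # pop one state per RHS symbol
            stack.append(GOTO[(stack[-1], lhs)])       # goto on the LHS nonterminal
```

lr0_parse(list("((a))")) goes through the states 0, 3, 3, 2, 4, 5, 4, 5, 1 in the same order as the trace.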
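Non-LR(0) Grammar
  Conflict
    - Shift-reduce conflict
        - A state contains a complete item A → x . and a shift item A → x . B y
    - Reduce-reduce conflict
        - A state contains more than one complete item.
  A grammar is an LR(0) grammar if there is no conflict in the grammar.
  Example: DFA of items for S' → S, S → ( S ) S | ε
    0: S' → . S,  S → . ( S ) S,  S → .
    1: S' → S .
    2: S → ( . S ) S,  S → . ( S ) S,  S → .
    3: S → ( S . ) S
    4: S → ( S ) . S,  S → . ( S ) S,  S → .
    5: S → ( S ) S .
  Transitions: 0 -S-> 1, 0 -(-> 2, 2 -(-> 2, 2 -S-> 3, 3 -)-> 4, 4 -(-> 2, 4 -S-> 5
  States 0, 2 and 4 contain both the complete item S → . and the shift item S → . ( S ) S, so the grammar is not LR(0).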
SLR(1) parsing
  - Simple LR with 1 lookahead symbol
  - Examine the next token before deciding to shift or reduce
      - If the next token is the token expected in an item, then it can be shifted into the stack.
      - If a complete item A → x . is constructed and the next token is in Follow(A), then the reduction can be done using A → x.
      - Otherwise, an error occurs.
  - Can avoid conflicts
SLR(1) parsing algorithm
SLR(1) grammar
  Conflict
    - Shift-reduce conflict
        - A state contains a shift item A → x . W y such that W is a terminal, and a complete item B → z . such that W is in Follow(B).
    - Reduce-reduce conflict
        - A state contains more than one complete item, and the Follow sets of their LHS nonterminals have a token in common.
  A grammar is an SLR(1) grammar if there is no conflict in the grammar.
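SLR(1) Parsing Table
  Grammar:
    A' → A
    A → ( A ) | a
  States of the DFA of items:
    0: A' → . A,  A → . ( A ),  A → . a
    1: A' → A .
    2: A → a .
    3: A → ( . A ),  A → . ( A ),  A → . a
    4: A → ( A . )
    5: A → ( A ) .
  Transitions: 0 -A-> 1, 0 -a-> 2, 0 -(-> 3, 3 -a-> 2, 3 -(-> 3, 3 -A-> 4, 4 -)-> 5
  Parsing table, with Follow(A') = {$} and Follow(A) = {), $} (S = shift, R = reduce):
    State   (     a     )            $            A
    0       S3    S2                              1
    1                                accept
    2                   R(A → a)     R(A → a)
    3       S3    S2                              4
    4                   S5
    5                   R(A → (A))   R(A → (A))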
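SLR(1) Grammar not LR(0)
  Grammar:
    S' → S
    S → ( S ) S | ε
  States of the DFA of items:
    0: S' → . S,  S → . ( S ) S,  S → .
    1: S' → S .
    2: S → ( . S ) S,  S → . ( S ) S,  S → .
    3: S → ( S . ) S
    4: S → ( S ) . S,  S → . ( S ) S,  S → .
    5: S → ( S ) S .
  Transitions: 0 -S-> 1, 0 -(-> 2, 2 -(-> 2, 2 -S-> 3, 3 -)-> 4, 4 -(-> 2, 4 -S-> 5
  Follow(S) = {), $} does not contain (, so in states 0, 2 and 4 the parser shifts on ( and reduces by S → ε on ) or $. The LR(0) conflicts disappear, and the grammar is SLR(1).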
Disambiguating Rules for Parsing Conflict
  Shift-reduce conflict
    - Prefer shift over reduce
    - In the case of nested if statements, preferring shift over reduce implies the most closely nested rule for the dangling else
  Reduce-reduce conflict
    - Error in design
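Dangling Else
  Grammar:
    S' → S
    R1: S → I
    R2: S → other
    R3: I → if S
    R4: I → if S else S
  States of the DFA of items:
    0: S' → . S,  S → . I,  S → . other,  I → . if S,  I → . if S else S
    1: S' → S .
    2: S → I .
    3: S → other .
    4: I → if . S,  I → if . S else S,  S → . I,  S → . other,  I → . if S,  I → . if S else S
    5: I → if S .,  I → if S . else S
    6: I → if S else . S,  S → . I,  S → . other,  I → . if S,  I → . if S else S
    7: I → if S else S .
  Parsing table (S = shift, R = reduce):
    state   if    else   other   $      S    I
    0       S4           S3             1    2
    1                            ACC
    2             R1             R1
    3             R2             R2
    4       S4           S3             5    2
    5             S6             R3
    6       S4           S3             7    2
    7             R4             R4
  State 5 has a shift-reduce conflict on else (else is in Follow(I)); the table prefers the shift S6 over the reduction R3.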