Compilers 5. Bottom-Up Parsing
Chih-Hung Wang
References
1. C. N. Fischer, R. K. Cytron, and R. J. LeBlanc. Crafting a Compiler. Pearson Education Inc., 2010.
2. D. Grune, H. Bal, C. Jacobs, and K. Langendoen. Modern Compiler Design. John Wiley & Sons, 2000.
3. Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986. (2nd Ed. 2006)
Creating a bottom-up parser automatically
Left-to-right parse, rightmost derivation.
Create a node when all its children are present.
Handle: the nodes representing the right-hand side of a production.
[Figure: partial parse tree with expression, term, rest_expression and IDENT nodes for the input aap + ( noot + mies )]
LR(0) Parsing
Theoretically important but too weak to be useful.
Running example: expression grammar
input -> expression EOF
expression -> expression ‘+’ term | term
term -> IDENTIFIER | ‘(’ expression ‘)’
Short-hand notation
Z -> E $
E -> E ‘+’ T | T
T -> i | ‘(’ E ‘)’
LR(0) Parsing
Keep track of progress inside potential handles when consuming input tokens.
LR items: N -> α • β
Initial set S0:
Z -> • E $
E -> • E ‘+’ T
E -> • T
T -> • i
T -> • ‘(’ E ‘)’
Closure algorithm for LR(0)
The important part is the inference rule: it predicts new handle hypotheses from the hypothesis that we are looking for a certain non-terminal, and is sometimes called the prediction rule. It corresponds to an ε-move, in that it allows the automaton to move to another state without consuming input.
Reduce item: an item with the dot at the end.
Shift item: all other items.
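The prediction rule can be sketched in Python as a worklist closure over LR(0) items. This is a minimal sketch and not the algorithm as printed on the slide: the item encoding (lhs, rhs, dot) and the names GRAMMAR and closure are assumptions made for illustration.

```python
# Running example grammar: Z -> E $, E -> E + T | T, T -> i | ( E )
# (the dictionary layout is an assumption of this sketch)
GRAMMAR = {
    "Z": [("E", "$")],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("i",), ("(", "E", ")")],
}


def closure(items):
    """Return the closure of a set of LR(0) items.

    An item is a triple (lhs, rhs, dot), where dot is an index into rhs.
    """
    result = set(items)
    worklist = list(items)
    while worklist:
        _, rhs, dot = worklist.pop()
        # Prediction rule: a dot in front of a non-terminal N predicts
        # every alternative of N with the dot at the start.
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for alternative in GRAMMAR[rhs[dot]]:
                predicted = (rhs[dot], alternative, 0)
                if predicted not in result:
                    result.add(predicted)
                    worklist.append(predicted)
    return result


# Initial item set: the closure of Z -> . E $
S0 = closure({("Z", ("E", "$"), 0)})
for item in sorted(S0):
    print(item)
```

Starting from the single item Z -> • E $, the closure adds the two E items and the two T items, which is the initial set S0 of the running example.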
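Transition Diagram
S0: Z -> • E $, E -> • E ‘+’ T, E -> • T, T -> • i, T -> • ‘(’ E ‘)’
S1: T -> i •
S2: E -> T •
S3: Z -> E • $, E -> E • ‘+’ T
S4: E -> E ‘+’ • T, T -> • i, T -> • ‘(’ E ‘)’
S5: E -> E ‘+’ T •
S6: Z -> E $ •
Transitions: S0 -i-> S1, S0 -T-> S2, S0 -E-> S3, S3 -‘+’-> S4, S3 -$-> S6, S4 -i-> S1, S4 -T-> S5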
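The diagram can also be computed rather than drawn by hand. The following Python sketch builds all item sets and transitions for the running grammar, including the states reached through ‘(’, so it finds more states than the partial diagram shows. The names goto and build_automaton and the item encoding are assumptions of this sketch, and its state numbering follows discovery order, which need not match the slide numbering.

```python
GRAMMAR = {
    "Z": [("E", "$")],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("i",), ("(", "E", ")")],
}


def closure(items):
    """Closure of a set of LR(0) items (lhs, rhs, dot)."""
    result = set(items)
    changed = True
    while changed:
        changed = False
        for _, rhs, dot in list(result):
            if dot < len(rhs) and rhs[dot] in GRAMMAR:
                for alternative in GRAMMAR[rhs[dot]]:
                    item = (rhs[dot], alternative, 0)
                    if item not in result:
                        result.add(item)
                        changed = True
    return frozenset(result)


def goto(state, symbol):
    """Advance the dot over `symbol` in every item that allows it."""
    moved = {(lhs, rhs, dot + 1)
             for lhs, rhs, dot in state
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved) if moved else None


def build_automaton(start_item):
    """Return (states, transitions) reachable from the start item."""
    states = [closure({start_item})]
    transitions = {}
    for state in states:  # the list grows while it is being traversed
        symbols = {rhs[dot] for _, rhs, dot in state if dot < len(rhs)}
        for symbol in sorted(symbols):
            target = goto(state, symbol)
            if target not in states:
                states.append(target)
            transitions[(states.index(state), symbol)] = states.index(target)
    return states, transitions


states, transitions = build_automaton(("Z", ("E", "$"), 0))
print(len(states), "states,", len(transitions), "transitions")
```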
LR(0) parsing example (1): stack S0; input i+i$; shift input token (i) onto the stack, compute new state.
LR(0) parsing example (2): stack S0 i S1; input +i$; reduce handle on top of the stack (T -> i), compute new state.
LR(0) parsing example (3): stack S0 T S2; input +i$; reduce handle on top of the stack (E -> T), compute new state.
LR(0) parsing example (4): stack S0 E S3; input +i$; shift input token (‘+’) onto the stack, compute new state.
LR(0) parsing example (5): stack S0 E S3 + S4; input i$; shift input token (i) onto the stack, compute new state.
LR(0) parsing example (6): stack S0 E S3 + S4 i S1; input $; reduce handle on top of the stack (T -> i), compute new state.
LR(0) parsing example (7): stack S0 E S3 + S4 T S5; input $; reduce handle on top of the stack (E -> E ‘+’ T), compute new state.
LR(0) parsing example (8): stack S0 E S3; input $; shift input token ($) onto the stack, compute new state.
LR(0) parsing example (9): stack S0 E S3 $ S6; input empty; reduce handle on top of the stack (Z -> E $), compute new state.
LR(0) parsing example (10): stack S0 Z; input empty; accept!
Precomputing the item set (1): initial item set
Precomputing the item set (2): next item set
Complete transition diagram
The LR push-down automaton
Two major moves and a minor move:
Shift move: removes the first token from the present input and pushes it onto the stack.
Reduce move (N -> α): the symbols of α are removed from the stack; N is then pushed onto the stack.
Termination: the input has been parsed successfully when it has been reduced to the start symbol.
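The shift, reduce and termination moves can be sketched as a small Python driver. It is a minimal sketch that hard-codes states S0 to S6 of the partial transition diagram, so parenthesised input is not covered. The table layout and the names GOTO, REDUCE and parse are assumptions of this sketch, and only states are kept on the stack because each state implies its symbol.

```python
# Transitions of the partial diagram: (state, symbol) -> next state.
GOTO = {
    (0, "i"): 1, (0, "T"): 2, (0, "E"): 3,
    (3, "+"): 4, (3, "$"): 6,
    (4, "i"): 1, (4, "T"): 5,
}

# LR(0): the state alone decides the move. A reduce state maps to the
# left-hand side and the length of the right-hand side of its rule.
REDUCE = {
    1: ("T", 1),  # T -> i
    2: ("E", 1),  # E -> T
    5: ("E", 3),  # E -> E + T
    6: ("Z", 2),  # Z -> E $
}


def parse(tokens):
    """Return the list of moves made while parsing, or raise SyntaxError."""
    stack = [0]
    position = 0
    moves = []
    while True:
        state = stack[-1]
        if state in REDUCE:
            lhs, length = REDUCE[state]
            del stack[-length:]            # pop the handle
            moves.append(f"reduce {lhs}")
            if lhs == "Z":                 # reduced to the start symbol
                return moves
            stack.append(GOTO[(stack[-1], lhs)])
        else:
            token = tokens[position] if position < len(tokens) else None
            if (state, token) not in GOTO:
                raise SyntaxError(f"unexpected {token!r} in state S{state}")
            stack.append(GOTO[(state, token)])
            position += 1
            moves.append(f"shift {token}")


for move in parse(["i", "+", "i", "$"]):
    print(move)
```

For the input i+i$ the driver performs the same nine shift and reduce moves as the worked example, followed by acceptance.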
GOTO and ACTION tables
LR(0) parsing of the input i+i$
Another Example of LR(0) from Fischer (1)
Another Example of LR(0) from Fischer (2)
Another Example of LR(0) from Fischer (3)
Algorithm of LR(0) Construction (1)
Algorithm of LR(0) Construction (2)
LR(0) Table
LR comments
Bottom-up parsing, unlike top-down parsing, has no problems with left recursion. On the other hand, bottom-up parsing has a slight problem with right recursion.
LR(0) conflicts (1)
Shift-reduce conflict: exists in a state when table construction cannot use the next k tokens to decide whether to shift the next input token or call for a reduction.
Array indexing, with the rule T -> i ‘[’ E ‘]’:
T -> i • ‘[’ E ‘]’ (shift)
T -> i • (reduce)
ε-rule: an item over Term RestExpr (shift) against the ε-alternative of RestExpr (reduce)
LR(0) conflicts (2)
Reduce-reduce conflict: exists when table construction cannot use the next k tokens to distinguish between multiple reductions that can be applied in the inadequate state.
Assignment statement, with the rule Z -> V ‘:=’ E $:
V -> i • (reduce)
T -> i • (reduce)
(different reduce rules)
A typical LR(0) table contains many conflicts.
Handling LR(0) conflicts
Use a one-token look-ahead.
Use a two-dimensional ACTION table.
Different constructions of the ACTION table:
SLR(1) – Simple LR
LR(1)
LALR(1) – Look-Ahead LR
SLR(1) parsing
A handle should not be reduced to a non-terminal N if the look-ahead is a token that cannot follow N.
Reduce N -> α iff the look-ahead token ∈ FOLLOW(N).
FOLLOW(Z) = { $ }
FOLLOW(E) = { ‘+’, ‘)’, $ }
FOLLOW(T) = { ‘+’, ‘)’, $ }
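The FOLLOW sets listed above can be reproduced with a fixed-point computation. The Python sketch below assumes a grammar without ε-alternatives, which holds for the running example, and seeds FOLLOW of the start symbol with $ by convention. The function names first_sets and follow_sets are assumptions of this sketch.

```python
GRAMMAR = {
    "Z": [("E", "$")],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("i",), ("(", "E", ")")],
}


def first_sets(grammar):
    """FIRST sets for a grammar with no empty right-hand sides."""
    first = {name: set() for name in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                head = rhs[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
    return first


def follow_sets(grammar, start):
    """FOLLOW sets, iterated until nothing changes."""
    first = first_sets(grammar)
    follow = {name: set() for name in grammar}
    follow[start].add("$")  # convention: the end marker follows the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                for k, symbol in enumerate(rhs):
                    if symbol not in grammar:
                        continue  # only non-terminals have FOLLOW sets
                    if k + 1 < len(rhs):
                        nxt = rhs[k + 1]
                        new = first[nxt] if nxt in grammar else {nxt}
                    else:
                        new = follow[lhs]  # symbol ends the alternative
                    if not new <= follow[symbol]:
                        follow[symbol] |= new
                        changed = True
    return follow


follow = follow_sets(GRAMMAR, "Z")
for name in sorted(follow):
    print(name, sorted(follow[name]))
```

An SLR(1) table generator would then enter the reduce action for N -> α only in the columns of the tokens in follow[N].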
SLR(1) ACTION table
SLR(1) ACTION/GOTO table
Rules: 1: Z -> E $; 2: E -> T; 3: E -> E ‘+’ T; 4: T -> i; 5: T -> ‘(’ E ‘)’
sn – shift to state n
rn – reduce by rule n
Example of resolving conflicts (1)
A new rule: T -> i ‘[’ E ‘]’
[Table: SLR(1) ACTION/GOTO table, one row per state, columns for stack symbol / look-ahead token i, ‘+’, ‘(’, ‘)’, ‘[’, ‘]’, $, E, T]
Rules: 1: Z -> E $; 2: E -> T; 3: E -> E ‘+’ T; 4: T -> i; 5: T -> ‘(’ E ‘)’; 6: T -> i ‘[’ E ‘]’
Example of resolving conflicts (2)
[Table: SLR(1) ACTION/GOTO table, one row per state, columns for stack symbol / look-ahead token i, ‘+’, ‘(’, ‘)’, ‘[’, ‘]’, $, E, T]
Rules: 1: Z -> E $; 2: E -> T; 3: E -> E ‘+’ T; 4: T -> i; 5: T -> ‘(’ E ‘)’; 6: T -> i ‘[’ E ‘]’
Marked item: T -> i • ‘[’ E ‘]’
Another Example of LR(0) Conflicts (1)
Another Example of LR(0) Conflicts (2)
Another Example of LR(0) Conflicts (3): input num plus num times num $
Another Example of LR(0) Conflicts (4): Follow(E) = {plus, $}
Unfortunately …
SLR(1) leaves many shift-reduce conflicts unsolved.
Problem: the FOLLOW(N) set is the union of all look-aheads of all alternatives of N in all states.
Example:
S -> A | x b
A -> a A b | B
B -> x
Follow(S) = {$}
Follow(A) = {b, $}
Follow(B) = {b, $}
SLR(1) automaton
Another Example of SLR Problem: Follow(A) = {b, c, $}
Make the Grammar SLR(1): Follow(A1) = {b, $}
LR(1) parsing
The LR(1) technique does not rely on FOLLOW sets, but rather keeps the specific look-ahead with each item.
LR(1) item: N -> α • β {σ}
ε-closure for LR(1) item sets: if set S contains an item P -> α • N β {σ}, then for each production rule N -> γ, S must contain the item N -> • γ {τ}, where τ = FIRST(β {σ}).
Creating look-ahead sets
Extended definition of FIRST sets: if FIRST(β) does not contain ε, FIRST(β {σ}) is just equal to FIRST(β); if β can produce ε, FIRST(β {σ}) contains all the tokens in FIRST(β), excluding ε, plus the tokens in σ.
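The extended FIRST definition can be written as a short Python function. This is a sketch under stated assumptions: FIRST sets of single symbols are given in a dictionary, a symbol missing from that dictionary is treated as a terminal, and the FIRST table in the example is hypothetical rather than taken from the slides.

```python
EPSILON = "ε"


def first_of_sequence(beta, sigma, first):
    """Return FIRST(beta {sigma}).

    beta  -- sequence of grammar symbols after the predicted non-terminal
    sigma -- look-ahead set of the item the prediction started from
    first -- FIRST sets of the non-terminals (may contain EPSILON)
    """
    result = set()
    for symbol in beta:
        symbol_first = first.get(symbol, {symbol})  # terminal: FIRST is itself
        result |= symbol_first - {EPSILON}
        if EPSILON not in symbol_first:
            return result  # beta cannot vanish, sigma is not visible
    # Every symbol of beta can produce ε (or beta is empty): add sigma.
    return result | set(sigma)


# Hypothetical FIRST table, for illustration only.
FIRST = {"A": {"a", EPSILON}, "B": {"b"}}

print(sorted(first_of_sequence(("A", "B"), {"$"}, FIRST)))
print(sorted(first_of_sequence(("A",), {"$"}, FIRST)))
print(sorted(first_of_sequence((), {"$"}, FIRST)))
```

In an LR(1) closure this function supplies the look-ahead set τ of each predicted item.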
LR(1) automaton
Another Example of LR(1) Construction (1)
Another Example of LR(1) Construction (2)
Another Example of LR(1) Construction (3)
Another Example of LR(1) Construction (4)
Another Example of LR(1) Construction (5)
LR(1) parsing comments
The LR(1) automaton is more discriminating than the SLR(1) automaton. In fact, it is so strong that any language that can be parsed from left to right with a one-token look-ahead in linear time can be parsed using LR(1).
LR(1) tables are big.
Combine “equal” sets by merging look-ahead sets: LALR(1).
LALR(1)
S3 and S10 are similar in that they are equal if one ignores the look-ahead sets, and so are S4 and S9, S6 and S11, and S8 and S12.
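Merging states that are equal apart from their look-aheads can be sketched in Python as grouping by core. The item encoding (lhs, rhs, dot, lookahead) and the three example states are hypothetical; they are not the contents of S3, S10 or the other states named above.

```python
from collections import defaultdict


def core(state):
    """The LR(0) core of an LR(1) state: its items without look-aheads."""
    return frozenset((lhs, rhs, dot) for lhs, rhs, dot, _ in state)


def merge_by_core(lr1_states):
    """Group LR(1) states with equal cores and take the union of their items.

    Because each item carries a single look-ahead token, the union of the
    items is the same as merging the look-ahead sets per core item.
    """
    merged = defaultdict(set)
    for state in lr1_states:
        merged[core(state)] |= state
    return [frozenset(items) for items in merged.values()]


# Hypothetical LR(1) states: the first two differ only in the look-ahead.
state_a = frozenset({("B", ("x",), 1, "b")})
state_b = frozenset({("B", ("x",), 1, "$")})
state_c = frozenset({("S", ("x", "b"), 2, "$")})

lalr_states = merge_by_core([state_a, state_b, state_c])
print(len(lalr_states), "states after merging")
```

A full LALR(1) generator would also redirect the transitions of the merged states; the sketch covers only the merging step.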
LALR(1) automaton
Practice
Derive the LALR(1) ACTION/GOTO table for the grammar in Fig. 2.95.
Making a grammar LR(1) – or not
Although the chances for a grammar to be LR(1) are much larger than those of being SLR(1) or LL(1), one often encounters a grammar that still is not LR(1). The reason is generally that the grammar is ambiguous.
For example:
if_statement -> ‘if’ ‘(’ expression ‘)’ statement | ‘if’ ‘(’ expression ‘)’ statement ‘else’ statement
statement -> … | if_statement | …
The statement: if (x>0) if (y>0) p=0; else q=0;
Possible syntax trees (1)
Possible syntax trees (2)
Other Examples of Ambiguous Grammar (1)
Other Examples of Ambiguous Grammar (2)
Resolving shift-reduce conflicts (1)
The longest possible sequence of grammar symbols is taken for reduction: in a shift-reduce conflict, do shift.
Another example: input i * i + i with the rules
E -> E ‘+’ E
E -> E ‘*’ E
[Figure: two syntax trees for the input, one labeled reduce and one labeled shift]
Resolving shift-reduce conflicts (2)
The use of precedences between tokens.
Example: a shift-reduce conflict on t:
P -> α • t β {…} (shift item)
Q -> γ u R • {…t…} (reduce item)
where R is either empty or one non-terminal.
If the look-ahead is t, we perform one of the following three actions:
If symbol u has a higher precedence than symbol t, we reduce.
If t has a higher precedence than u, we shift.
If both have equal precedence, we also shift.
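The three cases above amount to one comparison. The Python sketch below applies them to the operators of the i * i + i example. The precedence values and the names PRECEDENCE and resolve are assumptions of this sketch, not part of any parser generator.

```python
# Hypothetical precedence levels: a larger number binds more tightly.
PRECEDENCE = {"+": 1, "*": 2}


def resolve(u, t):
    """Resolve a shift-reduce conflict between a reduce item ending in
    operator u (optionally followed by one non-terminal) and a shift on
    the look-ahead token t."""
    if PRECEDENCE[u] > PRECEDENCE[t]:
        return "reduce"
    # t has a higher precedence, or both are equal: shift in both cases.
    return "shift"


# Stack holds E * E, look-ahead is +: reduce, so * groups first.
print(resolve("*", "+"))
# Stack holds E + E, look-ahead is *: shift, so * still groups first.
print(resolve("+", "*"))
# Equal precedence: the rule above says shift.
print(resolve("+", "+"))
```

Shifting on equal precedence groups a chain of the same operator to the right.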
Bottom-up parser: yacc/bison
The most widely used parser generator is yacc.
Yacc is an LALR(1) parser generator.
A yacc look-alike called bison is provided by GNU.
A very high-level view of text analysis techniques
Yacc code example (constructing parser tree)
Yacc code example (auxiliary code)