Compiler Structures 242 437 Semester 2 2019 2020

  • Slides: 58

Compiler Structures 242-437, Semester 2, 2019-2020
6. Bottom-up (LR) Parsing
• Objectives
  - describe bottom-up (LR) parsing using shift-reduce and parse tables
  - explain how LR parse tables are generated

Overview
1. What is an LR Parser?
2. Bottom-up using Shift-Reduce
3. Building an LR Parser
4. Generating the Parse Table
5. LR Conflicts
6. LL, SLR, LALR Grammars

[Figure: compiler phases. Source Program -> Lexical Analyzer -> Syntax Analyzer -> Semantic Analyzer -> Int. Code Generator -> Intermediate Code Optimizer -> Target Code Generator -> Target Lang. Prog. The phases are grouped into a Front End and a Back End. Annotation: "In this lecture, but concentrating on bottom-up parsing".]

1. What is an LR Parser?
• An LR parser reads its input tokens from Left-to-right and produces a Rightmost derivation (in reverse).
• The parse tree is built bottom-up, starting from the leaves and working upwards to the start symbol.

LR in Action
• Grammar:
    S --> a A B e
    A --> A b c | b
    B --> d
• Parse "a b b c d e".
• Reducing a sentence (each step replaces a substring that matches a production's right-hand side):
    abbcde, aAbcde, aAde, aABe, S
• The tree corresponds to a rightmost derivation:
    S => aABe => aAde => aAbcde => abbcde
[Figure: parse tree for "a b b c d e", with leaves a b b c d e, inner nodes A, A and B, and root S.]

LR(k) Parsing
• The k is the number of input tokens that are looked at (the lookahead) when deciding which production to use.
  - e.g. LR(0), LR(1)
• We'll be using a variation of LR(0) parsing in this chapter.

LR versus LL
• LR can deal with more complex (powerful) grammars than LL (top-down parsers).
• LR can detect errors quicker than LL.
• LR parsers can be implemented very efficiently, but they're difficult to build by hand (unlike LL parsers).

2. Bottom-up using Shift-Reduce
• The usual way of implementing bottom-up parsing is by using shift-reduce:
  - 'shift' means read in a new input token, and push it onto a stack
  - 'reduce' means to group several symbols into a single non-terminal
    • by choosing a production to use 'backwards'
    • the symbols are popped off the stack, and the production's non-terminal is pushed onto it

Shift-Reduce Parsing
Grammar: S => a A B e;  A => A b c | b;  B => d

Stack   | Input   | Action
$       | abbcde$ | Shift
$a      | bbcde$  | Shift
$ab     | bcde$   | Reduce A => b
$aA     | bcde$   | Shift
$aAb    | cde$    | Shift
$aAbc   | de$     | Reduce A => A b c
$aA     | de$     | Shift
$aAd    | e$      | Reduce B => d
$aAB    | e$      | Shift
$aABe   | $       | Reduce S => a A B e
$S      | $       |

3. Building an LR Parser
• The standard way of writing a shift-reduce LR parser is to generate a parse table for the grammar, and 'plug' that into a standard LR parser framework.
• The table has two main parts: actions and gotos.

3.1. Inside an LR Parser
[Figure: the LR parser reads the input tokens a1 a2 … ai … an $, pushes and pops a stack of pairs Xm sm, Xm-1 sm-1, …, X0 s0, consults the parse table (actions and gotos), and outputs a parse tree.]
• Each X is a terminal or non-terminal; each s is a state.
• The parse table is the bit you create.
• Possible actions are shift, reduce, accept, error.
• Gotos involve state changes.

Parse Table for the Example
Rules:  1: S => a A B e   2: A => A b c   3: A => b   4: B => d

        action part                     goto part
State | a    b    c    d    e    $   | S   A   B
  0   | s1                           |
  1   |      s3                      |     2
  2   |      s5        s6            |         4
  3   |      r3        r3            |
  4   |                     s7       |
  5   |           s8                 |
  6   |                     r4       |
  7   |                          acc |
  8   |      r2        r2            |

• s means shift to that state
• r means reduce by that numbered production

3.2. Table Algorithm
There are four branches, one for each of the four possible actions that can be in a table cell.

  push(<$, 0>);                /* push <symbol, state> pair */
  currToken = scanner();
  while (1) {
    <X, state> = pair on top of stack;
    if (action[state, currToken] == <shift newState>) {
      push(<currToken, newState>);
      currToken = scanner();
    }
    else if (action[state, currToken] == <reduce ruleNum>) {
      A --> β is rule number ruleNum;
      bodySize = numElements(β);
      pop bodySize pairs off stack;
      state' = state part of pair on top of stack;
      push(<A, goto[state', A]>);
    }
    else if (action[state, currToken] == accept) {
      S --> β is the start symbol production;
      bodySize = numElements(β);
      pop bodySize pairs off stack;
      state' = state part of pair on top of stack;
      if (state' == 0)
        break;                 // success; can now stop
      else
        error();
    }
    else
      error();                 // empty table cell
  } // of while loop
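The table algorithm can be tried out with a short Python sketch. It hard-codes the example grammar's parse table (rules 1 to 4, states 0 to 8) and follows the same four branches. The names `RULES`, `ACTION`, `GOTO` and `lr_parse` are illustrative choices, not part of the slides, and tokens are assumed to be single characters.

```python
# Rule number -> (head, body length), numbered as in the example parse table.
RULES = {
    1: ("S", 4),  # S => a A B e
    2: ("A", 3),  # A => A b c
    3: ("A", 1),  # A => b
    4: ("B", 1),  # B => d
}

# ACTION[state][token]: ("s", n) shifts to state n, ("r", n) reduces by rule n,
# ("acc", n) accepts using start rule n. Missing entries are errors.
ACTION = {
    0: {"a": ("s", 1)},
    1: {"b": ("s", 3)},
    2: {"b": ("s", 5), "d": ("s", 6)},
    3: {"b": ("r", 3), "d": ("r", 3)},
    4: {"e": ("s", 7)},
    5: {"c": ("s", 8)},
    6: {"e": ("r", 4)},
    7: {"$": ("acc", 1)},
    8: {"b": ("r", 2), "d": ("r", 2)},
}
GOTO = {(1, "A"): 2, (2, "B"): 4}


def lr_parse(tokens):
    """Return the rule numbers used, in order, or raise SyntaxError."""
    stack = [("$", 0)]                 # <symbol, state> pairs
    tokens = list(tokens) + ["$"]
    pos = 0
    used = []
    while True:
        state = stack[-1][1]
        tok = tokens[pos]
        entry = ACTION[state].get(tok)
        if entry is None:              # empty table cell
            raise SyntaxError(f"unexpected {tok!r} in state {state}")
        kind, arg = entry
        if kind == "s":                # shift
            stack.append((tok, arg))
            pos += 1
        elif kind == "r":              # reduce
            head, size = RULES[arg]
            del stack[-size:]
            top_state = stack[-1][1]
            stack.append((head, GOTO[(top_state, head)]))
            used.append(arg)
        else:                          # accept
            head, size = RULES[arg]
            del stack[-size:]
            if stack[-1][1] != 0:
                raise SyntaxError("accept did not return to state 0")
            used.append(arg)
            return used


print(lr_parse("abbcde"))  # rule numbers in the order they are applied
```

The returned rule numbers, read backwards, give the rightmost derivation of the input.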

3.3. Table Parsing Example
Grammar: S => a A B e;  A => A b c | b;  B => d

Stack                    | Input   | Action
$0                       | abbcde$ | Shift 1
$0, a1                   | bbcde$  | Shift 3
$0, a1, b3               | bcde$   | Reduce A => b
$0, a1, A2               | bcde$   | Shift 5
$0, a1, A2, b5           | cde$    | Shift 8
$0, a1, A2, b5, c8       | de$     | Reduce A => A b c
$0, a1, A2               | de$     | Shift 6
$0, a1, A2, d6           | e$      | Reduce B => d
$0, a1, A2, B4           | e$      | Shift 7
$0, a1, A2, B4, e7       | $       | Accept
$0                       | $       |

• Reduce A => b: pop 1 pair; state' == 1; push(A, goto(1, A)) = push(A, 2)
• Reduce A => A b c: pop 3 pairs; state' == 1; push(A, goto(1, A)) = push(A, 2)

3.4. The LR Parse Stack
• The parse stack holds the branches of the tree being built bottom-up.
• For example, the stack $0, a1, A2, b5, c8 represents the partial trees:
    a,  A(b),  b,  c

• The next stack, $0, a1, A2, represents:
    a,  A(A(b) b c)
• Later, $0, a1, A2, B4, e7 represents:
    a,  A(A(b) b c),  B(d),  e

4. Generating the Parse Table
• The example parse table was generated using the SLR (simple LR) algorithm
  - an extension of LR(0) which uses the grammar's FOLLOW() sets
• Other LR algorithms can also be used to make a parse table:
  - e.g. LR(1), LALR(1)

Supporting Techniques
• SLR table generation makes use of three techniques:
  - LR(0) items
  - the closure() function
  - the goto() function
• I'll explain each one first, before the table generation algorithm.

4.1. LR(0) Items
• An LR(0) item is a grammar production with a • at some position of the right-hand side.
• So, a production A => X Y Z has four items:
    A => • X Y Z
    A => X • Y Z
    A => X Y • Z
    A => X Y Z •
• Production A => ε has one item: A => •
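As a small illustration, an item can be represented as a production plus a dot position. The sketch below is one possible encoding, not the slides' own; the names `lr0_items` and `show` are made up for this example.

```python
def lr0_items(head, body):
    """Return every LR(0) item of head => body as (head, body, dot) triples."""
    body = tuple(body)
    # The dot can sit before each symbol and after the last one.
    return [(head, body, dot) for dot in range(len(body) + 1)]


def show(item):
    """Format an item in the slide notation, e.g. 'A => X • Y Z'."""
    head, body, dot = item
    symbols = list(body[:dot]) + ["•"] + list(body[dot:])
    return f"{head} => {' '.join(symbols)}"


for item in lr0_items("A", ["X", "Y", "Z"]):
    print(show(item))
```

An empty body stands for an ε production, so `lr0_items("A", [])` yields the single item `A => •`.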

4.2. The closure() Function
• The closure() function generates a set of LR(0) items.
• Assume that the grammar only has one production for the start symbol S, S => α
• The initial closure set is:
    closure({ S => • α })

• If A => α • B β is in the set, then for each production B => γ, add the item B => • γ to the set, if it's not already there.
• Repeat until no new items can be added to the set.

Example use of closure()
• Grammar:
    S --> E
    E --> E + T | T
    T --> T * F | F
    F --> ( E )
    F --> id
• closure({ S --> • E }) starts as:
    { S --> • E }
• Add the E --> • items:
    { S --> • E,  E --> • E + T,  E --> • T }
• Add the T --> • items:
    { S --> • E,  E --> • E + T,  E --> • T,  T --> • T * F,  T --> • F }
• Add the F --> • items:
    { S --> • E,  E --> • E + T,  E --> • T,  T --> • T * F,  T --> • F,  F --> • ( E ),  F --> • id }
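The closure rule can be sketched in Python as a loop that keeps adding items until nothing changes. This is a minimal version, assuming items are (head, body, dot) triples and the grammar is a list of (head, body) pairs; both encodings and the names `GRAMMAR` and `closure` are choices made for this example.

```python
# The expression grammar from the example, one (head, body) pair per production.
GRAMMAR = [
    ("S", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]


def closure(items, grammar):
    """Return the closure of a set of (head, body, dot) items."""
    result = set(items)
    changed = True
    while changed:                       # repeat until no new items are added
        changed = False
        for head, body, dot in list(result):
            if dot == len(body):
                continue                 # dot at the end: nothing to expand
            next_symbol = body[dot]
            for h, b in grammar:
                if h == next_symbol and (h, b, 0) not in result:
                    result.add((h, b, 0))
                    changed = True
    return result


start_items = closure({("S", ("E",), 0)}, GRAMMAR)
print(len(start_items))  # number of items in closure({ S --> • E })
```

A terminal after the dot matches no production head, so it adds nothing, which is why no separate terminal check is needed.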

4.3. The goto() Function
• goto(In, X) takes as input an existing closure set In, and a terminal/non-terminal symbol X.
• The output is a new closure set In+1 (drawn as an edge labelled X from In to In+1):
  - for each item A => α • X β in In, add closure({ A => α X • β }) to In+1
  - repeat until no more items can be added to In+1

goto() Example 1
• Grammar:
    S => A B      // rule 1, for start symbol
    A => a
    B => b
• Initial state
    I0 = closure({ S => • A B })
       = { S => • A B,  A => • a }

• goto(I0, A)
    = closure({ S => A • B })
    = { S => A • B,  B => • b }      // call it I1
• goto(I0, a)
    = closure({ A => a • })
    = { A => a • }                   // call it I2
• State graph so far: I0 --A--> I1,  I0 --a--> I2

• goto(I1, B)
    = closure({ S => A B • })
    = { S => A B • }                 // call it I3
  - this is the end of the S production, so I3 is the end state
• goto(I1, b)
    = closure({ B => b • })
    = { B => b • }                   // call it I4
• State graph: I0 --A--> I1,  I0 --a--> I2,  I1 --B--> I3 (end state),  I1 --b--> I4

goto() Example 2
• Grammar:
    S => a A B e      // rule 1, for start symbol
    A => A b c | b
    B => d
• Initial state
    I0 = closure({ S => • a A B e })
       = { S => • a A B e }

• goto(I0, a)
    = closure({ S => a • A B e })
    = { S => a • A B e,  A => • A b c,  A => • b }      // call it I1
• State graph so far: I0 --a--> I1

• goto(I1, A)
    = closure({ S => a A • B e,  A => A • b c })
    = { S => a A • B e,  A => A • b c,  B => • d }      // call it I2
• goto(I1, b)
    = closure({ A => b • })
    = { A => b • }                                      // call it I3
• State graph so far: I0 --a--> I1,  I1 --A--> I2,  I1 --b--> I3

• goto(I2, B)
    = closure({ S => a A B • e })
    = { S => a A B • e }             // call it I4
• Others
  - I5: { A => A b • c }
  - I6: { B => d • }
  - I7: { S => a A B e • }           // end of start symbol rule
  - I8: { A => A b c • }
• State graph: I0 --a--> I1,  I1 --A--> I2,  I1 --b--> I3,  I2 --B--> I4,  I2 --b--> I5,  I2 --d--> I6,  I4 --e--> I7,  I5 --c--> I8
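The states for Example 2 can be generated mechanically by applying goto() to every state and symbol until no new state appears. The Python sketch below does this under some assumptions of its own: items are (head, body, dot) triples, item order is preserved so that states come out numbered like I0 to I8, and the names `closure`, `goto` and `build_states` are illustrative.

```python
GRAMMAR = [
    ("S", ("a", "A", "B", "e")),  # rule 1, for start symbol
    ("A", ("A", "b", "c")),       # rule 2
    ("A", ("b",)),                # rule 3
    ("B", ("d",)),                # rule 4
]


def closure(items):
    """Closure of a sequence of items; kernel items first, added items after."""
    result = list(items)
    for head, body, dot in result:       # the list grows while we scan it
        if dot < len(body):
            for h, b in GRAMMAR:
                if h == body[dot] and (h, b, 0) not in result:
                    result.append((h, b, 0))
    return tuple(result)


def goto(items, symbol):
    """Move the dot over `symbol` wherever possible, then take the closure."""
    moved = [(h, b, d + 1) for h, b, d in items
             if d < len(b) and b[d] == symbol]
    return closure(moved)


def build_states():
    """Return (states, edges); edges maps (state number, symbol) to a state number."""
    start = closure([(GRAMMAR[0][0], GRAMMAR[0][1], 0)])
    states = [start]
    index = {frozenset(start): 0}        # item set -> state number
    edges = {}
    for i, state in enumerate(states):   # states grows while we scan it
        symbols = []
        for h, b, d in state:            # symbols that appear right after a dot
            if d < len(b) and b[d] not in symbols:
                symbols.append(b[d])
        for symbol in symbols:
            target = goto(state, symbol)
            key = frozenset(target)
            if key not in index:
                index[key] = len(states)
                states.append(target)
            edges[(i, symbol)] = index[key]
    return states, edges


states, edges = build_states()
for (i, symbol), j in edges.items():
    print(f"I{i} --{symbol}--> I{j}")
```

States are compared as sets of items, so reaching the same item set by a different path reuses the existing state number instead of creating a duplicate.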

4.4. Using goto() to make a Table
• The columns of the table should be the grammar's terminals, $, and non-terminals.
• The rows should be the I0, I1, …, In numbers 0, 1, …, n.
  - what we've been calling states

Stage 1
• In stage 1, we add the shift, goto, and accept entries to the table.
• action[i, a] gets <shift j> if goto(Ii, a) == Ij (a is a terminal)
• goto[i, A] gets j if goto(Ii, A) == Ij (A is a non-terminal)

• action[i, $] gets accept if S => α • is in Ii (there must be only one S rule)

Example Grammar 1 (Stage 1)
Grammar: S --> A B;  A --> a;  B --> b
State graph: I0 --A--> I1,  I0 --a--> I2,  I1 --B--> I3,  I1 --b--> I4

        action[]        goto[]
State | a    b    $   | S   A   B
  0   | s2            |     1
  1   |      s4       |         3
  2   |               |
  3   |           acc |
  4   |               |

Stage 2
• In stage 2, we add the reduce and error entries to the table.
• action[i, a] gets <reduce ruleNum> if
  - [A => α •] is in Ii, and
  - A is not S, and
  - a is in FOLLOW(A), and
  - A => α is rule number ruleNum

• After filling the table cells with shift, goto, accept, and reduce actions, any remaining empty cells will trigger an error() call.
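The two stages can be sketched in Python for the Example 1 grammar. To keep the example short, the item sets, the goto() edges and the FOLLOW sets are written out by hand rather than computed; every name here (`RULES`, `STATES`, `EDGES`, `FOLLOW`, `build_table`) is an assumption made for illustration.

```python
RULES = [("S", ("A", "B")), ("A", ("a",)), ("B", ("b",))]  # rules 1, 2, 3
NONTERMINALS = {"S", "A", "B"}
FOLLOW = {"A": {"b"}, "B": {"$"}}

# Item sets I0..I4 as (head, body, dot) triples, and the goto() edges between them.
STATES = [
    [("S", ("A", "B"), 0), ("A", ("a",), 0)],  # I0
    [("S", ("A", "B"), 1), ("B", ("b",), 0)],  # I1
    [("A", ("a",), 1)],                        # I2
    [("S", ("A", "B"), 2)],                    # I3
    [("B", ("b",), 1)],                        # I4
]
EDGES = {(0, "A"): 1, (0, "a"): 2, (1, "B"): 3, (1, "b"): 4}


def set_action(action, key, value):
    """Store an action, refusing to put two different actions in one cell."""
    if key in action and action[key] != value:
        raise ValueError(f"conflict at {key}: {action[key]} vs {value}")
    action[key] = value


def build_table():
    action, goto_table = {}, {}
    # Stage 1: shift and goto entries come from the edges, accept from S => α •
    for (i, symbol), j in EDGES.items():
        if symbol in NONTERMINALS:
            goto_table[(i, symbol)] = j
        else:
            set_action(action, (i, symbol), ("shift", j))
    for i, items in enumerate(STATES):
        for head, body, dot in items:
            if dot == len(body) and head == "S":
                set_action(action, (i, "$"), ("accept",))
    # Stage 2: reduce entries for completed items A => α •, on each a in FOLLOW(A)
    for i, items in enumerate(STATES):
        for head, body, dot in items:
            if dot == len(body) and head != "S":
                rule_num = RULES.index((head, body)) + 1
                for a in FOLLOW[head]:
                    set_action(action, (i, a), ("reduce", rule_num))
    return action, goto_table


action, goto_table = build_table()
for key in sorted(action):
    print(key, action[key])
```

Cells that never receive an entry are simply absent from the dictionaries, which plays the role of the error entries. The `set_action` check raises an exception when a cell would receive two different actions.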

Finishing the Example Table
• The reduce states are the state boxes at the leaves of the closure graph.
  - but exclude the end state
• State graph: I0 --A--> I1,  I0 --a--> I2,  I1 --B--> I3,  I1 --b--> I4
• For the example 1 grammar, there are two boxes at the leaves: I2 and I4.

I2 Reduction
Grammar: S --> A B;  A --> a;  B --> b
• I2 = { A => a • }
  - A => a is rule number 2
  - FOLLOW(A) == FIRST(B) = { b }
• So action[2, b] gets <reduce 2>

I4 Reduction
Grammar: S --> A B;  A --> a;  B --> b
• I4 = { B => b • }
  - B => b is rule number 3
  - FOLLOW(B) = { $ }
• So action[4, $] gets <reduce 3>
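The FOLLOW() sets used in these reductions can be computed with a fixed-point loop. The sketch below is a simplified version that assumes the grammar has no ε productions (true of both example grammars); the function names `first_sets` and `follow_sets` are illustrative.

```python
def first_sets(grammar, nonterminals):
    """FIRST sets for a grammar without ε productions."""
    first = {n: set() for n in nonterminals}
    changed = True
    while changed:
        changed = False
        for head, body in grammar:
            symbol = body[0]             # only the first body symbol matters
            new = first[symbol] if symbol in nonterminals else {symbol}
            if not new <= first[head]:
                first[head] |= new
                changed = True
    return first


def follow_sets(grammar, start):
    """FOLLOW sets for a grammar without ε productions."""
    nonterminals = {head for head, _ in grammar}
    first = first_sets(grammar, nonterminals)
    follow = {n: set() for n in nonterminals}
    follow[start].add("$")               # end marker follows the start symbol
    changed = True
    while changed:
        changed = False
        for head, body in grammar:
            for i, symbol in enumerate(body):
                if symbol not in nonterminals:
                    continue
                if i + 1 < len(body):    # something comes after the symbol
                    nxt = body[i + 1]
                    new = first[nxt] if nxt in nonterminals else {nxt}
                else:                    # symbol ends the body: inherit FOLLOW(head)
                    new = follow[head]
                if not new <= follow[symbol]:
                    follow[symbol] |= new
                    changed = True
    return follow


grammar1 = [("S", ("A", "B")), ("A", ("a",)), ("B", ("b",))]
print(follow_sets(grammar1, "S"))
```

Grammars with ε productions need an extra nullable check in both functions, which this sketch leaves out.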

Adding Reduce Entries
Grammar: S --> A B;  A --> a;  B --> b
State graph: I0 --A--> I1,  I0 --a--> I2,  I1 --B--> I3,  I1 --b--> I4

        action[]        goto[]
State | a    b    $   | S   A   B
  0   | s2            |     1
  1   |      s4       |         3
  2   |      r2       |
  3   |           acc |
  4   |           r3  |

Using the Example 1 Table
Grammar: S --> A B;  A --> a;  B --> b

Stack          | Input | Action
$0             | ab$   | Shift 2
$0, a2         | b$    | Reduce 2 (A --> a)
$0, A1         | b$    | Shift 4
$0, A1, b4     | $     | Reduce 3 (B --> b)
$0, A1, B3     | $     | Accept (S --> A B)
$0             | $     |

• Reduce 2: pop 1 pair; state' = 0; push(A, goto(0, A)) == push(A, 1)
• Reduce 3: pop 1 pair; state' = 1; push(B, goto(1, B)) == push(B, 3)

4.5. Example Grammar 2
Grammar: S --> a A B e;  A --> A b c | b;  B --> d
State graph: I0 --a--> I1,  I1 --A--> I2,  I1 --b--> I3,  I2 --B--> I4,  I2 --b--> I5,  I2 --d--> I6,  I4 --e--> I7,  I5 --c--> I8

Stage 1
        action[]                        goto[]
State | a    b    c    d    e    $   | S   A   B
  0   | s1                           |
  1   |      s3                      |     2
  2   |      s5        s6            |         4
  3   |                              |
  4   |                     s7       |
  5   |           s8                 |
  6   |                              |
  7   |                          acc |
  8   |                              |

Reduce States
• For the example 2 grammar, there are three boxes at the leaves: I3, I6, and I8.

I3 Reduction
Grammar: S --> a A B e;  A --> A b c;  A --> b;  B --> d
• I3 = { A => b • }
  - A => b is rule number 3
  - FOLLOW(A) = { b } ∪ FIRST(B) = { b, d }
• So action[3, b] and action[3, d] get <reduce 3>

I6 Reduction
Grammar: S --> a A B e;  A --> A b c;  A --> b;  B --> d
• I6 = { B => d • }
  - B => d is rule number 4
  - FOLLOW(B) = { e }
• So action[6, e] gets <reduce 4>

I8 Reduction
Grammar: S --> a A B e;  A --> A b c;  A --> b;  B --> d
• I8 = { A => A b c • }
  - A => A b c is rule number 2
  - FOLLOW(A) = { b, d }
• So action[8, b] and action[8, d] get <reduce 2>

Adding Reduce Entries
Grammar: S --> a A B e;  A --> A b c | b;  B --> d
State graph: I0 --a--> I1,  I1 --A--> I2,  I1 --b--> I3,  I2 --B--> I4,  I2 --b--> I5,  I2 --d--> I6,  I4 --e--> I7,  I5 --c--> I8

        action[]                        goto[]
State | a    b    c    d    e    $   | S   A   B
  0   | s1                           |
  1   |      s3                      |     2
  2   |      s5        s6            |         4
  3   |      r3        r3            |
  4   |                     s7       |
  5   |           s8                 |
  6   |                     r4       |
  7   |                          acc |
  8   |      r2        r2            |

5. LR Conflicts
• An LR conflict occurs when a cell in the action part of the parse table contains more than one action.
• There are two kinds of conflict:
  - shift/reduce and reduce/reduce
• Conflicts appear because of:
  - grammar ambiguity
  - limitations of the SLR parsing method (even when the grammar is unambiguous)

5.1. Shift/Reduce
• A shift/reduce conflict occurs when the parser cannot decide whether to shift the next symbol or reduce with a production
  - typically, the default action is to shift

Dangling Else Example
• Grammar rule:
    IfStmt => if Expr then Stmt
            | if Expr then Stmt else Stmt
• Example:
    if (a == 1) then
      if (b == 4) then
        x = 2;
      else ...        <-- this goes with which 'if'?

On the Stack

Stack                   | Input   | Action
$…                      | …$      | …
$…if Expr then Stmt     | else…$  | shift or reduce?

• Choose shift, so else matches closest if

5.2. Reduce/Reduce
• A reduce/reduce conflict occurs when the parser cannot decide which production to use to make a reduction.
• Typically, the first suitable production is used.

Example
Grammar: C --> A B;  A --> a;  B --> a

Stack | Input | Action
$     | aa$   | shift
$a    | a$    | reduce A --> a or B --> a ?

• Choose A --> a, since it's the first suitable one.

6. LL, SLR, LALR Grammars
[Figure: ovals labelled LR(0), SLR, LALR(1), LR(1) and LL(1). The ovals represent the complexity of the grammars that the notation can handle.]
• We've been using SLR in this chapter.
• LL(1) was used in chapter 5 on top-down parsing.

LR(1) Grammars
• LR(1) parsing uses one token lookahead to avoid conflicts in the parsing table.
• It can deal with more complex/powerful grammars than LR(0) or SLR.
• An LR(1) grammar takes longer to convert into a parse table.

LALR(1) Grammars
• LALR(1) parsing (Look-Ahead LR) combines LR(1) states to reduce the size of the parse table.
• LALR(1) is less powerful than LR(1)
  - it may introduce reduce-reduce conflicts, but that's not likely for programming language grammars
• LALR(1) is used by the YACC parsing tool
  - see next chapter