Winter 2006-2007
Compiler Construction
T3 – Syntax Analysis (Parsing, part 1 of 2)
Mooly Sagiv and Roman Manevich
School of Computer Science, Tel-Aviv University

Today
  • Compiler pipeline: IC program text (ic) → Lexical Analysis → Syntax Analysis (Parsing) → AST → Symbol Table etc. → Inter. Rep. (IR) → Code Generation → executable code (exe)
  • Today:
      • Review: grammars, parse trees, ambiguity
      • Top-down parsing
      • Bottom-up parsing
  • Next week:
      • Conflict resolution
      • Shift/Reduce parsing via JavaCup
      • (Error handling)
      • AST intro.
      • PA2

Goals of parsing
  • Programming language has syntactic rules
      • Context-Free Grammars
  • Decide whether program satisfies syntactic structure
      • Error detection
      • Error recovery
  • Simplification: rules on tokens
  • Build Abstract Syntax Tree

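From text to abstract syntax
  • Program text: 5 + (7 * x)
  • Lexical Analyzer turns the program text into a token stream: num + ( num * id )
  • Parser checks the token stream against the grammar and reports either "valid" or "syntax error"
  • Grammar:
      E → id
      E → num
      E → E + E
      E → E * E
      E → ( E )
  • Parse tree: [E [E num] + [E ( [E [E num] * [E id]] )]]
  • Abstract syntax tree: + with children num 5 and *, where * has children num 7 and id x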
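The lexical-analysis step on this slide can be sketched in Python. This is only an illustration, not the course's lexer: the `tokenize` helper, the `TOKEN_SPEC` table and the regular expressions are assumptions chosen to reproduce the slide's token stream for `5 + (7 * x)`.

```python
import re

# Hypothetical token classes for the slide's example grammar.
TOKEN_SPEC = [
    ("num", re.compile(r"\d+")),
    ("id", re.compile(r"[A-Za-z_]\w*")),
    ("op", re.compile(r"[+*()]")),
]

def tokenize(text):
    """Return (kind, lexeme) pairs; operators use their lexeme as the kind."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        for kind, pattern in TOKEN_SPEC:
            match = pattern.match(text, pos)
            if match:
                lexeme = match.group()
                tokens.append((lexeme if kind == "op" else kind, lexeme))
                pos = match.end()
                break
        else:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
    return tokens

print([kind for kind, _ in tokenize("5 + (7 * x)")])
# ['num', '+', '(', 'num', '*', 'id', ')']
```

The parser then works only on the token kinds, which is the "rules on tokens" simplification mentioned earlier.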

Terminology
  • Symbols:
      • terminals (tokens): + * ( ) id num
      • non-terminals: E
  • Grammar rules:
      E → id
      E → num
      E → E + E
      E → E * E
      E → ( E )
  • Derivation (some steps omitted): E ⇒ E+E ⇒ 1+E*E ⇒ 1+2*3
  • Parse tree: [E [E 1] + [E [E 2] * [E 3]]]

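Ambiguity
  • Grammar rules:
      E → id
      E → num
      E → E + E
      E → E * E
      E → ( E )
  • Leftmost derivation (some steps omitted): E ⇒ E+E ⇒ 1+E*E ⇒ 1+2*3
      • Parse tree: [E [E 1] + [E [E 2] * [E 3]]]
  • Rightmost derivation: E ⇒ E*E ⇒ E*3 ⇒ E+E*3 ⇒ E+2*3 ⇒ 1+2*3
      • Parse tree: [E [E [E 1] + [E 2]] * [E 3]]
  • The same string has two different parse trees, so the grammar is ambiguous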

Grammar rewriting
  • Ambiguous grammar:
      E → id
      E → num
      E → E + E
      E → E * E
      E → ( E )
  • Non-ambiguous grammar:
      E → E + T
      E → T
      T → T * F
      T → F
      F → id
      F → ( E )
  • Derivation (some steps omitted): E ⇒ E+T ⇒ 1+T*F ⇒ 1+F*F ⇒ 1+2*3
  • Parse tree: [E [E [T [F 1]]] + [T [T [F 2]] * [F 3]]]

Parsing methods
  • Top-down / predictive / recursive descent without backtracking: LL(1)
      • "L": left-to-right scan of input
      • "L": leftmost derivation
      • "1": predict based on one token of look-ahead
      • For every non-terminal and token, predict the next production
  • Bottom-up: LR(0), SLR(1), LALR(1)
      • "L": left-to-right scan of input
      • "R": rightmost derivation (in reverse order)
      • For every potential right-hand side and token, decide when a production is found

Top-down parsing
  • Builds parse tree in preorder
  • LL(1) example
  • Grammar:
      S → if E then S else S
      S → begin S L
      S → print E
      L → end
      L → ; S L
      E → num
  • Input: if 5 then print 8 else…
  • Token : rule chosen : sentential form after applying the rule
      if : S → if E then S else S : if E then S else S
      5 : E → num : if 5 then S else S
      print : S → print E : if 5 then print E else S
      …
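A recursive-descent parser for this grammar can be sketched as one function per non-terminal, where each function looks at a single token to pick its production. The sketch below is an assumption-laden illustration: the function name `parse`, the tuple-based tree shape and the choice to treat any all-digit token as `num` are not from the slides.

```python
def parse(tokens):
    """Predictive (LL(1)) parser for the slide's S/L/E grammar.

    Returns a nested-tuple tree or raises SyntaxError.
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else "$"

    def eat(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f"expected {expected!r}, got {peek()!r}")
        pos += 1

    def S():
        tok = peek()
        if tok == "if":        # S -> if E then S else S
            eat("if")
            cond = E()
            eat("then")
            then_branch = S()
            eat("else")
            else_branch = S()
            return ("if", cond, then_branch, else_branch)
        if tok == "begin":     # S -> begin S L
            eat("begin")
            first = S()
            return ("begin", [first] + L())
        if tok == "print":     # S -> print E
            eat("print")
            return ("print", E())
        raise SyntaxError(f"no rule for S on token {tok!r}")

    def L():
        tok = peek()
        if tok == "end":       # L -> end
            eat("end")
            return []
        if tok == ";":         # L -> ; S L
            eat(";")
            stmt = S()
            return [stmt] + L()
        raise SyntaxError(f"no rule for L on token {tok!r}")

    def E():
        nonlocal pos
        tok = peek()
        if not tok.isdigit():  # E -> num
            raise SyntaxError(f"expected a number, got {tok!r}")
        pos += 1
        return ("num", int(tok))

    tree = S()
    if peek() != "$":
        raise SyntaxError(f"unexpected trailing token {peek()!r}")
    return tree

print(parse("if 5 then print 8 else print 9".split()))
```

The calls happen in preorder: `S` is entered before its children `E`, `S`, `S`, which mirrors how the parse tree is built top-down.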

Problem: left recursion
  • Left recursion: E → E + T
      • Symbol on the left is also the first symbol on the right
  • Predictive parsing fails when two rules can start with the same token:
      E → E + T
      E → T
  • Rewrite grammar using left-factoring
  • Nullable, FIRST, FOLLOW sets
  • Arithmetic expressions:
      E → E + T
      E → T
      T → T * F
      T → F
      F → id
      F → ( E )
  • Left-factored grammar:
      E → T E'
      E' → + T E'
      E' → ε
      T → F T'
      T' → * F T'
      T' → ε
      F → id
      F → ( E )
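The rewritten grammar on this slide is what the standard removal of immediate left recursion produces: A → A α | β becomes A → β A' and A' → α A' | ε. The sketch below applies that textbook transformation to one non-terminal; the function name and the list-of-lists grammar representation are assumptions for illustration, and an empty list stands for ε.

```python
def eliminate_immediate_left_recursion(nonterminal, productions):
    """Rewrite A -> A a | b as A -> b A', A' -> a A' | epsilon.

    `productions` is a list of right-hand sides, each a list of symbols.
    Returns a dict mapping non-terminals to their new right-hand sides.
    """
    fresh = nonterminal + "'"
    # Tails of the left-recursive rules (the "a" parts).
    recursive = [rhs[1:] for rhs in productions if rhs and rhs[0] == nonterminal]
    # Rules that do not start with the non-terminal itself (the "b" parts).
    other = [rhs for rhs in productions if not rhs or rhs[0] != nonterminal]
    if not recursive:
        return {nonterminal: productions}
    return {
        nonterminal: [rhs + [fresh] for rhs in other],
        fresh: [rhs + [fresh] for rhs in recursive] + [[]],  # [] is epsilon
    }

print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
# {'E': [['T', "E'"]], "E'": [['+', 'T', "E'"], []]}
```

Applying the same function to T → T * F | F gives T → F T' and T' → * F T' | ε, matching the slide.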

More left recursion
  • Non-terminal with two rules starting with the same prefix
  • Grammar:
      S → if E then S else S
      S → if E then S
  • Left-factored grammar:
      S → if E then S X
      X → ε
      X → else S

Bottom-up parsing
  • No problem with left recursion
  • Widely used in practice
  • LR(0), SLR(1), LALR(1)
  • JavaCup implements LALR(1)

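Bottom-up parsing
  • Grammar:
      E → E + ( E )
      E → i
  • Input: 1 + (2) + (3)
  • Reductions, from the input up to the start symbol:
      1 + (2) + (3)
      E + (2) + (3)
      E + (E) + (3)
      E + (3)
      E + (E)
      E
  • Parse tree: [E [E [E 1] + ( [E 2] )] + ( [E 3] )]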
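The reduction sequence above can be reproduced with a deliberately naive shift-reduce loop: shift one token at a time and reduce whenever the top of the stack equals a right-hand side. This greedy handle search happens to work for this tiny grammar but is not how an LR parser finds handles in general; the names `RULES` and `shift_reduce` are hypothetical, and every number is represented by the token `i`.

```python
# Grammar from the slide: E -> E + ( E ) | i
RULES = [("E", ["E", "+", "(", "E", ")"]), ("E", ["i"])]

def shift_reduce(tokens):
    """Return the sentential form after each reduction step."""
    stack, rest, forms = [], list(tokens), []
    while True:
        for lhs, rhs in RULES:
            if stack[-len(rhs):] == rhs:      # handle found on top of stack
                del stack[-len(rhs):]         # pop the right-hand side
                stack.append(lhs)             # push the left-hand side
                forms.append(" ".join(stack + rest))
                break
        else:
            if not rest:
                break                         # nothing to reduce or shift
            stack.append(rest.pop(0))         # shift the next token
    if stack != ["E"]:
        raise SyntaxError(f"could not reduce input; stack is {stack}")
    return forms

for form in shift_reduce("i + ( i ) + ( i )".split()):
    print(form)
```

Read from last to first, the printed forms are a rightmost derivation of the input, which is the "R" in LR.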

Shift-reduce parsing
  • Parser stack: symbols (terminals and non-terminals) + automaton states
  • Parsing actions: sequence of shift and reduce operations
  • Action determined by top of stack and k input tokens
  • Shift: move next token to top of stack
  • Reduce: for rule X → A B C, pop C, B, A, then push X
  • Convention: $ stands for end of file

Pushdown automaton
  [Figure: a control unit, driven by the parser table, reads the input (u t w $) and maintains a stack (V on top, $ at the bottom).]

LR parsing table
  • One row per state (0, 1, ...)
  • Shift/reduce actions part (one column per terminal):
      • sn: shift and move to state n
      • rk: reduce by rule k
  • Goto part (one column per non-terminal):
      • gm: goto state m

Parsing table example
  • Grammar:
      S → E $
      E → T
      E → E + T
      T → i
      T → ( E )
  • Table (blank goto entries are unused):

    STATE   i     +     (     )     $     E   T
    0       s5    err   s7    err   err   1   6
    1       err   s3    err   err   s2
    2       accept
    3       s5    err   s7    err   err       4
    4       reduce E → E + T
    5       reduce T → i
    6       reduce E → T
    7       s5    err   s7    err   err   8   6
    8       err   s3    err   s9    err
    9       reduce T → ( E )
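A table like this one can drive a very small parser loop. The sketch below hard-codes the entries as read from the slide's table (the column order i, +, (, ), $ is a reconstruction from the flattened text and should be treated as an assumption), and keeps only automaton states on the stack. The names `ACTION`, `REDUCE`, `GOTO` and `lr_parse` are illustrative.

```python
# Shift entries: state -> {token: next state}. Missing entries mean "err".
ACTION = {
    0: {"i": 5, "(": 7},
    1: {"+": 3, "$": 2},
    3: {"i": 5, "(": 7},
    7: {"i": 5, "(": 7},
    8: {"+": 3, ")": 9},
}
# LR(0) reduce states: reduce whatever the next token is.
# state -> (left-hand side, right-hand side)
REDUCE = {
    4: ("E", ["E", "+", "T"]),
    5: ("T", ["i"]),
    6: ("E", ["T"]),
    9: ("T", ["(", "E", ")"]),
}
GOTO = {0: {"E": 1, "T": 6}, 3: {"T": 4}, 7: {"E": 8, "T": 6}}
ACCEPT = 2

def lr_parse(tokens):
    """Return the list of reductions performed, or raise SyntaxError."""
    stack = [0]                      # stack of automaton states
    rest = list(tokens) + ["$"]      # $ marks end of file
    index, reductions = 0, []
    while True:
        state = stack[-1]
        if state == ACCEPT:
            return reductions
        if state in REDUCE:
            lhs, rhs = REDUCE[state]
            del stack[-len(rhs):]                # pop one state per symbol
            stack.append(GOTO[stack[-1]][lhs])   # goto on the left-hand side
            reductions.append(f"{lhs} -> {' '.join(rhs)}")
            continue
        token = rest[index]
        target = ACTION.get(state, {}).get(token)
        if target is None:
            raise SyntaxError(f"unexpected {token!r} in state {state}")
        stack.append(target)                     # shift
        index += 1

print(lr_parse("i + ( i )".split()))
```

The returned reductions, read in reverse, form the rightmost derivation of the input.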

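Items
  • Items indicate the position inside a rule; the position is marked here with a dot (.)
  • LR(0) items are of the form A → α . β
  • (LR(1) items are of the form A → α . β, t, where t is a look-ahead token)
  • Grammar:
      S → E $
      E → T
      E → E + T
      T → i
      T → ( E )
  • LR(0) items:
      1: S → . E $
      2: S → E . $
      3: S → E $ .
      4: E → . T
      5: E → T .
      6: E → . E + T
      7: E → E . + T
      8: E → E + . T
      9: E → E + T .
      10: T → . i
      11: T → i .
      12: T → . ( E )
      13: T → ( . E )
      14: T → ( E . )
      15: T → ( E ) .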
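Enumerating the LR(0) items is mechanical: each rule with n right-hand-side symbols yields n + 1 items, one per dot position. A short sketch, where the `(lhs, rhs, dot)` tuple representation and the helper names are assumptions:

```python
# Grammar from the slide, in the slide's rule order.
GRAMMAR = [
    ("S", ["E", "$"]),
    ("E", ["T"]),
    ("E", ["E", "+", "T"]),
    ("T", ["i"]),
    ("T", ["(", "E", ")"]),
]

def lr0_items(grammar):
    """Return every (lhs, rhs, dot) triple, one per dot position per rule."""
    return [
        (lhs, tuple(rhs), dot)
        for lhs, rhs in grammar
        for dot in range(len(rhs) + 1)
    ]

def show(item):
    """Format an item with '.' marking the position inside the rule."""
    lhs, rhs, dot = item
    symbols = list(rhs[:dot]) + ["."] + list(rhs[dot:])
    return f"{lhs} -> {' '.join(symbols)}"

for number, item in enumerate(lr0_items(GRAMMAR), start=1):
    print(f"{number}: {show(item)}")
```

With the rules in this order the loop prints 15 items, numbered the same way as the list above.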

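Automaton states
  • States (each state is a set of numbered LR(0) items):
      State 0: 1: S → . E $, 4: E → . T, 6: E → . E + T, 10: T → . i, 12: T → . ( E )
      State 1: 2: S → E . $, 7: E → E . + T
      State 2: 3: S → E $ .
      State 3: 8: E → E + . T, 10: T → . i, 12: T → . ( E )
      State 4: 9: E → E + T .
      State 5: 11: T → i .
      State 6: 5: E → T .
      State 7: 13: T → ( . E ), 4: E → . T, 6: E → . E + T, 10: T → . i, 12: T → . ( E )
      State 8: 14: T → ( E . ), 7: E → E . + T
      State 9: 15: T → ( E ) .
  • Transitions:
      State 0: E → 1, T → 6, i → 5, ( → 7
      State 1: $ → 2, + → 3
      State 3: T → 4, i → 5, ( → 7
      State 7: E → 8, T → 6, i → 5, ( → 7
      State 8: ) → 9, + → 3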
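The states of this automaton can be computed with the usual closure and goto construction over LR(0) items. The following is a sketch under assumptions: items are `(rule index, dot)` pairs, the helper names are illustrative, and the states come out in discovery order, so their numbers will not match the slide's numbering.

```python
GRAMMAR = [
    ("S", ["E", "$"]),
    ("E", ["T"]),
    ("E", ["E", "+", "T"]),
    ("T", ["i"]),
    ("T", ["(", "E", ")"]),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def closure(items):
    """Add X -> . gamma for every non-terminal X that follows a dot."""
    items = set(items)
    work = list(items)
    while work:
        rule, dot = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for index, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (index, 0) not in items:
                    items.add((index, 0))
                    work.append((index, 0))
    return frozenset(items)

def goto(state, symbol):
    """Move the dot over `symbol` in every item that allows it."""
    moved = {
        (rule, dot + 1)
        for rule, dot in state
        if dot < len(GRAMMAR[rule][1]) and GRAMMAR[rule][1][dot] == symbol
    }
    return closure(moved)

def build_states():
    """Return (states, transitions) of the LR(0) automaton."""
    start = closure({(0, 0)})
    states, transitions, work = [start], {}, [start]
    while work:
        state = work.pop()
        symbols = {GRAMMAR[r][1][d] for r, d in state if d < len(GRAMMAR[r][1])}
        for symbol in sorted(symbols):
            target = goto(state, symbol)
            if target not in states:
                states.append(target)
                work.append(target)
            transitions[(states.index(state), symbol)] = states.index(target)
    return states, transitions

states, transitions = build_states()
print(len(states), "states,", len(transitions), "transitions")
```

For this grammar the construction yields ten states, which agrees with states 0 to 9 of the parsing table example.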

Identifying handles
  • Create a finite state automaton over grammar symbols
  • Use automaton to build parser tables
      • shift: for items A → α . t β, on token t
      • reduce: for items A → α . , on every token
  • Any grammar has
      • Sets of LR(0) items
      • Transition diagram
      • GOTO table
  • Not every grammar has a deterministic action table
  • When no conflicts occur, use a DPDA which pushes states on the stack

Non-LR(0) grammars
  • When conflicts occur the grammar is not LR(0)
      • Parsing table contains non-determinism
      • shift-reduce conflicts
      • reduce-reduce conflicts
      • shift-shift conflicts?
  • Known cases
      • Operator precedence
      • Operator associativity
      • Dangling if-then-else
      • Unary minus
  • Solutions
      • Develop equivalent non-ambiguous grammar
      • Patch parsing table to shift/reduce
      • Precedence and associativity of tokens
      • Stronger parser algorithm: SLR/LR(1)/LALR(1)

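Precedence and associativity
  • Precedence: with E + E on the stack and * as the next token, the parser can either reduce by E → E + E or shift *
      • Reduce: + precedes *
      • Shift: * precedes +
  • Parse trees for 1 + 2 * 3:
      • Reduce: [E [E [E 1] + [E 2]] * [E 3]] = 9
      • Shift: [E [E 1] + [E [E 2] * [E 3]]] = 7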
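The effect of resolving the conflict one way or the other can be seen with a small two-stack evaluator. This is an operator-precedence sketch rather than a patched LR table, and the `evaluate` function and its `precedence` argument are assumptions: when the operator on the stack has precedence at least as high as the incoming one it reduces, otherwise it shifts.

```python
def evaluate(tokens, precedence):
    """Evaluate a flat token list of ints, '+' and '*'.

    `precedence` maps each operator to a number; higher binds tighter.
    """
    values, operators = [], []

    def reduce():
        op = operators.pop()
        right = values.pop()
        left = values.pop()
        values.append(left + right if op == "+" else left * right)

    for token in tokens:
        if isinstance(token, int):
            values.append(token)                 # shift an operand
            continue
        # Reduce while the operator on the stack precedes (or ties with)
        # the incoming operator; ties reduce, giving left associativity.
        while operators and precedence[operators[-1]] >= precedence[token]:
            reduce()
        operators.append(token)                  # shift the operator
    while operators:
        reduce()
    return values[0]

expression = [1, "+", 2, "*", 3]
print(evaluate(expression, {"+": 2, "*": 1}))  # reduce first: (1 + 2) * 3 = 9
print(evaluate(expression, {"+": 1, "*": 2}))  # shift first: 1 + (2 * 3) = 7
```

Because ties reduce, the same loop also shows the associativity choice: reducing on an equal-precedence operator makes it left-associative.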

Precedence and associativity
  • Associativity: with E + E on the stack and + as the next token, the parser can either shift + or reduce by E → E + E
      • Shift: + is right-associative
      • Reduce: + is left-associative

Dangling else/if-else ambiguity
  • Grammar:
      S → if E then S else S
      S → if E then S
      S → other
  • Input: if a then if b then e1 else e2
  • Which interpretation should we use?
      (1) if a then { if b then e1 else e2 }  (standard interpretation)
      (2) if a then { if b then e1 } else e2
  • Shift/reduce conflict between the LR(1) items:
      S → if E then S .           token: else
      S → if E then S . else S    token: (any)
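Choosing shift in this conflict attaches each else to the nearest unmatched if, which is interpretation (1). A recursive-descent sketch shows the same choice: after parsing the then-branch, the parser greedily consumes an else if one follows. The function name, the tuple tree shape and the treatment of conditions and `other` statements as single tokens are assumptions for illustration.

```python
def parse_statement(tokens, pos=0):
    """Parse S -> if E then S [else S] | other; return (tree, next position)."""
    if tokens[pos] != "if":
        return tokens[pos], pos + 1              # S -> other
    condition = tokens[pos + 1]                  # E is a single token here
    if tokens[pos + 2] != "then":
        raise SyntaxError(f"expected 'then', got {tokens[pos + 2]!r}")
    then_branch, pos = parse_statement(tokens, pos + 3)
    # Prefer "shift": an else right after the then-branch belongs to this if.
    if pos < len(tokens) and tokens[pos] == "else":
        else_branch, pos = parse_statement(tokens, pos + 1)
        return ("if", condition, then_branch, else_branch), pos
    return ("if", condition, then_branch), pos

tree, _ = parse_statement("if a then if b then e1 else e2".split())
print(tree)
# ('if', 'a', ('if', 'b', 'e1', 'e2'))
```

The inner call sees the else first, so the outer if ends up without an else branch, as in the standard interpretation.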

See you next week

Grammar hierarchy
  [Figure: grammar classes Non-ambiguous CFG, LALR(1), LL(1), SLR(1) and LR(0) drawn as sets, with LR(0) ⊂ SLR(1) ⊂ LALR(1) ⊂ Non-ambiguous CFG.]