Winter 2006-2007
Compiler Construction
T3: Syntax Analysis (Parsing, part 1 of 2)
Mooly Sagiv and Roman Manevich
School of Computer Science, Tel-Aviv University

Today
Compiler pipeline: IC language (ic) → Lexical Analysis → Syntax Analysis (Parsing) → AST, Symbol Table, etc. → Intermediate Representation (IR) → Code Generation → executable code (exe)
- Today:
  - Review
    - Grammars, parse trees, ambiguity
    - Top-down parsing
    - Bottom-up parsing
- Next week:
  - Conflict resolution
  - Shift/Reduce parsing via JavaCup
  - (Error handling)
  - AST intro.
  - PA2

Goals of parsing
- Programming language has syntactic rules
  - Context-Free Grammars
- Decide whether program satisfies syntactic structure
  - Error detection
  - Error recovery
  - Simplification: rules on tokens
- Build Abstract Syntax Tree

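From text to abstract syntax
- Program text: 5 + (7 * x)
- Lexical Analyzer turns the program text into the token stream: num + ( num * id )
- Parser checks the token stream against the grammar: it reports a syntax error, or for valid input builds the parse tree
- Grammar:
  E → id
  E → num
  E → E + E
  E → E * E
  E → ( E )
- Parse tree: E[ E[num] + E[ ( E[ E[num] * E[id] ] ) ] ]
- Abstract syntax tree: +( num 5, *( num 7, id x ) )
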
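The first stage of this pipeline can be sketched in a few lines of Python. This is only an illustration: the token kinds (num, id, and the punctuation + * ( )) come from the slide, while the regular expressions and the name `tokenize` are assumptions made for the example.

```python
import re

# Token kinds follow the slide; the patterns themselves are illustrative assumptions.
TOKEN_SPEC = [
    ("num", r"\d+"),
    ("id", r"[A-Za-z_]\w*"),
    ("op", r"[+*()]"),
    ("ws", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (kind, lexeme) pairs; whitespace is skipped."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        kind, lexeme = m.lastgroup, m.group()
        if kind == "op":
            tokens.append((lexeme, lexeme))  # punctuation is its own token kind
        elif kind != "ws":
            tokens.append((kind, lexeme))
        pos = m.end()
    return tokens


print([kind for kind, _ in tokenize("5 + (7 * x)")])
```

The parser only looks at the token kinds; the lexemes (5, 7, x) are kept so that later stages can build the abstract syntax tree.
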
Terminology
- Symbols:
  - terminals (tokens): + * ( ) id num
  - non-terminals: E
- Grammar rules:
  E → id
  E → num
  E → E + E
  E → E * E
  E → ( E )
- Derivation: E ⇒ E + E ⇒ 1 + E ⇒ 1 + E * E ⇒ 1 + 2 * E ⇒ 1 + 2 * 3
- Parse tree: E[ E[1] + E[ E[2] * E[3] ] ]

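Ambiguity
- Grammar rules:
  E → id
  E → num
  E → E + E
  E → E * E
  E → ( E )
- Leftmost derivation: E ⇒ E + E ⇒ 1 + E ⇒ 1 + E * E ⇒ 1 + 2 * E ⇒ 1 + 2 * 3
  Parse tree: E[ E[1] + E[ E[2] * E[3] ] ]
- Rightmost derivation: E ⇒ E * E ⇒ E * 3 ⇒ E + E * 3 ⇒ E + 2 * 3 ⇒ 1 + 2 * 3
  Parse tree: E[ E[ E[1] + E[2] ] * E[3] ]
- The same string 1 + 2 * 3 has two different parse trees, so the grammar is ambiguous
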
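The ambiguity can be checked mechanically by enumerating every parse tree of a token list. The sketch below is an illustration under simplifying assumptions: it covers only the rules E → num, E → E + E and E → E * E (no parentheses or identifiers), represents trees as nested tuples, and the function names are made up for the example.

```python
def parses(tokens):
    """Yield every parse of tokens under E -> num | E + E | E * E."""
    if len(tokens) == 1 and isinstance(tokens[0], int):
        yield tokens[0]
        return
    # Try every operator position as the root of the tree.
    for i in range(1, len(tokens) - 1):
        if tokens[i] in ("+", "*"):
            for left in parses(tokens[:i]):
                for right in parses(tokens[i + 1:]):
                    yield (tokens[i], left, right)


def evaluate(tree):
    """Evaluate a tree produced by parses()."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a * b


trees = list(parses([1, "+", 2, "*", 3]))
for tree in trees:
    print(tree, "=", evaluate(tree))
```

For 1 + 2 * 3 the enumeration finds two trees, one with + at the root and one with * at the root, and they evaluate to different values.
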
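Grammar rewriting
- Ambiguous grammar:
  E → id
  E → num
  E → E + E
  E → E * E
  E → ( E )
- Non-ambiguous grammar:
  E → E + T
  E → T
  T → T * F
  T → F
  F → id
  F → ( E )
- Derivation (⇒* stands for several steps): E ⇒ E + T ⇒* 1 + T * F ⇒ 1 + F * F ⇒* 1 + 2 * 3
- Parse tree: E[ E[ T[ F[1] ] ] + T[ T[ F[2] ] * F[3] ] ]
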
Parsing methods
- Top-down / predictive / recursive descent without backtracking: LL(1)
  - "L": left-to-right scan of input
  - "L": leftmost derivation
  - "1": predict based on one token look-ahead
  - For every non-terminal and token predict the next production
- Bottom-up: LR(0), SLR(1), LALR(1)
  - "L": left-to-right scan of input
  - "R": rightmost derivation (in reversed order)
  - For every potential right hand side and token decide when a production is found

Top-down parsing
- Builds parse tree in preorder
- LL(1) example
- Grammar:
  S → if E then S else S
  S → begin S L
  S → print E
  L → end
  L → ; S L
  E → num
- Input: if 5 then print 8 else …
- Token : rule : sentential form after applying the rule
  if : S → if E then S else S : if E then S else S
  5 : E → num : if 5 then S else S
  print : S → print E : if 5 then print E else S
  …

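A recursive descent parser for this grammar needs one function per non-terminal, and each function picks its production by looking at the next token only. The following is a minimal sketch, not the course's implementation: the class name `Parser`, the tuple-based tree shape, and the convention that num tokens are Python integers are all assumptions made for illustration.

```python
class Parser:
    """Predictive parser for: S -> if E then S else S | begin S L | print E,
    L -> end | ; S L,  E -> num."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks end of input
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def eat(self, expected):
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.peek()!r}")
        self.pos += 1

    def parse(self):
        tree = self.S()
        self.eat("$")
        return tree

    def S(self):
        tok = self.peek()
        if tok == "if":      # predict S -> if E then S else S
            self.eat("if")
            cond = self.E()
            self.eat("then")
            then_branch = self.S()
            self.eat("else")
            else_branch = self.S()
            return ("if", cond, then_branch, else_branch)
        if tok == "begin":   # predict S -> begin S L
            self.eat("begin")
            first = self.S()
            return ("begin", [first] + self.L())
        if tok == "print":   # predict S -> print E
            self.eat("print")
            return ("print", self.E())
        raise SyntaxError(f"no rule for S on token {tok!r}")

    def L(self):
        if self.peek() == "end":   # predict L -> end
            self.eat("end")
            return []
        if self.peek() == ";":     # predict L -> ; S L
            self.eat(";")
            stmt = self.S()
            return [stmt] + self.L()
        raise SyntaxError(f"no rule for L on token {self.peek()!r}")

    def E(self):
        tok = self.peek()
        if isinstance(tok, int):   # predict E -> num
            self.pos += 1
            return ("num", tok)
        raise SyntaxError(f"no rule for E on token {tok!r}")


print(Parser(["if", 5, "then", "print", 8, "else", "print", 9]).parse())
```

The nodes are created in the order the functions are entered, which is the preorder construction mentioned on the slide.
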
Problem: left recursion
- Left recursion: E → E + T
  - Symbol on left also first symbol on right
- Predictive parsing fails when two rules can start with same token
  E → E + T
  E → T
- Rewrite grammar to eliminate left recursion (and left-factor common prefixes)
- Nullable, FIRST, FOLLOW sets
- Arithmetic expressions:
  E → E + T
  E → T
  T → T * F
  T → F
  F → id
  F → ( E )
- Rewritten grammar (ε is the empty string):
  E → T E'
  E' → + T E'
  E' → ε
  T → F T'
  T' → * F T'
  T' → ε
  F → id
  F → ( E )

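The nullable and FIRST sets for the rewritten expression grammar can be computed with a simple fixed-point iteration. The sketch below is illustrative only: the dictionary encoding of the grammar and the function name are assumptions, the empty tuple stands for ε, and FOLLOW sets are left out to keep it short.

```python
EPS = ()  # an empty right-hand side stands for epsilon

GRAMMAR = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), EPS],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), EPS],
    "F": [("id",), ("(", "E", ")")],
}


def nullable_and_first(grammar):
    """Iterate until neither the nullable set nor any FIRST set changes."""
    nullable = set()
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                all_nullable = True
                for sym in rhs:
                    # A terminal's FIRST set is just the terminal itself.
                    sym_first = first[sym] if sym in grammar else {sym}
                    if not sym_first <= first[nt]:
                        first[nt] |= sym_first
                        changed = True
                    if sym not in nullable:
                        all_nullable = False
                        break  # later symbols cannot contribute to FIRST
                if all_nullable and nt not in nullable:
                    nullable.add(nt)
                    changed = True
    return nullable, first


nullable, first = nullable_and_first(GRAMMAR)
print(sorted(nullable))
for nt in GRAMMAR:
    print(nt, sorted(first[nt]))
```

For every non-terminal here, the alternatives start with different tokens (or are ε), which is what lets a predictive parser choose a production from one token of look-ahead.
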
More left recursion
- Non-terminal with two rules starting with same prefix
- Grammar:
  S → if E then S else S
  S → if E then S
- Left factored grammar:
  S → if E then S X
  X → ε
  X → else S

Bottom-up parsing
- No problem with left recursion
- Widely used in practice
- LR(0), SLR(1), LALR(1)
- JavaCup implements LALR(1)

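Bottom-up parsing
- Grammar:
  E → E + ( E )
  E → i
- Input: 1 + (2) + (3)
- Reductions, from the input back to the start symbol:
  1 + (2) + (3)
  E + (2) + (3)
  E + (E) + (3)
  E + (3)
  E + (E)
  E
- Parse tree: E[ E[ E[1] + ( E[2] ) ] + ( E[3] ) ]
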
Shift-reduce parsing
- Parser stack: symbols (terminals and non-terminals) + automaton states
- Parsing actions: sequence of shift and reduce operations
- Action determined by top of stack and k input tokens
- Shift: move next token to top of stack
- Reduce: for rule X → A B C pop C, B, A then push X
- Convention: $ stands for end of file

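The shift and reduce operations can be traced with a deliberately naive sketch for the grammar E → E + ( E ), E → i. It is an assumption-laden illustration rather than a real LR parser: it keeps only grammar symbols on the stack (no automaton states) and reduces whenever the top of the stack equals a right-hand side, which happens to be safe for this small grammar but does not work in general. The names `RULES` and `shift_reduce` are made up for the example.

```python
RULES = [("E", ["E", "+", "(", "E", ")"]), ("E", ["i"])]


def shift_reduce(tokens):
    """Return (accepted, trace) for the grammar E -> E + ( E ) | i."""
    stack, rest, trace = [], list(tokens) + ["$"], []
    while True:
        for lhs, rhs in RULES:
            if stack[-len(rhs):] == rhs:
                # Reduce: pop the right-hand side, push the left-hand side.
                del stack[-len(rhs):]
                stack.append(lhs)
                trace.append(f"reduce {lhs} -> {' '.join(rhs)}")
                break
        else:
            if rest[0] == "$":
                break  # nothing to reduce and no input left
            # Shift: move the next token to the top of the stack.
            stack.append(rest.pop(0))
            trace.append(f"shift {stack[-1]}")
    return stack == ["E"], trace


accepted, trace = shift_reduce(["i", "+", "(", "i", ")", "+", "(", "i", ")"])
print(accepted)
print("\n".join(trace))
```

A table-driven LR parser replaces the "does the top of the stack match a right-hand side" test with a lookup indexed by the automaton state on top of the stack and the next token.
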
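Pushdown automaton
[Diagram: the input tape holds u t w $; the control unit, driven by the parser table, reads the input and operates on a stack whose bottom is $ and whose top is V]
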
LR parsing table
- Rows: states 0, 1, ...
- Columns: terminals (shift/reduce actions) and non-terminals (goto part)
- Entries:
  - sn: shift and move to state n
  - rk: reduce by rule k
  - gm: goto state m

Parsing table example
- Grammar:
  S → E $
  E → T
  E → E + T
  T → i
  T → ( E )
- Table (columns i + ( ) $ are actions, columns E and T are gotos):
  STATE   i     +     (     )     $     E    T
  0       s5    err   s7    err   err   1    6
  1       err   s3    err   err   s2
  2       accept
  3       s5    err   s7    err   err        4
  4       reduce E → E + T
  5       reduce T → i
  6       reduce E → T
  7       s5    err   s7    err   err   8    6
  8       err   s3    err   s9    err
  9       reduce T → ( E )

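The table on this slide can drive a small parser directly. The sketch below encodes the shift entries, the reduce states and the goto part as Python dictionaries; the encoding and the names (`ACTION`, `REDUCE`, `GOTO`, `lr_parse`) are assumptions for illustration, and missing entries play the role of err. Only states are kept on the stack.

```python
# Shift entries: state -> {token: next state}; anything missing is an error.
ACTION = {
    0: {"i": 5, "(": 7},
    1: {"+": 3, "$": 2},
    3: {"i": 5, "(": 7},
    7: {"i": 5, "(": 7},
    8: {"+": 3, ")": 9},
}
# Reduce states: state -> (left-hand side, length of right-hand side).
REDUCE = {4: ("E", 3), 5: ("T", 1), 6: ("E", 1), 9: ("T", 3)}
# Goto part: state -> {non-terminal: next state}.
GOTO = {0: {"E": 1, "T": 6}, 3: {"T": 4}, 7: {"E": 8, "T": 6}}
ACCEPT = 2


def lr_parse(tokens):
    """Return True if tokens (without the final $) are accepted."""
    stack = [0]
    rest = list(tokens) + ["$"]
    while True:
        state = stack[-1]
        if state == ACCEPT:
            return True
        if state in REDUCE:
            lhs, length = REDUCE[state]
            del stack[-length:]                 # pop one state per RHS symbol
            stack.append(GOTO[stack[-1]][lhs])  # follow the goto entry
            continue
        target = ACTION[state].get(rest[0])
        if target is None:
            return False                        # err entry
        stack.append(target)                    # shift
        rest.pop(0)


print(lr_parse(["i", "+", "(", "i", ")"]))
print(lr_parse(["i", "i"]))
```

Because this is an LR(0) table, a reduce state reduces without looking at the next token, which is why `REDUCE` is indexed by state alone.
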
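Items
- Items indicate the position inside a rule:
  - LR(0) items are of the form A → α • β
  - (LR(1) items are of the form A → α • β, t, where t is a look-ahead token)
- Grammar:
  S → E $
  E → T
  E → E + T
  T → i
  T → ( E )
- LR(0) items:
  1: S → • E $
  2: S → E • $
  3: S → E $ •
  4: E → • T
  5: E → T •
  6: E → • E + T
  7: E → E • + T
  8: E → E + • T
  9: E → E + T •
  10: T → • i
  11: T → i •
  12: T → • ( E )
  13: T → ( • E )
  14: T → ( E • )
  15: T → ( E ) •
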
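Automaton states
- Items in each state (item numbers as in the LR(0) item list):
  State 0: 1: S → • E $; 4: E → • T; 6: E → • E + T; 10: T → • i; 12: T → • ( E )
  State 1: 2: S → E • $; 7: E → E • + T
  State 2: 3: S → E $ •
  State 3: 8: E → E + • T; 10: T → • i; 12: T → • ( E )
  State 4: 9: E → E + T •
  State 5: 11: T → i •
  State 6: 5: E → T •
  State 7: 13: T → ( • E ); 4: E → • T; 6: E → • E + T; 10: T → • i; 12: T → • ( E )
  State 8: 14: T → ( E • ); 7: E → E • + T
  State 9: 15: T → ( E ) •
- Transitions:
  From state 0: E to 1, T to 6, i to 5, ( to 7
  From state 1: $ to 2, + to 3
  From state 3: T to 4, i to 5, ( to 7
  From state 7: E to 8, T to 6, i to 5, ( to 7
  From state 8: ) to 9, + to 3
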
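The states of this automaton are what the standard closure and goto operations on LR(0) items produce. The following sketch is an illustration with assumed names and encodings: an item is a pair (rule index, dot position), and the states are discovered in an order that need not match the numbering on the slide.

```python
GRAMMAR = [
    ("S", ("E", "$")),
    ("E", ("T",)),
    ("E", ("E", "+", "T")),
    ("T", ("i",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add X -> . gamma for every non-terminal X that appears right after a dot."""
    result = set(items)
    work = list(items)
    while work:
        rule, dot = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)


def goto(state, symbol):
    """Move the dot over symbol in every item that allows it, then close."""
    moved = {(r, d + 1) for r, d in state
             if d < len(GRAMMAR[r][1]) and GRAMMAR[r][1][d] == symbol}
    return closure(moved)


def build_states():
    """Return (states, edges), where edges maps (state index, symbol) to a state index."""
    start = closure({(0, 0)})
    states, edges, work = [start], {}, [start]
    while work:
        state = work.pop()
        symbols = {GRAMMAR[r][1][d] for r, d in state if d < len(GRAMMAR[r][1])}
        for sym in symbols:
            target = goto(state, sym)
            if target not in states:
                states.append(target)
                work.append(target)
            edges[(states.index(state), sym)] = states.index(target)
    return states, edges


states, edges = build_states()
print(len(states), "states,", len(edges), "transitions")
```

States in which the dot is at the end of a rule are the reduce states of the parsing table; all other items contribute shift or goto entries.
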
Identifying handles
- Create a finite state automaton over grammar symbols
  - Sets of LR(0) items
- Use automaton to build parser tables
  - shift: for items A → α • t β on token t
  - reduce: for items A → α • on every token
- Any grammar has
  - Transition diagram
  - GOTO table
- Not every grammar has deterministic action table
- When no conflicts occur use a DPDA which pushes states on the stack

Non-LR(0) grammars
- When conflicts occur the grammar is not LR(0)
  - Parsing table contains non-determinism
  - shift-reduce conflicts
  - reduce-reduce conflicts
  - shift-shift conflicts?
- Known cases
  - Operator precedence
  - Operator associativity
  - Dangling if-then-else
  - Unary minus
- Solutions
  - Develop equivalent non-ambiguous grammar
  - Patch parsing table to shift/reduce
  - Precedence and associativity of tokens
  - Stronger parser algorithm: SLR/LR(1)/LALR(1)

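Precedence and associativity
- Precedence: with E + E on the stack and * as the next token, the parser can either
  - Reduce by E → E + E: + precedes *
  - Shift *: * precedes +
- Example: 1 + 2 * 3
  - Reduce first: (1 + 2) * 3 = 9, parse tree E[ E[ E[1] + E[2] ] * E[3] ]
  - Shift first: 1 + (2 * 3) = 7, parse tree E[ E[1] + E[ E[2] * E[3] ] ]
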
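Precedence and associativity
- Precedence: with E + E on the stack and * as the next token
  - Reduce by E → E + E: + precedes *
  - Shift *: * precedes +
- Associativity: with E + E on the stack and + as the next token
  - Shift +: + is right-associative
  - Reduce by E → E + E: + is left-associative
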
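The shift-or-reduce decision described here fits in one small function. The sketch below is illustrative only: it uses two stacks (operands and operators) instead of an LR automaton, handles just the form num op num op ..., and the names `resolve` and `parse` as well as the numeric precedence levels are assumptions made for the example.

```python
def resolve(stack_op, lookahead, prec, assoc):
    """Decide the conflict 'E stack_op E' on the stack with lookahead as next token."""
    if prec[stack_op] > prec[lookahead]:
        return "reduce"
    if prec[stack_op] < prec[lookahead]:
        return "shift"
    # Equal precedence: associativity decides.
    return "reduce" if assoc[stack_op] == "left" else "shift"


def parse(tokens, prec, assoc):
    """Parse num (op num)* into a nested (left, op, right) tuple."""
    operands, operators = [], []

    def reduce():
        op = operators.pop()
        right, left = operands.pop(), operands.pop()
        operands.append((left, op, right))

    for tok in tokens:
        if isinstance(tok, int):
            operands.append(tok)   # shift an operand
        else:
            while operators and resolve(operators[-1], tok, prec, assoc) == "reduce":
                reduce()
            operators.append(tok)  # shift the operator
    while operators:
        reduce()
    return operands[0]


left = {"+": "left", "*": "left"}
right = {"+": "right", "*": "right"}
print(parse([1, "+", 2, "*", 3], {"+": 1, "*": 2}, left))  # * precedes +
print(parse([1, "+", 2, "*", 3], {"+": 2, "*": 1}, left))  # + precedes *
print(parse([1, "+", 2, "+", 3], {"+": 1, "*": 2}, left))
print(parse([1, "+", 2, "+", 3], {"+": 1, "*": 2}, right))
```

Declaring precedence and associativity for tokens lets the ambiguous grammar stay as it is while the parser still picks a single tree.
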
Dangling else/if-else ambiguity
- Grammar:
  S → if E then S else S
  S → if E then S
  S → other
- Input: if a then if b then e1 else e2
- Which interpretation should we use?
  (1) if a then { if b then e1 else e2 } (standard interpretation)
  (2) if a then { if b then e1 } else e2
- Shift/reduce conflict between the LR(1) items:
  S → if E then S •, token: else
  S → if E then S • else S, token: (any)

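Resolving the conflict in favour of shift attaches each else to the nearest unmatched if, which is interpretation (1). The sketch below shows that behaviour with a small recursive descent function rather than an LR parser; treating E and "other" statements as single tokens, the tuple-based tree shape, and the name `parse_stmt` are all simplifying assumptions for the example.

```python
def parse_stmt(tokens, pos=0):
    """Parse S -> if E then S else S | if E then S | other.

    Returns (tree, next position). An else is consumed as soon as it is
    seen, which corresponds to choosing shift in the conflict.
    """
    def at(i):
        return tokens[i] if i < len(tokens) else "$"

    if at(pos) == "if":
        cond = at(pos + 1)  # E is assumed to be a single token
        if at(pos + 2) != "then":
            raise SyntaxError("expected 'then'")
        then_branch, pos = parse_stmt(tokens, pos + 3)
        if at(pos) == "else":  # prefer shift: bind else to this if
            else_branch, pos = parse_stmt(tokens, pos + 1)
            return ("if", cond, then_branch, else_branch), pos
        return ("if", cond, then_branch), pos
    if at(pos) in ("$", "then", "else"):
        raise SyntaxError(f"unexpected {at(pos)!r}")
    return at(pos), pos + 1  # S -> other


tree, _ = parse_stmt("if a then if b then e1 else e2".split())
print(tree)
```

The printed tree nests the if-else inside the outer if, so the outer if has no else branch.
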
See you next week

Grammar hierarchy
[Diagram of nested grammar classes: LR(0) ⊂ SLR(1) ⊂ LALR(1) ⊂ Non-ambiguous CFG; LL(1) ⊂ Non-ambiguous CFG]