UMBC Introduction to Compilers CMSC 431 Shon Vick 01/28/02
What is a compiler?
• Translates source code to target code
  – Source code is typically a high-level programming language (Java, C++, etc.), but it does not have to be
  – Target code is often a low-level language such as assembly or machine code, but it does not have to be
• Can you think of other compilers that you have used, according to this definition?

Other Compilers
• Javadoc -> HTML
• SQL query output -> table
• PostScript -> PDF
• High-level description of a circuit -> machine instructions to fabricate the circuit

The Compilation Process

The Analysis Stage
• Broken up into four phases
  – Lexical Analysis (also called scanning or tokenization)
  – Parsing
  – Semantic Analysis
  – Intermediate Code Generation

Lexing Example
Source:
    double d1;
    double d2;
    d2 = d1 * 2.0;

Lexemes and their tokens:
    double   TOK_DOUBLE        reserved word
    d1       TOK_ID            variable name
    ;        TOK_PUNCT         has value of ";"
    (double d2 ; produces the same three tokens as the first declaration)
    d2       TOK_ID            variable name
    =        TOK_OPER          has value of "="
    d1       TOK_ID            variable name
    *        TOK_OPER          has value of "*"
    2.0      TOK_FLOAT_CONST   has value of 2.0
    ;        TOK_PUNCT         has value of ";"

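The lexing step can be sketched as a small regular-expression scanner. This is only an illustration, not the course's scanner: the token names come from the slide, while the regular expressions, the `TOKEN_SPEC` table, and the `tokenize` function are assumptions made for this example.

```python
import re

# Token names follow the slide; the patterns themselves are assumptions.
# Order matters: the reserved word must be tried before the identifier rule.
TOKEN_SPEC = [
    ("TOK_DOUBLE",      r"\bdouble\b"),
    ("TOK_FLOAT_CONST", r"\d+\.\d+"),
    ("TOK_ID",          r"[A-Za-z_]\w*"),
    ("TOK_OPER",        r"[=*+\-/]"),
    ("TOK_PUNCT",       r";"),
    ("SKIP",            r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (token, lexeme) pairs for the source string."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":          # whitespace separates lexemes only
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for token, lexeme in tokenize("double d1; double d2; d2 = d1 * 2.0;"):
        print(f"{lexeme:8} {token}")
```

Listing the reserved word before the identifier pattern is the design choice that keeps `double` from being scanned as a `TOK_ID`.
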
Syntax and Semantics
• Syntax: the form or structure of the expressions
  – whether an expression is well formed
• Semantics: the meaning of an expression

Syntactic Structure
• Syntax is almost always expressed using some variant of a notation called a context-free grammar (CFG), or simply a grammar
  – BNF
  – EBNF

A CFG has 4 parts
• A set of tokens (lexemes), known as terminal symbols
• A set of non-terminals
• A set of rules (productions), where each production consists of a left-hand side (LHS) and a right-hand side (RHS). The LHS is a non-terminal and the RHS is a sequence of terminal and/or non-terminal symbols.
• A special non-terminal symbol designated as the start symbol

An example of BNF syntax for real numbers
    <r>  ::= ...
    <ds> ::= <d> | <d> <ds>
    <d>  ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Notation:
    <>   encloses non-terminal symbols
    ::=  'is' or 'is made up of' or 'derives' (sometimes denoted with an arrow ->)
    |    or

Example
• On the example from the previous slide:
  – What are the tokens?
  – What are the lexemes?
  – What are the non-terminals?
  – What are the productions?

BNF Points
• A non-terminal can have more than one RHS; the alternatives can be written as separate productions or joined with | (OR)
• Lists or sequences are expressed via recursion
• A derivation is a repeated application of productions (rules)
• Examples

Example Grammar
    <program> -> <stmts>
    <stmts>   -> <stmt> | <stmt> ; <stmts>
    <stmt>    -> <var> = <expr>
    <var>     -> a | b | c | d
    <expr>    -> <term> + <term> | <term>
    <term>    -> <var> | const

Example Derivation
    <program> => <stmts>
              => <stmt>
              => <var> = <expr>
              => a = <expr>
              => a = <term> + <term>
              => a = <var> + <term>
              => a = b + <term>
              => a = b + const

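A derivation like the one above can be replayed mechanically by rewriting the leftmost non-terminal one production at a time. The sketch below is an illustration under assumptions: `GRAMMAR` is a transcription of the example grammar as it reads on the slide (the second `<expr>` alternative is uncertain and is not used here), and `leftmost_derivation` is a hypothetical helper name.

```python
# Each non-terminal maps to its list of alternatives (right-hand sides).
GRAMMAR = {
    "<program>": [["<stmts>"]],
    "<stmts>":   [["<stmt>"], ["<stmt>", ";", "<stmts>"]],
    "<stmt>":    [["<var>", "=", "<expr>"]],
    "<var>":     [["a"], ["b"], ["c"], ["d"]],
    "<expr>":    [["<term>", "+", "<term>"], ["<term>"]],  # 2nd alternative: assumption
    "<term>":    [["<var>"], ["const"]],
}
START = "<program>"


def leftmost_derivation(choices):
    """Apply one production per choice to the leftmost non-terminal.

    Each choice is the index of the alternative to use. Returns every
    sentential form, beginning with the start symbol. Raises StopIteration
    if a choice is supplied when no non-terminal remains.
    """
    form = [START]
    forms = [" ".join(form)]
    for choice in choices:
        index = next(i for i, symbol in enumerate(form) if symbol in GRAMMAR)
        form = form[:index] + GRAMMAR[form[index]][choice] + form[index + 1:]
        forms.append(" ".join(form))
    return forms


if __name__ == "__main__":
    # Reproduces the derivation of: a = b + const
    for step in leftmost_derivation([0, 0, 0, 0, 0, 0, 1, 1]):
        print("=>", step)
```

Every printed line differs from the previous one by exactly one production application, which is what makes the sequence a derivation.
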
Parse Trees
• Alternative representation for a derivation
• Example parse tree for the statement derived in the previous example:

              stmt
            /  |   \
         var   =    expr
          |       /  |   \
          a   term   +   term
                |          |
               var       const
                |
                b

Another Example
    Expression -> Expression + Expression
                | Expression - Expression
                | ...
                | Variable
                | Constant
                | ...
    Variable   -> T_IDENTIFIER
    Constant   -> T_INTCONSTANT | T_DOUBLECONSTANT

The Parse of a+2
    Expression -> Expression + Expression
               -> Variable + Expression
               -> T_IDENTIFIER + Constant
               -> T_IDENTIFIER + T_INTCONSTANT

Parse Trees
    PS -> P PS
        | ε
    P  -> '(' PS ')'
        | '<' PS '>'
        | '[' PS ']'

What's the parse tree for this statement?
    < [][<>]>

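One way to explore the question is a small recursive-descent parser with one function per non-terminal. This is a sketch under assumptions: the grammar is read as PS -> P PS | empty and P -> one bracket pair around PS, the tree is encoded as nested tuples, and the names `parse_ps`, `parse_p`, and `parse` are hypothetical.

```python
PAIRS = {"(": ")", "<": ">", "[": "]"}   # opener -> matching closer


def parse_ps(text, pos=0):
    """PS -> P PS | empty. Returns (tree, position after the match)."""
    if pos < len(text) and text[pos] in PAIRS:
        p_tree, pos = parse_p(text, pos)
        ps_tree, pos = parse_ps(text, pos)
        return ("PS", p_tree, ps_tree), pos
    return ("PS", "e"), pos              # the empty production


def parse_p(text, pos):
    """P -> '(' PS ')' | '<' PS '>' | '[' PS ']'."""
    opener = text[pos]
    closer = PAIRS[opener]
    inner, pos = parse_ps(text, pos + 1)
    if pos >= len(text) or text[pos] != closer:
        raise SyntaxError(f"expected {closer!r} at offset {pos}")
    return ("P", opener, inner, closer), pos + 1


def parse(text):
    """Parse a whole string; spaces are dropped because the grammar has none."""
    text = text.replace(" ", "")
    tree, pos = parse_ps(text)
    if pos != len(text):
        raise SyntaxError(f"unexpected {text[pos]!r} at offset {pos}")
    return tree


if __name__ == "__main__":
    from pprint import pprint
    pprint(parse("< [][<>]>"))
```

Each nested tuple corresponds to one interior node of the parse tree, and every `("PS", "e")` leaf marks a use of the empty production.
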
EBNF - Extended BNF
• Like BNF except that
  – Non-terminals start with an uppercase letter
  – Parentheses are used for grouping terminals
  – Braces {} represent zero or more occurrences (iteration)
  – Brackets [] represent an optional construct, that is, a construct that appears either once or not at all

EBNF example
    Exp    -> Term { ('+' | '-') Term }
    Term   -> Factor { ('*' | '/') Factor }
    Factor -> '(' Exp ')' | variable | constant

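EBNF maps quite directly onto a recursive-descent parser: each non-terminal becomes a method and each `{ ... }` becomes a loop. The sketch below evaluates expressions while parsing them. It is an illustration under assumptions: the token patterns, the `Parser` class, the `evaluate` function, and the convention that variables are looked up in a dictionary are all hypothetical choices for this example.

```python
import re


def tokenize(text):
    """Split text into numbers, names, operators, and parentheses.
    Characters that match none of these are silently ignored in this sketch."""
    return re.findall(r"\d+\.\d+|\d+|[A-Za-z_]\w*|[-+*/()]", text)


class Parser:
    def __init__(self, text, variables=None):
        self.tokens = tokenize(text)
        self.pos = 0
        self.variables = variables or {}

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def exp(self):
        # Exp -> Term { ('+' | '-') Term }
        value = self.term()
        while self.peek() in ("+", "-"):      # the braces become a loop
            op = self.take()
            right = self.term()
            value = value + right if op == "+" else value - right
        return value

    def term(self):
        # Term -> Factor { ('*' | '/') Factor }
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            right = self.factor()
            value = value * right if op == "*" else value / right
        return value

    def factor(self):
        # Factor -> '(' Exp ')' | variable | constant
        token = self.take()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            value = self.exp()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token[0].isdigit():
            return float(token)
        if token[0].isalpha() or token[0] == "_":
            return self.variables[token]
        raise SyntaxError(f"unexpected token {token!r}")


def evaluate(text, variables=None):
    parser = Parser(text, variables)
    value = parser.exp()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

Because `Term` is parsed inside `Exp`, multiplication and division bind tighter than addition and subtraction, and the loops make all four operators left-associative.
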
EBNF/BNF
• EBNF and BNF are equivalent
• How can {} be expressed in BNF?
• How can ( ) be expressed?
• How can [ ] be expressed?

Semantic Analysis
• The syntactically correct parse tree (or derivation) is checked for semantic errors
• Checks for constructs that, while syntactically valid, do not obey the semantic rules of the source language
• Examples:
  – Use of an undeclared or uninitialized variable
  – Function called with improper arguments
  – Incompatible operands and type mismatches

Examples
    int i; int j;
    i = i + 2;

    void fun1(int i);
    double d;
    d = fun1(2.1);

    int arr[2], c;
    c = arr * 10;

Most semantic analysis pertains to the checking of types.

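The three examples above can be checked by a very small type checker that walks an expression and consults a symbol table. This is a sketch under assumptions, not a real C checker: expressions are encoded as nested tuples, types are plain strings, functions are stored as `(return_type, [parameter_types])`, and argument types must match exactly (C itself would convert `2.1` to `int`). The names `type_of` and `check_assign` are hypothetical.

```python
NUMERIC = ("int", "double")


def type_of(expr, symbols, initialized, errors):
    """Return the type of expr, or None if an error made it unknown."""
    if isinstance(expr, int):
        return "int"
    if isinstance(expr, float):
        return "double"
    if isinstance(expr, str):                        # variable reference
        if expr not in symbols:
            errors.append(f"undeclared variable {expr}")
            return None
        if expr not in initialized:
            errors.append(f"variable {expr} used before initialization")
        return symbols[expr]
    if expr[0] == "call":                            # ("call", name, [args])
        _, name, args = expr
        return_type, param_types = symbols[name]
        arg_types = [type_of(arg, symbols, initialized, errors) for arg in args]
        if arg_types != param_types:                 # strict matching: assumption
            errors.append(f"{name} expects {param_types}, got {arg_types}")
        return return_type
    op, left, right = expr                           # ("+", l, r) or ("*", l, r)
    left_type = type_of(left, symbols, initialized, errors)
    right_type = type_of(right, symbols, initialized, errors)
    if left_type is None or right_type is None:
        return None
    if left_type not in NUMERIC or right_type not in NUMERIC:
        errors.append(f"operator {op} cannot be applied to {left_type} and {right_type}")
        return None
    return "double" if "double" in (left_type, right_type) else "int"


def check_assign(target, expr, symbols, initialized):
    """Return the semantic errors found in `target = expr`."""
    errors = []
    expr_type = type_of(expr, symbols, initialized, errors)
    target_type = symbols.get(target)
    if target_type is None:
        errors.append(f"undeclared variable {target}")
    elif expr_type is not None:
        widening = (target_type, expr_type) == ("double", "int")
        if expr_type != target_type and not widening:
            errors.append(f"cannot assign {expr_type} to {target_type} variable {target}")
    initialized.add(target)
    return errors


if __name__ == "__main__":
    print(check_assign("i", ("+", "i", 2), {"i": "int", "j": "int"}, set()))
    print(check_assign("d", ("call", "fun1", [2.1]),
                       {"fun1": ("void", ["int"]), "d": "double"}, set()))
    print(check_assign("c", ("*", "arr", 10), {"arr": "int[]", "c": "int"}, {"arr"}))
```

All three statements are syntactically valid, so the parser accepts them; only the symbol table and type rules reveal the problems.
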
Intermediate Code Generation
• Where the intermediate representation of the source program is created
• The representation can have a variety of forms, but a common one is called three-address code (TAC)
• Like assembly, TAC is a sequence of simple instructions, each of which can have at most three operands

Example
Source:
    a = b * c + b * d

Three-address code:
    _t1 = b * c
    _t2 = b * d
    _t3 = _t1 + _t2
    a = _t3

Note the temporaries (_t1, _t2, _t3).

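The translation above can be sketched as a post-order walk over an expression tree that introduces one temporary per operator. This is a minimal illustration, assuming the expression is already parsed into nested `(operator, left, right)` tuples; `generate_tac` is a hypothetical name, and the `_tN` naming simply mirrors the slide.

```python
def generate_tac(target, expr):
    """Return three-address code lines for `target = expr`.

    expr is either a variable name (str) or a tuple (op, left, right).
    """
    code = []
    counter = 0

    def emit(node):
        """Emit code for node and return the name that holds its value."""
        nonlocal counter
        if isinstance(node, str):          # a plain variable needs no code
            return node
        op, left, right = node
        left_name = emit(left)             # operands first (post-order)
        right_name = emit(right)
        counter += 1
        temp = f"_t{counter}"
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    result = emit(expr)
    code.append(f"{target} = {result}")
    return code


if __name__ == "__main__":
    tree = ("+", ("*", "b", "c"), ("*", "b", "d"))   # b * c + b * d
    print("\n".join(generate_tac("a", tree)))
```

Each emitted instruction names at most three addresses: two operands and one result.
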
Another Example
Source:
    if (a <= b)
        a = a - c;
    c = b * c;

Three-address code:
    _t1 = a > b
    if _t1 goto L0
    _t2 = a - c
    a = _t2
    L0: _t3 = b * c
    c = _t3

Note the temporaries (_t1, _t2, _t3) and the symbolic address (L0).

Next Time
• Finish introduction to compilation stages
• Read Aho/Sethi/Ullman Chapter 1

Selected References
• Compilers: Principles, Techniques, and Tools, Aho, Sethi, and Ullman
• http://www.stanford.edu/class/cs143/