Fall 2014-2015
Compiler Principles
Lecture 7: Intermediate Representation
Roman Manevich
Ben-Gurion University

Tentative syllabus
• Front End: Scanning, Top-down Parsing (LL), Bottom-up Parsing (LR), Attribute Grammars
• Intermediate Representation: Lowering
• Optimizations: Local Optimizations, Dataflow Analysis, Loop Optimizations
• Code Generation: Register Allocation, Instruction Selection
• mid-term exam

Previously
• Becoming parsing ninjas
  – Going from text to an Abstract Syntax Tree
Image credit: By Admiral Ham [GFDL (http://www.gnu.org/copyleft/fdl.html) or CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons

From scanning to parsing
• Program text: 59 + (1257 * xPosition)
• Lexical Analyzer: turns the program text into a token stream: num + ( num * id )
• Parser: checks the token stream against the grammar; the outcome is either valid or a syntax error
• Abstract Syntax Tree: + with children num and *, where * has children num and x
Grammar:
    E → id
    E → num
    E → E + E
    E → E * E
    E → ( E )

Agenda
• Why compilers need Intermediate Representations (IR)
• Translating from abstract syntax (AST) to IR
  – Three-Address Code

Role of intermediate representation
• Bridge between front-end and back-end
    High-level Language → Lexical Analysis → Syntax Analysis (Parsing) → AST, Symbol Table, etc. → Inter. Rep. (IR) → Code Generation → Executable Code (scheme)
• Allows implementing optimizations independently of the source language and the executable (target) language

Motivation for intermediate representation

Intermediate representation
• A language that is between the source language and the target language
  – Not specific to any source language or machine language
• Goal 1: retargeting compiler components for different source languages/target machines
• Goal 2: machine-independent optimizer
  – Narrow interface: small number of node types (instructions)
    Java, C++, Python → (Lowering) → IR (optimize) → (Code Gen.) → Pentium, Java bytecode, Sparc

Multiple IRs
• Some optimizations require high-level structure
• Others are more appropriate on low-level code
• Solution: use multiple IR stages
    AST → HIR → LIR → Pentium, Java bytecode, Sparc (with "optimize" applied at the IR stages)

Multiple IRs example
Elixir – a language for parallel graph algorithms
[Diagram] Pipeline: Elixir Program → Delta Inferencer → Elixir Program + delta → HIR Lowering → HIR → IL Synthesizer → LIR → C++ backend → C++ code, together with the Galois Library
[Diagram] Side tools: Query/Answer exchange with Automated Reasoning (Boogie+Z3); Planning Problem/Plan exchange with an Automated Planner
Mini-project on parallel graph algorithms

AST vs. LIR for imperative languages
AST
• Rich set of language constructs
• Rich type system
• Declarations: types (classes, interfaces), functions, variables
• Control flow statements: if-then-else, while-do, break-continue, switch, exceptions
• Data statements: assignments, array access, field access
• Expressions: variables, constants, arithmetic operators, logical operators, function calls
LIR
• An abstract machine language
• Very limited type system
• Only computation-related code
• Labels and conditional/unconditional jumps, no looping
• Data movements, generic memory access statements
• No sub-expressions, logical as numeric, temporaries, constants, function calls – explicit argument passing

Lowering to three address code

Three-Address Code IR
• A popular form of IR (Chapter 8)
• High-level assembly where instructions have at most three operands
• There exist other types of IR
  – For example, IR based on acyclic graphs – more amenable to analysis and optimizations

TAC sub-expressions example
Source:
    int a; int b; int c; int d;
    a := b + c + d;
    b := a * a + b * b;
LIR (unoptimized):
    _t0 := b + c;
    a := _t0 + d;
    _t1 := a * a;
    _t2 := b * b;
    b := _t1 + _t2;
• Where have the declarations gone?
• Temporaries explicitly store intermediate values resulting from sub-expressions

Elements of TAC-based IR

Variable assignments
• var := constant;
• var1 := var2 op var3;
• var1 := constant op var2;
• var1 := var2 op constant;
• var := constant1 op constant2;
Permitted operators are +, -, *, /, %

Booleans
• Boolean variables are represented as integers that have zero or nonzero values
• In addition to the arithmetic operators, TAC supports <, ==, ||, and &&
• How might you compile the following?
    b := (x <= y);
  One translation:
    _t0 := x < y;
    _t1 := x == y;
    b := _t0 || _t1;
  An alternative that uses addition in place of ||:
    _t0 := x < y;
    _t1 := x == y;
    b := _t0 + _t1;
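The two translations of b := (x <= y) can be checked against each other with a small Python sketch. The helper names (lt, eq, le_with_or, le_with_add, agree_on) are illustrative assumptions and not part of the lecture's TAC; the sketch also assumes that < and == always produce exactly 0 or 1.

```python
# Model TAC booleans as integers: 0 is false, any nonzero value is true.
def lt(x, y):
    return 1 if x < y else 0            # _t0 := x < y;

def eq(x, y):
    return 1 if x == y else 0           # _t1 := x == y;

def le_with_or(x, y):
    t0 = lt(x, y)
    t1 = eq(x, y)
    return 1 if (t0 != 0 or t1 != 0) else 0   # b := _t0 || _t1;

def le_with_add(x, y):
    t0 = lt(x, y)
    t1 = eq(x, y)
    return t0 + t1                      # b := _t0 + _t1;

def agree_on(values):
    """Return True if both translations match Python's <= on every pair."""
    for x in values:
        for y in values:
            expected = x <= y
            if (le_with_or(x, y) != 0) != expected:
                return False
            if (le_with_add(x, y) != 0) != expected:
                return False
    return True

print(agree_on(range(-3, 4)))
```

The addition variant works here because x < y and x == y can never hold at the same time, so the sum is either 0 or 1.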

Unary operators
• How would you compile the following assignments with unary operators?
    y := -x;    can be compiled as    y := 0 - x;    or as    y := -1 * x;
    z := !w;    can be compiled as    z := w == 0;
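As a quick sanity check, the unary rewrites can be mirrored in Python. The function names are hypothetical, and booleans are again modeled as 0 and nonzero integers.

```python
def neg_by_subtraction(x):
    return 0 - x                  # y := 0 - x;

def neg_by_multiplication(x):
    return -1 * x                 # y := -1 * x;

def logical_not(w):
    return 1 if w == 0 else 0     # z := w == 0;

for x in range(-3, 4):
    assert neg_by_subtraction(x) == -x
    assert neg_by_multiplication(x) == -x
```

Note that logical_not maps every nonzero value to 0, which matches the zero/nonzero encoding of booleans.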

Control flow instructions
• Label introduction
    _label_name:
  Indicates a point in the code that can be jumped to
• Unconditional jump: go to instruction following label L
    Goto L;
• Conditional jump: test condition variable t; if 0, jump to label L
    IfZ t Goto L;
• Similarly: test condition variable t; if nonzero, jump to label L
    IfNZ t Goto L;

Control-flow example – conditions
Source:
    int x; int y; int z;
    if (x < y)
        z := x;
    else
        z := y;
    z := z * z;
TAC:
    _t0 := x < y;
    IfZ _t0 Goto _L0;
    z := x;
    Goto _L1;
    _L0:
    z := y;
    _L1:
    z := z * z;

Control-flow example – loops
Source:
    int x; int y;
    while (x < y) {
        x := x * 2;
    }
    y := x;
TAC:
    _L0:
    _t0 := x < y;
    IfZ _t0 Goto _L1;
    x := x * 2;
    Goto _L0;
    _L1:
    y := x;
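To see how labels and jumps replace structured loops, here is a minimal sketch of a TAC interpreter in Python that runs the loop example. The tuple encoding of instructions and the name run_tac are assumptions made for illustration only; the lecture does not define an interpreter.

```python
def run_tac(program, env):
    """Execute a list of TAC instructions, updating and returning env.

    Hypothetical encodings:
      ("label", name)                  ("goto", name)
      ("ifz", var, name)               jump if var == 0
      ("ifnz", var, name)              jump if var != 0
      ("copy", dst, src)               ("binop", dst, lhs, op, rhs)
    Operands are variable names (str) or integer constants.
    """
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "<": lambda a, b: int(a < b),
        "==": lambda a, b: int(a == b),
    }
    # Map each label name to the index of its marker instruction.
    labels = {ins[1]: i for i, ins in enumerate(program) if ins[0] == "label"}

    def value(operand):
        return env[operand] if isinstance(operand, str) else operand

    pc = 0
    while pc < len(program):
        ins = program[pc]
        kind = ins[0]
        pc += 1
        if kind == "goto":
            pc = labels[ins[1]]
        elif kind == "ifz":
            if value(ins[1]) == 0:
                pc = labels[ins[2]]
        elif kind == "ifnz":
            if value(ins[1]) != 0:
                pc = labels[ins[2]]
        elif kind == "copy":
            env[ins[1]] = value(ins[2])
        elif kind == "binop":
            _, dst, lhs, op, rhs = ins
            env[dst] = ops[op](value(lhs), value(rhs))
        # A "label" is only a marker, so nothing to execute.
    return env

loop = [
    ("label", "_L0"),
    ("binop", "_t0", "x", "<", "y"),
    ("ifz", "_t0", "_L1"),
    ("binop", "x", "x", "*", 2),
    ("goto", "_L0"),
    ("label", "_L1"),
    ("copy", "y", "x"),
]

result = run_tac(loop, {"x": 3, "y": 20})
print(result["x"], result["y"])
```

Starting from x = 3 and y = 20, x doubles until it is no longer below y, and then y receives the final value of x. An input with x = 0 and y > 0 would never terminate, exactly as in the source loop.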

Data as control flow
Source:
    x := y || z;
TAC:
    x := y;
    IfZ z Goto _L1;
    x := 1;
    _L1:
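The jump-based encoding of || can be checked over a few values with a short Python sketch. The function name or_by_control_flow is hypothetical, and truth is taken to mean "nonzero".

```python
def or_by_control_flow(y, z):
    x = y            # x := y;
    if z != 0:       # IfZ z Goto _L1;  (skips the assignment when z is zero)
        x = 1        # x := 1;
    return x         # _L1:

def matches_logical_or(values):
    """Compare the truthiness of the result with y || z on all pairs."""
    return all(
        (or_by_control_flow(y, z) != 0) == (y != 0 or z != 0)
        for y in values
        for z in values
    )

print(matches_logical_or([0, 1, 5, -2]))
```

The result is not always 1 when true (it can be whatever nonzero value y holds), which is fine under the zero/nonzero convention for booleans.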

Functions
• Store local variables/temporaries in a stack
• A function call instruction pushes arguments to stack and jumps to the function label
  A statement x := f(a1, …, an); looks like
    Push a1; … Push an;
    Call f;
    Pop x; // copy returned value
• Returning a value is done by pushing it to the stack (return x;)
    Push x;
• Return control to caller (and roll up stack)
    Return;
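A rough Python sketch of this calling discipline, under simplifying assumptions: a single Python list stands in for the stack, a plain Python call models Call/Return (so return addresses and the layout of locals are not modeled), and the callee is assumed to pop its own parameters. The function add3 is hypothetical.

```python
stack = []

def push(value):
    stack.append(value)

def pop():
    return stack.pop()

def add3():
    # Callee: parameters were pushed as a1, a2, a3, so they pop in reverse.
    a3 = pop()
    a2 = pop()
    a1 = pop()
    t0 = a1 + a2          # _t0 := a1 + a2;
    t1 = t0 + a3          # _t1 := _t0 + a3;
    push(t1)              # Push _t1;  (the returned value)
    return                # Return;

# Caller side of  x := add3(1, 2, 3);
push(1)                   # Push a1;
push(2)                   # Push a2;
push(3)                   # Push a3;
add3()                    # Call add3;
x = pop()                 # Pop x;  // copy returned value
print(x, stack)
```

After the call sequence the stack is back to its original height, which is the "roll up" the slide refers to.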

A logical stack frame
Stack frame for function f(a1, …, aN):
• Parameters (actual arguments): Param N, Param N-1, …, Param 1
• Locals and temporaries: _t0, …, _tk, x, …, y

Functions example
Source:
    int SimpleFn(int z) {
        int x, y;
        x := x * y * z;
        return x;
    }
    void main() {
        int w;
        w := SimpleFn(137);
    }
TAC:
    _SimpleFn:
    Pop z;
    _t0 := x * y;
    _t1 := _t0 * z;
    x := _t1;
    Push x;
    Return;
    main:
    _t0 := 137;
    Push _t0;
    Call _SimpleFn;
    Pop w;

Memory access instructions
• Copy instruction: a := b
• Load/store instructions: a := *b    *a := b
• Address of instruction: a := &b
• Array accesses: a := b[i]    a[i] := b
• Field accesses: a := b[f]    a[f] := b    (f: constant)
• Memory allocation: a := alloc(size)    (size: constant)
  – Sometimes left out (e.g., malloc is a procedure in C)

Lowering AST to TAC via syntax directed translation

TAC generation
• At this stage in compilation, we have
  – an AST
  – annotated with scope information
  – and annotated with type information
• To generate TAC for the program, we do recursive tree traversal
  – Generate TAC for any subexpressions and substatements
  – Using the result, generate TAC for the overall expression (bottom-up manner)

TAC generation for expressions
• Define a function cgen(expr) that generates TAC that computes an expression, stores it in a temporary variable, then hands back the name of that temporary
• Define cgen directly for atomic expressions (constants, this, identifiers, etc.)
• Define cgen recursively for compound expressions (binary operators, function calls, etc.)

cgen for basic expressions
    cgen(k) = { // k is a constant
        Choose a new temporary t
        Emit( t := k )
        Return t
    }
    cgen(id) = { // id is an identifier
        Choose a new temporary t
        Emit( t := id )
        Return t
    }

Naive cgen for binary expressions
• Maintain a counter for temporaries in c
• Initially: c = 0
• cgen(e1 op e2) = {
      Let A = cgen(e1)
      c = c + 1
      Let B = cgen(e2)
      c = c + 1
      Emit( _tc := A op B; )
      Return _tc
  }
• The translation emits code to evaluate e1 before e2. Why is that?

Example: cgen for binary expressions
cgen( (a*b)-d ), starting with c = 0

Step 1: apply the rule for binary expressions to (a*b)-d
    cgen( (a*b)-d ) = {
        Let A = cgen(a*b)
        c = c + 1
        Let B = cgen(d)
        c = c + 1
        Emit( _tc := A - B; )
        Return _tc
    }

Step 2: expand cgen(a*b) and the atomic cases
    cgen( (a*b)-d ) = {
        Let A = {
            Let A = { Emit( _tc := a; ), Return _tc }    // here A = _t0
            c = c + 1
            Let B = { Emit( _tc := b; ), Return _tc }
            c = c + 1
            Emit( _tc := A * B; )
            Return _tc
        }                                                // here A = _t2
        c = c + 1
        Let B = { Emit( _tc := d; ), Return _tc }
        c = c + 1
        Emit( _tc := A - B; )
        Return _tc
    }

Code emitted, one instruction per step:
    _t0 := a;
    _t1 := b;
    _t2 := _t0 * _t1;
    _t3 := d;
    _t4 := _t2 - _t3;
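The naive scheme can be sketched in Python to reproduce this trace. The class name NaiveCgen and the tuple encoding (op, e1, e2) for binary expressions are assumptions made for this illustration; atomic expressions are plain strings.

```python
class NaiveCgen:
    def __init__(self):
        self.c = 0        # counter for temporaries, initially 0
        self.code = []    # emitted TAC instructions, in order

    def emit(self, line):
        self.code.append(line)

    def cgen(self, expr):
        """expr is a str (identifier or constant) or a tuple (op, e1, e2)."""
        if isinstance(expr, tuple):
            op, e1, e2 = expr
            a = self.cgen(e1)     # code for e1 is emitted first
            self.c += 1
            b = self.cgen(e2)     # then code for e2
            self.c += 1
            t = f"_t{self.c}"
            self.emit(f"{t} := {a} {op} {b};")
            return t
        # Atomic case: copy the constant or identifier into _tc.
        t = f"_t{self.c}"
        self.emit(f"{t} := {expr};")
        return t

gen = NaiveCgen()
result = gen.cgen(("-", ("*", "a", "b"), "d"))
print("\n".join(gen.code))
```

Because cgen(e1) runs to completion before cgen(e2) starts, all instructions for the left operand are emitted before any instruction for the right operand.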

cgen as recursive AST traversal
cgen(5 + x)
    visit AddExpr (left, right)         emits t := t1 + t2
        visit(left):  Num, val = 5      emits t1 := 5
        visit(right): Ident, name = x   emits t2 := x
Resulting code:
    t1 := 5;
    t2 := x;
    t := t1 + t2;
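One way to organize this traversal is a visitor that dispatches on the node type. The sketch below is an assumption about how such a visitor might look in Python; the node classes reuse the names on the slide (Num, Ident, AddExpr), while TacVisitor, fresh, and the visit_ naming convention are hypothetical. Temporaries are numbered t1, t2, t3, so the result lands in t3 rather than in a temporary named t.

```python
from collections import namedtuple

# AST node types, named after the nodes shown on the slide.
Num = namedtuple("Num", ["val"])
Ident = namedtuple("Ident", ["name"])
AddExpr = namedtuple("AddExpr", ["left", "right"])

class TacVisitor:
    def __init__(self):
        self.code = []
        self.next_temp = 0

    def fresh(self):
        self.next_temp += 1
        return f"t{self.next_temp}"

    def visit(self, node):
        # Dispatch to visit_Num, visit_Ident, or visit_AddExpr.
        method = getattr(self, "visit_" + type(node).__name__)
        return method(node)

    def visit_Num(self, node):
        t = self.fresh()
        self.code.append(f"{t} := {node.val};")
        return t

    def visit_Ident(self, node):
        t = self.fresh()
        self.code.append(f"{t} := {node.name};")
        return t

    def visit_AddExpr(self, node):
        left = self.visit(node.left)      # children first (bottom-up)
        right = self.visit(node.right)
        t = self.fresh()
        self.code.append(f"{t} := {left} + {right};")
        return t

visitor = TacVisitor()
top = visitor.visit(AddExpr(Num(5), Ident("x")))
print("\n".join(visitor.code))
```

Each visit method returns the name of the temporary holding its node's value, which is what lets the parent node refer to its operands.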

cgen for short-circuit disjunction
cgen(e1 || e2)
    Emit( _t1 := 0; )
    Emit( _t2 := 0; )
    Let Lafter be a new label
    Let _t1 = cgen(e1)
    Emit( IfNZ _t1 Goto Lafter )
    Let _t2 = cgen(e2)
    Emit( Lafter: )
    Emit( _t := _t1 || _t2; )
    Return _t
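The sketch below shows short-circuit code generation in Python, using a simplified variation rather than the exact scheme above: instead of zero-initializing two temporaries and combining them with || at the end, it writes each operand into a single result temporary and skips e2 when that result is already nonzero. The class Gen, its helpers, and the tuple encoding ("||", e1, e2) are all assumptions made for illustration.

```python
import itertools

class Gen:
    def __init__(self):
        self.code = []
        self._temps = itertools.count()
        self._labels = itertools.count()

    def new_temp(self):
        return f"_t{next(self._temps)}"

    def new_label(self):
        return f"_L{next(self._labels)}"

    def emit(self, line):
        self.code.append(line)

    def cgen(self, expr):
        if isinstance(expr, tuple) and expr[0] == "||":
            return self.cgen_or(expr[1], expr[2])
        # Atomic case: identifier or constant.
        t = self.new_temp()
        self.emit(f"{t} := {expr};")
        return t

    def cgen_or(self, e1, e2):
        result = self.new_temp()
        after = self.new_label()
        t1 = self.cgen(e1)
        self.emit(f"{result} := {t1};")
        # If e1 is already true, jump over the code for e2.
        self.emit(f"IfNZ {result} Goto {after};")
        t2 = self.cgen(e2)
        self.emit(f"{result} := {t2};")
        self.emit(f"{after}:")
        return result

gen = Gen()
out = gen.cgen(("||", "a", "b"))
print("\n".join(gen.code))
```

The essential point is the same in both schemes: the code for e2 sits between the conditional jump and the label, so it is never executed when e1 evaluates to a nonzero value.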

cgen for statements
• We can extend the cgen function to operate over statements as well
• Unlike cgen for expressions, cgen for statements does not return the name of a temporary holding a value
  – (Why?)

cgen for simple statements
    cgen(expr;) = {
        cgen(expr)
    }

cgen for if-then-else
cgen(if (e) s1 else s2)
    Let _t = cgen(e)
    Let Ltrue be a new label
    Let Lfalse be a new label
    Let Lafter be a new label
    Emit( IfZ _t Goto Lfalse; )
    cgen(s1)
    Emit( Goto Lafter; )
    Emit( Lfalse: )
    cgen(s2)
    Emit( Lafter: )
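A compact Python sketch of this template, assuming the TAC for the condition and for both branches has already been generated as lists of instruction strings. The names cgen_if and new_label are hypothetical. The sketch omits Ltrue, since the template never jumps to it.

```python
import itertools

_label_ids = itertools.count()

def new_label():
    return f"_L{next(_label_ids)}"

def cgen_if(cond_temp, then_code, else_code):
    """Wrap already-generated branch code in the if-then-else template.

    cond_temp: name of the temporary holding the condition value
    then_code, else_code: lists of TAC lines for s1 and s2
    """
    l_false = new_label()
    l_after = new_label()
    return (
        [f"IfZ {cond_temp} Goto {l_false};"]
        + then_code
        + [f"Goto {l_after};", f"{l_false}:"]
        + else_code
        + [f"{l_after}:"]
    )

# if (x < y) z := x; else z := y;  followed by  z := z * z;
code = (
    ["_t0 := x < y;"]
    + cgen_if("_t0", ["z := x;"], ["z := y;"])
    + ["z := z * z;"]
)
print("\n".join(code))
```

The Goto after the then-branch is what keeps execution from falling through into the else-branch.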

cgen for while loops
cgen(while (expr) stmt)
    Let Lbefore be a new label
    Let Lafter be a new label
    Emit( Lbefore: )
    Let t = cgen(expr)
    Emit( IfZ t Goto Lafter; )
    cgen(stmt)
    Emit( Goto Lbefore; )
    Emit( Lafter: )
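The while template can be sketched the same way in Python. Here the caller supplies the labels and the pre-generated TAC for the condition and body; the function name cgen_while and its parameters are assumptions made for illustration.

```python
def cgen_while(cond_code, cond_temp, body_code, l_before, l_after):
    """Wrap already-generated condition and body code in the while template.

    cond_code: TAC lines that compute the condition into cond_temp
    body_code: TAC lines for the loop body
    """
    return (
        [f"{l_before}:"]
        + cond_code                      # re-evaluated on every iteration
        + [f"IfZ {cond_temp} Goto {l_after};"]
        + body_code
        + [f"Goto {l_before};", f"{l_after}:"]
    )

# while (x < y) { x := x * 2; }  followed by  y := x;
code = (
    cgen_while(["_t0 := x < y;"], "_t0", ["x := x * 2;"], "_L0", "_L1")
    + ["y := x;"]
)
print("\n".join(code))
```

Placing the condition code after Lbefore matters: the back edge jumps to the label, so the test is recomputed before each pass through the body.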

Exercise: cgen for try-catch
cgen(try { try-stmt } catch (v) { catch-stmt })
    ?

Next lecture: IR part 2