Compilation 0368-3133, 2014/15a
Lecture 7: Getting into the back-end
Noam Rinetzky

But first, a short reminder
What is a compiler?
  “A compiler is a computer program that transforms source code written in a programming language (source language) into another language (target language). The most common reason for wanting to transform source code is to create an executable program.” -- Wikipedia
Where we were
  Source text (txt)
    → Process text input → characters
    → Lexical Analysis ✓ → tokens
    → Syntax Analysis ✓ → AST
    → Semantic Analysis ✓ → Annotated AST
    → Intermediate code generation → IR
    → Intermediate code optimization → IR
    → Code generation → Symbolic Instructions (SI)
    → Target code optimization → SI
    → Machine code generation → MI
    → Write executable output → Executable code (exe)
  The stages marked ✓ belong to the Front-End; the later stages form the Back-End.
Lexical Analysis
  program text:  ((23 + 7) * x)
  The lexical analyzer turns the program text into a token stream:
      (   (   23   +   7    )   *   x   )
      LP  LP  Num  OP  Num  RP  OP  Id  RP

From scanning to parsing
  The parser checks the token stream against the grammar
      E → ... | Id
      Id → ‘a’ | ... | ‘z’
  and either reports a syntax error or, for valid input, produces an abstract syntax tree:
      Op(*)
        Op(+)
          Num(23)
          Num(7)
        Id(b)
Context Analysis
  Input: the abstract syntax tree (Op(*), Op(+), Num(23), Num(7), Id(b)) and type rules such as
      E1 : int    E2 : int
      --------------------
         E1 + E2 : int
  Output: a semantic error, or a valid AST plus a symbol table.

Code Generation
  Input: the valid abstract syntax tree and the symbol table.
  Verification may report errors/warnings (possibly runtime errors).
  Output: executable code, which maps input to output.
A CPU is (a sort of) an interpreter
• A compiler usually transforms source code in order to create an executable program; the CPU then interprets machine code …
  – Why not an AST?
• Do we want to go from the AST directly to machine code (MC)?
  – We can, but machine code is
    • Machine specific
    • Very low level
Code Generation in Stages
  The valid AST and the symbol table are first translated into an Intermediate Representation (IR); executable code is then generated from the IR.

Where we are
  Done: Lexical Analysis ✓, Syntax Analysis ✓, Semantic Analysis ✓ (producing an annotated AST).
  Next: Intermediate code generation (→ IR), followed by intermediate code optimization, code generation (→ symbolic instructions), target code optimization, machine code generation, and writing the executable output.
Note: Compile Time vs Runtime
• Compile time: data structures used during program compilation
• Runtime: data structures used during program execution
  – Activation record stack
  – Memory management
• The compiler generates code that allows the program to interact with the runtime
Intermediate Representation

Code Generation: IR
  Source code (program) → Lexical Analysis → Syntax Analysis (parsing) → AST, symbol table, etc. → Intermediate Representation (IR) → Code Generation → executable code
• Translating from abstract syntax (AST) to intermediate representation (IR)
  – Three-Address Code
• …

Three-Address Code IR (Chapter 8)
• A popular form of IR
• High-level assembly where instructions have at most three operands

IR by example
Sub-expressions example
  Source:
      int a; int b; int c; int d;
      a = b + c + d;
      b = a * a + b * b;
  LIR (unoptimized):
      _t0 = b + c;
      a = _t0 + d;
      _t1 = a * a;
      _t2 = b * b;
      b = _t1 + _t2;
  Temporaries explicitly store intermediate values resulting from sub-expressions.
Variable assignments
• var = constant;
• var1 = var2 op var3;
• var1 = constant op var2;
• var1 = var2 op constant;
• var = constant1 op constant2;
Permitted operators are +, -, *, /, %
Booleans
• Boolean variables are represented as integers that have zero or nonzero values
• In addition to the arithmetic operators, TAC supports <, ==, ||, and &&
• How might you compile the following?
      b = (x <= y);
  One answer, since <= is not a TAC operator:
      _t0 = x < y;
      _t1 = x == y;
      b = _t0 || _t1;
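The same trick extends to the other relational operators. The sketch below is illustrative only: the function name `lower_relop` and the fixed temporary names `_t0`/`_t1` are assumptions made for this example, not part of the lecture's TAC definition. It rewrites each comparison using only `<`, `==` and `||`.

```python
def lower_relop(dst, x, op, y):
    """Return TAC lines computing dst = x op y using only <, == and ||.

    Hypothetical helper: temporaries are hard-coded as _t0/_t1 for brevity.
    """
    if op in ("<", "=="):
        # Already a TAC operator: a single instruction suffices.
        return [f"{dst} = {x} {op} {y};"]
    if op == ">":
        # x > y is the same as y < x.
        return [f"{dst} = {y} < {x};"]
    if op == "<=":
        return [f"_t0 = {x} < {y};", f"_t1 = {x} == {y};", f"{dst} = _t0 || _t1;"]
    if op == ">=":
        return [f"_t0 = {y} < {x};", f"_t1 = {x} == {y};", f"{dst} = _t0 || _t1;"]
    if op == "!=":
        # Booleans are integers, so "not equal" is "(x == y) == 0".
        return [f"_t0 = {x} == {y};", f"{dst} = _t0 == 0;"]
    raise ValueError(f"unsupported operator: {op}")


for line in lower_relop("b", "x", "<=", "y"):
    print(line)
```

For `<=` this prints the three instructions shown on the slide.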
Unary operators
• How might you compile the following assignments from unary statements?
      y = -x;    becomes    y = 0 - x;    or    y = -1 * x;
      z = !w;    becomes    z = w == 0;
Control flow instructions
• Label introduction: indicates a point in the code that can be jumped to
      _label_name:
• Unconditional jump: go to the instruction following label L
      Goto L;
• Conditional jump: test condition variable t; if 0, jump to label L
      IfZ t Goto L;
• Similarly: test condition variable t; if not zero, jump to label L
      IfNZ t Goto L;
Control-flow example – conditions
  Source:
      int x; int y; int z;
      if (x < y)
        z = x;
      else
        z = y;
      z = z * z;
  TAC:
          _t0 = x < y;
          IfZ _t0 Goto _L0;
          z = x;
          Goto _L1;
      _L0:
          z = y;
      _L1:
          z = z * z;
Control-flow example – loops
  Source:
      int x; int y;
      while (x < y) {
        x = x * 2;
      }
      y = x;
  TAC:
      _L0:
          _t0 = x < y;
          IfZ _t0 Goto _L1;
          x = x * 2;
          Goto _L0;
      _L1:
          y = x;
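To see how labels, `Goto` and `IfZ` drive execution, here is a minimal sketch of an interpreter for a tiny TAC subset, run on the loop above. The tuple encoding of instructions and the name `run_tac` are assumptions made for this example; only `<` and `*` are supported.

```python
def run_tac(program, env):
    """Interpret a tiny TAC subset: labels, Goto, IfZ, copies, binary ops.

    Instructions are tuples (a hypothetical encoding for this sketch):
      ("label", L), ("goto", L), ("ifz", t, L),
      ("copy", dst, src), ("binop", dst, a, op, b)
    """
    # Map each label to the index of its instruction.
    labels = {ins[1]: i for i, ins in enumerate(program) if ins[0] == "label"}
    ops = {"<": lambda a, b: int(a < b), "*": lambda a, b: a * b}

    def val(x):
        # Operands are either variable names or integer constants.
        return env[x] if isinstance(x, str) else x

    pc = 0
    while pc < len(program):
        ins = program[pc]
        kind = ins[0]
        if kind == "goto":
            pc = labels[ins[1]]
            continue
        if kind == "ifz":
            if val(ins[1]) == 0:  # jump only when the condition is zero
                pc = labels[ins[2]]
                continue
        elif kind == "binop":
            _, dst, a, op, b = ins
            env[dst] = ops[op](val(a), val(b))
        elif kind == "copy":
            env[ins[1]] = val(ins[2])
        pc += 1  # labels and untaken branches fall through
    return env


loop = [
    ("label", "_L0"),
    ("binop", "_t0", "x", "<", "y"),
    ("ifz", "_t0", "_L1"),
    ("binop", "x", "x", "*", 2),
    ("goto", "_L0"),
    ("label", "_L1"),
    ("copy", "y", "x"),
]
result = run_tac(loop, {"x": 3, "y": 20})
print(result["x"], result["y"])
```

Starting from x = 3 and y = 20, x doubles until it is no longer below y, then y is set to x.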
Procedures / Functions
      p() {
        int y = 1, x = 0;
        x = f(a1, …, an);
        print(x);
      }
• What happens at runtime when p calls f?

Memory Layout (popular convention)
  [Figure: address space drawn from high addresses to low addresses, with regions for global variables, the stack, and the heap]

A logical stack frame
  Stack frame for function f(a1, …, an):
  • Parameters (actual arguments): Param N-1, …, Param 1
  • Locals and temporaries: _t0, …, _tk, x, …, y
Procedures / Functions
• A procedure call instruction pushes arguments to the stack and jumps to the function label
  A statement x = f(a1, …, an); looks like
      Push a1;
      …
      Push an;
      Call f;
      Pop x;      // pop the returned value and copy it to x
• Returning a value is done by pushing it to the stack (return x;)
      Push x;
• Return control to the caller (and roll up the stack)
      Return;
Functions example
  Source:
      int SimpleFn(int z) {
        int x, y;
        x = x * y * z;
        return x;
      }
      void main() {
        int w;
        w = SimpleFn(137);
      }
  TAC:
      _SimpleFn:
          _t0 = x * y;
          _t1 = _t0 * z;
          x = _t1;
          Push x;
          Return;
      main:
          _t0 = 137;
          Push _t0;
          Call _SimpleFn;
          Pop w;
Memory access instructions
• Copy instruction: a = b
• Load/store instructions: a = *b    *a = b
• Address-of instruction: a = &b
• Array accesses: a = b[i]    a[i] = b
• Field accesses: a = b[f]    a[f] = b
• Memory allocation instruction: a = alloc(size)
  – Sometimes left out (e.g., malloc is a procedure in C)
Array operations
  x := y[i]
      t1 := &y         ; t1 = address-of y
      t2 := t1 + i     ; t2 = address of y[i]
      x := *t2         ; loads the value located at y[i]
  x[i] := y
      t1 := &x         ; t1 = address-of x
      t2 := t1 + i     ; t2 = address of x[i]
      *t2 := y         ; store through pointer
IR Summary

Intermediate representation
• A language that is between the source language and the target language, not specific to any machine
• Goal 1: retargeting compiler components for different source languages/target machines
• Goal 2: machine-independent optimizer
  – Narrow interface: small number of instruction types
  [Figure: Java, C++ and Python are lowered to a single IR, which is optimized; code generation then targets Pentium, Java bytecode, or Sparc]

Multiple IRs
• Some optimizations require high-level structure
• Others are more appropriate on low-level code
• Solution: use multiple IR stages
  [Figure: AST → HIR → LIR → Pentium / Java bytecode / Sparc, with optimization applied at the IR stages]
AST vs. LIR for imperative languages
  AST
  • Rich set of language constructs
  • Rich type system
  • Declarations: types (classes, interfaces), functions, variables
  • Control flow statements: if-then-else, while-do, break-continue, switch, exceptions
  • Data statements: assignments, array access, field access
  • Expressions: variables, constants, arithmetic operators, logical operators, function calls
  LIR
  • An abstract machine language
  • Very limited type system
  • Only computation-related code
  • Labels and conditional/unconditional jumps, no looping
  • Data movements, generic memory access statements
  • No sub-expressions, logical as numeric, temporaries, constants, function calls with explicit argument passing
Lowering AST to TAC

IR Generation
  The valid AST and the symbol table are translated into an Intermediate Representation (IR), from which executable code is later generated.

TAC generation
• At this stage in compilation, we have
  – an AST
  – annotated with scope information
  – and annotated with type information
• To generate TAC for the program, we do a recursive tree traversal
  – Generate TAC for any subexpressions or substatements
  – Using the result, generate TAC for the overall expression
TAC generation for expressions
• Define a function cgen(expr) that generates TAC that computes an expression, stores it in a temporary variable, then hands back the name of that temporary
  – Define cgen directly for atomic expressions (constants, this, identifiers, etc.)
  – Define cgen recursively for compound expressions (binary operators, function calls, etc.)

cgen for basic expressions
      cgen(k) = {              // k is a constant
        Choose a new temporary t
        Emit( t = k )
        Return t
      }
      cgen(id) = {             // id is an identifier
        Choose a new temporary t
        Emit( t = id )
        Return t
      }

cgen for binary operators
      cgen(e1 + e2) = {
        Choose a new temporary t
        Let t1 = cgen(e1)
        Let t2 = cgen(e2)
        Emit( t = t1 + t2 )
        Return t
      }
cgen example
      cgen(5 + x) = {
        Choose a new temporary t
        Let t1 = cgen(5)       // chooses a new temporary, emits t1 = 5; and returns it
        Let t2 = cgen(x)       // chooses a new temporary, emits t2 = x; and returns it
        Emit( t = t1 + t2; )
        Return t
      }
  Each call returns an arbitrary fresh name. Emitted code:
      t1 = 5;
      t2 = x;
      t = t1 + t2;
  or, with concrete fresh names:
      _t18 = 5;
      _t29 = x;
      _t6 = _t18 + _t29;
  Inefficient translation, but we will improve this later.
cgen as recursive AST traversal
  cgen(5 + x) visits the AddExpr node, which visits its left and right children:
      visit(AddExpr)                emits  t = t1 + t2
        visit(left):  Num, val = 5     emits  t1 = 5
        visit(right): Ident, name = x  emits  t2 = x
  Emitted code, in order:
      t1 = 5;
      t2 = x;
      t = t1 + t2;
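The traversal can be sketched as a small visitor. Everything below is an illustrative assumption rather than the course's implementation: the node classes mirror the names on the slide (`Num`, `Ident`, `AddExpr`), `TacGen` is a hypothetical class, and the result temporary is chosen after the children are visited (the slide chooses it first; the emitted code has the same shape either way).

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Num:
    val: int


@dataclass
class Ident:
    name: str


@dataclass
class AddExpr:
    left: object
    right: object


class TacGen:
    """Recursive AST traversal that emits TAC and returns temporary names."""

    def __init__(self):
        self.code = []
        self._ids = count()

    def fresh(self):
        # Each call yields a name that has not been used before.
        return f"_t{next(self._ids)}"

    def visit(self, node):
        # Dispatch on the node's class name, e.g. visit_AddExpr.
        return getattr(self, "visit_" + type(node).__name__)(node)

    def visit_Num(self, node):
        t = self.fresh()
        self.code.append(f"{t} = {node.val};")
        return t

    def visit_Ident(self, node):
        t = self.fresh()
        self.code.append(f"{t} = {node.name};")
        return t

    def visit_AddExpr(self, node):
        t1 = self.visit(node.left)   # code for the left child comes first
        t2 = self.visit(node.right)
        t = self.fresh()
        self.code.append(f"{t} = {t1} + {t2};")
        return t


gen = TacGen()
result = gen.visit(AddExpr(Num(5), Ident("x")))
print("\n".join(gen.code))
```

The returned name (`result`) is the temporary holding the value of the whole expression, which is what lets a parent node combine the results of its children.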
Naive cgen for expressions
• Maintain a counter for temporaries in c
• Initially: c = 0
• cgen(e1 op e2) = {
      Let A = cgen(e1)
      c = c + 1
      Let B = cgen(e2)
      c = c + 1
      Emit( _tc = A op B; )
      Return _tc
  }
Example
  c = 0
  cgen( (a*b)-d ) = {
    Let A = {                                        // cgen(a*b)
      Let A = { Emit( _tc = a; ), Return _tc }       // here A = _t0
      c = c + 1
      Let B = { Emit( _tc = b; ), Return _tc }
      c = c + 1
      Emit( _tc = A * B; )
      Return _tc
    }                                                // here A = _t2
    c = c + 1
    Let B = { Emit( _tc = d; ), Return _tc }
    c = c + 1
    Emit( _tc = A - B; )
    Return _tc
  }
  Code:
      _t0 = a;
      _t1 = b;
      _t2 = _t0 * _t1;
      _t3 = d;
      _t4 = _t2 - _t3;
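The counter scheme can be reproduced in a few lines. This is a sketch under assumptions: expressions are encoded as nested tuples `(op, left, right)` with strings as leaves, and `make_naive_cgen` is a hypothetical helper that bundles the counter c with the emitted code.

```python
def make_naive_cgen():
    """Return (cgen, code) sharing one temporaries counter c, initially 0."""
    state = {"c": 0}
    code = []

    def cgen(e):
        if isinstance(e, str):
            # Leaf: copy the variable into the current temporary _tc.
            t = f"_t{state['c']}"
            code.append(f"{t} = {e};")
            return t
        op, e1, e2 = e
        a = cgen(e1)
        state["c"] += 1
        b = cgen(e2)
        state["c"] += 1
        t = f"_t{state['c']}"
        code.append(f"{t} = {a} {op} {b};")
        return t

    return cgen, code


cgen, code = make_naive_cgen()
result = cgen(("-", ("*", "a", "b"), "d"))
print("\n".join(code))
```

Running it on (a*b)-d yields the same five instructions as the trace, using five distinct temporaries for an expression with three variables.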
cgen for statements
• We can extend the cgen function to operate over statements as well
• Unlike cgen for expressions, cgen for statements does not return the name of a temporary holding a value.
  – (Why?)

cgen for if-then-else
      cgen(if (e) s1 else s2)
        Let _t = cgen(e)
        Let Ltrue be a new label
        Let Lfalse be a new label
        Let Lafter be a new label
        Emit( IfZ _t Goto Lfalse; )
        cgen(s1)
        Emit( Goto Lafter; )
        Emit( Lfalse: )
        cgen(s2)
        Emit( Goto Lafter; )
        Emit( Lafter: )

cgen for while loops
      cgen(while (expr) stmt)
        Let Lbefore be a new label.
        Let Lafter be a new label.
        Emit( Lbefore: )
        Let t = cgen(expr)
        Emit( IfZ t Goto Lafter; )
        cgen(stmt)
        Emit( Goto Lbefore; )
        Emit( Lafter: )
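A minimal sketch of cgen for statements follows. The tuple encoding of statements, the helper names, and the simplification that conditions and right-hand sides are already TAC-compatible strings are all assumptions made for this example. Unlike the pseudocode, it allocates no unused Ltrue label and omits the jump to Lafter that directly precedes Lafter itself.

```python
from itertools import count

code = []
_labels = count()
_temps = count()


def new_label():
    return f"_L{next(_labels)}"


def cgen_expr(e):
    # Simplification: e is a string such as "x < y" that is already valid TAC.
    t = f"_t{next(_temps)}"
    code.append(f"{t} = {e};")
    return t


def cgen_stmt(s):
    """Emit TAC for a statement; statements produce no value, so return None."""
    kind = s[0]
    if kind == "assign":
        _, target, e = s
        code.append(f"{target} = {e};")
    elif kind == "if":
        _, cond, s1, s2 = s
        t = cgen_expr(cond)
        l_false, l_after = new_label(), new_label()
        code.append(f"IfZ {t} Goto {l_false};")
        cgen_stmt(s1)
        code.append(f"Goto {l_after};")
        code.append(f"{l_false}:")
        cgen_stmt(s2)
        code.append(f"{l_after}:")
    elif kind == "while":
        _, cond, body = s
        l_before, l_after = new_label(), new_label()
        code.append(f"{l_before}:")
        t = cgen_expr(cond)  # the condition is re-evaluated on every iteration
        code.append(f"IfZ {t} Goto {l_after};")
        cgen_stmt(body)
        code.append(f"Goto {l_before};")
        code.append(f"{l_after}:")
    else:
        raise ValueError(f"unknown statement kind: {kind}")


cgen_stmt(("if", "x < y", ("assign", "z", "x"), ("assign", "z", "y")))
print("\n".join(code))
```

The output matches the shape of the earlier conditions example: a test, a conditional jump over the then-branch, and a jump over the else-branch.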
Our first optimization

Naïve translation
• The cgen translation shown so far is very inefficient
  – Generates (too) many temporaries: one per subexpression
  – Generates many instructions: at least one per subexpression
• Expensive in terms of running time and space
• Code bloat
• We can do much better …

Improving cgen for expressions
• Observation: the naïve translation needlessly generates temporaries for leaf expressions
• Observation: temporaries are used exactly once
  – Once a temporary has been read, it can be reused for another sub-expression
• cgen(e1 op e2) = {
      Let _t1 = cgen(e1)
      Let _t2 = cgen(e2)
      Emit( _t = _t1 op _t2; )
      Return _t
  }
• Temporaries used in cgen(e1) can be reused in cgen(e2)
Sethi-Ullman translation
• Algorithm by Ravi Sethi and Jeffrey D. Ullman to emit optimal TAC
  – Minimizes number of temporaries
• Main data structure in the algorithm is a stack of temporaries
  – Stack corresponds to recursive invocations of _t = cgen(e)
  – All the temporaries on the stack are live
    • Live = contain a value that is needed later on

Live temporaries stack
• Implementation: use counter c to implement the live temporaries stack
  – Temporaries _t(0), …, _t(c) are alive
  – Temporaries _t(c+1), _t(c+2), … can be reused
  – Push means increment c, pop means decrement c
• In the translation of _t(c) = cgen(e1 op e2):
      _t(c) = cgen(e1)
      c = c + 1
      _t(c) = cgen(e2)
      c = c - 1
      _t(c) = _t(c) op _t(c+1)

Using stack of temporaries example
  _t0 = cgen( ((c*d)-(e*f))+(a*b) ), with the counter initially 0
      _t0 = c*d
      c = c + 1
      _t1 = e*f
      c = c - 1
      _t0 = _t0 - _t1        (_t0 now holds (c*d)-(e*f))
      c = c + 1
      _t1 = a*b
      c = c - 1
      _t0 = _t0 + _t1
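The stack discipline can be sketched by passing the counter as a parameter instead of incrementing and decrementing a global. The tuple encoding `(op, left, right)` and the choice to use leaf operands directly (no temporary for a plain variable) are assumptions made to match the example above.

```python
def cgen(e, c, code):
    """Emit TAC leaving the value of e in _t{c}, using only _t{c}, _t{c+1}, ...

    Returns the operand name holding the result.
    """
    if isinstance(e, str):
        # A bare variable as the whole expression: copy it into _t{c}.
        code.append(f"_t{c} = {e};")
        return f"_t{c}"
    op, e1, e2 = e
    if isinstance(e1, str):
        a, nxt = e1, c                      # leaf: no temporary needed
    else:
        a, nxt = cgen(e1, c, code), c + 1   # _t{c} is now live ("push")
    b = e2 if isinstance(e2, str) else cgen(e2, nxt, code)
    # After this instruction everything above _t{c} is free again ("pop").
    code.append(f"_t{c} = {a} {op} {b};")
    return f"_t{c}"


code = []
result = cgen(("+", ("-", ("*", "c", "d"), ("*", "e", "f")), ("*", "a", "b")), 0, code)
print("\n".join(code))
```

Only _t0 and _t1 appear in the output, in the same order as the example, because the second operand's temporaries are released as soon as the operator instruction has read them.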
Weighted register allocation
• Suppose we have expression e1 op e2
  – e1, e2 without side-effects
    • That is, no function calls, memory accesses, ++x
  – cgen(e1 op e2) = cgen(e2 op e1)
  – Does the order of translation matter?
• Sethi & Ullman’s algorithm translates the heavier sub-tree first
  – Optimal local (per-statement) allocation for side-effect-free statements

Example
  _t0 = cgen( a+(b+(c*d)) )
  + and * are commutative operators
  Left child first (4 temporaries):
      +  in _t0
        a  in _t0
        +  in _t1
          b  in _t1
          *  in _t2
            c  in _t2
            d  in _t3
  Right child first (2 temporaries):
      +  in _t0
        a  in _t1
        +  in _t0
          b  in _t1
          *  in _t0
            c  in _t1
            d  in _t0
Weighted register allocation
• Can save registers by re-ordering subtree computations
• Label each node with its weight
  – Weight = number of registers needed
  – Leaf weight known
  – Internal node weight
    • if w(left) > w(right) then w = w(left)
    • if w(right) > w(left) then w = w(right)
    • if w(right) = w(left) then w = w(left) + 1
• Choose the heavier child as the first to be translated
• WARNING: have to check that no side-effects exist before attempting to apply this optimization (a pre-pass on the tree)
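The weight rule and the heavier-child-first ordering can be sketched together. Assumptions for this example: expressions are nested tuples `(op, left, right)` with side-effect-free string leaves, every leaf has weight 1 (each leaf is loaded into a temporary), and the function names are hypothetical. Operand order in the emitted instruction is preserved even when the children are translated in swapped order.

```python
def weight(e):
    """Number of temporaries needed to evaluate e (leaf weight 1 is assumed)."""
    if isinstance(e, str):
        return 1
    _, left, right = e
    wl, wr = weight(left), weight(right)
    return max(wl, wr) if wl != wr else wl + 1


def cgen(e, c, code, heavier_first):
    """Leave e in _t{c}; return the highest temporary index that was used."""
    if isinstance(e, str):
        code.append(f"_t{c} = {e};")
        return c
    op, left, right = e
    # Translate the right child first only when it is strictly heavier.
    swap = heavier_first and weight(right) > weight(left)
    first, second = (right, left) if swap else (left, right)
    hi1 = cgen(first, c, code, heavier_first)
    hi2 = cgen(second, c + 1, code, heavier_first)
    # Keep the source operand order regardless of translation order.
    a, b = (f"_t{c + 1}", f"_t{c}") if swap else (f"_t{c}", f"_t{c + 1}")
    code.append(f"_t{c} = {a} {op} {b};")
    return max(hi1, hi2)


expr = ("+", "a", ("+", "b", ("*", "c", "d")))
naive, smart = [], []
naive_temps = cgen(expr, 0, naive, heavier_first=False) + 1
smart_temps = cgen(expr, 0, smart, heavier_first=True) + 1
print(weight(expr), naive_temps, smart_temps)
```

On a+(b+(c*d)) the left-to-right order needs four temporaries while the weighted order needs two, which equals the weight computed for the root.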
Weighted reg. alloc. example
  _t0 = cgen( a+b[5*c] )
  Phase 1:
    - check absence of side-effects in expression tree
    - assign weight to each AST node
  Phase 2:
    - use weights to decide on order of translation (heavier sub-tree first)
  [Figure: AST of a+b[5*c], each node labeled with its weight w and the temporary that holds its value. The root + has children a and an array access; the array access has base b and index 5*c.]
  Generated code:
      _t0 = c
      _t1 = 5
      _t0 = _t1 * _t0
      _t1 = b
      _t0 = _t1[_t0]
      _t1 = a
      _t0 = _t1 + _t0
Note on weighted register allocation
• Must reset the temporaries counter after every statement: x=y; y=z
  – should not be translated to
        _t0 = y; x = _t0; _t1 = z; y = _t1;
  – But rather to
        _t0 = y; x = _t0;     # Finished translating statement. Set c=0
        _t0 = z; y = _t0;