Compiler Optimization and Code Generation

  • Slides: 33
Compiler Optimization and Code Generation
Professor: Sc.D., Professor Vazgen Melikyan
Lecture 4. Developed by: Vazgen Melikyan
Synopsys University Courseware. Copyright © 2012 Synopsys, Inc. All rights reserved.

Course Overview
  • Introduction: Overview of Optimizations
      - 1 lecture
  • Intermediate-Code Generation
      - 2 lectures
  • Machine-Independent Optimizations
      - 3 lectures
  • Code Generation
      - 2 lectures

Code Generation

Machine Code Generation
  • Input: intermediate code + symbol tables
      - In this case, three-address code
      - All variables have values that machines can directly manipulate
      - Assume program is free of errors
          - Type checking has taken place, type conversion done
  • Output:
      - Absolute/relocatable machine code or assembly code
      - In this case, use assembly
      - Architecture variations: RISC, CISC, stack-based
  • Issues:
      - Memory management, instruction selection and scheduling, register allocation and assignment

Retargetable Back End
  [Figure: a machine description feeds a back-end generator, which produces tables; the tables plus a pattern-matching engine form the instruction selector.]
  • Build retargetable compilers
      - Isolate machine-dependent info
      - Compilers on different machines share a common IR
          - Can have common front and mid ends
      - Table-based back ends share common algorithms
  • Table-based instruction selector
      - Create a description of the target machine, use a back-end generator

Translating from Three-Address Code
  • No more support for structured control flow
      - Function calls => explicit memory management and goto jumps
  • Every three-address instruction is translated into one or more target machine instructions
      - The original evaluation order is maintained
  • Memory management
      - Every variable must have a location to store its value
          - Register, stack, heap, static storage
      - Memory allocation convention
          - Scalar/atomic values and addresses => registers, stack
          - Arrays => heap
          - Global variables => static storage

Assigning Storage Locations
  • Compilers must choose storage locations for all values
      - Procedure-local storage
          - Local variables not preserved across procedure calls
      - Procedure-static storage
          - Local variables preserved across procedure calls
      - Global storage: global variables
      - Run-time heap: dynamically allocated storage
  • Registers: temporary storage for applying operations to values
      - Unambiguous values can be assigned to registers with no backup storage

Function Call and Return
  • At each function call
      - Allocate a new activation record (AR) on the stack
      - Save return address in the new AR
      - Set parameter values and return results
      - Go to callee's code
          - Save SP and other regs; set the access link (AL) if necessary
  • At each function return
      - Restore SP and regs
      - Go to the return address saved in the callee's AR
      - Pop the callee's AR off the stack
  • Different languages may implement this differently
  [Figure: stack of activation records with SP pointing at the top one; each AR holds the return address, parameters, return result, control link, access link, register save area, and local variables.]

Translating Function Calls
  • Use a register SP to store the address of the activation record on top of the stack
      - SP, AL and other registers are saved/restored by the callee
  • Use the C(Rs) address mode to access parameters and local variables

  Three-address code:

      /* code for s */
      Action1
      Param 5
      Call q, 1
      Action2
      Halt
      ……
      /* code for q */
      Action3
      return

  Generated code:

            LD stackStart => SP      /* initialize stack */
            ……
      108:  ACTION1
      128:  ADD SP, ssize => SP      /* now call sequence */
      136:  ST 160 => *SP            /* push return addr */
      144:  ST 5 => 2(SP)            /* push param1 */
      152:  BR 300                   /* call q */
      160:  SUB SP, ssize => SP      /* restore SP */
      168:  ACTION2
      190:  HALT
            ……
            /* code for q */
      300:  save SP, AL and other regs
            ACTION3
            restore SP, AL and other regs
      400:  BR *0(SP)                /* return to caller */
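The caller-side part of the sequence above can be sketched as a small emitter. This is only an illustration: `emit_call` and its parameters are hypothetical names, and placing parameter i at offset 2 + i from SP is an assumption generalized from the single `2(SP)` store shown on the slide.

```python
def emit_call(callee_addr, return_addr, args, frame_size="ssize"):
    """Emit the caller-side call sequence as pseudo-assembly strings.

    Illustrative only: the layout (return address at *SP, parameters
    starting at 2(SP)) is assumed from the slide's one-parameter example.
    """
    code = [
        f"ADD SP, {frame_size} => SP",   # allocate the new activation record
        f"ST {return_addr} => *SP",      # push return address
    ]
    for i, arg in enumerate(args):
        code.append(f"ST {arg} => {2 + i}(SP)")  # push parameter i + 1
    code.append(f"BR {callee_addr}")             # jump to the callee
    code.append(f"SUB SP, {frame_size} => SP")   # runs after return: pop the AR
    return code


for line in emit_call(300, 160, [5]):
    print(line)
```

With the slide's values (callee at 300, return address 160, one parameter 5) the emitted lines match instructions 128 through 160 of the generated code.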

Translating Variable Assignment
  • Keep track of locations for variables in the symbol table
      - The current value of a variable may reside in a register, a stack memory location, a static memory location, or a set of these
      - Use the symbol table to store locations of variables
  • Allocation of variables to registers
      - Assume an infinite number of pseudo registers
      - Relocate pseudo registers afterwards

  Statement     Generated code        Register descriptor   Address descriptor
  ------------  --------------------  --------------------  ------------------
  t := a - b    LD a => r0            r0 contains t         t in r0
                LD b => r1            r1 contains b         b in r1
                SUB r0, r1 => r0
  u := t + c    LD c => r2            r0 contains u         u in r0
                ADD r0, r2 => r0      r1 contains b         b in r1
                                      r2 contains c         c in r2
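The descriptor bookkeeping in the table can be sketched in a few lines of Python. The class and method names below are hypothetical, and the sketch assumes, as the table does, that the left operand is dead after the operation so its register can be overwritten with the result.

```python
from itertools import count


class CodeGen:
    """Toy code generator with unlimited pseudo registers (illustrative)."""

    def __init__(self):
        self.code = []
        self.reg_of = {}   # address descriptor: variable -> register
        self.holds = {}    # register descriptor: register -> variable
        self._fresh = count()

    def _load(self, var):
        # Reuse the register if the variable is already in one.
        if var in self.reg_of:
            return self.reg_of[var]
        reg = f"r{next(self._fresh)}"
        self.code.append(f"LD {var} => {reg}")
        self.reg_of[var], self.holds[reg] = reg, var
        return reg

    def binop(self, op, dst, lhs, rhs):
        r1 = self._load(lhs)
        r2 = self._load(rhs)
        self.code.append(f"{op} {r1}, {r2} => {r1}")
        # r1 is overwritten: its old occupant is no longer in a register.
        del self.reg_of[self.holds[r1]]
        self.holds[r1] = dst
        self.reg_of[dst] = r1


gen = CodeGen()
gen.binop("SUB", "t", "a", "b")   # t := a - b
gen.binop("ADD", "u", "t", "c")   # u := t + c
print("\n".join(gen.code))
print(gen.holds)
```

Running the two statements from the table reproduces its generated code and ends with r0 holding u, r1 holding b, and r2 holding c.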

Translating Arrays
  • Arrays are allocated in the heap

  Statement a := b[i]
      - i in register 'ri':   MULT ri, elsize => r1
                              LD b(r1) => ra
      - i in memory 'Mi':     LD Mi => ri
                              MULT ri, elsize => r1
                              LD b(r1) => ra
      - i in stack:           LD i(SP) => ri
                              MULT ri, elsize => r1
                              LD b(r1) => ra

  Statement a[i] := b
      - i in register 'ri':   MULT ri, elsize => r1
                              ST rb => a(r1)
      - i in memory 'Mi':     LD Mi => ri
                              MULT ri, elsize => r1
                              ST rb => a(r1)
      - i in stack:           LD i(SP) => ri
                              MULT ri, elsize => r1
                              ST rb => a(r1)
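The three cases differ only in how the index reaches a register, which a short sketch makes explicit. The function names and the `(kind, name)` encoding of the index location are assumptions made for this example.

```python
def load_index(i_loc):
    """Return (setup code, register) for an index given where it lives.

    i_loc is a hypothetical (kind, name) pair: kind is "reg", "mem" or "stack".
    """
    kind, name = i_loc
    if kind == "reg":
        return [], name                        # already in a register
    if kind == "mem":
        return [f"LD {name} => ri"], "ri"      # load from static memory
    if kind == "stack":
        return [f"LD {name}(SP) => ri"], "ri"  # load from the stack frame
    raise ValueError(f"unknown location kind: {kind}")


def array_read(dst_reg, base, i_loc, elsize="elsize"):
    """dst := base[i]"""
    code, ri = load_index(i_loc)
    code.append(f"MULT {ri}, {elsize} => r1")   # byte offset of element i
    code.append(f"LD {base}(r1) => {dst_reg}")
    return code


def array_write(src_reg, base, i_loc, elsize="elsize"):
    """base[i] := src"""
    code, ri = load_index(i_loc)
    code.append(f"MULT {ri}, {elsize} => r1")
    code.append(f"ST {src_reg} => {base}(r1)")
    return code


print("\n".join(array_read("ra", "b", ("stack", "i"))))
```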

Translating Conditional Statements
  • Condition determined after ADD or SUB

  Statement                        Generated code
  -------------------------------  ------------------
  if x < y goto z                  SUB rx, ry => rt
                                   BLTZ z
  x := y + z                       ADD ry, rz => rx
  if (x < 0) goto L                BLTZ L
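A minimal sketch of the two rows above, with hypothetical function names: a general comparison needs an explicit SUB before the branch, while a test against zero right after an ADD can reuse the condition that the ADD already set.

```python
def branch_if_less(x, y, label):
    """if x < y goto label: subtract, then branch if the result is negative."""
    return [f"SUB r{x}, r{y} => rt", f"BLTZ {label}"]


def add_then_branch_if_negative(x, y, z, label):
    """x := y + z; if (x < 0) goto label: the ADD itself sets the condition."""
    return [f"ADD r{y}, r{z} => r{x}", f"BLTZ {label}"]


print(branch_if_less("x", "y", "z"))
print(add_then_branch_if_negative("x", "y", "z", "L"))
```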

Peephole Optimization
  • Use a simple scheme to match IR to machine code
      - Efficiently discover local improvements by examining short sequences of adjacent operations

  Before                          After
  ------------------------------  ------------------------------
  storeAI r1 => SP, 8             storeAI r1 => SP, 8
  loadAI SP, 8 => r15             r2r r1 => r15

  addI r2, 0 => r7                mult r4, r2 => r10
  mult r4, r7 => r10

  jumpI -> L10                    jumpI -> L11
  L10: jumpI -> L11               L10: jumpI -> L11
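A two-operation sliding window is enough to express the store/load and add-zero rewrites. The sketch below is illustrative: the tuple encoding of operations is an assumption, the jump-to-jump rule is left out, and the add-zero rule assumes the temporary defined by `addI` is dead after the following `mult`.

```python
def peephole(ops):
    """One pass with a 2-op window over tuples (illustrative encoding):

    ("storeAI", src, base, off)   ("loadAI", base, off, dst)
    ("addI", src, imm, dst)       ("mult", src1, src2, dst)
    ("r2r", src, dst)
    """
    out, i = [], 0
    while i < len(ops):
        a = ops[i]
        b = ops[i + 1] if i + 1 < len(ops) else None
        if b and a[0] == "storeAI" and b[0] == "loadAI" and a[2:4] == b[1:3]:
            # Load from the address just stored to: copy the register instead.
            out += [a, ("r2r", a[1], b[3])]
            i += 2
        elif b and a[0] == "addI" and a[2] == 0 and b[0] == "mult" and a[3] in b[1:3]:
            # x + 0 is x: use the source register directly in the multiply.
            s1, s2 = (a[1] if r == a[3] else r for r in b[1:3])
            out.append(("mult", s1, s2, b[3]))
            i += 2
        else:
            out.append(a)
            i += 1
    return out


ops = [
    ("storeAI", "r1", "SP", 8),
    ("loadAI", "SP", 8, "r15"),
    ("addI", "r2", 0, "r7"),
    ("mult", "r4", "r7", "r10"),
]
for op in peephole(ops):
    print(op)
```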

Efficiency of Peephole Optimization
  • Design issues
      - Dead values
          - May interfere with valid simplifications
          - Need to be recognized during the expansion process
      - Control flow operations
          - Complicate the simplifier: clear the window vs. special-case handling
      - Physical vs. logical windows
          - Adjacent operations may be irrelevant
          - A sliding window includes ops that define or use common values
  • RISC vs. CISC architectures
      - RISC architectures make instruction selection easier
  • Additional issues
      - Automatic tools to generate large pattern libraries for different architectures
      - Front ends that generate LLIR make compilers more portable

Register Allocation
  • Problem
      - Allocation of variables (pseudo-registers) to hardware registers in a procedure
  • Features
      - The most important optimization
          - Directly reduces running time (memory access => register access)
      - Useful for other optimizations
          - E.g. CSE assumes old values are kept in registers
  • Goals
      - Find an allocation for all pseudo-registers, if possible
      - If there are not enough registers in the machine, choose registers to spill to memory

An Abstraction for Allocation and Assignment
  • Two pseudo-registers interfere if at some point in the program they cannot both occupy the same register.
  • Interference graph: an undirected graph, where
      - Nodes = pseudo-registers
      - There is an edge between two nodes if their corresponding pseudo-registers interfere
  • What is not represented
      - The extent of the interference between uses of different variables
      - Where in the program the interference occurs

Register Allocation and Coloring
  • A graph is n-colorable if:
      - Every node in the graph can be colored with one of the n colors such that no two adjacent nodes have the same color.
  • Assigning n registers (without spilling) = coloring with n colors
      - Assign each node to a register (color) such that no two adjacent nodes are assigned the same register (color)
  • Spilling is necessary = the graph is not n-colorable

Algorithm
  • Step 1: Build an interference graph
      - Refining the notion of a node
      - Finding the edges
  • Step 2: Coloring
      - Use heuristics to try to find an n-coloring
          - Success: colorable, and we have an assignment
          - Failure: either the graph is not colorable, or it is colorable but too expensive to color

Live Ranges and Merged Live Ranges
  • Motivation: to create an interference graph that is easier to color
      - Eliminate interference in a variable's "dead" zones
      - Increase flexibility in allocation
          - Can allocate the same variable to different registers
  • A live range consists of a definition and all the points in a program (e.g. the end of an instruction) at which that definition is live.
  • Two overlapping live ranges for the same variable must be merged
  [Figure: a definition "a = …" followed by a use "… = a".]

Example
  [Figure: control-flow graph annotated with live variables and reaching definitions at each point.]

  Entry block:         A = ...  (A1)
                       IF A goto L1
  Fall-through block:  B = ...  (B1)
                         = A
                       D = B    (D2)
  Block L1:            C = ...  (C1)
                         = A
                       D = ...  (D1)
  Merge block:         A = 2    (A2)
                         = A
                       Ret D

Merging Live Ranges
  • Merging definitions into equivalence classes
      - Start by putting each definition in a different equivalence class
      - For each point in a program:
          - If (i) the variable is live, and (ii) there are multiple reaching definitions for the variable, then:
              - Merge the equivalence classes of all such definitions into one equivalence class
  • From now on, refer to merged live ranges simply as live ranges
      - Merged live ranges are also known as "webs"
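The merging rule maps naturally onto a union-find structure. The sketch below is one possible rendering, not the course's implementation; the function name, the input encoding, and the sample program points (loosely modeled on a join where two definitions of D reach a point where D is live) are all assumptions.

```python
def merge_live_ranges(defs, points):
    """Group definitions into webs.

    defs:   {definition_name: variable}, e.g. {"D1": "D"}
    points: iterable of (live_variables, reaching_definitions) per program point
    """
    parent = {d: d for d in defs}   # each definition starts in its own class

    def find(d):
        while parent[d] != d:
            parent[d] = parent[parent[d]]   # path halving
            d = parent[d]
        return d

    for live, reaching in points:
        for var in live:
            # Definitions of a live variable that reach this point together.
            same_var = [d for d in sorted(reaching) if defs[d] == var]
            for d in same_var[1:]:
                parent[find(d)] = find(same_var[0])

    webs = {}
    for d in defs:
        webs.setdefault(find(d), set()).add(d)
    return sorted(sorted(w) for w in webs.values())


defs = {"A1": "A", "A2": "A", "B1": "B", "C1": "C", "D1": "D", "D2": "D"}
points = [
    ({"A"}, {"A1"}),
    ({"D"}, {"A1", "B1", "C1", "D1", "D2"}),       # join: D live, D1 and D2 reach
    ({"A", "D"}, {"A2", "B1", "C1", "D1", "D2"}),
]
print(merge_live_ranges(defs, points))
```

In this sample, D1 and D2 end up in one web, while A1 and A2 stay separate because they never reach the same point where A is live.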

Edges of Interference Graph
  • Two live ranges (necessarily of different variables) may interfere if they overlap at some point in the program.
      - Algorithm: at each point in the program, enter an edge for every pair of live ranges at that point.
  • An optimized definition and algorithm for edges:
      - Algorithm: check for interference only at the start of each live range
      - Faster
      - Better quality
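Both edge-finding strategies can be sketched side by side. The function names and input encodings below are hypothetical; the second function assumes the caller supplies, for each definition, the set of live ranges live just after it.

```python
from itertools import combinations


def edges_at_every_point(live_at_points):
    """Basic algorithm: an edge for every pair of live ranges live at a point."""
    edges = set()
    for live in live_at_points:
        for a, b in combinations(sorted(live), 2):
            edges.add((a, b))
    return edges


def edges_at_definitions(def_points):
    """Optimized algorithm: look only where a live range starts.

    def_points: list of (defined_range, ranges_live_just_after_the_definition)
    """
    edges = set()
    for defined, live in def_points:
        for other in live:
            if other != defined:
                edges.add(tuple(sorted((defined, other))))
    return edges


live_at_points = [{"a"}, {"a", "b"}, {"a", "b", "c"}, {"c"}]
def_points = [("a", {"a"}), ("b", {"a", "b"}), ("c", {"a", "b", "c"})]
print(sorted(edges_at_every_point(live_at_points)))
print(sorted(edges_at_definitions(def_points)))
```

On this small straight-line sample the two give the same edges; the second visits one point per definition rather than every program point.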

Coloring
  • Coloring for n > 2 is NP-complete
  • Observation:
      - A node with degree < n can always be colored successfully, given its neighbors' colors
  • Coloring algorithm
      - Iterate until stuck or done
          - Pick any node with degree < n
          - Remove the node and its edges from the graph
      - If done (no nodes left), reverse the process and add colors
  • Note: the degree of a node may drop during iteration
  [Figure: example graph with nodes A, B, C, D, E, colored with n = 3.]

When Coloring Fails
  • Use heuristics to improve the chance of success and to choose what to spill

  Build interference graph
  Iterate until there are no nodes left
      If there exists a node v with fewer than n neighbors
          place v on stack to register allocate
      else
          v = node chosen by heuristics (least frequently executed, has many neighbors)
          place v on stack to register allocate (mark as spilled)
      remove v and its edges from graph
  While stack is not empty
      Remove v from stack
      Reinsert v and its edges into the graph
      Assign v a color that differs from all its neighbors
      (guaranteed to be possible for nodes not marked as spilled)
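The pseudocode above translates almost line for line into Python. This is a sketch under stated assumptions: the spill heuristic here looks only at neighbor count (execution frequency is not modeled), nodes marked as possible spills still get a color if one happens to be free, and the sample graph is made up for illustration.

```python
def color_graph(graph, n):
    """graph: {node: set(neighbors)}; returns (colors, spilled)."""
    work = {v: set(ns) for v, ns in graph.items()}   # shrinking copy
    stack = []
    while work:
        # Prefer a node that is trivially colorable (fewer than n neighbors).
        v = next((u for u in sorted(work) if len(work[u]) < n), None)
        marked = v is None
        if marked:
            # Heuristic stand-in: pick the node with the most neighbors.
            v = max(sorted(work), key=lambda u: len(work[u]))
        stack.append((v, marked))
        for u in work.pop(v):        # remove v and its edges
            work[u].discard(v)

    colors, spilled = {}, []
    while stack:
        v, marked = stack.pop()      # reinsert in reverse removal order
        used = {colors[u] for u in graph[v] if u in colors}
        free = [c for c in range(n) if c not in used]
        if free:
            colors[v] = free[0]
        else:
            spilled.append(v)        # can only happen for marked nodes
    return colors, spilled


# Made-up example graph with five live ranges and n = 3 registers.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}
print(color_graph(graph, 3))
```

Trying to color marked nodes anyway is a design choice in this sketch: by the time such a node is reinserted, its neighbors may have used fewer than n distinct colors.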

Register Allocation: Summary
  • Problem:
      - Find an assignment for all pseudo-registers, whenever possible.
  • Solution:
      - Abstraction: an interference graph
          - Nodes: live ranges
          - Edges: presence of a live range at the time of definition
      - Register allocation and assignment problems
          - Equivalent to n-colorability of the interference graph
      - Heuristics to find an assignment for n colors
          - Successful: colorable, and finds an assignment
          - Not successful: colorability unknown, and no assignment

Instruction Scheduling: The Goal
  • Assume that the remaining instructions are all essential: otherwise, earlier passes would have eliminated them
  • The way to perform a fixed amount of work in less time: execute the instructions in parallel
  [Figure: the statements a = 1 + x; b = 2 + y; c = z + 3; placed along a time axis, sequentially and in parallel.]

Hardware Support for Parallel Execution
  • Three forms of parallelism are found in modern machines:
      - Pipelining (exploited by instruction scheduling)
      - Superscalar processing (exploited by instruction scheduling)
      - Multiprocessing (exploited by automatic parallelization)

Pipelining
  • Basic idea:
      - Break an instruction into stages that can be overlapped
  • Example:
      - Simple 5-stage pipeline from early RISC machines

  IF = Instruction Fetch
  RF = Decode & Register Fetch
  EX = Execute on ALU
  ME = Memory Access
  WB = Write Back to Register File

  [Figure: one instruction passing through IF, RF, EX, ME, WB along the time axis.]

Pipelining Illustration

  Time ->
  IF RF EX ME WB
     IF RF EX ME WB
        IF RF EX ME WB

  • In a given cycle, each instruction is in a different stage
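The staggered picture can be generated, and its length counted, with a few lines of Python. The sketch assumes an ideal pipeline with no stalls, in which case n instructions on a k-stage pipeline finish in k + n - 1 cycles; the function names are made up for this example.

```python
STAGES = ["IF", "RF", "EX", "ME", "WB"]


def pipeline_rows(n_instructions):
    """One text row per instruction, each shifted one cycle to the right."""
    return ["   " * i + " ".join(STAGES) for i in range(n_instructions)]


def total_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles to finish all instructions, assuming no stalls."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


for row in pipeline_rows(3):
    print(row)
print(total_cycles(3), "cycles, versus", 3 * len(STAGES), "without overlap")
```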

Beyond Pipelining: "Superscalar" Processing
  • Basic idea:
      - Multiple (independent) instructions can proceed simultaneously through the same pipeline stages
  • Requires additional hardware
  [Figure: abstract representation of the EX stage between pipe registers. Hardware for a scalar pipeline: 1 ALU computing r1 + r2. Hardware for a 2-way superscalar: 2 ALUs computing r1 + r2 and r3 + r4.]

Superscalar Pipeline Illustration
  • Original (scalar) pipeline:
      - Only one instruction in a given pipe stage at a given time
  • Superscalar pipeline:
      - Multiple instructions in the same pipe stage at the same time
  [Figure: timing diagrams of IF RF EX ME WB per instruction along the time axis, for the scalar and the superscalar pipeline.]

Limitations upon Scheduling
  • Hardware resources
      - Processors have finite resources, and there are often constraints on how these resources can be used.
      - Examples:
          - Finite issue width
          - Limited functional units (FUs) per given instruction type
          - Limited pipelining within a given functional unit (FU)
  • Data dependences
      - If a data location is read or written too early, the program may behave incorrectly.
  • Control dependences
      - Impractical to schedule for all possible paths
      - Choosing an expected path may be difficult
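The data-dependence constraint can be made concrete with a small check on the locations two instructions read and write. The RAW/WAR/WAW labels are the standard names for these hazards rather than terms from the slides, and the encoding of an instruction as a `(writes, reads)` pair is an assumption of this sketch.

```python
def dependences(first, second):
    """Classify dependences from `first` to a later instruction `second`.

    Each instruction is a (writes, reads) pair of sets of location names.
    """
    w1, r1 = first
    w2, r2 = second
    kinds = set()
    if w1 & r2:
        kinds.add("RAW")   # second reads what first writes
    if r1 & w2:
        kinds.add("WAR")   # second overwrites what first still has to read
    if w1 & w2:
        kinds.add("WAW")   # both write the same location
    return kinds


def can_reorder(first, second):
    """Two instructions may be swapped or run in parallel if nothing links them."""
    return not dependences(first, second)


a_stmt = ({"a"}, {"x"})        # a = 1 + x
b_stmt = ({"b"}, {"y"})        # b = 2 + y
t_stmt = ({"t"}, {"a", "b"})   # t = a - b
print(can_reorder(a_stmt, b_stmt))   # independent statements
print(dependences(a_stmt, t_stmt))   # t needs the value of a
```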

Predictable Success