Case Studies of Compilers and Future Trends
Chapter 21
Mooly Sagiv

Goals
• Learn about existing compilers
  – Which of the studied subjects are implemented
  – Mention techniques not studied
• Future trends (textbook)
• Other trends
• Techniques used in optimizing compilers
• Have some fun

Compilers Studied
• Sun compilers for SPARC 8, 9
• IBM XL compilers for POWER and PowerPC architectures
• Digital compiler for Alpha
• Intel reference compiler for 386
• Comparison criteria
  – Duration and history
  – Structure
  – Optimizations performed on two programs

    int length, width, radius;
    enum figure {RECTANGLE, CIRCLE};

    main()
    {
        int area = 0, volume = 0, height;
        enum figure kind = RECTANGLE;

        for (height = 0; height < 10; height++) {
            if (kind == RECTANGLE) {
                area += length * width;
                volume += length * width * height;
            } else if (kind == CIRCLE) {
                area += 3.14 * radius;
                volume += 3.14 * radius * height;
            }
        }
        process(area, volume);
    }

          integer a(500, 500), k, l
          do 20 k = 1, 500
          do 20 l = 1, 500
            a(k, l) = k + l
    20    continue
          call s1(a, 500)
          end

          subroutine s1(a, n)
          integer a(500, 500), n
          do 100 i = 1, n
            do 100 j = i + 1, n
              do 100 k = 1, n
                l = a(k, i)
                m = a(k, j)
                a(k, j) = l + m
    100   continue
          end

The SPARC Architecture
• SPARC 8
  – 32 bit RISC superscalar system with pipeline
  – integer and floating point units
  – 8 general purpose integer registers (r0 = 0)
  – load, store, arithmetic, shift, branch, call and system control
  – addresses (register+register, register+displ.)
  – Three address instructions
  – Several 24 register windows (spilling by OS)
• SPARC 9
  – 64 bit architecture (upward compatible)

The Sun SPARC Compilers
• Front-ends for C, C++, Fortran 77, Pascal
• Originated from Berkeley 4.2 BSD Unix
• Developed at Sun since 1982
• Original backend for Motorola 68010
• Migrated to M68000 and then to SPARC
• Global optimization developed in 1984
• Interprocedural optimization began in 1984
• Mixed compiler model

The Sun SPARC Compiler
[Figure: compiler structure. Labels: Front-End, Sun IR, Automatic inliner, aliaser, iropt (global optimizer), yabe, Sun IR, Code generator, Relocatable]

    ENTRY "s1_" {IS_EXT_ENTRY, ENTRY_IS_GLOBAL}
            goto LAB_32;
    LAB_32: LTEMP.1 = (.n {ACCESS V41});
            i = 1
            CBRANCH (i <= LTEMP.1, 1: LAB_36, 0: LAB_35);
    LAB_36: LTEMP.2 = (.n {ACCESS V41});
            j = i + 1
            CBRANCH (j <= LTEMP.2, 1: LAB_41, 0: LAB_40);
    LAB_41: LTEMP.3 = (.n {ACCESS V41});
            k = 1
            CBRANCH (k <= LTEMP.3, 1: LAB_46, 0: LAB_45);
    LAB_46: l = (.a[k, i] {ACCESS V20});
            m = (.a[k, j] {ACCESS V20});
            *(a[k, j] = l + m {ACCESS V20, INT});
    LAB_34: k = k + 1;
            CBRANCH (k > LTEMP.3, 1: LAB_45, 0: LAB_46);
    LAB_45: j = j + 1
            …
    LAB_35:

Sun Optimization Levels
• O1 Limited optimizations
• O2 Optimize expressions not involving global, aliased local, and volatile variables
• O3 Worst case assumptions on pointer aliases
  – Automatic inlining
  – software pipelining
  – loop unrolling
  – instruction scheduling
• O4 Front-end provides aliases

iropt
• Processes each procedure separately
• Uses basic blocks
• Control flow analysis using dominators
• Parallelizer uses structural analysis
• Other optimizations use iterative algorithms
• Optimizations translate Sun-IR
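
The dominator-based control flow analysis mentioned above can be sketched with the classic iterative data-flow formulation. This is a minimal illustration, not iropt's actual code: the bit-mask encoding, the name `compute_dominators`, and the 32-block limit are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BLOCKS 32   /* assumption: one bit per basic block */

/* pred[b] is a bit mask of the predecessors of block b; block 0 is the
 * entry. On return, dom[b] is a bit mask of the blocks that dominate b. */
void compute_dominators(int n, const uint32_t pred[], uint32_t dom[])
{
    assert(n >= 1 && n <= MAX_BLOCKS);

    uint32_t all = (n == 32) ? UINT32_MAX : ((UINT32_C(1) << n) - 1);
    int changed = 1;

    dom[0] = 1;                          /* the entry dominates only itself */
    for (int b = 1; b < n; b++)
        dom[b] = all;                    /* optimistic initial guess */

    while (changed) {                    /* iterate to a fixed point */
        changed = 0;
        for (int b = 1; b < n; b++) {
            uint32_t meet = all;
            for (int p = 0; p < n; p++)
                if (pred[b] & (UINT32_C(1) << p))
                    meet &= dom[p];      /* intersect over predecessors */
            meet |= UINT32_C(1) << b;    /* every block dominates itself */
            if (meet != dom[b]) {
                dom[b] = meet;
                changed = 1;
            }
        }
    }
}
```

For a diamond-shaped flow graph (0 to 1 and 2, both to 3), the join block 3 ends up dominated only by the entry and itself.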

Optimizations in iropt
• Scalar replacement of aggregates and expansion of Fortran arithmetic on complex numbers
• dependence-based analysis and transformations (O3, O4)
• linearization of array addresses
• algebraic simplification and reassociation of address expressions
• loop invariant code motion
• strength reduction and induction variable removal
• global common-subexpression elimination
• dead-code elimination

Dependence Based Analysis
• Constant propagation
• dead-code elimination
• structural control flow analysis
• loop discovery (index variables, lower and upper bounds)
• segregation of loops that have calls and early exits
• dependence analysis using GCD
• loop distribution (split loops 20)
• loop interchange
• loop fusion
• scalar replacement of array elements
• recognition of reductions
• data-cache tiling
• profitability analysis for parallel code generation
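
The GCD-based dependence test named above can be illustrated for a single subscript pair. The sketch below assumes two references of the form `x[a*i + c1]` and `x[b*j + c2]` and ignores loop bounds; the function name `gcd_test_may_depend` is hypothetical, and a production test would combine this with bounds information.

```c
#include <assert.h>
#include <stdlib.h>

/* Greatest common divisor of |a| and |b|; gcd(0, 0) is 0. */
static long gcd(long a, long b)
{
    a = labs(a);
    b = labs(b);
    while (b != 0) {
        long r = a % b;
        a = b;
        b = r;
    }
    return a;
}

/* Returns 1 if a*i + c1 == b*j + c2 can hold for some integers i and j
 * (a dependence is possible), 0 if it provably cannot. */
int gcd_test_may_depend(long a, long c1, long b, long c2)
{
    long g = gcd(a, b);
    long diff = c2 - c1;

    if (g == 0)                 /* both subscripts are constants */
        return diff == 0;
    return diff % g == 0;       /* solvable iff gcd(a, b) divides c2 - c1 */
}
```

For example, `x[2*i]` and `x[2*j + 1]` never overlap because 2 does not divide 1, so loops touching only those two references are independent with respect to `x`.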

Code Generator
• First translate Sun-IR to asm+
• instruction selection
• inline of assembly language templates
• local optimizations (dead-code elimination, branch chaining, …)
• macro expansion
• data-flow analysis of live variables
• early instruction selection
• register allocation by graph coloring
• stack frame layout
• macro expansion (MAX, MIN, MOV)
• late instruction scheduling
• inline of assembly language constructs
• macro expansion
• emission of relocatable code
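
To give a feel for register allocation by graph coloring, here is a deliberately simplified sketch. It colors an interference graph greedily in node order and marks uncolorable nodes as spilled; it is not the Sun allocator, which the slides do not describe in detail, and it omits the simplification ordering and spill-cost heuristics of production allocators. The name `color_greedy` and the matrix representation are assumptions.

```c
#include <assert.h>

#define MAX_NODES 16   /* assumption: small graphs only */

/* Assigns one of k registers (0..k-1) to each of n live ranges, given a
 * symmetric interference matrix. Nodes that cannot be colored get -1
 * (spilled). Returns the number of spilled nodes. */
int color_greedy(int n, int interf[MAX_NODES][MAX_NODES], int k,
                 int color[MAX_NODES])
{
    int spills = 0;

    assert(n >= 0 && n <= MAX_NODES && k >= 0 && k <= 32);
    for (int v = 0; v < n; v++) {
        unsigned long used = 0;

        /* Collect the registers taken by already-colored neighbors. */
        for (int u = 0; u < v; u++)
            if (interf[v][u] && color[u] >= 0)
                used |= 1UL << color[u];

        /* Pick the lowest free register, if any. */
        color[v] = -1;
        for (int c = 0; c < k; c++) {
            if (!(used & (1UL << c))) {
                color[v] = c;
                break;
            }
        }
        if (color[v] < 0)
            spills++;
    }
    return spills;
}
```

Three mutually interfering live ranges need three registers; with only two available, this sketch spills the last one it visits.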

Optimizations on main
• Removal of unreachable code in else (except )
• Move of loop invariant “length*width”
• Strength reduction of “height”
• Loop unrolling by factor of four
• Local variables in registers
• All computations in registers
• Identifying tail call
• Stack frame eliminated
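
Two of the transformations listed above, loop-invariant code motion and strength reduction, can be applied by hand to the rectangle branch of the C example. The sketch below is source-level C written for illustration, not compiler output; the function names and the pointer parameters are assumptions that replace the original globals and the call to `process`.

```c
#include <assert.h>

/* The loop from the C example with kind == RECTANGLE, as written. */
void loop_original(int length, int width, int *area, int *volume)
{
    int height;

    assert(area && volume);
    *area = 0;
    *volume = 0;
    for (height = 0; height < 10; height++) {
        *area += length * width;
        *volume += length * width * height;
    }
}

/* Hand-optimized version of the same loop. */
void loop_optimized(int length, int width, int *area, int *volume)
{
    int t = length * width;   /* loop-invariant code motion */
    int step = 0;             /* equals t * height at the top of each iteration */
    int height;

    assert(area && volume);
    *area = 0;
    *volume = 0;
    for (height = 0; height < 10; height++) {
        *area += t;
        *volume += step;
        step += t;            /* strength reduction: an add replaces a multiply */
    }
}
```

Both versions produce the same `area` and `volume`; the second performs one multiplication in total instead of three per iteration.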

Missed optimizations on main
• Removal of computations
• Compute area in one instruction
• Completely unroll the loop
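
Because the trip count is the constant 10, the whole loop can in principle be evaluated symbolically: `area` accumulates the invariant product ten times and `volume` accumulates it 0 + 1 + … + 9 = 45 times. The sketch below shows that closed form next to a reference loop; it is an illustration of what a compiler could derive, with hypothetical function names, not the output of any of the studied compilers.

```c
#include <assert.h>

/* Reference: the rectangle branch of the loop, ten iterations. */
void loop_reference(int length, int width, int *area, int *volume)
{
    int height;

    assert(area && volume);
    *area = 0;
    *volume = 0;
    for (height = 0; height < 10; height++) {
        *area += length * width;
        *volume += length * width * height;
    }
}

/* Fully evaluated form: no loop remains. */
void loop_closed_form(int length, int width, int *area, int *volume)
{
    int t = length * width;

    assert(area && volume);
    *area = 10 * t;     /* ten additions of t */
    *volume = 45 * t;   /* t * (0 + 1 + ... + 9) */
}
```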

Optimizations on Fortran example
• Procedure integration of s1 (n=500)
• Common subexpression elimination of “a[k, j]”
• Loop unrolling
• Local variables in registers
• Software pipelining
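
The common subexpression here is the address of `a(k, j)`, which the source computes once for the load into `m` and once for the store. The sketch below transliterates `s1` into C with Fortran-style column-major, 1-based indexing and shows the address computed once. The dimension 4 (instead of 500), the `IDX` macro, and the function names are assumptions made to keep the example small.

```c
#include <assert.h>

enum { DIM = 4 };   /* small stand-in for the 500 used in the source */

/* Column-major offset of a(k, j) with 1-based subscripts, as in Fortran. */
#define IDX(k, j) (((j) - 1) * DIM + ((k) - 1))

/* Direct transliteration: the address of a(k, j) is formed twice. */
void s1_plain(int *a)
{
    assert(a);
    for (int i = 1; i <= DIM; i++)
        for (int j = i + 1; j <= DIM; j++)
            for (int k = 1; k <= DIM; k++) {
                int l = a[IDX(k, i)];
                int m = a[IDX(k, j)];
                a[IDX(k, j)] = l + m;
            }
}

/* After common subexpression elimination of the a(k, j) address. */
void s1_cse(int *a)
{
    assert(a);
    for (int i = 1; i <= DIM; i++)
        for (int j = i + 1; j <= DIM; j++)
            for (int k = 1; k <= DIM; k++) {
                int *akj = &a[IDX(k, j)];   /* computed once, used twice */
                int l = a[IDX(k, i)];
                int m = *akj;
                *akj = l + m;
            }
}
```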

Optimizations missed Fortran example
• Eliminating s1
• Eliminating addition in loop via linear function test replacement
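
Linear function test replacement removes an induction variable that survives only to control the loop, by rewriting the exit test in terms of another induction variable that is a linear function of it. A minimal source-level sketch, assuming the array accesses have already been strength-reduced to pointers; the function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Before: k is incremented every iteration only to drive the loop test. */
void add_column_counter(int *dst, const int *src, size_t n)
{
    size_t k;

    assert(dst && src);
    for (k = 0; k < n; k++) {
        *dst += *src;
        dst++;
        src++;
    }
}

/* After: the test on k is replaced by a test on dst, which advances in
 * lock step with k, so the addition on k disappears. */
void add_column_lftr(int *dst, const int *src, size_t n)
{
    int *end;

    assert(dst && src);
    end = dst + n;
    while (dst < end) {
        *dst += *src;
        dst++;
        src++;
    }
}
```

The second loop has one fewer addition per iteration, which is the saving the slide refers to.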

POWER/PowerPC Architecture
• POWER
  – 32 bit RISC superscalar system with
  – branch, integer and floating point units
  – optional multiprocessors (one branch)
  – 32 (shared) general purpose integer registers (gr0 = 0)
  – load, store, arithmetic, shift, branch, call and system control
  – addresses (register+register, register+displ.)
  – Three address instructions
• PowerPC
  – Both 32 and 64 bit architecture

The IBM XL Compilers
• Front-ends for PL.8, C, C++, Fortran 77, Pascal
• Originated in 1983
• Written in PL.8
• First released for PC/RT
• Generates code for POWER, Intel 386, SPARC and PowerPC
• No interprocedural optimizations
• (Almost) all optimizations on low level IR (XIL)

The IBM/XIL Compiler
[Figure: compiler structure. Labels: Translator, Optimizer, XIL, Instruction scheduler, Register allocation, Instruction scheduler, Instruction Selection, XIL, Final assembly, Relocatable]

TOBEY
• Processes each procedure separately
• Uses basic blocks
• Control flow analysis using DFS and intervals
• YIL: a higher level representation
  – loops
  – SSA form
• Data-flow analysis by interval analysis
• Iterative algorithm for non-reducible flow graphs

Optimizations in TOBEY
• Switches: compare | table branch
• Mapping local variables to register+offset
• Inline for current module
• Aggressive value numbering
• global common subexpression elimination
• loop-invariant code motion
• downward store motion
• dead-store elimination
• reassociation, strength reduction
• global constant propagation
• architecture specific optimizations (MAX)
• value numbering
• global common subexpression elimination
• dead code elimination
• elimination of dead induction variables
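
Value numbering assigns the same number to computations that are known to produce the same value. The sketch below is the basic local form over one basic block of `dst = a op b` instructions, not the aggressive variant TOBEY uses; the instruction encoding, the table sizes, and the name `local_value_numbering` are assumptions. It only detects repeated values; a real pass must also check that some variable still holds the value before reusing it.

```c
#include <assert.h>

#define MAX_VARS  32
#define MAX_EXPRS 64

/* dst = a op b, where variables are small integers below MAX_VARS. */
struct instr { int dst; char op; int a; int b; };

/* Sets redundant[i] to 1 when instruction i recomputes a value number
 * already seen in the block. Returns the count of such instructions. */
int local_value_numbering(const struct instr *code, int n, int *redundant)
{
    int vn_of_var[MAX_VARS];
    struct { char op; int va; int vb; int vn; } table[MAX_EXPRS];
    int entries = 0, next_vn = 0, count = 0;

    assert(n >= 0 && n <= MAX_EXPRS);
    for (int v = 0; v < MAX_VARS; v++)
        vn_of_var[v] = next_vn++;       /* each input has its own value */

    for (int i = 0; i < n; i++) {
        assert(code[i].a >= 0 && code[i].a < MAX_VARS);
        assert(code[i].b >= 0 && code[i].b < MAX_VARS);
        assert(code[i].dst >= 0 && code[i].dst < MAX_VARS);

        int va = vn_of_var[code[i].a];
        int vb = vn_of_var[code[i].b];
        int found = -1;

        /* Canonical operand order for commutative operators. */
        if ((code[i].op == '+' || code[i].op == '*') && va > vb) {
            int t = va; va = vb; vb = t;
        }
        for (int e = 0; e < entries; e++)
            if (table[e].op == code[i].op &&
                table[e].va == va && table[e].vb == vb)
                found = table[e].vn;

        redundant[i] = (found >= 0);
        if (found >= 0) {
            count++;
        } else {
            found = next_vn++;          /* new value: record the expression */
            table[entries].op = code[i].op;
            table[entries].va = va;
            table[entries].vb = vb;
            table[entries].vn = found;
            entries++;
        }
        vn_of_var[code[i].dst] = found; /* dst now holds this value */
    }
    return count;
}
```

In the block `c = a + b; d = b + a; a = c * d; e = a + b`, the second instruction is flagged as redundant, while the fourth is not because `a` was redefined in between.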

Final Assembly
• Two passes on XIL
  – Peephole optimizations
  – Generate relocatable image

Optimizations on main
• Removal of unreachable code in else
• Move of loop invariant “length*width”
• Strength reduction of “height”
• Loop unrolling by factor of two
• Local variables in registers
• All computations in registers
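
Unrolling by a factor of two can be pictured at the source level as follows. Since the trip count 10 is even, no clean-up loop is needed in this particular case. This is a hand-written sketch with hypothetical function names, shown without the strength reduction the compiler also applies, so that the unrolling itself stays visible.

```c
#include <assert.h>

/* The rectangle branch of the loop with the invariant product hoisted. */
void loop_rolled(int length, int width, int *area, int *volume)
{
    int t = length * width;
    int height;

    assert(area && volume);
    *area = 0;
    *volume = 0;
    for (height = 0; height < 10; height++) {
        *area += t;
        *volume += t * height;
    }
}

/* The same loop unrolled by two: five iterations, two bodies each. */
void loop_unrolled_by_two(int length, int width, int *area, int *volume)
{
    int t = length * width;
    int height;

    assert(area && volume);
    *area = 0;
    *volume = 0;
    for (height = 0; height < 10; height += 2) {
        *area += t;                    /* body for height */
        *volume += t * height;
        *area += t;                    /* body for height + 1 */
        *volume += t * (height + 1);
    }
}
```

The unrolled loop executes half as many branch and compare instructions and gives the scheduler two independent bodies to interleave.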

Missed optimizations on main
• Identifying tail call
• Compute area in one instruction

Optimizations on Fortran example
• n=500
• Common subexpression elimination of “a[k, j]”
• Loop unrolling 9
• Local variables in registers
• Software pipelining

Optimizations missed Fortran example
• Procedure integration of s1
• Eliminating addition in loop via linear function test replacement

Intel 386 Architecture
• 32 bit CISC system
• 8 32-bit integer registers
• Support for 16 and 8 bit registers
• Dedicated registers (e.g., stack frame)
• Many address modes
• Two address instructions
• 80 bit floating point

The Intel Compilers
• Front-ends for C, C++, Fortran 77, Fortran 90
• Front-End from Multiflow and Edison Design Group (EDG)
• Generates 386 code
• Interprocedural optimization was added (1991)
• Mixed optimization mode
• Many optimizations based on partial redundancy elimination

The Intel Compiler
[Figure: compiler structure. Labels: Front-End, Interprocedural Optimizer, IL-1+IL-2, Memory optimizer, IL-1+IL-2, Global optimizer, IL-1+IL-2, Code selector, Register allocation, Instruction scheduler, Relocatable]

Interprocedural Optimizer
• Cross module
• Saves intermediate representations
• Interprocedural constant propagation

Memory Optimizer
• Improves use of memory and caches
• loop transformations
• Uses SSA form
• Data dependence

Global Optimizer
• Constant propagation
• dead code elimination
• local common subexpression elimination
• copy propagation
• partial redundancy elimination
• copy propagation
• dead code elimination

Optimizations on main
• Removal of unreachable code in else
• Move of loop invariant “length*width”
• Strength reduction of “height”
• Local variables in registers

Missed optimizations on main
• Compute area in one instruction
• Identifying tail call
• Loop unrolling

Optimizations on Fortran example
• Inlining s1 (n=500)
• Common subexpression elimination of “a[k, j]”
• Local variables in registers
• Linear function test replacement

Optimizations missed Fortran example
• Eliminating s1
• Loop unrolling in the inlined loop

[Table: optimizations on the C example. Columns: opt, Sun, IBM, Intel. Rows: constant kind, dead-code, loop invariant, strength reduction, loop-unrolling (4, 2, 5), register allocation, stack-frame eliminated, tail-call. One entry reads “(almost)”.]

[Table: optimizations on the Fortran example. Columns: opt, Sun, IBM, Intel. Rows: CSE a(k, j), integrate s1, loop-unrolling (4, 2), instructions (inner-loop) (21, 9, 4), linear function test replace, software pipelined, register allocation.]

Future Trends in Compiler Design and Implementation
• SSA is being used more and more:
  – generalizes basic block optimizations to extended basic blocks
  – leads to performance improvements
• Partial redundancy elimination is being used more
• Partial redundancy elimination and SSA are being combined
• Parallelization and vectorization are being integrated into production compilers
• Data-dependence testing, data-cache optimization and software pipelining will advance significantly
• The most active research area in scalar compilation will be optimization
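
Partial redundancy elimination targets an expression that is recomputed on some, but not all, paths reaching a point. A minimal source-level sketch of the idea, with hypothetical function names: inserting the computation on the path that lacks it makes the later occurrence fully redundant, so it can reuse a temporary.

```c
#include <assert.h>

/* Before: a + b is computed on the 'then' path and again after the
 * merge, so the second computation is redundant on one path only. */
int before_pre(int a, int b, int cond)
{
    int x = 0, y;

    if (cond)
        x = a + b;
    y = a + b;              /* partially redundant */
    return x + y;
}

/* After: a + b is inserted on the 'else' path, and the computation after
 * the merge is replaced by a use of the temporary t. */
int after_pre(int a, int b, int cond)
{
    int x = 0, y, t;

    if (cond) {
        t = a + b;
        x = t;
    } else {
        t = a + b;          /* inserted computation */
    }
    y = t;                  /* no recomputation on either path */
    return x + y;
}
```

Each path now evaluates `a + b` exactly once, whereas the original evaluated it twice when `cond` was true.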

Other Trends
• More and more work will be shifted from hardware to compilers
• More advanced hardware will be available
• Higher order programming languages will be used
  – Memory management will be simpler
  – Modularity facilities
  – Assembly programming will hardly be used
• Dynamic (runtime) compilation will become more significant

Theoretical Techniques in Compilers