VLIW Compilation Techniques in a Superscalar Environment
Kemal Ebcioglu, Randy D. Groves, Ki-Chang Kim, Gabriel M. Silberman and Isaac Ziv
PLDI 1994
Presented by Jason Horihan

Why do we need a special compiler when we have “Super Beast” superscalar processors that extract ILP for us?
- Processor hardware can only look ahead a small distance to extract ILP.
- Branch prediction is not perfect and can only take us so far.

VLIW Scheduling Techniques
- Speculative Load/Store Motion out of Loops
- Unspeculation
- Scheduling
- Limited Combining
- Basic Block Expansion
- Prolog Tailoring
All of these are implemented at the code generation stage of the compiler.

Speculative Load/Store Motion out of Loops
- Loads and stores can be moved out of a loop if:
  1. Within each group of loads and stores:
     - each instruction uses the same base register,
     - each instruction has the same displacement from this base, and
     - each instruction operates on operand data of identical length and type.
  2. The base register of each group is not written to in the loop.
  3. The group's operands do not overlap any other memory reference in the loop.
  4. On every path to the loop entrance there is either a load of an address constant into the base register or a load or store to the same location, to ensure “safe” operation.
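Conditions 1 to 3 can be checked mechanically. The following is a minimal Python sketch that assumes a toy instruction representation. The `MemOp` and `RegWrite` classes are hypothetical and are not the paper's IR. The alias test `may_overlap` is an assumed, caller-supplied function. Condition 4 needs the control-flow graph leading into the loop, so it is not checked here.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

@dataclass(frozen=True)
class MemOp:
    opcode: str  # "Ld" or "St"
    reg: str     # register loaded into (Ld) or stored from (St)
    base: str    # base register, e.g. "r2"
    disp: str    # displacement from the base, e.g. "a"
    width: int   # operand length in bytes
    dtype: str   # operand type, e.g. "int"

@dataclass(frozen=True)
class RegWrite:
    dest: str    # any non-memory instruction, reduced to the register it writes

Instr = Union[MemOp, RegWrite]

def written_reg(ins: Instr):
    """Register written by an instruction (stores write no register)."""
    if isinstance(ins, RegWrite):
        return ins.dest
    return ins.reg if ins.opcode == "Ld" else None

def can_move_group(group: List[MemOp], loop_body: List[Instr],
                   may_overlap: Callable[[MemOp, MemOp], bool]) -> bool:
    """Check conditions 1-3 for moving a group of loads/stores out of a loop."""
    first = group[0]
    key = (first.base, first.disp, first.width, first.dtype)
    # 1. Same base register, same displacement, same operand length and type.
    if any((m.base, m.disp, m.width, m.dtype) != key for m in group):
        return False
    # 2. The base register is not written anywhere in the loop.
    if any(written_reg(ins) == first.base for ins in loop_body):
        return False
    # 3. No other memory reference in the loop may overlap the group's operand.
    in_group = {id(m) for m in group}
    others = [ins for ins in loop_body
              if isinstance(ins, MemOp) and id(ins) not in in_group]
    return not any(may_overlap(first, o) for o in others)
```

Identity (`id`) rather than value equality decides group membership. This way, a second, distinct reference with the same fields still counts as an "other" memory reference.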

Original Code:
L1: Ld  r4, a(r2)
    …
    Ld  r12, a(r2)
    Ai  r12, 6
    St  r12, a(r2)
    …
    Br  L1

Transformed Code:
    Ld  r4, a(r2)
    …
    Ld  r10, a(r2)
L1: Mv  r12, r10
    Ai  r12, 6
    Mv  r10, r12
    …
    Br  L1
    St  r10, a(r2)
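Below is a minimal sketch of the rewrite itself. It assumes the legality conditions already hold and the loop has a single exit. Instructions are hypothetical `(opcode, register, operand)` tuples.

The rewrite works in four steps:
- The location is loaded once into a spare register before the loop.
- Each load of it inside the loop becomes a `Mv` from that register.
- Each store becomes a `Mv` into that register.
- One store after the loop writes the value back.

```python
from typing import List, Tuple

Insn = Tuple[str, str, str]  # (opcode, register, operand), e.g. ("Ld", "r12", "a(r2)")

def promote_location(body: List[Insn], loc: str, cache_reg: str):
    """Keep memory location `loc` in `cache_reg` for the whole loop.

    Assumes the legality conditions hold and the loop has a single exit.
    Returns (preheader, new_body, exit_code).
    """
    preheader = [("Ld", cache_reg, loc)]             # load once, before the loop
    new_body = []
    for op, reg, operand in body:
        if op == "Ld" and operand == loc:
            new_body.append(("Mv", reg, cache_reg))  # read the cached value
        elif op == "St" and operand == loc:
            new_body.append(("Mv", cache_reg, reg))  # update the cached value
        else:
            new_body.append((op, reg, operand))      # unrelated instruction
    exit_code = [("St", cache_reg, loc)]             # write back once, after the loop
    return preheader, new_body, exit_code
```

Take the body `Ld r12, a(r2)`, `Ai r12, 6`, `St r12, a(r2)` with `cache_reg="r10"`. The sketch produces `Mv r12, r10`, `Ai r12, 6`, `Mv r10, r12`, with the load of `r10` hoisted before the loop and the store of `r10` sunk after it.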

Unspeculation
- Instructions moved above a conditional branch to improve performance can instead lower performance when execution takes the path on which the speculative instructions were not needed.
- Moving some of these speculative instructions back down into one of the paths can increase performance.

- To unspeculate an instruction (or a group of instructions), these conditions must be met:
  1. On one of the paths, the destination registers of the speculative group must ALL be dead.
  2. No instruction between the speculative instructions and the conditional branch may define or use any register used by the speculative instructions.
  3. The instructions cannot have side effects.
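The three unspeculation conditions translate directly into a legality check. This Python sketch makes two assumptions:
- Each instruction records the registers it defines and uses. The `Instr` class is hypothetical.
- Liveness at the start of the path where the group is not needed has already been computed.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Instr:
    text: str
    defs: Set[str] = field(default_factory=set)  # registers written
    uses: Set[str] = field(default_factory=set)  # registers read
    side_effects: bool = False                   # e.g. stores, calls

def can_unspeculate(group: List[Instr], between: List[Instr],
                    live_on_path: Set[str]) -> bool:
    """Can `group` be moved from above the branch down into the path that needs it?

    live_on_path: registers live at the start of the OTHER path, the one on
    which the speculative group was not needed.
    """
    dests = set().union(*(i.defs for i in group))
    regs = dests | set().union(*(i.uses for i in group))
    # 1. All destination registers must be dead on that path.
    if dests & live_on_path:
        return False
    # 2. Instructions between the group and the branch must not define or use
    #    any register the group uses.
    if any((i.defs | i.uses) & regs for i in between):
        return False
    # 3. No instruction in the group may have side effects.
    return not any(i.side_effects for i in group)
```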

Scheduling
- Loop Unrolling
- Renaming
- Global Scheduling
- Software Pipelining

Limited Combining
- Similar to value numbering, but spans multiple blocks.
  1. Starts with a load immediate or a move register instruction.
  2. Searches the sequence of following instructions, following unconditional jumps, until a last use is found.
  3. The source and destination registers of the starting instruction cannot be set in the sequence.

- If the search succeeds, the entire sequence of instructions, from the instruction after the starting instruction through the last-use instruction, is inserted in place of the starting instruction.
- In the inserted sequence, occurrences of the starting instruction's destination register are replaced with its source register.
- A branch is inserted after the “new” last-use instruction, jumping to the instruction after the “old” last-use instruction.

Original Code:
    Mv  r5, r4
    …
    Br  L3
    …
L3: Ld  r3, 4(r5)
    …
    Br  L4
    …
L4: Ld  r7, 8(r5)

Transformed Code:
    …
    Ld  r3, 4(r4)
    …
    Ld  r7, 8(r4)
    Br  L10
L3: Ld  r3, 4(r5)
    …
    Br  L4
    …
L4: Ld  r7, 8(r5)
L10:
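Below is a minimal Python sketch of the search for a `Mv dest, src` starting instruction. It assumes the caller has already done two things:
- followed the unconditional jumps to produce the straight-line `path`;
- computed `live_after`, the registers live after each instruction.

The `Ins` class and its `branch` flag are hypothetical. The load-immediate case is omitted, because substituting an immediate depends on which instruction forms accept one.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Set, Tuple

@dataclass(frozen=True)
class Ins:
    op: str                  # mnemonic, e.g. "Ld"
    dest: Optional[str]      # register written, or None
    srcs: Tuple[str, ...]    # registers read
    imm: str = ""            # displacement/immediate, kept verbatim
    branch: bool = False     # True for conditional branches

def limited_combine(start: Ins, path: List[Ins], live_after: List[Set[str]]):
    """Search forward from `Mv dest, src` for the last use of dest.

    Returns (copied_sequence, last_use_index) or None if the search fails.
    """
    dest, src = start.dest, start.srcs[0]
    for i, ins in enumerate(path):
        if ins.branch:                       # only unconditional jumps are followed
            return None
        if ins.dest in (dest, src):          # src or dest set in the sequence
            return None
        if dest in ins.srcs and dest not in live_after[i]:
            # Last use found: copy everything up to it, reading src instead of dest.
            copied = [replace(p, srcs=tuple(src if r == dest else r for r in p.srcs))
                      for p in path[: i + 1]]
            return copied, i
    return None                              # ran out of instructions
```

The caller then does two things:
- puts the copied sequence where the `Mv` was;
- appends a branch to the instruction that follows `path[last_use_index]` in the original code.

The original instructions stay at their labels, as with `L3` and `L4` in the example.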

Basic Block Expansion
- The main goal is to eliminate unconditional jumps at the end of some basic blocks.
- Instructions at the target of the unconditional branch are copied and inserted before the unconditional branch. Copying stops once enough consecutive non-branch instructions have been gathered.
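The copying step can be sketched in Python under a few assumptions:
- Instructions are plain strings.
- Any mnemonic starting with `B` other than `Br` is a conditional branch.
- "Enough" means a hypothetical threshold `want` of consecutive non-branch instructions.

The rewritten jump targets a new label after the copied instructions at the original target. Here it is named `<label>a`, a hypothetical convention. If the copy reaches the target's own unconditional `Br`, that jump is copied too and no new label is needed.

```python
from typing import Dict, List, Optional, Tuple

def expand_block(block: List[str], code_at: Dict[str, List[str]],
                 want: int = 2) -> Tuple[List[str], Optional[int]]:
    """Copy instructions from the target of a block-ending `Br <label>`.

    Returns (new_block, resume_offset). When resume_offset is not None, the
    caller places the new label "<label>a" before code_at[label][resume_offset].
    """
    *body, jump = block
    assert jump.startswith("Br "), "block must end in an unconditional jump"
    label = jump.split()[1]
    copied: List[str] = []
    run = 0                                   # consecutive non-branch instructions
    for ins in code_at[label]:
        if ins.startswith("Br "):
            # The target ends in its own unconditional jump: copy it, and the
            # original jump disappears entirely.
            return body + copied + [ins], None
        copied.append(ins)
        run = 0 if ins.startswith("B") else run + 1   # other "B..." = conditional branch
        if run >= want:
            break
    if not copied:
        return block, None
    return body + copied + [f"Br {label}a"], len(copied)
```

Conditional branches such as `Bz r3, Lx` are copied unchanged, since their targets do not move. Only the final unconditional jump is retargeted.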

Original Code Transformed Code … Bz r1, L1 Op Op Bz r3, Lx Op1 Op2 Br L2 … L2: Bz r3, Lx L3: L2: Op1 Bz r3, Lx Op2 Op1 Br L2 Op2 L2a L3: Br L2

Prolog Tailoring
- When entering and exiting a procedure, registers must be saved in the prolog and restored in the epilog.
- Prolog tailoring delays saving each register until it is absolutely necessary. This shortens the execution path and saves only what a given path actually needs.
- Exception handlers must be changed.

- Prolog Tailoring Algorithm:
  1. Generate a “MustKill” set for each node in the program graph.
  2. If, at a given node, a register that has not been saved before will definitely be killed, generate code to save this register.

Original Code:
Proc p1
    save r1, r2, r3, r4
    …
    Ld r2, …
    Ld r1, …
    …
    restore r1, r2
    return
L1: ld r3, …
    …
    ld r4, …
    ld r3, …
    …
    <r4 not used here>
    restore r3, r4
    return

Transformed Code:
Proc p1
    …
    save r1, r2
    Ld r2, …
    Ld r1, …
    …
    restore r1, r2
    return
L1: save r3
    ld r3, …
    …
    save r4
    ld r4, …
    ld r3, …
    restore r4
    …
    <r4 not used here>
    restore r3
    return
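The MustKill sets can be computed as a backward "must" dataflow problem: MustKill(n) = Kill(n) ∪ (intersection of MustKill(s) over all successors s). Saves are then placed at the first node where a not-yet-saved register is certain to be killed.

This is a sketch under simplifying assumptions, not the paper's exact algorithm:
- Nodes carry hypothetical `kill` sets.
- Save placement assumes every node below the entry has exactly one predecessor (a tree). Graphs with merge points need extra care, so that a register is never saved after some incoming path has already overwritten it.
- Restores are not placed.

```python
from typing import Dict, List, Set

def must_kill(kill: Dict[str, Set[str]], succ: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Registers killed on every path from each node to a procedure exit."""
    universe = set().union(*kill.values())
    # Must-analysis: exits start at their own kill set, other nodes at "everything".
    mk = {n: set(kill[n]) if not succ[n] else set(universe) for n in kill}
    changed = True
    while changed:
        changed = False
        for n in kill:
            if not succ[n]:
                continue
            new = kill[n] | set.intersection(*(mk[s] for s in succ[n]))
            if new != mk[n]:
                mk[n], changed = new, True
    return mk

def place_saves(entry: str, kill: Dict[str, Set[str]], succ: Dict[str, List[str]],
                callee_saved: Set[str]) -> Dict[str, Set[str]]:
    """Save each register at the first node where it is certain to be killed.

    Assumes every node below the entry has exactly one predecessor (a tree).
    """
    mk = must_kill(kill, succ)
    saves: Dict[str, Set[str]] = {}

    def walk(n: str, saved: Set[str]) -> None:
        need = (mk[n] & callee_saved) - saved   # definitely killed, not yet saved
        saves[n] = need
        for s in succ[n]:
            walk(s, saved | need)

    walk(entry, set())
    return saves
```

Consider a tree where node B kills `r3` and branches to C, which kills `r4`, and to D, which kills nothing. The sketch saves `r3` at B but `r4` only at C. This is similar in spirit to how the transformed code delays `save r4`.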

Results: SPECint92 Measurements (yeah!)

Benchmark   Xlc Time   Xlc SPECmark   VLIW Time   VLIW SPECmark
Espresso       41.70          54.44       38.30           59.27
Li             99.00          62.66       81.90           75.82
Eqntott        13.60          80.88       10.70          102.80
Compress       53.90          51.39       48.10           57.59
Sc             69.20          65.46       62.40           72.60
Gcc            91.40          59.61       90.20           60.53
SPECint92                     61.73                       69.93

Measurements done on an RS/6000 Model 980.
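SPECint92 is the geometric mean of the per-benchmark SPECratios, so the summary row can be checked from the individual SPECmarks in the results table. The short Python computation below should land close to the reported 61.73 (xlc) and 69.93 (VLIW), an improvement of roughly 13%.

```python
from math import prod

# Per-benchmark SPECmarks from the results table: (xlc, VLIW).
ratios = {
    "Espresso": (54.44, 59.27),
    "Li": (62.66, 75.82),
    "Eqntott": (80.88, 102.80),
    "Compress": (51.39, 57.59),
    "Sc": (65.46, 72.60),
    "Gcc": (59.61, 60.53),
}

def geomean(xs):
    """Geometric mean, as used for the SPECint92 summary figure."""
    return prod(xs) ** (1 / len(xs))

xlc = geomean([x for x, _ in ratios.values()])
vliw = geomean([v for _, v in ratios.values()])
print(f"xlc {xlc:.2f}  VLIW {vliw:.2f}  improvement {100 * (vliw / xlc - 1):.1f}%")
```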

Questions?