CMPUT 229 Fall 2003 Topic G IA64 Highlights


CMPUT 229 - Fall 2003
Topic G: IA-64 Highlights
José Nelson Amaral
http://www.cs.ualberta.ca/~amaral/courses/680
CMPUT 680 - Compiler Design and Optimization

Some Highlights of the EPIC Architecture

  • Control Speculation
  • Data Speculation
  • Predication
  • Rotating Registers
  • Hardware-Supported Software Pipelining

Control Speculation

Before Control Speculation:

    br.cond.dptk L1
    ld8   r3=[r5]
    shr   r7=r3,r87

After Control Speculation:

    ld8.s r3=[r5]
    br.cond.dptk L1
    chk.s r3, recovery
    shr   r7=r3,r87
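The effect of hoisting the load above the branch can be sketched in Python. This is a toy model under stated assumptions, not a description of the hardware: the `NAT` sentinel, the `memory` dictionary, and the function names `ld8_s`, `ld8`, `chk_s`, and `after_speculation` are all invented for illustration. The idea it shows is that a speculative load which would fault hands back a deferred-exception token instead of trapping, and the check runs recovery code only when that token is present.

```python
# Toy model of control speculation (all names are illustrative assumptions).
NAT = object()  # stands in for the deferred-exception token

memory = {0x100: 0x80}  # address -> value; any other address "faults"


def ld8_s(addr):
    """Speculative load: defer a fault by returning NAT instead of raising."""
    return memory.get(addr, NAT)


def ld8(addr):
    """Ordinary load: a bad address raises immediately."""
    if addr not in memory:
        raise MemoryError(hex(addr))
    return memory[addr]


def chk_s(value, recovery):
    """Check: run the recovery code only if the speculative load was deferred."""
    return recovery() if value is NAT else value


def after_speculation(addr, branch_taken, shift):
    r3 = ld8_s(addr)                    # load hoisted above the branch
    if branch_taken:                    # br.cond L1: the loaded value is never used
        return None
    r3 = chk_s(r3, lambda: ld8(addr))   # recovery repeats the load non-speculatively
    return r3 >> shift                  # shr r7 = r3, ...
```

When the branch is taken, a bad address costs nothing because the deferred fault is never checked; when it falls through, the fault surfaces at the check, which is where the original program would have faulted.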

Data Speculation

An advanced load allows a load to be moved above a store even if it is not known whether the load and the store may reference overlapping memory locations.

Before:

    st8   [r55]=r45        // r55 may or may not contain
    ld8   r3=[r5] ;;       // the same address as r5
    shr   r7=r3,r87

After:

    ld8.a r3=[r5] ;;       // Advanced Load
    // other, unrelated instructions
    st8   [r55]=r45
    ld8.c r3=[r5] ;;
    shr   r7=r3,r87
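A minimal Python sketch of how the advanced load and the check load cooperate. It assumes a simplified tracking table keyed by target register (the hardware structure is the Advanced Load Address Table, ALAT); the `Machine` class, its method names, and the helper `run` are hypothetical, and partial overlaps between accesses are ignored.

```python
class Machine:
    """Toy model: memory, registers, and a table of live advanced loads."""

    def __init__(self, memory):
        self.memory = dict(memory)
        self.regs = {}
        self.alat = {}                      # target register -> load address

    def ld8_a(self, reg, addr):
        """Advanced load: do the load and remember its address."""
        self.regs[reg] = self.memory[addr]
        self.alat[reg] = addr

    def st8(self, addr, value):
        """Store: write memory and drop every entry the store conflicts with."""
        self.memory[addr] = value
        self.alat = {r: a for r, a in self.alat.items() if a != addr}

    def ld8_c(self, reg, addr):
        """Check load: reload only if the entry is gone. Returns True on reload."""
        if self.alat.get(reg) != addr:
            self.regs[reg] = self.memory[addr]
            return True
        return False


def run(store_addr):
    m = Machine({0x10: 7, 0x20: 9})
    m.ld8_a("r3", 0x10)                     # ld8.a r3 = [r5]
    m.st8(store_addr, 42)                   # st8 [r55] = r45
    reloaded = m.ld8_c("r3", 0x10)          # ld8.c r3 = [r5]
    return reloaded, m.regs["r3"]
```

In this sketch `run(0x20)` keeps the early value because the store touched a different address, while `run(0x10)` reloads and picks up the stored value.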

Moving Up Loads + Uses: Recovery Code

Original Code:

    st8   [r4] = r12        // cycle 0: ambiguous store
    ld8   r6 = [r8] ;;      // cycle 0: load to advance
    add   r5 = r6, r7       // cycle 2
    st8   [r18] = r5        // cycle 3

Speculative Code:

    ld8.a r6 = [r8] ;;      // cycle -3
    add   r5 = r6, r7       // cycle -1; add that uses r6
    st8   [r4] = r12
    chk.a r6, recover       // cycle 0: check
back:                       // Return point from jump to recover
    st8   [r18] = r5        // cycle 0

recover:
    ld8   r6 = [r8] ;;      // Reload r6 from [r8]
    add   r5 = r6, r7       // Re-execute the add
    br    back              // Jump back to main code
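The recovery scheme can be checked with a small Python sketch that runs both versions on the same inputs. Register names are reused as parameter names holding addresses or values; the set `alat` and both function names are assumptions made for illustration, and only exact address matches count as conflicts.

```python
def original_sequence(memory, r8, r4, r12, r7, r18):
    """st8 / ld8 / add / st8 in program order."""
    mem = dict(memory)
    mem[r4] = r12                 # st8 [r4] = r12
    r6 = mem[r8]                  # ld8 r6 = [r8]
    r5 = r6 + r7                  # add r5 = r6, r7
    mem[r18] = r5                 # st8 [r18] = r5
    return mem


def speculative_sequence(memory, r8, r4, r12, r7, r18):
    """Load and add hoisted above the store; returns (memory, recovered)."""
    mem = dict(memory)
    alat = set()                  # addresses with a live advanced-load entry

    r6 = mem[r8]                  # ld8.a r6 = [r8]
    alat.add(r8)
    r5 = r6 + r7                  # add r5 = r6, r7 (speculative use)

    mem[r4] = r12                 # st8 [r4] = r12 (ambiguous store)
    alat.discard(r4)              # a store to the same address kills the entry

    recovered = r8 not in alat    # chk.a r6, recover
    if recovered:
        r6 = mem[r8]              # recover: reload r6 from [r8]
        r5 = r6 + r7              #          re-execute the add, then br back
    mem[r18] = r5                 # back: st8 [r18] = r5
    return mem, recovered
```

Both versions leave memory in the same state whether or not the store aliases the load; the only difference is that the aliasing case pays for the detour through the recovery code.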

If-conversion uses predicates to transform conditional code into code with a single control stream.

    if (r4) {
        add r1 = r2, r3
        ld8 r6 = [r5]
    }

becomes

         cmp.ne p1, p0 = r4, 0 ;;    // Set predicate reg
    (p1) add    r1 = r2, r3
    (p1) ld8    r6 = [r5]

and

    if (r1)
        r2 = r3 + r4
    else
        r7 = r6 - r5

becomes

         cmp.ne p1, p2 = r1, 0 ;;    // Set predicate reg
    (p1) add    r2 = r3, r4
    (p2) sub    r7 = r6, r5
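The second if-conversion example can be mimicked in Python to show that the predicated form computes the same result without a branch. This is only a sketch: the function names are invented, the operands follow the assembly version (`add r2 = r3, r4`), and `None` stands for a register that was not written.

```python
def branching(r1, r3, r4, r5, r6):
    """Original control flow: one of two paths is executed."""
    r2 = r7 = None
    if r1:
        r2 = r3 + r4
    else:
        r7 = r6 - r5
    return r2, r7


def if_converted(r1, r3, r4, r5, r6):
    """Single control stream: both operations issue, predicates gate the writes."""
    r2 = r7 = None
    p1 = r1 != 0                      # cmp.ne p1, p2 = r1, 0
    p2 = not p1                       # p2 receives the complement
    add_result = r3 + r4              # computed regardless of p1
    sub_result = r6 - r5              # computed regardless of p2
    r2 = add_result if p1 else r2     # (p1) add r2 = r3, r4
    r7 = sub_result if p2 else r7     # (p2) sub r7 = r6, r5
    return r2, r7
```

The conditional expressions model write-enable signals rather than branches: the arithmetic always runs, and only the register update depends on the predicate.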

In the old days…

    for (k = 1; k <= 5; k++)
        y[k] = x[k] + 1;

MIPS Assembly:

    # $a0 = x[]
    # $a1 = y[]
    # $t0 = k
          addi $t0, $zero, 1
          addi $t1, $zero, 5
    Loop: sll  $t2, $t0, 2
          add  $t3, $a0, $t2
          lw   $t4, 0($t3)
          addi $t4, $t4, 1
          add  $t5, $a1, $t2
          sw   $t4, 0($t5)
          addi $t0, $t0, 1
          ble  $t0, $t1, Loop

Software Pipelining Example in the IA-64

    loop:
    (p16) ld1 r32 = [r12], 1
    (p17) add r34 = 1, r33
    (p18) st1 [r13] = r35, 1
          br.ctop loop

[Animation: the slides step through this loop for a five-element input x1..x5. Each frame shows the physical and logical numbering of general registers r32-r39, the predicate registers p16-p18, the memory contents, and the values of LC, EC, and RRB.]

State at the start of each pass through the loop:

    Pass  LC  EC  RRB  p16 p17 p18  Executed
    1     4   3    0    1   0   0   load x1
    2     3   3   -1    1   1   0   load x2, compute y1
    3     2   3   -2    1   1   1   load x3, compute y2, store y1
    4     1   3   -3    1   1   1   load x4, compute y3, store y2
    5     0   3   -4    1   1   1   load x5, compute y4, store y3
    6     0   2   -5    0   1   1   compute y5, store y4
    7     0   1   -6    0   0   1   store y5
    exit  0   0   -7    0   0   0   (fall through)
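The renaming that the frames show between logical and physical register numbers can be written as a one-line formula. The sketch below assumes, as in the example, that exactly eight general registers (r32 to r39) rotate; `ROT_BASE`, `ROT_SIZE`, `physical`, and `logical_names` are names chosen here for illustration.

```python
ROT_BASE = 32   # first rotating general register in the example
ROT_SIZE = 8    # assumption: r32..r39 rotate, as drawn on the slides


def physical(logical, rrb):
    """Physical register behind a logical name for a given rotating register base."""
    return ROT_BASE + (logical - ROT_BASE + rrb) % ROT_SIZE


def logical_names(rrb):
    """Logical name currently attached to each physical register r32..r39."""
    owner = {physical(name, rrb): name
             for name in range(ROT_BASE, ROT_BASE + ROT_SIZE)}
    return [owner[p] for p in range(ROT_BASE, ROT_BASE + ROT_SIZE)]
```

Because RRB decreases by one at each rotation, a value written to r32 in one pass is visible as r33 in the next: `physical(32, 0)` and `physical(33, -1)` name the same physical register. This is what lets the `add` read r33 and the store read r35 without any explicit copies.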

The Software Pipelining Branch Instruction

LC = Loop Counter, EC = Epilog Counter, RRB = Rotating Register Base, PR = Predicate Register

    LC != 0 (prolog/kernel):  LC--, EC unchanged, PR[16]=1, RRB--, branch
    LC == 0 (epilog):
        EC > 1:   EC--, PR[16]=0, RRB--, branch
        EC == 1:  EC--, PR[16]=0, RRB--, fall-thru
        EC == 0:  EC unchanged, RRB unchanged, fall-thru
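Putting the branch rules together with register rotation, the whole example loop can be simulated in a few lines of Python. This is a sketch under simplifying assumptions: only p16 to p18 and r32 to r39 are modeled, memory is replaced by Python lists, and the names `br_ctop`, `run_pipeline`, `state`, and `phys` are invented for illustration.

```python
def br_ctop(state):
    """Update LC, EC, RRB and the stage predicates; return True if the branch is taken."""
    if state["LC"] != 0:        # prolog/kernel: start another iteration
        state["LC"] -= 1
        new_p16, taken = True, True
    elif state["EC"] > 1:       # epilog: keep draining the pipeline
        state["EC"] -= 1
        new_p16, taken = False, True
    elif state["EC"] == 1:      # last epilog pass: rotate once more, then exit
        state["EC"] -= 1
        new_p16, taken = False, False
    else:                       # EC == 0: no rotation, fall through
        return False
    state["RRB"] -= 1
    # Rotation renames p16 -> p17 and p17 -> p18; p16 receives the new value.
    state["p"] = [new_p16] + state["p"][:-1]
    return taken


def run_pipeline(x):
    """Run the three-stage loop over a non-empty list x; return (y, trace)."""
    state = {"LC": len(x) - 1, "EC": 3, "RRB": 0, "p": [True, False, False]}
    regs = [None] * 8           # physical r32..r39
    y, trace, next_x = [], [], 0

    def phys(logical):          # index of the physical register behind a logical name
        return (logical - 32 + state["RRB"]) % 8

    while True:
        trace.append((state["LC"], state["EC"], state["RRB"]))
        p16, p17, p18 = state["p"]
        if p16:                                   # (p16) ld1 r32 = [r12], 1
            regs[phys(32)] = x[next_x]
            next_x += 1
        if p17:                                   # (p17) add r34 = 1, r33
            regs[phys(34)] = 1 + regs[phys(33)]
        if p18:                                   # (p18) st1 [r13] = r35, 1
            y.append(regs[phys(35)])
        if not br_ctop(state):                    # br.ctop loop
            break
    trace.append((state["LC"], state["EC"], state["RRB"]))
    return y, trace
```

For a five-element input the trace of (LC, EC, RRB) starts at (4, 3, 0) and ends at (0, 0, -7) after seven passes, with every output element equal to the matching input plus one. EC is initialized to 3 because the loop has three pipeline stages: the epilog needs that many counted passes to drain the loads still in flight.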