CMPUT 229 Fall 2003, Topic G: IA-64 Highlights



José Nelson Amaral, http://www.cs.ualberta.ca/~amaral/courses/680
CMPUT 680 - Compiler Design and Optimization

Some Highlights of the EPIC Architecture

- Control Speculation
- Data Speculation
- Predication
- Rotating Registers
- Hardware-Supported Software Pipelining
![Control Speculation](https://slidetodoc.com/presentation_image_h2/232b6fd6f2a0b097002f881ad7e4e813/image-3.jpg)
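Control Speculation

Before control speculation:

        br.cond.dptk L1
        ld8   r3 = [r5] ;;
        shr   r7 = r3, r87

After control speculation:

        ld8.s r3 = [r5]
        br.cond.dptk L1 ;;
        chk.s r3, recovery
        shr   r7 = r3, r87

The load is hoisted above the branch as a speculative load (ld8.s). If it would fault, the exception is deferred by setting the NaT bit of r3, and chk.s, left at the load's original position, branches to the recovery code only when that bit is set.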
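The deferral mechanism can be sketched as a toy Python model. It is not a full account of Itanium semantics: memory is a dict, an address missing from it stands for a load that would fault, and the helper names `ld8`, `ld8_s`, `chk_s`, and `speculated` are hypothetical. The speculative load records a deferred fault in the register's NaT bit instead of raising; the check re-executes the load non-speculatively only when that bit is set, so a fault surfaces only on the path that actually uses the value.

```python
from __future__ import annotations

from dataclasses import dataclass


class PageFault(Exception):
    """Raised by a non-speculative load from an unmapped address."""


@dataclass
class Reg:
    value: int = 0
    nat: bool = False  # NaT bit: marks a deferred exception


def ld8(memory: dict[int, int], addr: int) -> Reg:
    # Ordinary load: faults immediately on an unmapped address.
    if addr not in memory:
        raise PageFault(hex(addr))
    return Reg(memory[addr])


def ld8_s(memory: dict[int, int], addr: int) -> Reg:
    # Toy model of ld8.s: a fault is deferred by setting NaT instead of raising.
    if addr not in memory:
        return Reg(0, nat=True)
    return Reg(memory[addr])


def chk_s(r3: Reg, memory: dict[int, int], r5: int) -> Reg:
    # Toy model of chk.s: "branch to recovery" only if NaT is set; recovery
    # re-executes the load non-speculatively, which may raise the real fault.
    if r3.nat:
        return ld8(memory, r5)
    return r3


def speculated(memory: dict[int, int], r5: int, r87: int,
               branch_taken: bool) -> int | None:
    r3 = ld8_s(memory, r5)     # load hoisted above the branch
    if branch_taken:           # br.cond.dptk L1: the loaded value is never used
        return None
    r3 = chk_s(r3, memory, r5)
    return r3.value >> r87     # shr r7 = r3, r87 (arithmetic shift right)
```

With `memory = {0x1000: 64}`, `speculated(memory, 0x1000, 2, False)` returns 16, while `speculated(memory, 0x2000, 2, True)` returns None without faulting even though 0x2000 is unmapped; only the fall-through path pays for the fault.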

Data Speculation

An advanced load allows a load to be moved above a store even when it is not known whether the load and the store reference overlapping memory locations.

Original code:

        st8   [r55] = r45    // r55 may or may not contain
        ld8   r3 = [r5] ;;   // the same address as r5
        shr   r7 = r3, r87

Speculative code:

        ld8.a r3 = [r5] ;;      // Advanced Load
        // other, unrelated instructions
        st8   [r55] = r45
        ld8.c.clr r3 = [r5] ;;  // Check load
        shr   r7 = r3, r87

The check load re-reads [r5] only if an intervening store invalidated the entry that the advanced load recorded in the ALAT (Advanced Load Address Table); otherwise the early value in r3 is used as is.
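A toy Python model of the ALAT, assuming aligned 8-byte accesses (so two accesses overlap exactly when their addresses are equal) and a table keyed by target register; the class, method, and function names are hypothetical. The advanced load records its address, a store to that address evicts the entry, and the check load goes back to memory only when its entry is gone.

```python
from __future__ import annotations


class ALATMachine:
    """Toy model of ld8.a, st8 and a check load (ld8.c.clr)."""

    def __init__(self, memory: dict[int, int]) -> None:
        self.memory = memory
        self.regs: dict[str, int] = {}
        self.alat: dict[str, int] = {}  # target register -> advanced-load address
        self.reloads = 0                # check loads that had to re-read memory

    def ld8_a(self, reg: str, addr: int) -> None:
        self.regs[reg] = self.memory.get(addr, 0)
        self.alat[reg] = addr

    def st8(self, addr: int, value: int) -> None:
        self.memory[addr] = value
        # Evict every advanced load whose address this store overlaps.
        for reg, entry_addr in list(self.alat.items()):
            if entry_addr == addr:
                del self.alat[reg]

    def ld8_c_clr(self, reg: str, addr: int) -> None:
        if self.alat.get(reg) != addr:  # entry evicted: the early value may be stale
            self.regs[reg] = self.memory.get(addr, 0)
            self.reloads += 1
        self.alat.pop(reg, None)        # .clr: release the entry


def speculative(m: ALATMachine, r5: int, r55: int, r45: int, r87: int) -> int:
    m.ld8_a("r3", r5)             # ld8.a r3 = [r5], hoisted above the store
    m.st8(r55, r45)               # st8 [r55] = r45, may alias r5
    m.ld8_c_clr("r3", r5)         # check load
    return m.regs["r3"] >> r87    # shr r7 = r3, r87
```

When r55 and r5 differ, the check load hits in the table and no reload happens; when they are equal, the reload picks up the value the store just wrote, so the result matches the original ordering either way.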

Moving Up Loads + Uses: Recovery Code

Original code:

        st8   [r4] = r12     // cycle 0: ambiguous store
        ld8   r6 = [r8] ;;   // cycle 0: load to advance
        add   r5 = r6, r7    // cycle 2
        st8   [r18] = r5     // cycle 3

Speculative code:

        ld8.a r6 = [r8] ;;      // cycle -3
        add   r5 = r6, r7       // cycle -1; add that uses r6
        st8   [r4] = r12
        chk.a.clr r6, recover   // cycle 0: check
    back:                       // Return point from jump to recover
        st8   [r18] = r5        // cycle 0

    recover:
        ld8   r6 = [r8] ;;      // Reload r6 from [r8]
        add   r5 = r6, r7       // Re-execute the add
        br    back              // Jump back to main code
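Here the add that consumes r6 also moves above the store, so a check load alone would not suffice: if the store aliased, the dependent add must be redone too, which is why chk.a branches to recovery code instead. A toy Python model of both schedules, assuming aligned 8-byte accesses and an ALAT reduced to a set of addresses; the function names are hypothetical. Both schedules should leave memory in the same final state whether or not [r4] and [r8] alias.

```python
from __future__ import annotations


def original_code(memory: dict[int, int], r4: int, r8: int,
                  r7: int, r12: int, r18: int) -> None:
    memory[r4] = r12              # st8 [r4] = r12 (ambiguous store)
    r6 = memory.get(r8, 0)        # ld8 r6 = [r8]
    r5 = r6 + r7                  # add r5 = r6, r7
    memory[r18] = r5              # st8 [r18] = r5


def speculative_code(memory: dict[int, int], r4: int, r8: int,
                     r7: int, r12: int, r18: int) -> bool:
    """Run the speculative schedule; return True if recovery code ran."""
    alat: set[int] = set()        # addresses of live advanced loads
    r6 = memory.get(r8, 0)        # ld8.a r6 = [r8]
    alat.add(r8)
    r5 = r6 + r7                  # add r5 = r6, r7 (uses the speculative r6)
    memory[r4] = r12              # st8 [r4] = r12
    alat.discard(r4)              # a store to the same address evicts the entry
    recovered = False
    if r8 not in alat:            # chk.a r6, recover
        r6 = memory.get(r8, 0)    # recover: ld8 r6 = [r8]
        r5 = r6 + r7              #          add r5 = r6, r7
        recovered = True          #          br back
    memory[r18] = r5              # back: st8 [r18] = r5
    return recovered
```

The recovery path is taken only in the aliasing case, so in the common case the load latency is fully hidden and the only extra cost is the check itself.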

If-conversion uses predicates to transform conditional code into a single, branch-free control stream.

    if (r4) {
        add r1 = r2, r3
        ld8 r6 = [r5]
    }

becomes

         cmp.ne p1, p0 = r4, 0 ;;   // Set predicate reg
    (p1) add    r1 = r2, r3
    (p1) ld8    r6 = [r5]

and

    if (r1)
        r2 = r3 + r4
    else
        r7 = r6 - r5

becomes

         cmp.ne p1, p2 = r1, 0 ;;   // Set predicate regs
    (p1) add    r2 = r3, r4
    (p2) sub    r7 = r6, r5
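A toy Python model of the if/else example, with the register file as a dict and predicates as booleans; the function names are hypothetical, and the operands follow the assembly (add r2 = r3, r4). The if-converted version issues both predicated instructions and lets each predicate decide whether its result is kept, which is what removes the branch; for any register values both versions should agree.

```python
from __future__ import annotations


def with_branch(r: dict[str, int]) -> dict[str, int]:
    r = dict(r)                     # work on a copy of the register file
    if r["r1"] != 0:
        r["r2"] = r["r3"] + r["r4"]
    else:
        r["r7"] = r["r6"] - r["r5"]
    return r


def if_converted(r: dict[str, int]) -> dict[str, int]:
    r = dict(r)
    # cmp.ne p1, p2 = r1, 0: p1 receives the comparison, p2 its complement.
    p1 = r["r1"] != 0
    p2 = not p1
    # Both instructions are issued in sequence; each result is committed only
    # when its qualifying predicate is true (otherwise the old value stays).
    r["r2"] = r["r3"] + r["r4"] if p1 else r["r2"]   # (p1) add r2 = r3, r4
    r["r7"] = r["r6"] - r["r5"] if p2 else r["r7"]   # (p2) sub r7 = r6, r5
    return r
```

The trade-off is that both sides of the if occupy issue slots on every execution, which pays off when the branch is hard to predict and both sides are short.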
![In the old days… (MIPS assembly)](https://slidetodoc.com/presentation_image_h2/232b6fd6f2a0b097002f881ad7e4e813/image-7.jpg)
In the old days…

    for (k = 1; k <= 5; k++)
        y[k] = x[k] + 1;

MIPS assembly:

    # $a0 = x[]
    # $a1 = y[]
    # $t0 = k
          addi $t0, $zero, 1
          addi $t1, $zero, 5
    Loop: sll  $t2, $t0, 2
          add  $t3, $a0, $t2
          lw   $t4, 0($t3)
          addi $t4, $t4, 1
          add  $t5, $a1, $t2
          sw   $t4, 0($t5)
          addi $t0, $t0, 1
          ble  $t0, $t1, Loop

Software Pipelining Example in the IA-64

The slides animate, one step at a time, the following software-pipelined loop, which reads five one-byte values x1…x5 from the buffer addressed by r12 and writes y1…y5 to the buffer addressed by r13, computing y[k] = x[k] + 1:

    loop:
    (p16) ld1  r32 = [r12], 1   // stage 1: load x[k], then r12 += 1
    (p17) add  r34 = 1, r33     // stage 2: x[k] + 1
    (p18) st1  [r13] = r35, 1   // stage 3: store y[k], then r13 += 1
          br.ctop loop

The animation tracks the physical general registers r32-r39, the logical names they carry under the current rotating register base (RRB), the predicate registers p16-p18, and the LC, EC, and RRB counters. Because every br.ctop rotates the register names, the value the load writes to r32 is read as r33 by the add in the next pass, and the sum written to r34 is read as r35 by the store one pass later. Starting from LC = 4 (five iterations minus one) and EC = 3 (three stages), the state after each br.ctop is:

| br.ctop executed | LC | EC | RRB | p16 p17 p18 | Stages enabled in the next pass |
|---|---|---|---|---|---|
| 0 (initial) | 4 | 3 | 0 | 1 0 0 | load |
| 1 | 3 | 3 | -1 | 1 1 0 | load, add |
| 2 | 2 | 3 | -2 | 1 1 1 | load, add, store |
| 3 | 1 | 3 | -3 | 1 1 1 | load, add, store |
| 4 | 0 | 3 | -4 | 1 1 1 | load, add, store |
| 5 | 0 | 2 | -5 | 0 1 1 | add, store |
| 6 | 0 | 1 | -6 | 0 0 1 | store |
| 7 | 0 | 0 | -7 | 0 0 0 | none: the loop falls through |

After the seventh br.ctop, y1…y5 have all been stored.
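The register renaming that drives this example can be written as a one-line mapping. A minimal Python sketch, assuming the 8-register rotating region r32-r39 used in the example (on real hardware the size of the rotating region is set by the alloc instruction) and the convention that br.ctop decrements RRB; the names `physical`, `ROT_BASE`, and `ROT_SIZE` are hypothetical.

```python
ROT_BASE = 32  # first rotating general register
ROT_SIZE = 8   # rotating region r32-r39, as in the example


def physical(logical: int, rrb: int) -> int:
    """Physical register that a logical name refers to under the given RRB."""
    if not ROT_BASE <= logical < ROT_BASE + ROT_SIZE:
        return logical  # registers outside the rotating region are never renamed
    # Python's % always returns a non-negative result, so negative RRB values wrap.
    return ROT_BASE + (logical - ROT_BASE + rrb) % ROT_SIZE


# Follow values through the pipeline: what is written to r32 is read as r33
# after one rotation, and what is written to r34 is read as r35 one rotation later.
for rrb in range(0, -8, -1):
    assert physical(33, rrb - 1) == physical(32, rrb)
    assert physical(35, rrb - 1) == physical(34, rrb)
```

The design point is that no instruction copies data between stages: decrementing RRB renames the registers, so each stage finds its input under the next logical name.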

The Software Pipelining Branch Instruction (br.ctop)

LC = Loop Counter, EC = Epilog Counter, RRB = Rotating Register Base, PR = Predicate Register.

- LC ≠ 0 (prolog/kernel): LC--, EC unchanged, PR[16] = 1, RRB--; branch.
- LC = 0 (epilog): test EC.
  - EC > 1: EC--, PR[16] = 0, RRB--; branch.
  - EC = 1: EC--, PR[16] = 0, RRB--; fall through.
  - EC = 0: fall through.
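These rules can be exercised end to end with a toy Python model of the example loop. It assumes an 8-register rotating region r32-r39, models p16-p18 as a list that shifts on every rotation, and replaces the r12/r13 pointers with list indices; `run_pipelined` is a hypothetical name. It returns the stored values and the (LC, EC, RRB) state after each br.ctop.

```python
from __future__ import annotations

ROT_BASE, ROT_SIZE = 32, 8  # rotating region r32-r39 (assumed size)
STAGES = 3                  # load, add, store


def run_pipelined(x: list[int]) -> tuple[list[int], list[tuple[int, int, int]]]:
    """Toy model of the three-stage ld1/add/st1 loop driven by br.ctop."""
    n = len(x)
    if n < 1:
        raise ValueError("LC = n - 1 must not be negative")
    phys = [0] * ROT_SIZE               # physical registers r32-r39
    pred = [True, False, False]         # p16, p17, p18
    lc, ec, rrb = n - 1, STAGES, 0
    r12 = r13 = 0                       # indices standing in for the pointers
    y = [0] * n
    trace: list[tuple[int, int, int]] = []

    def reg(logical: int) -> int:
        # Slot in phys that a logical rotating register names under the current RRB.
        return (logical - ROT_BASE + rrb) % ROT_SIZE

    while True:
        if pred[0]:                     # (p16) ld1 r32 = [r12], 1
            phys[reg(32)] = x[r12]
            r12 += 1
        if pred[1]:                     # (p17) add r34 = 1, r33
            phys[reg(34)] = 1 + phys[reg(33)]
        if pred[2]:                     # (p18) st1 [r13] = r35, 1
            y[r13] = phys[reg(35)] & 0xFF   # st1 stores only the low byte
            r13 += 1

        # br.ctop loop, following the rules above.
        if lc != 0:                     # prolog/kernel
            lc -= 1
            new_p16, taken = True, True
        elif ec > 1:                    # epilog, more stages to drain
            ec -= 1
            new_p16, taken = False, True
        elif ec == 1:                   # last epilog pass
            ec -= 1
            new_p16, taken = False, False
        else:                           # EC == 0: fall through
            break
        rrb -= 1
        pred = [new_p16] + pred[:-1]    # rotation: p16 -> p17 -> p18
        trace.append((lc, ec, rrb))
        if not taken:
            break
    return y, trace
```

For `x = [10, 20, 30, 40, 50]` the stores produce `[11, 21, 31, 41, 51]` after seven br.ctop executions, with RRB ending at -7: N + (stages - 1) passes in total, which is why EC starts at the number of stages.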