CSL 718: Pipelined Processors
Improving Branch Performance
19th Jan, 2006
Anshul Kumar, CSE IITD
Improving Branch Performance
• Branch Elimination – replace branch with other instructions
• Branch Speed Up – reduce time for computing CC and TIF
• Branch Prediction – guess the outcome and proceed, undo if necessary
• Branch Target Capture – make use of history
Branch Elimination
Use conditional instructions (predicated execution)
[Flowchart: test condition C; if true (T) execute S, if false (F) skip it. Predicated form: C: S]
With branch:
    OP1
    BC CC = Z, +2
    ADD R3, R2, R1
    OP2
With conditional instruction:
    OP1
    ADD R3, R2, R1, NZ
    OP2
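A minimal Python sketch of why the two sequences are equivalent. It assumes that `ADD R3, R2, R1` means R3 = R2 + R1 and that `BC CC = Z, +2` skips the ADD when the condition code is zero; the slide does not spell out either convention, and the function names are illustrative only.

```python
def run_with_branch(cc_is_zero, r1, r2, r3):
    """OP1; BC CC=Z,+2; ADD R3,R2,R1; OP2. Returns the final value of R3."""
    if cc_is_zero:
        return r3            # branch taken: the ADD is jumped over
    return r2 + r1           # fall through: ADD writes R3 (assumed R3 = R2 + R1)


def run_predicated(cc_is_zero, r1, r2, r3):
    """OP1; ADD R3,R2,R1,NZ; OP2. Returns the final value of R3."""
    result = r2 + r1         # the ADD always flows down the pipeline
    # The NZ predicate decides whether the result is committed to R3.
    return r3 if cc_is_zero else result
```

Both functions return the same R3 for either value of the condition code, but the predicated version has no control transfer for the pipeline to resolve.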
Branch Elimination - contd. [Pipeline timing diagrams with stages IF, D, AG, DF, EX and CC marked: (1) OP1, BC, ADD/OP2 with target fetch TIF; (2) OP1, ADD (cond)]
Branch Speed Up: early target address generation
• Assume each instruction is a branch
• Generate target address while decoding
• If target is in the same page, omit translation
• After decoding, discard target address if the instruction is not a branch
[Pipeline timing diagram: BC through IF, D, AG, with TIF marked]
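The same-page shortcut can be sketched as follows. This is a hypothetical model, not the slide's hardware: it assumes PC-relative branches, a 4 KiB page size, and a `translate` callable standing in for the address translation unit.

```python
PAGE_BITS = 12  # assumed 4 KiB pages; the slides do not give a page size


def early_target(pc, offset, current_frame, translate):
    """Form a speculative branch target while the instruction is decoded.

    pc, offset    : virtual address and displacement (PC-relative assumed)
    current_frame : physical frame number already known for pc's page
    translate     : callable, virtual page number -> physical frame number
    Returns (physical_target_address, translation_was_needed).
    """
    target = pc + offset
    page_mask = (1 << PAGE_BITS) - 1
    same_page = (target >> PAGE_BITS) == (pc >> PAGE_BITS)
    # Reuse the current frame when the target stays in the same page.
    frame = current_frame if same_page else translate(target >> PAGE_BITS)
    return (frame << PAGE_BITS) | (target & page_mask), not same_page
```

If decoding later shows that the instruction is not a branch, the caller simply drops the returned address.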
Branch Speed Up: increase CC - branch gap
Increase the gap between condition checking and branching
• Early CC setting
• Delayed branch
Early CC setting: insert n instructions (branch taken), n = 0. [Pipeline timing diagram: I-1, I, T, T+1 through IF, D, AG, DF, EX, with TIF and CC marked] delay = 6 (delay can be reduced with a larger target buffer)
Early CC setting: insert n instructions (branch taken), n = 1. [Pipeline timing diagram: I-1, J, I, T, T+1 through IF, D, AG, DF, EX, with TIF and CC marked] delay = 5
Early CC setting: insert n instructions (branch taken), n = 2. [Pipeline timing diagram: I-1, J, K, I, T, T+1 through IF, D, AG, DF, EX, with TIF and CC marked] delay = 4
Early CC setting: insert n instructions (branch taken), n = 3. [Pipeline timing diagram: I-1, J, K, L, I, T, T+1 through IF, D, AG, DF, EX, with TIF and CC marked] delay = 4
Early CC setting: insert n instructions (branch not taken), n = 0. [Pipeline timing diagram: I-1, I, I+1, I+2 through IF, D, AG, DF, EX, with TIF and CC marked] delay = 5
Early CC setting: insert n instructions (branch not taken), n = 1. [Pipeline timing diagram: I-1, J, I, I+1, I+2 through IF, D, AG, DF, EX, with TIF and CC marked] delay = 4
Early CC setting: insert n instructions (branch not taken), n = 2. [Pipeline timing diagram: I-1, J, K, I, I+1, I+2 through IF, D, AG, DF, EX, with TIF and CC marked] delay = 3
Early CC setting: insert n instructions (branch not taken), n = 3. [Pipeline timing diagram: I-1, J, K, L, I, I+1, I+2 through IF, D, AG, DF, EX, with TIF and CC marked] delay = 2
Delayed Branch: insert n instructions (branch taken), n = 0. [Pipeline timing diagram: I-1, I, T, T+1 through IF, D, AG, DF, EX, with TIF and CC marked] delay = 6
Delayed Branch: insert n instructions (branch taken), n = 1. [Pipeline timing diagram: I-1, I, J, T, T+1 through IF, D, AG, DF, EX, with TIF and CC marked] delay = 5
Delayed Branch: insert n instructions (branch taken), n = 2. [Pipeline timing diagram: I-1, I, J, K, T, T+1 through IF, D, AG, DF, EX, with TIF and CC marked] delay = 4
Delayed Branch: insert n instructions (branch taken), n = 3. [Pipeline timing diagram: I-1, I, J, K, L, T, T+1 through IF, D, AG, DF, EX, with TIF and CC marked] delay = 3
Delayed Branch: insert n instructions (branch not taken), n = 0. [Pipeline timing diagram: I-1, I, I+1, I+2 through IF, D, AG, DF, EX, with TIF and CC marked] delay = 5
Delayed Branch: insert n instructions (branch not taken), n = 1. [Pipeline timing diagram: I-1, I, J, I+1, I+2 through IF, D, AG, DF, EX, with TIF and CC marked] delay = 4
Delayed Branch: insert n instructions (branch not taken), n = 2. [Pipeline timing diagram: I-1, I, J, K, I+1, I+2 through IF, D, AG, DF, EX, with TIF and CC marked] delay = 3
Delayed Branch: insert n instructions (branch not taken), n = 3. [Pipeline timing diagram: I-1, I, J, K, L, I+1, I+2 through IF, D, AG, DF, EX, with TIF and CC marked] delay = 2
Summary - Branch Speed Up
Delay for n inserted instructions:
          early CC setting               delayed branch
n      uncond  cond (T)  cond (I)     uncond  cond (T)  cond (I)
n=0      4        6         5           4        6         5
n=1      4        5         4           3        5         4
n=2      4        4         3           2        4         3
n=3      4        4         2           1        3         2
n=4      4        4         1           0        2         1
n=5      4        4         0           0        1         0
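The summary values follow a simple pattern, which the sketch below captures. The closed forms are fitted to the numbers on the slides rather than stated there, so treat them as an assumption that holds for n = 0 to 5; the function and key names are illustrative.

```python
def early_cc_delay(n, kind):
    """Delay when n instructions separate the CC-setting instruction from the branch."""
    if kind == "uncond":
        return 4                   # no condition to wait for, so n does not help
    if kind == "taken":
        return max(6 - n, 4)       # bounded below by the unconditional-branch delay
    if kind == "not_taken":
        return max(5 - n, 0)
    raise ValueError(kind)


def delayed_branch_delay(n, kind):
    """Delay when n useful instructions fill the slots after the branch."""
    base = {"uncond": 4, "taken": 6, "not_taken": 5}[kind]
    return max(base - n, 0)        # each filled slot hides one cycle of delay
```

For example, `early_cc_delay(3, "taken")` gives 4 and `delayed_branch_delay(3, "taken")` gives 3, matching the n = 3 slides.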
Branch Prediction
• Treat conditional branches as unconditional branches / NOP
• Undo if necessary
Strategies:
– Fixed (always guess inline)
– Static (guess on the basis of instruction type)
– Dynamic (guess based on recent history)
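The three strategies can be contrasted with a small sketch. Everything here is illustrative: the opcode classes in `GUESS_TAKEN_OPCODES` are hypothetical, and the dynamic scheme uses only the last outcome (a 1-bit history) for brevity, which is one possible choice rather than the scheme the lecture has in mind.

```python
def fixed_guess(_opcode, _history):
    """Fixed: always guess inline (not taken)."""
    return False


# Hypothetical opcode classes; the slides do not list which types are guessed taken.
GUESS_TAKEN_OPCODES = {"loop", "jump"}


def static_guess(opcode, _history):
    """Static: the guess depends only on the instruction type."""
    return opcode in GUESS_TAKEN_OPCODES


def dynamic_guess(_opcode, history):
    """Dynamic: repeat the most recent outcome of this branch."""
    return history[-1] if history else False


def mispredictions(guess, opcode, outcomes):
    """Count wrong guesses over a sequence of actual outcomes (True = taken)."""
    history, wrong = [], 0
    for taken in outcomes:
        if guess(opcode, history) != taken:
            wrong += 1             # here the pipeline would have to undo work
        history.append(taken)
    return wrong
```

For a loop-closing branch taken four times and then not taken, `mispredictions(fixed_guess, "loop", [True] * 4 + [False])` counts four wrong guesses, while the static and dynamic sketches count one and two respectively.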
Static Branch Prediction [Table not recoverable from the slide text] Total: 68.2%
Branch Prediction (guess inline, go inline). [Pipeline timing diagram: I-1, I, I+1, I+2 through IF, D, AG, DF, EX, with TIF and CC marked] delay = 0
Branch Prediction (guess inline, go to target). [Pipeline timing diagram: I-1, I, T, T+1 through IF, D, AG, DF, EX, with TIF and CC marked] delay = 6
Branch Prediction (guess target, go inline). [Pipeline timing diagram: I-1, I, T, I+1, I+2 through IF, D, AG, DF, EX, with TIF and CC marked] delay = 5
Branch Prediction (guess target, go to target). [Pipeline timing diagram: I-1, I, T, T+1 through IF, D, AG, DF, EX, with TIF and CC marked] delay = 4, same as unconditional branch
Static prediction strategy
Let $p$ = probability of taking the branch.
guess target: $\text{delay}_t = 4p + 5(1 - p) = 5 - p$
guess inline: $\text{delay}_i = 6p + 0(1 - p) = 6p$
if ($\text{delay}_t < \text{delay}_i$) guess target else guess inline
$\text{delay}_t < \text{delay}_i \;\Rightarrow\; 5 - p < 6p \;\Rightarrow\; p > 5/7 \approx 0.71$
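As a quick numeric check of the rule, the sketch below evaluates both expected delays for a given p and picks the cheaper guess. The function names are illustrative; the cycle counts are the ones used on the slide.

```python
def delay_guess_target(p):
    """Expected delay when guessing target: 4 cycles if taken, 5 if not."""
    return 4 * p + 5 * (1 - p)


def delay_guess_inline(p):
    """Expected delay when guessing inline: 6 cycles if taken, 0 if not."""
    return 6 * p + 0 * (1 - p)


def better_guess(p):
    """Return 'target' only when it has the strictly lower expected delay."""
    if delay_guess_target(p) < delay_guess_inline(p):
        return "target"
    return "inline"
```

With p = 0.8 the expected delays are 4.2 and 4.8 cycles, so guessing target wins; with p = 0.5 they are 4.5 and 3.0, so guessing inline wins.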
Static prediction strategy: thresholds for different instructions
[Pipeline timing diagram: I-1 and I through IF, D, AG, DF, EX, with CC and TIF marked]
Delay by guess and actual outcome:
            actual T   actual I
guess T        4          5
guess I        6          0
Guess target if $4p + 5(1 - p) < 6p + 0(1 - p)$, i.e. $p > 5/7 \approx 0.71$
Static prediction strategy: thresholds for different instructions (loop control)
[Pipeline timing diagram: I-1 and I through IF, D, AG, DF, EX, with CC and TIF marked]
Delay by guess and actual outcome:
            actual T   actual I
guess T        4          6
guess I        7          1
Guess target if $4p + 6(1 - p) < 7p + 1(1 - p)$, i.e. $p > 5/8 = 0.625$
Static prediction strategy: thresholds for different instructions (register address)
[Pipeline timing diagram: I-1 and I through IF, D, AG, DF, EX, with CC and TIF marked]
Delay by guess and actual outcome:
            actual T   actual I
guess T        3          5
guess I        6          0
Guess target if $3p + 5(1 - p) < 6p + 0(1 - p)$, i.e. $p > 5/8 = 0.625$
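All three thresholds come from the same inequality, so they can be computed from the four delay values directly. The helper below is a sketch with illustrative argument names; it solves tt·p + ti·(1 − p) < it·p + ii·(1 − p) for p using exact fractions.

```python
from fractions import Fraction


def taken_threshold(tt, ti, it, ii):
    """Probability p above which guessing target gives the lower expected delay.

    Arguments are delays indexed as (guess, actual):
    tt = guess T / actual T, ti = guess T / actual I,
    it = guess I / actual T, ii = guess I / actual I.
    """
    gain_if_taken = it - tt        # cycles saved by guessing target when taken
    loss_if_inline = ti - ii       # cycles lost by guessing target when not taken
    if gain_if_taken + loss_if_inline <= 0:
        raise ValueError("no finite threshold for these delays")
    return Fraction(loss_if_inline, gain_if_taken + loss_if_inline)
```

Calling it with the three delay tables gives 5/7 for the general case and 5/8 for both the loop-control and register-address cases.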
Delayed Branch with Nullification (also called annulment)
• Delay slot is used optionally
• Branch instruction specifies the option
• Option may be exercised based on correctness of branch prediction
• Helps in better utilization of delay slots
Variants of Nullification
1. No annulment
2. Annul if not taken
3. Annul if taken
4. Annul always
Names on the slide: branch-with-execute, branch-or-skip, branch-with-skip
[Diagram: branch bc followed by delay slot D, once per variant]
Examples
• SPARC: 1, 2
• MC88100: 1, 4
• i860: 2, 4
• HP PA: 1, 2, 3
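The four variants differ only in when the delay-slot instruction is allowed to complete, which a short sketch makes explicit. The function name and the `SUPPORTED` table are illustrative; the variant numbers and the per-architecture lists are taken from the slide.

```python
def delay_slot_executes(variant, taken):
    """Return True if the delay-slot instruction is allowed to complete.

    variant: 1 = no annulment, 2 = annul if not taken,
             3 = annul if taken, 4 = annul always.
    taken:   actual branch outcome.
    """
    if variant == 1:
        return True
    if variant == 2:
        return taken
    if variant == 3:
        return not taken
    if variant == 4:
        return False
    raise ValueError(f"unknown variant {variant}")


# Variants offered by each architecture, as listed on the slide.
SUPPORTED = {
    "SPARC": {1, 2},
    "MC88100": {1, 4},
    "i860": {2, 4},
    "HP PA": {1, 2, 3},
}
```

Variant 2 suits a slot filled from the branch target, since that instruction is useful only when the branch is taken; variant 3 suits a slot filled from the fall-through path.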
Annulment illustration [Diagram: code sequences around a branch bc and its delay slot D, showing where to use branch-or-skip and where to use branch-with-skip]