CSCE 513 Computer Architecture Lecture 10 Tomasulo’s Algorithm

CSCE 513 Computer Architecture
Lecture 10: Tomasulo’s Algorithm
Topics
• Dynamic Scheduling Review
• Tomasulo’s
  • Structure
  • Examples
  • Algorithm details
• Speculation
Readings: Chapter 3: 2.4-2.6
October 4, 2017

Overview
Last Time
• Control Hazards
• Data Hazards Review
New
• Tomasulo Overview, examples revisited
  • Figures 2.10 (right one), 2.11
• Tomasulo’s Algorithm details: fig 2.12
• Tomasulo + Reorder Buffer (ROB): fig 2.14, 2.15, 2.16
References
• Chapter 2, section 2.6
• Test 1: Tuesday September 30 (one week)
  • Review ???
CSCE 513 Fall 2017

Links and things
http://csg.csail.mit.edu/6.823/syllabusreadings.html
• Reading list showing overlap of H&P IV and H&P III
http://www.cosmolearning.com/video-lectures/computer-architecture-is-back-parallel-computing-landscape/
• Patterson Lecture -- Parallel is Back
http://www.ecs.umass.edu/ece/koren/architecture/
• Simulators and other tools

Tournament Predictors
Note: 2-bit predictors and similar schemes use only the history of the branch being predicted (local info).
Correlating predictors also incorporate some information about other branches (global info).
Tournament predictors combine multiple predictors:
• Local predictor
• Global predictor
• Selector predictor to choose between them (it picks the one that has been most successful in the recent past)
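A minimal sketch of the local/global/selector arrangement described above, assuming 2-bit saturating counters everywhere and small direct-indexed tables. The class names, table sizes, and indexing scheme are illustrative assumptions and do not describe any particular processor.

```python
class TwoBitCounter:
    """Saturating 2-bit counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state=1):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


class TournamentPredictor:
    """Hypothetical tournament predictor: a selector chooses local vs. global."""

    def __init__(self, index_bits=4):
        size = 1 << index_bits
        self.mask = size - 1
        self.local = [TwoBitCounter() for _ in range(size)]     # indexed by branch PC
        self.global_ = [TwoBitCounter() for _ in range(size)]   # indexed by global history
        self.selector = [TwoBitCounter() for _ in range(size)]  # >= 2 means "use global"
        self.history = 0                                        # recent outcomes of all branches

    def predict(self, pc):
        i = pc & self.mask
        local_pred = self.local[i].predict()
        global_pred = self.global_[self.history].predict()
        return global_pred if self.selector[i].predict() else local_pred

    def update(self, pc, taken):
        i = pc & self.mask
        local_ok = self.local[i].predict() == taken
        global_ok = self.global_[self.history].predict() == taken
        # Train the selector only when the two predictors disagree in correctness.
        if local_ok != global_ok:
            self.selector[i].update(global_ok)
        self.local[i].update(taken)
        self.global_[self.history].update(taken)
        self.history = ((self.history << 1) | int(taken)) & self.mask
```

On a strictly alternating branch the local 2-bit counter in this sketch keeps mispredicting, the global table learns the pattern from the history bits, and the selector drifts toward the global predictor.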

Pipelined Functional Units
Addition in Scientific Notation
Floating point addition

Floating Adder Stages
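The slide's figure is not reproduced here, so the following is only a sketch of the usual align / add / normalize steps for addition in scientific notation. It assumes decimal numbers stored as (significand, exponent) integer pairs and stays exact by aligning to the smaller exponent; a hardware adder would normally shift the smaller operand right and round. The function name `fp_add` is hypothetical.

```python
def fp_add(a, b):
    """Add two numbers given as (significand, exponent) pairs meaning s * 10**e."""
    (sa, ea), (sb, eb) = a, b

    # Stage 1: compare exponents and align both operands to the smaller one.
    e = min(ea, eb)
    sa *= 10 ** (ea - e)
    sb *= 10 ** (eb - e)

    # Stage 2: add the aligned significands.
    s = sa + sb

    # Stage 3: normalize by folding trailing zeros back into the exponent.
    while s != 0 and s % 10 == 0:
        s //= 10
        e += 1
    return (s, e)
```

For example, `fp_add((95, 0), (5, -1))` adds 95 and 0.5 by first rewriting 95 as 950 x 10^-1. Because each stage depends only on the output of the previous one, a pipelined unit can start a new addition every cycle.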

Out of Order Execution

Figure 3.6 Tomasulo
1. Common data bus (CDB)
2. Register renaming
3. Out-of-order execution

Tomasulo’s
Multiple Reservation Stations for each Unit
• OP
• Qj, Qk
• Vj, Vk
• A
• Busy
Register File
• Qi
Notes
1. The reservation stations serve as extra temporary registers!
2. To support OoOE, ID is factored into “Issue” and “Read Operands”
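The fields listed above can be sketched as a small data structure. This is a simplified illustration, not the full algorithm: the helper names `issue` and `broadcast`, the dictionary-based register file `regs`, and the register-status table `qi` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None  # value of operand 1, once available
    vk: Optional[float] = None  # value of operand 2, once available
    qj: Optional[str] = None    # station that will produce operand 1 (None = ready)
    qk: Optional[str] = None    # station that will produce operand 2 (None = ready)
    a: Optional[int] = None     # immediate or effective address (loads and stores)

    def ready(self):
        return self.busy and self.qj is None and self.qk is None


def issue(rs, op, dest, src1, src2, regs, qi):
    """Fill a free station: copy ready operands, otherwise record the producer."""
    rs.busy, rs.op = True, op
    for src, v_field, q_field in ((src1, "vj", "qj"), (src2, "vk", "qk")):
        producer = qi.get(src)
        if producer is not None:
            setattr(rs, q_field, producer)
            setattr(rs, v_field, None)
        else:
            setattr(rs, v_field, regs[src])
            setattr(rs, q_field, None)
    qi[dest] = rs.name  # renaming: dest now refers to this station's result


def broadcast(tag, value, stations, regs, qi):
    """Common data bus: deliver (tag, value) to waiting stations and registers."""
    for rs in stations:
        if rs.qj == tag:
            rs.vj, rs.qj = value, None
        if rs.qk == tag:
            rs.vk, rs.qk = value, None
        if rs.name == tag:
            rs.busy = False  # the producing station is free again
    for reg, producer in list(qi.items()):
        if producer == tag:
            regs[reg] = value
            qi[reg] = None
```

Issuing a multiply into a station named `Mult1` and then an add that reads its result leaves the add waiting with `qj == "Mult1"` until `broadcast("Mult1", value, ...)` delivers the operand.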

Register Renaming Again
DIV.D F0, F2, F4
ADD.D F6, F0, F8
S.D F6, 0(R1)
SUB.D F8, F10, F14
MUL.D F6, F10, F8
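A rough sketch of what renaming achieves on this sequence, assuming each instruction is a tuple `(op, dest, src1, src2)` and that the store's address register is treated as an ordinary source. The fresh names `T0`, `T1`, ... and both helper functions are hypothetical; real hardware renames to reservation-station tags rather than to new register names.

```python
def name_hazards(program):
    """Return WAR and WAW hazards as (kind, earlier_index, later_index, register)."""
    hazards = []
    for j, (_, dest_j, *_unused) in enumerate(program):
        if dest_j is None:
            continue
        for i in range(j):
            _, dest_i, *srcs_i = program[i]
            if dest_j in srcs_i:
                hazards.append(("WAR", i, j, dest_j))
            if dest_j == dest_i:
                hazards.append(("WAW", i, j, dest_j))
    return hazards


def rename(program):
    """Give every destination a fresh name and redirect later reads to it."""
    mapping = {}  # architectural register -> most recent fresh name
    renamed = []
    for count, (op, dest, *srcs) in enumerate(program):
        new_srcs = [mapping.get(s, s) for s in srcs]
        new_dest = None
        if dest is not None:
            new_dest = f"T{count}"
            mapping[dest] = new_dest
        renamed.append((op, new_dest, *new_srcs))
    return renamed


program = [
    ("DIV.D", "F0", "F2", "F4"),
    ("ADD.D", "F6", "F0", "F8"),
    ("S.D", None, "F6", "R1"),   # store: no destination register
    ("SUB.D", "F8", "F10", "F14"),
    ("MUL.D", "F6", "F10", "F8"),
]
```

Before renaming, `name_hazards(program)` reports the WAR hazard on F8 (ADD.D reads, SUB.D writes), the WAR hazard on F6 (S.D reads, MUL.D writes), and the WAW hazard on F6 (ADD.D and MUL.D). After `rename(program)` only the true RAW dependences remain.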

Tomasulo’s Example
L.D F6, 32(R2)
L.D F2, 44(R3)
MUL.D F0, F2, F4
SUB.D F8, F2, F6
DIV.D F10, F0, F6
ADD.D F6, F8, F2

Figure 3.7 – Example: which cycle?

Figure 3.8

Figure 3.9.a Tomasulo Issue

Figure 3.9.b Tomasulo Execute

Figure 3.9.c Tomasulo Write Result

Tomasulo Loop Example
Loop: L.D F0, 0(R1)
      MUL.D F4, F0, F2
      S.D F4, 0(R1)
      DADDIU R1, R1, -8
      BNE R1, R2, Loop
Can’t be done on the simulator! It can’t accept DADDIU or BNE. Tomasulo there is just for the floating-point and memory instructions.

Tomasulo Loop Unrolled
As hardware would do
L.D F0, 0(R1)
MUL.D F4, F0, F2
S.D F4, 0(R1)
L.D F0, -8(R1)
MUL.D F4, F0, F2
S.D F4, -8(R1)

Figure 3.10 - Two active iterations of loop

Observations on Tomasulo’s Alg
1. Tomasulo designed for the IBM 360/91
   • http://www.columbia.edu/acis/history/36091.html
2. Does not require compiler to do all of the work
   • Changes to hardware (e.g. adding another multiplier) do not require changes to compiler
3. Designed before caches, but OoOE really helps with cache misses
4. Dynamic scheduling required for “speculation”

Branch Prediction
Hardware-Based Speculation
Execute instructions along predicted execution paths but only commit the results if the prediction was correct
Instruction commit: allowing an instruction to update the register file when the instruction is no longer speculative
Need an additional piece of hardware to prevent any irrevocable action until an instruction commits
• I.e. updating state or taking an exception
Copyright © 2012, Elsevier Inc. All rights reserved.
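The "additional piece of hardware" is the reorder buffer (ROB). A minimal sketch of in-order commit with squashing on a mispredicted branch follows; the class, its dictionary-based entries, and the method names are assumptions made for illustration, and stores, exceptions, and operand forwarding are left out.

```python
from collections import deque


class ReorderBuffer:
    """Hypothetical ROB: results wait here until the instruction is no longer speculative."""

    def __init__(self):
        self.entries = deque()  # program order, oldest entry at the left

    def issue(self, dest, is_branch=False):
        entry = {"dest": dest, "value": None, "done": False,
                 "is_branch": is_branch, "mispredicted": False}
        self.entries.append(entry)
        return entry

    def write_result(self, entry, value=None, mispredicted=False):
        # Execution may finish out of order; the result is only buffered here.
        entry["value"] = value
        entry["done"] = True
        entry["mispredicted"] = mispredicted

    def commit(self, regs):
        """Retire finished instructions from the head, in program order."""
        committed = 0
        while self.entries and self.entries[0]["done"]:
            head = self.entries.popleft()
            committed += 1
            if head["is_branch"]:
                if head["mispredicted"]:
                    self.entries.clear()  # squash all younger, speculative work
                    break
            else:
                regs[head["dest"]] = head["value"]  # the only register-file update
        return committed
```

An instruction issued after a branch can finish early, but its value reaches `regs` only if every older entry, including the branch, commits first. If the branch turns out to be mispredicted, the younger entry is discarded and the register file never sees it.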

Figure 3.11

Speculation
1. Issue
2. Execute
3. Write result
4. Commit

Koren’s Tools Again
http://www.ecs.umass.edu/ece/koren/architecture/

Figure 3.12 Tomasulo + ROB example

Figure 3.13 Tomasulo + ROB example

Fig 2.17a Tomasulo+ROB Details

Fig 2.17b Tomasulo+ROB Execute

Fig 2.17c Tomasulo+ROB Write-result

Fig 2.17d Tomasulo+ROB Commit

Figure 3.15 Multiple Issue Approaches

Unrolling for VLIW
For i=1, 10000
  x[i] = x[i] + c
Loop: L.D F0, 0(R1)
      ADD.D F4, F0, F2
      S.D F4, 0(R1)
      DADDUI R1, R1, -8
      BNE R1, R2, Loop
Registers for
Load  Sum
F0    F4
F6    F8
F10   F12
F14   F16
F18   F20
F22   F24
F26   F28
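The seven load/sum register pairs suggest unrolling the loop seven times. The Python sketch below mirrors that transformation at the source level, assuming a factor of 7 and an upward-counting index instead of the decrementing pointer in R1; the function names are hypothetical.

```python
def add_scalar(x, c):
    """Reference loop: x[i] = x[i] + c."""
    for i in range(len(x)):
        x[i] = x[i] + c


def add_scalar_unrolled(x, c, factor=7):
    """Same computation with the body replicated `factor` times per iteration."""
    n = len(x)
    i = 0
    while i + factor <= n:
        # All loads first, then all adds, then all stores: the independent
        # operations are what a VLIW compiler packs into wide instructions.
        loaded = [x[i + k] for k in range(factor)]
        sums = [v + c for v in loaded]
        for k in range(factor):
            x[i + k] = sums[k]
        i += factor  # one index update and one branch per `factor` elements
    while i < n:  # cleanup iterations when n is not a multiple of factor
        x[i] = x[i] + c
        i += 1
```

Each unrolled copy needs its own pair of temporaries, which is why the slide lists a separate load register and sum register per copy.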

Figure 3.16 VLIW

Itanium