Lecture 6 Advanced Pipelines Multicycle inorder pipelines and

  • Slides: 20
Download presentation
Lecture 6: Advanced Pipelines • Multi-cycle in-order pipelines and out-of-order pipelines (Appendix A, Sections

Lecture 6: Advanced Pipelines • Multi-cycle in-order pipelines and out-of-order pipelines (Appendix A, Sections 3. 5 -3. 6) 1

Control Hazards • Simple techniques to handle control hazard stalls: Ø for every branch,

Control Hazards • Simple techniques to handle control hazard stalls: Ø for every branch, introduce a stall cycle (note: every 6 th instruction is a branch!) Ø assume the branch is not taken and start fetching the next instruction – if the branch is taken, need hardware to cancel the effect of the wrong-path instruction Ø fetch the next instruction (branch delay slot) and execute it anyway – if the instruction turns out to be on the correct path, useful work was done – if the instruction turns out to be on the wrong path, hopefully program state is not lost 2

Branch Delay Slots 3

Branch Delay Slots 3

Slowdowns from Stalls • Perfect pipelining with no hazards an instruction completes every cycle

Slowdowns from Stalls • Perfect pipelining with no hazards an instruction completes every cycle (total cycles ~ num instructions) speedup = increase in clock speed = num pipeline stages • With hazards and stalls, some cycles (= stall time) go by during which no instruction completes, and then the stalled instruction completes • Total cycles = number of instructions + stall cycles • Slowdown because of stalls = 1/ (1 + stall cycles per instr) 4

Pipeline Implementation • Signals for the muxes have to be generated – some of

Pipeline Implementation • Signals for the muxes have to be generated – some of this can happen during ID • Need look-up tables to identify situations that merit bypassing/stalling – the number of inputs to the muxes goes up 5

Detecting Control Signals Situation Example code Action No dependence LD R 1, 45(R 2)

Detecting Control Signals Situation Example code Action No dependence LD R 1, 45(R 2) DADD R 5, R 6, R 7 DSUB R 8, R 6, R 7 OR R 9, R 6, R 7 No hazards Dependence requiring stall LD R 1, 45(R 2) DADD R 5, R 1, R 7 DSUB R 8, R 6, R 7 OR R 9, R 6, R 7 Detect use of R 1 during ID of DADD and stall Dependence overcome by forwarding LD R 1, 45(R 2) DADD R 5, R 6, R 7 DSUB R 8, R 1, R 7 OR R 9, R 6, R 7 Detect use of R 1 during ID of DSUB and set mux control signal that accepts result from bypass path Dependence with accesses in order LD R 1, 45(R 2) DADD R 5, R 6, R 7 DSUB R 8, R 6, R 7 OR R 9, R 1, R 7 No action required 6

Multicycle Instructions Functional unit Latency Initiation interval Integer ALU 1 1 Data memory 2

Multicycle Instructions Functional unit Latency Initiation interval Integer ALU 1 1 Data memory 2 1 FP add 4 1 FP multiply 7 1 FP divide 25 25 7

Effects of Multicycle Instructions • Structural hazards if the unit is not fully pipelined

Effects of Multicycle Instructions • Structural hazards if the unit is not fully pipelined (divider) • Frequent RAW hazard stalls • Potentially multiple writes to the register file in a cycle • WAW hazards because of out-of-order instr completion • Imprecise exceptions because of o-o-o instr completion 8

Precise Exceptions • On an exception: Ø must save PC of instruction where program

Precise Exceptions • On an exception: Ø must save PC of instruction where program must resume Ø all instructions after that PC that might be in the pipeline must be converted to NOPs (other instructions continue to execute and may raise exceptions of their own) Ø temporary program state not in memory (in other words, registers) has to be stored in memory Ø potential problems if a later instruction has already modified memory or registers • A processor that fulfils all the above conditions is said to provide precise exceptions (useful for debugging and of course, correctness) 9

Dealing with these Effects • Multiple writes to the register file: increase the number

Dealing with these Effects • Multiple writes to the register file: increase the number of ports, stall one of the writers during ID, stall one of the writers during WB (the stall will propagate) • WAW hazards: detect the hazard during ID and stall the later instruction • Imprecise exceptions: buffer the results if they complete early or save more pipeline state so that you can return to exactly the same state that you left at 10

ILP • Instruction-level parallelism: overlap among instructions: pipelining or multiple instruction execution • What

ILP • Instruction-level parallelism: overlap among instructions: pipelining or multiple instruction execution • What determines the degree of ILP? Ø dependences: property of the program Ø hazards: property of the pipeline 11

Types of Dependences • Data dependences: an instr produces a result for another (true

Types of Dependences • Data dependences: an instr produces a result for another (true dependence, results in RAW hazards in a pipeline) • Name dependences: two instrs that use the same names (anti and output dependences, result in WAR and WAW hazards in a pipeline) • Control dependences: an instruction’s execution depends on the result of a branch – re-ordering should preserve exception behavior and dataflow 12

An Out-of-Order Processor Implementation Reorder Buffer (ROB) Instr 1 Instr 2 Instr 3 Instr

An Out-of-Order Processor Implementation Reorder Buffer (ROB) Instr 1 Instr 2 Instr 3 Instr 4 Instr 5 Instr 6 Branch prediction and instr fetch R 1+R 2 R 1+R 3 BEQZ R 2 R 3 R 1+R 2 R 1 R 3+R 2 Instr Fetch Queue Decode & Rename T 1 T 2 T 3 T 4 T 5 T 6 T 1 R 1+R 2 T 2 T 1+R 3 BEQZ T 2 T 4 T 1+T 2 T 5 T 4+T 2 Register File R 1 -R 32 ALU ALU Results written to ROB and tags broadcast to IQ Issue Queue (IQ) 13

Design Details - I • Instructions enter the pipeline in order • No need

Design Details - I • Instructions enter the pipeline in order • No need for branch delay slots if prediction happens in time • Instructions leave the pipeline in order – all instructions that enter also get placed in the ROB – the process of an instruction leaving the ROB (in order) is called commit – an instruction commits only if it and all instructions before it have completed successfully (without an exception) • To preserve precise exceptions, a result is written into the register file only when the instruction commits – until then, the result is saved in a temporary register in the ROB 14

Design Details - II • Instructions get renamed and placed in the issue queue

Design Details - II • Instructions get renamed and placed in the issue queue – some operands are available (T 1 -T 6; R 1 -R 32), while others are being produced by instructions in flight (T 1 -T 6) • As instructions finish, they write results into the ROB (T 1 -T 6) and broadcast the operand tag (T 1 -T 6) to the issue queue – instructions now know if their operands are ready • When a ready instruction issues, it reads its operands from T 1 -T 6 and R 1 -R 32 and executes (out-of-order execution) • Can you have WAW or WAR hazards? By using more names (T 1 -T 6), name dependences can be avoided 15

Design Details - III • If instr-3 raises an exception, wait until it reaches

Design Details - III • If instr-3 raises an exception, wait until it reaches the top of the ROB – at this point, R 1 -R 32 contain results for all instructions up to instr-3 – save registers, save PC of instr-3, and service the exception • If branch is a mispredict, flush all instructions after the branch and start on the correct path – mispredicted instrs will not have updated registers (the branch cannot commit until it has completed and the flush happens as soon as the branch completes) • Potential problems: ? 16

Managing Register Names Temporary values are stored in the register file and not the

Managing Register Names Temporary values are stored in the register file and not the ROB Logical Registers R 1 -R 32 Physical Registers P 1 -P 64 At the start, R 1 -R 32 can be found in P 1 -P 32 Instructions stop entering the pipeline when P 64 is assigned R 1+R 2 R 1+R 3 BEQZ R 2 R 3 R 1+R 2 P 33 P 1+P 2 P 34 P 33+P 3 BEQZ P 34 P 35 P 33+P 34 What happens on commit? 17

The Commit Process • On commit, no copy is required • The register map

The Commit Process • On commit, no copy is required • The register map table is updated – the “committed” value of R 1 is now in P 33 and not P 1 – on an exception, P 33 is copied to memory and not P 1 • An instruction in the issue queue need not modify its input operand when the producer commits • When instruction-1 commits, we no longer have any use for P 1 – it is put in a free pool and a new instruction can now enter the pipeline for every instr that commits, a new instr can enter the pipeline number of in-flight instrs is a constant = number of extra (rename) registers 18

The Alpha 21264 Out-of-Order Implementation Reorder Buffer (ROB) Instr 1 Instr 2 Instr 3

The Alpha 21264 Out-of-Order Implementation Reorder Buffer (ROB) Instr 1 Instr 2 Instr 3 Instr 4 Instr 5 Instr 6 Branch prediction and instr fetch R 1+R 2 R 1+R 3 BEQZ R 2 R 3 R 1+R 2 R 1 R 3+R 2 Instr Fetch Queue Decode & Rename Register Map Table R 1 P 1 R 2 P 33 P 1+P 2 P 34 P 33+P 3 BEQZ P 34 P 35 P 33+P 34 P 36 P 35+P 34 Register File P 1 -P 64 ALU ALU Results written to regfile and tags broadcast to IQ Issue Queue (IQ) 19

Title • Bullet 20

Title • Bullet 20