CS 152 Computer Architecture and Engineering Lecture 12

CS 152 Computer Architecture and Engineering
Lecture 12: Introduction to Pipelining: Datapath and Control
March 8th, 2004
John Kubiatowicz (www.cs.berkeley.edu/~kubitron)
Lecture slides: http://inst.eecs.berkeley.edu/~cs152/

The Big Picture: Where are We Now?
° The Five Classic Components of a Computer: Processor (Control, Datapath), Memory, Input, Output
° Today’s Topics:
  • Recap last lecture/finish datapath
  • Pipelined Control/Do it yourself Pipelined Control
  • Administrivia
  • Hazards/Forwarding
  • Exceptions
  • Review MIPS R3000 pipeline
3/8/04 ©UCB Spring 2004 CS 152 / Kubiatowicz Lec 12.2

Can pipelining get us into trouble?
° Yes: Pipeline Hazards
  • structural hazards: attempt to use the same resource two different ways at the same time
    - E.g., a combined washer/dryer would be a structural hazard, as would a folder busy doing something else (watching TV)
  • data hazards: attempt to use an item before it is ready
    - E.g., one sock of a pair is in the dryer and one in the washer; can’t fold until the sock from the washer gets through the dryer
    - instruction depends on result of prior instruction still in the pipeline
  • control hazards: attempt to make a decision before the condition is evaluated
    - E.g., washing football uniforms and needing the proper detergent level; need to see the result after the dryer before putting the next load in
    - branch instructions
° Can always resolve hazards by waiting
  • pipeline control must detect the hazard
  • take action (or delay action) to resolve hazards

Recap: Data Hazards
[Figure: pipeline timing diagrams (IF, DCD, EX, Mem, WB, with some pipelines adding OF/RS stages) illustrating a structural hazard (two instruction fetches colliding), a control hazard (jump), a RAW (read after write) data hazard, a WAW (write after write) data hazard, and a WAR (write after read) data hazard]

Recall: Single cycle control!
[Figure: single-cycle datapath. Control decodes the Instruction from Ideal Instruction Memory and sends Control Signals to the Datapath, which returns Conditions; Next Address logic updates the PC; the 32 x 32-bit register file (read ports Ra, Rb addressed by Rs, Rt; write port Rw addressed by Rd) feeds the ALU and Ideal Data Memory]

Data Stationary Control
° The Main Control generates the control signals during Reg/Dec
  • Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
  • Control signals for Mem (MemWr, Branch) are used 2 cycles later
  • Control signals for Wr (MemtoReg, RegWr) are used 3 cycles later
[Figure: Main Control in Reg/Dec writes ExtOp, ALUSrc, ALUOp, RegDst, MemWr, Branch, MemtoReg, RegWr into the ID/Ex register; the Exec signals are consumed there, MemWr, Branch, MemtoReg, RegWr pass on to the Ex/Mem register, and MemtoReg, RegWr pass on to the Mem/Wr register]
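
A minimal Python sketch of data stationary control, assuming the signal grouping listed on this slide (with RegDst consumed in Exec). The class `Control`, the helpers `latch` and `pipeline_registers`, and the `LW` signal values are hypothetical illustrations rather than the lab datapath. The idea it shows: Main Control decodes once, and each pipeline register simply carries forward the signals that later stages still need.

```python
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class Control:
    """Signals Main Control produces once, during Reg/Dec (names follow the slide)."""
    ext_op: bool      # Exec: sign- vs zero-extend the immediate
    alu_src: bool     # Exec: immediate vs register as ALU input B
    alu_op: str       # Exec: ALU operation
    reg_dst: bool     # Exec: rd vs rt as destination register
    mem_wr: bool      # Mem
    branch: bool      # Mem
    mem_to_reg: bool  # Wr: memory data vs ALU result
    reg_wr: bool      # Wr

EXEC_SIGNALS = {"ext_op", "alu_src", "alu_op", "reg_dst"}
MEM_SIGNALS = {"mem_wr", "branch"}

def latch(signals, consumed):
    """Model one pipeline register: forward only the signals still needed downstream."""
    return {name: value for name, value in signals.items() if name not in consumed}

def pipeline_registers(ctrl):
    id_ex = asdict(ctrl)                 # latched at the end of Reg/Dec
    ex_mem = latch(id_ex, EXEC_SIGNALS)  # Exec signals were used 1 cycle later
    mem_wr = latch(ex_mem, MEM_SIGNALS)  # Mem signals were used 2 cycles later
    return {"ID/Ex": id_ex, "Ex/Mem": ex_mem, "Mem/Wr": mem_wr}  # Wr signals: 3 cycles later

# Hypothetical decode of lw: sign-extend, ALU adds base + offset, load into rt.
LW = Control(ext_op=True, alu_src=True, alu_op="add", reg_dst=False,
             mem_wr=False, branch=False, mem_to_reg=True, reg_wr=True)

if __name__ == "__main__":
    for reg, signals in pipeline_registers(LW).items():
        print(reg, sorted(signals))
```

Dropping consumed signals at each register is only a modeling choice; hardware could equally latch all of them and ignore the unused ones.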

Datapath + Data Stationary Control
[Figure: pipelined datapath (Inst. Mem, IR, Decode/Reg File, Exec, Mem Access/Data Mem, Reg. File) in which each pipeline register carries the instruction’s control fields alongside its data (op, rs, rt, rw, v = valid, and the ex/me/wb control bits), with separate Exec, Mem Ctrl, and WB Ctrl blocks]

Let’s Try it Out
  10  lw   r1, r2(35)
  14  addI r2, 3
  20  sub  r3, r4, r5
  24  beq  r6, r7, 100
  30  ori  r8, r9, 17
  34  add  r10, r11, r12
  100 and  r13, r14, 15
these addresses are octal

Start: Fetch 10
[Figure: IF: 10 lw r1, r2(35); all later stages hold no-ops (n)]

Fetch 14, Decode 10
[Figure: IF: 14 addI r2, 3; ID: 10 lw r1, r2(35), reading r2 from the register file; later stages hold no-ops]

Fetch 20, Decode 14, Exec 10
[Figure: IF: 20 sub r3, r4, r5; ID: 14 addI r2, 3; EX: 10 lw r1 (operands r2 and 35); later stages hold no-ops]

Fetch 24, Decode 20, Exec 14, Mem 10
[Figure: IF: 24 beq r6, r7, 100; ID: 20 sub r3, r4, r5 (reading r4, r5); EX: 14 addI r2, 3; MEM: 10 lw r1 (address r2+35)]

Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10
° Note Delayed Branch: always execute ori after beq
[Figure: IF: 30 ori r8, r9, 17; ID: 24 beq r6, r7, 100 (reading r6, r7); EX: 20 sub r3 (r4, r5); MEM: 14 addI r2 (r2+3); WB: 10 lw r1 ← M[r2+35]]

Fetch 100, Dcd 30, Ex 24, Mem 20, WB 14
[Figure: IF: 100 and r13, r14, 15; ID: 30 ori r8, r9, 17; EX: 24 beq (comparing r6, r7); MEM: 20 sub r3 = r4 - r5; WB: 14 addI r2 ← r2+3; r1 = M[r2+35] is now in the register file and Next PC is 100]

Fill it in yourself!
Fetch 104, Dcd 100, Ex 30, Mem 24, WB 20
[Figure: blank datapath for this cycle, with each stage’s contents marked “?”]

Fill it in yourself!
Fetch 110, Dcd 104, Ex 100, Mem 30, WB 24
[Figure: blank datapath for this cycle, with each stage’s contents marked “?”]

Fill it in yourself!
Fetch 114, Dcd 110, Ex 104, Mem 100, WB 30
[Figure: blank datapath for this cycle, with each stage’s contents marked “?”]
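
The stage assignments on these slides can be reproduced with a short Python sketch. It assumes, as the walkthrough does, that beq is taken and has exactly one delay slot, and it models the datapath’s PC / Next PC pair directly. `PROGRAM`, `fetch_order`, and `pipeline_snapshot` are hypothetical names; addresses are Python octal literals to match the listing.

```python
# Slide program: octal address -> (assembly text, branch target if it is a taken branch).
PROGRAM = {
    0o10: ("lw r1, r2(35)", None),
    0o14: ("addI r2, 3", None),
    0o20: ("sub r3, r4, r5", None),
    0o24: ("beq r6, r7, 100", 0o100),  # assumed taken, as in the walkthrough
    0o30: ("ori r8, r9, 17", None),    # delay slot: always executed
    0o34: ("add r10, r11, r12", None),
    0o100: ("and r13, r14, 15", None),
}
STAGES = ("IF", "ID", "EX", "MEM", "WB")

def fetch_order(start, count):
    """Fetch addresses using a PC / Next PC pair, which yields one branch delay slot."""
    pc, next_pc = start, start + 4
    order = []
    for _ in range(count):
        order.append(pc)
        # Addresses beyond the listing are assumed to hold non-branches.
        _, target = PROGRAM.get(pc, ("", None))
        # The instruction already in next_pc (the delay slot) still runs; the target follows it.
        pc, next_pc = next_pc, (target if target is not None else next_pc + 4)
    return order

def pipeline_snapshot(order, cycle):
    """Octal address held in each stage at `cycle` (0 = first fetch); empty stages omitted."""
    snapshot = {}
    for depth, stage in enumerate(STAGES):
        index = cycle - depth
        if 0 <= index < len(order):
            snapshot[stage] = format(order[index], "o")
    return snapshot

if __name__ == "__main__":
    order = fetch_order(0o10, 9)
    for cycle in range(len(order)):
        print(cycle, pipeline_snapshot(order, cycle))
```

Cycles 0 through 8 match the “Fetch …, Dcd …, Ex …, Mem …, WB …” captions above; the sketch tracks only which instruction occupies each stage, not the register values the exercise asks you to fill in.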

Pipelined Processor
[Figure: pipelined datapath with separate control per stage (Dcd Ctrl, Exec, Mem Ctrl, WB Ctrl) driven by IR, IRex, IRmem, IRwb, with Valid bits, Stalls signals, and bubbles]
° Separate control at each stage
° Stalls propagate backwards to freeze previous stages
° Bubbles in pipeline introduced by placing “Noops” into local stage, stall previous stages.

Recap: Data Hazards
° Avoid some “by design”
  • eliminate WAR by always fetching operands early (DCD) in pipe
  • eliminate WAW by doing all WBs in order (last stage, static)
° Detect and resolve remaining ones
  • stall or forward (if possible)
[Figure: pipeline timing diagrams for RAW, WAW, and WAR data hazards]

Is CPI = 1 for our pipeline?
° Remember that CPI is an “Average # cycles/inst”
[Figure: three overlapped instructions, each passing through IFetch, Dcd, Exec, Mem, WB]
° CPI here is 1, since the average throughput is 1 instruction every cycle.
° What if there are stalls or multi-cycle execution?
° Usually CPI > 1. How close can we get to 1??
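
A back-of-the-envelope Python sketch of why CPI only approaches 1. It assumes a 5-stage pipeline that needs stages - 1 cycles to fill before the first instruction completes, and that each stall adds exactly one cycle; `pipeline_cpi` is a hypothetical helper.

```python
def pipeline_cpi(instructions, stall_cycles=0, stages=5):
    """Average cycles per instruction for one run through an in-order pipeline."""
    fill_cycles = stages - 1  # cycles before the first instruction leaves WB
    total_cycles = fill_cycles + instructions + stall_cycles
    return total_cycles / instructions

if __name__ == "__main__":
    print(pipeline_cpi(1000))                    # fill cost amortized over many instructions
    print(pipeline_cpi(1000, stall_cycles=200))  # 200 one-cycle stalls
```

With 1000 instructions the fill cost alone gives a CPI of about 1.004; adding 200 stall cycles raises it to about 1.204, so stalls, not pipeline fill, dominate in long runs.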

Administrivia
° Midterm I: Wednesday! (3/10)
  • 310 Soda Hall, 5:30 – 8:30
  • One sheet of notes (both sides)
  • Afterwards, pizza at LaVals (I’ll buy!)
° Topics:
  • Chapters 1-5, some knowledge about pipelining
  • Should know material in Appendices A-C
  • Could be questions about state machine design
  • (Prereq quiz material is fair game!)
° Review Session
  • Tuesday: 7:00 – 9:00
  • Location TBA (405?)
° Homework 4: Not out yet
  • I may move the deadline
° Lab 3 due Thursday! (3/11)
  • Demonstrate to TAs in section
  • Report due by midnight that day

Hazard Detection
° Suppose instruction i is about to be issued and a predecessor instruction j is in the instruction pipeline.
[Figure: new instructions move into the pipeline; instruction i follows instruction j]
° Window on execution: only pending instructions can cause hazards
° A RAW hazard exists on register $\rho$ if $\rho \in \mathrm{Rregs}(i) \cap \mathrm{Wregs}(j)$
  • Keep a record of pending writes (for instructions in the pipe) and compare with the operand registers of the current instruction.
  • When an instruction issues, reserve its result register.
  • When an operation completes, remove its write reservation.
° A WAW hazard exists on register $\rho$ if $\rho \in \mathrm{Wregs}(i) \cap \mathrm{Wregs}(j)$
° A WAR hazard exists on register $\rho$ if $\rho \in \mathrm{Wregs}(i) \cap \mathrm{Rregs}(j)$
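
The three conditions translate directly into set intersections. This Python sketch assumes each instruction is described by the sets of registers it reads and writes; `classify_hazards` and `PendingWrites` are hypothetical names, and the reservation table follows the slide’s reserve-on-issue, release-on-completion rule.

```python
from collections import Counter

def classify_hazards(i_reads, i_writes, j_reads, j_writes):
    """Hazards between new instruction i and older in-flight instruction j."""
    found = set()
    if set(i_reads) & set(j_writes):
        found.add("RAW")  # i reads a register j has not yet written
    if set(i_writes) & set(j_writes):
        found.add("WAW")  # both write the same register
    if set(i_writes) & set(j_reads):
        found.add("WAR")  # i writes a register j has not yet read
    return found

class PendingWrites:
    """Record of pending writes: reserve on issue, release on completion."""

    def __init__(self):
        self._reserved = Counter()  # register -> number of in-flight writers

    def raw_hazard(self, operand_regs):
        return any(self._reserved[r] > 0 for r in operand_regs)

    def issue(self, result_regs):
        self._reserved.update(result_regs)

    def complete(self, result_regs):
        self._reserved.subtract(result_regs)

if __name__ == "__main__":
    # j = add r1, r2, r3 ; i = sub r4, r1, r3
    print(classify_hazards({1, 3}, {4}, {2, 3}, {1}))
    table = PendingWrites()
    table.issue([1])                 # add r1 issues: reserve r1
    print(table.raw_hazard([1, 3]))  # sub must wait (or forward)
    table.complete([1])              # add writes back: release r1
    print(table.raw_hazard([1, 3]))
```

A counter rather than a single bit per register is used so that two in-flight writers of the same register do not release each other’s reservation.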

Data Hazard Solution: Forwarding
• “Forward” result from one stage to another
[Figure: pipeline diagram over time (clock cycles), stages IF, ID/RF, EX, MEM, WB, for the instruction sequence
  add r1, r2, r3
  sub r4, r1, r3
  and r6, r1, r7
  or  r8, r1, r9
  xor r10, r1, r11
with r1 forwarded from the add to the following instructions that need it before it reaches the register file]
• “or” is OK if register read/write is defined properly (write the register file in the first half of the cycle, read it in the second)

Record of Pending Writes In Pipeline Registers
[Figure: pipelined datapath (IAU, npc, I mem, Regs, A/B, im, alu, S, D mem, m) in which each pipeline register carries op and rw, and the decode stage holds rs and rt]
° Current operand registers: rs, rt
° Pending writes: rw_ex, rw_mem, rw_wb
° hazard <= ((rs == rw_ex) & regW_ex) OR ((rs == rw_mem) & regW_mem) OR ((rs == rw_wb) & regW_wb) OR
            ((rt == rw_ex) & regW_ex) OR ((rt == rw_mem) & regW_mem) OR ((rt == rw_wb) & regW_wb)
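
The hazard equation is a six-term OR, which this Python sketch collapses into one loop over the three pending-write records. `PipeReg` and `hazard` are hypothetical names; each record holds only the rw and regW fields the equation uses.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PipeReg:
    """The fields of one pipeline register that the hazard check needs."""
    rw: int      # destination register number
    reg_w: bool  # True if the instruction in this stage will write rw

def hazard(rs, rt, ex, mem, wb):
    """True if either operand of the decoding instruction matches a pending write."""
    return any(stage.reg_w and stage.rw in (rs, rt) for stage in (ex, mem, wb))

if __name__ == "__main__":
    idle = PipeReg(rw=0, reg_w=False)   # bubble: no write pending
    add_r1 = PipeReg(rw=1, reg_w=True)  # add r1, r2, r3 now in EX
    print(hazard(rs=1, rt=3, ex=add_r1, mem=idle, wb=idle))  # sub r4, r1, r3 in decode
```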

Resolve RAW by “forwarding” (or bypassing)
[Figure: the same datapath with a Forward mux in front of the A and B operand latches, fed from the later pipeline registers]
° Detect the nearest valid write to an operand register and forward it into the operand latches, bypassing the remainder of the pipe
  • Increase muxes to add paths from pipeline registers
  • Data Forwarding = Data Bypassing
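
The forward mux’s select logic can be sketched in Python as a priority search: the youngest matching write wins, otherwise the register file value is used. It assumes each pipeline register already holds its result value, which is true for ALU operations but not for a load still in EX (that case needs a stall). `PipeReg` and `forward_operand` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class PipeReg:
    rw: int                      # destination register number
    reg_w: bool                  # True if this instruction writes rw
    value: Optional[int] = None  # result held in this pipeline register

def forward_operand(src, regfile_value, ex, mem, wb):
    """Operand-latch mux: nearest valid write to `src` wins, else the register file."""
    for stage in (ex, mem, wb):  # youngest producer first
        if stage.reg_w and stage.rw == src:
            return stage.value
    return regfile_value

if __name__ == "__main__":
    ex = PipeReg(rw=1, reg_w=True, value=10)   # newest write to r1
    mem = PipeReg(rw=1, reg_w=True, value=20)  # older write to r1
    wb = PipeReg(rw=2, reg_w=True, value=30)
    print(forward_operand(1, 0, ex, mem, wb))  # the EX result shadows the older MEM result
```

Checking EX before MEM before WB is what makes the nearest write win when several in-flight instructions target the same register.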

Question: Critical Path???
[Figure: decode-stage datapath with the Forward mux feeding the A/B latches and the Equal comparator that drives PC Sel]
° Bypass path is invariably trouble
° Options?
  • Make logic really fast
  • Move forwarding after muxes
    - Problem: screws up branches that require forwarding!
    - Use same tricks as “carry-skip” adder to fix this?
    - This option may just push delay around....!
  • Insert an extra cycle for branches that need forwarding?
    - Or: hit common case of forwarding from EX stage and stall forward from memory?

What about memory operations?
º If instructions are initiated in order and operations always occur in the same stage, there can be no hazards between memory operations!
º What does delaying WB on arithmetic operations cost?
  – cycles?
  – hardware?
º What about data dependence on loads?
    R1 <- R4 + R5
    R2 <- Mem[R2 + I]
    R3 <- R2 + R1
  ⇒ “Delayed Loads”
º Can recognize this in decode stage and introduce bubble while stalling fetch stage (hint for lab 4!)
º Tricky situation:
    R1 <- Mem[R2 + I]
    Mem[R3+34] <- R1
  Handle with bypass in memory stage!
[Figure: op Rd Ra Rb instructions flowing through the A, B, R, and T pipeline latches and the data memory, with results returning to the register file]
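
The decode-stage check for delayed loads can be sketched in Python using the slide’s two examples. It assumes full forwarding for ALU results, so only a load immediately followed by a use in EX needs one bubble, while a store of the loaded value is covered by the memory-stage bypass. `Instr`, `needs_bubble`, and `count_bubbles` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Instr:
    op: str
    dest: Optional[int] = None        # register written, if any
    alu_srcs: Tuple[int, ...] = ()    # registers needed at the start of EX
    store_data: Optional[int] = None  # register a store writes to memory (needed in MEM)

def needs_bubble(prev, cur):
    """Load-use check done in decode, assuming forwarding plus a MEM-stage bypass."""
    if prev.op != "lw" or prev.dest is None:
        return False
    # Load data exists only at the end of MEM: one cycle too late for cur's EX.
    # Store data is not consumed until MEM, so the memory-stage bypass covers it.
    return prev.dest in cur.alu_srcs

def count_bubbles(program):
    return sum(needs_bubble(prev, cur) for prev, cur in zip(program, program[1:]))

# R1 <- R4 + R5 ; R2 <- Mem[R2 + I] ; R3 <- R2 + R1
DELAYED = [Instr("add", 1, (4, 5)), Instr("lw", 2, (2,)), Instr("add", 3, (2, 1))]
# R1 <- Mem[R2 + I] ; Mem[R3+34] <- R1
TRICKY = [Instr("lw", 1, (2,)), Instr("sw", None, (3,), store_data=1)]

if __name__ == "__main__":
    print(count_bubbles(DELAYED), count_bubbles(TRICKY))
```

The same check is what lets decode freeze fetch and insert one bubble for DELAYED, while TRICKY proceeds without stalling.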

Compiler Avoiding Load Stalls:
[Chart: % loads stalling pipeline (0% to 80%), scheduled vs. unscheduled code]
  Benchmark   scheduled   unscheduled
  gcc         31%         54%
  spice       14%         42%
  tex         25%         65%
° Recall: MIPS I had no pipeline stalls
  • “Microprocessor without Interlocked Pipeline Stages”
  • Consequently, the “Unscheduled” code above would be wrong (without interlocks, nothing stalls a use that follows a load too closely)

What about Interrupts, Traps, Faults?
° External Interrupts:
  • Allow pipeline to drain, fill with NOPs
  • Load PC with interrupt address
° Faults (within instruction, restartable)
  • Force trap instruction into IF
  • Disable writes till trap hits WB
  • Must save multiple PCs or PC + state
° Recall: Precise Exceptions: state of the machine is preserved as if program executed up to the offending instruction
  • All previous instructions completed
  • Offending instruction and all following instructions act as if they have not even started
  • Same system code will work on different implementations

Exception/Interrupts: Implementation questions
5 instructions, executing in 5 different pipeline stages!
° Who caused the interrupt?
  Stage   Problem interrupts occurring
  IF      Page fault on instruction fetch; misaligned memory access; memory-protection violation
  ID      Undefined or illegal opcode
  EX      Arithmetic exception
  MEM     Page fault on data fetch; misaligned memory access; memory-protection violation; memory error
° How do we stop the pipeline? How do we restart it?
° Do we interrupt immediately or wait?
° How do we sort all of this out to maintain preciseness?

Exception Handling
[Figure: pipelined datapath executing lw $2, 20($5), with an exception check in each stage (detect bad instruction address, detect bad instruction, detect overflow, detect bad data address); each stage’s Excp flag travels with the instruction to the point where the exception is allowed to take effect]

Another look at the exception problem
[Figure: overlapped instructions over time (IFetch, Dcd, Exec, Mem, WB) in program flow, with a data TLB fault, a bad instruction, an instruction TLB fault, and an overflow arising in different stages of different instructions]
° Use pipeline to sort this out!
  • Pass exception status along with instruction.
  • Keep track of PCs for every instruction in pipeline.
  • Don’t act on exception until it reaches WB stage
° Handle interrupts through “faulting noop” in IF stage
° When instruction reaches end of MEM stage:
  • Save PC → EPC, Interrupt vector addr → PC
  • Turn all instructions in earlier stages into noops!
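
To see why the pipeline defers action to the end of MEM, this Python sketch compares the exception detected first in time with the one a precise implementation must take. It assumes one instruction enters IF per cycle with no stalls; the PCs and fault placements in `PROGRAM_FLOW` are a hypothetical scenario in the spirit of the figure, and `first_detected` and `precise_exception` are hypothetical names.

```python
# Cycle offset, relative to the instruction's IF cycle, at which each stage detects a fault.
STAGE_OFFSET = {"IF": 0, "ID": 1, "EX": 2, "MEM": 3}

# Hypothetical scenario: (pc, None or (detecting stage, cause)), in program order.
PROGRAM_FLOW = [
    (0x100, ("MEM", "data TLB fault")),
    (0x104, ("ID", "bad instruction")),
    (0x108, ("IF", "instruction TLB fault")),
    (0x10C, None),
]

def first_detected(instrs):
    """(pc, cause) of the fault seen earliest in time; instrs[k] enters IF in cycle k."""
    events = [(k + STAGE_OFFSET[fault[0]], pc, fault[1])
              for k, (pc, fault) in enumerate(instrs) if fault is not None]
    return min(events)[1:] if events else None

def precise_exception(instrs):
    """Act only at the end of MEM: the oldest faulting instruction wins."""
    # Instructions reach the end of MEM in program order, so scan in program order.
    for k, (pc, fault) in enumerate(instrs):
        if fault is not None:
            flushed = [later_pc for later_pc, _ in instrs[k + 1:]]  # turned into noops
            return {"EPC": pc, "cause": fault[1], "flushed": flushed}
    return None

if __name__ == "__main__":
    print("first in time:", first_detected(PROGRAM_FLOW))
    print("precise:", precise_exception(PROGRAM_FLOW))
```

Here the bad instruction at 0x104 is detected first, yet the precise answer is the data TLB fault at 0x100, because that instruction is older; acting immediately would violate the precise-exception rule.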

Examples of stalls/bubbles
° Exceptions: Flush everything above (later)
  • Prevent instructions following exception from committing state
  • Put faulting flag on current instruction
  • Freeze fetch until exception resolved
° Stalls: Introduce brief stalls into pipeline
  • Decode stage recognizes that current instruction cannot proceed
  • Freeze fetch stage
  • Introduce “bubble” into EX stage (instead of forwarding stalled inst)
  • Can stall until condition is resolved
  • Examples:
    - mfhi, mflo: need to wait for multiply/divide unit to finish
    - “Break” instruction for Lab 5: stall until release line received
    - Load delay slot handled this way as well

Example: Delayed Load – Freeze above & Bubble Below
[Figure: pipelined datapath in which the fetch and decode stages are held (freeze) while a bubble is inserted into the pipeline register feeding the ALU]
° Flush accomplished by setting “invalid” bit in pipeline
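
“Freeze above & bubble below” can be modeled as one clock edge in Python. This sketch assumes a bubble is represented by `None` (standing in for a cleared valid bit) and that a stall holds IF and ID while EX and MEM drain forward; `advance` is a hypothetical name and the instruction labels are placeholders.

```python
def advance(pipe, fetched, stall):
    """One clock edge. `pipe` maps stage -> instruction, or None for a bubble (invalid)."""
    if stall:
        # Freeze above: IF and ID keep their instructions.
        # Bubble below: EX receives an invalid entry; older instructions keep moving.
        return {"IF": pipe["IF"], "ID": pipe["ID"], "EX": None,
                "MEM": pipe["EX"], "WB": pipe["MEM"]}
    return {"IF": fetched, "ID": pipe["IF"], "EX": pipe["ID"],
            "MEM": pipe["EX"], "WB": pipe["MEM"]}

if __name__ == "__main__":
    # lw r2 is in EX and its consumer add r3, r2, r1 is in decode: stall one cycle.
    cycle0 = {"IF": "next", "ID": "add", "EX": "lw", "MEM": None, "WB": None}
    cycle1 = advance(cycle0, fetched="after_next", stall=True)
    cycle2 = advance(cycle1, fetched="after_next", stall=False)
    print(cycle1)
    print(cycle2)  # add reaches EX while lw is in WB, so its result can be forwarded
```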

Summary
° Hazards limit performance
  • Structural: need more HW resources
  • Data: need forwarding, compiler scheduling
  • Control: early evaluation & PC, delayed branch, prediction
° Data hazards must be handled carefully:
  • RAW data hazards handled by forwarding
  • WAW and WAR hazards don’t exist in 5-stage pipeline
  • Some hazards handled by stalling
° MIPS I instruction set architecture made pipeline visible (delayed branch, delayed load)
° Exceptions in 5-stage pipeline recorded when they occur, but acted on only at WB (end of MEM) stage
  • Must flush all later instructions (those in earlier pipeline stages)
° Compiler optimizations may be used to avoid stalls
  • Loop unrolling: multiple iterations of loop in software
    - Amortizes loop overhead over several iterations
    - Gives more opportunity for scheduling around stalls

Summary: Where this class is going
° We’ll build a simple pipeline and look at these issues
  • Lab 4: Pipelined Processor
  • Lab 5: With caches
° We’ll talk about modern processors and what’s really hard:
  • Branches (control hazards) are really hard!
  • Exception handling
  • Trying to improve performance with out-of-order execution, etc.
  • Trying to get CPI < 1 (Superscalar execution)