
Pipelined β Processor

Recap: Unpipelined β

[Figure: unpipelined β datapath. PC and Instruction Memory, Register File, ALU, and Data Memory, with Control Logic generating PCSEL, RA2SEL, ASEL, BSEL, WDSEL, ALUFN, Wr, WERF, WASEL.]

What can we do to make it run FASTER?

CPU Performance

MIPS = (Frequency in MHz) / (Clocks Per Instruction (CPI))

To increase MIPS:
– Decrease CPI
  » Instruction set simplicity reduces CPI to 1.0
  » To reduce CPI below 1.0 need multiple instruction issue machines (stay tuned!)
– Increase Frequency
  » Frequency limited by longest combinational path from register outputs to register inputs
  » Can therefore increase frequency by pipelining
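A minimal sketch of the MIPS formula and of why pipelining raises the clock frequency. The per-stage delays are made-up numbers chosen only for illustration, and register setup and propagation overhead is ignored.

```python
def mips(freq_mhz, cpi):
    """MIPS = clock frequency in MHz divided by clocks per instruction."""
    return freq_mhz / cpi


def max_freq_mhz(longest_path_ns):
    """Clock frequency limit set by the longest register-to-register path."""
    return 1000.0 / longest_path_ns


# Hypothetical combinational delays (ns) for IF, RF, ALU, WB. Assumed values.
stage_delays_ns = [4.0, 2.0, 4.0, 2.0]

# Unpipelined: one clock period must cover the whole path through all stages.
unpipelined = mips(max_freq_mhz(sum(stage_delays_ns)), cpi=1.0)

# Pipelined: the clock period only has to cover the slowest single stage.
pipelined = mips(max_freq_mhz(max(stage_delays_ns)), cpi=1.0)

print(f"unpipelined: {unpipelined:.1f} MIPS, pipelined: {pipelined:.1f} MIPS")
```

With these assumed delays the speedup is 3x rather than 4x, because the stages are not perfectly balanced.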

Pipeline Stages

Goal: Maintain (nearly) 1.0 CPI, but increase clock speed

Approach: Structure processor as 4-stage pipeline
– Instruction Fetch (IF): Maintains PC. Fetches one instruction per cycle.
– Register File (RF): Reads source operands from register file.
– ALU: Performs indicated operation.
– Write-Back (WB): Writes result back into register file.

IF → RF → ALU → WB

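Sketch of 4-Stage Pipeline

[Figure: four stages in sequence. Instruction Fetch (IF) passes the instruction to Register File, which reads operands A and B from the RF; ALU computes Y; Write Back writes Y into the RF. Each stage after IF has its own control logic (CL) driven by the instruction in that stage.]

Same RF as above!

Need to pass through the stages for branch and jump instructions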

4-Stage β Pipeline

• Omits some detail
• No bypass or interlock logic

[Figure: 4-stage β datapath with pipeline registers between stages: PCIF (IF); PCRF, IRRF (RF); PCALU, IRALU, A, B, DALU (ALU); PCWB, IRWB, Y, DWB (WB).]

4-Stage Pipeline Execution

Consider a sequence of instructions

  …
  ADDC(r1, 1, r2)
  SUBC(r1, 1, r3)
  XOR(r1, r5, r1)
  MUL(r2, r5, r0)
  …

executed on the 4-stage pipeline:

  TIME (cycles)  i     i+1   i+2   i+3   i+4   i+5   i+6
  IF             ADDC  SUBC  XOR   MUL   ...
  RF                   ADDC  SUBC  XOR   MUL   ...
  ALU                        ADDC  SUBC  XOR   MUL   ...
  WB                               ADDC  SUBC  XOR   MUL
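The staggered diagram above follows a simple rule: with no stalls, instruction k (counting from 0) occupies stage s in cycle k + s. A short sketch that generates such a diagram; the function and variable names are illustrative, not part of the β design.

```python
STAGES = ["IF", "RF", "ALU", "WB"]


def pipeline_diagram(instructions, stages=STAGES):
    """Return {stage: [instruction or None per cycle]} for a stall-free pipeline."""
    n_cycles = len(instructions) + len(stages) - 1
    table = {stage: [None] * n_cycles for stage in stages}
    for k, instr in enumerate(instructions):
        for s, stage in enumerate(stages):
            # Instruction k enters stage s at cycle k + s.
            table[stage][k + s] = instr
    return table


program = ["ADDC", "SUBC", "XOR", "MUL"]
diagram = pipeline_diagram(program)
for stage in STAGES:
    row = " ".join(f"{(name or ''):<5}" for name in diagram[stage])
    print(f"{stage:<4}{row}")
```

Four instructions take 4 + 3 = 7 cycles to drain completely, but once the pipeline is full one instruction completes every cycle.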

Branch Execution

Consider a different sequence:

  LOOP: CMPLEC(r3, 100, r0)
        ADD(r1, r2, r3)
        SUB(r1, r2, r4)
        BNE(r0, LOOP)
        XOR(r31, r3)
        …

        i    i+1  i+2  i+3  i+4  i+5  i+6
  IF    CMP  ADD  SUB  BNE  ?
  RF         CMP  ADD  SUB  BNE
  ALU             CMP  ADD  SUB  BNE
  WB                   CMP  ADD  SUB  BNE

What instruction should we fetch after BNE?

Branch Execution - II

        i    i+1  i+2  i+3  i+4  i+5  i+6
  IF    CMP  ADD  SUB  BNE  XOR
  RF         CMP  ADD  SUB  BNE
  ALU             CMP  ADD  SUB  BNE
  WB                   CMP  ADD  SUB  BNE

[Figure: IF and RF stages of the pipelined datapath. The branch condition Z and the branch target (<PC> + 4 plus C<15:0> shifted left by 2) are produced in the RF stage and feed the PCSEL multiplexer in front of PCIF.]

PCIF will output correct address in clock cycle.

Branch Delay Slots

Problem: One (or more) following instructions have been prefetched by the time a branch is taken.

Possible solutions are:
– “Program around it”
  » Follow each branch instruction with a NOP = ADD(r31, r31) instruction.
  » Make the compiler clever enough to move useful instructions after branches. These instructions will be executed regardless of whether the branch is taken or not.
– Make pipeline “annul” instruction following branch which is taken, e.g., by disabling RF write and Memory write.

Annulling Prefetched Instructions

[Figure: IF and RF stages with a multiplexer in front of IRRF that selects either the fetched instruction or a NOP, controlled by AnnulIF. Instructions shown in the figure: XORC, BNE, ADD.]

AnnulIF = 1 if branch is taken, i.e., PCSEL = 1

Similarly for JMP instruction
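A rough sketch of the annulment idea: the instruction fetched right behind a taken branch is replaced by a NOP before it reaches IRRF. The program encoding and the explicit list of branch outcomes are assumptions made for illustration; real hardware derives the outcome from Z in the RF stage.

```python
def fetch_trace(program, branch_outcomes, max_fetches=20):
    """List what is loaded into IRRF, one entry per fetch cycle.

    program: list of (name, branch_target_index or None).
    branch_outcomes: taken/not-taken flags, consumed in order each time a
    branch is fetched (a stand-in for the Z test; assumed input).
    """
    outcomes = iter(branch_outcomes)
    trace = []
    pc = 0
    pending_target = None  # set while a taken branch sits in the RF stage
    while len(trace) < max_fetches:
        if pending_target is not None:
            # The instruction at pc was prefetched behind a taken branch:
            # annul it (load a NOP instead) and redirect the fetch.
            trace.append("NOP")
            pc, pending_target = pending_target, None
            continue
        if pc >= len(program):
            break
        name, target = program[pc]
        trace.append(name)
        if target is not None and next(outcomes, False):
            pending_target = target
        pc += 1
    return trace


loop = [("CMPLEC", None), ("ADD", None), ("SUB", None), ("BNE", 0), ("XOR", None)]
trace = fetch_trace(loop, branch_outcomes=[True, False])
print(trace)
```

Each taken branch costs one wasted cycle in this model, so the effective CPI rises slightly above 1.0 for branch-heavy code.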

Data Hazards

Consider the sequence:

  ADD(r1, r2, r3)
  CMPLEC(r3, 5, r0)
  MULC(r1, r2, r4)
  SUB(r1, r2, r5)
  …

        i    i+1  i+2  i+3  i+4  i+5  i+6
  IF    ADD  CMP  MUL  SUB
  RF         ADD  CMP  MUL  SUB
  ALU             ADD  CMP  MUL  SUB
  WB                   ADD  CMP  MUL  SUB

ADD writes the new value of r3 during cycle i+3; it is available at the beginning of cycle i+4. CMP reads r3 during cycle i+2, so CMP reads the old value of r3.

Data Hazards: Solution - I

Solution I (software)
– “Program around it”
  » Compiler inserts NOPs
  » Compiler rewrites instruction sequence

  ADD(r1, r2, r3)
      (Insert how many NOPs here?)
  CMPLEC(r3, 5, r0)

Rewrite

  ADD(r1, r2, r3)
  CMPLEC(r3, 5, r0)
  MUL(r1, r2, r4)
  SUB(r1, r2, r5)

as

  ADD(r1, r2, r3)
  MUL(r1, r2, r4)
  SUB(r1, r2, r5)
  CMPLEC(r3, 5, r0)
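A sketch of the software fix, assuming the timing stated for this pipeline: a result written in WB is readable only in the following cycle, and operands are read in RF. The instruction encoding (name, source registers, destination register) is a hypothetical one for illustration.

```python
WRITE_STAGE = 3  # WB (stage indices: IF=0, RF=1, ALU=2, WB=3)
READ_STAGE = 1   # operands are read in RF


def nops_needed(distance):
    """NOPs required when the consumer is `distance` instructions after the producer.

    The written value is readable one cycle after WB, so the consumer's RF
    cycle must come at least one cycle after the producer's WB cycle.
    """
    min_distance = WRITE_STAGE - READ_STAGE + 1
    return max(0, min_distance - distance)


def insert_nops(program):
    """program: list of (name, source_regs, dest_reg). Returns names with NOPs added."""
    out = []  # list of (name, dest_reg), including inserted NOPs
    for name, srcs, dest in program:
        needed = 0
        # back == 1 is the instruction immediately before the current one.
        for back, (_, prev_dest) in enumerate(reversed(out), start=1):
            if prev_dest is not None and prev_dest in srcs:
                needed = max(needed, nops_needed(back))
        out.extend([("NOP", None)] * needed)
        out.append((name, dest))
    return [name for name, _ in out]


original = [("ADD", {"r1", "r2"}, "r3"), ("CMPLEC", {"r3"}, "r0"),
            ("MUL", {"r1", "r2"}, "r4"), ("SUB", {"r1", "r2"}, "r5")]
rewritten = [original[0], original[2], original[3], original[1]]

print(insert_nops(original))
print(insert_nops(rewritten))
```

Under these assumptions the original order needs two NOPs between ADD and CMPLEC, while the rewritten order needs none because MUL and SUB fill the gap with useful work.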

Data Hazards: Solution - II

Solution II (hardware)
– Detect problem and stall the pipeline
  » Freeze IF, RF stages for 2 cycles and insert NOPs into IRALU for 2 cycles

  ADD(r1, r2, r3)
  CMPLEC(r3, 5, r0)

        i    i+1  i+2  i+3   i+4   i+5   i+6
  IF    ADD  CMP  MUL  MUL   MUL   SUB
  RF         ADD  CMP  CMP   CMP   MUL   SUB
  ALU             ADD  NOP1  NOP2  CMP   MUL
  WB                   ADD   NOP1  NOP2  CMP

  r3 written: cycle i+3 (ADD in WB)
  r3 read: cycle i+4 (CMP in RF)
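A small cycle-by-cycle sketch of the stall scheme: while the instruction in RF reads a register that an instruction in ALU or WB has yet to make available, IF and RF are frozen and a NOP is injected into the ALU stage. The instruction encoding is a hypothetical one, and the model ignores branches and memory operations.

```python
NOP = ("NOP", (), None)


def name_of(instr):
    return instr[0] if instr else ""


def simulate_stalls(program, n_cycles):
    """program: list of (name, source_regs, dest_reg). Returns one dict per cycle."""
    fetch = iter(program)
    if_, rf, alu, wb = next(fetch, None), None, None, None
    history = []
    for _ in range(n_cycles):
        history.append({"IF": name_of(if_), "RF": name_of(rf),
                        "ALU": name_of(alu), "WB": name_of(wb)})
        # Destinations still in flight: not yet readable from the register file.
        in_flight = {s[2] for s in (alu, wb) if s and s[2] is not None}
        stall = rf is not None and any(src in in_flight for src in rf[1])
        if stall:
            # Freeze IF and RF; inject a NOP into the ALU stage.
            wb, alu = alu, NOP
        else:
            wb, alu, rf, if_ = alu, rf, if_, next(fetch, None)
    return history


program = [("ADD", ("r1", "r2"), "r3"), ("CMPLEC", ("r3",), "r0"),
           ("MULC", ("r1",), "r4"), ("SUB", ("r1", "r2"), "r5")]
history = simulate_stalls(program, n_cycles=7)
for stage in ("IF", "RF", "ALU", "WB"):
    print(f"{stage:<4}" + " ".join(f"{h[stage]:<7}" for h in history))
```

In this model CMPLEC sits in RF for three cycles (two frozen, one real read), which lines up with the two NOPs inserted into the ALU stage.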

Pipeline Stalls

[Figure: 4-stage β datapath with stall logic. A STALL signal freezes the IF and RF stage registers and selects a NOP into IRALU. Instructions shown in flight: MUL (IF), CMP (RF), NOP (ALU), ADD (WB).]

Data Hazards: Solution - III

Solution III (hardware)
– Bypass paths.
  » Extra data paths and control logic which re-route data in problem cases.

  ADD(r1, r2, r3)
  CMPLEC(r3, 5, r0)

        i    i+1  i+2  i+3  i+4  i+5  i+6
  IF    ADD  CMP  MUL  SUB
  RF         ADD  CMP  MUL  SUB
  ALU             ADD  CMP  MUL  SUB
  WB                   ADD  CMP  MUL  SUB

  Cycle i+2: r3 read (CMP in RF); r1 + r2 computed (ADD in ALU)
  Cycle i+3: r3 compared to 5 (CMP in ALU); r3 written (ADD in WB)

Bypass Paths - I

[Figure: CMPLEC(r3, 5, r0) in the RF stage (IRRF) and ADD(r1, r2, r3) in the ALU stage (IRALU). The ALU output is routed back through a bypass multiplexer on the register file read path feeding the A input.]

Select bypass if
  OPCODERF = OP, OPC, ...
  AND OPCODEALU = OP, OPC, ...
  AND raRF = rcALU

(Similar path for other ALU input)

Bypass Paths - II

[Figure: SUB(r1, r2, r3) in the RF stage (IRRF), ??? in the ALU stage (IRALU), and XOR(r4, r5, r2) in the WB stage (IRWB). The register file write data WD, selected by WDSEL, is routed back through a bypass multiplexer on the register file read path feeding the B input.]

Select bypass if
  OPCODERF = OP
  AND WERF = 1
  AND rbRF = WA

(Similar path for other ALU input)
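The two bypass-select conditions can be written as simple predicates. This is a sketch only: the decoded-instruction record and its field names are hypothetical, and the opcode classes are limited to OP and OPC even though the slide's "..." indicates the real list is longer.

```python
from collections import namedtuple

# Hypothetical decoded instruction: kind is "OP", "OPC", or something else;
# ra, rb, rc are register numbers (rb is None for OPC instructions).
Instr = namedtuple("Instr", "kind ra rb rc")

# Assumed subset of the opcode classes the slide lists as "OP, OPC, ...".
ALU_KINDS = {"OP", "OPC"}


def bypass_a_from_alu(rf_instr, alu_instr):
    """Bypass Paths - I: feed the ALU result to the A operand being read in RF."""
    return (rf_instr.kind in ALU_KINDS
            and alu_instr.kind in ALU_KINDS
            and rf_instr.ra == alu_instr.rc)


def bypass_b_from_wb(rf_instr, werf, wa):
    """Bypass Paths - II: feed the WB write data to the B operand being read in RF."""
    return rf_instr.kind == "OP" and werf == 1 and rf_instr.rb == wa


# Bypass Paths - I example: CMPLEC(r3, 5, r0) in RF, ADD(r1, r2, r3) in ALU.
cmplec = Instr("OPC", ra=3, rb=None, rc=0)
add = Instr("OP", ra=1, rb=2, rc=3)
print(bypass_a_from_alu(cmplec, add))

# Bypass Paths - II example: SUB(r1, r2, r3) in RF, XOR(r4, r5, r2) writing r2 in WB.
sub = Instr("OP", ra=1, rb=2, rc=3)
print(bypass_b_from_wb(sub, werf=1, wa=2))
```

Both examples select the bypass. If WERF were 0, or the register numbers did not match, the register file output would be used unchanged.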

Next Time: Pipelining Subtleties

[Cartoon: Dilbert, S. Adams]