
Computer Architecture: A Constructive Approach
Multi-cycle SMIPS Implementations
Joel Emer
Computer Science & Artificial Intelligence Lab.
Massachusetts Institute of Technology
March 5, 2012
http://csg.csail.mit.edu/6.S078 L8-1

Harvard-Style Datapath for MIPS (old way)
[Figure: single-cycle datapath with PC and a +4 adder, instruction memory, register file (GPRs; read ports rs1/rs2 giving rd1/rd2, write port ws/wd), immediate extender (Imm Ext), ALU with a zero? output, and data memory (addr, wdata, rdata). Control signals: PCSrc (pc+4 / br / rind / jabs), RegWrite, MemWrite, WBSrc, ExtSel, BSrc, OpSel and RegDst, generated by ALU Control from the OpCode.]

old way: Hardwired Control Table

Opcode    ExtSel  BSrc  OpSel  MemW  RegW  WBSrc  RegDst  PCSrc
ALU       *       Reg   Func   no    yes   ALU    rd      pc+4
ALUi      sExt16  Imm   Op     no    yes   ALU    rt      pc+4
ALUiu     uExt16  Imm   Op     no    yes   ALU    rt      pc+4
LW        sExt16  Imm   +      no    yes   Mem    rt      pc+4
SW        sExt16  Imm   +      yes   no    *      *       pc+4
BEQZ z=0  sExt16  *     0?     no    no    *      *       br
BEQZ z=1  sExt16  *     0?     no    no    *      *       pc+4
J         *       *     *      no    no    *      *       jabs
JAL       *       *     *      no    yes   PC     R31     jabs
JR        *       *     *      no    no    *      *       rind
JALR      *       *     *      no    yes   PC     R31     rind

BSrc = Reg / Imm; RegDst = rt / rd / R31; WBSrc = ALU / Mem / PC; PCSrc = pc+4 / br / rind / jabs
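A hardwired control table is, in effect, a pure lookup from the opcode (plus the zero flag z for BEQZ) to a tuple of control signals. The Python sketch below transcribes the rows of the table as a dictionary; `None` stands for a "don't care" (*) entry. The names `Ctrl`, `CONTROL` and `control_signals` are hypothetical, chosen for illustration, and this is a software model rather than BSV.

```python
from typing import NamedTuple, Optional


class Ctrl(NamedTuple):
    """Control signals for one row of the table; None means "don't care" (*)."""
    ext_sel: Optional[str]  # "sExt16" or "uExt16"
    b_src: Optional[str]    # "Reg" or "Imm"
    op_sel: Optional[str]   # "Func", "Op", "+" or "0?"
    mem_w: bool             # write the data memory?
    reg_w: bool             # write the register file?
    wb_src: Optional[str]   # "ALU", "Mem" or "PC"
    reg_dst: Optional[str]  # "rt", "rd" or "R31"
    pc_src: str             # "pc+4", "br", "rind" or "jabs"


# Row-by-row transcription; BEQZ has one row per value of the zero flag z.
CONTROL = {
    "ALU":     Ctrl(None,     "Reg", "Func", False, True,  "ALU", "rd",  "pc+4"),
    "ALUi":    Ctrl("sExt16", "Imm", "Op",   False, True,  "ALU", "rt",  "pc+4"),
    "ALUiu":   Ctrl("uExt16", "Imm", "Op",   False, True,  "ALU", "rt",  "pc+4"),
    "LW":      Ctrl("sExt16", "Imm", "+",    False, True,  "Mem", "rt",  "pc+4"),
    "SW":      Ctrl("sExt16", "Imm", "+",    True,  False, None,  None,  "pc+4"),
    "BEQZz=0": Ctrl("sExt16", None,  "0?",   False, False, None,  None,  "br"),
    "BEQZz=1": Ctrl("sExt16", None,  "0?",   False, False, None,  None,  "pc+4"),
    "J":       Ctrl(None,     None,  None,   False, False, None,  None,  "jabs"),
    "JAL":     Ctrl(None,     None,  None,   False, True,  "PC",  "R31", "jabs"),
    "JR":      Ctrl(None,     None,  None,   False, False, None,  None,  "rind"),
    "JALR":    Ctrl(None,     None,  None,   False, True,  "PC",  "R31", "rind"),
}


def control_signals(opcode: str, z: int = 0) -> Ctrl:
    """Look up the control signals; for BEQZ the zero flag z selects the row."""
    if opcode == "BEQZ":
        return CONTROL[f"BEQZz={z}"]
    return CONTROL[opcode]
```

For example, `control_signals("JAL")` yields `reg_w=True`, `wb_src="PC"`, `reg_dst="R31"`, `pc_src="jabs"`. In hardware the same table is a block of combinational logic (ROM or gates) rather than a dictionary.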

new way: Single-Cycle SMIPS
[Figure: PC and +4 adder, Inst Memory, Decode, Register File (2 read & 1 write ports), Execute, Data Memory; separate Instruction & Data memories]
Datapath and control were derived automatically from a high-level rule-based description.

Single-Cycle SMIPS code structure

module mkProc(Proc);
  Reg#(Addr) pc <- mkRegU;
  RFile rf <- mkRFile;
  Memory mem <- mkTwoPortedMemory;
  let iMem = mem.iport;
  let dMem = mem.dport;

  rule doProc;
    let inst = iMem(MemReq{op: Ld, addr: pc, data: ?});
    let dInst = decode(inst);
    Data rVal1 = rf.rd1(dInst.rSrc1);
    Data rVal2 = rf.rd2(dInst.rSrc2);
    let eInst = exec(dInst, rVal1, rVal2, pc);
    // update rf, pc and dMem

Decoding Instructions: input-output types
[Figure: decode maps an instruction (Bit#(32)) to a DecodedInst; mux control logic not shown. Output fields and the instruction bits they are derived from:
iType (IType): bits 31:26, 5:0
aluFunc (AluFunc): bits 31:26, 5:0
brComp (BrType): bits 31:26
rDst (Rindex): bits 20:16 or 15:11
rSrc1 (Rindex): bits 25:21
rSrc2 (Rindex): bits 20:16
imm (Bit#(32)): extended from bits 15:0 or 25:0
immValid (Bool)]
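The first step of decode is simply slicing bit ranges out of the 32-bit instruction. The Python sketch below extracts the raw fields named in the diagram, using the standard MIPS field layout; it does not derive `iType`, `aluFunc`, `brComp` or `immValid`, which need per-opcode tables. The helper names `bits`, `sign_extend16` and `decode_fields` are hypothetical, and always sign-extending the 16-bit immediate is an assumption (the control table shows that ALUiu instructions zero-extend instead).

```python
def bits(x: int, hi: int, lo: int) -> int:
    """Return bits hi..lo (inclusive) of x, like x[hi:lo] in BSV."""
    return (x >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend16(v: int) -> int:
    """Sign-extend a 16-bit value to 32 bits (returned as an unsigned 32-bit int)."""
    return (v | 0xFFFF_0000) if v & 0x8000 else v


def decode_fields(inst: int) -> dict:
    """Slice a 32-bit MIPS instruction into the raw fields the decoder draws on."""
    return {
        "opcode": bits(inst, 31, 26),
        "rs": bits(inst, 25, 21),       # rSrc1
        "rt": bits(inst, 20, 16),       # rSrc2, or rDst for I-type instructions
        "rd": bits(inst, 15, 11),       # rDst for R-type instructions
        "funct": bits(inst, 5, 0),      # selects the ALU function for R-type
        "imm16": sign_extend16(bits(inst, 15, 0)),
        "target26": bits(inst, 25, 0),  # jump target field
    }


# addiu $2, $1, -1 is encoded as 0x2422FFFF
fields = decode_fields(0x2422FFFF)
```

Here `fields` has opcode 9, rs 1, rt 2 and an immediate of 0xFFFFFFFF (that is, -1 as a 32-bit pattern). Like the decode box in the diagram, this is pure combinational logic: the output depends only on the instruction bits.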

Reading Registers
[Figure: rSrc1 and rSrc2 index the register file (RF), which returns rVal1 and rVal2]
Reading registers is pure combinational logic.
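A register file with two read ports and one write port can be modeled as an array whose reads are plain functions of the current state (hence combinational) and whose single write updates one entry. The Python class below is a hypothetical sketch whose method names mirror `rd1`, `rd2` and `wr` from the BSV code; hardwiring register 0 to zero is a MIPS convention assumed here, not something the slide shows.

```python
class RFile:
    """Register file with two combinational read ports and one write port.

    Assumption: register 0 always reads as zero, as in MIPS.
    """

    def __init__(self, n: int = 32):
        self.regs = [0] * n

    def rd1(self, r: int) -> int:
        # Read port 1: depends only on the index and the current contents.
        return 0 if r == 0 else self.regs[r]

    def rd2(self, r: int) -> int:
        # Read port 2: an independent copy of the same read logic.
        return 0 if r == 0 else self.regs[r]

    def wr(self, r: int, v: int) -> None:
        # Single write port; writes to register 0 are discarded.
        if r != 0:
            self.regs[r] = v & 0xFFFF_FFFF
```

Two separate read methods reflect the two physical read ports: both source operands can be read in the same cycle.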

Executing Instructions
[Figure: execute takes dInst, rVal1, rVal2 and pc; an ALU, a branch comparator (ALUBr) and a branch-address unit produce iType, rDst, data (either for the rf write or for St), addr (either for the memory reference or the branch target) and brTaken]
Executing is also pure combinational logic.

Branch Address Calculation

function Addr brAddrCalc(Addr pc, Data val, IType iType, Data imm);
  let targetAddr = case (iType)
    J, Jal   : {pc[31:28], imm[27:0]};  // absolute jump within the current region
    Jr, Jalr : val;                     // register-indirect jump
    default  : pc + imm;                // PC-relative branch
  endcase;
  return targetAddr;
endfunction
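The three target cases of `brAddrCalc` translate directly into integer arithmetic. In this Python sketch (the name `br_addr_calc` and string-valued instruction types are illustrative assumptions), `{pc[31:28], imm[27:0]}` becomes a mask-and-OR, and every result is truncated to 32 bits. As in the slide, `imm` is assumed to already hold the offset or target exactly as decode produced it.

```python
MASK32 = 0xFFFF_FFFF


def br_addr_calc(pc: int, val: int, i_type: str, imm: int) -> int:
    """Python model of brAddrCalc; addresses are 32-bit unsigned ints."""
    if i_type in ("J", "Jal"):
        # {pc[31:28], imm[27:0]}: keep the top 4 PC bits, take the low 28 from imm
        return (pc & 0xF000_0000) | (imm & 0x0FFF_FFFF)
    if i_type in ("Jr", "Jalr"):
        # register-indirect: the target is the register value
        return val & MASK32
    # default (branches): PC-relative, wrapping at 32 bits like the hardware adder
    return (pc + imm) & MASK32
```

For instance, a branch at pc 0x100 with imm 0xFFFFFFF0 (-16) targets 0xF0, because the addition wraps modulo 2^32.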

Some Useful Functions

function Bool memType(IType i);
  return (i == Ld || i == St);
endfunction

// Jal writes R31 and transfers control (see the control table)
function Bool regWriteType(IType i);
  return (i == Alu || i == Ld || i == Jal || i == Jalr);
endfunction

function Bool controlType(IType i);
  return (i == J || i == Jal || i == Jr || i == Jalr || i == Br);
endfunction
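These classification predicates are simple set-membership tests on the instruction type. A Python rendering follows, with instruction types modeled as strings (an assumption for illustration). Following the control table, where JAL has RegW = yes and PCSrc = jabs, this sketch counts `Jal` as both a register-writing and a control-transfer instruction.

```python
def mem_type(i: str) -> bool:
    """True for instructions that access data memory."""
    return i in ("Ld", "St")


def reg_write_type(i: str) -> bool:
    """True for instructions that write the register file."""
    return i in ("Alu", "Ld", "Jal", "Jalr")


def control_type(i: str) -> bool:
    """True for instructions that may change the PC to something other than pc+4."""
    return i in ("J", "Jal", "Jr", "Jalr", "Br")
```

Keeping these as small named functions lets the rule bodies read like the control table: "if it is a memory type, access memory; if it writes a register, write the rf".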

Execute Function

function ExecInst exec(DecodedInst dInst, Data rVal1, Data rVal2, Addr pc);
  ExecInst eInst = ?;
  // second ALU operand: the immediate if there is one, otherwise the rSrc2 value
  Data aluVal2 = dInst.immValid ? dInst.imm : rVal2;
  let aluRes = alu(rVal1, aluVal2, dInst.aluFunc);
  let brAddr = brAddrCalc(pc, rVal1, dInst.iType, dInst.imm);
  eInst.iType = dInst.iType;
  // memory address for Ld/St, branch target otherwise
  eInst.addr = memType(dInst.iType) ? aluRes : brAddr;
  // store data for St, ALU result (register write value) otherwise
  eInst.data = dInst.iType == St ? rVal2 : aluRes;
  eInst.brTaken = aluBr(rVal1, aluVal2, dInst.brComp);
  eInst.rDst = dInst.rDst;
  return eInst;
endfunction
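The execute function can be mirrored in Python as a pure function from a decoded instruction and operand values to an `ExecInst` record. This sketch is self-contained and makes several labeled assumptions: the ALU functions (`Add`, `Sub`, `And`, `Or`) and branch comparisons (`Eq`, `Neq`, `AT` for always taken, `NT` for never taken) are a hypothetical subset, and `immValid` is taken to mean "the ALU's second operand is the immediate". The function is named `exec_inst` because `exec` is a Python builtin.

```python
from dataclasses import dataclass

MASK32 = 0xFFFF_FFFF


@dataclass
class DecodedInst:
    i_type: str      # "Alu", "Ld", "St", "J", "Jal", "Jr", "Jalr", "Br"
    alu_func: str    # hypothetical subset: "Add", "Sub", "And", "Or"
    br_comp: str     # hypothetical subset: "Eq", "Neq", "AT", "NT"
    r_dst: int
    r_src1: int
    r_src2: int
    imm: int
    imm_valid: bool


@dataclass
class ExecInst:
    i_type: str
    r_dst: int
    data: int
    addr: int
    br_taken: bool


def alu(a: int, b: int, func: str) -> int:
    results = {"Add": a + b, "Sub": a - b, "And": a & b, "Or": a | b}
    return results[func] & MASK32


def alu_br(a: int, b: int, comp: str) -> bool:
    return {"Eq": a == b, "Neq": a != b, "AT": True, "NT": False}[comp]


def br_addr_calc(pc: int, val: int, i_type: str, imm: int) -> int:
    if i_type in ("J", "Jal"):
        return (pc & 0xF000_0000) | (imm & 0x0FFF_FFFF)
    if i_type in ("Jr", "Jalr"):
        return val & MASK32
    return (pc + imm) & MASK32


def exec_inst(d: DecodedInst, r_val1: int, r_val2: int, pc: int) -> ExecInst:
    """Pure (combinational) execute: no state is read or written."""
    alu_val2 = d.imm if d.imm_valid else r_val2
    alu_res = alu(r_val1, alu_val2, d.alu_func)
    br_addr = br_addr_calc(pc, r_val1, d.i_type, d.imm)
    return ExecInst(
        i_type=d.i_type,
        r_dst=d.r_dst,
        # memory address for Ld/St, branch target otherwise
        addr=alu_res if d.i_type in ("Ld", "St") else br_addr,
        # store data for St, ALU result otherwise
        data=r_val2 if d.i_type == "St" else alu_res,
        br_taken=alu_br(r_val1, alu_val2, d.br_comp),
    )
```

Because `exec_inst` only computes a result record, all state changes are left to the caller, which is exactly how the rule separates combinational work from its state updates.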

Single-Cycle SMIPS: atomic state updates

    if (memType(eInst.iType))
      eInst.data <- dMem(MemReq{op: eInst.iType == Ld ? Ld : St, addr: eInst.addr, data: eInst.data});
    if (regWriteType(eInst.iType))
      rf.wr(eInst.rDst, eInst.data);
    pc <= eInst.brTaken ? eInst.addr : pc + 4;
  endrule
endmodule
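The tail of the rule commits three kinds of state: data memory, the register file and the PC. The Python sketch below applies them in one function call, standing in for one firing of the rule. The `State` and `ExecInst` classes, and a word-addressed dictionary for data memory, are assumptions for illustration. Performing the updates in sequence is safe here because none of them reads a value another one writes, apart from the load result feeding the register write, which is a combinational path within the cycle.

```python
from dataclasses import dataclass, field

MASK32 = 0xFFFF_FFFF


@dataclass
class ExecInst:
    i_type: str
    r_dst: int
    data: int
    addr: int
    br_taken: bool


@dataclass
class State:
    pc: int = 0
    rf: list = field(default_factory=lambda: [0] * 32)
    dmem: dict = field(default_factory=dict)  # address -> data word


def commit(s: State, e: ExecInst) -> None:
    """Apply the state updates of one rule firing: data memory, rf, then pc."""
    data = e.data
    if e.i_type in ("Ld", "St"):              # memType
        if e.i_type == "Ld":
            data = s.dmem.get(e.addr, 0)      # the load result replaces eInst.data
        else:
            s.dmem[e.addr] = e.data
    if e.i_type in ("Alu", "Ld", "Jal", "Jalr"):  # regWriteType
        s.rf[e.r_dst] = data & MASK32
    # the pc update uses the old pc, as the BSV rule does
    s.pc = e.addr if e.br_taken else (s.pc + 4) & MASK32
```

Calling `commit` once per instruction models the single-cycle machine: every instruction's effects become visible together, before the next instruction starts.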

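Single-Cycle SMIPS: Clock Speed
[Figure: PC, +4, Inst Memory, Decode, Register File, Execute, Data Memory, all traversed within one clock cycle]
In the single-cycle design one clock period must cover an instruction-memory access, decode, register read, the ALU, a data-memory access and write-back, where t_M is the memory access time, t_DEC the decode delay, t_RF the register-file read time, t_ALU the ALU delay and t_WB the write-back time:

$$t_{Clock} > t_M + t_{DEC} + t_{RF} + t_{ALU} + t_M + t_{WB}$$

We can improve the clock speed if we execute each instruction in two clock cycles, with instruction fetch in the first:

$$t_{Clock} > \max\{t_M,\; t_{DEC} + t_{RF} + t_{ALU} + t_M + t_{WB}\}$$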
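Plugging numbers into the two bounds makes the trade-off concrete. The delays below are hypothetical values chosen only for illustration; the functions implement the two inequalities term by term.

```python
def single_cycle_period(t: dict) -> float:
    """Minimum clock period when fetch and execute share one cycle."""
    return t["M"] + t["DEC"] + t["RF"] + t["ALU"] + t["M"] + t["WB"]


def two_cycle_period(t: dict) -> float:
    """Minimum clock period when instruction fetch gets a cycle of its own."""
    return max(t["M"], t["DEC"] + t["RF"] + t["ALU"] + t["M"] + t["WB"])


# Hypothetical delays in ns, for illustration only.
delays = {"M": 2.0, "DEC": 0.5, "RF": 1.0, "ALU": 1.5, "WB": 0.5}

t1 = single_cycle_period(delays)   # 7.5 ns
t2 = two_cycle_period(delays)      # 5.5 ns
time_per_inst_two_cycle = 2 * t2   # 11.0 ns: two cycles per instruction
```

With these numbers the clock period drops from 7.5 ns to 5.5 ns, but each instruction now takes two cycles, or 11 ns. A faster clock alone does not make a non-pipelined two-cycle machine faster per instruction.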

Two-Cycle SMIPS
[Figure: the single-cycle datapath with an ir register between Inst Memory and Decode, plus a stage register]
Introduce a register "ir" to hold the fetched instruction and a register "stage" to remember which stage (fetch or execute) we are in.

ir: The instruction register
You may recall from our earlier discussion of pipelining that when we take multiple cycles to perform some operation (e.g., IFFT), intermediate registers may not contain meaningful data in some cycles.
It is straightforward to convert ir into a pipeline register:
- We can associate a Valid/Invalid bit with ir.
- Equivalently, we can think of ir as a single-element FIFO.
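A register paired with a valid bit behaves like a one-element FIFO: it can be written only when empty and read or cleared only when full. The Python class below is a hypothetical sketch of that discipline (method names follow common FIFO conventions: `enq`, `first`, `deq`). In the two-cycle design, fetch would enqueue `(pc, inst)` and execute would read and dequeue it, so the valid bit carries the same information as `stage == Execute`.

```python
class PipelineReg:
    """A pipeline register with a valid bit, i.e. a one-element FIFO.

    enq is allowed only when empty; first/deq only when the register holds
    valid data. Violations raise instead of silently using stale data.
    """

    def __init__(self):
        self.valid = False  # the Valid/Invalid bit
        self.value = None   # meaningful only while valid is True

    def not_empty(self) -> bool:
        return self.valid

    def enq(self, v):
        if self.valid:
            raise RuntimeError("enq on a full register")
        self.value, self.valid = v, True

    def first(self):
        if not self.valid:
            raise RuntimeError("first on an empty register")
        return self.value

    def deq(self):
        if not self.valid:
            raise RuntimeError("deq on an empty register")
        self.valid = False
```

The guards make the "no meaningful data in some cycles" problem explicit: a stage that tries to consume an invalid ir is caught instead of operating on garbage.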

Additional Types

typedef struct {
  Addr pc;
  Bit#(32) inst;
} TypeFetch2Decode deriving (Bits, Eq);

typedef enum {Fetch, Execute} TypeStage deriving (Bits, Eq);

Two-Cycle SMIPS

module mkProc(Proc);
  Reg#(Addr) pc <- mkRegU;
  RFile rf <- mkRFile;
  Memory mem <- mkTwoPortedMemory;
  let iMem = mem.iport;
  let dMem = mem.dport;
  Reg#(TypeFetch2Decode) ir <- mkRegU;
  Reg#(TypeStage) stage <- mkReg(Fetch);

  rule doFetch (stage == Fetch);
    let inst = iMem(MemReq{op: Ld, addr: pc, data: ?});
    ir <= TypeFetch2Decode{pc: pc, inst: inst};
    stage <= Execute;
  endrule

Two-Cycle SMIPS

  rule doExecute (stage == Execute);
    let irpc = ir.pc;
    let inst = ir.inst;
    let dInst = decode(inst);
    Data rVal1 = rf.rd1(dInst.rSrc1);
    Data rVal2 = rf.rd2(dInst.rSrc2);
    let eInst = exec(dInst, rVal1, rVal2, irpc);
    // no change from single-cycle
    if (memType(eInst.iType))
      eInst.data <- dMem(MemReq{op: eInst.iType == Ld ? Ld : St, addr: eInst.addr, data: eInst.data});
    if (regWriteType(eInst.iType))
      rf.wr(eInst.rDst, eInst.data);
    pc <= eInst.brTaken ? eInst.addr : pc + 4;
    stage <= Fetch;
  endrule
endmodule
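The two rules are guarded by complementary conditions on `stage`, so exactly one of them fires in each clock cycle. The cycle-level Python model below captures only that control structure; `TwoCycleProc`, its `execute_fn` hook (standing in for decode, exec and the state updates, and returning the next pc) and the dictionary instruction memory are hypothetical names for illustration.

```python
from enum import Enum


class Stage(Enum):
    FETCH = 0
    EXECUTE = 1


class TwoCycleProc:
    """Cycle-level model of the two-cycle control: one rule fires per cycle."""

    def __init__(self, imem: dict, execute_fn, pc: int = 0):
        self.imem = imem              # address -> instruction
        self.execute_fn = execute_fn  # (irpc, inst) -> next pc
        self.pc = pc
        self.ir = None
        self.stage = Stage.FETCH
        self.cycles = 0
        self.retired = 0

    def cycle(self) -> None:
        if self.stage is Stage.FETCH:      # rule doFetch (stage == Fetch)
            self.ir = (self.pc, self.imem[self.pc])
            self.stage = Stage.EXECUTE
        else:                              # rule doExecute (stage == Execute)
            irpc, inst = self.ir
            self.pc = self.execute_fn(irpc, inst)
            self.stage = Stage.FETCH
            self.retired += 1
        self.cycles += 1


# Straight-line code: every instruction falls through to pc + 4.
proc = TwoCycleProc({0: "i0", 4: "i1", 8: "i2"}, lambda pc, inst: pc + 4)
for _ in range(6):
    proc.cycle()
```

After six cycles three instructions have retired and `pc` is 12, confirming two cycles per instruction. Since fetch does not modify `pc`, using `pc + 4` in doExecute gives the same result as `irpc + 4`.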

Princeton versus Harvard Architecture
- Harvard architecture uses different memories for instructions and data
  - needed for a single-cycle implementation
- Princeton architecture uses the same memory for instructions and data and thus requires at least two cycles to execute Load/Store instructions
- The two-cycle implementations of Princeton and Harvard architectures are almost the same

SMIPS Princeton Architecture
[Figure: two-cycle datapath (PC, +4, ir, stage, Decode, Register File, Execute) with a single Memory shared by the Fetch and Execute stages]
Since both the Fetch and Execute stages want to use the memory, there is a structural hazard in accessing memory.
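The structural hazard can be made concrete with a memory model that permits at most one access per clock cycle. In the Python sketch below, `OnePortedMemory`, `access` and `tick` are hypothetical names; `tick` marks a clock edge. A fetch and a load in different cycles succeed, while two accesses in the same cycle raise an error. The `stage` register prevents this in the Princeton design, since doFetch and doExecute never fire in the same cycle.

```python
class OnePortedMemory:
    """Unified instruction/data memory with a single port:
    at most one access per clock cycle."""

    def __init__(self, contents=None):
        self.words = dict(contents or {})
        self.used_this_cycle = False

    def tick(self) -> None:
        """Clock edge: the port becomes free for the next cycle."""
        self.used_this_cycle = False

    def access(self, op: str, addr: int, data: int = 0):
        if self.used_this_cycle:
            raise RuntimeError("structural hazard: memory port already used this cycle")
        self.used_this_cycle = True
        if op == "Ld":
            return self.words.get(addr, 0)
        self.words[addr] = data  # op == "St"
        return None


# 0x8C020010 encodes lw $2, 16($0); word 0x10 holds the data it loads.
mem = OnePortedMemory({0x0: 0x8C020010, 0x10: 77})
inst = mem.access("Ld", 0x0)   # cycle 1: fetch
mem.tick()
value = mem.access("Ld", 0x10) # cycle 2: the load's data access
```

Sharing one port costs nothing extra in the two-cycle machine, because fetch and the data access already happen in different cycles. This is why the Princeton and Harvard two-cycle implementations are almost the same.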

Two-Cycle SMIPS Princeton

module mkProc(Proc);
  Reg#(Addr) pc <- mkRegU;
  RFile rf <- mkRFile;
  Memory mem <- mkOnePortedMemory;
  let uMem = mem.port;
  Reg#(TypeFetch2Decode) ir <- mkRegU;
  Reg#(TypeStage) stage <- mkReg(Fetch);

  rule doFetch (stage == Fetch);
    let inst <- uMem(MemReq{op: Ld, addr: pc, data: ?});
    ir <= TypeFetch2Decode{pc: pc, inst: inst};
    stage <= Execute;
  endrule

Two-Cycle SMIPS Princeton

  rule doExecute (stage == Execute);
    let irpc = ir.pc;
    let inst = ir.inst;
    let dInst = decode(inst);
    Data rVal1 = rf.rd1(dInst.rSrc1);
    Data rVal2 = rf.rd2(dInst.rSrc2);
    let eInst = exec(dInst, rVal1, rVal2, irpc);
    if (memType(eInst.iType))
      eInst.data <- uMem(MemReq{op: eInst.iType == Ld ? Ld : St, addr: eInst.addr, data: eInst.data});
    if (regWriteType(eInst.iType))
      rf.wr(eInst.rDst, eInst.data);
    pc <= eInst.brTaken ? eInst.addr : pc + 4;
    stage <= Fetch;
  endrule
endmodule

Two-Cycle SMIPS: Analysis
[Figure: the two-cycle datapath divided into a Fetch part (PC, +4, Inst Memory, ir) and an Execute part (Decode, Register File, Execute, Data Memory), with the stage register]
In any given clock cycle, lots of hardware is unused!
Next lecture: pipelining to increase throughput.