Computer Architecture A Constructive Approach Multicycle SMIPS Implementations

Computer Architecture: A Constructive Approach
Multi-cycle SMIPS Implementations
Joel Emer
Computer Science & Artificial Intelligence Lab.
Massachusetts Institute of Technology
March 5, 2012
http://csg.csail.mit.edu/6.S078

Harvard-Style Datapath for MIPS (old way)
[Figure: datapath with PC, +4 adder, instruction memory, GPRs (rs1, rs2, ws, wd, rd1, rd2), immediate extender, ALU with ALU control, and data memory. Control signals: PCSrc (pc+4 / br / rind / jabs), RegWrite, MemWrite, WBSrc, RegDst, ExtSel, OpSel, BSrc; the ALU reports a zero? flag (z).]

Hardwired Control Table (old way)
    Opcode     ExtSel   BSrc  OpSel  MemW  RegW  WBSrc  RegDst  PCSrc
    ALU        *        Reg   Func   no    yes   ALU    rd      pc+4
    ALUi       sExt16   Imm   Op     no    yes   ALU    rt      pc+4
    ALUiu      uExt16   Imm   Op     no    yes   ALU    rt      pc+4
    LW         sExt16   Imm   +      no    yes   Mem    rt      pc+4
    SW         sExt16   Imm   +      yes   no    *      *       pc+4
    BEQZ z=0   sExt16   *     0?     no    no    *      *       br
    BEQZ z=1   sExt16   *     0?     no    no    *      *       pc+4
    J          *        *     *      no    no    *      *       jabs
    JAL        *        *     *      no    yes   PC     R31     jabs
    JR         *        *     *      no    no    *      *       rind
    JALR       *        *     *      no    yes   PC     R31     rind
BSrc = Reg / Imm
WBSrc = ALU / Mem / PC
RegDst = rt / rd / R31
PCSrc = pc+4 / br / rind / jabs

Single-Cycle SMIPS (new way)
[Figure: PC, +4, Inst Memory, Decode, Register File, Execute, Data Memory]
  • Register file has 2 read ports and 1 write port
  • Separate instruction and data memories
Datapath and control were derived automatically from a high-level rule-based description

Single-Cycle SMIPS code structure
    module mkProc(Proc);
      Reg#(Addr) pc  <- mkRegU;
      RFile      rf  <- mkRFile;
      Memory     mem <- mkTwoPortedMemory;
      let iMem = mem.iport;
      let dMem = mem.dport;
      rule doProc;
        let inst  = iMem(MemReq{op: Ld, addr: pc, data: ?});
        let dInst = decode(inst);
        Data rVal1 = rf.rd1(dInst.rSrc1);
        Data rVal2 = rf.rd2(dInst.rSrc2);
        let eInst = exec(dInst, rVal1, rVal2, pc);
        // update rf, pc and dMem

Decoding Instructions: input-output types
[Figure: decode maps a 32-bit instruction (Bit#(32)) to a DecodedInst; mux control logic not shown]
Type DecodedInst:
    Field     Type      Instruction bits used
    iType     IType     31:26, 5:0
    aluFunc   AluFunc   31:26, 5:0
    brComp    BrType    31:26
    rDst      Rindex    20:16 or 15:11
    rSrc1     Rindex    25:21
    rSrc2     Rindex    20:16
    imm       Bit#(32)  15:0 or 25:0, extended (ext)
    immValid  Bool
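
The field list on this slide can be written down as a BSV struct. The following is a minimal sketch, not the course's actual source: the field names and field types come from the slide, and the IType members are the ones used in the code on the other slides, but the 5-bit width of Rindex and the members of AluFunc and BrType are assumptions chosen only for illustration.

```bsv
// Sketch only. Assumed (not from the slides): the Rindex width and the
// constructor names of AluFunc and BrType.
typedef Bit#(5) Rindex;   // assumed: 32 general-purpose registers

// Instruction classes as they appear in the code on the other slides
typedef enum {Alu, Ld, St, J, Jal, Jr, Jalr, Br} IType
   deriving (Bits, Eq);

// Hypothetical subset of ALU operations
typedef enum {AluAdd, AluSub, AluAnd, AluOr, AluXor, AluSlt} AluFunc
   deriving (Bits, Eq);

// Hypothetical branch comparisons
typedef enum {BrEq, BrNeq, BrLt, BrAlways, BrNever} BrType
   deriving (Bits, Eq);

// Output of decode: one field per row of the slide's field list
typedef struct {
   IType    iType;     // instruction class
   AluFunc  aluFunc;   // operation for the ALU
   BrType   brComp;    // comparison used to decide brTaken
   Rindex   rDst;      // destination register index
   Rindex   rSrc1;     // first source register index
   Rindex   rSrc2;     // second source register index
   Bit#(32) imm;       // extended immediate
   Bool     immValid;  // True when imm replaces rVal2 as ALU operand
} DecodedInst deriving (Bits, Eq);
```

Deriving Bits lets a DecodedInst be stored in a register or FIFO, which matters once the design is split across cycles.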

Reading Registers
[Figure: Read registers. RSrc1 and RSrc2 index the register file (RF), which returns RVal1 and RVal2]
Pure combinational logic

Executing Instructions
[Figure: execute takes dInst, rVal1, rVal2 and pc, and contains the ALU, ALUBr and Branch Address blocks]
Outputs of execute:
  • iType
  • rDst
  • data: either for rf write or St
  • addr: either for memory reference or branch target
  • brTaken
Pure combinational logic
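
The outputs named on this slide suggest the shape of the ExecInst type that the later code fills in. A minimal sketch, assuming 32-bit Addr and Data and a 5-bit Rindex (none of these widths are stated on the slides):

```bsv
// Sketch only: the widths below are assumptions.
typedef Bit#(32) Addr;
typedef Bit#(32) Data;
typedef Bit#(5)  Rindex;

typedef enum {Alu, Ld, St, J, Jal, Jr, Jalr, Br} IType
   deriving (Bits, Eq);

// Result of the execute stage, one field per output on the slide
typedef struct {
   IType  iType;    // instruction class, passed through from decode
   Rindex rDst;     // destination register index
   Data   data;     // value for the register-file write, or the store data
   Addr   addr;     // memory address for Ld/St, or the branch target
   Bool   brTaken;  // True when the next pc should be taken from addr
} ExecInst deriving (Bits, Eq);
```

Sharing data and addr between two uses each keeps the struct small, at the cost of the selection logic that the exec function has to perform.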

Branch Address Calculation
    function Addr brAddrCalc(Addr pc, Data val, IType iType, Data imm);
      let targetAddr = case (iType)
        J, Jal   : {pc[31:28], imm[27:0]};  // absolute jump within the current 256 MB region
        Jr, Jalr : val;                     // register-indirect jump
        default  : pc + imm;                // pc-relative branch
      endcase;
      return targetAddr;
    endfunction

Some Useful Functions
    function Bool memType(IType i);
      return (i==Ld || i==St);
    endfunction
    function Bool regWriteType(IType i);
      return (i==Alu || i==Ld || i==Jalr);
    endfunction
    function Bool controlType(IType i);
      return (i==J || i==Jr || i==Jalr || i==Br);
    endfunction

Execute Function
    function ExecInst exec(DecodedInst dInst, Data rVal1, Data rVal2, Addr pc);
      ExecInst einst = ?;
      // second ALU operand: the immediate if the instruction has one, else rVal2
      Data aluVal2 = (dInst.immValid) ? dInst.imm : rVal2;
      let aluRes = alu(rVal1, aluVal2, dInst.aluFunc);
      let brAddr = brAddrCalc(pc, rVal1, dInst.iType, dInst.imm);
      einst.iType   = dInst.iType;
      einst.addr    = memType(dInst.iType) ? aluRes : brAddr;
      einst.data    = dInst.iType==St ? rVal2 : aluRes;
      einst.brTaken = aluBr(rVal1, aluVal2, dInst.brComp);
      einst.rDst    = dInst.rDst;
      return einst;
    endfunction
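
The exec function calls aluBr, whose body is not shown on the slides. A minimal sketch of what such a function could look like, assuming a hypothetical BrType with the constructors below, and assuming that decode sets brComp to an always-taken value for jumps and a never-taken value for non-control instructions:

```bsv
// Sketch only: BrType and its constructor names are hypothetical.
typedef Bit#(32) Data;

typedef enum {BrEq, BrNeq, BrLt, BrAlways, BrNever} BrType
   deriving (Bits, Eq);

// Decide whether the next pc comes from the branch/jump target.
function Bool aluBr(Data a, Data b, BrType brComp);
   Bool brTaken = case (brComp)
      BrEq     : (a == b);
      BrNeq    : (a != b);
      BrLt     : signedLT(a, b);  // Prelude signed comparison on Bit#(n)
      BrAlways : True;            // unconditional jumps
      BrNever  : False;           // non-control instructions
      default  : False;           // unused encodings of the 3-bit enum
   endcase;
   return brTaken;
endfunction
```

Like the rest of execute, this is a pure function, so it synthesizes to combinational logic.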

Single-Cycle SMIPS atomic state updates
        if (memType(eInst.iType))
          eInst.data <- dMem(MemReq{op: eInst.iType==Ld ? Ld : St,
                                    addr: eInst.addr, data: eInst.data});
        if (regWriteType(eInst.iType))
          rf.wr(eInst.rDst, eInst.data);
        pc <= eInst.brTaken ? eInst.addr : pc + 4;
      endrule
    endmodule

Single-Cycle SMIPS: Clock Speed
[Figure: PC, +4, Inst Memory, Decode, Register File, Execute, Data Memory]
    tClock > tM + tDEC + tRF + tALU + tM + tWB
We can improve the clock speed if we execute each instruction in two clock cycles
    tClock > max{tM, (tDEC + tRF + tALU + tM + tWB)}
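
To make the two bounds concrete, here is a worked example with made-up delays. The numbers are assumptions chosen only for illustration and do not come from the lecture.

```latex
% Hypothetical delays (assumptions, not from the lecture):
% t_M = 2 ns, t_DEC = 1 ns, t_RF = 1 ns, t_ALU = 2 ns, t_WB = 1 ns
\begin{align*}
\text{single-cycle:}\quad t_{Clock} &> t_M + t_{DEC} + t_{RF} + t_{ALU} + t_M + t_{WB} \\
  &= 2 + 1 + 1 + 2 + 2 + 1 = 9\ \text{ns} \\
\text{two-cycle:}\quad t_{Clock} &> \max\{\, t_M,\ t_{DEC} + t_{RF} + t_{ALU} + t_M + t_{WB} \,\} \\
  &= \max\{\, 2,\ 7 \,\} = 7\ \text{ns}
\end{align*}
```

Under these assumed delays the minimum clock period drops from about 9 ns to about 7 ns. Each instruction now takes two cycles, though, so the time per instruction is about 14 ns rather than 9 ns: the two-cycle design buys a faster clock, not faster instructions.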

Two-Cycle SMIPS
[Figure: PC, +4, Inst Memory, ir, stage, Decode, Register File, Execute, Data Memory]
Introduce register "ir" to hold a fetched instruction and register "stage" to remember which stage (fetch/execute) we are in

ir: The instruction register
You may recall from our earlier discussion of pipelining that when we take multiple cycles to perform some operation (e.g., IFFT), intermediate registers may not contain any meaningful data in some cycles
It is straightforward to convert ir into a pipeline register
  • We can associate a (Valid/Invalid) bit with ir
  • Equivalently, we can think of ir as a single-element FIFO
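
The Valid/Invalid idea can be sketched with BSV's Maybe type. This is an illustrative skeleton under stated assumptions, not the lecture's code: the module name mkIrValidSketch is hypothetical, the pc is reset to 0 for simplicity, and the fetch and execute bodies are placeholders (a constant instead of a memory read, and a plain pc + 4 instead of decode and exec).

```bsv
// Sketch only: placeholder fetch/execute bodies, hypothetical module name.
typedef Bit#(32) Addr;

typedef struct {
   Addr     pc;
   Bit#(32) inst;
} TypeFetch2Decode deriving (Bits, Eq);

(* synthesize *)
module mkIrValidSketch(Empty);
   Reg#(Addr) pc <- mkReg(0);
   // ir carries its own valid bit, so no separate stage register is needed
   Reg#(Maybe#(TypeFetch2Decode)) ir <- mkReg(tagged Invalid);

   // Fetch may fire only while ir holds nothing meaningful
   rule doFetch (!isValid(ir));
      // placeholder: a real design would read the instruction memory at pc
      ir <= tagged Valid (TypeFetch2Decode{pc: pc, inst: 0});
   endrule

   // Execute may fire only while ir holds a fetched instruction
   rule doExecute (ir matches tagged Valid .f);
      // placeholder: a real design would decode and execute f.inst here
      pc <= f.pc + 4;
      ir <= tagged Invalid;
   endrule
endmodule
```

The two rule guards are mutually exclusive, so the rules alternate just as they would with an explicit stage register. Replacing ir with a single-element FIFO expresses the same handshake through enq and deq.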

Additional Types
    typedef struct {
      Addr pc;
      Bit#(32) inst;
    } TypeFetch2Decode deriving (Bits, Eq);
    typedef enum {Fetch, Execute} TypeStage deriving (Bits, Eq);

Two-Cycle SMIPS
    module mkProc(Proc);
      Reg#(Addr) pc  <- mkRegU;
      RFile      rf  <- mkRFile;
      Memory     mem <- mkTwoPortedMemory;
      let iMem = mem.iport;
      let dMem = mem.dport;
      Reg#(TypeFetch2Decode) ir    <- mkRegU;
      Reg#(TypeStage)        stage <- mkReg(Fetch);
      rule doFetch (stage==Fetch);
        let inst = iMem(MemReq{op: Ld, addr: pc, data: ?});
        ir <= TypeFetch2Decode{pc: pc, inst: inst};
        stage <= Execute;
      endrule

Two-Cycle SMIPS
      rule doExecute (stage==Execute);
        let irpc = ir.pc;
        let inst = ir.inst;
        let dInst = decode(inst);
        Data rVal1 = rf.rd1(dInst.rSrc1);
        Data rVal2 = rf.rd2(dInst.rSrc2);
        let eInst = exec(dInst, rVal1, rVal2, irpc);
        if (memType(eInst.iType))
          eInst.data <- dMem(MemReq{op: eInst.iType==Ld ? Ld : St,
                                    addr: eInst.addr, data: eInst.data});
        if (regWriteType(eInst.iType))
          rf.wr(eInst.rDst, eInst.data);
        pc <= eInst.brTaken ? eInst.addr : pc + 4;
        stage <= Fetch;
      endrule
    endmodule
Apart from reading pc and inst out of ir and updating stage, there is no change from single-cycle

Princeton versus Harvard Architecture
Harvard architecture uses different memories for instructions and data
  • needed for a single-cycle implementation
Princeton architecture uses the same memory for instructions and data, and thus requires at least two cycles to execute Load/Store instructions
The two-cycle implementations of Princeton and Harvard architectures are almost the same

SMIPS Princeton Architecture
[Figure: PC, +4, ir, stage, Decode, Register File, Execute, and a single Memory]
Since both the Fetch and Execute stages want to use the memory, there is a structural hazard in accessing memory

Two-Cycle SMIPS Princeton
    module mkProc(Proc);
      Reg#(Addr) pc  <- mkRegU;
      RFile      rf  <- mkRFile;
      Memory     mem <- mkOnePortedMemory;
      let uMem = mem.port;
      Reg#(TypeFetch2Decode) ir    <- mkRegU;
      Reg#(TypeStage)        stage <- mkReg(Fetch);
      rule doFetch (stage==Fetch);
        let inst <- uMem(MemReq{op: Ld, addr: pc, data: ?});
        ir <= TypeFetch2Decode{pc: pc, inst: inst};
        stage <= Execute;
      endrule

Two-Cycle SMIPS Princeton
      rule doExecute (stage==Execute);
        let irpc = ir.pc;
        let inst = ir.inst;
        let dInst = decode(inst);
        Data rVal1 = rf.rd1(dInst.rSrc1);
        Data rVal2 = rf.rd2(dInst.rSrc2);
        let eInst = exec(dInst, rVal1, rVal2, irpc);
        if (memType(eInst.iType))
          eInst.data <- uMem(MemReq{op: eInst.iType==Ld ? Ld : St,
                                    addr: eInst.addr, data: eInst.data});
        if (regWriteType(eInst.iType))
          rf.wr(eInst.rDst, eInst.data);
        pc <= eInst.brTaken ? eInst.addr : pc + 4;
        stage <= Fetch;
      endrule
    endmodule

Two-Cycle SMIPS: Analysis
[Figure: the two-cycle datapath (PC, +4, Inst Memory, ir, stage, Decode, Register File, Execute, Data Memory) with the Fetch and Execute portions marked]
In any given clock cycle, lots of unused hardware!
Next lecture: Pipelining to increase throughput