Non-Pipelined Processors

Arvind
Computer Science & Artificial Intelligence Lab.
Massachusetts Institute of Technology
February 29, 2016
http://csg.csail.mit.edu/6.375

Single-Cycle RISC Processor

As an illustrative example, we will use a subset of the RISC-V 32-bit ISA.

[Figure: single-cycle datapath with PC (+4), Inst Memory, Decode, Register File, Execute and Data Memory]

- The register file has 2 read ports and 1 write port.
- Instruction and data memories are separate.
- Datapath and control are derived automatically from a high-level rule-based description.

Single-Cycle Implementation: code structure

    module mkProc(Proc);                 // to be explained later
      // instantiate the state
      Reg#(Addr) pc   <- mkRegU;
      RFile      rf   <- mkRFile;
      IMemory    iMem <- mkIMemory;
      DMemory    dMem <- mkDMemory;

      rule doProc;
        let inst  = iMem.req(pc);
        let dInst = decode(inst);        // extracts fields needed for execution
        let rVal1 = rf.rd1(dInst.rSrc1);
        let rVal2 = rf.rd2(dInst.rSrc2);
        // produces values needed to update the processor state
        let eInst = exec(dInst, rVal1, rVal2, pc);
        // update rf, pc and dMem

RISC-V Register States

- 32 general purpose registers (GPR)
  - x0, x1, …, x31
  - 32-bit wide integer registers
  - x0 is hard-wired to zero
- Program counter (PC)
  - 32-bit wide
- CSR (Control and Status Registers), which will be implemented in labs as needed
  - cycle
  - instret
  - mhartid
  - mtohost
  - ...

Computational Instructions

Register-Register instructions (R-type)

    7        5     5     3        5    7        (bits)
    funct7   rs2   rs1   funct3   rd   opcode

- opcode = OP: rd <- rs1 (funct3, funct7) rs2
- funct3 = SLT/SLTU/AND/OR/XOR/SLL
- funct3 = ADD
  - funct7 = 0000000: rs1 + rs2
  - funct7 = 0100000: rs1 - rs2
- funct3 = SRL
  - funct7 = 0000000: logical shift right
  - funct7 = 0100000: arithmetic shift right

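The way funct7 selects between ADD/SUB and between the two right shifts can be checked with a small software model. The Python sketch below is not part of the lecture's Bluespec code: the function names are hypothetical, and the funct3 value 101 for SRL/SRA is taken from the RISC-V specification rather than from this slide.

```python
MASK32 = 0xFFFFFFFF  # keep results 32 bits wide, like Bit#(32)

def to_signed(x):
    """Reinterpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def alu_add_sub_shift(funct3, funct7, a, b):
    """Model ADD/SUB (funct3 = 000) and SRL/SRA (funct3 = 101)."""
    alternate = funct7 == 0b0100000  # funct7 picks the second operation
    if funct3 == 0b000:
        return (a - b if alternate else a + b) & MASK32
    if funct3 == 0b101:
        shamt = b & 0x1F  # only the low 5 bits of rs2 give the shift amount
        shifted = to_signed(a) >> shamt if alternate else (a & MASK32) >> shamt
        return shifted & MASK32
    raise ValueError("funct3 not covered by this sketch")
```

For example, shifting 0x80000000 right by 4 gives 0x08000000 with funct7 = 0000000 (logical) and 0xF8000000 with funct7 = 0100000 (arithmetic).
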
Computational Instructions cont.

Register-immediate instructions (I-type)

    12          5     3        5    7        (bits)
    imm[11:0]   rs1   funct3   rd   opcode

- opcode = OP-IMM: rd <- rs1 (funct3) I-imm
- I-imm = signExtend(inst[31:20])
- funct3 = ADDI/SLTIU/ANDI/ORI/XORI

A slight variant in coding for shift instructions SLLI / SRAI:
- rd <- rs1 (funct3, inst[30]) I-imm[4:0]

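The I-immediate and the shift variant can be illustrated with a short Python sketch. It is a software model under stated assumptions, not the lecture's Bluespec code: the helper names sign_extend, i_imm and shift_fields are hypothetical.

```python
def sign_extend(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    sign = 1 << (bits - 1)
    return ((value & ((1 << bits) - 1)) ^ sign) - sign

def i_imm(inst):
    """I-imm = signExtend(inst[31:20])."""
    return sign_extend(inst >> 20, 12)

def shift_fields(inst):
    """For immediate shifts: return (I-imm[4:0], inst[30])."""
    shamt = (inst >> 20) & 0x1F
    arithmetic = (inst >> 30) & 0x1
    return shamt, arithmetic
```

For instance, 0xFFF00093 encodes addi x1, x0, -1, so i_imm returns -1; 0x40315093 encodes srai x1, x2, 3, so shift_fields returns (3, 1).
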
Control Instructions

Unconditional jump and link (UJ-type)

    1         10          1         8            5    7        (bits)
    imm[20]   imm[10:1]   imm[11]   imm[19:12]   rd   opcode

- opcode = JAL: rd <- pc + 4; pc <- pc + J-imm
- J-imm = signExtend({inst[31], inst[19:12], inst[20], inst[30:21], 1'b0})
- Jump range: ±1 MB

Unconditional jump via register and link (I-type)

    12          5     3        5    7        (bits)
    imm[11:0]   rs1   funct3   rd   opcode

- opcode = JALR: rd <- pc + 4; pc <- (rs1 + I-imm) & ~0x01
- I-imm = signExtend(inst[31:20])

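Because the J-immediate bits are scattered across the instruction word, it helps to reassemble them in software. The Python sketch below models the two formulas on this slide; the function names are hypothetical and the code is an illustration, not the lecture's implementation.

```python
MASK32 = 0xFFFFFFFF

def sign_extend(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    sign = 1 << (bits - 1)
    return ((value & ((1 << bits) - 1)) ^ sign) - sign

def j_imm(inst):
    """J-imm = signExtend({inst[31], inst[19:12], inst[20], inst[30:21], 1'b0})."""
    imm = (((inst >> 31) & 0x1) << 20) \
        | (((inst >> 12) & 0xFF) << 12) \
        | (((inst >> 20) & 0x1) << 11) \
        | (((inst >> 21) & 0x3FF) << 1)   # bit 0 stays 0
    return sign_extend(imm, 21)

def jalr_target(rs1_val, imm):
    """pc <- (rs1 + I-imm) & ~0x01, kept to 32 bits."""
    return (rs1_val + imm) & MASK32 & ~0x1
```

The 21-bit signed immediate spans -2^20 to 2^20 - 2 bytes, which is the ±1 MB range quoted above. As a check, 0xFFDFF06F (jal x0, -4) yields j_imm = -4.
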
Control Instructions cont.

    1         6           5     5     3        4          1         7        (bits)
    imm[12]   imm[10:5]   rs2   rs1   funct3   imm[4:1]   imm[11]   opcode

The trailing 1'b0 in the immediate means the target is half-word aligned. This is because RISC-V allows a 16-bit compressed instruction format.

Load & Store Instructions

Load (I-type)

    12          5     3        5    7        (bits)
    imm[11:0]   rs1   funct3   rd   opcode

- opcode = LOAD: rd <- mem[rs1 + I-imm]
- I-imm = signExtend(inst[31:20])
- funct3 = LW/LB/LBU/LH/LHU

Store (S-type)

    7           5     5     3        5          7        (bits)
    imm[11:5]   rs2   rs1   funct3   imm[4:0]   opcode

- opcode = STORE: mem[rs1 + S-imm] <- rs2
- S-imm = signExtend({inst[31:25], inst[11:7]})
- funct3 = SW/SB/SH

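The S-immediate is split into two fields, so a small Python model of the effective-address computation may make the formulas concrete. The function names are hypothetical; this is a sketch of the arithmetic only and does not model memory.

```python
MASK32 = 0xFFFFFFFF

def sign_extend(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    sign = 1 << (bits - 1)
    return ((value & ((1 << bits) - 1)) ^ sign) - sign

def i_imm(inst):
    """I-imm = signExtend(inst[31:20]), used by loads."""
    return sign_extend(inst >> 20, 12)

def s_imm(inst):
    """S-imm = signExtend({inst[31:25], inst[11:7]}), used by stores."""
    return sign_extend((((inst >> 25) & 0x7F) << 5) | ((inst >> 7) & 0x1F), 12)

def effective_address(rs1_val, imm):
    """rs1 + imm, wrapped to 32 bits."""
    return (rs1_val + imm) & MASK32
```

For example, 0xFE20AE23 encodes sw x2, -4(x1); with x1 = 0x1000 the store address is effective_address(0x1000, s_imm(0xFE20AE23)) = 0xFFC.
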
Decoding Instructions: extract fields needed for execution

decode maps an instruction (Bit#(32)) to a DecodedInst. It is pure combinational logic, derived automatically from the high-level description.

    Instruction bits    Field     Type
    6:0                 iType     IType
    6:0, 14:12, 30      aluFunc   AluFunc
    6:0, 14:12          brComp    BrFunc
    11:7                rDst      Maybe#(RIndx)
    19:15               rSrc1     Maybe#(RIndx)
    24:20               rSrc2     Maybe#(RIndx)
    31:7 (via ext)      imm       Maybe#(Bit#(32))

Decoded Instruction Type

    typedef struct {
      IType         iType;
      AluFunc       aluFunc;
      BrFunc        brFunc;
      Maybe#(RIndx) dst;     // destination register 0 behaves like an Invalid destination
      Maybe#(RIndx) src1;
      Maybe#(RIndx) src2;
      Maybe#(Data)  imm;
    } DecodedInst deriving(Bits, Eq);

    // instruction groups with similar execution paths
    typedef enum {Unsupported, Alu, Ld, St, J, Jr, Br, Auipc} IType deriving(Bits, Eq);
    typedef enum {Add, Sub, And, Or, Xor, Sltu, Sll, Sra, Srl} AluFunc deriving(Bits, Eq);
    typedef enum {Eq, Neq, Ltu, Geu, AT, NT} BrFunc deriving(Bits, Eq);

Internal names for various opcode and funct3 patterns

Values are specified in the RISC-V ISA.

    // opcode
    Bit#(7) opOpImm  = 7'b0010011; // OP-IMM
    Bit#(7) opOp     = 7'b0110011; // OP
    Bit#(7) opLui    = 7'b0110111; // LUI
    Bit#(7) opAuipc  = 7'b0010111; // AUIPC
    ...
    Bit#(7) opJal    = 7'b1101111; // JAL
    Bit#(7) opJalr   = 7'b1100111; // JALR
    Bit#(7) opBranch = 7'b1100011; // BRANCH
    Bit#(7) opLoad   = 7'b0000011; // LOAD
    Bit#(7) opStore  = 7'b0100011; // STORE

    // funct3
    Bit#(3) fnADD = 3'b000; // ADD
    Bit#(3) fnSLL = 3'b001; // SLL
    Bit#(3) fnSLT = 3'b010; // SLT
    ……

Decode Function

    function DecodedInst decode(Bit#(32) inst);
      DecodedInst dInst = ?;          // initially undefined
      let opcode = inst[ 6 : 0 ];
      let rd     = inst[ 11 : 7 ];
      let funct3 = inst[ 14 : 12 ];
      let rs1    = inst[ 19 : 15 ];
      let rs2    = inst[ 24 : 20 ];
      let aluSel = inst[ 30 ];        // Add/Sub, Srl/Sra
      // I/S/B/U/J-imm
      Bit#(32) immI = …; Bit#(32) immS = …; Bit#(32) immB = …;
      Bit#(32) immU = …; Bit#(32) immJ = …;
      case (opcode)
        opOp ...
      endcase
      return dInst;
    endfunction

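The field slicing at the top of decode can be mirrored in a few lines of Python to check bit positions against a known encoding. This is a sketch only: bits and decode_fields are hypothetical names, and the immediates elided on the slide are left out here as well.

```python
def bits(word, hi, lo):
    """Return word[hi:lo] (inclusive), like the BSV slice inst[hi:lo]."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def decode_fields(inst):
    """Extract the fixed-position fields that decode reads first."""
    return {
        "opcode": bits(inst, 6, 0),
        "rd":     bits(inst, 11, 7),
        "funct3": bits(inst, 14, 12),
        "rs1":    bits(inst, 19, 15),
        "rs2":    bits(inst, 24, 20),
        "aluSel": bits(inst, 30, 30),  # distinguishes Add/Sub and Srl/Sra
    }
```

With the word 0x402081B3 (sub x3, x1, x2), decode_fields reports opcode 0110011, rd 3, rs1 1, rs2 2 and aluSel 1.
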
Decoding Instructions: Computational Instructions

    opOp: begin
      dInst.iType = Alu;
      dInst.aluFunc = case (funct3)
        fnAND:  And;
        fnSLTU: Sltu;
        …
        fnADD:  aluSel == 0 ? Add : Sub;
        fnSR:   aluSel == 0 ? Srl : Sra;
      endcase;
      dInst.brFunc = NT;
      dInst.dst  = Valid rd;
      dInst.src1 = Valid rs1;
      dInst.src2 = Valid rs2;
      dInst.imm  = Invalid;
    end

Decoding instructions with an immediate operand (i.e. opcode = OP-IMM) is similar: fnADD always decodes to Add, src2 is Invalid, and imm is Valid immI.

Decoding Instructions: Conditional Branch

    opBranch: begin
      Maybe#(BrFunc) brF = case (funct3)
        fnBEQ:  Valid Eq;
        …
        fnBGEU: Valid Geu;
        default: Invalid;
      endcase;
      dInst.iType = isValid(brF) ? Br : Unsupported;
      dInst.aluFunc = ?;
      dInst.brFunc = fromMaybe(?, brF);
      dInst.dst  = Invalid;
      dInst.src1 = Valid rs1;
      dInst.src2 = Valid rs2;
      dInst.imm  = Valid immB;
    end

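The decoded brFunc is later consumed by a branch comparator. The lecture does not show that comparator, so the Python sketch below is an assumption about its behavior, using the BrFunc names from the deck as plain strings (AT for always taken, NT for never taken).

```python
MASK32 = 0xFFFFFFFF

def alu_br(a, b, br_func):
    """Hypothetical model of the branch comparator on 32-bit unsigned values."""
    a &= MASK32
    b &= MASK32
    outcomes = {
        "Eq":  a == b,
        "Neq": a != b,
        "Ltu": a < b,    # unsigned less-than
        "Geu": a >= b,   # unsigned greater-or-equal
        "AT":  True,     # always taken (jumps)
        "NT":  False,    # never taken (non-control instructions)
    }
    return outcomes[br_func]
```

Note that 0xFFFFFFFF compares as the largest value under Ltu and Geu because the comparison is unsigned.
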
Decoding Instructions: Load & Store

    opLoad: begin // only support LW
      dInst.iType = funct3 == fnLW ? Ld : Unsupported;
      dInst.aluFunc = Add; // calc effective addr
      dInst.brFunc = NT;
      dInst.dst  = Valid rd;
      dInst.src1 = Valid rs1;
      dInst.src2 = Invalid;
      dInst.imm  = Valid immI;
    end

    opStore: begin // only support SW
      dInst.iType = funct3 == fnSW ? St : Unsupported;
      dInst.aluFunc = Add; // calc effective addr
      dInst.brFunc = NT;
      dInst.dst  = Invalid;
      dInst.src1 = Valid rs1;
      dInst.src2 = Valid rs2;
      dInst.imm  = Valid immS;
    end

Reading Registers and Executing Instructions

[Figure: dInst supplies src1 and src2 to the register file (RF); rVal1 and rVal2 feed the ALU and ALUBr blocks, and pc feeds Branch Address. The execute block is pure combinational logic.]

Outputs of execute:
- iType
- dst
- data: either for rf write or St
- addr: either for memory reference or branch target
- brTaken
- missPredict

Output type of exec function

    typedef struct {
      IType         iType;
      Maybe#(RIndx) dst;
      Data          data;
      Addr          addr;
      Bool          mispredict;
      Bool          brTaken;
    } ExecInst deriving(Bits, Eq);

Execute Function

    function ExecInst exec(DecodedInst dInst, Data rVal1, Data rVal2, Addr pc);
      ExecInst eInst = ?;
      Data aluVal2 = fromMaybe(rVal2, dInst.imm);
      let aluRes = alu(rVal1, aluVal2, dInst.aluFunc);
      eInst.iType = dInst.iType;
      eInst.data = case (dInst.iType)
                     St:      rVal2;
                     J, Jr:   (pc+4);   // needed to load PC into a register
                     Auipc:   (pc+fromMaybe(?, dInst.imm));
                     default: aluRes;
                   endcase;
      let brTaken = aluBr(rVal1, rVal2, dInst.brFunc);
      let brAddr  = brAddrCalc(pc, rVal1, dInst.iType,
                               fromMaybe(?, dInst.imm), brTaken);
      eInst.brTaken = brTaken;
      eInst.addr = (dInst.iType==Ld || dInst.iType==St) ? aluRes : brAddr;
      eInst.dst = dInst.dst;
      return eInst;
    endfunction

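The two multiplexers inside exec (one for data, one for addr) can be summarized in a short Python model. It is a sketch under stated assumptions: exec_outputs is a hypothetical name, iType values are plain strings, and aluRes and brAddr are passed in rather than computed.

```python
MASK32 = 0xFFFFFFFF

def exec_outputs(i_type, alu_res, r_val2, pc, imm, br_addr):
    """Return (data, addr) chosen the way the exec function chooses them."""
    if i_type == "St":
        data = r_val2              # value to be stored
    elif i_type in ("J", "Jr"):
        data = pc + 4              # link value written to rd
    elif i_type == "Auipc":
        data = pc + imm
    else:
        data = alu_res             # ordinary ALU result (also the Ld default)
    # Loads and stores use the ALU result as the memory address;
    # everything else carries the next-pc candidate.
    addr = alu_res if i_type in ("Ld", "St") else br_addr
    return data & MASK32, addr & MASK32
```

For a store, data is the register value rVal2 while addr is the ALU result; for a jump, data is pc + 4 and addr is the branch target.
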
Single-Cycle SMIPS: atomic state updates

        // state updates
        if (eInst.iType == Ld)
          eInst.data <- dMem.req(MemReq{op: Ld, addr: eInst.addr, data: ?});
        else if (eInst.iType == St)
          let dummy <- dMem.req(MemReq{op: St, addr: eInst.addr, data: eInst.data});
        if (isValid(eInst.dst))
          rf.wr(fromMaybe(?, eInst.dst), eInst.data);
        pc <= eInst.brTaken ? eInst.addr : pc + 4;
      endrule
    endmodule

The whole processor is described using one rule, which contains many large combinational functions.

Harvard-Style Datapath for MIPS (old way)

[Figure: hardwired-control MIPS datapath with PC, Inst. Memory, GPRs, Imm Ext, ALU, ALU Control and Data Memory. Control signals: PCSrc (br / rind / jabs / pc+4), RegWrite, MemWrite, WBSrc, RegDst, ExtSel, OpSel, BSrc, and the ALU zero? output.]

Hardwired Control Table (old way)

[Table: one row per opcode class (ALU, ALUiu, LW, SW, BEQZ z=0, BEQZ z=1, J, JAL, JR, JALR) giving the control signals ExtSel, BSrc, OpSel, MemW, RegW, WBSrc, RegDst and PCSrc.]

    BSrc   = Reg / Imm
    WBSrc  = ALU / Mem / PC
    RegDst = rt / rd / R31
    PCSrc  = pc+4 / br / rind / jabs

Single-Cycle RISC-V: Clock Speed

[Figure: PC (+4), Inst Memory, Decode, Register File, Execute, Data Memory]

    t_Clock > t_M + t_DEC + t_RF + t_ALU + t_M + t_WB

We can improve the clock speed if we execute each instruction in two clock cycles:

    t_Clock > max {t_M, (t_DEC + t_RF + t_ALU + t_M + t_WB)}

However, this may not improve the performance because each instruction will now take two cycles to execute.

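A small numeric example shows why the two-cycle design can be slower overall. The delay values below are invented for illustration (in units of 100 ps) and do not come from the lecture; the function names are hypothetical.

```python
def single_cycle_period(t_m, t_dec, t_rf, t_alu, t_wb):
    """Lower bound on the clock period when everything happens in one cycle."""
    return t_m + t_dec + t_rf + t_alu + t_m + t_wb

def two_cycle_period(t_m, t_dec, t_rf, t_alu, t_wb):
    """Lower bound on the clock period when fetch gets its own cycle."""
    return max(t_m, t_dec + t_rf + t_alu + t_m + t_wb)

# Assumed delays, in units of 100 ps (illustrative only)
delays = dict(t_m=20, t_dec=5, t_rf=5, t_alu=10, t_wb=5)

one_cycle_time = single_cycle_period(**delays)   # time per instruction, 1 cycle
two_cycle_time = 2 * two_cycle_period(**delays)  # time per instruction, 2 cycles
```

With these assumed numbers the clock period drops from 65 to 45 units, but each instruction now needs two cycles, so the time per instruction rises from 65 to 90 units.
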
Structural Hazards

Sometimes multicycle implementations are necessary because of resource conflicts, also known as structural hazards:
- Princeton-style architectures use the same memory for instructions and data and consequently require at least two cycles to execute Load/Store instructions.
- If the register file supported fewer than two reads and one write concurrently, then most instructions would take more than one cycle to execute.

Usually extra registers are required to hold values between cycles.

Extra Slides on Decoding

Instruction Formats

R-type instruction

    7        5     5     3        5    7        (bits)
    funct7   rs2   rs1   funct3   rd   opcode

I-type instruction & I-immediate (32 bits)

    12          5     3        5    7        (bits)
    imm[11:0]   rs1   funct3   rd   opcode

    I-imm = signExtend(inst[31:20])

S-type instruction & S-immediate (32 bits)

    7           5     5     3        5          7        (bits)
    imm[11:5]   rs2   rs1   funct3   imm[4:0]   opcode

    S-imm = signExtend({inst[31:25], inst[11:7]})

Instruction Formats cont.

SB-type instruction & B-immediate (32 bits)

    1         6           5     5     3        4          1         7        (bits)
    imm[12]   imm[10:5]   rs2   rs1   funct3   imm[4:1]   imm[11]   opcode

    B-imm = signExtend({inst[31], inst[7], inst[30:25], inst[11:8], 1'b0})

U-type instruction & U-immediate (32 bits)

    20           5    7        (bits)
    imm[31:12]   rd   opcode

    U-imm = signExtend({inst[31:12], 12'b0})

UJ-type instruction & J-immediate (32 bits)

    1         10          1         8            5    7        (bits)
    imm[20]   imm[10:1]   imm[11]   imm[19:12]   rd   opcode

    J-imm = signExtend({inst[31], inst[19:12], inst[20], inst[30:21], 1'b0})

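The B- and U-immediates can be rebuilt in software to confirm the bit ordering in the formulas above. The following Python functions are a sketch with hypothetical names, checked against two well-known encodings; they are not the lecture's Bluespec code.

```python
def sign_extend(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    sign = 1 << (bits - 1)
    return ((value & ((1 << bits) - 1)) ^ sign) - sign

def b_imm(inst):
    """B-imm = signExtend({inst[31], inst[7], inst[30:25], inst[11:8], 1'b0})."""
    imm = (((inst >> 31) & 0x1) << 12) \
        | (((inst >> 7) & 0x1) << 11) \
        | (((inst >> 25) & 0x3F) << 5) \
        | (((inst >> 8) & 0xF) << 1)     # bit 0 stays 0
    return sign_extend(imm, 13)

def u_imm(inst):
    """U-imm = {inst[31:12], 12'b0}, returned as a 32-bit pattern."""
    return inst & 0xFFFFF000
```

As a check, 0xFE209EE3 (bne x1, x2, -4) gives b_imm = -4, and 0x123450B7 (lui x1, 0x12345) gives u_imm = 0x12345000.
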
Computational Instructions cont.

Register-immediate instructions (U-type)

    20           5    7        (bits)
    imm[31:12]   rd   opcode

- opcode = LUI: rd <- U-imm
- opcode = AUIPC: rd <- pc + U-imm
- U-imm = {inst[31:12], 12'b0}

Decoding Instructions

    case (opcode)
      opOpImm:  …
      opOp:     …
      opLui:    …
      opAuipc:  …
      opJal:    …
      opJalr:   …
      opBranch: …
      opLoad:   …
      opStore:  …
      default:  … // Unsupported
    endcase;

    // opcode
    Bit#(7) opOpImm  = 7'b0010011;
    Bit#(7) opOp     = 7'b0110011;
    Bit#(7) opLui    = 7'b0110111;
    Bit#(7) opAuipc  = 7'b0010111;
    Bit#(7) opJal    = 7'b1101111;
    Bit#(7) opJalr   = 7'b1100111;
    Bit#(7) opBranch = 7'b1100011;
    Bit#(7) opLoad   = 7'b0000011;
    Bit#(7) opStore  = 7'b0100011;

Decoding Instructions: Computational Instructions - Imm

    opOpImm: begin
      dInst.iType = Alu;
      dInst.aluFunc = case (funct3)
        fnADD:  Add;
        fnSLTU: Sltu;
        …
        fnSLL:  Sll;
        fnSR:   aluSel == 0 ? Srl : Sra;
      endcase;
      dInst.brFunc = NT;
      dInst.dst  = Valid rd;
      dInst.src1 = Valid rs1;
      dInst.src2 = Invalid;
      dInst.imm  = Valid immI;
    end

Decoding Instructions: Computational Instructions cont.

    opLui: begin // rd = immU + r0
      dInst.iType = Alu;
      dInst.aluFunc = Add;
      dInst.brFunc = NT;
      dInst.dst  = tagged Valid rd;
      dInst.src1 = tagged Valid 0;
      dInst.src2 = tagged Invalid;
      dInst.imm  = tagged Valid immU;
    end

    opAuipc: begin
      dInst.iType = Auipc;
      dInst.aluFunc = ?;
      dInst.brFunc = NT;
      dInst.dst  = tagged Valid rd;
      dInst.src1 = tagged Invalid;
      dInst.src2 = tagged Invalid;
      dInst.imm  = tagged Valid immU;
    end

Decoding Instructions: Unconditional Jumps

    opJal: begin
      dInst.iType = J;
      dInst.aluFunc = ?;
      dInst.brFunc = AT;
      dInst.dst  = Valid rd;
      dInst.src1 = Invalid;
      dInst.src2 = Invalid;
      dInst.imm  = Valid immJ;
    end

    opJalr: begin
      dInst.iType = Jr;
      dInst.aluFunc = ?;
      dInst.brFunc = AT;
      dInst.dst  = Valid rd;
      dInst.src1 = Valid rs1;
      dInst.src2 = Invalid;
      dInst.imm  = Valid immI;
    end

Decoding Instructions: Unsupported

    default: begin
      dInst.iType = Unsupported;
      dInst.aluFunc = ?;
      dInst.brFunc = NT;
      dInst.dst  = Invalid;
      dInst.src1 = Invalid;
      dInst.src2 = Invalid;
      dInst.imm  = Invalid;
    end

Branch Address Calculation

    function Addr brAddrCalc(Addr pc, Data val, IType iType, Data imm, Bool taken);
      Addr pcPlus4 = pc + 4;
      Addr targetAddr = case (iType)
        J  : {pc + imm};
        Jr : {truncateLSB(val + imm), 1'b0};
        Br : (taken ? pc + imm : pcPlus4);
        default: pcPlus4;
      endcase;
      return targetAddr;
    endfunction

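For checking expected next-pc values by hand, the same selection can be written as a Python reference model. This is a sketch, not the lecture's code: br_addr_calc is a hypothetical name, iType is passed as a string, and imm is a signed Python integer. Clearing bit 0 for Jr corresponds to {truncateLSB(val + imm), 1'b0}.

```python
MASK32 = 0xFFFFFFFF

def br_addr_calc(pc, val, i_type, imm, taken):
    """Software model of the next-pc candidate computed by brAddrCalc."""
    pc_plus4 = (pc + 4) & MASK32
    if i_type == "J":
        return (pc + imm) & MASK32
    if i_type == "Jr":
        # keep the upper 31 bits of val + imm and force bit 0 to zero
        return (val + imm) & MASK32 & ~0x1
    if i_type == "Br":
        return (pc + imm) & MASK32 if taken else pc_plus4
    return pc_plus4
```

For example, a taken branch at pc = 0x100 with imm = -8 yields 0xF8, while the same branch not taken yields 0x104.
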