Designing a Single-Cycle Processor
Prof. Ting-Ting Hwang (黃婷婷), Department of Computer Science, National Tsing Hua University

Outline
• Introduction to designing a processor
• Analyzing the instruction set (step 1)
• Building the datapath (steps 2 and 3)
• A single-cycle implementation
• Control for the single-cycle CPU (steps 4 and 5)
    • Control of CPU operations
    • ALU controller
    • Main controller
• Adding jump instruction

Introduction
• CPU performance factors
    • Instruction count: determined by ISA and compiler
    • CPI and cycle time: determined by CPU hardware
• We will examine two MIPS implementations
    • A simplified version
    • A more realistic pipelined version
• Simple subset, shows most aspects
    • Memory reference: lw, sw
    • Arithmetic/logical: add, sub, and, or, slt
    • Control transfer: beq, j
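The three performance factors combine in the standard relation CPU time = instruction count x CPI x cycle time. The sketch below only illustrates that product; the function name and the numbers are hypothetical and do not come from the slides.

```python
def cpu_time_ns(instruction_count: int, cpi: float, cycle_time_ns: float) -> float:
    """Execution time in nanoseconds: instruction count x CPI x cycle time."""
    return instruction_count * cpi * cycle_time_ns


# Hypothetical workload: one million instructions on a single-cycle
# design (CPI = 1) with an assumed 8 ns clock period.
single_cycle_ns = cpu_time_ns(1_000_000, 1.0, 8.0)
```

A single-cycle design fixes CPI at 1, so its only hardware lever is the cycle time, which the later timing slides show is set by the slowest instruction.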

Instruction Execution
• PC → instruction memory, fetch instruction
• Register numbers → register file, read registers
• Depending on instruction class
    • Use ALU to calculate
        • Arithmetic result
        • Memory address for load/store
        • Branch target address
    • Access data memory for load/store
    • PC ← target address or PC + 4

[Figure: CPU Overview]

Multiplexers
• Can't just join wires together
    • Use multiplexers

[Figure: Control]

Logic Design Basics
• Information encoded in binary
    • Low voltage = 0, High voltage = 1
    • One wire per bit
    • Multi-bit data encoded on multi-wire buses
• Combinational element
    • Operate on data
    • Output is a function of input
• State (sequential) elements
    • Store information

Combinational Elements
• AND-gate: Y = A & B
• Adder: Y = A + B
• Multiplexer: Y = S ? I1 : I0
• Arithmetic/Logic Unit: Y = F(A, B)
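Because a combinational element's output depends only on its current inputs, each one can be modeled as a pure function. This is a minimal sketch assuming 32-bit buses; the function names and the string codes used to select the ALU function are illustrative, not the encodings used later in the slides.

```python
MASK32 = 0xFFFFFFFF  # keep every result within a 32-bit bus


def and_gate(a: int, b: int) -> int:
    """Y = A & B"""
    return a & b


def adder(a: int, b: int) -> int:
    """Y = A + B, wrapping around at 32 bits."""
    return (a + b) & MASK32


def mux(s: int, i0: int, i1: int) -> int:
    """Y = S ? I1 : I0"""
    return i1 if s else i0


def alu(f: str, a: int, b: int) -> int:
    """Y = F(A, B); F is chosen by a hypothetical string code."""
    ops = {"and": a & b, "or": a | b, "add": a + b, "sub": a - b}
    return ops[f] & MASK32
```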

Sequential Elements
• Register: stores data in a circuit
    • Uses a clock signal to determine when to update the stored value
    • Edge-triggered: update when Clk changes from 0 to 1
[Figure: register symbol with D, Clk, Q and its timing diagram]

Sequential Elements
• Register with write control
    • Only updates on clock edge when write control input is 1
    • Used when stored value is required later
[Figure: register symbol with D, Clk, Write, Q and its timing diagram]
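The edge-triggered behavior with write control can be sketched as a small class that remembers the previous clock level. The class and method names are hypothetical; the model latches D only on a 0-to-1 clock transition while the write input is 1.

```python
class Register:
    """Edge-triggered register with a write-control input (illustrative model)."""

    def __init__(self, value: int = 0) -> None:
        self.q = value      # stored value, visible on output Q
        self._clk = 0       # clock level seen on the previous call

    def tick(self, clk: int, d: int, write: int = 1) -> int:
        """Present a clock level and inputs; return the value on Q."""
        rising_edge = self._clk == 0 and clk == 1
        if rising_edge and write:
            self.q = d      # update only on a rising edge with write asserted
        self._clk = clk
        return self.q
```

Holding the clock high and changing D has no effect, which is the property that lets the rest of the datapath settle between edges.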

Clocking Methodology
• Combinational logic transforms data during clock cycles
    • Between clock edges
    • Input from state elements, output to state element
    • Longest delay determines clock period

How to Design a Processor?
1. Analyze instruction set (datapath requirements)
    • The meaning of each instruction is given by the register transfers
    • Datapath must include storage elements
    • Datapath must support each register transfer
2. Select set of datapath components and establish clocking methodology
3. Assemble datapath meeting the requirements
4. Analyze implementation of each instruction to determine setting of control points effecting register transfer
5. Assemble the control logic

Step 1: Analyze Instruction Set
• All MIPS instructions are 32 bits long, with 3 formats:

    R-type:  op (31:26, 6 bits) | rs (25:21, 5 bits) | rt (20:16, 5 bits) | rd (15:11, 5 bits) | shamt (10:6, 5 bits) | funct (5:0, 6 bits)
    I-type:  op (31:26, 6 bits) | rs (25:21, 5 bits) | rt (20:16, 5 bits) | immediate (15:0, 16 bits)
    J-type:  op (31:26, 6 bits) | target address (25:0, 26 bits)

• The different fields are:
    • op: operation of the instruction
    • rs, rt, rd: source and destination registers
    • shamt: shift amount
    • funct: selects variant of the operation in the "op" field
    • address / immediate: address offset or immediate value
    • target address: target address of jump
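The field boundaries above translate directly into shifts and masks. The following sketch extracts every field from a 32-bit instruction word regardless of format; the function name and dictionary keys are illustrative, and which fields are meaningful depends on the opcode.

```python
def decode(instr: int) -> dict:
    """Split a 32-bit MIPS instruction word into its fields."""
    return {
        "op":     (instr >> 26) & 0x3F,    # bits 31:26
        "rs":     (instr >> 21) & 0x1F,    # bits 25:21
        "rt":     (instr >> 16) & 0x1F,    # bits 20:16
        "rd":     (instr >> 11) & 0x1F,    # bits 15:11 (R-type)
        "shamt":  (instr >> 6) & 0x1F,     # bits 10:6  (R-type)
        "funct":  instr & 0x3F,            # bits 5:0   (R-type)
        "imm16":  instr & 0xFFFF,          # bits 15:0  (I-type)
        "target": instr & 0x03FFFFFF,      # bits 25:0  (J-type)
    }


# 0x02324020 encodes add with rd = 8, rs = 17, rt = 18.
fields = decode(0x02324020)
```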

Our Example: A MIPS Subset
• R-type (op | rs | rt | rd | shamt | funct):
    • add rd, rs, rt
    • sub rd, rs, rt
    • and rd, rs, rt
    • or rd, rs, rt
    • slt rd, rs, rt
• Load/Store (op | rs | rt | immediate):
    • lw rt, rs, imm16
    • sw rt, rs, imm16
• Imm operand (op | rs | rt | immediate):
    • addi rt, rs, imm16
• Branch (op | rs | rt | immediate):
    • beq rs, rt, imm16
• Jump (op | 26-bit address):
    • j target

Logical Register Transfers
• RTL gives the meaning of the instructions
• All start by fetching the instruction, read registers, then use ALU => simplicity and regularity help

    MEM[PC] = op | rs | rt | rd | shamt | funct
         or = op | rs | rt | Imm16
         or = op | Imm26 (added at the end)

    Inst    Register transfers
    ADD     R[rd] <- R[rs] + R[rt]; PC <- PC + 4
    SUB     R[rd] <- R[rs] - R[rt]; PC <- PC + 4
    LOAD    R[rt] <- MEM[R[rs] + sign_ext(Imm16)]; PC <- PC + 4
    STORE   MEM[R[rs] + sign_ext(Imm16)] <- R[rt]; PC <- PC + 4
    ADDI    R[rt] <- R[rs] + sign_ext(Imm16); PC <- PC + 4
    BEQ     if (R[rs] == R[rt]) then PC <- PC + 4 + (sign_ext(Imm16) || 00)
            else PC <- PC + 4
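The register transfers can be read as a tiny interpreter: each instruction updates R or MEM and produces the next PC. This is a sketch under simplifying assumptions (R is a 32-entry list, MEM is a dictionary keyed by byte address, and register 0 is not forced to zero); the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_ext16(imm16: int) -> int:
    """Interpret a 16-bit field as a signed integer."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def step(inst: str, rs: int, rt: int, rd: int, imm16: int,
         R: list, MEM: dict, pc: int) -> int:
    """Apply one register transfer and return the next PC."""
    if inst == "ADD":
        R[rd] = (R[rs] + R[rt]) & MASK32
    elif inst == "SUB":
        R[rd] = (R[rs] - R[rt]) & MASK32
    elif inst == "LOAD":
        R[rt] = MEM[(R[rs] + sign_ext16(imm16)) & MASK32]
    elif inst == "STORE":
        MEM[(R[rs] + sign_ext16(imm16)) & MASK32] = R[rt]
    elif inst == "ADDI":
        R[rt] = (R[rs] + sign_ext16(imm16)) & MASK32
    elif inst == "BEQ":
        if R[rs] == R[rt]:
            # sign_ext(Imm16) || 00 is the offset shifted left by two bits
            return (pc + 4 + (sign_ext16(imm16) << 2)) & MASK32
    else:
        raise ValueError(f"instruction not in the subset: {inst}")
    return (pc + 4) & MASK32
```

Every branch of the function ends by choosing between PC + 4 and a computed target, which is exactly the choice the datapath's PC multiplexer has to make.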

Requirements of Instruction Set
After checking the register transfers, we can see that the datapath needs the following:
• Memory
    • Store instructions and data
• Registers (32 x 32)
    • Read RS
    • Read RT
    • Write RT or RD
• PC
• Extender for zero- or sign-extension
• Add and sub register or extended immediate (ALU)
• Add 4 or extended immediate to PC

Step 2a: Combinational Components for Datapath
• Basic building blocks of combinational logic elements:
    • Adder: 32-bit inputs A and B plus CarryIn; 32-bit Sum and Carry outputs
    • MUX: 32-bit inputs A and B plus Select; 32-bit output Y
    • ALU: 32-bit inputs A and B plus 4-bit ALU control; 32-bit Result output

Step 2b: Sequential Components for Datapath
Storage elements:
• Register:
    • Similar to the D flip-flop except
        • N-bit input and output
        • Write Enable input
    • Write Enable:
        • negated (0): Data Out will not change
        • asserted (1): Data Out will become Data In
[Figure: register with N-bit Data In, N-bit Data Out, Write Enable, and Clk]

Storage Element: Register File (Appendix B.8)
• Consists of 32 registers:
    • Two 32-bit output busses: busA and busB
    • One 32-bit input bus: busW
• Register is selected by:
    • RA selects the register to put on busA (data)
    • RB selects the register to put on busB (data)
    • RW selects the register to be written via busW (data) when Write Enable is 1
• Clock input (CLK)
    • The CLK input is a factor ONLY during write operation
    • During read, behaves as a combinational circuit
[Figure: register file of 32 32-bit registers with 5-bit RW, RA, RB inputs, 32-bit busW input, 32-bit busA and busB outputs, Write Enable, and Clk]
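The split between combinational reads and clocked writes can be made explicit by giving the model two separate methods. This is a minimal sketch; the class and method names are hypothetical, and calling clock_edge stands in for the arrival of a clock edge.

```python
class RegisterFile:
    """32 x 32-bit register file: two read ports, one write port."""

    def __init__(self) -> None:
        self.regs = [0] * 32

    def read(self, ra: int, rb: int) -> tuple:
        """Combinational read: RA and RB select what appears on busA and busB."""
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw: int, bus_w: int, write_enable: int) -> None:
        """On a clock edge, write busW into register RW only if Write Enable is 1."""
        if write_enable:
            self.regs[rw] = bus_w & 0xFFFFFFFF
```

Reads never need the clock, so an instruction can read its operands, compute, and present the result on busW within the same cycle in which the write finally takes effect.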

Storage Element: Memory (Appendix B.8)
• Memory (idealized)
    • One input bus: Data In
    • One output bus: Data Out
• Word is selected by:
    • Address selects the word to put on Data Out
    • Write Enable = 1: address selects the memory word to be written via the Data In bus
• Clock input (CLK)
    • The CLK input is a factor ONLY during write operation
    • During read operation, behaves as a combinational logic block:
        • Address valid => Data Out valid after access time
        • No need for read control
[Figure: memory with Address, 32-bit Data In, 32-bit Data Out, Write Enable, and Clk]

Step 3a: Datapath Assembly
• Instruction fetch unit: common operations
    • Fetch the instruction: mem[PC]
    • Update the program counter:
        • Sequential code: PC <- PC + 4
        • Branch and Jump: PC <- "Something else"

Step 3b: Add and Subtract
• R[rd] <- R[rs] op R[rt]    Ex: add rd, rs, rt
    • Ra, Rb, Rw come from the instruction's rs, rt, and rd fields
    • ALU operation and RegWrite: control logic after decode

    op (31:26, 6 bits) | rs (25:21, 5 bits) | rt (20:16, 5 bits) | rd (15:11, 5 bits) | shamt (10:6, 5 bits) | funct (5:0, 6 bits)

[Figure: instruction fields rs, rt, rd drive Read register 1, Read register 2, and Write register of the register file; Read data 1 and Read data 2 feed the ALU; the 4-bit ALU operation is derived from funct; the ALU produces Zero and ALU result; RegWrite controls the register write]

Step 3c: Store/Load Operations
• R[rt] <- Mem[R[rs] + SignExt[imm16]]    Ex: lw rt, rs, imm16

    op (31:26, 6 bits) | rs (25:21, 5 bits) | rt (20:16, 5 bits) | immediate (15:0, 16 bits)

[Figure: load/store datapath]

[Figure: R-Type/Load/Store Datapath]

Step 3d: Branch Operations
• beq rs, rt, imm16

    mem[PC]                                    Fetch inst. from memory
    Equal <- (R[rs] == R[rt])                  Calculate branch condition
    if (Equal == 1)                            Calculate next inst. address
        PC <- PC + 4 + (SignExt(imm16) x 4)
    else
        PC <- PC + 4

    op (31:26, 6 bits) | rs (25:21, 5 bits) | rt (20:16, 5 bits) | immediate (15:0, 16 bits)
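A short worked example of the next-address calculation for beq: the branch is taken when the two register values are equal, and the 16-bit immediate counts instructions (words) relative to PC + 4. The function name is hypothetical.

```python
def next_pc_beq(pc: int, rs_val: int, rt_val: int, imm16: int) -> int:
    """Next PC for beq: PC + 4 + SignExt(imm16) x 4 if equal, else PC + 4."""
    equal = rs_val == rt_val
    # Sign-extend the 16-bit immediate to a signed word offset.
    offset_words = imm16 - 0x10000 if imm16 & 0x8000 else imm16
    if equal:
        return (pc + 4 + offset_words * 4) & 0xFFFFFFFF
    return (pc + 4) & 0xFFFFFFFF
```

With pc = 0x1000 and imm16 = 2, a taken branch lands at 0x100C: two instructions past the one that follows the branch. An immediate of 0xFFFF (that is, -1) branches back to the beq itself.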

Datapath for Branch Operations
• beq rs, rt, imm16
[Figure: datapath for branch operations]

[Figure: A Single Cycle Datapath]

Data Flow during add
• Clocking
• Data flows in other paths
[Figure: data flow through the datapath during add]

Clocking Methodology
• Define when signals are read and written
• Assume edge-triggered:
    • Values in storage (state) elements updated only on a clock edge => clock edge should arrive only after input signals stable
    • Any combinational circuit must have inputs from and outputs to storage elements
    • Clock cycle: time for signals to propagate from one storage element, through combinational circuit, to reach the second storage element
    • A register can be read, its value propagated through some combinational circuit, new value is written back to the same register, all in same cycle => no feedback within a single cycle

Register-Register Timing
[Timing diagram. Signals, each changing from old value to new value: Clk; PC; Rs, Rt, Rd, Op, Func; ALUctr; RegWr; busA, busB; busW. Delays marked: Clk-to-Q, instruction memory access time, delay through control logic, register file access time, ALU delay. The point where the register write occurs is marked on the clock.]
[Figure: PC and ideal instruction memory feeding Rd, Rs, Rt into the register file of 32 32-bit registers (Rw, Ra, Rb, busW, RegWr, Clk), with busA and busB into the ALU (ALUctr) and the 32-bit Result returned on busW]

The Critical Path
• Register file and ideal memory:
    • During read, behave as combinational logic:
        • Address valid => Output valid after access time

    Critical Path (Load Operation) =
          PC's Clk-to-Q
        + Instruction memory's Access Time
        + Register file's Access Time
        + ALU to Perform a 32-bit Add
        + Data Memory Access Time
        + Setup Time for Register File Write
        + Clock Skew

[Figure: PC, ideal instruction memory, register file, ALU, and ideal data memory along the load path]
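The critical-path sum can be worked through with numbers. The delays below are hypothetical values in picoseconds chosen only to illustrate the arithmetic; they do not come from the slides, and the R-type path shown for comparison is an assumption that simply omits the data memory access.

```python
# Hypothetical component delays in picoseconds (illustrative only).
DELAYS_PS = {
    "pc_clk_to_q": 30,
    "instr_mem_access": 200,
    "reg_file_access": 100,
    "alu_add": 200,
    "data_mem_access": 200,
    "reg_file_setup": 20,
    "clock_skew": 10,
}

LOAD_PATH = ["pc_clk_to_q", "instr_mem_access", "reg_file_access", "alu_add",
             "data_mem_access", "reg_file_setup", "clock_skew"]

# Assumed path for an R-type instruction: same as load without data memory.
R_TYPE_PATH = [name for name in LOAD_PATH if name != "data_mem_access"]


def path_delay_ps(path: list, delays: dict) -> int:
    """Sum the delays of the components a signal passes through."""
    return sum(delays[name] for name in path)


min_cycle_ps = path_delay_ps(LOAD_PATH, DELAYS_PS)
```

Under these assumed numbers the load path needs 760 ps while the R-type path needs 560 ps, yet in a single-cycle design both instructions occupy the full 760 ps cycle.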

Worst Case Timing (Load)
[Timing diagram. Signals, each changing from old value to new value: Clk; PC; Rs, Rt, Rd, Op, Func; ALUctr; ExtOp; ALUSrc; MemtoReg; RegWr; busA; busB; Address; busW. Delays marked: Clk-to-Q, instruction memory access time, delay through control logic, register file access time, delay through extender and mux, ALU delay, data memory access time. The point where the register write occurs is marked on the clock.]

Step 4: Control Points and Signals
[Figure: instruction memory addressed by Addr supplies Instruction<31:0>; the fields Op<31:26>, Rs<25:21>, Rt<20:16>, Rd<15:11>, Funct<5:0>, and Imm16<15:0> go to the Control block and the Datapath]
• Signals between Control and Datapath: PCsrc, RegDst, ALUSrc, MemWr, RegWr, MemRd, ALUctr, MemtoReg, Equal

[Figure: Datapath with Mux and Control Points]

Designing Main Control
• Some observations:
    • opcode (Op[5-0]) is always in bits 31-26

[Figure: Datapath with Control Unit]

Instruction Fetch at Start of Add
• instruction <- mem[PC]; PC + 4
[Figure: datapath with the instruction fetch path highlighted]

Instruction Decode of Add
• Fetch the two operands and decode instruction
[Figure: datapath with the register read and decode path highlighted]

ALU Operation during Add
• R[rs] + R[rt]
[Figure: datapath with the ALU path highlighted]

Write Back at the End of Add
• R[rd] <- ALU; PC <- PC + 4
[Figure: datapath with the write-back path highlighted]

Datapath Operation for lw
• R[rt] <- Memory{R[rs] + SignExt[imm16]}
[Figure: datapath with the lw path highlighted]

Datapath Operation for beq

    if (R[rs] - R[rt] == 0) then Zero <- 1 else Zero <- 0
    if (Zero == 1) then PC = PC + 4 + SignExt[imm16] * 4
    else PC = PC + 4

[Figure: datapath with the beq path highlighted]

[Figure: Datapath with Control Unit]

Step 5a: ALU Control
• ALU used for
    • Load/Store: F = add
    • Branch: F = subtract
    • R-type: F depends on funct field

    ALU control   Function
    0000          AND
    0001          OR
    0010          add
    0110          subtract
    0111          set-on-less-than
    1100          NOR

Our Plan for the Controller
[Figure: Main Control takes the 6-bit op code and produces the 2-bit ALUop; ALU Control (Local) takes ALUop and the 6-bit func field and produces the 4-bit ALUctr for the ALU]
• ALUop is 2 bits wide to represent:
    • load/store requiring the ALU to perform add (00)
    • beq requiring the ALU to perform sub (01)
    • "R-type" needing to reference the func field (10)

    Instruction   ALUop (symbolic)   ALUop<1:0>
    R-type        "R-type"           10
    lw            Add                00
    sw            Add                00
    beq           Subtract           01
    jump          xxx (don't care)

ALU Control
• Assume 2-bit ALUOp derived from opcode
    • Combinational logic derives ALU control

    opcode   ALUOp   Operation          funct    ALU function       ALU control
    lw       00      load word          XXXXXX   add                0010
    sw       00      store word         XXXXXX   add                0010
    beq      01      branch equal       XXXXXX   subtract           0110
    R-type   10      add                100000   add                0010
                     subtract           100010   subtract           0110
                     AND                100100   AND                0000
                     OR                 100101   OR                 0001
                     set-on-less-than   101010   set-on-less-than   0111
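The table above is a two-level decode: ALUOp decides whether the funct field matters, and funct is consulted only for R-type instructions. A minimal sketch as a lookup function follows; the names are illustrative, and the values are the ones listed in the table.

```python
# funct field -> 4-bit ALU control, used only when ALUOp = 10 (R-type)
FUNCT_TO_ALU_CONTROL = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # set-on-less-than
}


def alu_control(alu_op: int, funct: int) -> int:
    """Return the 4-bit ALU control for a 2-bit ALUOp and 6-bit funct."""
    if alu_op == 0b00:          # lw / sw: add, funct ignored
        return 0b0010
    if alu_op == 0b01:          # beq: subtract, funct ignored
        return 0b0110
    if alu_op == 0b10:          # R-type: operation chosen by funct
        return FUNCT_TO_ALU_CONTROL[funct]
    raise ValueError("ALUOp = 11 is not used in this subset")
```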

Logic Equation for ALUctr

    ALUop             func                                      ALUctr
    bit<1>  bit<0>    bit<5> bit<4> bit<3> bit<2> bit<1> bit<0>   bit<3> bit<2> bit<1> bit<0>
    0       0         x      x      x      x      x      x        0      0      1      0
    x       1         x      x      x      x      x      x        0      1      1      0
    1       x         x      x      0      0      0      0        0      0      1      0
    1       x         x      x      0      0      1      0        0      1      1      0
    1       x         x      x      0      1      0      0        0      0      0      0
    1       x         x      x      0      1      0      1        0      0      0      1
    1       x         x      x      1      0      1      0        0      1      1      1

Logic Equation for ALUctr2

    ALUop             func                                      ALUctr<2>
    bit<1>  bit<0>    bit<5> bit<4> bit<3> bit<2> bit<1> bit<0>
    x       1         x      x      x      x      x      x        1
    1       x         x      x      0      0      1      0        1
    1       x         x      x      1      0      1      0        1

• This makes func<3> a don't care
• ALUctr2 = ALUop0 + ALUop1·func2'·func1·func0'

Logic Equation for ALUctr1

    ALUop             func                                      ALUctr<1>
    bit<1>  bit<0>    bit<5> bit<4> bit<3> bit<2> bit<1> bit<0>
    0       0         x      x      x      x      x      x        1
    x       1         x      x      x      x      x      x        1
    1       x         x      x      0      0      0      0        1
    1       x         x      x      0      0      1      0        1
    1       x         x      x      1      0      1      0        1

• ALUctr1 = ALUop1' + ALUop1·func2'·func0'

Logic Equation for ALUctr0

    ALUop             func                                      ALUctr<0>
    bit<1>  bit<0>    bit<5> bit<4> bit<3> bit<2> bit<1> bit<0>
    1       x         x      x      0      1      0      1        1
    1       x         x      x      1      0      1      0        1

• ALUctr0 = ALUop1·func3'·func2·func1'·func0 + ALUop1·func3·func2'·func1·func0'
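The three equations can be checked mechanically against the ALU control values. The sketch below evaluates them bit by bit; both product terms of ALUctr0 are taken with ALUop1 asserted, since OR and set-on-less-than are R-type instructions (ALUop = 10). ALUctr<3> is 0 for every instruction in this subset and is omitted. The function names are hypothetical.

```python
def inv(bit: int) -> int:
    """Logical NOT of a single bit."""
    return 1 - bit


def alu_ctr_bits(alu_op: int, funct: int) -> int:
    """Evaluate ALUctr<2:0> from the sum-of-products equations."""
    op1, op0 = (alu_op >> 1) & 1, alu_op & 1
    f3, f2, f1, f0 = ((funct >> i) & 1 for i in (3, 2, 1, 0))

    ctr2 = op0 | (op1 & inv(f2) & f1 & inv(f0))
    ctr1 = inv(op1) | (op1 & inv(f2) & inv(f0))
    ctr0 = (op1 & inv(f3) & f2 & inv(f1) & f0) | (op1 & f3 & inv(f2) & f1 & inv(f0))
    return (ctr2 << 2) | (ctr1 << 1) | ctr0
```

Evaluating the function for lw/sw, beq, and the five R-type funct codes gives 010, 110, 010, 110, 000, 001, and 111, matching the low three bits of the ALU control codes for add, subtract, add, subtract, AND, OR, and set-on-less-than.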

[Figure: The Resultant ALU Control Block, producing the Operation bits]

[Figure: Datapath with Control Unit]

Step 5b: The Main Control Unit
• Control signals derived from instruction

    R-type:      0 (31:26)        | rs (25:21) | rt (20:16) | rd (15:11) | shamt (10:6) | funct (5:0)
    Load/Store:  35 or 43 (31:26) | rs (25:21) | rt (20:16) | address (15:0)
    Branch:      4 (31:26)        | rs (25:21) | rt (20:16) | address (15:0)

    • Bits 31:26: opcode
    • rs: always read
    • rt: read, except for load
    • rd (R-type) or rt (load): write for R-type and load
    • address: sign-extend and add

Truth Table of Control Signals (6 inputs and 9 outputs)
See Appendix A

    func         10 0000   10 0010   (don't care)
    op           00 0000   00 0000   10 0011   10 1011   00 0100
                 add       sub       lw        sw        beq
    RegDst       1         1         0         x         x
    ALUSrc       0         0         1         1         0
    MemtoReg     0         0         1         x         x
    RegWrite     1         1         1         0         0
    MemRead      0         0         1         0         0
    MemWrite     0         0         0         1         0
    Branch       0         0         0         0         1
    ALUop1       1         1         0         0         0
    ALUop0       0         0         0         0         1

[Figure: Main Control takes the 6-bit op code and produces RegDst, ALUSrc, ..., and the 2-bit ALUop; ALU Control (Local) takes ALUop and the 6-bit func field and produces the 4-bit ALUctr]
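Because the main control depends only on the 6-bit opcode, it can be sketched as a lookup table. The values below follow the standard settings for this MIPS subset; None marks a don't-care, and the dictionary layout and function name are illustrative.

```python
def _signals(reg_dst, alu_src, mem_to_reg, reg_write,
             mem_read, mem_write, branch, alu_op):
    """Bundle the nine control outputs (ALUOp carries two bits)."""
    return {
        "RegDst": reg_dst, "ALUSrc": alu_src, "MemtoReg": mem_to_reg,
        "RegWrite": reg_write, "MemRead": mem_read, "MemWrite": mem_write,
        "Branch": branch, "ALUOp": alu_op,
    }


MAIN_CONTROL = {
    0b000000: _signals(1, 0, 0, 1, 0, 0, 0, 0b10),        # R-type
    0b100011: _signals(0, 1, 1, 1, 1, 0, 0, 0b00),        # lw
    0b101011: _signals(None, 1, None, 0, 0, 1, 0, 0b00),  # sw
    0b000100: _signals(None, 0, None, 0, 0, 0, 1, 0b01),  # beq
}


def main_control(opcode: int) -> dict:
    """Look up the control signals for a 6-bit opcode."""
    return MAIN_CONTROL[opcode]
```

The don't-cares for sw and beq exist because neither instruction writes a register, so the values of RegDst and MemtoReg cannot affect the result.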

Truth Table for RegWrite

    Op code    00 0000   10 0011   10 1011   00 0100
               R-type    lw        sw        beq
    RegWrite   1         1         0         0

    RegWrite = R-type + lw
             = op5'·op4'·op3'·op2'·op1'·op0'   (R-type)
             + op5·op4'·op3'·op2'·op1·op0      (lw)

[Figure: decoder taking op<5>..op<0> with product-term lines for R-type, lw, sw, beq, and jump; RegWrite is formed from the R-type and lw lines]
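The RegWrite equation is a sum of two product terms over the opcode bits, which is the form a PLA implements. The sketch below evaluates it directly from the bits; the function name is hypothetical.

```python
def reg_write(opcode: int) -> int:
    """RegWrite = R-type + lw, evaluated from opcode bits op5..op0."""
    op = [(opcode >> i) & 1 for i in range(6)]   # op[i] is bit i
    n = [1 - bit for bit in op]                  # complemented bits

    r_type = n[5] & n[4] & n[3] & n[2] & n[1] & n[0]      # 00 0000
    lw = op[5] & n[4] & n[3] & n[2] & op[1] & op[0]       # 10 0011
    return r_type | lw
```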

[Figure: PLA Implementing Main Control]

Implementing Jumps

    Jump:  2 (31:26) | address (25:0)

• Jump uses word address
• Update PC with concatenation of
    • Top 4 bits of old PC
    • 26-bit jump address
    • 00
• Need an extra control signal decoded from opcode
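The concatenation can be written as a mask, a shift, and an OR. This sketch assumes the top 4 bits are taken from the already incremented PC (PC + 4), as in the MIPS definition of the jump instruction; the function name is hypothetical.

```python
def jump_target(pc: int, instr: int) -> int:
    """Build the jump target: PC+4[31:28] || address[25:0] || 00."""
    addr26 = instr & 0x03FFFFFF          # 26-bit word address field
    upper4 = (pc + 4) & 0xF0000000       # top 4 bits of the incremented PC
    return upper4 | (addr26 << 2)        # shifting left by 2 appends 00


# Opcode 2 (jump) with word address 0x100000, fetched at PC = 0x00400000
target = jump_target(0x00400000, (2 << 26) | 0x100000)
```

Because the 26-bit field is a word address, shifting it left by two bits yields a 28-bit byte address, so a jump can reach any instruction within the current 256 MB region.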

[Figure: Putting It All Together (+ jump instruction)]

Drawback of Single-Cycle Design
• Long cycle time:
    • Cycle time must be long enough for the load instruction:
          PC's Clock-to-Q
        + Instruction Memory Access Time
        + Register File Access Time
        + ALU Delay (address calculation)
        + Data Memory Access Time
        + Register File Setup Time
        + Clock Skew
• Cycle time for load is much longer than needed for all other instructions

Summary
• Single cycle datapath => CPI = 1, clock cycle time long
• MIPS makes control easier
    • Instructions same size
    • Source registers always in same place
    • Immediates same size, location
    • Operations always on registers/immediates