Major CPU Design Steps

Datapath
1. Analyze instruction set operations using independent RTN: ISA => RTN => datapath requirements.
   – This provides the required datapath components and how they are connected to meet ISA requirements.
2. Select required datapath components and connections, and establish the clock methodology (e.g. clock edge-triggered).
   + Determine the number of cycles per instruction and the operations in each cycle.
3. Assemble a datapath meeting the requirements.

Control
4. Identify and define the function of all control points or signals needed by the datapath.
   – Analyze the implementation of each instruction to determine the setting of the control points that affect its operations and register transfers.
5. Design and assemble the control logic.
   – Hard-wired: finite-state machine implementation.
   – Microprogrammed, i.e. using a control program (3rd Edition Chapter 5.5; see handout; not in 4th Edition).

EECC 550 - Shaaban, Lec # 5, Winter 2009, 1-5-2010

Single Cycle MIPS Datapath
[Figure: single-cycle MIPS datapath. Jump not included; includes ORI, which is not in the book version. Labeled control signals: PCSrc, Branch, Zero, ALUOp (2 bits), RegDst, RegWr, ExtOp, ALUSrc, MemWr, MemtoReg.]
CPI = 1, long clock cycle.
T = I x CPI x C

Single Cycle MIPS Datapath Extended To Handle Jump with Control Unit Added
[Figure 5.24, page 314: single-cycle datapath with jump support and the control unit. Slide annotation: "Book figure may have an error!"]
In this book version, ORI is not supported, so no zero extension of the immediate is needed.
ALUOp (2 bits): 00 = add, 01 = subtract, 10 = R-Type

Drawbacks of Single-Cycle Processor (CPI = 1)
1. Long cycle time:
   – All instructions must take as much time as the slowest:
     • The cycle time needed for load is longer than needed for all other instructions.
   – Real memory is not as well-behaved as idealized memory:
     • It cannot always complete a data access in one (short) cycle.
2. Impossible to implement complex, variable-length instructions and complex addressing modes in a single cycle.
   • e.g. indirect memory addressing.
3. High and duplicate hardware resource requirements:
   – No hardware functional unit (e.g. an ALU) can be used more than once in a single cycle.
4. Cannot pipeline (overlap) the processing of one instruction with the previous instructions.
   – (Instruction pipelining, Chapter 6.)

Abstract View of Single Cycle CPU
[Figure: abstract single-cycle CPU. Main control (input: op) and ALU control (input: fun) drive RegDst, RegWr, MemRd, MemWr, ExtOp, ALUSrc and ALUctr; Equal feeds the Next PC logic for Branch, Jump.]
Stage delays: Instruction Fetch 2 ns, Register Fetch 1 ns, ALU 2 ns, Data Mem access 2 ns, Result Store 1 ns.
One CPU clock cycle duration C = 8 ns. One instruction per cycle: CPI = 1.
Assuming the following datapath/control hardware component delays: Memory units: 2 ns; ALU and adders: 2 ns; Register file: 1 ns; Control unit: < 1 ns.

Single Cycle Instruction Timing
[Figure: per-class delay paths for Arithmetic & Logical, Load, Store and Branch instructions through PC, Inst Memory, Reg File, mux, ALU or cmp, Data Mem, mux and setup.]
Critical path: Load (e.g. 8 ns). The critical path determines the CPU clock cycle, C.

Clock Cycle Time & Critical Path
One CPU clock cycle duration C = 8 ns here, i.e. the longest delay. The critical path is LW in this case.
• Critical path: the slowest path between any two storage devices.
• Clock cycle time is a function of the critical path, and must be greater than:
  – Clock-to-Q + longest delay path through the combinational logic + setup + clock skew
Assuming the following datapath/control hardware component delays: Memory units: 2 ns; ALU and adders: 2 ns; Register file: 1 ns; Control unit: < 1 ns.
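The 8 ns figure can be checked with a short calculation. The sketch below is illustrative only: the component delays come from the slides, while the list of units each instruction class passes through (the `PATHS` table) is an assumption, and MUX, setup, clock-to-Q and skew are ignored.

```python
# Component delays in ns, as given on the slides.
DELAY_NS = {"mem": 2, "alu": 2, "regfile": 1}

# Assumed sequence of major units used by each instruction class
# (hypothetical table, not taken from the slides).
PATHS = {
    "R-type": ["mem", "regfile", "alu", "regfile"],
    "load":   ["mem", "regfile", "alu", "mem", "regfile"],
    "store":  ["mem", "regfile", "alu", "mem"],
    "branch": ["mem", "regfile", "alu"],
}

def path_delay_ns(units):
    """Sum the delays of the units along one path."""
    return sum(DELAY_NS[u] for u in units)

delays = {name: path_delay_ns(units) for name, units in PATHS.items()}
critical_class = max(delays, key=delays.get)
cycle_time_ns = delays[critical_class]  # single-cycle clock must cover this

print(delays, critical_class, cycle_time_ns)
```

Under these assumptions the load path is the longest, which matches the slide's statement that LW sets the clock cycle.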

Reducing Cycle Time: Multi-Cycle Design
• Cut the combinational dependency graph by inserting registers/latches.
• The same work is done in two or more shorter cycles, rather than one long cycle.
One long cycle (e.g. CPI = 1): storage element → acyclic combinational logic → storage element
=> Two shorter cycles (e.g. CPI = 2): storage element → acyclic combinational logic (A), cycle 1 → storage element → acyclic combinational logic (B), cycle 2 → storage element
Storage element: register or memory.
Place registers to:
• Get a balanced clock cycle length
• Save any results needed for the remaining cycles

Basic MIPS Instruction Processing Steps
• Instruction Fetch: obtain the instruction from program storage (instruction memory). Instruction ← Mem[PC]
• Next Instruction: update the program counter to the address of the next instruction. PC ← PC + 4
• Instruction Decode: determine the instruction type (done by the control unit) and obtain operands from registers.
• Execute: compute the result value or status.
• Result Store: store the result in a register/memory if needed (usually called Write Back).
Fetch, next-instruction update and decode are common steps for all instructions.

Partitioning The Single Cycle Datapath
Add registers between steps to break the datapath into cycles:
1. Instruction Fetch Cycle (IF)
2. Instruction Decode Cycle (ID)
3. Execution Cycle (EX)
4. Data Memory Access Cycle (MEM)
5. Write Back Cycle (WB)
Place registers to:
• Get a balanced clock cycle length
• Save any results needed for the remaining cycles
[Figure: single-cycle datapath (Instruction Fetch, Operand Fetch, Exec, Mem Access, Result Store) with stage boundaries marked; control signals RegDst, RegWr, MemRd, MemWr, ALUctr, ALUSrc, ExtOp.]

Example Multi-cycle Datapath
[Figure: five-cycle datapath: 1 Instruction Fetch (IF) 2 ns, 2 Instruction Decode (ID) 1 ns, 3 Execution (EX) 2 ns, 4 Memory (MEM) 2 ns, 5 Write Back (WB) 1 ns.]
All registers are clock-edge triggered (register write enable control lines not shown).
Registers added:
• IR: Instruction register
• A, B: Two registers to hold operands read from the register file
• R (or ALUOut): holds the output of the main ALU
• M (or Memory data register, MDR): holds data read from data memory
CPU clock cycle time: worst cycle delay = C = 2 ns
Thus clock rate: f = 1 / 2 ns = 500 MHz (ignoring MUX and CLK-to-Q delays)
Assuming the following datapath/control hardware component delays: Memory units: 2 ns; ALU and adders: 2 ns; Register file: 1 ns; Control unit: < 1 ns.
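As a rough check of the numbers above, the clock period of the partitioned design is the slowest stage rather than the sum of all stages. This is a minimal sketch using the stage delays from the slide; the variable names are illustrative.

```python
# Per-cycle delays in ns for the example multi-cycle datapath.
STAGE_DELAY_NS = {"IF": 2, "ID": 1, "EX": 2, "MEM": 2, "WB": 1}

# Single-cycle design: one clock must cover every stage back to back.
single_cycle_ns = sum(STAGE_DELAY_NS.values())

# Multi-cycle design: the clock only has to cover the slowest stage.
multi_cycle_ns = max(STAGE_DELAY_NS.values())

# 1 / (t ns) expressed in MHz is 1000 / t.
single_cycle_mhz = 1e3 / single_cycle_ns
multi_cycle_mhz = 1e3 / multi_cycle_ns

print(single_cycle_ns, multi_cycle_ns, single_cycle_mhz, multi_cycle_mhz)
```

The ID and WB stages only need 1 ns, so they leave half of each 2 ns cycle idle; that is the cost of an unbalanced partition.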

Operations (Dependent RTN) for Each Cycle
• IF, Instruction Fetch (all instructions): IR ← Mem[PC]
• ID, Instruction Decode (all instructions): A ← R[rs]; B ← R[rt]
• EX, Execution:
  – R-Type: R ← A funct B
  – Logic Immediate: R ← A OR ZeroExt[imm16]
  – Load: R ← A + SignExt(imm16)
  – Store: R ← A + SignExt(imm16)
  – Branch: Zero ← A - B; if Zero = 1: PC ← PC + 4 + (SignExt(imm16) x 4), else (i.e. Zero = 0): PC ← PC + 4
• MEM, Memory:
  – Load: M ← Mem[R]
  – Store: Mem[R] ← B; PC ← PC + 4
• WB, Write Back:
  – R-Type: R[rd] ← R; PC ← PC + 4
  – Logic Immediate: R[rt] ← R; PC ← PC + 4
  – Load: R[rt] ← M; PC ← PC + 4
Instruction Fetch (IF) and Instruction Decode (ID) cycles are common for all instructions.

MIPS Multi-Cycle Datapath: Five Cycles of Load
Load: Cycle 1 IF, Cycle 2 ID, Cycle 3 EX, Cycle 4 MEM, Cycle 5 WB (CPI = 5)
1 - Instruction Fetch (IF): Fetch the instruction from instruction memory.
2 - Instruction Decode (ID): Operand register fetch and instruction decode.
3 - Execute (EX): Calculate the effective memory address.
4 - Memory (MEM): Read the data from the data memory.
5 - Write Back (WB): Write the loaded data to the register file. Update PC.

Multi-cycle Datapath Instruction CPI
• R-Type/Immediate: Require four cycles, CPI = 4
  – IF, ID, EX, WB
• Loads: Require five cycles, CPI = 5
  – IF, ID, EX, MEM, WB
• Stores: Require four cycles, CPI = 4
  – IF, ID, EX, MEM
• Branches/Jumps: Require three cycles, CPI = 3
  – IF, ID, EX
• Average or effective program CPI: 3 ≤ CPI ≤ 5, depending on the program profile (instruction mix).

Single Cycle Vs. Multi-Cycle CPU
[Figure: timing diagram. Single cycle implementation: 8 ns clock (125 MHz), a Load then a Store, one cycle each, with part of the Store cycle wasted. Multiple cycle implementation: 2 ns clock (500 MHz), cycles 1 to 10 showing Load (IF ID EX MEM WB), Store (IF ID EX MEM), then R-type (IF ...).]
Single-Cycle CPU: CPI = 1, C = 8 ns, f = 125 MHz
One million instructions take I x CPI x C = 10^6 x 1 x 8 x 10^-9 = 8 msec
Multi-Cycle CPU: CPI = 3 to 5, C = 2 ns, f = 500 MHz
One million instructions take from 10^6 x 3 x 2 x 10^-9 = 6 msec to 10^6 x 5 x 2 x 10^-9 = 10 msec, depending on the instruction mix used.
T = I x CPI x C
Assuming the following datapath/control hardware component delays: Memory units: 2 ns; ALU and adders: 2 ns; Register file: 1 ns; Control unit: < 1 ns.
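The comparison above is a direct application of T = I x CPI x C. A minimal sketch of the arithmetic follows; the function and variable names are illustrative, and the break-even CPI is derived here rather than stated on the slide.

```python
def exec_time_s(instr_count, cpi, cycle_s):
    """CPU execution time T = I x CPI x C, in seconds."""
    return instr_count * cpi * cycle_s

I = 10**6  # one million instructions

single = exec_time_s(I, 1, 8e-9)        # single-cycle CPU
multi_best = exec_time_s(I, 3, 2e-9)    # multi-cycle, all 3-cycle instructions
multi_worst = exec_time_s(I, 5, 2e-9)   # multi-cycle, all 5-cycle instructions

# Average CPI at which both designs take the same time.
breakeven_cpi = 8e-9 / 2e-9

print(single, multi_best, multi_worst, breakeven_cpi)
```

With these delays the multi-cycle CPU is faster only when the average CPI of the instruction mix is below 4.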

Control Unit Design: Finite State Machine (FSM) Control Model
• State specifies control points (outputs) for register transfer.
• Control points (outputs) are assumed to depend only on the current state and not on inputs (i.e. a Moore finite state machine).
• Transfer (register/memory writes) and state transition occur upon exiting the state, on the falling edge of the clock.
[Figure: Moore finite state machine. Inputs (opcode, conditions) and the current state feed the next-state logic; the state register (e.g. flip-flops) holds the current state; output logic derives the outputs (control points) sent to the datapath. State transition depends on inputs.]

Control Specification For Multi-cycle CPU
Finite State Machine (FSM) - State Transition Diagram
• "instruction fetch" (start state): IR ← MEM[PC]
• "decode / operand fetch": A ← R[rs]; B ← R[rt]
• Execute:
  – R-type: R ← A fun B
  – ORi: R ← A or ZX
  – LW: R ← A + SX
  – SW: R ← A + SX
  – BEQ & ~Zero: PC ← PC + 4 (to instruction fetch)
  – BEQ & Zero: PC ← PC + 4 + SX || 00 (to instruction fetch)
• Memory:
  – LW: M ← MEM[R]
  – SW: MEM[R] ← B; PC ← PC + 4 (to instruction fetch)
• Write-back:
  – R-type: R[rd] ← R; PC ← PC + 4 (to instruction fetch)
  – ORi: R[rt] ← R; PC ← PC + 4 (to instruction fetch)
  – LW: R[rt] ← M; PC ← PC + 4 (to instruction fetch)
13 states: 4 state flip-flops needed.

Traditional FSM Controller
[Figure: traditional FSM controller. The 6-bit opcode (op) and the Equal condition (cond) from the datapath, together with the 4-bit current state (11 inputs in total), feed the next-state logic (state transition table). The state register (4 flip-flops) holds the current state. Output logic produces the outputs (control points) sent to the datapath.]

Traditional FSM Controller
datapath + state diagram => control
• Translate RTN statements into control points.
• Assign states.
• Implement the controller.
More on FSM controller implementation in Appendix C.

Mapping RTNs To Control Points: Examples & State Assignments
State number (code): RTN [control points shown on the slide]
• 0 (0000) "instruction fetch": IR ← MEM[PC] [imem_rd, IRen]
• 1 (0001) "decode / operand fetch": A ← R[rs]; B ← R[rt] [Aen, Ben]
• 2 (0010) BEQ & Zero: PC ← PC + 4 + SX || 00
• 3 (0011) BEQ & ~Zero: PC ← PC + 4
• 4 (0100) R-type execute: R ← A fun B [ALUfun, Sen]
• 5 (0101) R-type write-back: R[rd] ← R; PC ← PC + 4 [RegDst, RegWr, PCen]
• 6 (0110) ORi execute: R ← A or ZX
• 7 (0111) ORi write-back: R[rt] ← R; PC ← PC + 4
• 8 (1000) LW execute: R ← A + SX
• 9 (1001) LW memory: M ← MEM[R]
• 10 (1010) LW write-back: R[rt] ← M; PC ← PC + 4
• 11 (1011) SW execute: R ← A + SX
• 12 (1100) SW memory: MEM[R] ← B; PC ← PC + 4
States 2, 3, 5, 7, 10 and 12 return to instruction fetch (state 0000).
13 states: 4 state flip-flops needed.

Detailed Control Specification - State Transition Table
Current State   Op field   Z   Next State   ALU op
0000 (IF)       ??????     ?   0001
0001 (ID)       BEQ        0   0011
0001 (ID)       BEQ        1   0010
0001 (ID)       R-type     x   0100
0001 (ID)       orI        x   0110
0001 (ID)       LW         x   1000
0001 (ID)       SW         x   1011
0010 (BEQ)      xxxxxx     x   0000
0011 (BEQ)      xxxxxx     x   0000
0100 (R)        xxxxxx     x   0101         fun
0101 (R)        xxxxxx     x   0000
0110 (ORI)      xxxxxx     x   0111         or
0111 (ORI)      xxxxxx     x   0000
1000 (LW)       xxxxxx     x   1001         add
1001 (LW)       xxxxxx     x   1010
1010 (LW)       xxxxxx     x   0000
[Output columns on the slide: IR en; PC en, sel; A, B en; Ex (Ext, Sr, ALU, S); Mem (R, W, M-R); Write-Back (Wr, Dst). Slide annotation: "Can be combined in one state".]
More on FSM controller implementation in Appendix C.
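The next-state part of the transition table maps naturally onto a lookup. The sketch below is an illustration, not the lecture's implementation: state codes follow the slides' state assignments, the two SW states (1011, 1100) are taken from the state diagram rather than from the table, and the opcode class names are plain strings chosen here for readability.

```python
IF, ID = 0b0000, 0b0001

# Dispatch out of the decode state, keyed by opcode class.
DISPATCH = {"R-type": 0b0100, "ORI": 0b0110, "LW": 0b1000, "SW": 0b1011}

# States whose successor does not depend on the inputs.
FIXED_NEXT = {
    IF: ID,
    0b0010: IF, 0b0011: IF,                  # BEQ taken / not taken
    0b0100: 0b0101, 0b0101: IF,              # R-type execute, write-back
    0b0110: 0b0111, 0b0111: IF,              # ORI execute, write-back
    0b1000: 0b1001, 0b1001: 0b1010, 0b1010: IF,  # LW execute, memory, write-back
    0b1011: 0b1100, 0b1100: IF,              # SW execute, memory
}

def next_state(state, op, zero):
    """Next-state logic: only the decode state looks at op and Zero."""
    if state == ID:
        if op == "BEQ":
            return 0b0010 if zero else 0b0011
        return DISPATCH[op]
    return FIXED_NEXT[state]

def cycles(op, zero=0):
    """Count the states visited from fetch until control returns to fetch."""
    state, count = IF, 0
    while True:
        count += 1
        state = next_state(state, op, zero)
        if state == IF:
            return count

print({op: cycles(op) for op in ["R-type", "ORI", "LW", "SW", "BEQ"]})
```

Walking the table this way reproduces the per-class cycle counts given earlier (4 for R-type and immediate, 5 for loads, 4 for stores, 3 for branches).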

Alternative Multiple Cycle Datapath (In Textbook)
• Minimizes hardware: 1 memory, 1 ALU
[Figure: textbook multicycle datapath with a single ideal memory, PC, Instruction Reg, Mem Data Reg, register file, A and B registers, one ALU and ALU Out. Control signals: PCWr, PCWrCond, PCSrc, IorD, MemRd, MemWr, IRWr, RegDst, RegWr, MemtoReg, ALUSrcA, ALUSrcB, ALUOp.]

Alternative Multiple Cycle Datapath (In Textbook) (Figure 5.27 page 322)
• Shared instruction/data memory unit
• A single ALU shared among instructions
• Shared units require additional or widened multiplexors
• Temporary registers to hold data between clock cycles of the instruction:
  • Additional registers: Instruction Register (IR), Memory Data Register (MDR), A, B, ALUOut

Alternative Multiple Cycle Datapath With Control Lines (Figure 5.28 page 323 in textbook)
[Figure: multicycle datapath with control lines; ORI not supported, Jump supported.]

The Effect of The 1-bit Control Signals (Figure 5.29 page 324)
• RegDst
  – Deasserted (= 0): The register destination number for the write register comes from the rt field (instruction bits 20:16).
  – Asserted (= 1): The register destination number for the write register comes from the rd field (instruction bits 15:11).
• RegWrite
  – Deasserted (= 0): None.
  – Asserted (= 1): The register on the write register input is written with the value on the Write data input.
• ALUSrcA
  – Deasserted (= 0): The first ALU operand is the PC.
  – Asserted (= 1): The first ALU operand is register A (i.e. R[rs]).
• MemRead
  – Deasserted (= 0): None.
  – Asserted (= 1): The contents of memory specified by the address input are put on the memory data output.
• MemWrite
  – Deasserted (= 0): None.
  – Asserted (= 1): The memory contents specified by the address input are replaced by the value on the Write data input.
• MemtoReg
  – Deasserted (= 0): The value fed to the register write data input comes from the ALUOut register.
  – Asserted (= 1): The value fed to the register write data input comes from the memory data register (MDR).
• IorD
  – Deasserted (= 0): The PC is used to supply the address to the memory unit.
  – Asserted (= 1): The ALUOut register is used to supply the address to the memory unit.
• IRWrite
  – Deasserted (= 0): None.
  – Asserted (= 1): The output of the memory is written into the Instruction Register (IR).
• PCWrite
  – Deasserted (= 0): None.
  – Asserted (= 1): The PC is written; the source is controlled by PCSource.
• PCWriteCond
  – Deasserted (= 0): None.
  – Asserted (= 1): The PC is written if the Zero output of the ALU is also active.

The Effect of The 2-bit Control Signals (Figure 5.29 page 324)
• ALUOp
  – 00: The ALU performs an add operation.
  – 01: The ALU performs a subtract operation.
  – 10: The funct field of the instruction determines the ALU operation (R-Type).
• ALUSrcB
  – 00: The second input of the ALU comes from register B (i.e. R[rt]).
  – 01: The second input of the ALU is the constant 4.
  – 10: The second input of the ALU is the sign-extended 16-bit immediate (imm16) field of the instruction in IR.
  – 11: The second input of the ALU is the sign-extended 16-bit immediate field of IR shifted left 2 bits (for branches).
• PCSource
  – 00: The output of the ALU (PC + 4) is sent to the PC for writing.
  – 01: The content of ALUOut (the branch target address) is sent to the PC for writing.
  – 10: The jump target address (IR[25:0] shifted left 2 bits and concatenated with PC+4[31:28]) is sent to the PC for writing.
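The ALUSrcB selections and the jump target construction can be written out as bit operations. This is a small sketch under the definitions in the table above; the function names are illustrative and values are modeled as plain Python integers rather than 32-bit hardware buses.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def alu_src_b(sel, b_reg, imm16):
    """Second ALU operand selected by the 2-bit ALUSrcB signal."""
    if sel == 0b00:
        return b_reg                      # register B, i.e. R[rt]
    if sel == 0b01:
        return 4                          # constant 4 (for PC + 4)
    if sel == 0b10:
        return sign_extend16(imm16)       # load/store offset
    if sel == 0b11:
        return sign_extend16(imm16) << 2  # branch offset in bytes
    raise ValueError("ALUSrcB is a 2-bit signal")

def jump_target(pc_plus_4, ir):
    """IR[25:0] shifted left 2, concatenated with PC+4[31:28]."""
    return (pc_plus_4 & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)

print(alu_src_b(0b11, 7, 0xFFFC), hex(jump_target(0x40000004, 0x08000010)))
```

A negative immediate such as 0xFFFC (-4 words) becomes a byte offset of -16 for a branch, which is why the shift is applied after sign extension.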

Operations (Dependent RTN) for Each Cycle
• IF, Instruction Fetch (all instructions): IR ← Mem[PC]; PC ← PC + 4
• ID, Instruction Decode (all instructions): A ← R[rs]; B ← R[rt]; ALUout ← PC + (SignExt(imm16) x 4)
• EX, Execution:
  – R-Type: ALUout ← A funct B
  – Load: ALUout ← A + SignExt(imm16)
  – Store: ALUout ← A + SignExt(imm16)
  – Branch: Zero ← A - B; Zero: PC ← ALUout
  – Jump: PC ← Jump Address
• MEM, Memory:
  – Load: MDR ← Mem[ALUout]
  – Store: Mem[ALUout] ← B
• WB, Write Back:
  – R-Type: R[rd] ← ALUout
  – Load: R[rt] ← MDR
Instruction Fetch (IF) and Instruction Decode (ID) cycles are common for all instructions.

High-Level View of Finite State Machine Control (Figure 5.31 page 332)
[Figure: states 0-1, instruction fetch/decode (Figure 5.32), dispatch to memory access states 2-5 (Figure 5.33), R-type states 6-7 (Figure 5.34), branch state 8 (Figure 5.35) and jump state 9 (Figure 5.36).]
• First steps are independent of the instruction class.
• Then a series of sequences that depend on the instruction opcode.
• Then the control returns to fetch a new instruction.
• Each box above represents one or several states.

FSM State Transition Diagram (From Book) (Figure 5.38 page 339)
• IF: IR ← Mem[PC]; PC ← PC + 4
• ID: A ← R[rs]; B ← R[rt]; ALUout ← PC + (SignExt(imm16) x 4)
• EX:
  – Load/Store: ALUout ← A + SignExt(imm16)
  – R-Type: ALUout ← A func B
  – Branch: Zero ← A - B; Zero: PC ← ALUout
  – Jump: PC ← Jump Address
• MEM:
  – Load: MDR ← Mem[ALUout]
  – Store: Mem[ALUout] ← B
• WB:
  – Load: R[rt] ← MDR
  – R-Type: R[rd] ← ALUout
Total 10 states.
More on FSM controller implementation in Appendix C.

Instruction Fetch (IF) and Decode (ID) FSM States (Figure 5.32 page 333)
• IF: IR ← Mem[PC]; PC ← PC + 4
• ID: A ← R[rs]; B ← R[rt]; ALUout ← PC + (SignExt(imm16) x 4)
From ID, control dispatches to the states of Figure 5.33, Figure 5.34, Figure 5.35 or Figure 5.36.

Instruction Fetch (IF) Cycle (State 0)
IR ← Mem[PC]
PC ← PC + 4
Control signals: MemRead = 1, ALUSrcA = 0, IorD = 0, IRWrite = 1, ALUSrcB = 01, ALUOp = 00 (add), PCWrite = 1, PCSource = 00
[Figure 5.28, page 323: multicycle datapath with these signal values marked; ORI not supported, Jump supported.]

Instruction Decode (ID) Cycle (State 1)
A ← R[rs]
B ← R[rt]
ALUout ← PC + (SignExt(imm16) x 4) (calculate branch target)
Control signals: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00 (add)
[Figure 5.28, page 323: multicycle datapath with these signal values marked; ORI not supported, Jump supported.]

Load/Store Instructions FSM States (Figure 5.33 page 334)
(From Instruction Decode)
• EX: ALUout ← A + SignExt(imm16), i.e. effective address calculation
• MEM:
  – Load: MDR ← Mem[ALUout]
  – Store: Mem[ALUout] ← B
• WB (load only): R[rt] ← MDR
To Instruction Fetch (Figure 5.32)

Load/Store Execution (EX) Cycle (State 2)
Effective address calculation: ALUout ← A + SignExt(imm16)
Control signals: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00 (add)
[Figure 5.28, page 323: multicycle datapath with these signal values marked; ORI not supported, Jump supported.]

Load Memory (MEM) Cycle (State 3)
MDR ← Mem[ALUout]
Control signals: MemRead = 1, IorD = 1
[Figure 5.28, page 323: multicycle datapath with these signal values marked; ORI not supported, Jump supported.]

Load Write Back (WB) Cycle (State 4)
R[rt] ← MDR
Control signals: RegWrite = 1, MemtoReg = 1, RegDst = 0
[Figure 5.28, page 323: multicycle datapath with these signal values marked; ORI not supported, Jump supported.]

Store Memory (MEM) Cycle (State 5)
Mem[ALUout] ← B
Control signals: MemWrite = 1, IorD = 1
[Figure 5.28, page 323: multicycle datapath with these signal values marked; ORI not supported, Jump supported.]

R-Type Instructions FSM States (Figure 5.34 page 335)
(From Instruction Decode)
• EX: ALUout ← A funct B
• WB: R[rd] ← ALUout
To State 0 (Instruction Fetch) (Figure 5.32)

R-Type Execution (EX) Cycle (State 6)
ALUout ← A funct B
Control signals: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10 (R-Type)
[Figure 5.28, page 323: multicycle datapath with these signal values marked; ORI not supported, Jump supported.]

R-Type Write Back (WB) Cycle (State 7)
R[rd] ← ALUout
Control signals: RegWrite = 1, MemtoReg = 0, RegDst = 1
[Figure 5.28, page 323: multicycle datapath with these signal values marked; ORI not supported, Jump supported.]

Branch Instruction Single EX State and Jump Instruction Single EX State (Figures 5.35, 5.36 page 337)
(From Instruction Decode)
• Branch EX: Zero ← A - B; Zero: PC ← ALUout. To State 0 (Instruction Fetch) (Figure 5.32)
• Jump EX: PC ← Jump Address. To State 0 (Instruction Fetch) (Figure 5.32)

Branch Execution (EX) Cycle (State 8)
Zero ← A - B
Zero: PC ← ALUout
Control signals: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01 (subtract), PCWriteCond = 1, PCSource = 01
[Figure 5.28, page 323: multicycle datapath with these signal values marked; ORI not supported, Jump supported.]

Jump Execution (EX) Cycle (State 9)
PC ← Jump Address
Control signals: PCWrite = 1, PCSource = 10
[Figure 5.28, page 323: multicycle datapath with these signal values marked; ORI not supported, Jump supported.]
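The ten states walked through above can be collected into one table of control outputs, which is what the Moore-style output logic has to produce. The sketch below only restates the signal values listed for states 0 to 9; the dictionary layout, the `asserted` helper and the rule that an unlisted signal reads as 0 are assumptions made for illustration.

```python
# Control outputs per FSM state, as listed on the state slides.
CONTROL = {
    0: {"MemRead": 1, "ALUSrcA": 0, "IorD": 0, "IRWrite": 1,
        "ALUSrcB": 0b01, "ALUOp": 0b00, "PCWrite": 1, "PCSource": 0b00},
    1: {"ALUSrcA": 0, "ALUSrcB": 0b11, "ALUOp": 0b00},
    2: {"ALUSrcA": 1, "ALUSrcB": 0b10, "ALUOp": 0b00},
    3: {"MemRead": 1, "IorD": 1},
    4: {"RegWrite": 1, "MemtoReg": 1, "RegDst": 0},
    5: {"MemWrite": 1, "IorD": 1},
    6: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b10},
    7: {"RegWrite": 1, "MemtoReg": 0, "RegDst": 1},
    8: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b01,
        "PCWriteCond": 1, "PCSource": 0b01},
    9: {"PCWrite": 1, "PCSource": 0b10},
}

# State sequence followed by each instruction class.
SEQUENCES = {
    "load":   [0, 1, 2, 3, 4],
    "store":  [0, 1, 2, 5],
    "R-type": [0, 1, 6, 7],
    "branch": [0, 1, 8],
    "jump":   [0, 1, 9],
}

def asserted(state, signal):
    """Assumption: a signal not listed for a state is deasserted (0)."""
    return CONTROL[state].get(signal, 0)

def states_asserting(signal):
    """All states in which a given signal is 1."""
    return [s for s in CONTROL if asserted(s, signal) == 1]

print(states_asserting("RegWrite"), states_asserting("MemRead"))
```

Treating unlisted write enables as 0 matters: a stray RegWrite or MemWrite in another state would corrupt architectural state, whereas unlisted mux selects are don't-cares.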

MIPS Multi-cycle Datapath Performance Evaluation
• What is the average CPI?
  – State diagram gives CPI for each instruction type.
  – Workload (program) below gives frequency of each type.
Type          CPIi for type   Frequency   CPIi x freq. Ii
Arith/Logic   4               40%         1.6
Load          5               30%         1.5
Store         4               10%         0.4
Branch        3               20%         0.6
Average CPI: 4.1
Better than CPI = 5 if all instructions took the same number of clock cycles (5).
T = I x CPI x C
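The average CPI is the frequency-weighted sum of the per-type CPIs. A minimal sketch of that calculation with the workload from the table; the dictionary name is illustrative.

```python
# (CPI for type, frequency) for each instruction type in the workload.
MIX = {
    "Arith/Logic": (4, 0.40),
    "Load":        (5, 0.30),
    "Store":       (4, 0.10),
    "Branch":      (3, 0.20),
}

# Contribution of each type: CPI_i x freq_i.
contrib = {name: cpi * freq for name, (cpi, freq) in MIX.items()}
avg_cpi = sum(contrib.values())

# Sanity check: the frequencies should cover the whole workload.
total_freq = sum(freq for _, freq in MIX.values())

print(contrib, round(avg_cpi, 2), round(total_freq, 2))
```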

Adding Support for swap to Multi Cycle Datapath
• You are to add support for a new instruction, swap, that exchanges the values of two registers, to the MIPS multicycle datapath of Figure 5.28 on page 323.
  swap $rs, $rt: R[rt] ← R[rs]; R[rs] ← R[rt]
• Swap uses the R-Type format with: the value of field rs = the value of field rd.
• Add any necessary datapaths and control signals to the multicycle datapath. Find a solution that minimizes the number of clock cycles required for the new instruction without modifying the register file (i.e. no additional register write ports). Justify the need for the modifications, if any.
• Show the necessary modifications to the multicycle control finite state machine of Figure 5.38 on page 339 when adding the swap instruction. For each new state added, provide the dependent RTN and active control signal values.

Adding swap Instruction Support to Multi Cycle Datapath
swap $rs, $rt: R[rt] ← R[rs]; R[rs] ← R[rt]
We assume here rs = rd in the instruction encoding.
Instruction fields: op [31-26], rs [25-21], rt [20-16], rd [15-11]
[Figure: datapath of Figure 5.28 with the outputs of A and B added as inputs to the register write-data multiplexor.]
The outputs of A and B should be connected to the multiplexor controlled by MemtoReg. One of the two fields (rs and rd) contains the name of one of the registers being swapped; the other register is specified by rt. The MemtoReg control signal becomes two bits.

Adding swap Instruction Support to Multi Cycle Datapath
[Figure: FSM of Figure 5.38 with two new states, WB1 and WB2, added for swap.]
• IF: IR ← Mem[PC]; PC ← PC + 4
• ID: A ← R[rs]; B ← R[rt]; ALUout ← PC + (SignExt(imm16) x 4)
• New state WB1 (swap): R[rd] ← B
• New state WB2 (swap): R[rt] ← A
• Existing states unchanged: EX (ALUout ← A + SignExt(imm16); ALUout ← A func B; Zero ← A - B, Zero: PC ← ALUout), MEM, WB (R[rd] ← ALUout)
Swap takes 4 cycles.
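The two write-back states work because A and B latch the old register values during ID, so the first write cannot destroy the operand needed by the second. The sketch below models only that register-transfer behavior; the register file is a plain dictionary and the function name is hypothetical.

```python
def run_swap(regs, rs, rt):
    """Register-transfer model of swap $rs, $rt on the modified datapath."""
    rd = rs                      # encoding assumption from the slides: rd = rs
    trace = ["IF"]               # IR <- Mem[PC]; PC <- PC + 4 (not modeled)

    a, b = regs[rs], regs[rt]    # ID: A <- R[rs]; B <- R[rt]
    trace.append("ID")

    regs[rd] = b                 # WB1: R[rd] <- B (one write port, one write)
    trace.append("WB1")

    regs[rt] = a                 # WB2: R[rt] <- A (A still holds the old R[rs])
    trace.append("WB2")
    return trace

regs = {8: 11, 9: 22}            # illustrative register numbers and values
trace = run_swap(regs, rs=8, rt=9)
print(regs, trace)
```

Each write-back state performs a single register write, which is why two states are needed when the register file keeps one write port.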

Adding Support for add3 to Multi Cycle Datapath
• You are to add support for a new instruction, add3, that adds the values of three registers, to the MIPS multicycle datapath of Figure 5.28 on page 323.
  For example: add3 $s0, $s1, $s2, $s3
  Register $s0 gets the sum of $s1, $s2 and $s3.
  The instruction encoding uses a modified R-format, with an additional register specifier rx replacing the five low bits of the "funct" field:
  Field     OP        rs        rt        rd        Not used   rx
  Bits      [31-26]   [25-21]   [20-16]   [15-11]   [10-5]     [4-0]
  Width     6 bits    5 bits    5 bits    5 bits    6 bits     5 bits
  Example   add3      $s1       $s2       $s0                  $s3
• Add necessary datapath components, connections, and control signals to the multicycle datapath without modifying the register bank or adding additional ALUs. Find a solution that minimizes the number of clock cycles required for the new instruction. Justify the need for the modifications, if any.
• Show the necessary modifications to the multicycle control finite state machine of Figure 5.38 on page 339 when adding the add3 instruction. For each new state added, provide the dependent RTN and active control signal values.
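The modified R-format can be illustrated by packing and unpacking the fields. This sketch assumes rx occupies bits [4-0] as the text states; the opcode value is hypothetical (the slides do not give one), and the register numbers 16 to 19 are the standard MIPS numbers for $s0 to $s3.

```python
HYPOTHETICAL_ADD3_OP = 0b111111  # placeholder opcode, not from the slides

def encode_add3(op, rs, rt, rd, rx):
    """Pack the modified R-format; bits [10-5] are left as zero (not used)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | rx

def decode_add3(ir):
    """Extract the fields of the modified R-format from a 32-bit word."""
    return {
        "op": (ir >> 26) & 0x3F,
        "rs": (ir >> 21) & 0x1F,
        "rt": (ir >> 16) & 0x1F,
        "rd": (ir >> 11) & 0x1F,
        "rx": ir & 0x1F,
    }

# add3 $s0, $s1, $s2, $s3  ->  rd = $s0, rs = $s1, rt = $s2, rx = $s3
word = encode_add3(HYPOTHETICAL_ADD3_OP, rs=17, rt=18, rd=16, rx=19)
fields = decode_add3(word)
print(hex(word), fields)
```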

add3 Instruction Support to Multi Cycle Datapath
add3 $rd, $rs, $rt, $rx: R[rd] ← R[rs] + R[rt] + R[rx]
rx is a new register specifier in bits [4-0] of the instruction.
No additional register read ports or ALUs allowed.
Modified R-format: op [31-26], rs [25-21], rt [20-16], rd [15-11], rx [4-0]
[Figure: datapath of Figure 5.28 with the modifications below and the new WriteB control line.]
1. ALUout is added as an extra input to the first ALU operand MUX, to use the previous ALU result as an input for the second addition.
2. A multiplexor should be added to select between rt and the new field rx, containing the register number of the 3rd operand (bits 4-0 of the instruction), as the input for Read Register 2. This multiplexor will be controlled by a new one-bit control signal called ReadSrc.
3. A WriteB control line is added to enable writing R[rx] to B.

add3 Instruction Support to Multi Cycle Datapath
[Figure: FSM of Figure 5.38 with two new states, EX1 and EX2, added for add3.]
• IF: IR ← Mem[PC]; PC ← PC + 4
• ID: A ← R[rs]; B ← R[rt] (WriteB); ALUout ← PC + (SignExt(imm16) x 4)
• New state EX1 (add3): ALUout ← A + B; B ← R[rx] (WriteB)
• New state EX2 (add3): ALUout ← ALUout + B
• WB: R[rd] ← ALUout
• Existing states unchanged: EX (ALUout ← A + SignExt(imm16); ALUout ← A func B; Zero ← A - B, Zero: PC ← ALUout), MEM, WB
Add3 takes 5 cycles.
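The key point of the EX1 state is that two transfers happen in the same cycle: the ALU consumes the old value of B while B is reloaded with R[rx]. The sketch below models that ordering at the register-transfer level only; the register file is a dictionary, results are truncated to 32 bits, and the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF

def run_add3(regs, rd, rs, rt, rx):
    """Register-transfer model of add3 $rd, $rs, $rt, $rx."""
    trace = ["IF"]                       # IR <- Mem[PC]; PC <- PC + 4 (not modeled)

    a, b = regs[rs], regs[rt]            # ID: A <- R[rs]; B <- R[rt]
    trace.append("ID")

    # EX1: both transfers read the values held before the clock edge,
    # so the sum uses the old B (R[rt]) while B is reloaded with R[rx].
    alu_out, b = (a + b) & MASK32, regs[rx]
    trace.append("EX1")

    alu_out = (alu_out + b) & MASK32     # EX2: ALUout <- ALUout + B
    trace.append("EX2")

    regs[rd] = alu_out                   # WB: R[rd] <- ALUout
    trace.append("WB")
    return trace

regs = {16: 0, 17: 5, 18: 7, 19: 30}     # $s0..$s3 with illustrative values
trace = run_add3(regs, rd=16, rs=17, rt=18, rx=19)
print(regs[16], trace)
```

Reading R[rx] during EX1 overlaps the third operand fetch with the first addition, which is what keeps the instruction at five cycles with one ALU and two read ports.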