14:332:331 Computer Architecture and Assembly Language
Spring 2005
Week 10: Building a Multi-Cycle Datapath
[Adapted from Dave Patterson’s UCB CS 152 slides and Mary Jane Irwin’s PSU CSE 331 slides]

Heads Up
- This week’s material
  - Multicycle MIPS datapath implementation
    - Reading assignment: PH 5.5 and C.3

Review: Single Cycle Data and Control Path
[Figure: single-cycle MIPS datapath and control. Components: PC, Instruction Memory, Register File, Sign Extend, ALU (zero and ovf outputs) with ALU control, Data Memory, one adder for PC+4, one adder for the branch target, and two Shift left 2 units. Control signals, decoded from Instr[31-26]: RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite; PCSrc selects the next PC.]

Disadvantages of the Single Cycle Datapath
- Uses the clock cycle inefficiently: the clock cycle must be timed to accommodate the slowest instruction
  - especially problematic for more complex instructions like floating point multiply
- Is wasteful of area, since some functional units must be duplicated because they cannot be “shared” during an instruction execution
  - e.g., need separate adders to do the PC update and branch target address calculations, as well as an ALU to do R-type arithmetic/logic operations and data memory address calculations

Multicycle Implementation Overview
- Each step in the execution takes 1 clock cycle
- An instruction takes more than 1 clock cycle to complete
- Not every instruction takes the same number of clock cycles to complete
- Multicycle implementations allow
  - functional units to be used more than once per instruction, as long as they are used on different clock cycles; as a result
    - only need one memory
    - need only one ALU/adder
  - faster clock rates
  - different instructions to take a different number of clock cycles

The Multicycle Datapath – A High Level View
- Registers have to be added after every major functional unit to hold the output value until it is used in a subsequent clock cycle
[Figure: high-level multicycle datapath. PC, Memory (instructions or data), IR, MDR, Register File, A, B, ALU, ALUout.]

Clocking the Multicycle Datapath
[Figure: the high-level multicycle datapath (PC, Memory, IR, MDR, Register File, A, B, ALU, ALUout) with the System Clock and one clock cycle marked, and the MemWrite and RegWrite signals shown.]

Multicycle Approach
- Break up the instructions into steps where each step takes a cycle, while trying to
  - balance the amount of work to be done in each step
  - restrict each cycle to use only one major functional unit
- At the end of a cycle
  - Store values needed in a later cycle by the current instruction in a state element (internal register) not visible to the programmer
      IR: Instruction Register
      MDR: Memory Data Register
      A and B: register file read data registers
      ALUout: ALU output register
    - All (except IR) hold data only between a pair of adjacent clock cycles (so don’t need a write control signal)
  - Data used by subsequent instructions are stored in programmer visible state elements (i.e., register file, PC, or memory)

The Complete Multicycle Datapath with Control
[Figure: complete multicycle datapath with control. Components: PC, Memory (instructions or data), IR, MDR, Register File, A, B, Sign Extend, two Shift left 2 units, ALU (zero output), ALU control (input Instr[5-0]), ALUout, and the Control unit (input Instr[31-26]). Control signals: PCWriteCond, PCWrite, IorD, MemRead, MemWrite, MemtoReg, IRWrite, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst.]

Multicycle Approach, con’t
- Reading/writing to
  - any of the internal registers or the PC occurs (quickly) at the end of a clock cycle
  - the register file takes ~50% of a clock cycle, since it has additional control and access overhead (reading can be done in parallel with decode)
- Have to add multiplexors in front of several of the functional unit inputs because the functional units are shared by different instruction cycles
- All operations occurring in one step occur in parallel within the same clock cycle
  - This limits us to one ALU operation, one memory access, and one register file access per step (per clock cycle)

Five Instruction Steps
- Instruction Fetch
- Instruction Decode and Register Fetch
- R-type Instruction Execution, Memory Read/Write Address Computation, Branch Completion, or Jump Completion
- Memory Read Access, Memory Write Completion, or R-type Instruction Completion
- Memory Read Completion (Write Back)
INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!

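Since every step takes one clock cycle, the cycle count of an instruction is the number of steps it passes through. A minimal Python sketch of that bookkeeping follows; the step names and the `STEPS` and `cycles` identifiers are illustrative and do not come from the slides.

```python
# Steps each instruction class passes through, one clock cycle per step.
# The step labels are shorthand for the five-step list above (assumed names).
STEPS = {
    "R-type": ["fetch", "decode", "execute", "r_completion"],
    "lw":     ["fetch", "decode", "address", "mem_read", "write_back"],
    "sw":     ["fetch", "decode", "address", "mem_write"],
    "beq":    ["fetch", "decode", "branch_completion"],
    "j":      ["fetch", "decode", "jump_completion"],
}


def cycles(instr_class: str) -> int:
    """Return the number of clock cycles an instruction class needs."""
    return len(STEPS[instr_class])


if __name__ == "__main__":
    for name in STEPS:
        print(name, cycles(name))
```

Only lw uses all five steps; beq and j finish in three, which is where the 3 to 5 cycle range comes from.
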
Step 1: Instruction Fetch
- Use PC to get instruction from the memory and put it in the Instruction Register
- Increment the PC by 4 and put the result back in the PC
- Can be described succinctly using RTL (“Register-Transfer Language”)
    IR = Memory[PC];
    PC = PC + 4;
Can we figure out the values of the control signals?
What is the advantage of updating the PC now?

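The two RTL statements of the fetch step can be mimicked in software. This is only a sketch: the dictionary-based `state` and `memory` (keyed by byte address) and the function name `fetch` are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF  # keep values within 32 bits, like the hardware registers


def fetch(state: dict, memory: dict) -> None:
    """Step 1: IR = Memory[PC]; PC = PC + 4;"""
    pc = state["PC"]
    # Both right-hand sides use the old PC, as in hardware, where IR and PC
    # are written together at the end of the cycle.
    state["IR"] = memory[pc]
    state["PC"] = (pc + 4) & MASK32


if __name__ == "__main__":
    state = {"PC": 0x00400000, "IR": 0}
    memory = {0x00400000: 0x8D6A0000}  # encoding of lw $t2, 0($t3)
    fetch(state, memory)
    print(hex(state["IR"]), hex(state["PC"]))
```
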
Fetch Control Signals Settings
[Figure: control state diagram, Start -> Instr Fetch]

Step 2: Instruction Decode and Register Fetch
- Don’t know what the instruction is yet, so can only
  - Read registers rs and rt in case we need them
  - Compute the branch address in case the instruction is a branch
- RTL:
    A = Reg[IR[25-21]];
    B = Reg[IR[20-16]];
    ALUOut = PC + (sign-extend(IR[15-0]) << 2);
- Note we aren't setting any control lines based on the instruction (since we are busy "decoding" it in our control logic)

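The decode RTL is mostly bit-field extraction plus one speculative addition. A hedged Python sketch is shown below; `decode`, `sign_extend16`, the `state` dictionary, and the 32-entry `regs` list are illustrative names, not part of the slides.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def decode(state: dict, regs: list) -> None:
    """Step 2: read rs and rt, and compute the branch target speculatively."""
    ir = state["IR"]
    rs = (ir >> 21) & 0x1F          # IR[25-21]
    rt = (ir >> 16) & 0x1F          # IR[20-16]
    state["A"] = regs[rs]
    state["B"] = regs[rt]
    offset = sign_extend16(ir)      # IR[15-0], sign-extended
    # PC already holds PC + 4 from the fetch step.
    state["ALUOut"] = (state["PC"] + (offset << 2)) & MASK32


if __name__ == "__main__":
    regs = [0] * 32
    regs[10], regs[11] = 7, 9       # $t2 and $t3
    state = {"PC": 0x104, "IR": 0x114BFFFF}  # beq $t2, $t3, offset -1
    decode(state, regs)
    print(state["A"], state["B"], hex(state["ALUOut"]))
```

The registers are read and the target is computed for every instruction, whether or not it turns out to be a branch; the results are simply ignored when they are not needed.
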
Datapath Activity During Instruction Decode
[Figure: the complete multicycle datapath with control, as used during the instruction decode step.]

Decode Control Signals Settings
[Figure: control state diagram, Start -> Instr Fetch -> Decode]
Unless otherwise assigned: PCWrite, IRWrite, MemWrite, RegWrite = 0; others = X
Instr Fetch: IorD=0, MemRead, IRWrite, ALUSrcA=0, ALUSrcB=01, PCSource, ALUOp=00, PCWrite

Step 3 (instruction dependent)
- ALU is performing one of four functions, based on instruction type
- Memory reference (lw and sw):
    ALUOut = A + sign-extend(IR[15-0]);
- R-type:
    ALUOut = A op B;
- Branch:
    if (A==B) PC = ALUOut;
- Jump:
    PC = PC[31-28] || (IR[25-0] << 2);

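The four cases of the execute step can be written as one dispatch on the opcode. The sketch below is illustrative only: the opcode values are the standard MIPS encodings, while the `execute` function, the `state` dictionary, and the small funct subset (add, sub, and, or) are assumptions chosen to keep the example short.

```python
MASK32 = 0xFFFFFFFF
OP_RTYPE, OP_J, OP_BEQ, OP_LW, OP_SW = 0x00, 0x02, 0x04, 0x23, 0x2B

# A small, assumed subset of R-type funct codes.
ALU_FUNCS = {
    0x20: lambda a, b: a + b,   # add
    0x22: lambda a, b: a - b,   # sub
    0x24: lambda a, b: a & b,   # and
    0x25: lambda a, b: a | b,   # or
}


def sign_extend16(value: int) -> int:
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def execute(state: dict) -> None:
    """Step 3: one ALU operation, chosen by instruction type."""
    ir = state["IR"]
    op = (ir >> 26) & 0x3F
    a, b = state["A"], state["B"]
    if op in (OP_LW, OP_SW):
        # Memory reference: ALUOut = A + sign-extend(IR[15-0])
        state["ALUOut"] = (a + sign_extend16(ir)) & MASK32
    elif op == OP_RTYPE:
        # R-type: ALUOut = A op B, with op selected by the funct field
        state["ALUOut"] = ALU_FUNCS[ir & 0x3F](a, b) & MASK32
    elif op == OP_BEQ:
        # Branch: ALUOut still holds the target computed during decode
        if a == b:
            state["PC"] = state["ALUOut"]
    elif op == OP_J:
        # Jump: PC = PC[31-28] || (IR[25-0] << 2)
        state["PC"] = (state["PC"] & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)
    else:
        raise ValueError(f"unsupported opcode {op:#04x}")


if __name__ == "__main__":
    state = {"PC": 0x104, "IR": 0x014B6820, "A": 7, "B": 9, "ALUOut": 0}
    execute(state)                  # add $t5, $t2, $t3
    print(state["ALUOut"])
```

Branches and jumps finish here because the only state they change is the PC, which is why they take three cycles.
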
Datapath Activity During Instruction Execute
[Figure: the complete multicycle datapath with control, as used during the execute step.]

Execute Control Signals Settings
[Figure: control state diagram, Start -> Instr Fetch -> Decode -> Execute]
Unless otherwise assigned: PCWrite, IRWrite, MemWrite, RegWrite = 0; others = X
Instr Fetch: IorD=0, MemRead, IRWrite, ALUSrcA=0, ALUSrcB=01, PCSource, ALUOp=00, PCWrite
Decode: ALUSrcA=0, ALUSrcB=11, ALUOp=00, PCWriteCond=0
Decode -> Execute, one arc per instruction type: (Op = lw or sw), (Op = R-type), (Op = beq), (Op = j)

Step 4 (instruction dependent)
- Memory reference:
    MDR = Memory[ALUOut];   -- lw
      or
    Memory[ALUOut] = B;     -- sw
- R-type instruction completion
    Reg[IR[15-11]] = ALUOut;
- Remember, the register write actually takes place at the end of the cycle on the clock edge

Datapath Activity During Memory Access
[Figure: the complete multicycle datapath with control, as used during the memory access step.]

Memory Access Control Signals Settings
[Figure: control state diagram, Start -> Instr Fetch -> Decode -> Execute -> Memory Access]
Unless otherwise assigned: PCWrite, IRWrite, MemWrite, RegWrite = 0; others = X
Instr Fetch: IorD=0, MemRead, IRWrite, ALUSrcA=0, ALUSrcB=01, PCSource, ALUOp=00, PCWrite
Decode: ALUSrcA=0, ALUSrcB=11, ALUOp=00, PCWriteCond=0
Execute, (Op = lw or sw): ALUSrcA=1, ALUSrcB=10, ALUOp=00, PCWriteCond=0
Execute, (Op = R-type): ALUSrcA=1, ALUSrcB=00, ALUOp=10, PCWriteCond=0
Execute, (Op = beq): ALUSrcA=1, ALUSrcB=00, ALUOp=01, PCSource=01, PCWriteCond
Execute, (Op = j): PCSource=10, PCWrite
Execute -> Memory Access, one arc each for (Op = lw) and (Op = sw)

Step 5: Memory Read Completion (Write Back)
- All we have left is to write the data just read from memory back into the register file for the lw instruction
    Reg[IR[20-16]] = MDR;
What about all the other instructions?

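Steps 4 and 5 can be sketched in the same style as the RTL. The function names, the `state` dictionary, the `regs` list, and the dictionary-based `memory` are assumptions for illustration; the sketch also does not model register $zero being hard-wired to 0.

```python
OP_RTYPE, OP_LW, OP_SW = 0x00, 0x23, 0x2B


def memory_access(state: dict, regs: list, memory: dict) -> None:
    """Step 4: memory read, memory write, or R-type completion."""
    ir = state["IR"]
    op = (ir >> 26) & 0x3F
    if op == OP_LW:
        state["MDR"] = memory[state["ALUOut"]]      # MDR = Memory[ALUOut]
    elif op == OP_SW:
        memory[state["ALUOut"]] = state["B"]        # Memory[ALUOut] = B
    elif op == OP_RTYPE:
        rd = (ir >> 11) & 0x1F                      # IR[15-11]
        regs[rd] = state["ALUOut"]                  # Reg[rd] = ALUOut
    else:
        raise ValueError("instruction has no fourth step")


def write_back(state: dict, regs: list) -> None:
    """Step 5 (lw only): Reg[IR[20-16]] = MDR."""
    rt = (state["IR"] >> 16) & 0x1F
    regs[rt] = state["MDR"]


if __name__ == "__main__":
    regs = [0] * 32
    memory = {0x1000: 42}
    state = {"IR": 0x8D6A0000, "ALUOut": 0x1000, "B": 0, "MDR": 0}
    memory_access(state, regs, memory)   # lw $t2, 0($t3): fills MDR
    write_back(state, regs)              # then writes $t2
    print(regs[10])
```

Only lw reaches `write_back`; sw and R-type instructions finish in step 4, and branches and jumps have already finished in step 3.
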
Datapath Activity During Write Back
[Figure: the complete multicycle datapath with control, as used during the write back step.]

Write Back Control Signals Settings
[Figure: control state diagram, Start -> Instr Fetch -> Decode -> Execute -> Memory Access -> Write Back]
Unless otherwise assigned: PCWrite, IRWrite, MemWrite, RegWrite = 0; others = X
Instr Fetch: IorD=0, MemRead, IRWrite, ALUSrcA=0, ALUSrcB=01, PCSource, ALUOp=00, PCWrite
Decode: ALUSrcA=0, ALUSrcB=11, ALUOp=00, PCWriteCond=0
Execute, (Op = lw or sw): ALUSrcA=1, ALUSrcB=10, ALUOp=00, PCWriteCond=0
Execute, (Op = R-type): ALUSrcA=1, ALUSrcB=00, ALUOp=10, PCWriteCond=0
Execute, (Op = beq): ALUSrcA=1, ALUSrcB=00, ALUOp=01, PCSource=01, PCWriteCond
Execute, (Op = j): PCSource=10, PCWrite
Memory Access, (Op = lw): MemRead, IorD=1, PCWriteCond=0
Memory Access, (Op = sw): MemWrite, IorD=1, PCWriteCond=0
Memory Access, R-type completion: RegDst=1, RegWrite, MemtoReg=0, PCWriteCond=0
Memory Access, (Op = lw) -> Write Back

RTL Summary
Instr fetch (all instructions):
    IR = Memory[PC];
    PC = PC + 4;
Decode (all instructions):
    A = Reg[IR[25-21]];
    B = Reg[IR[20-16]];
    ALUOut = PC + (sign-extend(IR[15-0]) << 2);
Execute:
    R-type:   ALUOut = A op B;
    Mem Ref:  ALUOut = A + sign-extend(IR[15-0]);
    Branch:   if (A==B) PC = ALUOut;
    Jump:     PC = PC[31-28] || (IR[25-0] << 2);
Memory access:
    R-type:   Reg[IR[15-11]] = ALUOut;
    Mem Ref:  MDR = Memory[ALUOut]; or Memory[ALUOut] = B;
Writeback:
    Mem Ref:  Reg[IR[20-16]] = MDR;

Simple Questions
- How many cycles will it take to execute this code?
            lw   $t2, 0($t3)
            lw   $t3, 4($t3)
            beq  $t2, $t3, Label   #assume not
            add  $t5, $t2, $t3
            sw   $t5, 8($t3)
    Label:  ...
- What is going on during the 8th cycle of execution?
- In what cycle does the actual addition of $t2 and $t3 take place?
- In what cycle is the branch target address calculated?

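One way to check answers to questions like these is to lay the instructions out on a cycle timeline. The sketch below assumes the branch falls through, so all five instructions execute in order; the names `CYCLES`, `PROGRAM`, `timeline`, and `step_in_cycle` are illustrative.

```python
# Cycles per instruction, taken from the number of steps each one needs.
CYCLES = {"lw": 5, "sw": 4, "add": 4, "beq": 3, "j": 3}

# The code sequence from the question, assuming the branch falls through.
PROGRAM = ["lw", "lw", "beq", "add", "sw"]


def timeline(program):
    """Return (mnemonic, first_cycle, last_cycle) per instruction, 1-based."""
    result = []
    start = 1
    for mnemonic in program:
        end = start + CYCLES[mnemonic] - 1
        result.append((mnemonic, start, end))
        start = end + 1
    return result


def step_in_cycle(program, cycle):
    """Return (instruction index, step number) that is active in a cycle."""
    for index, (_, first, last) in enumerate(timeline(program)):
        if first <= cycle <= last:
            return index, cycle - first + 1
    raise ValueError("cycle is outside the program")


if __name__ == "__main__":
    for entry in timeline(PROGRAM):
        print(entry)
    print("total cycles:", timeline(PROGRAM)[-1][2])
    print("cycle 8:", step_in_cycle(PROGRAM, 8))
```

Under these assumptions the sequence takes 5 + 5 + 3 + 4 + 4 = 21 cycles, and cycle 8 is step 3 of the second lw, its address computation.
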
Multicycle Control
- Multicycle datapath control signals are not determined solely by the bits in the instruction
  - e.g., op code bits tell what operation the ALU should be doing, but not what instruction cycle is to be done next
- We’ll use a finite state machine for control
  - a set of states (current state stored in State Register)
  - next state function (determined by current state and the input)
  - output function (determined by current state and the input)
- We’ll use a Moore machine (so control signals based only on current state)
[Figure: combinational control logic takes the Inst Opcode and the State Reg contents as inputs and drives the datapath control points and the Next State.]

Finite State Machine Implementation
[Figure: combinational control logic. Inputs: Op5 through Op0 (Inst[31-26]) and the current state from State Reg, which is clocked by the System Clock. Outputs: PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource, ALUOp, ALUSourceB, ALUSourceA, RegWrite, RegDst, and Next State.]

Datapath Control Outputs Truth Table
Input values are Current State[3-0]; blank cells are not filled in.

Outputs      0000  0001  0010  0011  0100  0101  0110  0111  1000  1001
PCWriteCond
IorD         0           X     1     X     1     X     X     X     X
MemRead      1           X     1     X     X     X     X     X     X
MemWrite     0           0     0     0     1     0     0     0     0
IRWrite      1           0     0     0     0     0     0     0     0
MemtoReg     X           X     X     1     X     X     0     X     X
PCSource     00          XX    XX    XX    XX    XX    XX    01    10
ALUOp        00          00    XX    XX    XX    10    XX    01    XX
ALUSrcB      01          10    XX    XX    XX    00    XX    00    XX
ALUSrcA      0           1     X     X     X     1     X     1     X
RegWrite     0           0     0     1     0     0     1     0     0
RegDst       X           X     X     0     X     X     1     X     X

Next State Truth Table
Rows are Inst[31-26] (Op[5-0]); columns are Current State[3-0].

Inst[31-26]      0000  0001  0010  0011  0100  0101  0110  0111  1000  1001
000000 (R-type)  0001  0110  XXXX  XXXX  XXXX  XXXX  0111  0000  XXXX  XXXX
000010 (jmp)     0001  1001  XXXX  XXXX  XXXX  XXXX  XXXX  XXXX  XXXX  0000
000100 (beq)     0001  1000  XXXX  XXXX  XXXX  XXXX  XXXX  XXXX  0000  XXXX
100011 (lw)      0001  0010  0011  0100  0000  XXXX  XXXX  XXXX  XXXX  XXXX
101011 (sw)      0001  0010  0101  XXXX  XXXX  0000  XXXX  XXXX  XXXX  XXXX
Any other        0001  illegal

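The next-state function can also be written as a small lookup. This is a sketch under assumptions: the state names are labels invented here for state codes 0000 through 1001, and the don't-care (XXXX) entries are treated as errors rather than modeled.

```python
# Assumed names for state codes 0000 through 1001.
(FETCH, DECODE, MEM_ADDR, MEM_READ, LW_WB,
 MEM_WRITE, R_EXEC, R_DONE, BEQ, JUMP) = range(10)

OP_RTYPE, OP_J, OP_BEQ, OP_LW, OP_SW = 0b000000, 0b000010, 0b000100, 0b100011, 0b101011

# Dispatch out of the decode state, keyed by opcode.
DISPATCH = {OP_RTYPE: R_EXEC, OP_J: JUMP, OP_BEQ: BEQ,
            OP_LW: MEM_ADDR, OP_SW: MEM_ADDR}


def next_state(state: int, opcode: int) -> int:
    """Next-state function: depends on the current state and the opcode."""
    if state == FETCH:
        return DECODE
    if state == DECODE:
        if opcode not in DISPATCH:
            raise ValueError("illegal opcode")
        return DISPATCH[opcode]
    if state == MEM_ADDR:
        if opcode == OP_LW:
            return MEM_READ
        if opcode == OP_SW:
            return MEM_WRITE
        raise ValueError("don't-care entry reached")
    if state == MEM_READ:
        return LW_WB
    if state == R_EXEC:
        return R_DONE
    # LW_WB, MEM_WRITE, R_DONE, BEQ and JUMP all return to fetch.
    return FETCH


def states_visited(opcode: int) -> list:
    """List the states one instruction passes through, starting at fetch."""
    path = [FETCH]
    state = next_state(FETCH, opcode)
    while state != FETCH:
        path.append(state)
        state = next_state(state, opcode)
    return path


if __name__ == "__main__":
    print(states_visited(OP_LW))
```

The length of each path matches the cycle count of the instruction, for example five states for lw and three for beq.
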