CS 35101 Computer Architecture Week 11 Multi Cycle
- Slides: 39
CS 35101 - Computer Architecture
Week 11: Multi Cycle Implementation
Paul Durand (www.cs.kent.edu/~durand)
[Adapted from M Irwin (www.cse.psu.edu/~mji)]
[Adapted from COD, Patterson & Hennessy, © 2005, UCB]
Head's Up
- This week's material
  - Multicycle MIPS datapath implementation
  - Control via Finite State Machine
    - Reading assignment: PH 5.5 and C.3
- Reminders
  - HW#6 is due ?
- Next week's material
  - Microprogramming
    - Reading assignment: PH 5.5, Appendix C.4-C.6
  - Introduction to pipelining
    - Reading assignment: PH 6.1
Review: Single Cycle Data and Control Path
[Figure: single-cycle MIPS datapath with control. Shows the PC, instruction memory, register file, sign extend, ALU with ALU control, data memory, the PC+4 adder, the branch target adder, and two shift-left-2 units, driven by the control signals RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite, and PCSrc.]
Disadvantages of the Single Cycle Datapath
- Uses the clock cycle inefficiently: the clock cycle must be timed to accommodate the slowest instruction
  - especially problematic for more complex instructions like floating point multiply
- Is wasteful of area, because some functional units must be duplicated when they cannot be "shared" during an instruction execution
  - e.g., need separate adders to do PC update and branch target address calculations, as well as an ALU to do R-type arithmetic/logic operations and data memory address calculations
Multicycle Implementation Overview
- Each step in the execution takes 1 clock cycle
- An instruction takes more than 1 clock cycle to complete
- Not every instruction takes the same number of clock cycles to complete
- Multicycle implementations allow
  - functional units to be used more than once per instruction, as long as they are used on different clock cycles; as a result
    - only need one memory
    - need only one ALU/adder
  - faster clock rates
  - different instructions to take a different number of clock cycles
The Multicycle Datapath: A High Level View
[Figure: high-level multicycle datapath. Shows the PC, a single memory (instructions or data), IR, MDR, the register file, the A and B registers, and ALUout.]
- Registers have to be added after every major functional unit to hold the output value until it is used in a subsequent clock cycle
Clocking the Multicycle Datapath
[Figure: the high-level multicycle datapath (PC, memory, IR, MDR, register file, A, B, ALUout) annotated with the system clock, one clock cycle, and the MemWrite and RegWrite control signals.]
Multicycle Approach
- Break up the instructions into steps where each step takes a cycle while trying to
  - balance the amount of work to be done in each step
  - restrict each cycle to use only one major functional unit
- At the end of a cycle
  - Store values needed in a later cycle by the current instruction in a state element (internal register) not visible to the programmer
    - IR: Instruction Register
    - MDR: Memory Data Register
    - A and B: register file read data registers
    - ALUout: ALU output register
  - All (except IR) hold data only between a pair of adjacent clock cycles (so don't need a write control signal)
  - Data used by subsequent instructions are stored in programmer visible state elements (i.e., register file, PC, or memory)
The Complete Multicycle Datapath with Control
[Figure: complete multicycle datapath. Shows the PC, memory, IR, MDR, register file, A, B, sign extend, two shift-left-2 units, the ALU with ALU control, and ALUout, with multiplexors on the memory address, register write address, register write data, both ALU inputs, and the PC input. The control unit decodes Instr[31-26] and drives PCWrite, PCWriteCond, IorD, MemRead, MemWrite, MemtoReg, IRWrite, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, and RegDst.]
Review: ALU Control
- Controlling the ALU makes use of multiple levels of decoding
  - main control unit generates the ALUOp bits
  - ALU control unit generates ALU control inputs

Instr op   funct    ALUOp   desired action   ALU control input
lw         xxxxxx   00      add              010
sw         xxxxxx   00      add              010
beq        xxxxxx   01      subtract         110
add        100000   10      add              010
subt       100010   10      subtract         110
and        100100   10      and              000
or         100101   10      or               001
slt        101010   10      slt              111
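The two-level decode in the table can be sketched as a small lookup. This is a minimal illustration rather than anything from the slides: the names `FUNCT_TO_ALU` and `alu_control` are made up for the example, and only the five R-type operations listed in the table are covered.

```python
# Two-level ALU control decode, following the ALU control table.
# FUNCT_TO_ALU and alu_control are illustrative names, not from the slides.

FUNCT_TO_ALU = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # subtract
    0b100100: 0b000,  # and
    0b100101: 0b001,  # or
    0b101010: 0b111,  # slt
}


def alu_control(alu_op, funct=0):
    """Return the 3-bit ALU control input for a 2-bit ALUOp and 6-bit funct."""
    if alu_op == 0b00:   # lw / sw: add base register and offset
        return 0b010
    if alu_op == 0b01:   # beq: compare by subtracting
        return 0b110
    if alu_op == 0b10:   # R-type: the funct field selects the operation
        return FUNCT_TO_ALU[funct]
    raise ValueError(f"unsupported ALUOp {alu_op:02b}")
```

For ALUOp 00 and 01 the funct bits are don't-cares, which is why the function ignores them in those branches.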
Multicycle Approach, con't
- Reading/writing to
  - any of the internal registers or the PC occurs (quickly) at the end of a clock cycle
  - the register file takes ~50% of a clock cycle since it has additional control and access overhead (reading can be done in parallel with decode)
- Have to add multiplexors in front of several of the functional unit inputs because the functional units are shared by different instruction cycles
- All operations occurring in one step occur in parallel within the same clock cycle
  - This limits us to one ALU operation, one memory access, and one register file access per step (per clock cycle)
Five Instruction Steps
1. Instruction Fetch
2. Instruction Decode and Register Fetch
3. R-type Instruction Execution, Memory Read/Write Address Computation, Branch Completion, or Jump Completion
4. Memory Read Access, Memory Write Completion, or R-type Instruction Completion
5. Memory Read Completion (Write Back)
INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!
Step 1: Instruction Fetch
- Use PC to get instruction from the memory and put it in the Instruction Register
- Increment the PC by 4 and put the result back in the PC
- Can be described succinctly using RTL ("Register Transfer Language"):
    IR = Memory[PC];
    PC = PC + 4;
- Can we figure out the values of the control signals?
- What is the advantage of updating the PC now?
Datapath Activity During Instruction Fetch
[Figure: the complete multicycle datapath with the paths active during instruction fetch highlighted; ALUOp = 00.]
Fetch Control Signals Settings
[Figure: first state of the control finite state machine.]
- Unless otherwise assigned: PCWrite, IRWrite, MemWrite, RegWrite = 0; others = X
- Start -> Instr Fetch: IorD=0, MemRead, IRWrite, ALUSrcA=0, ALUSrcB=01, PCSource=00, ALUOp=00, PCWrite
Step 2: Instruction Decode and Register Fetch
- Don't know what the instruction is yet, so can only
  - Read registers rs and rt in case we need them
  - Compute the branch address in case the instruction is a branch
- RTL:
    A = Reg[IR[25-21]];
    B = Reg[IR[20-16]];
    ALUOut = PC + (sign-extend(IR[15-0]) << 2);
- Note we aren't setting any control lines based on the instruction (since we are busy "decoding" it in our control logic)
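The branch target arithmetic in the decode step can be checked with a short sketch. The helper names `sign_extend16` and `branch_target` are illustrative assumptions; the PC passed in is the value after step 1 has already added 4.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc, instr):
    """ALUOut = PC + (sign-extend(IR[15-0]) << 2), truncated to 32 bits.

    pc is the already-incremented PC (the fetch step added 4).
    """
    offset = sign_extend16(instr) << 2
    return (pc + offset) & 0xFFFFFFFF
```

With the incremented PC at 0x00400004, an immediate of 3 gives 0x00400010 and an immediate of 0xFFFF (that is, -1) gives 0x00400000.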
Datapath Activity During Instruction Decode
[Figure: the complete multicycle datapath with the paths active during instruction decode and register fetch highlighted; ALUOp = 00.]
Decode Control Signals Settings
[Figure: control finite state machine extended with the Decode state.]
- Instr Fetch -> Decode: ALUSrcA=0, ALUSrcB=11, ALUOp=00, PCWriteCond=0
Step 3 (instruction dependent)
- ALU is performing one of four functions, based on instruction type
- Memory reference (lw and sw):
    ALUOut = A + sign-extend(IR[15-0]);
- R-type:
    ALUOut = A op B;
- Branch:
    if (A==B) PC = ALUOut;
- Jump:
    PC = PC[31-28] || (IR[25-0] << 2);
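The jump case concatenates bit fields rather than adding, which is easy to get wrong. A minimal sketch, with `jump_target` as an illustrative name, assuming the PC passed in is the incremented PC:

```python
def jump_target(pc, instr):
    """PC = PC[31-28] || (IR[25-0] << 2)."""
    upper = pc & 0xF0000000              # keep the top 4 bits of the PC
    lower = (instr & 0x03FFFFFF) << 2    # 26-bit field becomes a 28-bit byte address
    return upper | lower
```

For example, with the PC at 0x90000008 and a target field of 1, the result is 0x90000004: the upper four bits come from the PC and the rest from the instruction.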
Datapath Activity During lw & sw Execute
[Figure: the complete multicycle datapath with the paths active during memory address computation highlighted; ALUOp = 00.]
Datapath Activity During R-type Execute
[Figure: the complete multicycle datapath with the paths active during R-type execution highlighted; ALUOp = 10.]
Datapath Activity During beq Execute
[Figure: the complete multicycle datapath with the paths active during branch completion highlighted; ALUOp = 01.]
Datapath Activity During j Execute
[Figure: the complete multicycle datapath with the paths active during jump completion highlighted.]
Execute Control Signals Settings
[Figure: control finite state machine extended with the four Execute states that follow Decode.]
- Decode -> (Op = lw or sw): ALUSrcA=1, ALUSrcB=10, ALUOp=00, PCWriteCond=0
- Decode -> (Op = R-type): ALUSrcA=1, ALUSrcB=00, ALUOp=10, PCWriteCond=0
- Decode -> (Op = beq): ALUSrcA=1, ALUSrcB=00, ALUOp=01, PCSource=01, PCWriteCond
- Decode -> (Op = j): PCSource=10, PCWrite
Step 4 (instruction dependent)
- Memory reference:
    MDR = Memory[ALUOut];    -- lw
  or
    Memory[ALUOut] = B;      -- sw
- R-type instruction completion:
    Reg[IR[15-11]] = ALUOut;
- Remember, the register write actually takes place at the end of the cycle on the clock edge
Datapath Activity During lw Memory Access
[Figure: the complete multicycle datapath with the paths active during the lw memory read highlighted.]
Datapath Activity During sw Memory Access
[Figure: the complete multicycle datapath with the paths active during the sw memory write highlighted.]
Datapath Activity During R-type Memory Access
[Figure: the complete multicycle datapath with the paths active during R-type completion (register file write) highlighted.]
Memory Access Control Signals Settings
[Figure: control finite state machine extended with the Memory Access states.]
- lw/sw Execute -> (Op = lw): MemRead, IorD=1, PCWriteCond=0
- lw/sw Execute -> (Op = sw): MemWrite, IorD=1, PCWriteCond=0
- R-type Execute -> R-type completion: RegDst=1, RegWrite, MemtoReg=0, PCWriteCond=0
Step 5: Memory Read Completion (Write Back)
- All we have left is to write the data just read from memory back into the register file for the lw instruction:
    Reg[IR[20-16]] = MDR;
- What about all the other instructions?
Datapath Activity During lw Write Back
[Figure: the complete multicycle datapath with the paths active during the lw write back (MDR to register file) highlighted.]
Write Back Control Signals Settings
[Figure: control finite state machine extended with the Write Back state.]
- lw Memory Access -> Write Back: RegDst=0, RegWrite, MemtoReg=1, PCWriteCond=0
RTL Summary
- Instr fetch (all instructions):
    IR = Memory[PC];
    PC = PC + 4;
- Decode (all instructions):
    A = Reg[IR[25-21]];
    B = Reg[IR[20-16]];
    ALUOut = PC + (sign-extend(IR[15-0]) << 2);
- Execute:
  - R-type:  ALUOut = A op B;
  - Mem Ref: ALUOut = A + sign-extend(IR[15-0]);
  - Branch:  if (A==B) PC = ALUOut;
  - Jump:    PC = PC[31-28] || (IR[25-0] << 2);
- Memory access:
  - R-type:  Reg[IR[15-11]] = ALUOut;
  - Mem Ref: MDR = Memory[ALUOut]; or Memory[ALUOut] = B;
- Writeback:
  - Mem Ref: Reg[IR[20-16]] = MDR;
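To see the RTL steps run end to end, here is a hedged behavioural sketch that pushes one instruction through the steps and reports how many cycles it used. Everything about the interface is an assumption made for the example: `run_instruction`, the list-of-32-ints register file, and the dictionary memory keyed by word-aligned byte address. Only add and sub are modelled for R-type, and register 0 is not forced to zero.

```python
def sign_extend16(value):
    """Interpret the low 16 bits as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def run_instruction(pc, regs, mem):
    """Run one instruction through the multicycle steps.

    regs is a list of 32 register values; mem maps word-aligned byte
    addresses to 32-bit words. Returns (new_pc, cycles_used).
    """
    mask = 0xFFFFFFFF
    # Step 1: instruction fetch
    ir = mem[pc]
    pc = (pc + 4) & mask
    # Step 2: decode and register fetch (done for every instruction)
    op = ir >> 26
    rs, rt, rd = (ir >> 21) & 31, (ir >> 16) & 31, (ir >> 11) & 31
    a, b = regs[rs], regs[rt]
    alu_out = (pc + (sign_extend16(ir) << 2)) & mask  # branch target, just in case
    # Step 3: execute, address computation, or branch/jump completion
    if op == 0b000010:                                # j
        return (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2), 3
    if op == 0b000100:                                # beq
        return (alu_out if a == b else pc), 3
    if op == 0b000000:                                # R-type
        funct = ir & 0x3F
        if funct == 0b100000:
            alu_out = (a + b) & mask                  # add
        elif funct == 0b100010:
            alu_out = (a - b) & mask                  # sub
        else:
            raise ValueError("only add and sub are modelled in this sketch")
        regs[rd] = alu_out                            # Step 4: R-type completion
        return pc, 4
    if op not in (0b100011, 0b101011):
        raise ValueError("unsupported opcode")
    alu_out = (a + sign_extend16(ir)) & mask          # lw/sw effective address
    if op == 0b101011:                                # sw
        mem[alu_out] = b                              # Step 4: memory write
        return pc, 4
    mdr = mem[alu_out]                                # Step 4: memory read (lw)
    regs[rt] = mdr                                    # Step 5: write back
    return pc, 5
```

The returned cycle counts (3 for beq and j, 4 for R-type and sw, 5 for lw) fall out of where each instruction's last RTL statement sits in the summary.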
Simple Questions
- How many cycles will it take to execute this code?
          lw   $t2, 0($t3)         5
          lw   $t3, 4($t3)         5
          beq  $t2, $t3, Label     3    #assume not
          add  $t5, $t2, $t3       4
          sw   $t5, 8($t3)         4
  Label:  ...
  Answer: 5 + 5 + 3 + 4 + 4 = 21 cycles
- What is going on during the 8th cycle of execution?
  Answer: the address for the second lw is being calculated
- In what cycle does the actual addition of $t2 and $t3 take place?
  Answer: the 16th cycle
- In what cycle is the branch target address calculated?
  Answer: the 12th cycle
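The answers can be reproduced by laying the instructions end to end and asking which instruction and step a given cycle falls in. This is a small sketch under the per-instruction cycle counts from the slides; `CYCLES`, `total_cycles`, and `locate` are illustrative names.

```python
# Cycles per instruction in the multicycle implementation.
CYCLES = {"lw": 5, "sw": 4, "add": 4, "beq": 3, "j": 3}

PROGRAM = ["lw", "lw", "beq", "add", "sw"]


def total_cycles(program):
    """Total cycle count when instructions run back to back."""
    return sum(CYCLES[name] for name in program)


def locate(program, cycle):
    """Return (instruction index, mnemonic, step number) for a 1-indexed cycle."""
    start = 1
    for index, name in enumerate(program):
        end = start + CYCLES[name] - 1
        if start <= cycle <= end:
            return index, name, cycle - start + 1
        start = end + 1
    raise ValueError("cycle is past the end of the program")
```

Here `locate(PROGRAM, 8)` lands in step 3 of the second lw (address computation), `locate(PROGRAM, 12)` in step 2 of beq (where the branch target is computed), and `locate(PROGRAM, 16)` in step 3 of add.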
Multicycle Control
- Multicycle datapath control signals are not determined solely by the bits in the instruction
  - e.g., op code bits tell what operation the ALU should be doing, but not what instruction cycle is to be done next
- We'll use a finite state machine for control
  - a set of states (current state stored in State Register)
  - next state function (determined by current state and the input)
  - output function (determined by current state and the input)
- We'll use a Moore machine (so control signals based only on current state)
[Figure: combinational control logic takes the instruction opcode and the current state from the State Reg as inputs, and produces the datapath control points and the Next State.]
Multicycle Datapath Finite State Machine
[Figure: complete control finite state machine with numbered states.]
- Unless otherwise assigned: PCWrite, IRWrite, MemWrite, RegWrite = 0; others = X
- State 0, Instr Fetch (from Start): IorD=0, MemRead, IRWrite, ALUSrcA=0, ALUSrcB=01, PCSource=00, ALUOp=00, PCWrite
- State 1, Decode: ALUSrcA=0, ALUSrcB=11, ALUOp=00, PCWriteCond=0
- State 2, Execute (Op = lw or sw): ALUSrcA=1, ALUSrcB=10, ALUOp=00, PCWriteCond=0
- State 6, Execute (Op = R-type): ALUSrcA=1, ALUSrcB=00, ALUOp=10, PCWriteCond=0
- State 8, Execute (Op = beq): ALUSrcA=1, ALUSrcB=00, ALUOp=01, PCSource=01, PCWriteCond
- State 9, Execute (Op = j): PCSource=10, PCWrite
- State 3, Memory Access (Op = lw): MemRead, IorD=1, PCWriteCond=0
- State 5, Memory Access (Op = sw): MemWrite, IorD=1, PCWriteCond=0
- State 7, R-type completion: RegDst=1, RegWrite, MemtoReg=0, PCWriteCond=0
- State 4, Write Back: RegDst=0, RegWrite, MemtoReg=1, PCWriteCond=0
- States 4, 5, 7, 8, and 9 return to state 0
Finite State Machine Implementation
[Figure: combinational control logic. Inputs: Op5-Op0 (Inst[31-26]) and the current state from the State Reg, which is clocked by the system clock. Outputs: PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource, ALUOp, ALUSourceB, ALUSourceA, RegWrite, RegDst, and the Next State.]
Datapath Control Outputs Truth Table
Input values are the Current State[3-0].

Outputs       0000  0001  0010  0011  0100  0101  0110  0111  1000  1001
PCWrite       1     0     0     0     0     0     0     0     0     1
PCWriteCond   X     0     0     0     0     0     0     0     1     X
IorD          0     X     X     1     X     1     X     X     X     X
MemRead       1     X     X     1     X     X     X     X     X     X
MemWrite      0     0     0     0     0     1     0     0     0     0
IRWrite       1     0     0     0     0     0     0     0     0     0
MemtoReg      X     X     X     X     1     X     X     0     X     X
PCSource      00    XX    XX    XX    XX    XX    XX    XX    01    10
ALUOp         00    00    00    XX    XX    XX    10    XX    01    XX
ALUSrcB       01    11    10    XX    XX    XX    00    XX    00    XX
ALUSrcA       0     0     1     X     X     X     1     X     1     X
RegWrite      0     0     0     0     1     0     0     1     0     0
RegDst        X     X     X     X     0     X     X     1     X     X
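Because this is a Moore machine, the output function is just a lookup on the current state. A minimal sketch of that lookup follows; the names `CONTROL`, `WRITE_ENABLES`, and `control_outputs` are illustrative, signals left out of a state's entry are treated as don't-cares, and the write enables are driven to 0 whenever they are not asserted (one valid way to resolve the X entries for PCWriteCond).

```python
# Signals explicitly assigned in each state (state number -> settings).
CONTROL = {
    0: dict(PCWrite=1, IorD=0, MemRead=1, IRWrite=1,
            PCSource=0b00, ALUOp=0b00, ALUSrcB=0b01, ALUSrcA=0),
    1: dict(ALUOp=0b00, ALUSrcB=0b11, ALUSrcA=0),
    2: dict(ALUOp=0b00, ALUSrcB=0b10, ALUSrcA=1),
    3: dict(IorD=1, MemRead=1),
    4: dict(MemtoReg=1, RegWrite=1, RegDst=0),
    5: dict(IorD=1, MemWrite=1),
    6: dict(ALUOp=0b10, ALUSrcB=0b00, ALUSrcA=1),
    7: dict(MemtoReg=0, RegWrite=1, RegDst=1),
    8: dict(PCWriteCond=1, PCSource=0b01, ALUOp=0b01, ALUSrcB=0b00, ALUSrcA=1),
    9: dict(PCWrite=1, PCSource=0b10),
}

# Signals that change state elements must never float, so default them to 0.
WRITE_ENABLES = ("PCWrite", "PCWriteCond", "MemWrite", "IRWrite", "RegWrite")


def control_outputs(state):
    """Return the control signals for a state; absent keys are don't-cares."""
    outputs = {name: 0 for name in WRITE_ENABLES}
    outputs.update(CONTROL[state])
    return outputs
```

Note that the function takes only the state, never the opcode, which is exactly what distinguishes the Moore organisation from a Mealy one.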
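Next State Truth Table
Columns are Inst[31-26] (Op[5-0]); rows are Current State[3-0].

Current   000000    000010   000100   100011   101011   Any
State     (R-type)  (jmp)    (beq)    (lw)     (sw)     other
0000      0001      0001     0001     0001     0001     0001
0001      0110      1001     1000     0010     0010     illegal
0010      XXXX      XXXX     XXXX     0011     0101     illegal
0011      XXXX      XXXX     XXXX     0100     XXXX     illegal
0100      XXXX      XXXX     XXXX     0000     XXXX     XXXX
0101      XXXX      XXXX     XXXX     XXXX     0000     XXXX
0110      0111      XXXX     XXXX     XXXX     XXXX     XXXX
0111      0000      XXXX     XXXX     XXXX     XXXX     XXXX
1000      XXXX      XXXX     0000     XXXX     XXXX     XXXX
1001      XXXX      0000     XXXX     XXXX     XXXX     XXXX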
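The next-state function depends on the opcode only in states 1 and 2; everywhere else the successor is fixed. A hedged sketch, with `next_state` and `state_sequence` as illustrative names and with illegal or unreachable combinations reported as errors:

```python
R_TYPE, J, BEQ, LW, SW = 0b000000, 0b000010, 0b000100, 0b100011, 0b101011

AFTER_DECODE = {R_TYPE: 6, J: 9, BEQ: 8, LW: 2, SW: 2}   # out of state 1
AFTER_ADDRESS = {LW: 3, SW: 5}                           # out of state 2
FIXED = {0: 1, 3: 4, 6: 7, 4: 0, 5: 0, 7: 0, 8: 0, 9: 0}


def next_state(state, opcode):
    """Next state for the current state and the 6-bit opcode Inst[31-26]."""
    if state in FIXED:
        return FIXED[state]
    table = AFTER_DECODE if state == 1 else AFTER_ADDRESS if state == 2 else None
    if table is None or opcode not in table:
        raise ValueError(f"illegal state/opcode pair: {state}, {opcode:06b}")
    return table[opcode]


def state_sequence(opcode):
    """States visited by one instruction, starting at fetch (state 0)."""
    states = [0]
    while True:
        following = next_state(states[-1], opcode)
        if following == 0:
            return states
        states.append(following)
```

The length of each sequence matches the cycle counts quoted earlier: lw visits 0, 1, 2, 3, 4 (five cycles), sw and R-type visit four states, and beq and j visit three.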