Advanced Computer Architecture 5 MD 00 5 Z















![Datapath with Control](https://slidetodoc.com/presentation_image_h/13c4530712250d1e4f64b558cc9d09b8/image-16.jpg)

















![Step 4 (R-type or memory access)](https://slidetodoc.com/presentation_image_h/13c4530712250d1e4f64b558cc9d09b8/image-34.jpg)
![Write-back step](https://slidetodoc.com/presentation_image_h/13c4530712250d1e4f64b558cc9d09b8/image-35.jpg)














- Slides: 49
Advanced Computer Architecture 5MD00 / 5Z033
MIPS Design: data path and control
Henk Corporaal
www.ics.ele.tue.nl/~heco/courses/aca
TU Eindhoven, 2010
H. Corporaal, ACA

Topics
- Building a datapath
  - support a subset of the MIPS-I instruction set
- A single-cycle processor datapath
  - all instruction actions in one (long) cycle
- A multi-cycle processor datapath
  - each instruction takes multiple (shorter) cycles
- Exception support
- For details see this book (4th ed., ch. 4)

Datapath and Control
- Control: FSM or microprogramming
- Datapath: registers & memories, multiplexors, buses, ALUs

The Processor: Datapath & Control
- Simplified MIPS implementation to contain only:
  - memory-reference instructions: lw, sw
  - arithmetic-logical instructions: add, sub, and, or, slt
  - control flow instructions: beq, j
- Generic implementation:
  - use the program counter (PC) to supply the instruction address
  - get the instruction from memory
  - read registers
  - use the instruction to decide exactly what to do
- All instructions use the ALU after reading the registers. Why? (memory-reference?)

More Implementation Details
- Abstract / simplified view: [Figure: PC, instruction memory, registers, ALU, data memory]
- Two types of functional units:
  - elements that operate on data values (combinational)
  - elements that contain state (sequential)

Our Implementation
- An edge-triggered methodology
- Typical execution:
  - read contents of some state elements,
  - send values through some combinational logic,
  - write results to one or more state elements

[Figure: state element 1, combinational logic, state element 2, all within one clock cycle]

D flip-flop
- Output changes only on the clock edge

[Figure: D flip-flop built from two D latches, with inputs D and C and outputs Q and its complement]

Register File
- 3-ported: one write port, two read ports

[Figure: register file with inputs Read reg. #1, Read reg. #2, Write reg. #, Write data, Write; outputs Read data 1, Read data 2]

Register file: read ports
- Register file built using D flip-flops

[Figure: implementation of the read ports. Each read register number selects one of Register 0 ... Register n through a multiplexor to produce Read data 1 / Read data 2]

Register file: write port
- Note: we still use the real clock to determine when to write

[Figure: the register number drives an n-to-1 decoder; each decoder output, combined with Write, drives the C input of one of Register 0 ... Register n; Register data feeds every D input]

Simple Implementation
- Include the functional units we need for each instruction

[Figure: instruction memory, program counter, adder, registers (three 5-bit register numbers, RegWrite), ALU (3-bit ALU control, Zero, ALU result), data memory unit (MemRead, MemWrite), sign-extension unit (16 to 32 bits)]

Building the Datapath
- Use multiplexors to stitch them together

[Figure: single-cycle datapath with control signals PCSrc, RegWrite, ALUSrc, 3-bit ALU operation, MemWrite, MemRead, MemtoReg]

Our Simple Control Structure
- All of the logic is combinational
- We wait for everything to settle down, and the right thing to be done
  - ALU might not produce the "right answer" right away
  - we use write signals along with the clock to determine when to write
- Cycle time determined by length of the longest path

[Figure: state element 1, combinational logic, state element 2, all within one clock cycle]

We are ignoring some details like setup and hold times!

Control
- Selecting the operations to perform (ALU, read/write, etc.)
- Controlling the flow of data (multiplexor inputs)
- Information comes from the 32 bits of the instruction
- Example: add $8, $17, $18
- Instruction format:

| op | rs | rt | rd | shamt | funct |
|--------|-------|-------|-------|-------|--------|
| 000000 | 10001 | 10010 | 01000 | 00000 | 100000 |

- ALU's operation based on instruction type and function code

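The field layout above can be checked with a few shifts and masks. This is a minimal sketch, not part of the original slides; the helper names `encode_r_type` and `decode_fields` are made up for illustration.

```python
def encode_r_type(rs, rt, rd, shamt=0, funct=0b100000, op=0):
    """Pack the six R-format fields (6+5+5+5+5+6 bits) into a 32-bit word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def decode_fields(word):
    """Split a 32-bit instruction word back into its R-format fields."""
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


# add $8, $17, $18  ->  rd = 8, rs = 17, rt = 18, funct = 100000 (add)
word = encode_r_type(rs=17, rt=18, rd=8)
print(f"{word:032b}")
print(decode_fields(word))
```

Printing the word in binary makes it easy to compare against the six bit groups in the table.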
Control: 2-level implementation
- Control 2: opcode (instruction register bits 31-26, 6 bits) produces the 2-bit ALUop
  - 00: lw, sw
  - 01: beq
  - 10: add, sub, and, or, slt
- Control 1: funct (instruction register bits 5-0, 6 bits) together with ALUop produces the 3-bit ALUcontrol
  - 000: and
  - 001: or
  - 010: add
  - 110: sub
  - 111: set on less than

Datapath with Control

[Figure: single-cycle datapath. The Control unit decodes Instruction [31-26] and drives RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite; Instruction [25-21], [20-16] and [15-11] select registers; Instruction [15-0] is sign-extended; Instruction [5-0] feeds ALU control]

ALU Control 1
- What should the ALU do with this instruction?
- Example: lw $1, 100($2)

| op | rs | rt | 16-bit offset |
|----|----|----|---------------|
| 35 | 2  | 1  | 100           |

- ALU control input:
  - 000: AND
  - 001: OR
  - 010: add
  - 110: subtract
  - 111: set-on-less-than
- Why is the code for subtract 110 and not 011?

ALU Control 1
- Must describe hardware to compute the 3-bit ALU control input (outputs), given as inputs:
  - instruction type (ALUOp, the ALU operation class, computed from instruction type): 00 = lw, sw; 01 = beq; 10 = arithmetic
  - function code for arithmetic
- Describe it using a truth table (can turn into gates)

ALU Control 1
- Simple combinational logic (truth tables)

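The ALU control truth table can be written out as a small lookup. This is a sketch under the encodings listed on the slides (ALUOp 00/01/10 and the 3-bit ALU control codes); the funct values are the standard MIPS function codes, and the name `alu_control` is hypothetical.

```python
# Standard MIPS funct codes -> 3-bit ALU control input
FUNCT_TO_ALU = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # sub
    0b100100: 0b000,  # and
    0b100101: 0b001,  # or
    0b101010: 0b111,  # slt (set on less than)
}


def alu_control(alu_op, funct=0):
    """Return the 3-bit ALU control input for a 2-bit ALUOp and 6-bit funct."""
    if alu_op == 0b00:  # lw, sw: add base register and offset
        return 0b010
    if alu_op == 0b01:  # beq: subtract and test the Zero output
        return 0b110
    if alu_op == 0b10:  # arithmetic: decided by the function code
        return FUNCT_TO_ALU[funct]
    raise ValueError(f"unused ALUOp value: {alu_op:02b}")


print(f"{alu_control(0b10, 0b101010):03b}")  # slt
```

For ALUOp 00 and 01 the funct field is a don't care, which is why it can default to 0 here.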
Deriving Control 2 signals
- Input: 6 bits (the opcode)
- Output: 9 control signals
- Determine these control signals directly from the opcodes:
  - R-format: 0
  - lw: 35
  - sw: 43
  - beq: 4

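As a sketch of what "directly from the opcodes" means, the main control unit can be modelled as a table keyed by opcode. The signal names follow the datapath figure; the signal values are the usual textbook settings for this single-cycle design and are an assumption here, since the slide itself does not list them. `None` stands for a don't care.

```python
R_FORMAT, LW, SW, BEQ = 0, 35, 43, 4
X = None  # don't care

# The 9 outputs of Control 2 (ALUOp counts as two bits)
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
           "MemRead", "MemWrite", "Branch", "ALUOp1", "ALUOp0")

CONTROL_TABLE = {
    R_FORMAT: (1, 0, 0, 1, 0, 0, 0, 1, 0),
    LW:       (0, 1, 1, 1, 1, 0, 0, 0, 0),
    SW:       (X, 1, X, 0, 0, 1, 0, 0, 0),
    BEQ:      (X, 0, X, 0, 0, 0, 1, 0, 1),
}


def main_control(opcode):
    """Return the control signals for one opcode as a name -> value dict."""
    return dict(zip(SIGNALS, CONTROL_TABLE[opcode]))


print(main_control(LW))
```

Stores and branches never write a register, so RegDst and MemtoReg do not matter for them.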
Control 2
- PLA example implementation

Single Cycle Implementation
- Calculate cycle time assuming negligible delays except:
  - memory (2 ns), ALU and adders (2 ns), register file access (1 ns)

[Figure: single-cycle datapath with control signals PCSrc, RegWrite, RegDst, ALUSrc, ALUOp, MemWrite, MemRead, MemtoReg]

Single Cycle Implementation
- Memory (2 ns), ALU & adders (2 ns), reg. file access (1 ns)
- Fixed-length clock: the longest instruction is 'lw', which requires 8 ns
- Variable clock length (not realistic, just as an exercise):
  - R-instr: 6 ns
  - Load: 8 ns
  - Store: 7 ns
  - Branch: 5 ns
  - Jump: 2 ns
- Average depends on instruction mix (see pg 374)

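The per-class delays follow from adding up the units each instruction class passes through. The sketch below reproduces them from the three unit delays given on the slide; the instruction mix at the end is a hypothetical example chosen for illustration, not data from the course.

```python
MEM, ALU, REG = 2, 2, 1  # unit delays in ns, as given on the slide

# Functional units on the critical path of each instruction class
PATHS = {
    "R-type": [MEM, REG, ALU, REG],       # fetch, read regs, ALU, write reg
    "load":   [MEM, REG, ALU, MEM, REG],  # fetch, read regs, address, data, write reg
    "store":  [MEM, REG, ALU, MEM],       # fetch, read regs, address, data
    "branch": [MEM, REG, ALU],            # fetch, read regs, compare
    "jump":   [MEM],                      # fetch only
}

latency = {name: sum(units) for name, units in PATHS.items()}
fixed_clock = max(latency.values())  # single-cycle clock must fit the slowest class

# Hypothetical instruction mix in percent (an assumption for illustration)
MIX_PERCENT = {"R-type": 45, "load": 25, "store": 10, "branch": 15, "jump": 5}
average = sum(latency[name] * MIX_PERCENT[name] for name in latency) / 100

print(latency, fixed_clock, average)
```

With a fixed clock every instruction pays the load latency, which is the gap the variable-clock exercise is meant to expose.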
Where we are headed
- Single-cycle problems:
  - what if we had a more complicated instruction like floating point?
  - wasteful of area: NO sharing of hardware resources
- One solution:
  - use a "smaller" cycle time
  - have different instructions take different numbers of cycles
  - a "multicycle" datapath:

[Figure: multicycle datapath with PC, a single memory for instructions and data, instruction register (IR), memory data register (MDR), registers, A and B, ALU, ALUOut]

Multicycle Approach
- We will be reusing functional units
  - ALU used to compute address and to increment PC
  - Memory used for instruction and data
- Add registers after every major functional unit
- Our control signals will not be determined solely by instruction
  - e.g., what should the ALU do for a "subtract" instruction?
- We'll use a finite state machine (FSM) or microcode for control

Review: finite state machines
- Finite state machines:
  - a set of states and
  - next-state function (determined by current state and the input)
  - output function (determined by current state and possibly input)
- We'll use a Moore machine (output based only on current state)

[Figure: current state and inputs feed the next-state function and the output function; the next state is stored on the clock; the output function produces the outputs]

Multicycle Approach
- Break up the instructions into steps, each step takes a cycle
  - balance the amount of work to be done
  - restrict each cycle to use only one major functional unit
- At the end of a cycle
  - store values for use in later cycles (easiest thing to do)
  - introduce additional "internal" registers
- Notice: we distinguish
  - processor state: programmer-visible registers
  - internal state: programmer-invisible registers (like IR, MDR, A, B, and ALUOut)

Multicycle Approach

[Figure: multicycle datapath with PC, memory, instruction register, memory data register, registers, A, B, sign extend, shift left 2, ALU and ALUOut, connected through multiplexors]

Multicycle Approach
- Note that the previous picture does not include:
  - branch support
  - jump support
  - control lines and logic
- Tclock > max(ALU delay, memory access, regfile access)
- See book for complete picture

Five Execution Steps
1. Instruction Fetch
2. Instruction Decode and Register Fetch
3. Execution, Memory Address Computation, or Branch Completion
4. Memory Access or R-type instruction completion
5. Write-back step

INSTRUCTIONS TAKE FROM 3 TO 5 CYCLES!

Step 1: Instruction Fetch
- Use PC to get instruction and put it in the Instruction Register
- Increment the PC by 4 and put the result back in the PC
- Can be described succinctly using RTL ("Register-Transfer Language"):

      IR = Memory[PC];
      PC = PC + 4;

- Can we figure out the values of the control signals?
- What is the advantage of updating the PC now?

Step 2: Instruction Decode and Register Fetch
- Read registers rs and rt in case we need them
- Compute the branch address in case the instruction is a branch
- The previous two actions are done optimistically!
- RTL:

      A = Reg[IR[25-21]];
      B = Reg[IR[20-16]];
      ALUOut = PC + (sign-extend(IR[15-0]) << 2);

- We aren't setting any control lines based on the instruction type (we are busy "decoding" it in our control logic)

Step 3 (instruction dependent)
- ALU is performing one of four functions, based on instruction type
- Memory reference:

      ALUOut = A + sign-extend(IR[15-0]);

- R-type:

      ALUOut = A op B;

- Branch:

      if (A == B) PC = ALUOut;

- Jump:

      PC = PC[31-28] || (IR[25-0] << 2)

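The address arithmetic in steps 2 and 3 is easy to get wrong by a sign or a shift, so here is a small sketch of it on plain integers. The function names are hypothetical; `pc` is assumed to be the already incremented PC, as it is after step 1.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def memory_address(a, ir):
    """Memory reference: ALUOut = A + sign-extend(IR[15-0])."""
    return (a + sign_extend16(ir)) & MASK32


def branch_target(pc, ir):
    """Step 2: ALUOut = PC + (sign-extend(IR[15-0]) << 2)."""
    return (pc + (sign_extend16(ir) << 2)) & MASK32


def jump_target(pc, ir):
    """Jump: PC = PC[31-28] || (IR[25-0] << 2)."""
    return (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)


# A branch with offset -1 (0xFFFF) from PC 0x1004 goes back one word
print(hex(branch_target(0x1004, 0xFFFF)))
```

Python integers are unbounded, so each result is masked to 32 bits to mimic register width.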
Step 4 (R-type or memory access)
- Loads and stores access memory:

      MDR = Memory[ALUOut];
      or
      Memory[ALUOut] = B;

- R-type instructions finish:

      Reg[IR[15-11]] = ALUOut;

The write actually takes place at the end of the cycle on the edge

Write-back step
- Memory read completion step:

      Reg[IR[20-16]] = MDR;

What about all the other instructions?

Summary execution steps
Steps taken to execute any instruction class

Simple Questions
- How many cycles will it take to execute this code?

          lw  $t2, 0($t3)
          lw  $t3, 4($t3)
          beq $t2, $t3, L1    # assume not taken
          add $t5, $t2, $t3
          sw  $t5, 8($t3)
      L1: ...

- What is going on during the 8th cycle of execution?
- In what cycle does the actual addition of $t2 and $t3 take place?

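One way to check answers to these questions is to lay the instructions out cycle by cycle. This is a sketch assuming the step counts implied by the five execution steps above (lw 5, sw 4, R-type 4, beq 3, j 3) and no overlap between instructions; the names are illustrative only.

```python
# Cycles per instruction in the multicycle design (from the five execution steps)
CYCLES = {"lw": 5, "sw": 4, "add": 4, "beq": 3, "j": 3}
PROGRAM = ["lw", "lw", "beq", "add", "sw"]  # beq assumed not taken


def timeline(program):
    """Return one (instruction index, mnemonic, step) entry per clock cycle."""
    slots = []
    for index, mnemonic in enumerate(program):
        for step in range(1, CYCLES[mnemonic] + 1):
            slots.append((index, mnemonic, step))
    return slots


slots = timeline(PROGRAM)
total_cycles = len(slots)
eighth_cycle = slots[8 - 1]                          # cycles are numbered from 1
add_execute_cycle = slots.index((3, "add", 3)) + 1   # step 3 is the ALU operation

print(total_cycles, eighth_cycle, add_execute_cycle)
```

Under these assumptions the code takes 21 cycles, cycle 8 is the address computation of the second lw, and the addition happens in cycle 16.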
Implementing the Control
- Value of control signals is dependent upon:
  - what instruction is being executed
  - which step is being performed
- Use the information we have accumulated to specify a finite state machine (FSM)
  - specify the finite state machine graphically, or
  - use microprogramming
- Implementation can be derived from specification

FSM: high-level view

[Figure: from Start/reset, "Instruction fetch, decode and register fetch" branches into memory access instructions, R-type instructions, branch instruction and jump instruction]

Graphical Specification of FSM
- State 0, Instruction fetch (Start): MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00
- State 1, Instruction decode / register fetch: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
- State 2, Memory address computation (Op = 'LW' or 'SW'): ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
- State 3, Memory access (Op = 'LW'): MemRead, IorD = 1
- State 4, Write-back step: RegDst = 0, RegWrite, MemtoReg = 1
- State 5, Memory access (Op = 'SW'): MemWrite, IorD = 1
- State 6, Execution (Op = R-type): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
- State 7, R-type completion: RegDst = 1, RegWrite, MemtoReg = 0
- State 8, Branch completion (Op = 'BEQ'): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01
- State 9, Jump completion (Op = 'J'): PCWrite, PCSource = 10

How many state bits will we need?

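The state diagram above boils down to a next-state function of the current state and the opcode. The following is a minimal sketch of that function; the state numbers follow the diagram, the opcode for j (2) is the standard MIPS value, and the helper names are made up.

```python
R_TYPE, J, BEQ, LW, SW = 0, 2, 4, 35, 43


def next_state(state, opcode):
    """Next-state function of the multicycle control FSM (states 0..9)."""
    if state == 0:                       # instruction fetch
        return 1
    if state == 1:                       # decode: dispatch on the opcode
        return {LW: 2, SW: 2, R_TYPE: 6, BEQ: 8, J: 9}[opcode]
    if state == 2:                       # memory address computation
        return {LW: 3, SW: 5}[opcode]
    if state == 3:                       # load: memory access, then write-back
        return 4
    if state == 6:                       # R-type: execution, then completion
        return 7
    if state in (4, 5, 7, 8, 9):         # last step of every instruction
        return 0
    raise ValueError(f"unknown state {state}")


def trace(opcode):
    """List the states one instruction visits, starting at state 0."""
    states = [0]
    while True:
        following = next_state(states[-1], opcode)
        if following == 0:
            return states
        states.append(following)


STATE_BITS = (10 - 1).bit_length()  # bits needed to number states 0..9
print(trace(LW), STATE_BITS)
```

The length of each trace is the cycle count of that instruction class, and ten states fit in four state bits.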
Finite State Machine for Control
Implementation:

PLA Implementation (see fig C.14)
- If I picked a horizontal or vertical line, could you explain it?
- What type of FSM is used? Mealy or Moore?

[Figure: PLA with inputs opcode and current state, and outputs datapath control and next state]

ROM Implementation
- ROM = "Read Only Memory"
  - values of memory locations are fixed ahead of time
- A ROM can be used to implement a truth table
  - if the address is m bits, we can address 2^m entries in the ROM
  - our outputs are the bits of data that the address points to

[Figure: ROM with m address bits in and n data bits out, shown next to an example truth table of addresses and data]

m determines the "height" (2^m entries), and n is the "width"

ROM Implementation
- How many inputs are there? 6 bits for opcode, 4 bits for state = 10 address lines (i.e., 2^10 = 1024 different addresses)
- How many outputs are there? 16 datapath-control outputs, 4 state bits = 20 outputs
- ROM is 2^10 x 20 = 20K bits (very large and a rather unusual size)
- Rather wasteful, since for lots of the entries the outputs are the same, i.e., the opcode is often ignored

ROM Implementation
Cheaper implementation:
- Exploit the fact that the FSM is a Moore machine ==>
  - control outputs only depend on the current state and not on other incoming control signals!
  - next state depends on all inputs
- Break up the table into two parts
  - 4 state bits tell you the 16 outputs: 2^4 x 16 bits of ROM
  - 10 bits tell you the 4 next-state bits: 2^10 x 4 bits of ROM
  - Total number of bits: 4.3K bits of ROM

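The two ROM sizes can be recomputed from the bit counts given on the slides. A short sketch, with variable names chosen only for this example:

```python
OPCODE_BITS, STATE_BITS = 6, 4
DATAPATH_OUTPUTS, NEXT_STATE_BITS = 16, 4

# One big ROM: addressed by opcode + state, producing all 20 outputs
address_bits = OPCODE_BITS + STATE_BITS
single_rom_bits = 2 ** address_bits * (DATAPATH_OUTPUTS + NEXT_STATE_BITS)

# Split ROMs: the Moore outputs need only the state, the next state needs all inputs
output_rom_bits = 2 ** STATE_BITS * DATAPATH_OUTPUTS
next_state_rom_bits = 2 ** address_bits * NEXT_STATE_BITS
split_rom_bits = output_rom_bits + next_state_rom_bits

print(single_rom_bits / 1024, split_rom_bits / 1024)  # sizes in Kbit
```

The split version comes to 4352 bits, about 4.3K bits, against 20K bits for the single ROM.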
ROM vs PLA
- PLA is much smaller
  - can share product terms (a ROM has an entry (= address) for every product term)
  - only need entries that produce an active output
  - can take into account don't cares
- Size of PLA: (#inputs × #product-terms) + (#outputs × #product-terms)
  - For this example: (10 x 17) + (20 x 17) = 510 PLA cells
- PLA cells are usually slightly bigger than a ROM cell

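Evaluating the PLA size formula for this example is a one-liner. The sketch below uses the counts from the slides (10 inputs, 20 outputs, 17 product terms); the function name is hypothetical.

```python
def pla_cells(inputs, outputs, product_terms):
    """(#inputs x #product-terms) + (#outputs x #product-terms)."""
    return inputs * product_terms + outputs * product_terms


cells = pla_cells(inputs=10, outputs=20, product_terms=17)
single_rom_bits = 2 ** 10 * 20  # the unsplit ROM, for comparison

print(cells, single_rom_bits)
```

Even allowing for PLA cells being somewhat larger than ROM cells, 510 cells is far below the 20K bits of the single ROM.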
Exceptions
- Unexpected events
- External: interrupt
  - e.g. I/O request
- Internal: exception
  - e.g. overflow, undefined instruction opcode, software trap, page fault
- How to handle an exception?
  - Jump to general entry point (record exception type in status register)
  - Jump to vectored entry point
  - Address of faulting instruction has to be recorded!

Exceptions
Changes needed: see fig. 5.48 / 5.49 / 5.50
- Extend PC input mux with extra entry with fixed address: "C000000 hex"
- Add EPC register containing old PC (we'll use the ALU to decrement PC by 4)
  - extra input ALU src2 needed with fixed value 4
- Cause register (one bit in our case) containing:
  - 0: undefined instruction
  - 1: ALU overflow
- Add 2 states to FSM
  - undefined instr.: state #10
  - overflow: state #11

Exceptions
Legend:
- IntCause = 0/1: type of exception
- CauseWrite: write Cause register
- ALUSrcA = 0: select PC
- ALUSrcB = 01: select constant 4
- ALUOp = 01: subtract operation
- EPCWrite: write EPC register with current PC
- PCWrite: write PC with exception address
- PCSource = 11: select exception address: C000000 hex

2 new states:
- #10 undefined instruction: IntCause = 0, CauseWrite, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 01, EPCWrite, PCWrite, PCSource = 11
- #11 overflow: IntCause = 1, CauseWrite, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 01, EPCWrite, PCWrite, PCSource = 11

Both go to state 0 (begin of next instruction)
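What the two exception states do to the architectural registers can be sketched in a few lines. This is an illustration only: `CpuState` and `raise_exception` are hypothetical names, and the handler address is copied literally from the slide text.

```python
from dataclasses import dataclass

UNDEFINED_INSTRUCTION, ALU_OVERFLOW = 0, 1  # values of the one-bit Cause register
EXCEPTION_ADDRESS = 0xC000000               # as written on the slide ("C000000 hex")


@dataclass
class CpuState:
    pc: int          # already incremented by 4 during instruction fetch
    epc: int = 0
    cause: int = 0


def raise_exception(cpu, cause):
    """Model states #10 / #11: record cause, save PC - 4 in EPC, jump to handler."""
    cpu.cause = cause                      # CauseWrite with IntCause = cause
    cpu.epc = (cpu.pc - 4) & 0xFFFFFFFF    # ALU subtracts the constant 4 from PC
    cpu.pc = EXCEPTION_ADDRESS             # PCSource = 11 selects the fixed address
    return cpu


cpu = raise_exception(CpuState(pc=0x1008), ALU_OVERFLOW)
print(hex(cpu.epc), cpu.cause, hex(cpu.pc))
```

Subtracting 4 undoes the increment from the fetch step, so EPC points at the faulting instruction itself.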