Ch 5 Designing a Single Cycle Datapath Computer

Ch 5: Designing a Single Cycle Datapath Computer Systems Architecture CS 424/524

The Big Picture: Where are We Now?
• The Five Classic Components of a Computer: Processor (Control and Datapath), Memory, Input, Output
• Today’s Topic: Design a Single Cycle Processor
[Figure labels: machine design, Languages/Compilers (Ch 2), Arithmetic (Ch 3), technology]

The Big Picture: The Performance Perspective
• Performance of a machine is determined by:
– Instruction count
– Clock cycle time
– Clock cycles per instruction (CPI)
• Processor design (datapath and control) will determine:
– Clock cycle time
– Clock cycles per instruction
• Today: single cycle processor
– Advantage: one clock cycle per instruction
– Disadvantage: long cycle time
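The three factors on this slide combine multiplicatively into execution time. A minimal Python sketch of that relationship follows; the function name and all numbers are illustrative assumptions, not figures from the slides.

```python
def execution_time_ns(instruction_count, cpi, cycle_time_ns):
    """Execution time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_ns

# Hypothetical program of 1,000,000 instructions (illustrative numbers only).
# Single cycle design: CPI is 1, but the cycle must be long.
single_cycle = execution_time_ns(1_000_000, cpi=1, cycle_time_ns=8)

# Hypothetical alternative design: more cycles per instruction, shorter cycle.
alternative = execution_time_ns(1_000_000, cpi=3, cycle_time_ns=2)

print(single_cycle, alternative)
```

The point of the comparison is only that a CPI of 1 does not by itself guarantee the shortest execution time, because the cycle time enters the same product.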

How to Design a Processor: step-by-step
1. Analyze instruction set => datapath requirements
– the meaning of each instruction is given by the register transfers
– datapath must include storage elements for ISA registers (possibly more)
– datapath must support each register transfer
2. Select set of datapath components and establish clocking methodology
3. Assemble datapath meeting the requirements
4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer
5. Assemble the control logic

The MIPS Instruction Formats
• All MIPS instructions are 32 bits long. The three instruction formats:
– R-type: op (bits 31–26, 6 bits) | rs (25–21, 5 bits) | rt (20–16, 5 bits) | rd (15–11, 5 bits) | shamt (10–6, 5 bits) | funct (5–0, 6 bits)
– I-type: op (bits 31–26, 6 bits) | rs (25–21, 5 bits) | rt (20–16, 5 bits) | immediate (15–0, 16 bits)
– J-type: op (bits 31–26, 6 bits) | target address (25–0, 26 bits)
• The different fields are:
– op: operation of the instruction
– rs, rt, rd: the source and destination register specifiers
– shamt: shift amount
– funct: selects the variant of the operation in the “op” field
– address / immediate: address offset or immediate value
– target address: target address of the jump instruction
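The field boundaries above translate directly into shifts and masks. The sketch below is a hypothetical helper (the name `decode` and the dictionary layout are assumptions for illustration) that extracts every field from a 32-bit word; which fields are meaningful depends on the format of the instruction.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into its fields."""
    return {
        "op": (word >> 26) & 0x3F,      # bits 31-26
        "rs": (word >> 21) & 0x1F,      # bits 25-21
        "rt": (word >> 16) & 0x1F,      # bits 20-16
        "rd": (word >> 11) & 0x1F,      # bits 15-11 (R-type only)
        "shamt": (word >> 6) & 0x1F,    # bits 10-6  (R-type only)
        "funct": word & 0x3F,           # bits 5-0   (R-type only)
        "imm16": word & 0xFFFF,         # bits 15-0  (I-type only)
        "target": word & 0x3FFFFFF,     # bits 25-0  (J-type only)
    }

# add $8, $17, $18 encoded field by field: op=0, rs=17, rt=18, rd=8, funct=32
add_word = (0 << 26) | (17 << 21) | (18 << 16) | (8 << 11) | (0 << 6) | 32
fields = decode(add_word)
print(hex(add_word), fields["rs"], fields["rt"], fields["rd"], fields["funct"])
```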

Step 1a: The MIPS-lite Subset
• ADD, SUB, AND, OR (format: op 6 bits | rs 5 bits | rt 5 bits | rd 5 bits | shamt 5 bits | funct 6 bits)
– add rd, rs, rt
– sub rd, rs, rt
– and rd, rs, rt
– or rd, rs, rt
• LOAD and STORE Word (format: op 6 bits | rs 5 bits | rt 5 bits | immediate 16 bits)
– lw rt, rs, imm16
– sw rt, rs, imm16
• BRANCH (format: op 6 bits | rs 5 bits | rt 5 bits | immediate 16 bits)
– beq rs, rt, imm16

Logical Register Transfers
• RTL gives the meaning of the instructions
• First step is to fetch the instruction from memory:
op | rs | rt | rd | shamt | funct = MEM[ PC ]
op | rs | rt | Imm16 = MEM[ PC ]
• Register transfers by instruction:
ADD: R[rd] <– R[rs] + R[rt]; PC <– PC + 4
SUB: R[rd] <– R[rs] – R[rt]; PC <– PC + 4
OR: R[rd] <– R[rs] | R[rt]; PC <– PC + 4
LOAD: R[rt] <– MEM[ R[rs] + sign_ext(Imm16) ]; PC <– PC + 4
STORE: MEM[ R[rs] + sign_ext(Imm16) ] <– R[rt]; PC <– PC + 4
BEQ: if ( R[rs] == R[rt] ) then PC <– PC + 4 + ( sign_ext(Imm16) || 00 ) else PC <– PC + 4
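The register transfers can be read as a tiny interpreter. The sketch below is an illustration under stated assumptions: instructions arrive already decoded as hypothetical tuples, memory is a Python dict keyed by byte address, and details such as register 0 being hardwired to zero are not modeled.

```python
MASK32 = 0xFFFFFFFF

def sign_ext16(imm16):
    """Interpret a 16-bit field as a signed two's-complement value."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def step(pc, regs, mem, inst):
    """Apply one instruction's register transfer and return the next PC.

    inst is a hypothetical pre-decoded tuple (name, rs, rt, x), where x is
    rd for R-type instructions and the 16-bit immediate otherwise.
    """
    name, rs, rt, x = inst
    if name == "add":
        regs[x] = (regs[rs] + regs[rt]) & MASK32
    elif name == "sub":
        regs[x] = (regs[rs] - regs[rt]) & MASK32
    elif name == "or":
        regs[x] = regs[rs] | regs[rt]
    elif name == "lw":
        regs[rt] = mem[(regs[rs] + sign_ext16(x)) & MASK32]
    elif name == "sw":
        mem[(regs[rs] + sign_ext16(x)) & MASK32] = regs[rt]
    elif name == "beq":
        if regs[rs] == regs[rt]:
            # Word offset: shift left by 2, relative to PC + 4.
            return (pc + 4 + (sign_ext16(x) << 2)) & MASK32
    else:
        raise ValueError(f"unsupported instruction: {name}")
    return (pc + 4) & MASK32

regs = [0] * 32
regs[1], regs[2] = 5, 7
mem = {}
pc = step(0, regs, mem, ("add", 1, 2, 3))   # R[3] <- R[1] + R[2]
pc = step(pc, regs, mem, ("sw", 0, 3, 16))  # MEM[R[0] + 16] <- R[3]
pc = step(pc, regs, mem, ("lw", 0, 4, 16))  # R[4] <- MEM[R[0] + 16]
```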

Step 1: Requirements of the Instruction Set
• Memory
– instruction & data
• Registers (32 x 32)
– read RS
– read RT
– write RT or RD
• PC
• Extender
• Add and Sub register or extended immediate
• Add 4 or extended immediate to PC

Step 2: Components of the Datapath
• Combinational Elements
• Storage Elements
– Clocking methodology

Abstract/Simplified View of Datapath
[Figure: PC supplies the Address to the Instruction memory; the Instruction supplies Register numbers to the Registers; register Data feeds the ALU; the ALU supplies the Address to the Data memory, which exchanges Data with the Registers]
• Two types of functional units:
– elements that operate on data values (combinational)
– elements that contain state (sequential)

Combinational Logic Elements (Basic Building Blocks)
• Adder: 32-bit inputs A and B plus CarryIn; outputs 32-bit Sum and Carry
• MUX: 32-bit inputs A and B plus Select; 32-bit output Y
• ALU: 32-bit inputs A and B plus OP; 32-bit output Result

State Elements: Review
• Unclocked vs. Clocked
• Clocks used in synchronous logic
– when should an element that contains state be updated?
[Figure: clock waveform showing the cycle time, the rising edge, and the falling edge]

An unclocked state element
• The set-reset latch
– output depends on present inputs and also on past inputs

Latches and Flip-flops
• Output is equal to the stored value inside the element (don't need to ask for permission to look at the value)
• Change of state (value) is based on the clock
• Latches: state changes whenever the inputs change and the clock is asserted
• Flip-flop: state changes only on a clock edge (edge-triggered methodology)
• Asserted means "logically true", which could mean electrically low
• A clocking methodology defines when signals can be read and written: we wouldn't want to read a signal at the same time it was being written

D-latch
• Two inputs:
– the data value to be stored (D)
– the clock signal (C) indicating when to read & store D
• Two outputs:
– the value of the internal state (Q) and its complement

D flip-flop
• Output changes only on the clock edge
[Figure: two D latches connected in series; inputs D and C, outputs Q and its complement]
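The difference between a level-sensitive latch and an edge-triggered flip-flop can be sketched behaviorally. The model below is an assumption-laden illustration: it builds the flip-flop from two latches with the second latch clocked by the inverted clock, which makes the output change on the falling edge; the class and method names are hypothetical.

```python
class DLatch:
    """Level-sensitive: Q follows D while C is asserted, else holds."""
    def __init__(self):
        self.q = 0

    def update(self, d, c):
        if c:
            self.q = d
        return self.q


class DFlipFlop:
    """Two latches in series; the second sees the inverted clock
    (an assumed master-slave arrangement, falling-edge triggered)."""
    def __init__(self):
        self.master = DLatch()
        self.slave = DLatch()

    def update(self, d, c):
        m = self.master.update(d, c)
        return self.slave.update(m, 0 if c else 1)


latch = DLatch()
latch_trace = [latch.update(d, c) for d, c in [(1, 1), (0, 1), (1, 0)]]

ff = DFlipFlop()
ff_trace = [ff.update(d, c) for d, c in [(1, 1), (1, 0), (0, 0), (0, 1), (0, 0)]]
print(latch_trace, ff_trace)
```

In the latch trace the output follows D while C is 1; in the flip-flop trace the output only changes at the steps where C goes from 1 to 0.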

Our Implementation
• An edge triggered methodology
• Typical execution:
– read contents of some state elements,
– send values through some combinational logic
– write results to one or more state elements
[Figure: State element 1 feeds Combinational logic, which feeds State element 2, all within one clock cycle; a state element may also feed back to itself through combinational logic]

Storage Element: Register (Basic Building Block)
• Register
– Similar to the D Flip Flop except:
• N-bit input and output (Data In, Data Out)
• Write Enable input
– Write Enable:
• negated (0): Data Out will not change
• asserted (1): Data Out will become Data In
[Figure: register with N-bit Data In, N-bit Data Out, Write Enable, and Clk]

Register File
• Built using D flip-flops
[Figure: read ports. Read register number 1 and Read register number 2 each control a multiplexor that selects among Register 0 through Register n and drives Read data 1 and Read data 2]

Register File
• Note: we still use the clock to determine when to write
[Figure: write port. The Register number feeds an n-to-1 decoder; each decoder output is combined with Write to drive the C input of one of Register 0 through Register n, and Register data drives every D input]

Storage Element: Register File
• Register File consists of 32 registers:
– Two 32-bit output busses: busA and busB
– One 32-bit input bus: busW
• Register is selected by:
– RA (number) selects the register to put on busA (data)
– RB (number) selects the register to put on busB (data)
– RW (number) selects the register to be written via busW (data) when Write Enable is 1
• Clock input (CLK)
– The CLK input is a factor ONLY during write operation
– During read operation, behaves as a combinational logic block:
• RA or RB valid => busA or busB valid after “access time.”
[Figure: 32 32-bit registers with 5-bit RW, RA, and RB inputs, Write Enable, Clk, a 32-bit busW input, and 32-bit busA and busB outputs]
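A behavioral sketch of this register file is given below, assuming a Python class with hypothetical method names: reads are modeled as plain function calls (combinational), while writes happen only when `clock_edge` is called with Write Enable set.

```python
class RegisterFile:
    """32 registers of 32 bits: two read ports, one clocked write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra, rb):
        """Combinational read: returns (busA, busB) for register numbers RA, RB."""
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw, bus_w, write_enable):
        """On the clock edge, write busW into register RW if Write Enable is 1."""
        if write_enable:
            self.regs[rw] = bus_w & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(rw=8, bus_w=1234, write_enable=1)   # written
rf.clock_edge(rw=9, bus_w=5678, write_enable=0)   # ignored: enable is 0
bus_a, bus_b = rf.read(ra=8, rb=9)
print(bus_a, bus_b)
```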

Storage Element: Idealized Memory
• Memory (idealized)
– One input bus: Data In
– One output bus: Data Out
• Memory word is selected by:
– Address selects the word to put on Data Out
– Write Enable = 1: address selects the memory word to be written via the Data In bus
• Clock input (CLK)
– The CLK input is a factor ONLY during write operation
– During read operation, behaves as a combinational logic block:
• Address valid => Data Out valid after “access time.”
[Figure: memory with Address, 32-bit Data In, 32-bit Data Out, Write Enable, and Clk]

Clocking Methodology
[Figure: Clk waveform with Setup and Hold windows around the clock edge; signal values are Don’t Care outside those windows]
• All storage elements are clocked by the same clock edge
• Cycle Time = CLK-to-Q + Longest Delay Path + Setup + Clock Skew

Step 3
• Register Transfer Requirements –> Datapath Assembly
• Instruction Fetch
• Read Operands and Execute Operation

3a: Overview of the Instruction Fetch Unit
• The common RTL operations
– Fetch the Instruction: mem[PC]
– Update the program counter:
• Sequential Code: PC <- PC + 4
• Branch and Jump: PC <- “something else”
• We don’t know whether the instruction is a Branch/Jump or one of the other instructions until we have fetched and interpreted it from memory, so all instructions initially increment the PC

[Figure: building blocks for instruction fetch. a. Instruction memory (Instruction address in, Instruction out); b. Program counter (PC); c. Adder (Add, Sum)]

Datapath for Instruction Fetch
[Figure: PC drives the Read address of the Instruction memory and an adder that adds 4 to produce the next PC]
![3 b Rformat instructions add sub and or slt Rrd Rrs op 3 b: R-format instructions: add, sub, and, or, slt • R[rd] <- R[rs] op](https://slidetodoc.com/presentation_image_h/d65d60c484ae0abdb1117063a9d2e827/image-28.jpg)
3b: R-format instructions: add, sub, and, or, slt
• R[rd] <- R[rs] op R[rt]; example: add rd, rs, rt
– Read register 1, Read register 2, and Write register come from the instruction’s rs, rt, and rd fields
– ALU control and RegWrite: control logic after decoding the instruction
• Format: op (bits 31–26, 6 bits) | rs (25–21, 5 bits) | rt (20–16, 5 bits) | rd (15–11, 5 bits) | shamt (10–6, 5 bits) | funct (5–0, 6 bits)
[Figure: a. Registers (5-bit Read register 1, Read register 2, and Write register numbers; Write data in; Read data 1 and Read data 2 out; RegWrite); b. ALU (3-bit ALU control; Zero and ALU result out)]

Datapath for R-format instructions
[Figure: the Instruction supplies Read register 1, Read register 2, and Write register; Read data 1 and Read data 2 feed the ALU (3-bit ALU operation; outputs Zero and ALU result); ALU result returns to Write data; RegWrite enables the write]

Register-Register Timing
[Figure: timing diagram over one clock cycle, each signal moving from Old Value to New Value]
• PC: new value after Clk-to-Q
• Rs, Rt, Rd, Op, Func: new value after Instruction Memory Access Time
• ALUctr, RegWr: new value after Delay through Control Logic
• busA, busB: new value after Register File Access Time
• busW: new value after ALU Delay
• Register Write Occurs Here (marked at the end of the cycle)
[Figure: 32 32-bit Registers with Rw, Ra, Rb driven by Rd, Rs, Rt (5 bits each), RegWr, and Clk; busA and busB (32 bits) feed the ALU, controlled by ALUctr; the 32-bit Result returns on busW]
![3 d Load Store Operations Rrt MemRrs Sign Extimm 3 d: Load & Store Operations • • R[rt] <- Mem[R[rs] + Sign. Ext[imm](https://slidetodoc.com/presentation_image_h/d65d60c484ae0abdb1117063a9d2e827/image-31.jpg)
3d: Load & Store Operations
• R[rt] <- Mem[ R[rs] + SignExt[imm16] ]; example: lw rt, rs, imm16
• Mem[ R[rs] + SignExt[imm16] ] <- R[rt]; example: sw rt, rs, imm16
• Format: op (bits 31–26, 6 bits) | rs (25–21, 5 bits) | rt (20–16, 5 bits) | immediate (15–0, 16 bits)
[Figure: a. Data memory unit (Address, Write data, Read data, MemWrite, MemRead); b. Sign-extension unit (16 bits in, 32 bits out)]

Datapath for lw & sw
[Figure: the R-format datapath extended with a Sign extend unit (16 to 32 bits) feeding the ALU and a Data memory (Address from ALU result, Write data from Read data 2, Read data back to the register Write data); control signals: ALU operation (3 bits), RegWrite, MemWrite, MemRead]

3f: The Branch Instruction
• Format: op (bits 31–26, 6 bits) | rs (25–21, 5 bits) | rt (20–16, 5 bits) | immediate (15–0, 16 bits)
• beq rs, rt, imm16
– mem[PC]: fetch the instruction from memory
– Equal <- R[rs] == R[rt]: calculate the branch condition
– Calculate the next instruction’s address:
if (Equal) PC <- PC + 4 + ( SignExt(imm16) x 4 )
else PC <- PC + 4
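The next-address rule for beq can be checked with a few concrete values. This is a small sketch with hypothetical function names; it assumes 32-bit wraparound for the PC and treats imm16 as a signed word offset relative to PC + 4.

```python
def branch_target(pc, imm16):
    """PC + 4 + (SignExt(imm16) x 4), wrapped to 32 bits."""
    offset = imm16 - 0x10000 if imm16 & 0x8000 else imm16
    return (pc + 4 + (offset << 2)) & 0xFFFFFFFF

def next_pc(pc, imm16, equal):
    """Select the branch target when the condition holds, else PC + 4."""
    return branch_target(pc, imm16) if equal else (pc + 4) & 0xFFFFFFFF

taken_forward = next_pc(0x1000, 3, equal=True)        # skips 3 words past PC + 4
taken_backward = next_pc(0x1000, 0xFFFF, equal=True)  # offset -1 word: loops to itself
not_taken = next_pc(0x1000, 3, equal=False)
print(hex(taken_forward), hex(taken_backward), hex(not_taken))
```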

Datapath for branch instruction
[Figure: the ALU compares Read data 1 and Read data 2 and sends Zero to the branch control logic; the sign-extended 16-bit immediate is shifted left 2 and added to PC + 4 (from the instruction datapath) to form the Branch target]

Using multiplexors to stitch together the datapath for memory access and R-format instructions
[Figure: instruction fetch (PC, Add 4, Instruction memory) joined to the Registers, ALU, Sign extend, and Data memory; a mux controlled by ALUSrc selects the second ALU operand from Read data 2 or the sign-extended immediate, and a mux controlled by MemtoReg selects the register Write data from the ALU result or the memory Read data; other control signals: RegWrite, ALU operation (3 bits), MemWrite, MemRead]

Putting it all together
[Figure: the combined memory-access and R-format datapath plus the branch path: the sign-extended immediate is shifted left 2 and added to PC + 4, and a mux controlled by PCSrc selects PC + 4 or the branch target as the next PC]
![Putting it all together contd PCSrc Add 4 Reg Write Instruction 25 21 PC Putting it all together cont’d PCSrc Add 4 Reg. Write Instruction [25– 21] PC](https://slidetodoc.com/presentation_image_h/d65d60c484ae0abdb1117063a9d2e827/image-37.jpg)
Putting it all together cont’d
[Figure: the same datapath with instruction fields labeled: Instruction [25–21] to Read register 1, Instruction [20–16] to Read register 2, a mux controlled by RegDst selecting Instruction [20–16] or Instruction [15–11] as Write register, Instruction [15–0] to Sign extend, and Instruction [5–0] together with ALUOp to the ALU control; other control signals: PCSrc, RegWrite, ALUSrc, MemWrite, MemRead, MemtoReg]

Adding the control unit
[Figure: the datapath with a Control unit that takes Instruction [31–26] and drives RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite; PCSrc is derived from Branch and the ALU Zero output]

An Abstract View of the Critical Path
• Register file and ideal memory:
– The CLK input is a factor ONLY during write operation
– During read operation, behave as combinational logic:
• Address valid => Output valid after “access time.”
• Critical Path (Load Operation) =
PC’s Clk-to-Q +
Instruction Memory’s Access Time +
Register File’s Access Time +
ALU to Perform a 32-bit Add +
Data Memory Access Time +
Setup Time for Register File Write +
Clock Skew
[Figure: PC and Next Address logic feed the Ideal Instruction Memory (Instruction Address in, Instruction out); Rd, Rs, Rt (5 bits each) and Imm16 feed the 32 32-bit Registers (Rw, Ra, Rb, Clk); outputs A and B feed the 32-bit ALU; the ALU drives the Data Address of the Ideal Data Memory (Data In, Clk)]

Step 4: Given Datapath: RTL -> Control
[Figure: the Inst Memory outputs Instruction<31:0>, split into Op <31:26>, Rs <25:21>, Rt <20:16>, Rd <15:11>, Fun <5:0>, Imm16 <15:0>, and Adr <25:0>; Control takes Op and Fun and drives Branch, RegWr, RegDst, ALUSrc, ALUop, MemRd, MemWr, and MemtoReg into the DATA PATH, which returns Zero]

Control
• Selecting the operations to perform (ALU, read/write, etc.): Design the ALU Control Unit
• Controlling the flow of data (multiplexor inputs): Design the Main Control Unit
• Information comes from the 32 bits of the instruction
• Example: add $8, $17, $18
• Instruction Format: op 000000 | rs 10001 | rt 10010 | rd 01000 | shamt 00000 | funct 100000
• ALU's operation based on instruction type and function code

ALU Control
• e.g., what should the ALU do with this instruction?
• Example: lw $1, 100($2)
op 35 | rs 2 | rt 1 | 16 bit offset
• ALU control input:
000 AND
001 OR
010 add
110 subtract
111 set-on-less-than
• Why is the code for subtract 110 and not 011? (Recall design of ALU from Chapter 4: the Bnegate input for the adder is set to 1 for subtraction.)

ALU Control Design

| Instruction opcode | ALUOp | Instruction operation | Funct field | Desired ALU action | ALU control input |
|---|---|---|---|---|---|
| LW | 00 | Load word | xxxxxx | Add | 010 |
| SW | 00 | Store word | xxxxxx | Add | 010 |
| BEQ | 01 | Branch eq | xxxxxx | Subtract | 110 |
| R-type | 10 | Add | 100000 | Add | 010 |
| R-type | 10 | Subtract | 100010 | Subtract | 110 |
| R-type | 10 | AND | 100100 | And | 000 |
| R-type | 10 | OR | 100101 | Or | 001 |
| R-type | 10 | Set on less than | 101010 | Set on less than | 111 |

Control
• Must describe hardware to compute 3-bit ALU control input
– given instruction type (ALUOp, computed from instruction type): 00 = lw, sw; 01 = beq; 10 = arithmetic
– function code for arithmetic
• Describe it using a truth table (can turn into gates):
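One way to express the ALU control mapping is sketched below in Python, first as a table lookup and then as gate-level equations. The equations are a commonly used minimization of this truth table and are stated here as an assumption, since the slide's own truth table is not reproduced in the text; the function names are hypothetical.

```python
# Funct field -> 3-bit ALU control input, used when ALUOp = 10 (R-type).
FUNCT_TO_CONTROL = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # subtract
    0b100100: 0b000,  # AND
    0b100101: 0b001,  # OR
    0b101010: 0b111,  # set on less than
}

def alu_control(alu_op, funct):
    """Table form: ALUOp decides directly for lw/sw/beq, funct for R-type."""
    if alu_op == 0b00:      # lw, sw: add to form the address
        return 0b010
    if alu_op == 0b01:      # beq: subtract to compare
        return 0b110
    if alu_op == 0b10:      # arithmetic: depends on funct
        return FUNCT_TO_CONTROL[funct]
    raise ValueError("unused ALUOp encoding")

def alu_control_gates(alu_op, funct):
    """Gate form (assumed minimization): only F3..F0 and ALUOp1..0 are used."""
    op1, op0 = (alu_op >> 1) & 1, alu_op & 1
    f3, f2, f1, f0 = ((funct >> i) & 1 for i in (3, 2, 1, 0))
    operation2 = op0 | (op1 & f1)
    operation1 = (1 - op1) | (1 - f2)
    operation0 = op1 & (f3 | f0)
    return (operation2 << 2) | (operation1 << 1) | operation0

print(format(alu_control(0b10, 0b100010), "03b"))
```

Both forms agree on every row of the ALU Control Design table; the gate form also shows why the funct bits are don't-cares for lw, sw, and beq.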

Design the main control unit
• Seven control signals: RegDst, RegWrite, ALUSrc, PCSrc, MemRead, MemWrite, MemtoReg

Control Signals
1. RegDst = 0 => Register destination number for the Write register comes from the rt field (bits 20-16)
RegDst = 1 => Register destination number for the Write register comes from the rd field (bits 15-11)
2. RegWrite = 1 => The register on the Write register input is written with the data on the Write data input (at the next clock edge)
3. ALUSrc = 0 => The second ALU operand comes from Read data 2
ALUSrc = 1 => The second ALU operand comes from the sign-extension unit
4. PCSrc = 0 => The PC is replaced with PC+4
PCSrc = 1 => The PC is replaced with the branch target address
5. MemtoReg = 0 => The value fed to the register write data input comes from the ALU
MemtoReg = 1 => The value fed to the register write data input comes from the data memory
6. MemRead = 1 => Read data memory
7. MemWrite = 1 => Write data memory

R-format instructions
RegDst = 1
RegWrite = 1
ALUSrc = 0
Branch = 0
MemtoReg = 0
MemRead = 0
MemWrite = 0
ALUOp = 10

Memory access instructions

| Signal | Load word | Store word |
|---|---|---|
| RegDst | 0 | X |
| RegWrite | 1 | 0 |
| ALUSrc | 1 | 1 |
| Branch | 0 | 0 |
| MemtoReg | 1 | X |
| MemRead | 1 | 0 |
| MemWrite | 0 | 1 |
| ALUOp | 00 | 00 |

Branch Equal
RegDst = X
RegWrite = 0
ALUSrc = 0
Branch = 1
MemtoReg = X
MemRead = 0
MemWrite = 0
ALUOp = 01
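The per-instruction settings can be collected into one lookup table keyed by opcode. The sketch below is illustrative: `None` stands for a don't-care (X), the opcode values for sw (43) and beq (4) are the standard MIPS encodings rather than values given on these slides, and forming PCSrc as Branch AND Zero is an assumption about how the branch control logic combines the two signals.

```python
X = None  # don't care

# Main control outputs per opcode (R-format = 0, lw = 35, sw = 43, beq = 4).
CONTROL = {
    0:  dict(RegDst=1, RegWrite=1, ALUSrc=0, Branch=0,
             MemtoReg=0, MemRead=0, MemWrite=0, ALUOp=0b10),
    35: dict(RegDst=0, RegWrite=1, ALUSrc=1, Branch=0,
             MemtoReg=1, MemRead=1, MemWrite=0, ALUOp=0b00),
    43: dict(RegDst=X, RegWrite=0, ALUSrc=1, Branch=0,
             MemtoReg=X, MemRead=0, MemWrite=1, ALUOp=0b00),
    4:  dict(RegDst=X, RegWrite=0, ALUSrc=0, Branch=1,
             MemtoReg=X, MemRead=0, MemWrite=0, ALUOp=0b01),
}

def pc_src(branch, zero):
    """Assumed branch logic: take the branch only for beq with Zero asserted."""
    return branch & zero

lw_signals = CONTROL[35]
print(lw_signals["ALUSrc"], lw_signals["MemtoReg"], pc_src(CONTROL[4]["Branch"], 1))
```

Note that only lw and R-format instructions assert RegWrite, and only sw asserts MemWrite, so no state other than the PC changes for beq.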

Control

Step 5: Implementing Control
• Simple combinational logic (truth tables)
[Figure: ALU Control Unit: inputs ALUOp1, ALUOp0 and funct bits F3, F2, F1, F0 of F (5–0); outputs Operation2, Operation1, Operation0]
[Figure: Main Control Unit: inputs Op5 through Op0, decoded into R-format, lw, sw, and beq; outputs RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0]

Our Simple Control Structure
• All of the logic is combinational
• We wait for everything to settle down, and the right thing to be done
– ALU might not produce “right answer” right away
– we use write signals along with clock to determine when to write
• Cycle time determined by length of the longest path
[Figure: State element 1 feeds Combinational logic, which feeds State element 2, within one clock cycle]

Single Cycle Implementation
• Calculate cycle time assuming negligible delays except:
– memory (2 ns), ALU and adders (2 ns), register file access (1 ns)
[Figure: the complete single cycle datapath with control signals PCSrc, RegWrite, RegDst, ALUSrc, ALUOp, MemWrite, MemRead, and MemtoReg]
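A worked version of this calculation is sketched below. The delays are the ones stated on the slide; the list of functional units each instruction class passes through is an assumption based on the datapath described earlier, and setup time, Clk-to-Q, and clock skew are ignored as negligible.

```python
# Delays from the slide, in ns.
DELAY_NS = {"mem": 2, "alu": 2, "reg": 1}

# Assumed sequence of non-negligible units on each instruction's path.
PATHS = {
    "R-format": ["mem", "reg", "alu", "reg"],         # fetch, read, ALU, write back
    "lw":       ["mem", "reg", "alu", "mem", "reg"],  # fetch, read, address, load, write back
    "sw":       ["mem", "reg", "alu", "mem"],         # fetch, read, address, store
    "beq":      ["mem", "reg", "alu"],                # fetch, read, compare
}

latency_ns = {name: sum(DELAY_NS[u] for u in units) for name, units in PATHS.items()}

# In a single cycle design every instruction gets the same, longest, cycle.
cycle_time_ns = max(latency_ns.values())
print(latency_ns, cycle_time_ns)
```

Under these assumptions the load word path sets the cycle time, which matches the critical path identified for the load operation.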

A Real MIPS Datapath (CNS T 0)

Summary
• 5 steps to design a processor
– 1. Analyze instruction set => datapath requirements
– 2. Select set of datapath components & establish clock methodology
– 3. Assemble datapath meeting the requirements
– 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer
– 5. Assemble the control logic
• MIPS makes it easier
– Instructions same size
– Source registers always in same place
– Immediates same size, location
– Operations always on registers/immediates
• Single cycle datapath => CPI = 1, Clock Cycle Time => long