CPU Organization Design Datapath Design Components their connections


CPU Organization (Design)
• Datapath Design: Components and their connections needed by ISA instructions
  – Capabilities and performance characteristics of the principal Functional Units (FUs) needed by ISA instructions (e.g., registers, ALU, shifters, logic units, ...).
  – Ways in which these components are interconnected (bus connections, multiplexors, etc.).
  – How information flows between components.
• Control Unit Design: Control/sequencing of the operations of datapath components to realize ISA instructions
  – Logic and means by which such information flow is controlled.
  – Control and coordination of FU operation to realize the targeted Instruction Set Architecture (can be implemented using either a finite state machine or a microprogram).
• Hardware description with a suitable language, possibly using Register Transfer Notation (RTN).
Chapter 5.1-5.4
EECC550 - Shaaban, Lec # 4, Winter 2005, 12-13-2005

Major CPU Design Steps
1. Analyze the instruction set to get datapath requirements:
  – Using independent RTN, write the micro-operations required for target ISA instructions.
  – This provides the required datapath components and how they are connected.
2. Select a set of datapath components and establish a clocking methodology (defines when storage or state elements can be read and when they can be written, e.g., clock edge-triggered).
3. Assemble a datapath meeting the requirements.
4. Identify and define the function of all control points or signals needed by the datapath.
  – Analyze the implementation of each instruction to determine the setting of the control points that affect its operation.
5. Control unit design, based on micro-operation timing and the control signals identified:
  – Combinational logic: for a single cycle CPU (e.g., any instruction completes in one cycle).
  – Hard-wired: finite-state machine implementation.
  – Microprogrammed.

CPU Design & Implementation Process
• Top-down Design:
  – Specify component behavior from high-level requirements (ISA).
• Bottom-up Design:
  – Assemble components in the target technology to establish critical timing (hardware delays, critical path timing).
• Iterative refinement:
  – Establish a partial solution, then expand and improve it.
[Figure: the Instruction Set Architecture (ISA) provides the requirements for the processor, which splits into datapath (Reg. File, Mux, ALU, Reg) and control (Mem, Decoder, Sequencer); these are built from cells and gates in the target VLSI implementation technology.]

Datapath Design Steps
• Write the micro-operation sequences required for a number of representative target ISA instructions using independent RTN.
• Independent RTN statements specify the required datapath components and how they are connected.
• From the above, create an initial datapath by determining the possible destinations for each data source (i.e., registers, ALU).
  – This establishes connectivity requirements (data paths, or connections) for datapath components.
  – Whenever multiple sources are connected to a single input (or destination), a multiplexor of appropriate size is added.
• Find the worst-case propagation delay in the datapath to determine the datapath clock cycle (CPU clock cycle).
• Complete the micro-operation sequences for all remaining instructions, adding datapath components and connections/multiplexors as needed.

MIPS Instruction Formats
R-Type:
  op [31:26]   rs [25:21]   rt [20:16]   rd [15:11]   shamt [10:6]   funct [5:0]
  6 bits       5 bits       5 bits       5 bits       5 bits         6 bits
I-Type (ALU, Load/Store, Branch):
  op [31:26]   rs [25:21]   rt [20:16]   immediate [15:0]
  6 bits       5 bits       5 bits       16 bits
J-Type (Jumps):
  op [31:26]   target address [25:0]
  6 bits       26 bits
• op: Opcode, operation of the instruction.
• rs, rt, rd: The source and destination register specifiers.
• shamt: Shift amount.
• funct: Selects the variant of the operation in the "op" field.
• address / immediate: Address offset or immediate value.
• target address: Target address of the jump instruction.
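The field boundaries above map directly onto shift-and-mask operations. The following is a minimal Python sketch, assuming a hypothetical helper named decode (it is not part of the lecture material). It extracts every field from a 32-bit instruction word; which of the fields are meaningful depends on the instruction format.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into its format fields.

    Hypothetical helper for illustration: all fields are extracted,
    and the caller uses only those that apply to the format in question.
    """
    return {
        "op":     (word >> 26) & 0x3F,   # bits [31:26], all formats
        "rs":     (word >> 21) & 0x1F,   # bits [25:21], R-Type and I-Type
        "rt":     (word >> 16) & 0x1F,   # bits [20:16], R-Type and I-Type
        "rd":     (word >> 11) & 0x1F,   # bits [15:11], R-Type
        "shamt":  (word >> 6) & 0x1F,    # bits [10:6],  R-Type
        "funct":  word & 0x3F,           # bits [5:0],   R-Type
        "imm16":  word & 0xFFFF,         # bits [15:0],  I-Type
        "target": word & 0x3FFFFFF,      # bits [25:0],  J-Type
    }


# Encode "add $1, $2, $3" by hand: op = 0, rs = 2, rt = 3, rd = 1, funct = 100000.
add_word = (0 << 26) | (2 << 21) | (3 << 16) | (1 << 11) | (0 << 6) | 0b100000
fields = decode(add_word)
```

Here fields["rs"], fields["rt"] and fields["rd"] come back as 2, 3 and 1, matching the operands of the hand-encoded instruction.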

MIPS R-Type (ALU) Instruction Fields
R-Type: All ALU instructions that use three registers
  OP [31:26]   rs [25:21]    rt [20:16]    rd [15:11]    shamt [10:6]   funct [5:0]
  6 bits       5 bits        5 bits        5 bits        5 bits         6 bits
               1st operand   2nd operand   Destination
• op: Opcode, basic operation of the instruction.
  – For R-Type, op = 0.
• rs, rt, rd are register specifier fields:
  – rs: The first register source operand.
  – rt: The second register source operand.
  – rd: The register destination operand.
• shamt: Shift amount used in constant shift operations.
• funct: Function, selects the specific variant of operation in the op field.
Independent RTN:
  R[rd] ← R[rs] funct R[rt]
  PC ← PC + 4
Examples (operand registers in rs and rt, destination register in rd):
  add $1, $2, $3
  sub $1, $2, $3
  and $1, $2, $3
  or $1, $2, $3
R-Type = Register Type. Register Addressing used (Mode 1).

MIPS ALU I-Type Instruction Fields
I-Type ALU instructions use two registers and an immediate value. (The I-Type format is also used by loads/stores and conditional branches.)
  OP [31:26]   rs [25:21]    rt [20:16]    immediate [15:0]
  6 bits       5 bits        5 bits        16 bits
               1st operand   Destination   2nd operand
• op: Opcode, operation of the instruction.
• rs: The register source operand.
• rt: The result destination register.
• immediate: Constant second operand for the ALU instruction.
Independent RTN for addi:
  R[rt] ← R[rs] + immediate
  PC ← PC + 4
Examples (result register in rt, source operand register in rs, constant operand in immediate):
  add immediate: addi $1, $2, 100
  and immediate: andi $1, $2, 10
I-Type = Immediate Type. Immediate Addressing used (Mode 2).

MIPS Load/Store I-Type Instruction Fields
  OP [31:26]   rs [25:21]   rt [20:16]   address (e.g. offset) [15:0]
  6 bits       5 bits       5 bits       16 bits
               Base         Src./Dest.   Signed address offset in bytes
• op: Opcode, operation of the instruction.
  – For load word op = 35, for store word op = 43.
• rs: The register containing the memory base address.
• rt: For loads, the destination register. For stores, the source register of the value to be stored.
• address: 16-bit memory address offset in bytes, added to the base register.
Examples:
  Store word: sw $3, 500($4)   (source register in rt, offset 500, base register in rs)
    Mem[R[rs] + address] ← R[rt]
    PC ← PC + 4
  Load word: lw $1, 32($2)   (destination register in rt, offset 32, base register in rs)
    R[rt] ← Mem[R[rs] + address]
    PC ← PC + 4
Base or Displacement Addressing used (Mode 3).

MIPS Branch I-Type Instruction Fields
  OP [31:26]   rs [25:21]   rt [20:16]   address (e.g. offset) [15:0]
  6 bits       5 bits       5 bits       16 bits (signed address offset in words)
• op: Opcode, operation of the instruction.
• rs: The first register being compared.
• rt: The second register being compared.
• address: 16-bit branch target offset in words, added to the PC to form the branch address.
Examples (first register in rs, second register in rt):
  Branch on equal: beq $1, $2, 100
  Branch on not equal: bne $1, $2, 100
The offset in bytes equals the instruction address field x 4. It is added to PC + 4 to form the branch target.
Independent RTN for beq:
  R[rs] = R[rt] : PC ← PC + 4 + address x 4
  R[rs] ≠ R[rt] : PC ← PC + 4
PC-Relative Addressing used (Mode 4).
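The independent RTN for beq can be sketched in Python as follows. This is an illustration only: the function names sign_extend16 and next_pc_beq are assumptions, and the PC is modeled as a plain 32-bit integer.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits of imm16 as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def next_pc_beq(pc, rs_val, rt_val, imm16):
    """Next PC for beq: the word offset is scaled by 4 and added to PC + 4."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    if rs_val == rt_val:
        # Branch taken: PC <- PC + 4 + address x 4
        return (pc_plus_4 + sign_extend16(imm16) * 4) & 0xFFFFFFFF
    # Branch not taken: PC <- PC + 4
    return pc_plus_4


taken = next_pc_beq(0x1000, 5, 5, 100)      # registers equal
not_taken = next_pc_beq(0x1000, 5, 6, 100)  # registers differ
backward = next_pc_beq(0x1000, 5, 5, 0xFFFF)  # offset of -1 word
```

With an offset field of 100 the taken branch lands 400 bytes past PC + 4, and an offset of -1 word returns to the branch instruction itself.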

MIPS J-Type Instruction Fields
J-Type: Includes jump (j) and jump and link (jal)
  OP [31:26]   jump target [25:0]
  6 bits       26 bits (jump target in words)
• op: Opcode, operation of the instruction.
  – Jump j: op = 2
  – Jump and link jal: op = 3
• jump target: Jump memory address in words. The jump memory address in bytes equals the instruction field jump target x 4.
Examples (jump target = 2500):
  Jump: j 10000
  Jump and link: jal 10000
Effective 32-bit jump address: PC(31-28), jump_target, 00
  – PC(31-28): 4 bits, from PC + 4
  – jump_target: 26 bits
  – 00: 2 bits
Independent RTN for j:
  PC ← PC + 4
  PC ← PC(31-28), jump_target, 00
J-Type = Jump Type. Pseudodirect Addressing used (Mode 5).
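The concatenation PC(31-28), jump_target, 00 is a bitwise OR of the upper four bits of PC + 4 with the 26-bit target shifted left by two. A small Python sketch, with jump_address as an assumed helper name:

```python
def jump_address(pc, jump_target):
    """Form the 32-bit pseudodirect jump address.

    Takes the upper 4 bits from PC + 4 and appends the 26-bit
    jump_target field followed by two zero bits.
    """
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    upper4 = pc_plus_4 & 0xF0000000
    return upper4 | ((jump_target & 0x3FFFFFF) << 2)


# "j 10000" is encoded with jump_target = 2500 (10000 / 4).
low_region = jump_address(0x00400000, 2500)
high_region = jump_address(0x10000000, 2500)
```

The second call shows that the same target field reaches a different byte address when the jump executes in a different 256 MB region, because the upper four bits are inherited from PC + 4.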

A Subset of MIPS Instructions
• ADD and SUB: add rd, rs, rt; sub rd, rs, rt
    op [31:26]   rs [25:21]   rt [20:16]   rd [15:11]   shamt [10:6]   funct [5:0]
    6 bits       5 bits       5 bits       5 bits       5 bits         6 bits
• OR Immediate: ori rt, rs, imm16
    op [31:26]   rs [25:21]   rt [20:16]   immediate [15:0]
    6 bits       5 bits       5 bits       16 bits
• LOAD and STORE Word: lw rt, rs, imm16; sw rt, rs, imm16
    op [31:26]   rs [25:21]   rt [20:16]   immediate [15:0]
    6 bits       5 bits       5 bits       16 bits
• BRANCH: beq rs, rt, imm16
    op [31:26]   rs [25:21]   rt [20:16]   immediate [15:0]
    6 bits       5 bits       5 bits       16 bits

Basic MIPS Instruction Processing Steps
• Instruction Fetch: Obtain the instruction from program storage (instruction memory).
    Instruction ← Mem[PC]
• Next Instruction: Update the program counter to the address of the next instruction.
    PC ← PC + 4
• Instruction Decode: Determine the instruction type (done by the Control Unit) and obtain operands from registers.
• Execute: Compute the result value or status.
• Result Store: Store the result in register/memory if needed (usually called Write Back).
The fetch, PC update, and decode steps are common to all instructions.

Overview of MIPS Instruction Micro-operations
• All instructions go through these common steps:
  – Send the program counter to instruction memory and fetch the instruction (fetch): Instruction ← Mem[PC]
  – Update the program counter to point to the next instruction: PC ← PC + 4
  – Read one or two registers, using instruction fields (decode).
    • Load reads one register only.
• Additional instruction execution actions (execution) depend on the instruction in question, but similarities exist:
  – All instruction classes use the ALU after reading the registers:
    • Memory reference instructions use it for address calculation.
    • Arithmetic and logic instructions (R-Type) use it for the specified operation.
    • Branches use it for comparison.
• Additional execution steps where instruction classes differ:
  – Memory reference instructions: Access memory for a load or store.
  – Arithmetic and logic instructions: Write the ALU result back to a register.
  – Branch instructions: Change the next instruction address based on the comparison.

A Single Cycle MIPS CPU
Design target: a single-cycle-per-instruction MIPS CPU design.
All micro-operations of an instruction are to be carried out in a single CPU clock cycle.
Cycles Per Instruction = CPI = 1
CPU Performance Equation: T = I x CPI x C, with CPI = 1
[Figure 5.1, page 287: Abstract view of the single cycle MIPS CPU showing the major functional units (components) and the major connections between them.]

R-Type Example: Micro-Operation Sequence For ADD
add rd, rs, rt
  OP [31:26]   rs [25:21]   rt [20:16]   rd [15:11]   shamt [10:6]   funct [5:0]
  6 bits       5 bits       5 bits       5 bits       5 bits         6 bits
Independent RTN:
  Instruction Word ← Mem[PC]    Fetch the instruction from program memory (common step)
  PC ← PC + 4                   Increment PC (common step)
  R[rd] ← R[rs] + R[rt]         Add register rs to register rt, result in register rd

Initial Datapath Components
Three components are needed by:
• Instruction Fetch: Instruction ← Mem[PC]
• Program Counter Update: PC ← PC + 4
Two state elements (memory) are needed to store and access instructions:
1. Instruction memory:
  • Only read access (by user code). No read control signal needed.
2. Program counter (PC): 32-bit register.
  • Written at the end of every clock cycle (edge-triggered): no write control signal.
3. 32-bit Adder: To compute the next instruction address (PC + 4).
Basics of logic design/logic building blocks: review in Appendix B (Book CD).

More Datapath Components
• Main 32-bit ALU: 32-bit Arithmetic and Logic Unit (ALU).
• ISA Register File:
  – Contains all ISA registers.
  – Two read ports and one write port.
  – Registers are written by asserting the write control signal.
  – Clocking methodology: writes are edge-triggered.
  – Thus the same register can be read and written in the same clock cycle.
Basics of logic design/logic building blocks: review in Appendix B (Book CD).

Register File Details
[Figure: register file block with 5-bit register number inputs RW, RA, RB, a Write Enable input, a Clk input, a 32-bit input busW, and 32-bit outputs busA and busB.]
• The register file consists of 32 registers (32 x 32-bit):
  – Two 32-bit output busses: busA and busB
  – One 32-bit input bus: busW
• A register is selected by:
  – RA (number) selects the register to put on busA (data): busA = R[RA]
  – RB (number) selects the register to put on busB (data): busB = R[RB]
  – RW (number) selects the register to be written via busW (data) when Write Enable is 1. Write Enable: R[RW] ← busW
• Clock input (CLK):
  – The CLK input is a factor ONLY during write operations.
  – During a read operation, the register file behaves as a combinational logic block:
    • RA or RB valid => busA or busB valid after "access time."
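The read/write behavior described above can be mimicked in a few lines of Python. This is a behavioral sketch under stated assumptions, not a hardware description: the class name RegisterFile and the method names read and clock_edge are hypothetical, and a call to clock_edge stands in for the active clock edge.

```python
class RegisterFile:
    """Hypothetical model: 32 registers, two read ports, one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra, rb):
        # Combinational read: busA = R[RA], busB = R[RB]; the clock plays no role.
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw, bus_w, write_enable):
        # Edge-triggered write: R[RW] <- busW only when Write Enable is 1.
        if write_enable:
            self.regs[rw] = bus_w & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(rw=2, bus_w=7, write_enable=1)
rf.clock_edge(rw=3, bus_w=5, write_enable=1)

# One cycle of "add $2, $2, $3": read the old $2, then overwrite it at the edge.
bus_a, bus_b = rf.read(ra=2, rb=3)
rf.clock_edge(rw=2, bus_w=bus_a + bus_b, write_enable=1)
result = rf.read(ra=2, rb=3)
```

Because the read returns the value held before the edge and the write takes effect only at the edge, the same register can act as source and destination within one cycle.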

A Possible Register File Implementation
[Figure: the 5-bit Write Register number (RW) feeds a 5-to-32 decoder gated by Register Write Enable (RegWrite); each decoder output drives the Write input of one of Register 0 to Register 31, and Register Write Data (busW, 32 bits) drives every register's Data In. Each register contains 32 edge-triggered D flip-flops. Two 32-to-1 MUXes select among the 32 Data Out values: one, controlled by Read Register 1 (RA, 5 bits), produces Register Read Data 1 (busA, 32 bits); the other, controlled by Read Register 2 (RB, 5 bits), produces Register Read Data 2 (busB, 32 bits). The clock input to the registers is not shown in the diagram.]
Also see Appendix B (Book CD), The Basics of Logic Design.

Idealized Memory
[Figure: memory block with Write Enable, Address, and Clk inputs, a 32-bit Data In bus, and a 32-bit Data Out bus.]
• Memory (idealized):
  – One input bus: Data In.
  – One output bus: Data Out.
• A memory word is selected by:
  – Address selects the word to put on the Data Out bus.
  – Write Enable = 1: Address selects the memory word to be written via the Data In bus.
• Clock input (CLK):
  – The CLK input is a factor ONLY during write operations.
  – During a read operation, this memory behaves as a combinational logic block:
    • Address valid => Data Out valid after "access time."
• Ideal Memory = Short access time.

Clocking Methodology Used: Edge Triggered Writes
[Figure: clock waveform showing the setup and hold windows around each clock edge, "don't care" intervals in between, and the critical path between storage elements.]
• All storage element (e.g., flip-flops, registers, data memory) writes are triggered by the same clock edge.
• Cycle Time = CLK-to-Q + Longest Delay Path + Setup + Clock Skew
• Here writes are triggered on the rising edge of the clock.
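As a worked instance of the cycle time relation, the sketch below adds up the four terms. The function name cycle_time is an assumption, and the delay values are hypothetical numbers chosen only for illustration; they do not come from the lecture.

```python
def cycle_time(clk_to_q, longest_delay_path, setup, clock_skew):
    """Minimum clock period; all arguments must use the same time unit."""
    return clk_to_q + longest_delay_path + setup + clock_skew


# Hypothetical delays in picoseconds, for illustration only.
period_ps = cycle_time(clk_to_q=50, longest_delay_path=8000, setup=30, clock_skew=20)

# Maximum clock frequency in MHz for a period given in picoseconds.
max_frequency_mhz = 1_000_000 / period_ps
```

The longest delay path dominates the sum, which is why shortening the critical path is the main lever for raising the clock rate of a single cycle design.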

Building The Datapath
Instruction Fetch & PC Update:
  Instruction ← Mem[PC]
  PC ← PC + 4
[Figure: portion of the datapath used for fetching instructions and incrementing the program counter; all paths are 32 bits wide.]
The PC write or update is edge triggered at the end of the cycle.

Simplified Datapath For MIPS R-Type Instructions
[Figure: instruction fields from instruction memory, rs [25:21], rt [20:16], and rd [15:11], select registers in the register file; R[rs] and R[rt] (32 bits each) feed the ALU (4 control lines), whose 32-bit result is written back to the register file.]
Components and connections as specified by the RTN statement:
  R[rd] ← R[rs] + R[rt]
The destination register R[rd] write or update is edge triggered at the end of the cycle.

More Detailed Datapath For R-Type Instructions With Control Points Identified
[Figure: register file (32 32-bit registers) with 5-bit inputs Rw = Rd, Ra = Rs, Rb = Rt, control input RegWr, and Clk; busA = R[rs] and busB = R[rt] (32 bits each) feed the ALU, controlled by ALUctr; the 32-bit ALU Result returns on busW.]
  R[rd] ← R[rs] + R[rt]

R-Type Register-Register Timing
[Figure: timing diagram over one clock cycle. PC changes from its old value to the new value (PC+4) after Clk-to-Q; Rs, Rt, Rd, Op, Func take new values after the instruction memory access time; ALUctr and RegWr after the delay through control logic; busA and busB after the register file access time; busW after the ALU delay. The register write occurs at the end of the cycle. Below the diagram, the R-type datapath: register file with Rw = Rd, Ra = Rs, Rb = Rt, RegWr, Clk, busA, busB, ALU with ALUctr, and Result on busW.]
All register writes occur on the falling edge of the clock (clocking methodology).

Logical Operations with Immediate Example: Micro-Operation Sequence For ORI
ori rt, rs, imm16
  op [31:26]   rs [25:21]   rt [20:16]   immediate [15:0]
  6 bits       5 bits       5 bits       16 bits
  Instruction Word ← Mem[PC]           Fetch the instruction
  PC ← PC + 4                          Increment PC
  R[rt] ← R[rs] OR ZeroExt[imm16]      OR register rs with the immediate field zero extended to 32 bits, result in register rt
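The difference between zero extension (used by ori) and sign extension (used by loads, stores, and branches) shows up only when bit 15 of the immediate is set. A short Python sketch, with zero_ext16, sign_ext16, and ori as assumed helper names operating on 32-bit patterns:

```python
def zero_ext16(imm16):
    """Zero-extend a 16-bit field to 32 bits: the upper 16 bits become 0."""
    return imm16 & 0xFFFF


def sign_ext16(imm16):
    """Sign-extend a 16-bit field to a 32-bit pattern: bit 15 is copied upward."""
    imm16 &= 0xFFFF
    return imm16 | 0xFFFF0000 if imm16 & 0x8000 else imm16


def ori(rs_val, imm16):
    """R[rt] <- R[rs] OR ZeroExt[imm16]"""
    return (rs_val | zero_ext16(imm16)) & 0xFFFFFFFF


zero_extended = zero_ext16(0x8001)
sign_extended = sign_ext16(0x8001)
ori_result = ori(0x12340000, 0x8001)
```

With zero extension the upper half of R[rs] passes through the OR unchanged; had the immediate been sign extended, the upper 16 bits of the result would all be forced to 1.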

Datapath For Logical Instructions With Immediate
[Figure: R-type datapath extended with two multiplexors. A 2x1 MUX (width 5 bits) controlled by RegDst selects the write register number Rw from Rt (input 0) or Rd (input 1). A 2x1 MUX (width 32 bits) controlled by ALUSrc selects the second ALU operand from busB = R[rt] or from ZeroExt of imm16 (16 bits extended to 32). The register file (32 32-bit registers) is read through Ra = Rs and Rb = Rt and written via busW under RegWr and Clk; the ALU is controlled by ALUctr and produces a 32-bit Result.]
  R[rt] ← R[rs] OR ZeroExt[imm16]

Load Operations Example: Micro-Operation Sequence For LW
lw rt, rs, imm16
  op [31:26]   rs [25:21]   rt [20:16]   immediate [15:0]
  6 bits       5 bits       5 bits       16 bits (address offset in bytes)
  Instruction Word ← Mem[PC]               Fetch the instruction (instruction memory)
  PC ← PC + 4                              Increment PC
  R[rt] ← Mem[R[rs] + SignExt[imm16]]      Immediate field sign extended to 32 bits and added to register rs to form the memory load (effective) address; the word at the load effective address in data memory is written to register rt

Additional Datapath Components For Loads & Stores
• Data memory: inputs for the address and the write (store) data, output for the read (load) data; all 32 bits wide.
  – The data memory write or update is edge triggered at the end of the cycle (clocking methodology).
• Sign extender, SignExt[imm16]: the 16-bit input is sign-extended into a 32-bit value at the output.

Datapath For Loads
[Figure: datapath with the register file (32 32-bit registers; Rw selected by the RegDst mux from Rt or Rd, Ra = Rs as the base address register, Rb = Rt, RegWr, Clk), an Extender controlled by ExtOp that extends imm16 (the offset) from 16 to 32 bits, the ALUSrc mux selecting busB = R[rt] or the extended immediate, the ALU (ALUctr) computing the effective address from busA = R[rs], the data memory (WrEn = MemWr, Adr, Data In, Clk), and the MemtoReg mux selecting the ALU result or the data memory output for busW.]
  R[rt] ← Mem[R[rs] + SignExt[imm16]]

Store Operations Example: Micro-Operation Sequence For SW
sw rt, rs, imm16
  op           rs           rt           immediate
  6 bits       5 bits       5 bits       16 bits (address offset in bytes)
  Instruction Word ← Mem[PC]               Fetch the instruction
  PC ← PC + 4                              Increment PC
  Mem[R[rs] + SignExt[imm16]] ← R[rt]      Immediate field sign extended to 32 bits and added to register rs to form the memory store effective address; register rt is written to memory at the store effective address
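The load and store micro-operations share the same effective address calculation and differ only in the direction of the data transfer. The Python sketch below illustrates this with a dictionary standing in for data memory; the names effective_address, lw, and sw are assumptions, registers are a 32-entry list, and unwritten memory words are assumed to read as 0.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits of imm16 as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def effective_address(base, imm16):
    """R[rs] + SignExt[imm16], wrapped to 32 bits."""
    return (base + sign_extend16(imm16)) & 0xFFFFFFFF


def sw(regs, mem, rt, rs, imm16):
    # Mem[R[rs] + SignExt[imm16]] <- R[rt]
    mem[effective_address(regs[rs], imm16)] = regs[rt]


def lw(regs, mem, rt, rs, imm16):
    # R[rt] <- Mem[R[rs] + SignExt[imm16]]; unwritten words read as 0 (assumption).
    regs[rt] = mem.get(effective_address(regs[rs], imm16), 0)


regs = [0] * 32
mem = {}
regs[4] = 1000          # base register
regs[3] = 0xDEADBEEF    # value to store
sw(regs, mem, rt=3, rs=4, imm16=500)   # sw $3, 500($4)
lw(regs, mem, rt=1, rs=4, imm16=500)   # lw $1, 500($4)
```

After the two calls, register 1 holds the value that was stored from register 3, and the only memory word touched is the one at byte address 1500.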

Datapath For Stores
[Figure: same datapath as for loads, with busB = R[rt] additionally connected to the data memory Data In input. Register file (Rw from the RegDst mux, Ra = Rs as the base address register, Rb = Rt, RegWr, Clk), Extender (ExtOp) extending imm16 (the offset) from 16 to 32 bits, ALUSrc mux, ALU (ALUctr) computing the effective address, data memory (WrEn = MemWr, Adr, Data In, Clk), and MemtoReg mux feeding busW.]
  Mem[R[rs] + SignExt[imm16]] ← R[rt]

Conditional Branch Example: Micro-Operation Sequence For BEQ
beq rs, rt, imm16
  op [31:26]   rs [25:21]   rt [20:16]   immediate [15:0]
  6 bits       5 bits       5 bits       16 bits (PC offset in words)
  Instruction Word ← Mem[PC]               Fetch the instruction
  PC ← PC + 4                              Increment PC
  Zero ← R[rs] - R[rt]                     Calculate the branch condition R[rs] == R[rt] (i.e., R[rs] - R[rt] = 0)
  Condition : Action
  Zero : PC ← PC + (SignExt(imm16) x 4)    Calculate the next instruction's PC address (branch target)
"Zero" is the zero flag of the main ALU.

Datapath For Branch Instructions
• The main ALU evaluates the branch condition (subtract): Zero flag = 1 if R[rs] - R[rt] = 0 (i.e., R[rs] = R[rt]).
• A new 32-bit adder (third ALU) computes the branch target:
  – Sum of the incremented PC and the sign-extended lower 16 bits of the instruction, shifted left by 2: PC + 4 + (SignExt(imm16) x 4)
[Figure: rs [25:21] and rt [20:16] select R[rs] and R[rt] for the main ALU; imm16 [15:0] is sign extended to SignExt(imm16), multiplied by 4, and added to PC + 4 in the branch target adder.]

More Detailed Datapath For Branch Operations
[Figure: next-instruction-address logic. The PC adder computes PC + 4; imm16 is sign extended (PC Ext) and shifted left 2, then added to PC + 4 in the branch target adder. A new 2x1 32-bit MUX, controlled by PCSrc, selects the next PC value: PC + 4 (input 0) or the branch target (input 1). PCSrc is derived from the Branch control signal and the Zero output of the main ALU, which subtracts busB from busA to test for equality. The register file is read through Ra = Rs and Rb = Rt.]

Combining The Datapaths For Memory Instructions and R-Type Instructions
[Figure: combined datapath with rs [25:21] and rt [20:16] selecting R[rs] and R[rt], imm16 [15:0] sign extended to SignExt(imm16), one multiplexor selecting the second ALU operand (R[rt] or SignExt(imm16)), and one multiplexor selecting the register write data (ALU result or data memory output). The rt/rd MUX is not shown.]
The highlighted multiplexors and connections are added to combine the datapaths of memory and R-Type instructions into one datapath. (This is the book version: ORI is not supported.)

Instruction Fetch Datapath Added to ALU R-Type and Memory Instructions Datapath
[Figure: the instruction fetch portion (PC, instruction memory, PC + 4 adder) connected to the combined R-Type and memory datapath (rs, rt, R[rs], R[rt], and the two multiplexors). The rt/rd MUX is not shown.]
(This is the book version: ORI is not supported, so no zero extend of the immediate is needed.)

A Simple Datapath For The MIPS Architecture
The datapath of branches and a program counter multiplexor are added. The resulting datapath can execute the basic MIPS instructions in a single cycle:
  – load/store word
  – ALU operations
  – Branches
[Figure 5.11, page 300: datapath with the PC multiplexor selecting between PC + 4 and the branch target. The rt/rd MUX is not shown.]
(This is the book version: ORI is not supported, so no zero extend of the immediate is needed.)

Main ALU Control
• The main ALU has four control lines (detailed design in Appendix B) with the following functions:
    ALU Control Lines   ALU Function
    0000                AND
    0001                OR
    0010                add
    0110                subtract
    0111                Set-on-less-than
    1100                NOR
• For our current subset of MIPS instructions only the top five functions will be used (thus only three control lines will be used).
• For R-type instructions the ALU function depends on both the opcode and the 6-bit "funct" function field.
• For other instructions the ALU function depends on the opcode only.
• A local ALU control unit can be designed to accept 2-bit ALUop control lines (from the main control unit) and the 6-bit function field, and generate the correct 4-bit ALU control lines.
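The effect of the control lines can be illustrated with a behavioral Python model. This is a sketch, not the gate-level design from Appendix B: the function name alu and the helper to_signed are assumptions, the encodings used are the standard four-line ones (0000 AND, 0001 OR, 0010 add, 0110 subtract, 0111 set-on-less-than, 1100 NOR), and set-on-less-than is assumed to compare the operands as signed values.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a signed two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(ctrl, a, b):
    """Return (result, zero_flag) for the given 4-bit ALU control value."""
    if ctrl == 0b0000:
        result = a & b
    elif ctrl == 0b0001:
        result = a | b
    elif ctrl == 0b0010:
        result = a + b
    elif ctrl == 0b0110:
        result = a - b
    elif ctrl == 0b0111:
        result = 1 if to_signed(a) < to_signed(b) else 0
    elif ctrl == 0b1100:
        result = ~(a | b)
    else:
        raise ValueError(f"unsupported ALU control value: {ctrl:04b}")
    result &= MASK32  # keep the result to 32 bits
    return result, result == 0


sum_result = alu(0b0010, 2, 3)
equal_check = alu(0b0110, 7, 7)   # subtract; the zero flag signals equality
```

The second call is the comparison a branch-on-equal relies on: subtracting equal operands yields 0 and raises the zero flag.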

Local ALU Decoding of "func" Field
[Figure: the main control unit takes op (6 bits) and produces ALUop (2 bits); the local ALU control unit takes ALUop and func (6 bits) and produces ALUctr (4 bits) for the ALU.]
  Instruction Opcode   ALUOp   Instruction Operation   Funct Field   Desired ALU Action   ALU Control Lines
  LW                   00      Load word               XXXXXX        add                  0010
  SW                   00      Store word              XXXXXX        add                  0010
  Branch Equal         01      branch equal            XXXXXX        subtract             0110
  R-Type               10      add                     100000        add                  0010
  R-Type               10      subtract                100010        subtract             0110
  R-Type               10      AND                     100100        and                  0000
  R-Type               10      OR                      100101        or                   0001
  R-Type               10      set on less than        101010        set on less than     0111
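The two-level decoding can be written as a small lookup. In the Python sketch below, alu_control is an assumed name; the ALUOp and funct encodings are the standard ones for this instruction subset (funct 100100 for AND is included on that basis).

```python
# funct field -> 4-bit ALU control lines, used only when ALUOp = 10 (R-Type)
FUNCT_TO_ALUCTR = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # set on less than
}


def alu_control(alu_op, funct):
    """Generate the 4-bit ALU control lines from 2-bit ALUOp and 6-bit funct."""
    if alu_op == 0b00:
        return 0b0010          # lw / sw: add for the address calculation
    if alu_op == 0b01:
        return 0b0110          # beq: subtract for the comparison
    if alu_op == 0b10:
        return FUNCT_TO_ALUCTR[funct]  # R-Type: decided by the funct field
    raise ValueError(f"unsupported ALUOp: {alu_op:02b}")


lw_ctrl = alu_control(0b00, 0b111111)   # funct is a don't-care for lw
sub_ctrl = alu_control(0b10, 0b100010)  # R-Type subtract
```

Splitting the decoding this way keeps the main control unit small: it only has to distinguish instruction classes, while the funct field is handled locally.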

Local ALU Control Unit
[Figure: logic implementation of the local ALU control unit, with inputs ALUop (2 lines from the main control unit) and the function field, and 3 ALU control lines as outputs.]
More details are found in Appendix C (Book CD).

Single Cycle MIPS Datapath
The necessary multiplexors and control lines are identified here, and the local ALU control is added.
[Figure 5.15, page 305: single cycle datapath with control lines. Figure labels: Function Field; (2 bits).]
(This is the book version: ORI is not supported, so no zero extend of the immediate is needed.)

Putting It All Together: A Single Cycle Datapath
[Figure: complete single cycle datapath. Instruction memory (Adr from PC) delivers Instruction<31:0>, split into the fields Rs, Rt, Rd, and Imm16. Next-address logic: PC adder (PC + 4), PC Ext of imm16, branch target adder, and a PC mux controlled by PCSrc (derived from Branch and Zero). Register file (32 32-bit registers) with Rw from the RegDst mux (Rt or Rd), Ra = Rs, Rb = Rt, RegWr, Clk, busA = R[rs], busB = R[rt]. Extender (ExtOp) for imm16, ALUSrc mux, main ALU with local ALU control (ALUop, 2 bits, plus the function field) producing Zero, data memory (WrEn = MemWr, Adr, Data In, Clk), and the MemtoReg mux feeding busW.]
(Includes ORI, which is not in the book version.)

[Figure: control unit and datapath. Instruction memory (Adr) delivers Instruction<31:0>, split into Op, Fun, Rs, Rt, Rd, Imm16, and Jump_target. The control unit generates the control lines for the datapath: RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, and ALUOp (2 bits).]

The Effect of The Control Signals
For each signal: effect when deasserted (=0) and effect when asserted (=1).
• RegDst
  – 0: The register destination number for the write register comes from the rt field (instruction bits 20:16).
  – 1: The register destination number for the write register comes from the rd field (instruction bits 15:11).
• RegWrite
  – 0: None.
  – 1: The register on the write register input is written with the value on the Write data input.
• ALUSrc
  – 0: The second main ALU operand comes from the second register file output (Read data 2), R[rt].
  – 1: The second main ALU operand is the sign-extended lower 16 bits of the instruction (imm16).
• PCSrc
  – 0: The PC is replaced by the output of the adder that computes PC + 4.
  – 1: The PC is replaced by the output of the adder that computes the branch target.
• MemRead
  – 0: None.
  – 1: Data memory contents designated by the address input are put on the Read data output.
• MemWrite
  – 0: None.
  – 1: Data memory contents designated by the address input are replaced by the value on the Write data input.
• MemtoReg
  – 0: The value fed to the register write data input comes from the main ALU.
  – 1: The value fed to the register write data input comes from data memory.

Control Line Settings
  Instruction   RegDst   ALUSrc   MemtoReg   RegWrite   MemRead   MemWrite   Branch   ALUOp1   ALUOp0
  R-Format      1        0        0          1          0         0          0        1        0
  lw            0        1        1          1          1         0          0        0        0
  sw            X        1        X          0          0         1          0        0        0
  beq           X        0        X          0          0         0          1        0        1
Figure 5.18, page 308

The Truth Table For The Main Control (Opcode)
[Figure: truth table of the main control unit as a function of the opcode bits.]

PLA Implementation of the Main Control
[Figure C.2.5 (Appendix C): PLA implementation of the main control unit.]
PLA = Programmable Logic Array (Appendix B)

Adding Support For Jump: Micro-Operation Sequence For Jump: J
j jump_target
  OP [31:26]   Jump_target [25:0]
  6 bits       26 bits (jump address in words)
  Instruction Word ← Mem[PC]             Fetch the instruction
  PC ← PC + 4                            Increment PC
  PC ← PC(31-28), jump_target, 00        Update PC with the jump address
Jump address:
  – PC(31-28): 4 bits, the 4 highest bits from PC + 4
  – jump_target: 26 bits (e.g., jump target = 2500)
  – 00: 2 bits

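Datapath For Jump
[Figure: next-instruction-address logic extended for jump. Instruction(25-0), the 26-bit jump_target, is shifted left 2 to 28 bits and combined with PC+4(31-28) (4 bits) to form the 32-bit jump address PC(31-28), jump_target, 00. A multiplexor controlled by JUMP selects between the output of the PCSrc multiplexor (PC + 4 or the branch target, with PCSrc derived from Branch and Zero) on input 0 and the jump address on input 1. The selected value is the next instruction address written to the PC on Clk. The branch target adder takes PC + 4 and the extended imm16, Instruction(15-0).]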

Single Cycle MIPS Datapath Extended To Handle Jump with Control Unit Added
[Figure 5.24, page 314: single cycle datapath with the jump path and the control unit. Figure label: Function Field.]
Note: the book figure has an error!
(This is the book version: ORI is not supported, so no zero extend of the immediate is needed.)

Control Line Settings (with jump instruction, j added)
Figure 5.18, page 308, modified to include j

Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0  Jump
R-format     1       0       0         1         0        0         0       1       0       0
lw           0       1       1         1         1        0         0       0       0       0
sw           X       1       X         0         0        1         0       0       0       0
beq          X       0       X         0         0        0         1       0       1       0
j            X       X       X         0         0        0         X       X       X       1

X = don't care.
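Because the main control is purely combinational, its behavior can be modeled as a table lookup on the opcode field. The sketch below is one way to write that down, not the PLA implementation itself. The opcode values (0, 35, 43, 4, 2) are the standard MIPS encodings and do not appear on the slide; `main_control` and the use of `None` for a don't-care (X) are choices made for this example.

```python
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite", "MemRead",
           "MemWrite", "Branch", "ALUOp1", "ALUOp0", "Jump")

X = None  # don't care

# One row per instruction class, keyed by the 6-bit opcode (bits [31:26]).
CONTROL_TABLE = {
    0b000000: (1, 0, 0, 1, 0, 0, 0, 1, 0, 0),  # R-format
    0b100011: (0, 1, 1, 1, 1, 0, 0, 0, 0, 0),  # lw
    0b101011: (X, 1, X, 0, 0, 1, 0, 0, 0, 0),  # sw
    0b000100: (X, 0, X, 0, 0, 0, 1, 0, 1, 0),  # beq
    0b000010: (X, X, X, 0, 0, 0, X, X, X, 1),  # j
}


def main_control(instruction: int) -> dict:
    """Return the control line settings for a 32-bit instruction word."""
    opcode = (instruction >> 26) & 0x3F
    return dict(zip(SIGNALS, CONTROL_TABLE[opcode]))


# Example: a lw instruction word (opcode 35; other fields left at zero)
print(main_control(35 << 26))
```

An unsupported opcode raises `KeyError` here; real hardware would instead produce whatever the logic happens to generate, so this is a modeling convenience only.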

Worst Case Timing (Load)
[Figure: timing diagram for a load, showing when each signal changes from its old value to its new value relative to Clk.]
Signals shown: PC; Rs, Rt, Rd, Op, Func; ALUctr; ExtOp; ALUSrc; MemtoReg; RegWr; busA; busB; Address; busW.
Delays shown, in order: Clk-to-Q; Instruction Memory Access Time; Delay through Control Logic; Register File Access Time; Delay through Extender & Mux; ALU Delay; Data Memory Access Time. The register write occurs at the end of the cycle.

Instruction Timing Comparison
[Figure: one timing bar per instruction class; Load is the critical path.]
Arithmetic & Logical: PC, Inst Memory, Reg File, mux, ALU, mux, setup
Load (Critical Path): PC, Inst Memory, Reg File, ALU, Data Mem (plus mux and setup delays)
Store: PC, Inst Memory, Reg File, ALU, Data Mem
Branch: PC, Inst Memory, Reg File, cmp, mux
Jump: PC, Inst Memory, mux

Simplified Single Cycle Datapath Timing
• Assuming the following datapath/control hardware component delays (obtained from the low-level target VLSI implementation technology of the components):
  – Memory Units: 2 ns
  – ALU and adders: 2 ns
  – Register File: 1 ns
  – Control Unit: < 1 ns
• Ignoring mux and clk-to-Q delays, critical path analysis (ns = nanosecond = 10^-9 second):
  – Critical path (Load): Instruction Memory (0 to 2 ns), Register Read (2 to 3 ns), Main ALU (3 to 5 ns), Data Memory (5 to 7 ns), Register Write (7 to 8 ns)
  – Also shown, off the critical path: Control Unit, PC + 4 ALU, Branch Target ALU
  – Time axis marks: 0, 2 ns, 3 ns, 4 ns, 5 ns, 7 ns, 8 ns
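The time marks on the load critical path are running sums of the component delays. A minimal Python sketch of that arithmetic follows; the list name `LOAD_PATH` and its stage labels are chosen for this example, and mux and clk-to-Q delays are ignored as on the slide.

```python
from itertools import accumulate

# (stage, delay in ns) along the load critical path, using the assumed delays above
LOAD_PATH = [
    ("Instruction Memory", 2),
    ("Register Read", 1),
    ("Main ALU", 2),
    ("Data Memory", 2),
    ("Register Write", 1),
]

# Running total gives the time at which each stage finishes.
finish_times = list(accumulate(delay for _, delay in LOAD_PATH))

for (stage, _), t in zip(LOAD_PATH, finish_times):
    print(f"{stage:<20} done at {t} ns")
```

The last running total, 8 ns, is the length of the critical path and therefore the minimum cycle time under these assumptions.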

Performance of Single-Cycle (CPI=1) CPU
• Assuming the following datapath hardware component delays (nanosecond, ns = 10^-9 second):
  – Memory Units: 2 ns
  – ALU and adders: 2 ns
  – Register File: 1 ns
• The delays needed for each instruction type can be found:

Instruction Class  Instruction Memory  Register Read  ALU Operation  Data Memory  Register Write  Total Delay
ALU                2 ns                1 ns           2 ns                        1 ns            6 ns
Load               2 ns                1 ns           2 ns           2 ns         1 ns            8 ns
Store              2 ns                1 ns           2 ns           2 ns                         7 ns
Branch             2 ns                1 ns           2 ns                                        5 ns
Jump               2 ns                                                                           2 ns

• The clock cycle is determined by the instruction with the longest delay: the load in this case, which takes 8 ns. Clock rate = 1 / 8 ns = 125 MHz
• A program with I = 1,000,000 instructions executed takes: Execution Time = T = I x CPI x C = 10^6 x 1 x 8 x 10^-9 = 0.008 s = 8 msec
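The table and the two results that follow from it (clock rate and execution time) can be reproduced with a short Python sketch. The dictionary keys and the function name `execution_time_s` are hypothetical names used only for this illustration; the delays and the instruction count are the ones assumed on the slide.

```python
# Component delays in ns, as assumed above
DELAYS_NS = {"imem": 2, "reg_read": 1, "alu": 2, "dmem": 2, "reg_write": 1}

# Components used by each instruction class
STAGES = {
    "ALU":    ["imem", "reg_read", "alu", "reg_write"],
    "Load":   ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "Store":  ["imem", "reg_read", "alu", "dmem"],
    "Branch": ["imem", "reg_read", "alu"],
    "Jump":   ["imem"],
}

total_delay = {cls: sum(DELAYS_NS[s] for s in stages) for cls, stages in STAGES.items()}

# Single-cycle CPU: the slowest instruction class sets the clock cycle.
cycle_ns = max(total_delay.values())
clock_mhz = 1000 / cycle_ns  # 1 / (1 ns) = 1000 MHz


def execution_time_s(instruction_count: int, cpi: int = 1, cycle: int = cycle_ns) -> float:
    """T = I x CPI x C, with the cycle time C given in ns."""
    return instruction_count * cpi * cycle * 1e-9


print(total_delay)
print(cycle_ns, "ns cycle,", clock_mhz, "MHz")
print(execution_time_s(10**6), "s")
```

Changing a single entry in `DELAYS_NS` shows how sensitive the cycle time is to the slowest path, which is the point the next slide makes about single-cycle designs.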

Drawbacks of Single Cycle Processor
1. Long cycle time:
  – All instructions must take as much time as the slowest.
    • Here, the cycle time required by load is longer than any other instruction needs.
  – Cycle time must be long enough for the load instruction: PC's Clock-to-Q + Instruction Memory Access Time + Register File Access Time + ALU Delay (address calculation) + Data Memory Access Time + Register File Setup Time + Clock Skew
  – Real memory is not as well-behaved as idealized memory.
    • It cannot always complete a data access in one (short) cycle.
2. Impossible to implement complex, variable-length instructions and complex addressing modes in a single cycle.
  – e.g. indirect memory addressing.
3. High and duplicate hardware resource requirements:
  – No hardware functional unit can be used more than once in a single cycle (e.g. ALUs).
4. Does not allow overlap of instruction processing (instruction pipelining, chapter 6).