CPU Organization (Design)
• Datapath Design: Components & their connections needed by ISA instructions
  – Components: Capabilities & performance characteristics of the principal Functional Units (FUs) needed by ISA instructions (e.g., Registers, ALU, Shifters, Logic Units, ...).
  – Connections: Ways in which these components are interconnected (bus connections, multiplexors, etc.) and how information flows between components.
• Control Unit Design: Control/sequencing of operations of datapath components to realize ISA instructions
  – Logic and means by which such information flow is controlled.
  – Control and coordination of FU operation to realize the targeted Instruction Set Architecture to be implemented (can be implemented using either a finite state machine or a microprogram).
• Hardware description with a suitable language, possibly using Register Transfer Notation (RTN).
Chapter 5.1-5.4
EECC 550 - Shaaban #1 Lec # 4 Winter 2006 12-19-2006

Major CPU Design Steps
1. Analyze the instruction set to get datapath requirements:
   – Using independent RTN, write the micro-operations required for target ISA instructions.
   • This provides the required datapath components and how they are connected.
2. Select a set of datapath components and establish the clocking methodology (defines when storage or state elements can be read and when they can be written, e.g., clock edge-triggered).
3. Assemble a datapath meeting the requirements.
4. Identify and define the function of all control points or signals needed by the datapath.
   – Analyze the implementation of each instruction to determine the setting of the control points that affect its operations.
5. Control unit design, based on the micro-operation timing and control signals identified:
   – Combinational logic: for a single-cycle CPU, i.e., any instruction completed in one cycle.
   – Hard-wired: finite-state machine implementation.
   – Microprogrammed.

CPU Design & Implementation Process
• Top-down design:
  – Specify component behavior from high-level requirements (ISA).
• Bottom-up design:
  – Assemble components in the target technology to establish critical timing (hardware delays, critical path timing).
• Iterative refinement:
  – Establish a partial solution, then expand and improve it.
[Figure: the Instruction Set Architecture (ISA) provides the requirements for the processor (datapath: Reg. File, Mux, ALU, Reg; control: Decoder, Sequencer), which is built in the target VLSI implementation technology (Cells, Mem, Gates).]

Datapath Design Steps
• Write the micro-operation sequences required for a number of representative target ISA instructions using independent RTN.
• Independent RTN statements specify the required datapath components and how they are connected.
• From the above, create an initial datapath by determining the possible destinations for each data source (i.e., registers, ALU).
  – This establishes the connectivity requirements (data paths, or connections) for datapath components.
  – Whenever multiple sources are connected to a single input (or destination), a multiplexor of appropriate size is added.
• Find the worst-case propagation delay in the datapath to determine the datapath clock cycle (CPU clock cycle).
• Complete the micro-operation sequences for all remaining instructions, adding datapath components + connections/multiplexors as needed.

MIPS Instruction Formats
  R-Type:                            op (6 bits) [31:26] | rs (5) [25:21] | rt (5) [20:16] | rd (5) [15:11] | shamt (5) [10:6] | funct (6) [5:0]
  I-Type (ALU, Load/Store, Branch):  op (6 bits) [31:26] | rs (5) [25:21] | rt (5) [20:16] | immediate, imm16 (16) [15:0]
  J-Type (Jumps):                    op (6 bits) [31:26] | target address (26) [25:0]
• op: Opcode, operation of the instruction.
• rs, rt, rd: The source and destination register specifiers.
• shamt: Shift amount.
• funct: Selects the variant of the operation in the "op" field.
• address / immediate: Address offset or immediate value.
• target address: Target address of the jump instruction.
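
The field boundaries above can be checked with a small Python sketch that slices a 32-bit word into every field of the three formats. The function name `decode_fields` is hypothetical. Which fields are meaningful depends on the opcode (R-, I-, or J-Type).

```python
def decode_fields(word: int) -> dict:
    """Slice a 32-bit MIPS instruction word into the R-, I- and J-Type fields."""
    word &= 0xFFFFFFFF                      # keep exactly 32 bits
    return {
        "op":     (word >> 26) & 0x3F,       # [31:26], 6 bits
        "rs":     (word >> 21) & 0x1F,       # [25:21], 5 bits
        "rt":     (word >> 16) & 0x1F,       # [20:16], 5 bits
        "rd":     (word >> 11) & 0x1F,       # [15:11], 5 bits (R-Type only)
        "shamt":  (word >> 6) & 0x1F,        # [10:6],  5 bits (R-Type only)
        "funct":  word & 0x3F,               # [5:0],   6 bits (R-Type only)
        "imm16":  word & 0xFFFF,             # [15:0], 16 bits (I-Type only)
        "target": word & 0x3FFFFFF,          # [25:0], 26 bits (J-Type only)
    }


f = decode_fields(0x00430820)                # the encoding of add $1, $2, $3
print(f["op"], f["rs"], f["rt"], f["rd"], f["shamt"], f["funct"])  # 0 2 3 1 0 32
```

All fields are extracted unconditionally, as in hardware: the wires for every field exist at once, and the control unit decides which ones to use.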

MIPS R-Type (ALU) Instruction Fields
R-Type (Register Type): all ALU instructions that use three registers. Register addressing is used (Mode 1).
  op (6 bits) [31:26] | rs (5) [25:21], 1st operand | rt (5) [20:16], 2nd operand | rd (5) [15:11], destination | shamt (5) [10:6] | funct (6) [5:0]
• op: Opcode, basic operation of the instruction. For R-Type, op = 0.
• rs: The first register source operand.
• rt: The second register source operand.
• rd: The register destination operand.
  (rs, rt, and rd are register specifier fields.)
• shamt: Shift amount used in constant shift operations.
• funct: Function, selects the specific variant of the operation in the op field.
  Funct field value examples: add = 32, sub = 34, AND = 36, OR = 37, NOR = 39
Independent RTN:
  Instruction Word ← Mem[PC]
  R[rd] ← R[rs] funct R[rt]
  PC ← PC + 4
Examples: add $1, $2, $3   sub $1, $2, $3   and $1, $2, $3   or $1, $2, $3
  (operand registers in rs and rt, destination register in rd; e.g., in add $1, $2, $3: rd = $1, rs = $2, rt = $3)
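
As a worked example of the field layout, this Python sketch packs an R-Type instruction from its fields using the funct values listed above. `FUNCT` and `encode_rtype` are hypothetical names used only for illustration. The assembler operand order (rd, rs, rt) differs from the bit order in the word (rs, rt, rd), which the keyword arguments make explicit.

```python
# funct values listed on the slide (op = 0 for every R-Type instruction)
FUNCT = {"add": 32, "sub": 34, "and": 36, "or": 37, "nor": 39}


def encode_rtype(mnemonic: str, rd: int, rs: int, rt: int, shamt: int = 0) -> int:
    """Pack op | rs | rt | rd | shamt | funct into one 32-bit R-Type word."""
    for value in (rd, rs, rt, shamt):
        if not 0 <= value < 32:
            raise ValueError("register numbers and shamt must fit in 5 bits")
    op = 0
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | FUNCT[mnemonic]


# add $1, $2, $3  ->  rd = 1, rs = 2, rt = 3
print(hex(encode_rtype("add", rd=1, rs=2, rt=3)))  # 0x430820
print(hex(encode_rtype("sub", rd=1, rs=2, rt=3)))  # 0x430822
```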

MIPS ALU I-Type Instruction Fields
I-Type (Immediate Type): ALU instructions that use two registers and an immediate value. The same format is also used by loads/stores and conditional branches. Immediate addressing is used (Mode 2).
  op (6 bits) [31:26] | rs (5) [25:21], 1st operand | rt (5) [20:16], destination | immediate, imm16 (16) [15:0], 2nd operand
• op: Opcode, operation of the instruction.
• rs: The register source operand.
• rt: The result destination register.
• immediate: Constant second operand for the ALU instruction (imm16 = 16-bit immediate field).
Independent RTN for addi:
  Instruction Word ← Mem[PC]
  R[rt] ← R[rs] + SignExt[imm16]
  PC ← PC + 4
Examples:
  add immediate: addi $1, $2, 100   (OP = 8)
  and immediate: andi $1, $2, 10    (OP = 12; for andi the immediate is zero-extended instead of sign-extended)
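
The difference between sign extension (addi) and zero extension (andi) of the 16-bit immediate is easiest to see on the field value 0xFFFF. The Python sketch below assumes 32-bit registers modeled as non-negative ints. The helper names are hypothetical, and addi's overflow trap is not modeled.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16: int) -> int:
    """Copy bit 15 into bits 31:16; returns the 32-bit pattern."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16


def zero_extend16(imm16: int) -> int:
    """Fill bits 31:16 with zeros."""
    return imm16 & 0xFFFF


def addi(rs_value: int, imm16: int) -> int:
    # R[rt] <- R[rs] + SignExt[imm16]  (overflow trap not modeled)
    return (rs_value + sign_extend16(imm16)) & MASK32


def andi(rs_value: int, imm16: int) -> int:
    # R[rt] <- R[rs] AND ZeroExt[imm16]
    return rs_value & zero_extend16(imm16)


print(hex(sign_extend16(0xFFFF)), hex(zero_extend16(0xFFFF)))  # 0xffffffff 0xffff
print(addi(5, 0xFFFF))        # 4, because the field 0xFFFF means -1
print(andi(0x12345678, 10))   # 8  (0x...78 AND 0b1010)
```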

MIPS Load/Store I-Type Instruction Fields
Base or displacement addressing is used (Mode 3).
  op (6 bits) [31:26] | rs (5) [25:21], base | rt (5) [20:16], src./dest. | address (e.g., offset), imm16 (16) [15:0]: signed address offset in bytes
• op: Opcode, operation of the instruction. For load word op = 35; for store word op = 43.
• rs: The register containing the memory base address.
• rt: For loads, the destination register. For stores, the source register of the value to be stored.
• address: 16-bit memory address offset in bytes, added to the base register in rs.
Examples:
  Store word: sw $3, 500($4)   (source register in rt = $3, offset = 500, base register in rs = $4)
    Instruction Word ← Mem[PC]
    Mem[R[rs] + SignExt[imm16]] ← R[rt]
    PC ← PC + 4
  Load word: lw $1, 32($2)     (destination register in rt = $1, offset = 32, base register in rs = $2)
    Instruction Word ← Mem[PC]
    R[rt] ← Mem[R[rs] + SignExt[imm16]]
    PC ← PC + 4
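
The effective address R[rs] + SignExt[imm16] can be computed directly. In this Python sketch the register contents (0x1000 in $2, 0x2000 in $4) are hypothetical values chosen for illustration, and `effective_address` is a hypothetical helper. The last line shows a negative offset, which only works because the field is sign-extended.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16: int) -> int:
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16


def effective_address(base: int, imm16: int) -> int:
    """R[rs] + SignExt[imm16], wrapped to 32 bits (offset is in bytes)."""
    return (base + sign_extend16(imm16)) & MASK32


# Hypothetical register contents, chosen only for illustration.
R = {2: 0x1000, 4: 0x2000}
print(hex(effective_address(R[2], 32)))            # lw $1, 32($2)  -> 0x1020
print(hex(effective_address(R[4], 500)))           # sw $3, 500($4) -> 0x21f4
print(hex(effective_address(R[2], -4 & 0xFFFF)))   # offset -4      -> 0xffc
```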

MIPS Branch I-Type Instruction Fields
PC-relative addressing is used (Mode 4).
  op (6 bits) [31:26] | rs (5) [25:21] | rt (5) [20:16] | address (e.g., offset), imm16 (16) [15:0]: signed address offset in words (word = 4 bytes)
• op: Opcode, operation of the instruction.
• rs: The first register being compared.
• rt: The second register being compared.
• address: 16-bit branch target offset in words. The offset in bytes (address field × 4) is added to PC + 4 to form the branch target address.
Examples:
  Branch on equal:     beq $1, $2, 100   (OP = 4)
  Branch on not equal: bne $1, $2, 100   (OP = 5)
  (register in rs = $1, register in rt = $2)
Independent RTN for beq:
  Instruction Word ← Mem[PC]
  R[rs] = R[rt] : PC ← PC + 4 + SignExt(imm16) × 4
  R[rs] ≠ R[rt] : PC ← PC + 4
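
The beq RTN above can be transcribed into a Python sketch of the next-PC computation. `beq_next_pc` is a hypothetical name, and the instruction address 0x00400000 is an assumed example. The word offset is multiplied by 4 (shift left 2) and added to PC + 4, not to PC.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16: int) -> int:
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16


def beq_next_pc(pc: int, rs_value: int, rt_value: int, imm16: int) -> int:
    """Next PC for beq: PC + 4 + SignExt(imm16) x 4 if R[rs] = R[rt], else PC + 4."""
    pc_plus_4 = (pc + 4) & MASK32
    if rs_value == rt_value:
        byte_offset = (sign_extend16(imm16) << 2) & MASK32   # word offset x 4
        return (pc_plus_4 + byte_offset) & MASK32
    return pc_plus_4


# beq $1, $2, 100 at a hypothetical address 0x00400000
print(hex(beq_next_pc(0x00400000, 7, 7, 100)))   # taken:     0x400194
print(hex(beq_next_pc(0x00400000, 7, 8, 100)))   # not taken: 0x400004
```

An offset field of 0xFFFF (-1 word) branches back to the beq instruction itself, because PC + 4 - 4 = PC.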

MIPS J-Type Instruction Fields
J-Type (Jump Type): includes jump (j) and jump and link (jal). Pseudodirect addressing is used (Mode 5).
  op (6 bits) [31:26] | jump target (26 bits) [25:0]: jump memory address in words (word = 4 bytes)
• op: Opcode, operation of the instruction.
  – Jump j: op = 2
  – Jump and link jal: op = 3
• jump target: jump memory address in words. The jump memory address in bytes equals the jump target field × 4.
Examples: Jump: j 10000   Jump and link: jal 10000   (jump target field = 10000 / 4 = 2500)
Effective 32-bit jump address: PC(31-28), jump_target, 00
  i.e., the 4 highest bits come from PC + 4, followed by the 26-bit jump target, followed by 2 zero bits.
Independent RTN for j:
  Instruction Word ← Mem[PC]
  PC ← PC + 4
  PC ← PC(31-28), jump_target, 00
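
The pseudodirect address formation can be sketched in Python as follows. Both helper names are hypothetical. The example reproduces the j 10000 / field 2500 case from the slide. The final check uses an assumed PC just below a 256 MB boundary to show why the upper 4 bits are taken from PC + 4 rather than from PC.

```python
MASK32 = 0xFFFFFFFF


def jump_target_field(byte_address: int) -> int:
    """Word address stored in the 26-bit field: byte address / 4."""
    return (byte_address >> 2) & 0x3FFFFFF


def jump_address(pc: int, target26: int) -> int:
    """PC(31-28) taken from PC + 4, then jump_target, then 00."""
    upper4 = ((pc + 4) & MASK32) & 0xF0000000
    return upper4 | ((target26 & 0x3FFFFFF) << 2)


field = jump_target_field(10000)
print(field)                                    # 2500
print(jump_address(0x00400000, field))          # 10000
print(hex(jump_address(0x10000000, field)))     # 0x10002710
print(hex(jump_address(0x0FFFFFFC, 0)))         # 0x10000000 (PC + 4 crosses into the next region)
```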

A Subset of MIPS Instructions
• ADD and SUB (R-Type):
    add rd, rs, rt
    sub rd, rs, rt
    op = 0 [31:26] | rs (5) [25:21] | rt (5) [20:16] | rd (5) [15:11] | shamt (5) [10:6] | funct (6) [5:0]   (funct: 32 = add, 34 = sub)
• OR Immediate (I-Type):
    ori rt, rs, imm16
    op = 13 [31:26] | rs (5) [25:21] | rt (5) [20:16] | immediate, imm16 (16) [15:0]
• LOAD and STORE Word (I-Type):
    lw rt, rs, imm16
    sw rt, rs, imm16
    op (35 = lw, 43 = sw) [31:26] | rs (5) [25:21] | rt (5) [20:16] | immediate, imm16 (16) [15:0]: offset in bytes
• BRANCH (I-Type):
    beq rs, rt, imm16
    op = 4 [31:26] | rs (5) [25:21] | rt (5) [20:16] | immediate, imm16 (16) [15:0]: offset in words

Basic MIPS Instruction Processing Steps
• Instruction Fetch: Obtain the instruction from program storage (instruction memory).   Instruction ← Mem[PC]
• Next Instruction: Update the program counter to the address of the next instruction.   PC ← PC + 4
  (The two steps above are common to all instructions.)
• Instruction Decode: Determine the instruction type and obtain operands from registers. Done by the control unit (based on the opcode).
• Execute: Compute the result value or status.
• Result Store: Store the result in a register/memory if needed (usually called Write Back).

Overview of MIPS Instruction Micro-operations
• All instructions go through these common steps:
  – Send the program counter to instruction memory and fetch the instruction (fetch): Instruction ← Mem[PC]
  – Update the program counter to point to the next instruction: PC ← PC + 4
  – Read one or two registers, using instruction fields (decode).
    • Load reads one register only.
• Additional instruction execution actions (execution) depend on the instruction in question, but similarities exist:
  – All instruction classes (except J-Type) use the ALU after reading the registers:
    • Memory reference instructions use it for address calculation.
    • Arithmetic and logic instructions (R-Type) use it for the specified operation.
    • Branches use it for comparison.
• Additional execution steps where instruction classes differ:
  – Memory reference instructions: access memory for a load or store.
  – Arithmetic and logic instructions: write the ALU result back to a register.
  – Branch instructions: change the next instruction address based on the comparison.

A Single Cycle MIPS CPU
Design target: a single-cycle-per-instruction MIPS CPU design.
• All micro-operations of an instruction are to be carried out in a single CPU clock cycle.
• Cycles Per Instruction: CPI = 1
• CPU Performance Equation: T = I × CPI × C, which with CPI = 1 becomes T = I × C
  (T = CPU execution time, I = instruction count, C = clock cycle time)
[Figure 5.1, page 287: Abstract view of the single cycle MIPS CPU showing the major functional units (components) and the major connections between them]
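
The performance equation in use: with CPI fixed at 1, execution time depends only on the instruction count and the clock cycle. The instruction count (one million) and cycle time (8 ns) below are hypothetical numbers chosen for illustration, and `cpu_time_ns` is a hypothetical helper.

```python
def cpu_time_ns(instruction_count: int, cpi: float, cycle_time_ns: float) -> float:
    """CPU performance equation: T = I x CPI x C."""
    return instruction_count * cpi * cycle_time_ns


# Hypothetical program: one million instructions on a single-cycle CPU (CPI = 1)
# whose clock cycle C is 8 ns.
t = cpu_time_ns(1_000_000, 1, 8)
print(t / 1e6, "ms")   # 8.0 ms
```

Because every instruction takes one cycle, C must be long enough for the slowest instruction. Faster instructions cannot finish early in this design.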

R-Type Example: Micro-Operation Sequence For ADD
add rd, rs, rt
  op = 0 (6 bits) [31:26] | rs (5) [25:21] | rt (5) [20:16] | rd (5) [15:11] | shamt = 0 (5) [10:6] | funct (6) [5:0] = 32 (add; 34 = sub)
Independent RTN:
  Instruction Word ← Mem[PC]    Fetch the instruction from program memory (common step)
  PC ← PC + 4                   Increment PC (common step)
  R[rd] ← R[rs] + R[rt]         Add register rs to register rt, result in register rd (i.e., funct = add)

Initial Datapath Components
Three components are needed by:
  Instruction Fetch: Instruction ← Mem[PC]
  Program Counter Update: PC ← PC + 4
Two state elements (memory) are needed to store and access instructions:
1. Instruction memory:
   • Only read access (by user code). No read control signal needed.
2. Program counter (PC): 32-bit register.
   • Written at the end of every clock cycle (edge-triggered): no write control signal.
3. 32-bit adder: to compute the next instruction address (PC + 4).
Basics of logic design/logic building blocks review in Appendix B (Book CD).

More Datapath Components
• Main 32-bit ALU (Arithmetic and Logic Unit):
  – Two 32-bit operand inputs, a 32-bit result, and a 4-bit function (control) input.
  – Zero flag: Zero = 1 when the ALU result equals zero.
• ISA Register File:
  – Contains all ISA registers.
  – Two read ports and one write port.
  – Registers are written by asserting the write control signal.
  – Clocking methodology: writes are edge-triggered.
  – Thus a register can be read and written in the same clock cycle; the read returns the value held before the write takes effect at the clock edge.
Basics of logic design/logic building blocks review in Appendix B (Book CD).

Register File Details
• The register file consists of 32 registers (32-bit each):
  – Two 32-bit output buses: busA and busB.
  – One 32-bit input bus: busW.
  – Inputs: RW, RA, RB (5 bits each), Write Enable, and Clk.
• A register is selected by:
  – RA (number) selects the register to put on busA (data): busA = R[RA]
  – RB (number) selects the register to put on busB (data): busB = R[RB]
  – RW (number) selects the register to be written via busW (data) when Write Enable is 1:
    Write Enable: R[RW] ← busW
• Clock input (CLK):
  – The CLK input is a factor ONLY during write operations.
  – During a read operation, the register file behaves as a combinational logic block:
    • RA or RB valid => busA or busB valid after "access time."
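
A behavioral model can make the read/write distinction concrete: reads are combinational, and a write happens only on a clock edge with Write Enable = 1. The Python class below is a minimal sketch whose names (`RegisterFile`, `read`, `clock_edge`) are hypothetical. It does not model MIPS's hardwired $0, since this slide does not mention it.

```python
class RegisterFile:
    """32 x 32-bit registers with two read ports and one edge-triggered write port."""

    def __init__(self) -> None:
        self.regs = [0] * 32

    def read(self, ra: int, rb: int) -> tuple:
        # Combinational: busA = R[RA], busB = R[RB]; no clock involved.
        return self.regs[ra & 0x1F], self.regs[rb & 0x1F]

    def clock_edge(self, rw: int, bus_w: int, write_enable: int) -> None:
        # The clock matters only for writes: R[RW] <- busW when Write Enable = 1.
        if write_enable:
            self.regs[rw & 0x1F] = bus_w & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(rw=2, bus_w=5, write_enable=1)
rf.clock_edge(rw=3, bus_w=7, write_enable=1)

# One cycle of add $2, $2, $3: read the old values, compute, then write at the edge.
bus_a, bus_b = rf.read(2, 3)                        # (5, 7)
rf.clock_edge(rw=2, bus_w=bus_a + bus_b, write_enable=1)
print(rf.read(2, 3))                                # (12, 7)
rf.clock_edge(rw=3, bus_w=99, write_enable=0)       # Write Enable = 0: no change
print(rf.read(2, 3))                                # (12, 7)
```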

A Possible Register File Implementation
• Write side: the 5-bit Write Register number (RW) drives a 5-to-32 decoder. Each decoder output, combined with Register Write Enable (RegWrite), drives the Write input of one register (Register 0 ... Register 31), which loads the Register Write Data (busW).
• Each register contains 32 edge-triggered D flip-flops.
• Read side: two 32-bit-wide 32-to-1 multiplexors select the register outputs:
  – Read Register 1 (RA, 5 bits) selects Register Read Data 1 (busA).
  – Read Register 2 (RB, 5 bits) selects Register Read Data 2 (busB).
• The clock input to the registers is not shown in the diagram.
Also see Appendix B (Book CD) - The Basics of Logic Design.

Idealized Memory
• Memory (idealized):
  – One 32-bit input bus: Data In.
  – One 32-bit output bus: Data Out.
  – Other inputs: 32-bit Address, Write Enable, Read Enable, and Clk.
• A memory word is selected by:
  – Address selects the word to put on the Data Out bus.
  – Write Enable = 1: Address selects the memory word to be written via the Data In bus.
• Clock input (CLK):
  – The CLK input is a factor ONLY during write operations.
  – During a read operation, this memory behaves as a combinational logic block:
    • Address valid => Data Out valid after "access time."
• Ideal memory = short access time compared to other components.

Clocking Methodology Used: Edge-Triggered Writes
• All storage element (e.g., flip-flops, registers, data memory) writes are triggered by the same clock edge.
• Here writes are triggered on the rising edge of the clock.
• Cycle Time = CLK-to-Q + Longest Delay Path (critical path) + Setup + Clock Skew
[Timing diagram: Clk with Setup, Hold, and Don't Care intervals, the CLK-to-Q delay, and the critical (longest delay) path between storage elements]
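
To apply the cycle-time formula, find the slowest combinational path between two clock edges, then add the register overheads. Every delay value in this Python sketch is invented for illustration, and the per-instruction stage lists are a simplified assumption rather than an exact description of the book datapath.

```python
# Hypothetical delays in picoseconds, chosen only for illustration (not from the slides).
DELAY_PS = {"imem": 200, "regread": 100, "alu": 200, "dmem": 200, "mux": 25}

# Combinational stages each instruction class passes through between two clock edges.
PATHS = {
    "R-type": ["imem", "regread", "alu", "mux"],
    "lw":     ["imem", "regread", "alu", "dmem", "mux"],
    "sw":     ["imem", "regread", "alu", "dmem"],
    "beq":    ["imem", "regread", "alu", "mux"],
}


def path_delay(stages):
    return sum(DELAY_PS[s] for s in stages)


def cycle_time(clk_to_q, longest_path, setup, skew):
    """Cycle Time = CLK-to-Q + Longest Delay Path + Setup + Clock Skew."""
    return clk_to_q + longest_path + setup + skew


critical = max(PATHS, key=lambda name: path_delay(PATHS[name]))
t = cycle_time(clk_to_q=50, longest_path=path_delay(PATHS[critical]), setup=30, skew=20)
print(critical, path_delay(PATHS[critical]), t)   # lw 725 825
```

Under these assumed numbers, the load word path sets the clock for every instruction. This is the cost of the single-cycle design discussed earlier.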

Building The Datapath
Instruction Fetch & PC Update:
  Instruction ← Mem[PC]
  PC ← PC + 4
• Portion of the datapath used for fetching instructions and incrementing the program counter (Figure 5.6, page 293).
• The PC write or update is edge-triggered at the end of the cycle.
• The clock input to the PC and memory is not shown.

Simplified Datapath For MIPS R-Type Instructions
• From instruction memory: rs [25:21] and rt [20:16] select the register file read ports, producing R[rs] and R[rt]; rd [15:11] selects the register to be written.
• The ALU (4-bit function input; here funct = add) operates on R[rs] and R[rt], and its 32-bit result is written to R[rd].
• Components and connections as specified by the RTN statement R[rd] ← R[rs] + R[rt].
• The destination register R[rd] write or update is edge-triggered at the end of the cycle.
• The clock input to the register bank is not shown.

More Detailed Datapath For R-Type Instructions With Control Points Identified
• Register file (32 32-bit registers): Ra = Rs, Rb = Rt, Rw = Rd (5 bits each); outputs busA = R[rs] and busB = R[rt]; input busW; clock Clk.
• Control points:
  – RegWr: register write enable.
  – ALUctr: selects the ALU function (Add, Subtract, ...).
• The 32-bit ALU result drives busW: R[rd] ← R[rs] + R[rt] (i.e., funct = add).

R-Type Register-Register Timing
• Clk-to-Q after the clock edge, the PC changes from its old value to its new value.
• After the instruction memory access time, Rs, Rt, Rd, Op, and Func take their new values.
• After the delay through the control logic, ALUctr and RegWr take their new values; in parallel, after the register file access time, busA and busB take their new values.
• After the ALU delay, busW holds the new result.
• The register write occurs at the end of the cycle. All register writes occur on the same triggering clock edge (clocking methodology).
[Datapath shown: register file (Rw = Rd, Ra = Rs, Rb = Rt, RegWr, Clk) drives busA and busB into the ALU (ALUctr), whose 32-bit result drives busW.]

Logical Operations with Immediate Example: Micro-Operation Sequence For ORI
ori rt, rs, imm16
  op = 13 (6 bits) [31:26] | rs (5) [25:21] | rt (5) [20:16] | immediate, imm16 (16) [15:0]
  Instruction Word ← Mem[PC]           Fetch the instruction
  PC ← PC + 4                          Increment PC
  R[rt] ← R[rs] OR ZeroExt[imm16]      OR register rs with the immediate field zero-extended to 32 bits, result in register rt (done by the main ALU)

Datapath For Logical Instructions With Immediate
R[rt] ← R[rs] OR ZeroExt[imm16]
• RegDst: 2 × 1 MUX (5 bits wide) selecting the write register Rw: Rd (input 1) or Rt (input 0).
• Register file (32 32-bit registers): Ra = Rs, Rb = Rt; busA = R[rs], busB = R[rt]; written via busW when RegWr = 1.
• ZeroExt: extends the 16-bit imm16 to 32 bits.
• ALUSrc: 2 × 1 MUX (32 bits wide) selecting the second ALU operand: busB = R[rt] (input 0) or ZeroExt[imm16] (input 1).
• ALUctr: Function = OR; the 32-bit ALU result drives busW.

Load Operations Example: Micro-Operation Sequence For LW
lw rt, rs, imm16
  op = 35 (6 bits) [31:26] | rs (5) [25:21] | rt (5) [20:16] | immediate, imm16 (16) [15:0]: address offset in bytes
  Instruction Word ← Mem[PC]              Fetch the instruction (instruction memory)
  PC ← PC + 4                             Increment PC
  R[rt] ← Mem[R[rs] + SignExt[imm16]]     The immediate field is sign-extended to 32 bits and added to register rs to form the memory load effective address; the word at that address in data memory is written to register rt

Additional Datapath Components For Loads & Stores
• Data memory:
  – Inputs: 32-bit address and 32-bit write (store) data.
  – Output: 32-bit read (load) data.
  – Data memory write or update is edge-triggered at the end of the cycle (clocking methodology).
• Sign extender, SignExt[imm16]: the 16-bit input is sign-extended into a 32-bit value at the output.

Datapath For Loads
R[rt] ← Mem[R[rs] + SignExt[imm16]]
• RegDst = 0: the write register Rw is Rt (the destination), not Rd; RegWr = 1.
• Register file: Ra = Rs (base address register) gives busA = R[rs].
• Extender (controlled by ExtOp) extends the 16-bit offset imm16 to 32 bits.
• ALUSrc = 1: the second ALU operand is the extended offset rather than busB = R[rt].
• ALUctr: Function = add; the ALU output is the effective address, applied to the data memory address input (Adr).
• Data memory: MemRd = 1 and MemWr (WrEn) = 0; its Data In input receives R[rt] but is not used by loads.
• MemtoReg = 1: busW comes from the data memory output rather than the ALU.

Store Operations Example: Micro-Operation Sequence For SW
sw rt, rs, imm16
  op = 43 (6 bits) [31:26] | rs (5) [25:21] | rt (5) [20:16] | immediate, imm16 (16) [15:0]: address offset in bytes
  Instruction Word ← Mem[PC]              Fetch the instruction
  PC ← PC + 4                             Increment PC
  Mem[R[rs] + SignExt[imm16]] ← R[rt]     The immediate field is sign-extended to 32 bits and added to register rs to form the memory store effective address; register rt is written to data memory at that address

Datapath For Stores
Mem[R[rs] + SignExt[imm16]] ← R[rt]
• Register file: Ra = Rs (base address register) gives busA = R[rs]; Rb = Rt gives busB = R[rt], the value to store; RegWr = 0.
• Extender (ExtOp) extends the 16-bit offset imm16 to 32 bits; ALUSrc = 1 selects it as the second ALU operand.
• ALUctr: add; the ALU output is the effective address, applied to the data memory address input (Adr).
• busB = R[rt] drives the data memory Data In input; MemWr (WrEn) = 1 and MemRd = 0.
• Since no register is written, RegDst and MemtoReg do not matter for stores.

Conditional Branch Example: Micro-Operation Sequence For BEQ
beq rs, rt, imm16
  op = 4 (6 bits) [31:26] | rs (5) [25:21] | rt (5) [20:16] | immediate (16) [15:0]: PC offset in words
  Instruction Word ← Mem[PC]                  Fetch the instruction
  PC ← PC + 4                                 Increment PC
  Zero ← R[rs] - R[rt]                        Calculate the branch condition
  Zero : PC ← PC + ( SignExt(imm16) × 4 )     Calculate the next instruction's PC address (branch target)
• "Zero" is the zero flag of the main ALU: if R[rs] - R[rt] = 0 (i.e., R[rs] == R[rt]), then Zero = 1.
• The last statement has the form Condition : Action. Because PC already holds PC + 4 at that point, the branch target is PC + 4 + SignExt(imm16) × 4.

Datapath For Branch Instructions
• The main ALU evaluates the branch condition (subtract): rs [25:21] and rt [20:16] select R[rs] and R[rt], and the Zero flag = 1 if R[rs] - R[rt] = 0 (i.e., R[rs] = R[rt]).
• A new adder computes the branch target: the sum of the incremented PC and the sign-extended lower 16 bits of the instruction (imm16 [15:0]) multiplied by 4.
  – New 32-bit adder (third ALU) for the branch target: PC + 4 + ( SignExt(imm16) × 4 )

More Detailed Datapath For Branch Operations
• PC + 4 adder: adds 4 to the PC.
• Branch target adder (new third ALU): adds PC + 4 and imm16 after sign extension and shift left 2 (PC Ext, appending 00).
• The main ALU (subtract) compares R[rs] (busA) and R[rt] (busB) read from the register file (Ra = Rs, Rb = Rt): Equal? → Zero.
• A new 2 × 1 32-bit MUX selects the next PC value: PC + 4 (input 0) or the branch target (input 1), under control of PCSrc, which is asserted when Branch = 1 and Zero = 1.
• The PC is clocked (Clk) and supplies the instruction address.

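Combining The Datapaths For Memory Instructions and R-Type Instructions
• rs [25:21] gives R[rs], rt [20:16] gives R[rt], and imm16 [15:0] is sign-extended to the 32-bit SignExt(imm16); the ALU has a 4-bit control input.
• The highlighted multiplexors and connections are added to combine the datapaths of memory and R-Type instructions into one datapath:
  – A MUX before the ALU selects R[rt] (0) or SignExt(imm16) (1) as the second ALU operand.
  – A MUX after data memory selects the ALU result (0) or the memory read data (1) as the register write data.
  – The rt/rd MUX is not shown.
• This is the book version: ORI is not supported (Figure 5.10, page 299).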

Instruction Fetch Datapath Added to ALU R-Type and Memory Instructions Datapath
• The PC, the PC + 4 adder, and instruction memory are placed in front of the combined datapath; the fetched instruction supplies rs and rt (R[rs], R[rt]), and the ALU has a 4-bit control input.
• The rt/rd MUX is not shown.
• This is the book version: ORI is not supported, so no zero extension of the immediate is needed.

A Simple Datapath For The MIPS Architecture
• The datapath for branches (branch target adder, using the main ALU Zero output and the Branch signal) and a program counter multiplexor (selecting PC + 4 or the branch target) are added.
• The resulting datapath can execute the basic MIPS instructions in a single cycle:
  – load/store word
  – ALU operations
  – branches
• The rt/rd MUX is not shown.
• This is the book version: ORI is not supported, so no zero extension of the immediate is needed (Figure 5.11, page 300).

Main ALU Control
• The main ALU has four control lines (detailed design in Appendix B) with the following functions:
    ALU Control Lines    ALU Function
    0000                 AND
    0001                 OR
    0010                 add
    0110                 subtract
    0111                 set-on-less-than
    1100                 NOR (not used)
• For our current subset of MIPS instructions only the top five functions will be used (thus only three control lines will be used, since the leftmost line is always 0 for these five).
• For R-Type instructions the ALU function depends on both the opcode and the 6-bit "funct" function field.
• For other instructions the ALU function depends on the opcode only.
• A local ALU control unit can be designed to accept 2-bit ALUOp control lines (from the main control unit) and the 6-bit function field and generate the correct 4-bit ALU control lines.
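
A behavioral model of the main ALU, driven by the 4-bit control values in the table above, shows how the Zero flag falls out of a subtraction for beq. The Python sketch below assumes 32-bit operands held as non-negative ints and a signed comparison for set-on-less-than. The names `alu` and `to_signed32` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(control: int, a: int, b: int) -> tuple:
    """Main ALU: returns (32-bit result, Zero flag) for a 4-bit control value."""
    a &= MASK32
    b &= MASK32
    if control == 0b0000:
        result = a & b                                      # AND
    elif control == 0b0001:
        result = a | b                                      # OR
    elif control == 0b0010:
        result = (a + b) & MASK32                           # add
    elif control == 0b0110:
        result = (a - b) & MASK32                           # subtract
    elif control == 0b0111:
        result = int(to_signed32(a) < to_signed32(b))       # set-on-less-than (signed)
    elif control == 0b1100:
        result = ~(a | b) & MASK32                          # NOR
    else:
        raise ValueError(f"unused ALU control value {control:04b}")
    return result, int(result == 0)                         # Zero = 1 when result is 0


print(alu(0b0110, 7, 7))            # (0, 1): beq-style compare, Zero = 1
print(alu(0b0111, MASK32, 1))       # (1, 0): -1 < 1 as signed values
print(alu(0b0010, MASK32, 1))       # (0, 1): 32-bit wraparound
```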

Local ALU Decoding of "func" Field
The main control decodes op (6 bits) into ALUop (2 bits); the local ALU control combines ALUop with func (6 bits) to produce ALUctr (4 bits) for the ALU.

    Instruction    ALUOp   Instruction         Funct     Desired            ALU Control
    Opcode                 Operation           Field     ALU Action         Lines
    LW             00      load word           XXXXXX    add                0010
    SW             00      store word          XXXXXX    add                0010
    Branch Equal   01      branch equal        XXXXXX    subtract           0110
    R-Type         10      add                 100000    add                0010
    R-Type         10      subtract            100010    subtract           0110
    R-Type         10      AND                 100100    and                0000
    R-Type         10      OR                  100101    or                 0001
    R-Type         10      set on less than    101010    set on less than   0111
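
The decoding table maps directly onto a small function: ALUOp alone decides the result for loads, stores, and branches, and the funct field is consulted only when ALUOp = 10. This Python sketch mirrors the table above. `R_TYPE_FUNCT` and `alu_control` are hypothetical names.

```python
# (funct field -> ALU control lines) for R-Type instructions, from the decoding table
R_TYPE_FUNCT = {
    0b100000: 0b0010,   # add
    0b100010: 0b0110,   # subtract
    0b100100: 0b0000,   # AND
    0b100101: 0b0001,   # OR
    0b101010: 0b0111,   # set on less than
}


def alu_control(alu_op: int, funct: int) -> int:
    """Local ALU control: 2-bit ALUOp + 6-bit funct -> 4-bit ALU control lines."""
    if alu_op == 0b00:          # lw / sw: address calculation, funct ignored (XXXXXX)
        return 0b0010
    if alu_op == 0b01:          # beq: compare by subtracting, funct ignored
        return 0b0110
    if alu_op == 0b10:          # R-Type: decode the funct field
        return R_TYPE_FUNCT[funct]
    raise ValueError(f"unsupported ALUOp {alu_op:02b}")


print(f"{alu_control(0b00, 0b111111):04b}")   # 0010 (lw/sw: add)
print(f"{alu_control(0b10, 0b101010):04b}")   # 0111 (slt)
```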

Local ALU Control Unit
• Inputs: ALUOp (2 lines from the main control unit: add = 00, subtract = 01, R-Type = 10) and the function field.
• Outputs: 3 ALU control lines selecting add, subtract, AND, OR, or set-on-less-than; the 4th line = 0 (page 302).
• More details are found in Appendix C (Book CD).

Single Cycle MIPS Datapath
• The necessary multiplexors and control lines are identified here, and the local ALU control is added.
• The local ALU control takes the function field and ALUOp (2 bits: 00 = add, 01 = subtract, 10 = R-Type).
[Figure 5.15, page 305: signals shown include PC + 4, Branch Target, Branch, Zero, rs, rt, rd, R[rs], R[rt], and imm16]
• This is the book version: ORI is not supported, so no zero extension of the immediate is needed.

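Putting It All Together: A Single Cycle Datapath
(Includes ORI, which is not in the book version.)
• Instruction fields from instruction memory (Instruction<31:0>): Rs [25:21], Rt [20:16], Rd [15:11], Imm16 [15:0], and the function field [5:0].
• Next PC logic: the PC + 4 adder; the branch target adder (PC Ext: e.g., sign extend + shift left 2, appending 00); and a MUX controlled by PCSrc (from Branch and Zero).
• Register file (32 32-bit registers): Ra = Rs, Rb = Rt, Rw from the RegDst MUX (Rt = 0, Rd = 1); busA = R[rs], busB = R[rt]; write enable RegWr; busW from the MemtoReg MUX.
• Extender (ExtOp selects sign or zero extension; zero extension is needed for ORI) extends imm16 to 32 bits; the ALUSrc MUX selects busB (0) or the extended immediate (1).
• Main ALU, controlled by the ALU control from ALUop (2 bits: 00 = add, 01 = subtract, 10 = R-Type) and the function field; its Zero output feeds the branch logic.
• Data memory: Adr from the ALU result, Data In from busB, WrEn = MemWr, and MemRd; the MemtoReg MUX selects the ALU result (0) or the data memory output (1) for busW.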

• Instruction<31:0> from instruction memory supplies the fields Op [31:26], Rs [25:21], Rt [20:16], Rd [15:11], Fun [5:0], Imm16 [15:0], and Jump_target [25:0] to the datapath and control.
• The Control Unit decodes Op (Fun is used by the local ALU control) and drives the datapath control lines: RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, and ALUOp (2 bits).

The Effect of The Control Signals
• RegDst
  – Deasserted (= 0): The register destination number for the write register comes from the rt field (instruction bits 20:16).
  – Asserted (= 1): The register destination number for the write register comes from the rd field (instruction bits 15:11).
• RegWrite
  – Deasserted (= 0): None.
  – Asserted (= 1): The register on the write register input is written with the value on the Write data input.
• ALUSrc
  – Deasserted (= 0): The second main ALU operand comes from the second register file output (Read data 2), R[rt].
  – Asserted (= 1): The second main ALU operand is the sign-extended lower 16 bits of the instruction (imm16).
• Branch
  – Deasserted (= 0): The PC is replaced by the output of the adder that computes PC + 4.
  – Asserted (= 1), if Zero = 1: The PC is replaced by the output of the adder that computes the branch target.
• MemRead
  – Deasserted (= 0): None.
  – Asserted (= 1): Data memory contents designated by the address input are put on the Read data output.
• MemWrite
  – Deasserted (= 0): None.
  – Asserted (= 1): Data memory contents designated by the address input are replaced by the value on the Write data input.
• MemtoReg
  – Deasserted (= 0): The value fed to the register write data input comes from the main ALU.
  – Asserted (= 1): The value fed to the register write data input comes from data memory.

Control Line Settings
    Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
    R-Format     1       0       0         1         0        0         0       1       0
    lw           0       1       1         1         1        0         0       0       0
    sw           X       1       X         0         0        1         0       0       0
    beq          X       0       X         0         0        0         1       0       1
ALUOp (2 bits): 00 = add, 01 = subtract, 10 = R-Type
Figure 5.18, page 308
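
Because the main control is purely combinational, the settings table can be expressed as a lookup keyed by opcode (0, 35, 43, 4 from the instruction slides). In this Python sketch, `None` stands for a don't-care (X). The names `MAIN_CONTROL` and `main_control` are hypothetical.

```python
X = None  # don't care

# Control line settings from the table, keyed by opcode.
MAIN_CONTROL = {
    0:  dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1, MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),  # R-format
    35: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1, MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),  # lw
    43: dict(RegDst=X, ALUSrc=1, MemtoReg=X, RegWrite=0, MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),  # sw
    4:  dict(RegDst=X, ALUSrc=0, MemtoReg=X, RegWrite=0, MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),  # beq
}


def main_control(op: int) -> dict:
    """Combinational main control: 6-bit opcode -> control lines."""
    try:
        return MAIN_CONTROL[op]
    except KeyError:
        raise ValueError(f"opcode {op} is not in this MIPS subset") from None


signals = main_control(35)  # lw, opcode 100011
print(signals["RegWrite"], signals["MemtoReg"], signals["ALUOp"])   # 1 1 0
```

The don't-cares are harmless because the signals they would steer have no effect: sw and beq never write a register, so RegDst and MemtoReg cannot matter.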

The Truth Table for the Main Control (Opcode): similar to Figure 5.22, page 312.
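The main control's truth table maps each opcode to one row of the control line settings. A minimal lookup-table sketch, assuming the standard MIPS opcodes (R-format 0, lw 35, sw 43, beq 4) and using None for a don't-care (X) entry:

```python
# Sketch: main control as a lookup from the 6-bit opcode to the Figure 5.18 row.
X = None  # don't care

OPCODES = {0b000000: "R-format", 0b100011: "lw", 0b101011: "sw", 0b000100: "beq"}

FIELDS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite", "MemRead",
          "MemWrite", "Branch", "ALUOp1", "ALUOp0")

SETTINGS = {
    "R-format": (1, 0, 0, 1, 0, 0, 0, 1, 0),
    "lw":       (0, 1, 1, 1, 1, 0, 0, 0, 0),
    "sw":       (X, 1, X, 0, 0, 1, 0, 0, 0),
    "beq":      (X, 0, X, 0, 0, 0, 1, 0, 1),
}

def main_control(opcode: int) -> dict:
    """Decode an opcode into a dict of control line values."""
    name = OPCODES.get(opcode)
    if name is None:
        raise ValueError(f"unsupported opcode {opcode:06b}")
    return dict(zip(FIELDS, SETTINGS[name]))
```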

PLA Implementation of the Main Control: Figure C.2.5 (Appendix C). PLA = Programmable Logic Array (Appendix B).
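A PLA realizes the same truth table as two logic levels: an AND plane with one product term per recognized opcode, and an OR plane that combines the terms for each output. A minimal sketch of that structure, assuming the standard opcodes and treating every don't-care as 0 (the function names are hypothetical):

```python
# Sketch of a two-level (AND plane / OR plane) main control, PLA style.

def inv(bit: int) -> int:
    """Inverter feeding the complemented inputs of the AND plane."""
    return 1 - bit

def pla_main_control(op: int) -> dict:
    op5, op4, op3, op2, op1, op0 = [(op >> i) & 1 for i in range(5, -1, -1)]
    # AND plane: one product term per supported opcode
    r_format = inv(op5) & inv(op4) & inv(op3) & inv(op2) & inv(op1) & inv(op0)  # 000000
    lw = op5 & inv(op4) & inv(op3) & inv(op2) & op1 & op0                        # 100011
    sw = op5 & inv(op4) & op3 & inv(op2) & op1 & op0                             # 101011
    beq = inv(op5) & inv(op4) & inv(op3) & op2 & inv(op1) & inv(op0)            # 000100
    # OR plane: each output is the OR of the terms that assert it
    return {
        "RegDst": r_format,
        "ALUSrc": lw | sw,
        "MemtoReg": lw,
        "RegWrite": r_format | lw,
        "MemRead": lw,
        "MemWrite": sw,
        "Branch": beq,
        "ALUOp1": r_format,
        "ALUOp0": beq,
    }
```

An unsupported opcode matches no product term, so every output stays 0.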

Adding Support for Jump
• Micro-operation sequence for jump: j jump_target (J-format, opcode 2)
• Instruction word: OP (6 bits, [31:26]) | jump_target (26 bits, [25:0]); jump_target is a jump address in words (e.g., jump_target = 2500).
  Instruction Word ← Mem[PC]   Fetch the instruction
  PC ← PC + 4   Increment PC
  PC ← PC(31-28), jump_target, 00   Update PC with jump address
• Jump address = PC(31-28), the 4 highest bits from PC + 4 (4 bits), followed by jump_target (26 bits), followed by 00 (2 bits), for a 32-bit address.
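The concatenation PC(31-28), jump_target, 00 can be checked with a little bit arithmetic. A minimal sketch (the helper name is hypothetical), where the upper 4 bits come from the already-incremented PC + 4:

```python
MASK32 = 0xFFFFFFFF

def jump_address(pc: int, instruction: int) -> int:
    """PC <- PC(31-28), jump_target, 00 with PC(31-28) taken from PC + 4."""
    pc_plus4 = (pc + 4) & MASK32
    jump_target = instruction & 0x03FFFFFF          # instruction bits 25:0
    return (pc_plus4 & 0xF0000000) | (jump_target << 2)
```

For j 2500 at PC 0x00400000 the result is 2500 x 4 = 10000. If the jump sits in the last word of a 256 MB region (PC = 0x8FFFFFFC), the upper 4 bits come from PC + 4 = 0x90000000, not from the PC itself.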

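Datapath for Jump
[Figure: next-instruction-address logic. An adder computes PC + 4; a second adder adds Instruction(15-0) (imm16), sign-extended and shifted left 2, to PC + 4 to form the branch target; a mux controlled by PCSrc (Branch AND Zero) selects between them. Instruction(25-0) (jump_target, 26 bits) is shifted left 2 to 28 bits and combined with PC+4(31-28) to form the 32-bit jump address PC(31-28), jump_target, 00. A new mux controlled by JUMP selects the jump address (1) or the PCSrc mux output (0), and the result is loaded into the PC on Clk.]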
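The two cascaded muxes in this figure can be sketched as one next-PC function. This is a minimal model, assuming the JUMP mux sits after the PCSrc mux as drawn (function and parameter names are hypothetical):

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(value: int) -> int:
    """Sign-extend the low 16 bits of value."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def next_pc(pc: int, instruction: int, branch: int, zero: int, jump: int) -> int:
    pc_plus4 = (pc + 4) & MASK32
    # Branch target adder: PC + 4 + (SignExt(imm16) << 2)
    branch_target = (pc_plus4 + (sign_extend16(instruction) << 2)) & MASK32
    # First mux, PCSrc = Branch AND Zero
    pcsrc_out = branch_target if (branch and zero) else pc_plus4
    # Jump address: PC+4(31-28), jump_target, 00
    jump_addr = (pc_plus4 & 0xF0000000) | ((instruction & 0x03FFFFFF) << 2)
    # Second mux, JUMP: 1 selects the jump address
    return jump_addr if jump else pcsrc_out
```

Because the JUMP mux comes last, Branch does not matter when Jump = 1, which is why j can leave Branch as a don't-care.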

Single Cycle MIPS Datapath Extended to Handle Jump, with Control Unit Added
[Figure 5.24, page 314 (book version): the complete single-cycle datapath with the jump mux added; the control unit decodes the Opcode, and the ALU control uses the Function Field. Annotations on the slide: "Book figure has an error!"; ORI is not supported, so no zero extension of the immediate is needed.]
• ALUOp (2 bits): 00 = add, 01 = subtract, 10 = R-type.

Control Line Settings (with jump instruction j added; Figure 5.18, page 308, modified to include j)
Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0  Jump
R-format       1       0        0         1         0         0        0       1       0      0
lw             0       1        1         1         1         0        0       0       0      0
sw             X       1        X         0         0         1        0       0       0      0
beq            X       0        X         0         0         0        1       0       1      0
j              X       X        X         0         0         0        X       X       X      1

Worst Case Timing (Load)
[Timing diagram: after the clock edge, each signal keeps its old value until its new value arrives, in this order:
• PC: Clk-to-Q delay.
• Rs, Rt, Rd, Op, Func: Instruction Memory Access Time.
• ALUctr, ExtOp, ALUSrc, MemtoReg, RegWr: Delay through Control Logic.
• busA, busB: Register File Access Time.
• busB to the ALU: Delay through Extender & Mux.
• Address: ALU Delay.
• busW: Data Memory Access Time.
• Register Write Occurs at the next clock edge.]

Instruction Timing Comparison
• Arithmetic & Logical: PC → Inst Memory → Reg File → mux → ALU → mux → setup
• Load: PC → Inst Memory → Reg File → mux → ALU → Data Mem → mux → setup (Critical Path)
• Store: PC → Inst Memory → Reg File → mux → ALU → Data Mem
• Branch: PC → Inst Memory → Reg File → cmp → mux
• Jump: PC → Inst Memory → mux

Simplified Single Cycle Datapath Timing
• Assuming the following datapath/control hardware component delays (obtained from the low-level target VLSI implementation technology of the components):
  – Memory Units: 2 ns
  – ALU and adders: 2 ns
  – Register File: 1 ns
  – Control Unit: < 1 ns
• Ignoring mux and clk-to-Q delays, critical path analysis (ns = nanosecond = 10^-9 second):
  – Instruction Memory: 0 to 2 ns; PC + 4 ALU: 0 to 2 ns
  – Control Unit and Register Read: 2 to 3 ns
  – Branch Target ALU: 2 to 4 ns
  – Main ALU: 3 to 5 ns
  – Data Memory: 5 to 7 ns
  – Register Write (Load): 7 to 8 ns, the end of the critical path
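The critical path analysis treats the datapath as a dependency graph: each unit starts when its slowest input is ready. A minimal sketch for the load path, assuming a Control Unit delay of 1 ns (the slide only says < 1 ns) and the dependencies drawn on this slide:

```python
# Delays in ns from the slide; Control Unit taken as 1 ns (upper bound of "< 1 ns").
DELAY_NS = {
    "Instruction Memory": 2,
    "PC + 4 ALU": 2,
    "Control Unit": 1,
    "Register Read": 1,
    "Branch Target ALU": 2,
    "Main ALU": 2,
    "Data Memory": 2,
    "Register Write": 1,
}

# Units that must finish before each unit can start (load instruction).
DEPENDS_ON = {
    "Instruction Memory": [],
    "PC + 4 ALU": [],
    "Control Unit": ["Instruction Memory"],
    "Register Read": ["Instruction Memory"],
    "Branch Target ALU": ["PC + 4 ALU", "Instruction Memory"],
    "Main ALU": ["Control Unit", "Register Read"],
    "Data Memory": ["Main ALU"],
    "Register Write": ["Data Memory"],
}

def finish_times() -> dict:
    """Earliest finish time of each unit: start = latest finish among its inputs."""
    done = {}

    def finish(unit: str) -> int:
        if unit not in done:
            start = max((finish(dep) for dep in DEPENDS_ON[unit]), default=0)
            done[unit] = start + DELAY_NS[unit]
        return done[unit]

    for unit in DELAY_NS:
        finish(unit)
    return done

times = finish_times()
critical_path_ns = max(times.values())
```

The branch target adder finishes at 4 ns, well before the load's register write at 8 ns, so it never limits the cycle.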

Performance of Single-Cycle (CPI = 1) CPU
• Assuming the following datapath hardware component delays:
  – Memory Units: 2 ns
  – ALU and adders: 2 ns
  – Register File: 1 ns
  – Nanosecond, ns = 10^-9 second
• The delays needed for each instruction class are:
Instruction Class  Instruction Memory  Register Read  ALU Operation  Data Memory  Register Write  Total Delay
ALU                      2 ns               1 ns           2 ns                        1 ns           6 ns
Load                     2 ns               1 ns           2 ns           2 ns         1 ns           8 ns
Store                    2 ns               1 ns           2 ns           2 ns                        7 ns
Branch                   2 ns               1 ns           2 ns                                       5 ns
Jump                     2 ns                                                                         2 ns
• The clock cycle is determined by the instruction with the longest delay: the load, at 8 ns, so the clock cycle of the CPU is 8 ns. Clock rate = 1 / 8 ns = 125 MHz.
• A program with I = 1,000,000 (10^6) instructions executed takes: Execution Time = T = I x CPI x C = 10^6 x 1 x 8 x 10^-9 s = 0.008 s = 8 msec.
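The table and the execution time formula can be reproduced directly from the unit delays. A minimal sketch (the dictionary keys are hypothetical shorthand for the table columns):

```python
STEP_NS = {"imem": 2, "reg_read": 1, "alu": 2, "dmem": 2, "reg_write": 1}

# Which datapath steps each instruction class uses (per the table above).
CLASS_STEPS = {
    "ALU":    ["imem", "reg_read", "alu", "reg_write"],
    "Load":   ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "Store":  ["imem", "reg_read", "alu", "dmem"],
    "Branch": ["imem", "reg_read", "alu"],
    "Jump":   ["imem"],
}

def class_delay(cls: str) -> int:
    return sum(STEP_NS[step] for step in CLASS_STEPS[cls])

# Single cycle: every instruction gets the slowest class's delay.
cycle_ns = max(class_delay(cls) for cls in CLASS_STEPS)
clock_mhz = 1e3 / cycle_ns

def execution_time_s(instructions: int, cpi: int = 1) -> float:
    """T = I x CPI x C, with C in seconds."""
    return instructions * cpi * cycle_ns * 1e-9
```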

Adding Support for jal to Single Cycle Datapath (For More Practice Exercise 5.20)
• The MIPS jump and link instruction, jal, supports procedure calls: it jumps to the jump address (as j does) and saves the address of the following instruction, PC + 4, in register $ra ($31).
  jal Address
• jal uses the j instruction format: op (6 bits) | Target address (26 bits)
• We wish to add jal to the single cycle datapath in Figure 5.24, page 314. Add any necessary datapaths and control signals to the single-clock datapath and justify the need for the modifications, if any.
• Specify control line values for this instruction.

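Exercise 5.20: jump and link (jal) support to Single Cycle Datapath (For More Practice Exercise 5.20)
  Instruction Word ← Mem[PC]
  R[31] ← PC + 4
  PC ← Jump Address
[Figure: the RegDst mux gains a third input, the constant 31, alongside rt and rd, and its control becomes 2 bits; the MemtoReg mux gains a third input, PC + 4, so that it can be written to R[31], and its control becomes 2 bits. The PC + 4, branch target, R[rs], R[rt], and imm16 paths are unchanged.]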

Exercise 5.20: jump and link (jal) support to Single Cycle Datapath: Adding Control Line Settings for jal (For Textbook Single Cycle Datapath including Jump)
• RegDst is now 2 bits (00 = rt, 01 = rd, 10 = 31); MemtoReg is now 2 bits (00 = ALU result, 01 = data memory, 10 = PC + 4).
Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0  Jump
R-format       01      0        00        1         0         0        0       1       0      0
lw             00      1        01        1         1         0        0       0       0      0
sw             xx      1        xx        0         0         1        0       0       0      0
beq            xx      0        xx        0         0         0        1       0       1      0
J              xx      x        xx        0         0         0        x       x       x      1
JAL            10      x        10        1         0         0        x       x       x      1
• For JAL: Instruction Word ← Mem[PC]; R[31] ← PC + 4; PC ← Jump Address.
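The jal row can be checked by driving the widened 3-input muxes with RegDst = 10, MemtoReg = 10, RegWrite = 1 and Jump = 1. A minimal sketch, assuming the mux input orders listed above and the standard jal opcode 3 (function names are hypothetical; the don't-care ALU and memory inputs are fed 0 as placeholders):

```python
MASK32 = 0xFFFFFFFF

def reg_dst_mux(reg_dst: int, rt: int, rd: int) -> int:
    """2-bit RegDst: 00 -> rt, 01 -> rd, 10 -> register 31 ($ra)."""
    return {0b00: rt, 0b01: rd, 0b10: 31}[reg_dst]

def memto_reg_mux(memto_reg: int, alu_result: int, mem_data: int, pc_plus4: int) -> int:
    """2-bit MemtoReg: 00 -> ALU result, 01 -> data memory, 10 -> PC + 4."""
    return {0b00: alu_result, 0b01: mem_data, 0b10: pc_plus4}[memto_reg]

def execute_jal(pc: int, instruction: int, regs: list) -> int:
    """Apply the jal control settings to regs in place; return the new PC."""
    pc_plus4 = (pc + 4) & MASK32
    rt = (instruction >> 16) & 0x1F   # still wired to the mux, but not selected
    rd = (instruction >> 11) & 0x1F
    dest = reg_dst_mux(0b10, rt, rd)
    regs[dest] = memto_reg_mux(0b10, alu_result=0, mem_data=0, pc_plus4=pc_plus4)
    jump_target = instruction & 0x03FFFFFF
    return (pc_plus4 & 0xF0000000) | (jump_target << 2)

# Usage: jal 2500 at PC 0x00400000
regs = [0] * 32
new_pc = execute_jal(0x00400000, (3 << 26) | 2500, regs)
```

After the call, new_pc is 10000 and R[31] holds the return address 0x00400004.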

Adding Support for LWR to Single Cycle Datapath (For More Practice Exercise 5.22)
• We wish to add a variant of lw (load word), let’s call it LWR, to the single cycle datapath in Figure 5.24, page 314.
  LWR $rd, $rs, $rt
• The LWR instruction is similar to lw, but it sums two registers (specified by $rs and $rt) to obtain the effective load address, and it uses the R-type format.
• Add any necessary datapaths and control signals to the single cycle datapath and justify the need for the modifications, if any.
• Specify control line values for this instruction.

Exercise 5.22: LWR (R-format LW) support to Single Cycle Datapath (For More Practice Exercise 5.22)
  Instruction Word ← Mem[PC]
  PC ← PC + 4
  R[rd] ← Mem[R[rs] + R[rt]]
• No new components or connections are needed for the datapath, just the proper control line settings: the destination is rd (RegDst = 1), the second ALU operand is R[rt] (ALUSrc = 0), and the ALU must add (ALUOp = 00).
Adding Control Line Settings for LWR (For Textbook Single Cycle Datapath including Jump)
Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0  Jump
R-format       1       0        0         1         0         0        0       1       0      0
lw             0       1        1         1         1         0        0       0       0      0
sw             x       1        x         0         0         1        0       0       0      0
beq            x       0        x         0         0         0        1       0       1      0
J              x       x        x         0         0         0        x       x       x      1
LWR            1       0        1         1         1         0        0       0       0      0
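To see that LWR needs only new control values, the same register/ALU/memory step can execute both lw and LWR with different control dicts. A minimal sketch, assuming the ALU is fixed to add (ALUOp = 00) and that memory is a dict keyed by byte address; MemWrite and the PC logic are not modeled, and the LWR opcode/funct are left 0 because the slide does not give them:

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(value: int) -> int:
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def single_cycle_step(regs: list, mem: dict, instruction: int, ctl: dict) -> None:
    """Register read, ALU add, memory read and write-back under the given control lines."""
    rs = (instruction >> 21) & 0x1F
    rt = (instruction >> 16) & 0x1F
    rd = (instruction >> 11) & 0x1F
    b = sign_extend16(instruction) if ctl["ALUSrc"] else regs[rt]   # ALUSrc mux
    alu_result = (regs[rs] + b) & MASK32                            # ALUOp = 00: add
    mem_data = mem.get(alu_result, 0) if ctl["MemRead"] else 0
    if ctl["RegWrite"]:
        dest = rd if ctl["RegDst"] else rt                          # RegDst mux
        regs[dest] = mem_data if ctl["MemtoReg"] else alu_result    # MemtoReg mux

LW = {"RegDst": 0, "ALUSrc": 1, "MemtoReg": 1, "RegWrite": 1, "MemRead": 1}
LWR = {"RegDst": 1, "ALUSrc": 0, "MemtoReg": 1, "RegWrite": 1, "MemRead": 1}

regs = [0] * 32
regs[9], regs[10] = 100, 8
mem = {108: 42}
# LWR $11, $9, $10: R-type fields only
single_cycle_step(regs, mem, (9 << 21) | (10 << 16) | (11 << 11), LWR)
# lw $12, 8($9): opcode 35, rt = 12, imm16 = 8
single_cycle_step(regs, mem, (35 << 26) | (9 << 21) | (12 << 16) | 8, LW)
```

Both instructions read the word at address 108; only the RegDst and ALUSrc settings differ.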

Adding Support for jm to Single Cycle Datapath (Based on “For More Practice Exercise 5.44” but for single cycle)
• We wish to add a new instruction jm (jump memory) to the single cycle datapath in Figure 5.24, page 314.
  jm offset($rs)
• The jm instruction loads a word from the effective address (R[rs] + offset); this is similar to lw, except that the loaded word is put in the PC instead of register $rt.
• jm uses the I-format, with field rt not used:
  OP (6 bits) | rs (5 bits) | rt (5 bits, not used) | address (imm16, 16 bits)
• Add any necessary datapaths and control signals to the single cycle datapath and justify the need for the modifications, if any.
• Specify control line values for this instruction.

Adding jump memory (jm) support to Single Cycle Datapath (Based on “For More Practice Exercise 5.44” but for single cycle)
  Instruction Word ← Mem[PC]
  PC ← Mem[R[rs] + SignExt[imm16]]
[Figure: the jump mux gains a third input, the data memory Read data output, alongside PC + 4 / branch target and the jump address, and its Jump control becomes 2 bits.]

Adding jm support to Single Cycle Datapath: Adding Control Line Settings for jm (For Textbook Single Cycle Datapath including Jump)
• Jump is now 2 bits (00 = PC + 4 or branch target, 01 = jump address, 10 = data memory output).
Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0  Jump
R-format       1       0        0         1         0         0        0       1       0      00
lw             0       1        1         1         1         0        0       0       0      00
sw             x       1        x         0         0         1        0       0       0      00
beq            x       0        x         0         0         0        1       0       1      00
J              x       x        x         0         0         0        x       x       x      01
Jm             x       1        x         0         1         0        x       0       0      10
• For jm, the ALU adds R[rs] and the sign-extended offset (ALUSrc = 1, ALUOp = 00 = add), and PC ← Mem[R[rs] + SignExt[imm16]].
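The jm row can be exercised by feeding the data memory output into the widened jump mux. A minimal sketch, assuming the 2-bit Jump encoding listed above and a dict-based memory; the jm opcode is not given on the slide, so the example instruction carries only the rs and imm16 fields (function names are hypothetical):

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(value: int) -> int:
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def jump_mux(jump: int, pcsrc_out: int, jump_addr: int, mem_data: int) -> int:
    """2-bit Jump: 00 -> PCSrc mux output, 01 -> jump address, 10 -> data memory."""
    return {0b00: pcsrc_out, 0b01: jump_addr, 0b10: mem_data}[jump]

def execute_jm(pc: int, instruction: int, regs: list, mem: dict) -> int:
    """Return the new PC for jm offset($rs) under the jm control settings."""
    pc_plus4 = (pc + 4) & MASK32
    rs = (instruction >> 21) & 0x1F
    # ALUSrc = 1, ALUOp = 00: the main ALU adds R[rs] and SignExt(imm16)
    address = (regs[rs] + sign_extend16(instruction)) & MASK32
    mem_data = mem[address]                      # MemRead = 1
    jump_addr = (pc_plus4 & 0xF0000000) | ((instruction & 0x03FFFFFF) << 2)
    # Branch is a don't-care for jm; PC + 4 stands in for the PCSrc mux output
    return jump_mux(0b10, pc_plus4, jump_addr, mem_data)

# Usage: jm 4($8) with R[8] = 0x1000 and Mem[0x1004] = 0x00400020
regs = [0] * 32
regs[8] = 0x1000
mem = {0x1004: 0x00400020}
new_pc = execute_jm(0x00400000, (8 << 21) | 4, regs, mem)
```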

Drawbacks of Single Cycle Processor
1. Long cycle time:
  – All instructions must take as much time as the slowest.
    • Here, the cycle time needed for load is longer than needed for all other instructions.
  – Cycle time must be long enough for the load instruction:
    PC’s Clock-to-Q + Instruction Memory Access Time + Register File Access Time + ALU Delay (address calculation) + Data Memory Access Time + Register File Setup Time + Clock Skew
  – Real memory is not as well-behaved as idealized memory.
    • It cannot always complete a data access in one (short) cycle.
2. It is impossible to implement complex, variable-length instructions and complex addressing modes (e.g., indirect memory addressing) in a single cycle.
3. High and duplicate hardware resource requirements:
  – No hardware functional unit can be used more than once in a single cycle, so units needed twice must be duplicated (e.g., ALUs).
4. Does not allow overlap of instruction processing (instruction pipelining, chapter 6).
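As a worked check of drawback 1, the load cycle-time expression can be written as a sum and evaluated with the simplified delays from the earlier timing slides, assuming, as that simplified analysis did, that clock-to-Q and skew are zero and that the 1 ns register write stands in for the register file setup term:

```latex
T_{\text{cycle}} \ge t_{\text{clk-to-Q}}(\text{PC}) + t_{\text{IMem}} + t_{\text{RF,read}} + t_{\text{ALU}} + t_{\text{DMem}} + t_{\text{RF,setup}} + t_{\text{skew}}

T_{\text{cycle}} = 0 + 2 + 1 + 2 + 2 + 1 + 0 = 8\ \text{ns}

\text{Jump idle fraction} = \frac{8 - 2}{8} = 75\%
```

A jump, which needs only the 2 ns instruction fetch, therefore spends three quarters of its 8 ns cycle waiting.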