Single Cycle Processor Design ICS 233 Computer Architecture


Single Cycle Processor Design
ICS 233 Computer Architecture and Assembly Language
Dr. Aiman El-Maleh
College of Computer Sciences and Engineering
King Fahd University of Petroleum and Minerals
[Adapted from slides of Dr. M. Mudawar, ICS 233, KFUPM]

Outline
• Designing a Processor: Step-by-Step
• Datapath Components and Clocking
• Assembling an Adequate Datapath
• Controlling the Execution of Instructions
• The Main Controller and ALU Controller
• Drawback of the single-cycle processor design

Single Cycle Processor Design ICS 233 – KFUPM © Muhamed Mudawar

The Performance Perspective
• Recall, performance is determined by:
  - Instruction count (I-Count)
  - Clock cycles per instruction (CPI)
  - Clock cycle time (Cycle)
• Processor design will affect
  - Clock cycles per instruction
  - Clock cycle time
• Single cycle datapath and control design:
  - Advantage: one clock cycle per instruction
  - Disadvantage: long cycle time
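The three factors listed on this slide combine multiplicatively. A minimal Python sketch, assuming the standard relation execution time = I-Count × CPI × cycle time; the function name `cpu_time` and the sample numbers are illustrative and not taken from the slides.

```python
def cpu_time(instruction_count: int, cpi: float, cycle_time_ps: float) -> float:
    """Return execution time in picoseconds: I-Count x CPI x cycle time."""
    return instruction_count * cpi * cycle_time_ps


# Hypothetical program of 1000 instructions on a single-cycle design (CPI = 1)
# with an assumed 880 ps clock cycle.
single_cycle_time = cpu_time(1000, 1.0, 880.0)
```

A single-cycle design fixes CPI at 1, so any improvement has to come from the instruction count or the cycle time.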

Designing a Processor: Step-by-Step
• Analyze instruction set => datapath requirements
  - The meaning of each instruction is given by the register transfers
  - Datapath must include storage elements for ISA registers
  - Datapath must support each register transfer
• Select datapath components and clocking methodology
• Assemble datapath meeting the requirements
• Analyze implementation of each instruction
  - Determine the setting of control signals for register transfer
• Assemble the control logic

Review of MIPS Instruction Formats
• All instructions are 32 bits wide
• Three instruction formats: R-type, I-type, and J-type

  R-type:  Op6 | Rs5 | Rt5 | Rd5 | sa5 | funct6
  I-type:  Op6 | Rs5 | Rt5 | immediate16
  J-type:  Op6 | immediate26

  - Op6: 6-bit opcode of the instruction
  - Rs5, Rt5, Rd5: 5-bit source and destination register numbers
  - sa5: 5-bit shift amount used by shift instructions
  - funct6: 6-bit function field for R-type instructions
  - immediate16: 16-bit immediate value or address offset
  - immediate26: 26-bit target address of the jump instruction
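The field layout above maps directly onto shifts and masks. The following Python sketch extracts every field from a 32-bit instruction word; the function name `decode` and the dictionary keys are assumptions made for illustration.

```python
def decode(instr: int) -> dict:
    """Split a 32-bit MIPS instruction word into its format fields."""
    return {
        "op":    (instr >> 26) & 0x3F,   # bits 31..26, all formats
        "rs":    (instr >> 21) & 0x1F,   # bits 25..21, R-type and I-type
        "rt":    (instr >> 16) & 0x1F,   # bits 20..16, R-type and I-type
        "rd":    (instr >> 11) & 0x1F,   # bits 15..11, R-type only
        "sa":    (instr >> 6) & 0x1F,    # bits 10..6,  R-type only
        "funct": instr & 0x3F,           # bits 5..0,   R-type only
        "imm16": instr & 0xFFFF,         # bits 15..0,  I-type only
        "imm26": instr & 0x3FFFFFF,      # bits 25..0,  J-type only
    }


# add $3, $1, $2 is encoded as op=0, rs=1, rt=2, rd=3, sa=0, funct=0x20
fields = decode(0x00221820)
```

All fields are extracted unconditionally, which mirrors the hardware: the wires for every field exist, and the control signals decide which ones are used.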

MIPS Subset of Instructions
• Only a subset of the MIPS instructions is considered
  - ALU instructions (R-type): add, sub, and, or, xor, slt
  - Immediate instructions (I-type): addi, slti, andi, ori, xori
  - Load and Store (I-type): lw, sw
  - Branch (I-type): beq, bne
  - Jump (J-type): j
• This subset does not include all the integer instructions
• But it is sufficient to illustrate the design of datapath and control
• Concepts used to implement the MIPS subset are used to construct a broad spectrum of computers

Details of the MIPS Subset

Instruction           Meaning             Format
add   rd, rs, rt      addition            op6 = 0    | rs5 | rt5 | rd5 | 0 | 0x20
sub   rd, rs, rt      subtraction         op6 = 0    | rs5 | rt5 | rd5 | 0 | 0x22
and   rd, rs, rt      bitwise and         op6 = 0    | rs5 | rt5 | rd5 | 0 | 0x24
or    rd, rs, rt      bitwise or          op6 = 0    | rs5 | rt5 | rd5 | 0 | 0x25
xor   rd, rs, rt      exclusive or        op6 = 0    | rs5 | rt5 | rd5 | 0 | 0x26
slt   rd, rs, rt      set on less than    op6 = 0    | rs5 | rt5 | rd5 | 0 | 0x2a
addi  rt, rs, im16    add immediate       op6 = 0x08 | rs5 | rt5 | im16
slti  rt, rs, im16    slt immediate       op6 = 0x0a | rs5 | rt5 | im16
andi  rt, rs, im16    and immediate       op6 = 0x0c | rs5 | rt5 | im16
ori   rt, rs, im16    or immediate        op6 = 0x0d | rs5 | rt5 | im16
xori  rt, rs, im16    xor immediate       op6 = 0x0e | rs5 | rt5 | im16
lw    rt, im16(rs)    load word           op6 = 0x23 | rs5 | rt5 | im16
sw    rt, im16(rs)    store word          op6 = 0x2b | rs5 | rt5 | im16
beq   rs, rt, im16    branch if equal     op6 = 0x04 | rs5 | rt5 | im16
bne   rs, rt, im16    branch not equal    op6 = 0x05 | rs5 | rt5 | im16
j     im26            jump                op6 = 0x02 | im26

Register Transfer Level (RTL)
• RTL is a description of data flow between registers
• RTL gives a meaning to the instructions
• All instructions are fetched from memory at address PC

Instruction   RTL Description
ADD           Reg(Rd) ← Reg(Rs) + Reg(Rt); PC ← PC + 4
SUB           Reg(Rd) ← Reg(Rs) – Reg(Rt); PC ← PC + 4
ORI           Reg(Rt) ← Reg(Rs) | zero_ext(Im16); PC ← PC + 4
LW            Reg(Rt) ← MEM[Reg(Rs) + sign_ext(Im16)]; PC ← PC + 4
SW            MEM[Reg(Rs) + sign_ext(Im16)] ← Reg(Rt); PC ← PC + 4
BEQ           if (Reg(Rs) == Reg(Rt)) PC ← PC + 4 + 4 × sign_ext(Im16) else PC ← PC + 4

Instructions are Executed in Steps
• R-type
  - Fetch instruction: Instruction ← MEM[PC]
  - Fetch operands: data1 ← Reg(Rs), data2 ← Reg(Rt)
  - Execute operation: ALU_result ← func(data1, data2)
  - Write ALU result: Reg(Rd) ← ALU_result
  - Next PC address: PC ← PC + 4
• I-type
  - Fetch instruction: Instruction ← MEM[PC]
  - Fetch operands: data1 ← Reg(Rs), data2 ← Extend(imm16)
  - Execute operation: ALU_result ← op(data1, data2)
  - Write ALU result: Reg(Rt) ← ALU_result
  - Next PC address: PC ← PC + 4
• BEQ
  - Fetch instruction: Instruction ← MEM[PC]
  - Fetch operands: data1 ← Reg(Rs), data2 ← Reg(Rt)
  - Equality: zero ← subtract(data1, data2)
  - Branch: if (zero) PC ← PC + 4 + 4 × sign_ext(imm16) else PC ← PC + 4

Instruction Execution – cont’d
• LW
  - Fetch instruction: Instruction ← MEM[PC]
  - Fetch base register: base ← Reg(Rs)
  - Calculate address: address ← base + sign_extend(imm16)
  - Read memory: data ← MEM[address]
  - Write register Rt: Reg(Rt) ← data
  - Next PC address: PC ← PC + 4
• SW
  - Fetch instruction: Instruction ← MEM[PC]
  - Fetch registers: base ← Reg(Rs), data ← Reg(Rt)
  - Calculate address: address ← base + sign_extend(imm16)
  - Write memory: MEM[address] ← data
  - Next PC address: PC ← PC + 4
• Jump
  - Fetch instruction: Instruction ← MEM[PC]
  - Target PC address: target ← PC[31:28], Imm26, ‘00’ (the commas denote concatenation)
  - Jump: PC ← target
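The step lists for each instruction class can be collected into one small behavioral model. The sketch below is illustrative only: the function names `step` and `sign_ext16`, the use of a Python list for registers and dictionaries for the memories are all assumptions. It covers add, lw, sw, beq, and j, and it takes the branch target relative to the incremented PC and the upper jump bits from the incremented PC, as the Next PC slides of this deck describe.

```python
def sign_ext16(imm16: int) -> int:
    """Interpret a 16-bit field as a signed two's-complement integer."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def step(pc: int, regs: list, mem: dict, imem: dict) -> int:
    """Execute the instruction at address pc and return the next PC."""
    instr = imem[pc]                                  # Instruction <- MEM[PC]
    op = (instr >> 26) & 0x3F
    rs, rt, rd = (instr >> 21) & 0x1F, (instr >> 16) & 0x1F, (instr >> 11) & 0x1F
    funct, imm16, imm26 = instr & 0x3F, instr & 0xFFFF, instr & 0x3FFFFFF
    next_pc = (pc + 4) & 0xFFFFFFFF                   # default: PC <- PC + 4
    dest, result = None, 0
    if op == 0x00 and funct == 0x20:                  # add: Reg(Rd) <- Reg(Rs) + Reg(Rt)
        dest, result = rd, (regs[rs] + regs[rt]) & 0xFFFFFFFF
    elif op == 0x23:                                  # lw: Reg(Rt) <- MEM[address]
        address = (regs[rs] + sign_ext16(imm16)) & 0xFFFFFFFF
        dest, result = rt, mem.get(address, 0)
    elif op == 0x2B:                                  # sw: MEM[address] <- Reg(Rt)
        address = (regs[rs] + sign_ext16(imm16)) & 0xFFFFFFFF
        mem[address] = regs[rt]
    elif op == 0x04:                                  # beq: compare, then pick next PC
        if regs[rs] == regs[rt]:
            next_pc = (pc + 4 + 4 * sign_ext16(imm16)) & 0xFFFFFFFF
    elif op == 0x02:                                  # j: upper 4 bits, Imm26, '00'
        next_pc = ((pc + 4) & 0xF0000000) | (imm26 << 2)
    else:
        raise ValueError(f"unsupported instruction {instr:#010x}")
    if dest:                                          # register 0 always stays zero
        regs[dest] = result
    return next_pc
```

Calling `step` repeatedly, feeding each returned PC back in, runs a program one instruction per call, which is the software analogue of one instruction per clock cycle.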

Requirements of the Instruction Set
• Memory
  - Instruction memory where instructions are stored
  - Data memory where data is stored
• Registers
  - 32 × 32-bit general purpose registers, R0 is always zero
  - Read source register Rs
  - Read source register Rt
  - Write destination register Rt or Rd
• Program counter PC register and Adder to increment PC
• Sign and Zero extender for immediate constant
• ALU for executing instructions


Components of the Datapath
• Combinational elements
  - ALU, adder
  - Immediate extender
  - Multiplexers
• Storage elements
  - Instruction memory
  - Data memory
  - PC register
  - Register file
• Clocking methodology
  - Timing of reads and writes
[Figure: block symbols for the ALU (two 32-bit inputs, ALU control, result, zero, overflow), the 16-to-32-bit extender (ExtOp), a multiplexer (select), the PC, the instruction memory (Address, Instruction), the data memory (Address, Data_in, Data_out, MemRead, MemWrite), and the register file (RA, RB, RW, BusA, BusB, BusW, RegWrite, Clock)]

Register Element
• Register
  - Similar to the D-type flip-flop
• n-bit input and output
• Write Enable:
  - Enable / disable writing of register
  - Negated (0): Data_Out will not change
  - Asserted (1): Data_Out will become Data_In after clock edge
• Edge-triggered clocking
  - Register output is modified at clock edge
[Figure: register block with n-bit Data_In, n-bit Data_Out, Clock, and Write Enable]

MIPS Register File
• Register file consists of 32 × 32-bit registers
  - BusA and BusB: 32-bit output buses for reading 2 registers
  - BusW: 32-bit input bus for writing a register when RegWrite is 1
  - Two registers read and one written in a cycle
• Registers are selected by:
  - RA selects register to be read on BusA
  - RB selects register to be read on BusB
  - RW selects the register to be written
• Clock input
  - The clock input is used ONLY during write operation
  - During read, register file behaves as a combinational logic block
    - RA or RB valid => BusA or BusB valid after access time
[Figure: register file block with 5-bit inputs RA, RB, RW; 32-bit outputs BusA and BusB; 32-bit input BusW; Clock and RegWrite]
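A behavioral sketch of this register file in Python may help make the read/write asymmetry concrete: reads are combinational, writes happen only on a clock edge when RegWrite is asserted. The class and method names are assumptions for illustration, and keeping register 0 at zero follows the deck's statement that R0 is always zero.

```python
class RegisterFile:
    """32 x 32-bit register file with two read ports and one write port."""

    def __init__(self) -> None:
        self.regs = [0] * 32

    def read(self, ra: int, rb: int) -> tuple:
        """Combinational read: returns (BusA, BusB) for the selected registers."""
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw: int, bus_w: int, reg_write: bool) -> None:
        """Clocked write: BusW is stored in register RW only if RegWrite is 1."""
        if reg_write and rw != 0:          # writes to register 0 are ignored
            self.regs[rw] = bus_w & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(rw=5, bus_w=1234, reg_write=True)    # written on the clock edge
rf.clock_edge(rw=6, bus_w=9999, reg_write=False)   # RegWrite = 0, nothing changes
bus_a, bus_b = rf.read(5, 6)
```

Because `read` does not depend on the clock, an instruction can read two registers and write a third within the same cycle.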

Tri-State Buffers
• Allow multiple sources to drive a single bus
• Two inputs:
  - Data signal (Data_in)
  - Output enable (Enable)
• One output (Data_out):
  - If (Enable) Data_out = Data_in
    else Data_out = high-impedance state (output is disconnected)
• Tri-state buffers can be used to build multiplexers
[Figure: a tri-state buffer (Data_in, Enable, Data_out) and a 2-input multiplexer built from two tri-state buffers (Data_0, Data_1, Select, Output)]

Details of the Register File
[Figure: internal structure of the register file. Three 5-to-32 decoders are driven by RA, RB, and RW. Registers R1 to R31 are written from the 32-bit BusW under Clock and RegWrite, and tri-state buffers place the selected registers on the 32-bit BusA and BusB. R0 is not used; "0" is supplied in its place.]

Building a Multifunction ALU
• Shift operation (2 bits): None = 00, SLL = 01, SRL = 10, SRA = 11
• Arithmetic operation (1 bit): ADD = 0, SUB = 1
• Logical operation (2 bits): AND = 00, OR = 01, NOR = 10, XOR = 11
• ALU selection (2 bits): Shift = 00, SLT = 01, Arith = 10, Logic = 11
• SLT: the ALU does a SUB and checks the sign and overflow
[Figure: 32-bit inputs A and B feed a shifter (5-bit shift amount), an adder (carry-in c0), and a logic unit; a 4-input multiplexer selects the 32-bit ALU result; outputs include zero, sign, and overflow]
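The encodings on this slide can be exercised with a behavioral model. This Python sketch is a rough illustration under several assumptions: the function names `alu` and `add_sub` are invented, the shifter is assumed to shift operand B by a separately supplied amount, and the overflow output is reported only for the arithmetic selection. SLT is computed as the slide describes, from the sign and overflow of a subtraction.

```python
MASK = 0xFFFFFFFF
SIGN = 0x80000000


def add_sub(a: int, b: int, sub: int) -> tuple:
    """32-bit adder: SUB adds the one's complement of b with carry-in 1."""
    operand = (~b & MASK) if sub else b
    total = (a + operand + sub) & MASK
    # Overflow: both adder inputs share a sign that the result does not have.
    overflow = bool(~(a ^ operand) & (a ^ total) & SIGN)
    return total, overflow


def alu(a, b, selection, shift_op=0, arith_op=0, logic_op=0, shamt=0):
    """Return (result, zero, overflow) using the slide's 2-bit selection codes."""
    a &= MASK
    b &= MASK
    overflow = False
    if selection == 0b00:                       # Shift
        if shift_op == 0b01:                    # SLL
            result = (b << shamt) & MASK
        elif shift_op == 0b10:                  # SRL
            result = b >> shamt
        elif shift_op == 0b11:                  # SRA: replicate the sign bit
            signed_b = b - (1 << 32) if b & SIGN else b
            result = (signed_b >> shamt) & MASK
        else:                                   # None
            result = b
    elif selection == 0b01:                     # SLT: sign XOR overflow of a - b
        diff, ovf = add_sub(a, b, 1)
        result = int(bool(diff & SIGN) != ovf)
    elif selection == 0b10:                     # Arith: ADD = 0, SUB = 1
        result, overflow = add_sub(a, b, arith_op)
    else:                                       # Logic: AND, OR, NOR, XOR
        result = [a & b, a | b, ~(a | b) & MASK, a ^ b][logic_op]
    return result, result == 0, overflow
```

Checking sign XOR overflow rather than the sign alone is what keeps SLT correct when the subtraction itself overflows, for example when comparing the largest positive value with -1.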

Instruction and Data Memories
• Instruction memory needs only provide read access
  - Because datapath does not write instructions
  - Behaves as combinational logic for read
  - Address selects Instruction after access time
• Data memory is used for load and store
  - MemRead: enables output on Data_out
    - Address selects the word to put on Data_out
  - MemWrite: enables writing of Data_in
    - Address selects the memory word to be written
    - The Clock synchronizes the write operation
• Separate instruction and data memories
  - Later, we will replace them with caches
[Figure: instruction memory block (32-bit Address, 32-bit Instruction) and data memory block (32-bit Address, Data_in, Data_out, Clock, MemRead, MemWrite)]

Clocking Methodology
• Clocks are needed in sequential logic to decide when a state element (register) should be updated
• To ensure correctness, a clocking methodology defines when data can be written and read
• We assume edge-triggered clocking
• All state changes occur on the same clock edge
• Data must be valid and stable before arrival of clock edge
• Edge-triggered clocking allows a register to be read and written during the same clock cycle
[Figure: Register 1 feeding combinational logic feeding Register 2, with a clock waveform marking the rising and falling edges]

Determining the Clock Cycle
• With edge-triggered clocking, the clock cycle must be long enough to accommodate the path from one register through the combinational logic to another register

  Tcycle ≥ Tclk-q + Tmax_comb + Ts

• Tclk-q: clock-to-output delay through register
• Tmax_comb: longest delay through combinational logic
• Ts: setup time that input to a register must be stable before arrival of clock edge
• Th: hold time that input to a register must hold after arrival of clock edge
• Hold time (Th) is normally satisfied since Tclk-q > Th
[Figure: Register 1, combinational logic, and Register 2 with a clock waveform showing Tclk-q, Tmax_comb, Ts, and Th relative to the writing edge]

Clock Skew
• Clock skew arises because the clock signal uses different paths with slightly different delays to reach state elements
• Clock skew is the difference in absolute time between when two storage elements see a clock edge
• With a clock skew, the clock cycle time is increased

  Tcycle ≥ Tclk-q + Tmax_combinational + Tsetup + Tskew

• Clock skew is reduced by balancing the clock delays
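The cycle-time bound is a plain sum, so it is easy to evaluate for a candidate design. A small Python sketch follows; the function name `min_cycle_time` and every delay value are hypothetical, chosen only to show the arithmetic.

```python
def min_cycle_time(t_clk_q: int, t_max_comb: int, t_setup: int, t_skew: int = 0) -> int:
    """Smallest clock period (same unit as the inputs) that satisfies
    Tcycle >= Tclk-q + Tmax_combinational + Tsetup + Tskew."""
    return t_clk_q + t_max_comb + t_setup + t_skew


# Hypothetical delays in picoseconds.
without_skew = min_cycle_time(t_clk_q=50, t_max_comb=800, t_setup=30)
with_skew = min_cycle_time(t_clk_q=50, t_max_comb=800, t_setup=30, t_skew=20)
```

With these assumed numbers the skew adds its full 20 ps to the minimum period, which is why balancing the clock delays pays off directly in cycle time.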


Instruction Fetching Datapath
• We can now assemble the datapath from its components
• For instruction fetching, we need …
  - Program Counter (PC) register
  - Instruction Memory
  - Adder for incrementing PC
• The least significant 2 bits of the PC are ‘00’ since PC is a multiple of 4
• This datapath does not handle branch or jump instructions
• Improved datapath: increments the upper 30 bits of the PC by 1
[Figure: two fetch datapaths. The first adds 4 to the 32-bit PC to form the next PC; the improved one keeps a 30-bit PC, adds 1 to it, and appends ‘00’ to form the instruction memory address]

Datapath for R-type Instructions

  Op6 | Rs5 | Rt5 | Rd5 | sa5 | funct6

• RA & RB come from the instruction’s Rs & Rt fields
• RW comes from the Rd field
• ALU inputs come from BusA & BusB
• ALU result is connected to BusW
• Control signals
  - ALUCtrl is derived from the funct field because Op = 0 for R-type
  - RegWrite is used to enable the writing of the ALU result
[Figure: PC and instruction memory feeding the register file (RA = Rs, RB = Rt, RW = Rd); BusA and BusB feed the ALU; the ALU result returns on BusW]

Datapath for I-type ALU Instructions

  Op6 | Rs5 | Rt5 | immediate16

• RW now comes from Rt, instead of Rd
• Second ALU input comes from the extended immediate
• RB and BusB are not used
• Control signals
  - ALUCtrl is derived from the Op field
  - RegWrite is used to enable the writing of the ALU result
  - ExtOp is used to control the extension of the 16-bit immediate
[Figure: PC and instruction memory feeding the register file (RA = Rs, RW = Rt); BusA and the extender output (Imm16, ExtOp) feed the ALU; the ALU result returns on BusW]

Combining R-type & I-type Datapaths
• A mux selects RW as either Rt or Rd
• Another mux selects the 2nd ALU input as either the source register Rt data on BusB or the extended immediate
• Control signals
  - ALUCtrl is derived from either the Op or the funct field
  - RegWrite enables the writing of the ALU result
  - ExtOp controls the extension of the 16-bit immediate
  - RegDst selects the register destination as either Rt or Rd
  - ALUSrc selects the 2nd ALU source as BusB or extended immediate
[Figure: combined datapath with the RegDst mux in front of RW and the ALUSrc mux in front of the second ALU input]

Controlling ALU Instructions
• For R-type ALU instructions, RegWrite = 1, RegDst is ‘1’ to select Rd on RW, and ALUSrc is ‘0’ to select BusB as second ALU input
• For I-type ALU instructions, RegWrite = 1, RegDst is ‘0’ to select Rt on RW, and ALUSrc is ‘1’ to select the extended immediate as second ALU input
[Figure: the combined datapath drawn once for each case, with the active part of the datapath shown in green]

Details of the Extender
• Two types of extensions
  - Zero-extension for unsigned constants
  - Sign-extension for signed constants
• Control signal ExtOp indicates type of extension
  - ExtOp = 0: upper 16 bits = 0
  - ExtOp = 1: upper 16 bits = sign bit
• Extender implementation: wiring and one AND gate
[Figure: Imm16 wired to the lower 16 bits; the AND of ExtOp and the sign bit of Imm16 drives all upper 16 bits]
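The "wiring and one AND gate" implementation translates almost literally into code. In the Python sketch below, the function name `extend` is an assumption; the single AND of ExtOp with the sign bit decides what fills the upper 16 bits.

```python
def extend(imm16: int, ext_op: int) -> int:
    """Extend a 16-bit immediate to 32 bits.

    ext_op = 0 gives zero-extension, ext_op = 1 gives sign-extension.
    """
    imm16 &= 0xFFFF
    sign_bit = (imm16 >> 15) & 1
    fill = ext_op & sign_bit                  # the one AND gate
    upper = 0xFFFF if fill else 0x0000        # the same bit wired to all upper 16 bits
    return (upper << 16) | imm16


zero_extended = extend(0x8000, ext_op=0)      # upper half forced to 0
sign_extended = extend(0x8000, ext_op=1)      # upper half copies the sign bit
```

The result is returned as an unsigned 32-bit pattern, matching what the extender places on the 32-bit bus.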

Adding Data Memory to Datapath
• A data memory is added for load and store instructions
• ALU calculates data memory address
• BusB is connected to Data_in of Data Memory for store instructions
• A 3rd mux selects data on BusW as either ALU result or memory Data_out
• Additional control signals
  - MemRead for load instructions
  - MemWrite for store instructions
  - MemtoReg selects data on BusW as ALU result or memory Data_out
[Figure: the combined R-type/I-type datapath extended with a data memory (Address from the ALU result, Data_in from BusB, Data_out) and the MemtoReg mux driving BusW]

Controlling the Execution of Load
• RegDst = ‘0’ selects Rt as destination register
• RegWrite = ‘1’ to write the memory data on BusW to register Rt
• ExtOp = ‘sign’ to sign-extend the 16-bit immediate to 32 bits
• ALUSrc = ‘1’ selects extended immediate as second ALU input
• ALUCtrl = ‘ADD’ to calculate data memory address as Reg(Rs) + sign-extend(Imm16)
• MemRead = ‘1’ to read data memory
• MemWrite = ‘0’
• MemtoReg = ‘1’ places the data read from memory on BusW
[Figure: datapath with data memory, annotated with the control values above]

Controlling the Execution of Store
• RegDst = ‘x’ because there is no destination register
• RegWrite = ‘0’ because no register is written by the store instruction
• ExtOp = ‘sign’ to sign-extend the 16-bit immediate to 32 bits
• ALUSrc = ‘1’ to select the extended immediate as second ALU input
• ALUCtrl = ‘ADD’ to calculate data memory address as Reg(Rs) + sign-extend(Imm16)
• MemRead = ‘0’
• MemWrite = ‘1’ to write data memory
• MemtoReg = ‘x’ because we don’t care what data is placed on BusW
[Figure: datapath with data memory, annotated with the control values above]

Adding Jump and Branch to Datapath
• Next PC computes jump or branch target instruction address
• For Branch, ALU does a subtraction
• Additional control signals
  - J, Beq, Bne for jump and branch instructions
  - Zero condition of the ALU is examined
  - PCSrc = 1 for Jump & taken Branch
[Figure: datapath extended with a Next PC block (inputs Imm16, Imm26, the incremented 30-bit PC, zero, and J, Beq, Bne; outputs the 30-bit jump or branch target address and PCSrc) and a PCSrc mux in front of the PC]

Details of Next PC
• Imm16 is sign-extended to 30 bits
  - Sign-extension: most-significant bit is replicated
• Branch target address: the incremented PC (30 bits) is added to the sign-extended Imm16
• Jump target address: upper 4 bits of PC are concatenated with Imm26
• PCSrc = J + (Beq · Zero) + (Bne · ¬Zero)
[Figure: Next PC block with an adder for Inc PC + sign-extended Imm16, a mux controlled by J that selects between the branch target and the jump target, and the gates that form PCSrc from J, Beq, Bne, and Zero]
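The Next PC block can be modelled on the 30-bit word address that this datapath keeps in the PC. The Python sketch below is a rough illustration: the function name `next_pc` and its argument names are assumptions, and it takes the bne term of PCSrc with the complemented Zero flag (a branch-not-equal is taken when the subtraction result is nonzero).

```python
PC_MASK = 0x3FFFFFFF            # the PC holds the upper 30 bits of the byte address


def next_pc(pc30: int, imm16: int, imm26: int,
            j: bool, beq: bool, bne: bool, zero: bool) -> int:
    """Return the next 30-bit PC value."""
    inc_pc = (pc30 + 1) & PC_MASK
    pc_src = j or (beq and zero) or (bne and not zero)
    if not pc_src:
        return inc_pc                                   # sequential execution
    if j:
        # Upper 4 bits of the incremented PC concatenated with Imm26
        return (inc_pc & 0x3C000000) | (imm26 & 0x3FFFFFF)
    # Branch: replicate the msb of Imm16 to sign-extend it, then add to Inc PC
    offset = imm16 - 0x10000 if imm16 & 0x8000 else imm16
    return (inc_pc + offset) & PC_MASK
```

Working in word addresses removes the explicit multiplication by 4 that appears in the byte-address RTL: appending ‘00’ to the 30-bit result performs it for free.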

Controlling the Execution of Jump
• J = 1 selects Imm26 as jump target address
• Upper 4 bits are from the incremented PC
• PCSrc = 1 to select jump target address
• MemRead, MemWrite & RegWrite are 0
• We don’t care about RegDst, ExtOp, ALUSrc, ALUCtrl, and MemtoReg
[Figure: datapath with Next PC block, annotated with the control values above]

Controlling the Execution of Branch
• Either Beq or Bne = 1
• Next PC outputs branch target address
• ALUSrc = ‘0’ (2nd ALU input is BusB)
• ALUCtrl = ‘SUB’ produces zero flag
• Next PC logic determines PCSrc according to zero flag
• MemRead = MemWrite = RegWrite = 0
• RegDst = ExtOp = MemtoReg = x
[Figure: datapath with Next PC block, annotated with the control values above]


Main Control and ALU Control
• Main Control
  - Input: 6-bit opcode field from instruction
  - Output: 10 control signals for datapath (RegDst, RegWrite, ExtOp, ALUSrc, MemRead, MemWrite, MemtoReg, Beq, Bne, J)
  - Output: ALUOp for ALU Control
• ALU Control
  - Input: 6-bit function field from instruction
  - Input: ALUOp from main control
  - Output: ALUCtrl signal for ALU
[Figure: the instruction memory supplies Op6 to Main Control and funct6 to ALU Control; Main Control drives the datapath signals and ALUOp; ALU Control drives ALUCtrl to the ALU]

Single-Cycle Datapath + Control
[Figure: the complete single-cycle datapath (PC, Next PC block, instruction memory, register file, extender, ALU, data memory, and the RegDst, ALUSrc, MemtoReg, and PCSrc muxes) together with Main Control (input Op; outputs RegDst, RegWrite, ExtOp, ALUSrc, MemRead, MemWrite, MemtoReg, J, Beq, Bne, ALUOp) and ALU Ctrl (inputs ALUOp and func; output ALUCtrl)]

Main Control Signals

Signal     | Effect when ‘0’                                                  | Effect when ‘1’
RegDst     | Destination register = Rt                                        | Destination register = Rd
RegWrite   | None                                                             | Destination register is written with the data value on BusW
ExtOp      | 16-bit immediate is zero-extended                                | 16-bit immediate is sign-extended
ALUSrc     | Second ALU operand comes from the second register file output (BusB) | Second ALU operand comes from the extended 16-bit immediate
MemRead    | None                                                             | Data memory is read: Data_out ← Memory[address]
MemWrite   | None                                                             | Data memory is written: Memory[address] ← Data_in
MemtoReg   | BusW = ALU result                                                | BusW = Data_out from Memory
Beq, Bne   | PC ← PC + 4                                                      | PC ← Branch target address, if branch is taken
J          | PC ← PC + 4                                                      | PC ← Jump target address
ALUOp      | This multi-bit signal specifies the ALU operation as a function of the opcode

Main Control Signal Values

Op      RegDst  RegWrite  ExtOp   ALUSrc  ALUOp   Beq  Bne  J  MemRead  MemWrite  MemtoReg
R-type  1=Rd    1         x       0=BusB  R-type  0    0    0  0        0         0
addi    0=Rt    1         1=sign  1=Imm   ADD     0    0    0  0        0         0
slti    0=Rt    1         1=sign  1=Imm   SLT     0    0    0  0        0         0
andi    0=Rt    1         0=zero  1=Imm   AND     0    0    0  0        0         0
ori     0=Rt    1         0=zero  1=Imm   OR      0    0    0  0        0         0
xori    0=Rt    1         0=zero  1=Imm   XOR     0    0    0  0        0         0
lw      0=Rt    1         1=sign  1=Imm   ADD     0    0    0  1        0         1
sw      x       0         1=sign  1=Imm   ADD     0    0    0  0        1         x
beq     x       0         x       0=BusB  SUB     1    0    0  0        0         x
bne     x       0         x       0=BusB  SUB     0    1    0  0        0         x
j       x       0         x       x       x       0    0    1  0        0         x

• x is a don’t care (can be 0 or 1), used to minimize logic

Logic Equations for Control Signals
• A decoder on Op6 produces one signal per instruction: R-type, addi, slti, andi, ori, xori, lw, sw, beq, bne, j
• Beq, Bne, and J are taken directly from the decoder outputs
• The remaining signals follow from the logic equations (¬ denotes complement):
  - RegDst   <= R-type
  - RegWrite <= ¬(sw + beq + bne + j)
  - ExtOp    <= ¬(andi + ori + xori)
  - ALUSrc   <= ¬(R-type + beq + bne)
  - MemRead  <= lw
  - MemWrite <= sw
  - MemtoReg <= lw
[Figure: Op6 feeding the decoder, whose outputs feed the logic equations block that produces the control signals and ALUOp]
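The control equations can be checked against the signal-value table with a short model. This Python sketch assumes the complemented forms of the equations (RegWrite, ExtOp, and ALUSrc are 0 exactly for the listed instructions, as the signal-value table implies); the names `OPCODES` and `main_control` are illustrative. Don't-care entries in the table come out as whatever the equations yield.

```python
# Opcode values from the MIPS subset table of this deck.
OPCODES = {
    0x00: "R-type", 0x08: "addi", 0x0A: "slti", 0x0C: "andi", 0x0D: "ori",
    0x0E: "xori", 0x23: "lw", 0x2B: "sw", 0x04: "beq", 0x05: "bne", 0x02: "j",
}


def main_control(op6: int) -> dict:
    """Decode the opcode, then apply the logic equations to the decoder outputs."""
    name = OPCODES[op6]

    def any_of(*names: str) -> int:
        """OR of the named decoder outputs (1 if the instruction is one of them)."""
        return int(name in names)

    return {
        "RegDst":   any_of("R-type"),
        "RegWrite": 1 - any_of("sw", "beq", "bne", "j"),
        "ExtOp":    1 - any_of("andi", "ori", "xori"),
        "ALUSrc":   1 - any_of("R-type", "beq", "bne"),
        "MemRead":  any_of("lw"),
        "MemWrite": any_of("sw"),
        "MemtoReg": any_of("lw"),
        "Beq":      any_of("beq"),
        "Bne":      any_of("bne"),
        "J":        any_of("j"),
    }


lw_signals = main_control(0x23)
```

For lw this yields RegDst = 0, RegWrite = 1, ExtOp = 1, ALUSrc = 1, MemRead = 1, MemWrite = 0, and MemtoReg = 1, matching the control settings given for the load instruction.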

ALU Control Truth Table

Op6     ALUOp   funct6  ALUCtrl  4-bit encoding
R-type  R-type  add     ADD      0000
R-type  R-type  sub     SUB      0010
R-type  R-type  and     AND      0100
R-type  R-type  or      OR       0101
R-type  R-type  xor     XOR      0110
R-type  R-type  slt     SLT      1010
addi    ADD     x       ADD      0000
slti    SLT     x       SLT      1010
andi    AND     x       AND      0100
ori     OR      x       OR       0101
xori    XOR     x       XOR      0110
lw      ADD     x       ADD      0000
sw      ADD     x       ADD      0000
beq     SUB     x       SUB      0010
bne     SUB     x       SUB      0010
j       x       x       x        x

• The 4-bit encoding for ALUCtrl is chosen here to be equal to the last 4 bits of the function field
• Other binary encodings are also possible. The idea is to choose a binary encoding that will minimize the logic for ALU Control
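Choosing the ALUCtrl encoding to equal the last 4 bits of the function field makes the R-type case a simple pass-through. A minimal Python sketch of the ALU control under that encoding follows; the names `ALU_CTRL` and `alu_control` and the use of strings for ALUOp are assumptions for illustration.

```python
# 4-bit ALUCtrl codes: the low 4 bits of the funct values
# add 0x20, sub 0x22, and 0x24, or 0x25, xor 0x26, slt 0x2a.
ALU_CTRL = {
    "ADD": 0b0000, "SUB": 0b0010, "AND": 0b0100,
    "OR":  0b0101, "XOR": 0b0110, "SLT": 0b1010,
}


def alu_control(alu_op: str, funct6: int = 0) -> int:
    """Return the 4-bit ALUCtrl value for a given ALUOp and function field."""
    if alu_op == "R-type":
        return funct6 & 0xF       # pass the last 4 bits of funct straight through
    return ALU_CTRL[alu_op]       # funct is a don't-care for all other ALUOp values


slt_ctrl = alu_control("R-type", 0x2A)
lw_ctrl = alu_control("ADD")
```

The pass-through is only meaningful for the six function codes in the subset; any other funct value would need to be rejected or extended in a fuller design.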


Drawbacks of Single Cycle Processor
• Long cycle time
  - All instructions take as much time as the slowest

  ALU:     Instruction Fetch | Reg Read | ALU | Reg Write
  Load:    Instruction Fetch | Reg Read | ALU | Memory Read | Reg Write   (longest delay)
  Store:   Instruction Fetch | Reg Read | ALU | Memory Write
  Branch:  Instruction Fetch | Reg Read | ALU
  Jump:    Instruction Fetch | Decode

• Alternative solution: multicycle implementation
  - Break down instruction execution into multiple cycles

Multicycle Implementation
• Break instruction execution into five steps
  - Instruction fetch
  - Instruction decode and register read
  - Execution, memory address calculation, or branch completion
  - Memory access or ALU instruction completion
  - Load instruction completion
• One step = one clock cycle (clock cycle is reduced)
  - First 2 steps are the same for all instructions

  Instruction    # cycles
  ALU & Store    4
  Branch         3
  Load           5
  Jump           2

Performance Example
• Assume the following operation times for components:
  - Instruction and data memories: 200 ps
  - ALU and adders: 180 ps
  - Decode and register file access (read or write): 150 ps
  - Ignore the delays in PC, mux, extender, and wires
• Which of the following would be faster and by how much?
  - Single-cycle implementation for all instructions
  - Multicycle implementation optimized for every class of instructions
• Assume the following instruction mix:
  - 40% ALU, 20% loads, 10% stores, 20% branches, & 10% jumps

Solution

Instruction Class  Instruction Memory  Register Read  ALU Operation  Data Memory  Register Write  Total
ALU                200                 150            180                         150             680 ps
Load               200                 150            180            200          150             880 ps
Store              200                 150            180            200                          730 ps
Branch             200                 150            180                                         530 ps
Jump               200                 150 (decode and update PC)                                 350 ps

• For fixed single-cycle implementation:
  - Clock cycle = 880 ps, determined by longest delay (load instruction)
• For multicycle implementation:
  - Clock cycle = max(200, 150, 180) = 200 ps (maximum delay at any step)
  - Average CPI = 0.4×4 + 0.2×5 + 0.1×4 + 0.2×3 + 0.1×2 = 3.8
• Speedup = 880 ps / (3.8 × 200 ps) = 880 / 760 = 1.16
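The arithmetic of this solution can be reproduced in a few lines. In the Python sketch below, the component delays, step sequences, and instruction mix come from the example; the variable names are assumptions, and the mix is stored as integer percentages to keep the CPI sum free of rounding noise.

```python
# Component delays in picoseconds, from the performance example.
DELAY = {"imem": 200, "reg": 150, "alu": 180, "dmem": 200}

# Components used by each instruction class, in order.
STEPS = {
    "ALU":    ["imem", "reg", "alu", "reg"],
    "Load":   ["imem", "reg", "alu", "dmem", "reg"],
    "Store":  ["imem", "reg", "alu", "dmem"],
    "Branch": ["imem", "reg", "alu"],
    "Jump":   ["imem", "reg"],          # "reg" stands for decode and update PC
}

MIX_PERCENT = {"ALU": 40, "Load": 20, "Store": 10, "Branch": 20, "Jump": 10}

# Single-cycle: the clock must fit the slowest instruction class.
totals = {cls: sum(DELAY[s] for s in steps) for cls, steps in STEPS.items()}
single_cycle_clock = max(totals.values())

# Multicycle: the clock fits the slowest step; CPI is the weighted step count.
multicycle_clock = max(DELAY.values())
average_cpi = sum(MIX_PERCENT[cls] * len(STEPS[cls]) for cls in STEPS) / 100

speedup = single_cycle_clock / (average_cpi * multicycle_clock)
```

Changing `MIX_PERCENT` shows how sensitive the result is to the workload: a mix with more loads raises the average CPI and shrinks the multicycle advantage.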

Worst Case Timing (Load Instruction)
[Figure: timing diagram over one clock cycle, showing the successive delays below and the value that becomes valid after each]
• Clk-to-q: old PC becomes new PC
• Instruction memory access time: old instruction becomes new instruction = (Op, Rs, Rt, Rd, Funct, Imm16, Imm26)
• Delay through control logic: old control signal values become new control signal values (ExtOp, ALUSrc, ALUOp, …)
• Register file access time: old BusA value becomes new BusA value = Register(Rs)
• Delay through extender and ALU mux: old second ALU input becomes new second ALU input = sign-extend(Imm16)
• ALU delay: old ALU result becomes new ALU result = address
• Data memory access time: old data memory output value becomes new value
• Mux delay + setup time + clock skew, after which the write occurs at the end of the clock cycle

Worst Case Timing – Cont'd
• Long cycle time: must be long enough for the load operation

  PC’s Clk-to-Q
  + Instruction memory’s access time
  + Maximum of (register file’s access time, delay through control logic + extender + ALU mux)
  + ALU to perform a 32-bit add
  + Data memory access time
  + Delay through MemtoReg mux
  + Setup time for register file write
  + Clock skew

• Cycle time is longer than needed for other instructions
  - Therefore, single cycle processor design is not used in practice

Summary
• 5 steps to design a processor
  - Analyze instruction set => datapath requirements
  - Select datapath components & establish clocking methodology
  - Assemble datapath meeting the requirements
  - Analyze implementation of each instruction to determine control signals
  - Assemble the control logic
• MIPS makes control easier
  - Instructions are of same size
  - Source registers always in same place
  - Immediates are of same size and same location
  - Operations are always on registers/immediates
• Single cycle datapath => CPI = 1, but long clock cycle