Single Cycle Processor Design COE 301 Computer Organization

Single Cycle Processor Design
COE 301 Computer Organization
ICS 233 Computer Architecture and Assembly Language
Dr. Marwan Abu-Amara
College of Computer Sciences and Engineering
King Fahd University of Petroleum and Minerals
[Adapted from slides of Dr. M. Mudawar and Dr. A. El-Maleh, KFUPM]

Outline
• Designing a Processor: Step-by-Step
• Datapath Components and Clocking
• Assembling an Adequate Datapath
• Controlling the Execution of Instructions
• The Main Controller, ALU Controller, PC Control
• Worst-Case Timing

Single Cycle Processor Design COE 301 Computer Organization – KFUPM © Muhamed Mudawar slide 2

Designing a Processor: Step-by-Step
• Analyze instruction set => datapath requirements
  - The meaning of each instruction is given by the register transfers
  - Datapath must include storage elements for ISA registers
  - Datapath must support each register transfer
• Select datapath components and clocking methodology
• Assemble datapath meeting the requirements
• Analyze implementation of each instruction
  - Determine the setting of control signals for register transfer
• Assemble the control logic

Review of MIPS Instruction Formats
• All instructions are 32 bits wide
• Three instruction formats: R-type, I-type, and J-type

  R-type:  Op6 | Rs5 | Rt5 | Rd5 | sa5 | funct6
  I-type:  Op6 | Rs5 | Rt5 | immediate16
  J-type:  Op6 | immediate26

  - Op6: 6-bit opcode of the instruction
  - Rs5, Rt5, Rd5: 5-bit source and destination register numbers
  - sa5: 5-bit shift amount used by shift instructions
  - funct6: 6-bit function field for R-type instructions
  - immediate16: 16-bit immediate value or address offset
  - immediate26: 26-bit target address of the jump instruction
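The field layout above can be illustrated with a short Python sketch that slices a 32-bit instruction word into its fields. The function name `decode` and the dictionary keys are illustrative choices, not part of the slides; the bit positions follow the standard MIPS R/I/J layouts.

```python
def decode(instr: int) -> dict:
    """Split a 32-bit MIPS instruction word into all possible fields.

    Every field is extracted unconditionally; which ones are meaningful
    depends on the format (R, I, or J) implied by the opcode.
    """
    return {
        "op":    (instr >> 26) & 0x3F,       # bits 31..26
        "rs":    (instr >> 21) & 0x1F,       # bits 25..21
        "rt":    (instr >> 16) & 0x1F,       # bits 20..16
        "rd":    (instr >> 11) & 0x1F,       # bits 15..11 (R-type)
        "sa":    (instr >> 6) & 0x1F,        # bits 10..6  (R-type)
        "funct": instr & 0x3F,               # bits 5..0   (R-type)
        "imm16": instr & 0xFFFF,             # bits 15..0  (I-type)
        "imm26": instr & 0x03FFFFFF,         # bits 25..0  (J-type)
    }


# Example word: add $8, $9, $10 encoded as op=0, rs=9, rt=10, rd=8, funct=0x20
fields = decode(0x012A4020)
```

In hardware this decoding is pure wiring: each field is simply a fixed group of instruction bits routed to the register file, extender, or control logic.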

MIPS Subset of Instructions
• Only a subset of the MIPS instructions is considered
  - ALU instructions (R-type): add, sub, and, or, xor, slt
  - Immediate instructions (I-type): addi, slti, andi, ori, xori
  - Load and Store (I-type): lw, sw
  - Branch (I-type): beq, bne
  - Jump (J-type): j
• This subset does not include all the integer instructions
• But it is sufficient to illustrate the design of the datapath and control
• Concepts used to implement the MIPS subset are used to construct a broad spectrum of computers

Details of the MIPS Subset

Instruction          Meaning             Format
add   rd, rs, rt     addition            op6 = 0     rs5  rt5  rd5  0  0x20
sub   rd, rs, rt     subtraction         op6 = 0     rs5  rt5  rd5  0  0x22
and   rd, rs, rt     bitwise and         op6 = 0     rs5  rt5  rd5  0  0x24
or    rd, rs, rt     bitwise or          op6 = 0     rs5  rt5  rd5  0  0x25
xor   rd, rs, rt     exclusive or        op6 = 0     rs5  rt5  rd5  0  0x26
slt   rd, rs, rt     set on less than    op6 = 0     rs5  rt5  rd5  0  0x2a
addi  rt, rs, im16   add immediate       0x08        rs5  rt5  im16
slti  rt, rs, im16   slt immediate       0x0a        rs5  rt5  im16
andi  rt, rs, im16   and immediate       0x0c        rs5  rt5  im16
ori   rt, rs, im16   or immediate        0x0d        rs5  rt5  im16
xori  rt, rs, im16   xor immediate       0x0e        rs5  rt5  im16
lw    rt, im16(rs)   load word           0x23        rs5  rt5  im16
sw    rt, im16(rs)   store word          0x2b        rs5  rt5  im16
beq   rs, rt, im16   branch if equal     0x04        rs5  rt5  im16
bne   rs, rt, im16   branch not equal    0x05        rs5  rt5  im16
j     im26           jump                0x02        im26

Register Transfer Level (RTL)
• RTL is a description of data flow between registers
• RTL gives a meaning to the instructions
• All instructions are fetched from memory at address PC

Instruction  RTL Description
ADD          Reg(Rd) ← Reg(Rs) + Reg(Rt);                  PC ← PC + 4
SUB          Reg(Rd) ← Reg(Rs) – Reg(Rt);                  PC ← PC + 4
ORI          Reg(Rt) ← Reg(Rs) | zero_ext(Im16);           PC ← PC + 4
LW           Reg(Rt) ← MEM[Reg(Rs) + sign_ext(Im16)];      PC ← PC + 4
SW           MEM[Reg(Rs) + sign_ext(Im16)] ← Reg(Rt);      PC ← PC + 4
BEQ          if (Reg(Rs) == Reg(Rt)) PC ← PC + 4 + 4 × sign_ext(Im16)
             else PC ← PC + 4

Instructions are Executed in Steps
• R-type
  Fetch instruction:  Instruction ← MEM[PC]
  Fetch operands:     data1 ← Reg(Rs), data2 ← Reg(Rt)
  Execute operation:  ALU_result ← func(data1, data2)
  Write ALU result:   Reg(Rd) ← ALU_result
  Next PC address:    PC ← PC + 4
• I-type
  Fetch instruction:  Instruction ← MEM[PC]
  Fetch operands:     data1 ← Reg(Rs), data2 ← Extend(imm16)
  Execute operation:  ALU_result ← op(data1, data2)
  Write ALU result:   Reg(Rt) ← ALU_result
  Next PC address:    PC ← PC + 4
• BEQ
  Fetch instruction:  Instruction ← MEM[PC]
  Fetch operands:     data1 ← Reg(Rs), data2 ← Reg(Rt)
  Equality:           zero ← subtract(data1, data2)
  Branch:             if (zero) PC ← PC + 4 + 4 × sign_ext(imm16)
                      else PC ← PC + 4

Instruction Execution – cont'd
• LW
  Fetch instruction:    Instruction ← MEM[PC]
  Fetch base register:  base ← Reg(Rs)
  Calculate address:    address ← base + sign_extend(imm16)
  Read memory:          data ← MEM[address]
  Write register Rt:    Reg(Rt) ← data
  Next PC address:      PC ← PC + 4
• SW
  Fetch instruction:    Instruction ← MEM[PC]
  Fetch registers:      base ← Reg(Rs), data ← Reg(Rt)
  Calculate address:    address ← base + sign_extend(imm16)
  Write memory:         MEM[address] ← data
  Next PC address:      PC ← PC + 4
• Jump
  Fetch instruction:    Instruction ← MEM[PC]
  Target PC address:    target ← PC[31:28] ‖ Imm26 ‖ '00'  (concatenation)
  Jump:                 PC ← target
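The concatenation used for the jump target can be sketched in Python with masks and shifts. The function name `jump_target` is hypothetical; the computation follows the register transfer given for the Jump instruction.

```python
def jump_target(pc: int, imm26: int) -> int:
    """Form the jump target as PC[31:28] || Imm26 || '00'."""
    upper4 = pc & 0xF0000000                 # keep PC[31:28]
    word_index = imm26 & 0x03FFFFFF          # 26-bit field from the instruction
    return upper4 | (word_index << 2)        # appending '00' is a shift left by 2


# A jump fetched at 0x00400008 with Imm26 = 0x0100000 goes to 0x00400000.
target = jump_target(0x00400008, 0x0100000)
```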

Requirements of the Instruction Set
• Memory
  - Instruction memory where instructions are stored
  - Data memory where data is stored
• Registers
  - 32 × 32-bit general purpose registers, R0 is always zero
  - Read source register Rs
  - Read source register Rt
  - Write destination register Rt or Rd
• Program counter PC register and Adder to increment PC
• Sign and Zero extender for immediate constant
• ALU for executing instructions

Components of the Datapath
• Combinational Elements
  - ALU, Adder
  - Immediate extender
  - Multiplexers
• Storage Elements
  - Instruction memory
  - Data memory
  - PC register
  - Register file
• Clocking methodology
  - Timing of reads and writes
[Figure: block symbols of the components: ALU (ALU control input; result, zero, and overflow outputs), extender (ExtOp), mux (select), PC, Instruction Memory (Address, Instruction), Data Memory (Address, Data_in, Data_out, MemRead, MemWrite, clk), and Registers (RA, RB, RW, BusA, BusB, BusW, RegWrite, clk)]

Register Element
• Register
  - Similar to the D-type Flip-Flop
• n-bit input (Data_In) and output (Data_Out)
• Write Enable (WE):
  - Enable / disable writing of register
  - Negated (0): Data_Out will not change
  - Asserted (1): Data_Out will become Data_In after clock edge
• Edge-triggered clocking
  - Register output is modified at clock edge
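The write-enable behavior described above can be modeled with a small Python class. This is only a behavioral sketch: the class name `Register` and the method `clock_edge` are assumptions made for illustration, and one call to `clock_edge` stands for one active clock edge.

```python
class Register:
    """Behavioral model of an n-bit edge-triggered register with write enable."""

    def __init__(self, width: int = 32):
        self.mask = (1 << width) - 1
        self.data_out = 0

    def clock_edge(self, data_in: int, write_enable: int) -> None:
        # WE asserted: Data_Out becomes Data_In after the clock edge.
        # WE negated: Data_Out does not change.
        if write_enable:
            self.data_out = data_in & self.mask


reg = Register()
reg.clock_edge(0x1234, write_enable=0)   # ignored, WE is 0
reg.clock_edge(0x5678, write_enable=1)   # captured
```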

MIPS Register File
• Register File consists of 32 × 32-bit registers
  - BusA and BusB: 32-bit output busses for reading 2 registers
  - BusW: 32-bit input bus for writing a register when RegWrite is 1
  - Two registers read and one written in a cycle
• Registers are selected by:
  - RA selects register to be read on BusA
  - RB selects register to be read on BusB
  - RW selects the register to be written
• Clock input
  - The clock input is used ONLY during write operation
  - During read, register file behaves as a combinational logic block
    - RA or RB valid => BusA or BusB valid after access time
[Figure: register file block with 5-bit RA, RB, RW inputs, 32-bit BusA and BusB outputs, 32-bit BusW input, RegWrite, and Clock]
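A behavioral Python sketch of the register file may help separate the combinational reads from the clocked write. The class and method names are hypothetical; the rule that R0 always reads as zero comes from the requirements slide.

```python
class RegisterFile:
    """32 x 32-bit register file: two combinational reads, one clocked write."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra: int, rb: int) -> tuple:
        # Reads are combinational: BusA and BusB follow RA and RB directly.
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw: int, bus_w: int, reg_write: int) -> None:
        # The clock matters only for writes. R0 is never written,
        # so it always reads as zero.
        if reg_write and rw != 0:
            self.regs[rw] = bus_w & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(rw=8, bus_w=42, reg_write=1)
bus_a, bus_b = rf.read(8, 0)
```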

Tri-State Buffers
• Allow multiple sources to drive a single bus
• Two Inputs:
  - Data_in
  - Enable (to enable output)
• One Output (Data_out):
  - If (Enable) Data_out = Data_in
    else Data_out = High Impedance state (output is disconnected)
• Tri-state buffers can be used to build multiplexors
[Figure: two tri-state buffers driving a shared Output from Data_0 and Data_1, enabled by Select and its complement]
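The way two tri-state buffers form a multiplexer can be sketched in Python. Modeling the high-impedance state as `None` is an assumption made for this sketch, as are the function names.

```python
def tri_state(data_in, enable):
    """Return data_in when enabled, otherwise None (models high impedance)."""
    return data_in if enable else None


def mux2(data_0, data_1, select):
    """2-to-1 mux built from two tri-state buffers sharing one output wire."""
    outputs = [
        tri_state(data_0, select == 0),   # enabled by NOT select
        tri_state(data_1, select == 1),   # enabled by select
    ]
    driven = [value for value in outputs if value is not None]
    # Exactly one buffer may drive the shared wire at any time.
    assert len(driven) == 1, "bus contention or floating bus"
    return driven[0]


out = mux2(0xAAAA, 0x5555, select=1)
```

The assertion reflects the design rule behind the circuit: the enables must be mutually exclusive so that the shared bus has exactly one driver.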

Details of the Register File
[Figure: internal structure of the register file. Three 5-to-32 decoders decode RA, RB, and RW. The RW decoder outputs, gated by RegWrite, drive the write enables (WE) of registers R1 to R31, which all take their input from BusW and share the Clock. The RA and RB decoder outputs enable tri-state buffers that place the selected register on BusA and BusB. R0 is not used; the constant "0" is driven instead.]

Building a Multifunction ALU
[Figure: multifunction ALU with 32-bit inputs A and B feeding a shifter, an adder (carry-in c0), and a logic unit; a final mux selects the 32-bit ALU Result; the zero, sign, and overflow flags are also produced]
• Shift Operation (2 bits): None = 00, SLL = 01, SRL = 10, SRA = 11
  - Shift Amount: 5 bits (lsb)
• Arithmetic Operation: ADD = 0, SUB = 1
• Logical Operation (2 bits): AND = 00, OR = 01, NOR = 10, XOR = 11
• ALU Selection (2 bits): Logic = 00, Arith = 01, SLT = 10, Shift = 11
• SLT: ALU does a SUB and checks the sign and overflow
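The SLT rule (subtract, then check the sign and overflow) can be made concrete with a Python sketch on 32-bit two's complement values. The helper names `sub32` and `slt` are illustrative; combining the flags as sign XOR overflow is the usual way to obtain a signed less-than from a subtractor and is an assumption about how the pictured ALU uses them.

```python
MASK32 = 0xFFFFFFFF


def sub32(a: int, b: int) -> tuple:
    """32-bit subtraction returning (result, sign, overflow, zero)."""
    result = (a - b) & MASK32
    sign = result >> 31
    # Signed overflow on A - B: operands have different signs and the
    # result's sign differs from A's sign.
    overflow = (((a ^ b) & (a ^ result)) >> 31) & 1
    zero = int(result == 0)
    return result, sign, overflow, zero


def slt(a: int, b: int) -> int:
    """Set on less than (signed): 1 if a < b, else 0."""
    _, sign, overflow, _ = sub32(a, b)
    # Without overflow the sign bit is the answer; overflow inverts it.
    return sign ^ overflow


less = slt(0xFFFFFFFF, 1)   # -1 < 1
```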

Instruction and Data Memories
• Instruction memory needs only provide read access
  - Because datapath does not write instructions
  - Behaves as combinational logic for read
  - Address selects Instruction after access time
• Data Memory is used for load and store
  - MemRead: enables output on Data_out
    - Address selects the word to put on Data_out
  - MemWrite: enables writing of Data_in
    - Address selects the memory word to be written
    - The Clock synchronizes the write operation
• Separate instruction and data memories
  - Later, we will replace them with caches
[Figure: Instruction Memory block (32-bit Address in, 32-bit Instruction out) and Data Memory block (32-bit Address, Data_in, Data_out; MemRead, MemWrite, Clock)]

Clocking Methodology
• Clocks are needed in sequential logic to decide when a state element (register) should be updated
• To ensure correctness, a clocking methodology defines when data can be written and read
• We assume edge-triggered clocking
• All state changes occur on the same clock edge
• Data must be valid and stable before arrival of clock edge
• Edge-triggered clocking allows a register to be read and written during the same clock cycle
[Figure: Register 1 feeding combinational logic feeding Register 2, with a clock waveform showing the rising and falling edges]

Determining the Clock Cycle
• With edge-triggered clocking, the clock cycle must be long enough to accommodate the path from one register through the combinational logic to another register
• Tclk-q: clock to output delay through register
• Tmax_comb: longest delay through combinational logic
• Ts: setup time that input to a register must be stable before arrival of clock edge
• Th: hold time that input to a register must hold after arrival of clock edge
• Hold time (Th) is normally satisfied since Tclk-q > Th

  Tcycle ≥ Tclk-q + Tmax_comb + Ts

[Figure: Register 1, combinational logic, and Register 2 with a clock waveform marking Tclk-q, Tmax_comb, Ts, and Th relative to the writing edge]

Clock Skew
• Clock skew arises because the clock signal uses different paths with slightly different delays to reach state elements
• Clock skew is the difference in absolute time between when two storage elements see a clock edge
• With a clock skew, the clock cycle time is increased

  Tcycle ≥ Tclk-q + Tmax_combinational + Tsetup + Tskew

• Clock skew is reduced by balancing the clock delays
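As a worked example of the cycle-time constraint, the following Python sketch adds up the four terms. The function name and all delay values are hypothetical numbers chosen only for illustration; they are not taken from the slides.

```python
def min_cycle_time(t_clk_q: int, t_max_comb: int, t_setup: int, t_skew: int = 0) -> int:
    """Smallest cycle time satisfying Tcycle >= Tclk-q + Tmax_comb + Tsetup + Tskew."""
    return t_clk_q + t_max_comb + t_setup + t_skew


# Hypothetical delays in picoseconds (illustrative values only).
without_skew = min_cycle_time(t_clk_q=50, t_max_comb=900, t_setup=30)
with_skew = min_cycle_time(t_clk_q=50, t_max_comb=900, t_setup=30, t_skew=20)
```

With these assumed numbers the skew term lengthens the minimum cycle from 980 ps to 1000 ps, which is why balancing the clock delays matters.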

Instruction Fetching Datapath
• We can now assemble the datapath from its components
• For instruction fetching, we need …
  - Program Counter (PC) register
  - Instruction Memory
  - Adder for incrementing PC
• The least significant 2 bits of the PC are '00' since PC is a multiple of 4
• Improved datapath increments upper 30 bits of PC by 1
• Datapath does not handle branch or jump instructions
[Figure: two fetch datapaths. In the first, a 32-bit adder adds 4 to the PC to form the next PC, and the PC addresses the Instruction Memory. In the improved datapath, a 30-bit PC is incremented by 1 and '00' is appended to form the 32-bit instruction address.]
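The improved fetch datapath can be sketched in Python by keeping only the upper 30 bits of the PC. The function names are hypothetical; the behavior (increment by 1, then append '00') follows the slide.

```python
MASK30 = (1 << 30) - 1


def next_pc30(pc30: int) -> int:
    """Increment the 30-bit PC by 1 (equivalent to PC + 4 on the full address)."""
    return (pc30 + 1) & MASK30


def byte_address(pc30: int) -> int:
    """Append '00' to the 30-bit PC to form the 32-bit instruction address."""
    return pc30 << 2


pc30 = 0x00400000 >> 2                      # drop the two low '00' bits
following = byte_address(next_pc30(pc30))   # address of the next instruction
```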

Datapath for R-type Instructions
[Figure: fetch datapath extended with the register file and ALU. R-type format: Op6 | Rs5 | Rt5 | Rd5 | sa5 | funct6. Rs, Rt, and Rd drive RA, RB, and RW; BusA and BusB feed the ALU; the ALU result returns on BusW.]
• Rs and Rt fields select two registers to read. Rd field selects register to write
• BusA & BusB provide data input to ALU. ALU result is connected to BusW
• Same clock updates PC and Rd register
• Control signals
  - ALUOp is the ALU operation as defined in the funct field for R-type
    - Recall: Op = 0 for all R-type
  - RegWr is used to enable the writing of the ALU result

Datapath for I-type ALU Instructions
[Figure: fetch datapath with register file, extender, and ALU. I-type format: Op6 | Rs5 | Rt5 | immediate16. Rs drives RA, Rt drives RW; Imm16 passes through the Extender to the second ALU input; the ALU result returns on BusW.]
• Rt selects register to write, not Rd
• Second ALU input comes from the extended immediate. RB and BusB are not used
• Same clock edge updates PC and Rt
• Control signals
  - ALUOp is derived from the Op field for I-type instructions
  - RegWr is used to enable the writing of the ALU result
  - ExtOp controls the extension type (i.e., zero-extension or sign-extension) of the 16-bit immediate

Combining R-type & I-type Datapaths
[Figure: combined datapath with a mux (RegDst) in front of RW and a mux (ALUSrc) in front of the second ALU input]
• A mux selects RW as either Rt or Rd
• Another mux selects 2nd ALU input as either data on BusB or the extended immediate
• Control signals
  - ALUOp is derived from either the Op or the funct field
  - RegWr enables the writing of the ALU result
  - ExtOp controls the extension type of the 16-bit immediate
  - RegDst selects the register destination as either Rt or Rd
  - ALUSrc selects the 2nd ALU source as BusB or extended immediate

Controlling ALU Instructions
[Figure: two copies of the combined datapath; the active part of the datapath is shown in green]
• For R-type ALU instructions (RegWr = 1), RegDst is '1' to select Rd on RW and ALUSrc is '0' to select BusB as second ALU input
• For I-type ALU instructions (RegWr = 1), RegDst is '0' to select Rt on RW and ALUSrc is '1' to select the extended immediate as second ALU input

Details of the Extender
• Two types of extensions
  - Zero-extension for unsigned constants
  - Sign-extension for signed constants
• Control signal ExtOp indicates type of extension
  - ExtOp = 0: Upper 16 bits = 0
  - ExtOp = 1: Upper 16 bits = sign bit
• Extender Implementation: wiring and one AND gate
[Figure: Imm16 wired to the lower 16 bits of the output; an AND gate of ExtOp and the sign bit of Imm16 drives all of the upper 16 bits]
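The "wiring and one AND gate" implementation can be mirrored in a short Python sketch. The function name `extend` is an illustrative choice; the single AND of ExtOp with the sign bit corresponds to the gate mentioned on the slide.

```python
def extend(imm16: int, ext_op: int) -> int:
    """Extend a 16-bit immediate to 32 bits (ext_op: 0 = zero, 1 = sign)."""
    imm16 &= 0xFFFF
    sign_bit = (imm16 >> 15) & 1
    fill = ext_op & sign_bit                 # the one AND gate
    upper16 = 0xFFFF if fill else 0x0000     # the same bit wired to all upper 16 bits
    return (upper16 << 16) | imm16           # lower 16 bits are plain wiring


negative = extend(0x8000, ext_op=1)   # sign-extended
unsigned = extend(0x8000, ext_op=0)   # zero-extended
```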

Adding Data Memory to Datapath
[Figure: combined datapath extended with a Data Memory (Address, Data_in, Data_out) after the ALU and a third mux (WBdata) driving BusW]
• A data memory is added for load and store instructions
• ALU calculates data memory address
• A 3rd mux selects data on BusW as either ALU result or memory Data_out
• BusB is connected to Data_in of Data Memory for store instructions
• Additional Control signals
  - MemRd for load instructions
  - MemWr for store instructions
  - WBdata selects data on BusW as ALU result or Memory Data_out

Controlling the Execution of Load
[Figure: datapath with the control signal values for a load annotated]
• RegDst = '0' selects Rt as destination register
• RegWr = '1' to enable writing of register file
• ExtOp = 1 to sign-extend Immediate16 to 32 bits
• ALUSrc = '1' selects extended immediate as second ALU input
• ALUOp = 'ADD' to calculate data memory address as Reg(Rs) + sign-extend(Imm16)
• MemRd = '1' to read data memory
• MemWr = '0'
• WBdata = '1' places the data read from memory on BusW
• Clock edge updates PC and Register Rt

Controlling the Execution of Store
[Figure: datapath with the control signal values for a store annotated]
• RegDst = 'X' because no register is written
• RegWr = '0' to disable writing of register file
• ExtOp = 1 to sign-extend Immediate16 to 32 bits
• ALUSrc = '1' selects extended immediate as second ALU input
• ALUOp = 'ADD' to calculate data memory address as Reg(Rs) + sign-extend(Imm16)
• MemRd = '0'
• MemWr = '1' to write data memory
• WBdata = 'X' because we don't care what data is put on BusW
• Clock edge updates PC and Data Memory

Adding Jump and Branch to Datapath
[Figure: full datapath with a 3-input mux at the PC input, controlled by PCSrc, selecting the Next PC Address (0), the Jump Target (1), or the Branch Target Address (2)]
• Adding a mux at the PC input
• Jump Target = PC[31:28] ‖ Imm26
• New adder for computing branch target address
• Additional Control Signals
  - PCSrc for PC control: 1 for a jump and 2 for a taken branch
  - Zero flag for branch control: whether branch is taken or not

Controlling the Execution of a Jump
[Figure: full datapath with the control signal values for a jump annotated]
• If (Opcode == J) then PCSrc = 1 (Jump Target)
• Jump Target = PC[31:28] ‖ Imm26
• MemRd = MemWr = RegWr = 0
• Don't care about other control signals: RegDst = ExtOp = ALUSrc = ALUOp = WBdata = X, and Zero = X
• Clock edge updates PC register only

Controlling the Execution of a Branch
[Figure: full datapath with the control signal values for a taken BEQ annotated (Zero = 1, PCSrc = 2)]
• If (Opcode == BEQ && Zero == 1) then PCSrc = 2 (Branch Target) else PCSrc = 0 (Next PC)
• ALUSrc = 0, ALUOp = SUB, ExtOp = 1, MemRd = MemWr = RegWr = 0
• RegDst = WBdata = X
• Clock edge updates PC register only

Adding Jump & Branch (Design #2)
[Figure: datapath with a 30-bit PC, a "Next PC" block fed by Imm16, Imm26, the incremented PC, the zero flag, and the J, Beq, Bne signals, and a 2-input mux (PCSrc) at the PC input]
• "Next PC" computes jump or branch target instruction address
• For Branch, ALU does a subtraction
• Additional Control Signals
  - J, Beq, Bne for jump and branch instructions
  - Zero condition of the ALU is examined
  - PCSrc = 1 for Jump & taken Branch

Controlling Exec. of Jump (#2)
[Figure: Design #2 datapath with the control signal values for a jump annotated]
• J = 1 selects Imm26 as jump target address
• Upper 4 bits are from the incremented PC
• MemRead, MemWrite & RegWrite are 0
• PCSrc = 1 to select jump target address
• We don't care about RegDst, ExtOp, ALUSrc, ALUCtrl, and MemtoReg

Controlling Exec. of Branch (#2)
[Figure: Design #2 datapath with the control signal values for a taken branch annotated (PCSrc = 1)]
• Either Beq or Bne = 1
• Next PC outputs branch target address
• ALUSrc = '0' (2nd ALU input is BusB)
• ALUCtrl = 'SUB' produces zero flag
• Next PC logic determines PCSrc according to zero flag
• MemRead = MemWrite = RegWrite = 0
• RegDst = ExtOp = MemtoReg = x

Details of "Next PC" (Design #2)
[Figure: the incremented 30-bit PC and the sign-extended Imm16 feed a 30-bit adder; a mux controlled by J selects between the adder output and the upper 4 bits of the PC concatenated with Imm26; gates on Beq, Bne, J, and Zero produce PCSrc]
• Considered as part of the "Control" path
• Sign-Extension: most-significant bit is replicated
• Imm16 is sign-extended to 30 bits
• Jump target address: upper 4 bits of PC are concatenated with Imm26
• PCSrc = J + (Beq · Zero) + (Bne · NOT Zero)
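The "Next PC" block of Design #2 can be sketched behaviorally in Python on 30-bit word addresses. The function and parameter names are assumptions made for this sketch; it uses the complemented Zero flag for Bne, which is what a branch-not-equal requires.

```python
MASK30 = (1 << 30) - 1


def sign_ext_30(imm16: int) -> int:
    """Sign-extend a 16-bit immediate to 30 bits (replicate the msb)."""
    imm16 &= 0xFFFF
    return (imm16 | 0x3FFF0000) if (imm16 & 0x8000) else imm16


def next_pc(inc_pc30, imm16, imm26, j, beq, bne, zero):
    """Return the 30-bit value loaded into the PC at the clock edge."""
    pc_src = bool(j or (beq and zero) or (bne and not zero))
    if j:
        # Upper 4 bits of the incremented PC concatenated with Imm26.
        target = (inc_pc30 & 0x3C000000) | (imm26 & 0x03FFFFFF)
    else:
        # Branch target: incremented PC plus sign-extended Imm16.
        target = (inc_pc30 + sign_ext_30(imm16)) & MASK30
    return target if pc_src else inc_pc30


inc = 0x00100001   # incremented PC for an instruction at byte address 0x00400000
taken = next_pc(inc, imm16=0xFFFF, imm26=0, j=0, beq=1, bne=0, zero=1)
```

Because the PC holds word addresses here, no multiplication by 4 appears: adding the sign-extended immediate to the incremented 30-bit PC is equivalent to PC + 4 + 4 × offset on byte addresses.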

Single-Cycle Datapath + Control
[Figure: complete single-cycle datapath with its control units. Main Control decodes Op and produces RegDst, RegWr, ExtOp, ALUSrc, MemRd, MemWr, and WBdata; ALU Ctrl takes Op and func and produces ALUop; PC Ctrl takes Op and Zero and produces PCSrc. Jump Target = PC[31:28] ‖ Imm26.]

Main Control Signals

Signal   Effect when '0'                               Effect when '1'
RegDst   Destination register = Rt                     Destination register = Rd
RegWr    No register is written                        Destination register (Rt or Rd) is written with the data on BusW
ExtOp    16-bit immediate is zero-extended             16-bit immediate is sign-extended
ALUSrc   Second ALU operand is the value of            Second ALU operand is the value of
         register Rt that appears on BusB              the extended 16-bit immediate
MemRd    Data memory is NOT read                       Data memory is read: Data_out ← Memory[address]
MemWr    Data memory is NOT written                    Data memory is written: Memory[address] ← Data_in
WBdata   BusW = ALU result                             BusW = Data_out from Memory

Main Control Truth Table

Op      RegDst  RegWr  ExtOp     ALUSrc    MemRd  MemWr  WBdata
R-type  1 = Rd  1      X         0 = BusB  0      0      0 = ALU
ADDI    0 = Rt  1      1 = sign  1 = Imm   0      0      0 = ALU
SLTI    0 = Rt  1      1 = sign  1 = Imm   0      0      0 = ALU
ANDI    0 = Rt  1      0 = zero  1 = Imm   0      0      0 = ALU
ORI     0 = Rt  1      0 = zero  1 = Imm   0      0      0 = ALU
XORI    0 = Rt  1      0 = zero  1 = Imm   0      0      0 = ALU
LW      0 = Rt  1      1 = sign  1 = Imm   1      0      1 = Mem
SW      X       0      1 = sign  1 = Imm   0      1      X
BEQ     X       0      1 = sign  0 = BusB  0      0      X
BNE     X       0      1 = sign  0 = BusB  0      0      X
J       X       0      X         X         0      0      X

X is a don't care (can be 0 or 1), used to minimize logic

Logic Equations for Main Control Signals
[Figure: a decoder on Op6 produces one signal per instruction (R-type, ADDI, SLTI, ANDI, ORI, XORI, LW, SW, BEQ, BNE, J); gates combine them into the control signals]

Logic Equations
  RegDst = R-type
  RegWr  = NOT (SW + BEQ + BNE + J)
  ExtOp  = NOT (ANDI + ORI + XORI)
  ALUSrc = NOT (R-type + BEQ + BNE)
  MemRd  = LW
  MemWr  = SW
  WBdata = LW
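The logic equations can be cross-checked against the main control truth table with a small Python sketch. The names `TABLE`, `main_control`, and `agrees` are illustrative, `None` stands for a don't care (X), and the complemented sums are derived from the rows of the truth table where each signal is 0.

```python
# Truth table rows: (RegDst, RegWr, ExtOp, ALUSrc, MemRd, MemWr, WBdata).
# None marks a don't care (X).
TABLE = {
    "R-type": (1, 1, None, 0, 0, 0, 0),
    "ADDI":   (0, 1, 1, 1, 0, 0, 0),
    "SLTI":   (0, 1, 1, 1, 0, 0, 0),
    "ANDI":   (0, 1, 0, 1, 0, 0, 0),
    "ORI":    (0, 1, 0, 1, 0, 0, 0),
    "XORI":   (0, 1, 0, 1, 0, 0, 0),
    "LW":     (0, 1, 1, 1, 1, 0, 1),
    "SW":     (None, 0, 1, 1, 0, 1, None),
    "BEQ":    (None, 0, 1, 0, 0, 0, None),
    "BNE":    (None, 0, 1, 0, 0, 0, None),
    "J":      (None, 0, None, None, 0, 0, None),
}


def main_control(op: str) -> tuple:
    """Evaluate the logic equations for one decoded instruction class."""
    reg_dst = int(op == "R-type")
    reg_wr = int(op not in ("SW", "BEQ", "BNE", "J"))
    ext_op = int(op not in ("ANDI", "ORI", "XORI"))
    alu_src = int(op not in ("R-type", "BEQ", "BNE"))
    mem_rd = int(op == "LW")
    mem_wr = int(op == "SW")
    wb_data = int(op == "LW")
    return (reg_dst, reg_wr, ext_op, alu_src, mem_rd, mem_wr, wb_data)


def agrees(op: str) -> bool:
    """True if the equations match every non-X entry of the table row."""
    return all(t is None or t == s for t, s in zip(TABLE[op], main_control(op)))


all_match = all(agrees(op) for op in TABLE)
```

Each don't care is resolved to whichever value keeps the equation shortest, which is the logic minimization the truth table slide refers to.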

ALU Control Design Truth Table

Op       funct  ALUop  Code
R-type   AND    AND    0001
R-type   OR     OR     0010
R-type   XOR    XOR    0011
R-type   ADD    ADD    0100
R-type   SUB    SUB    0101
R-type   SLT    SLT    0110
ADDI     X      ADD    0100
SLTI     X      SLT    0110
ANDI     X      AND    0001
ORI      X      OR     0010
XORI     X      XOR    0011
LW, SW   X      ADD    0100
BEQ, BNE X      SUB    0101
J        X      X      X

• The 4-bit ALUop code defines the binary ALU operations
• Can use ROM to generate ALUop code. (What's the ROM size?)
[Figure: ALU Ctrl block with inputs Op and funct and output ALUop]
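The truth table can be read as a lookup from (Op, funct) to a 4-bit code, sketched here in Python. The dictionary and function names are illustrative; the opcode and funct values are those listed in the instruction subset table, and returning `None` for a don't-care output is a choice made for this sketch.

```python
# 4-bit ALUop codes from the truth table.
ALU_CODE = {"AND": 0b0001, "OR": 0b0010, "XOR": 0b0011,
            "ADD": 0b0100, "SUB": 0b0101, "SLT": 0b0110}

# R-type: the funct field selects the operation (Op = 0).
FUNCT_TO_ALUOP = {0x20: "ADD", 0x22: "SUB", 0x24: "AND",
                  0x25: "OR", 0x26: "XOR", 0x2A: "SLT"}

# Other instructions: the opcode selects the operation.
OPCODE_TO_ALUOP = {0x08: "ADD", 0x0A: "SLT", 0x0C: "AND", 0x0D: "OR",
                   0x0E: "XOR", 0x23: "ADD", 0x2B: "ADD",
                   0x04: "SUB", 0x05: "SUB"}


def alu_control(op6: int, funct6: int):
    """Return the 4-bit ALUop code, or None when the output is a don't care."""
    if op6 == 0:
        name = FUNCT_TO_ALUOP.get(funct6)
    else:
        name = OPCODE_TO_ALUOP.get(op6)
    return None if name is None else ALU_CODE[name]


code_for_sub = alu_control(0, 0x22)
```

If the lookup were stored directly in a ROM addressed by the 6-bit Op and the 6-bit funct, it would need 2^12 entries of 4 bits each; that reading of the slide's question is an inference, not something the slides state.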

ALU Control Design #2
[Figure: simplified ALU with 32-bit inputs A and B feeding an adder (carry-in c0) and a logical unit; a mux chooses the 32-bit ALU Result; sign, overflow, and zero flags are produced]
• ALU funct (2 bits): ADD = X0, SUB = X1, AND = 00, OR = 01, XOR = 10
• ALU Selection (2 bits): Logic = 00, Arith = 01, SLT = 10

ALU Control Design #2 (Contd.)

Instr      Op    funct  ALUop  (ALUfunct, ALUSelect)
ADD        0     0x20   ADD    (X0, 01)
SUB        0     0x22   SUB    (X1, 01)
AND        0     0x24   AND    (00, 00)
OR         0     0x25   OR     (01, 00)
XOR        0     0x26   XOR    (10, 00)
SLT        0     0x2A   SLT    (XX, 10)
ADDI       0x08  X      ADD    (X0, 01)
SLTI       0x0A  X      SLT    (XX, 10)
ANDI       0x0C  X      AND    (00, 00)
ORI        0x0D  X      OR     (01, 00)
XORI       0x0E  X      XOR    (10, 00)
LW         0x23  X      ADD    (X0, 01)
SW         0x2B  X      ADD    (X0, 01)
BEQ        0x04  X      SUB    (X1, 01)
BNE        0x05  X      SUB    (X1, 01)
J          0x02  X      X      X

ALU Control Design #2 (Contd.)
[Figure: two 6-to-64 decoders, one on Op6 and one on funct6 (used when Op6 = 0). OR gates merge their outputs into the ADD, SUB, AND, OR, XOR, and SLT signals, which are then encoded as (ALUfunct, ALUSelect).]
• funct6 decoder outputs: ADD = 0x20, SUB = 0x22, AND = 0x24, OR = 0x25, XOR = 0x26, SLT = 0x2A
• Op6 decoder outputs: ADD = 0x08, 0x23, 0x2B; SUB = 0x04, 0x05; AND = 0x0C; OR = 0x0D; XOR = 0x0E; SLT = 0x0A

ALUop  (ALUfunct, ALUSelect)
ADD    (X0, 01)
SUB    (X1, 01)
AND    (00, 00)
OR     (01, 00)
XOR    (10, 00)
SLT    (XX, 10)

• AND signal is not needed!

PC Control Truth Table

Op                          Zero flag  PCSrc
R-type                      X          0 = Increment PC
J                           X          1 = Jump Target Address
BEQ                         0          0 = Increment PC
BEQ                         1          2 = Branch Target Address
BNE                         0          2 = Branch Target Address
BNE                         1          0 = Increment PC
Other than Jump or Branch   X          0 = Increment PC

The ALU Zero flag is used by BEQ and BNE instructions

PC Control Logic
• The PC control logic can be described as follows:

    if (Op == J) PCSrc = 1;
    else if ((Op == BEQ && Zero == 1) || (Op == BNE && Zero == 0)) PCSrc = 2;
    else PCSrc = 0;

• Branch = (BEQ · Zero) + (BNE · NOT Zero)
  - Branch = 1, Jump = 0: PCSrc = 2
  - Branch = 0, Jump = 1: PCSrc = 1
  - Branch = 0, Jump = 0: PCSrc = 0
[Figure: a decoder on Op produces BEQ, BNE, and J; gates combine BEQ and BNE with Zero to form Branch; J forms Jump]
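The description above translates directly into Python, which makes it easy to compare against every row of the PC control truth table. The function name `pc_src` and the use of mnemonic strings for the decoded opcode are assumptions made for this sketch.

```python
def pc_src(op: str, zero: int) -> int:
    """Return PCSrc: 0 = increment PC, 1 = jump target, 2 = branch target."""
    if op == "J":
        return 1
    if (op == "BEQ" and zero == 1) or (op == "BNE" and zero == 0):
        return 2
    return 0


# Rows of the truth table as (Op, Zero, expected PCSrc).
ROWS = [("J", 0, 1), ("J", 1, 1),
        ("BEQ", 0, 0), ("BEQ", 1, 2),
        ("BNE", 0, 2), ("BNE", 1, 0),
        ("R-type", 0, 0), ("R-type", 1, 0)]

all_rows_match = all(pc_src(op, zero) == expected for op, zero, expected in ROWS)
```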

Worst Case Timing (Load Instruction)
[Figure: timing diagram of one clock cycle for a load instruction, showing when each signal changes from its old to its new value]
• Clk-to-q: Old PC → New PC
• Instruction Memory Access Time: Old Instruction → New Instruction = (Op, Rs, Rt, Rd, Funct, Imm16, Imm26)
• Delay Through Control Logic: Old Control Signal Values → New Control Signal Values (ExtOp, ALUSrc, ALUOp, …)
• Register File Access Time: Old BusA Value → New BusA Value = Register(Rs)
• Delay Through Extender and ALU Mux: Old Second ALU Input → New Second ALU Input = sign-extend(Imm16)
• ALU Delay: Old ALU Result → New ALU Result = Address
• Data Memory Access Time: Old Data Memory Output Value → New Value
• Mux delay + Setup time + Clock skew, after which the write occurs at the clock edge

Worst Case Timing – Cont'd
• Long cycle time: must be long enough for Load operation

    PC's Clk-to-Q
  + Instruction Memory's Access Time
  + Maximum of (Register File's Access Time, Delay through control logic + extender + ALU mux)
  + ALU to Perform a 32-bit Add
  + Data Memory Access Time
  + Delay through WBdata Mux
  + Setup Time for Register File Write
  + Clock Skew

• Cycle time is longer than needed for other instructions
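The sum above can be evaluated with a short Python sketch. The function name, the dictionary keys, and every delay value are hypothetical and chosen only to show how the Maximum term works; none of the numbers come from the slides.

```python
def load_cycle_time(d: dict) -> int:
    """Minimum cycle time for a load, following the sum on the slide."""
    parallel = max(d["regfile"],                                  # register read
                   d["control"] + d["extender"] + d["alu_mux"])   # immediate path
    return (d["pc_clk_q"] + d["imem"] + parallel + d["alu"]
            + d["dmem"] + d["wb_mux"] + d["setup"] + d["skew"])


# Hypothetical delays in picoseconds (illustrative values only).
delays = {"pc_clk_q": 30, "imem": 200, "regfile": 150, "control": 50,
          "extender": 10, "alu_mux": 20, "alu": 100, "dmem": 200,
          "wb_mux": 20, "setup": 20, "skew": 10}

cycle = load_cycle_time(delays)
```

With these assumed values the register file read (150 ps) dominates the control, extender, and mux path (80 ps), so only the former contributes to the 730 ps total.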

Summary
• 5 steps to design a processor
  - Analyze instruction set => datapath requirements
  - Select datapath components & establish clocking methodology
  - Assemble datapath meeting the requirements
  - Analyze implementation of each instruction to determine control signals
  - Assemble the control logic
• MIPS makes Control easier
  - Instructions are of same size
  - Source registers always in same place
  - Immediates are of same size and same location
  - Operations are always on registers/immediates