361 Computer Architecture
Lecture 8: Designing a Single Cycle Datapath

Outline of Today’s Lecture
° Introduction
° Where are we with respect to the BIG picture?
° Questions and Administrative Matters
° The Steps of Designing a Processor
° Datapath and timing for Reg-Reg Operations
° Datapath for Logical Operations with Immediate
° Datapath for Load and Store Operations
° Datapath for Branch and Jump Operations

The Big Picture: Where are We Now?
° The Five Classic Components of a Computer: Processor (Control and Datapath), Memory, Input, Output
° Today’s Topic: Design a Single Cycle Processor
[Figure: machine design shown alongside instruction set design, arithmetic, and technology]

The Big Picture: The Performance Perspective
° Performance of a machine is determined by:
  • Instruction count
  • Clock cycle time
  • Clock cycles per instruction (CPI)
° Processor design (datapath and control) will determine:
  • Clock cycle time
  • Clock cycles per instruction
° Today:
  • Single cycle processor:
    - Advantage: One clock cycle per instruction
    - Disadvantage: long cycle time
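The three factors listed on this slide multiply together to give execution time. A minimal Python sketch follows; the function name and all numbers are hypothetical and chosen only to illustrate the trade-off between CPI and cycle time.

```python
def execution_time(instruction_count, cpi, cycle_time_ns):
    """Return total execution time in nanoseconds: IC x CPI x cycle time."""
    return instruction_count * cpi * cycle_time_ns

# Hypothetical numbers, not taken from the lecture.
# A single cycle design has CPI = 1 but a long cycle time.
single_cycle_ns = execution_time(1_000_000, cpi=1, cycle_time_ns=10.0)
# A hypothetical alternative design with more cycles per instruction
# but a shorter cycle.
alternative_ns = execution_time(1_000_000, cpi=4, cycle_time_ns=2.0)
```

With these made-up values the design with CPI = 1 is still slower overall, which is the disadvantage the slide points at: the long cycle time can outweigh the low CPI.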

How to Design a Processor: step-by-step
° 1. Analyze instruction set => datapath requirements
  • the meaning of each instruction is given by the register transfers
  • datapath must include storage elements for ISA registers (and possibly more)
  • datapath must support each register transfer
° 2. Select set of datapath components and establish clocking methodology
° 3. Assemble datapath meeting the requirements
° 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer
° 5. Assemble the control logic

The MIPS Instruction Formats
° All MIPS instructions are 32 bits long. The three instruction formats:
  • R-type: op <31:26> (6 bits) | rs <25:21> (5 bits) | rt <20:16> (5 bits) | rd <15:11> (5 bits) | shamt <10:6> (5 bits) | funct <5:0> (6 bits)
  • I-type: op <31:26> (6 bits) | rs <25:21> (5 bits) | rt <20:16> (5 bits) | immediate <15:0> (16 bits)
  • J-type: op <31:26> (6 bits) | target address <25:0> (26 bits)
° The different fields are:
  • op: operation of the instruction
  • rs, rt, rd: the source and destination register specifiers
  • shamt: shift amount
  • funct: selects the variant of the operation in the “op” field
  • address / immediate: address offset or immediate value
  • target address: target address of the jump instruction
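The field boundaries above translate directly into shifts and masks. The sketch below extracts every field from a 32-bit word; the function name `decode` and the dictionary layout are assumptions made for illustration, and which fields are meaningful depends on the instruction's format.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into its fields."""
    return {
        "op": (word >> 26) & 0x3F,      # bits 31..26
        "rs": (word >> 21) & 0x1F,      # bits 25..21
        "rt": (word >> 16) & 0x1F,      # bits 20..16
        "rd": (word >> 11) & 0x1F,      # bits 15..11 (R-type)
        "shamt": (word >> 6) & 0x1F,    # bits 10..6  (R-type)
        "funct": word & 0x3F,           # bits 5..0   (R-type)
        "imm16": word & 0xFFFF,         # bits 15..0  (I-type)
        "target": word & 0x3FFFFFF,     # bits 25..0  (J-type)
    }

# 0x00430821 encodes addu $1, $2, $3 (op = 0, funct = 0x21).
fields = decode(0x00430821)
```

Here `fields["rs"]` is 2, `fields["rt"]` is 3, and `fields["rd"]` is 1.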

Step 1a: The MIPS-lite Subset for today
° ADD and SUB (R-type: op | rs | rt | rd | shamt | funct)
  • addu rd, rs, rt
  • subu rd, rs, rt
° OR Immediate (I-type: op | rs | rt | immediate)
  • ori rt, rs, imm16
° LOAD and STORE Word (I-type: op | rs | rt | immediate)
  • lw rt, rs, imm16
  • sw rt, rs, imm16
° BRANCH (I-type: op | rs | rt | immediate)
  • beq rs, rt, imm16

Logical Register Transfers
° RTL gives the meaning of the instructions
° All start by fetching the instruction:
  op | rs | rt | rd | shamt | funct = MEM[ PC ]
  op | rs | rt | Imm16 = MEM[ PC ]

inst     Register Transfers
ADDU     R[rd] <- R[rs] + R[rt]; PC <- PC + 4
SUBU     R[rd] <- R[rs] - R[rt]; PC <- PC + 4
ORi      R[rt] <- R[rs] or zero_ext(Imm16); PC <- PC + 4
LOAD     R[rt] <- MEM[ R[rs] + sign_ext(Imm16) ]; PC <- PC + 4
STORE    MEM[ R[rs] + sign_ext(Imm16) ] <- R[rt]; PC <- PC + 4
BEQ      if ( R[rs] == R[rt] ) then PC <- PC + 4 + ( sign_ext(Imm16) || 00 ) else PC <- PC + 4
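The register transfers for the six MIPS-lite instructions can be written as a small behavioral model. This is only a sketch under stated assumptions: instructions arrive as hypothetical pre-decoded tuples, memory is a Python dict keyed by byte address, and the hardwired-zero behavior of register $0 is not modeled.

```python
MASK32 = 0xFFFFFFFF

def sign_ext(imm16):
    """Interpret a 16-bit value as a signed integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def step(inst, state):
    """Apply one instruction's register transfers to state (R, MEM, PC).

    inst is a hypothetical tuple in assembly operand order,
    e.g. ("addu", rd, rs, rt) or ("lw", rt, rs, imm16).
    """
    R, MEM = state["R"], state["MEM"]
    name, a, b, c = inst
    next_pc = state["PC"] + 4
    if name == "addu":      # R[rd] <- R[rs] + R[rt]
        R[a] = (R[b] + R[c]) & MASK32
    elif name == "subu":    # R[rd] <- R[rs] - R[rt]
        R[a] = (R[b] - R[c]) & MASK32
    elif name == "ori":     # R[rt] <- R[rs] or zero_ext(imm16)
        R[a] = R[b] | (c & 0xFFFF)
    elif name == "lw":      # R[rt] <- MEM[R[rs] + sign_ext(imm16)]
        R[a] = MEM.get((R[b] + sign_ext(c)) & MASK32, 0)
    elif name == "sw":      # MEM[R[rs] + sign_ext(imm16)] <- R[rt]
        MEM[(R[b] + sign_ext(c)) & MASK32] = R[a]
    elif name == "beq":     # operands are (rs, rt, imm16)
        if R[a] == R[b]:
            next_pc += sign_ext(c) << 2
    else:
        raise ValueError(f"unsupported instruction: {name}")
    state["PC"] = next_pc & MASK32

state = {"R": [0] * 32, "MEM": {}, "PC": 0}
step(("ori", 1, 0, 5), state)        # R[1] = 5
step(("addu", 2, 1, 1), state)       # R[2] = 10
step(("sw", 2, 0, 16), state)        # MEM[16] = 10
step(("beq", 1, 1, 0xFFFF), state)   # taken, offset -1 word
```

The final `beq` is fetched at PC = 12 and branches with an offset of -1 word, so the next PC is 12 + 4 - 4 = 12.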

Step 1: Requirements of the Instruction Set
° Memory
  • instruction & data
° Registers (32 x 32)
  • read RS
  • read RT
  • write RT or RD
° PC
° Extender
° Add and Sub register or extended immediate
° Add 4 or extended immediate to PC

Step 2: Components of the Datapath
° Combinational Elements
° Storage Elements
  • Clocking methodology

Combinational Logic Elements (Basic Building Blocks)
° Adder: 32-bit inputs A and B plus CarryIn; outputs 32-bit Sum and Carry
° MUX: 32-bit inputs A and B, chosen by Select; 32-bit output Y
° ALU: 32-bit inputs A and B, operation chosen by OP; 32-bit output Result

Storage Element: Register (Basic Building Block)
° Register
  • Similar to the D Flip Flop except
    - N-bit input (Data In) and output (Data Out)
    - Write Enable input
  • Write Enable:
    - negated (0): Data Out will not change
    - asserted (1): Data Out will become Data In
[Figure: N-bit register with Data In, Data Out, Write Enable, and Clk]

Storage Element: Register File
° Register File consists of 32 registers:
  • Two 32-bit output busses: busA and busB
  • One 32-bit input bus: busW
° Register is selected by:
  • RA (number) selects the register to put on busA (data)
  • RB (number) selects the register to put on busB (data)
  • RW (number) selects the register to be written via busW (data) when Write Enable is 1
° Clock input (CLK)
  • The CLK input is a factor ONLY during write operation
  • During read operation, behaves as a combinational logic block:
    - RA or RB valid => busA or busB valid after “access time.”
[Figure: register file of 32 32-bit registers with 5-bit RW, RA, and RB inputs, Write Enable, Clk, 32-bit busW input, and 32-bit busA and busB outputs]
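The split between combinational reads and clocked writes can be sketched as a small behavioral model. The class and method names here are assumptions for illustration; this is not hardware description code and it ignores access time.

```python
class RegisterFile:
    """Behavioral sketch of a 32 x 32-bit register file."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra, rb):
        """Combinational read: return (busA, busB) for register numbers ra, rb."""
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw, bus_w, write_enable):
        """Clocked write: store bus_w into register rw only if Write Enable is 1."""
        if write_enable:
            self.regs[rw] = bus_w & 0xFFFFFFFF

rf = RegisterFile()
rf.clock_edge(rw=3, bus_w=42, write_enable=1)
rf.clock_edge(rw=4, bus_w=99, write_enable=0)  # ignored: Write Enable is 0
bus_a, bus_b = rf.read(3, 4)
```

Reading needs no clock call at all, which mirrors the slide's point that CLK matters only for writes.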

Storage Element: Idealized Memory
° Memory (idealized)
  • One input bus: Data In
  • One output bus: Data Out
° Memory word is selected by:
  • Address selects the word to put on Data Out
  • Write Enable = 1: address selects the memory word to be written via the Data In bus
° Clock input (CLK)
  • The CLK input is a factor ONLY during write operation
  • During read operation, behaves as a combinational logic block:
    - Address valid => Data Out valid after “access time.”
[Figure: memory block with Address, 32-bit Data In, Write Enable, Clk, and 32-bit Data Out]

Clocking Methodology
[Figure: clock waveform showing the Setup and Hold windows around each clock edge, with “Don’t Care” intervals in between]
° All storage elements are clocked by the same clock edge
° Cycle Time = CLK-to-Q + Longest Delay Path + Setup + Clock Skew

Step 3
° Register Transfer Requirements -> Datapath Assembly
° Instruction Fetch
° Read Operands and Execute Operation

3a: Overview of the Instruction Fetch Unit
° The common RTL operations
  • Fetch the Instruction: mem[PC]
  • Update the program counter:
    - Sequential Code: PC <- PC + 4
    - Branch and Jump: PC <- “something else”
[Figure: the PC, clocked by Clk and loaded from the Next Address Logic, drives the Address input of the Instruction Memory, which outputs the 32-bit Instruction Word]

RTL: The ADD Instruction
° add rd, rs, rt (R-type: op | rs | rt | rd | shamt | funct)
  • mem[PC]    Fetch the instruction from memory
  • R[rd] <- R[rs] + R[rt]    The actual operation
  • PC <- PC + 4    Calculate the next instruction’s address

RTL: The Subtract Instruction
° sub rd, rs, rt (R-type: op | rs | rt | rd | shamt | funct)
  • mem[PC]    Fetch the instruction from memory
  • R[rd] <- R[rs] - R[rt]    The actual operation
  • PC <- PC + 4    Calculate the next instruction’s address

3b: Add & Subtract
° R[rd] <- R[rs] op R[rt]    Example: addu rd, rs, rt
  • Ra, Rb, and Rw come from instruction’s rs, rt, and rd fields
  • ALUctr and RegWr: control logic after decoding the instruction
[Figure: register file (Rw = Rd, Ra = Rs, Rb = Rt, RegWr, Clk) drives 32-bit busA and busB into the ALU, which is controlled by ALUctr; the 32-bit ALU Result returns to the register file on busW]

Datapath for Register-Register Operations (in general)
° R[rd] <- R[rs] op R[rt]    Example: add rd, rs, rt
  • Ra, Rb, and Rw come from instruction’s rs, rt, and rd fields
  • ALUctr and RegWr: control logic after decoding the instruction
[Figure: register file (Rw = Rd, Ra = Rs, Rb = Rt, RegWr, Clk) drives 32-bit busA and busB into the ALU, which is controlled by ALUctr; the 32-bit ALU Result returns to the register file on busW]

Register-Register Timing
[Figure: timing diagram for the register-register datapath. After the clock edge, each signal changes from its old value to its new value in turn:]
  • PC: new value after Clk-to-Q
  • Rs, Rt, Rd, Op, Func: new value after Instruction Memory Access Time
  • ALUctr, RegWr: new value after Delay through Control Logic
  • busA, busB: new value after Register File Access Time
  • busW: new value after ALU Delay
  • Register write occurs at the following clock edge

RTL: The OR Immediate Instruction
° ori rt, rs, imm16 (I-type: op | rs | rt | immediate)
  • mem[PC]    Fetch the instruction from memory
  • R[rt] <- R[rs] or ZeroExt(imm16)    The OR operation
  • PC <- PC + 4    Calculate the next instruction’s address
° ZeroExt(imm16): bits <31:16> are 16 zeros; bits <15:0> are the 16-bit immediate

3c: Logical Operations with Immediate
° R[rt] <- R[rs] op ZeroExt[imm16]
[Figure: the register-register datapath extended with a mux controlled by RegDst that selects Rd or Rt as Rw, a ZeroExt unit that widens imm16 from 16 to 32 bits, and a mux controlled by ALUSrc that selects busB or the extended immediate as the second ALU input]

RTL: The Load Instruction
° lw rt, rs, imm16 (I-type: op | rs | rt | immediate)
  • mem[PC]    Fetch the instruction from memory
  • Addr <- R[rs] + SignExt(imm16)    Calculate the memory address
  • R[rt] <- Mem[Addr]    Load the data into the register
  • PC <- PC + 4    Calculate the next instruction’s address
° SignExt(imm16): bits <31:16> are 16 copies of the immediate’s sign bit (all 0s when bit 15 is 0, all 1s when bit 15 is 1); bits <15:0> are the 16-bit immediate
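The two extension modes used so far (zero extension for ori, sign extension for lw) differ only in how the upper 16 bits are filled. A minimal sketch follows; the function name and its boolean `sign` parameter are assumptions made for this example, not signal names from the lecture.

```python
def extend(imm16, sign):
    """Widen a 16-bit immediate to a 32-bit value.

    sign=False zero-extends (as ori needs); sign=True sign-extends (as lw needs).
    """
    imm16 &= 0xFFFF
    if sign and (imm16 & 0x8000):
        return 0xFFFF0000 | imm16   # upper 16 bits copy the sign bit (1)
    return imm16                    # upper 16 bits are 0

zero_extended = extend(0x8001, sign=False)
sign_extended = extend(0x8001, sign=True)
```

For the input 0x8001, zero extension gives 0x00008001 and sign extension gives 0xFFFF8001; for any immediate with bit 15 clear, the two results are identical.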

3d: Load Operations
° R[rt] <- Mem[R[rs] + SignExt[imm16]]    Example: lw rt, rs, imm16
[Figure: the datapath extended with a Data Memory (WrEn, Adr, Data In, Clk) addressed by the ALU output, an Extender controlled by ExtOp, and a mux controlled by W_Src that selects what is written back on busW; control signals shown: RegDst, RegWr, ALUctr, ExtOp, ALUSrc, MemWr, W_Src]

3e: Store Operations
° Mem[ R[rs] + SignExt[imm16] ] <- R[rt]    Example: sw rt, rs, imm16
[Figure: the same datapath as for load, with busB (the value of Rt) connected to the Data In input of the Data Memory; control signals shown: RegDst, RegWr, ALUctr, ExtOp, ALUSrc, MemWr, W_Src]

3f: The Branch Instruction
° beq rs, rt, imm16 (I-type: op | rs | rt | immediate)
  • mem[PC]    Fetch the instruction from memory
  • Equal <- R[rs] == R[rt]    Calculate the branch condition
  • if (Equal)    Calculate the next instruction’s address
    - PC <- PC + 4 + ( SignExt(imm16) x 4 )
  • else
    - PC <- PC + 4

Datapath for Branch Operations
° beq rs, rt, imm16    Datapath generates condition (equal)
[Figure: branch datapath. The register file outputs busA and busB feed an “Equal?” comparator that produces Cond. In the fetch logic, one adder adds 4 to the PC, a second adder adds the PC Ext output for imm16, and a mux controlled by nPC_sel selects the next PC, which supplies the instruction address]

Putting it All Together: A Single Cycle Datapath
[Figure: complete single cycle datapath. Instruction<31:0> from the instruction memory supplies Rs <21:25>, Rt <16:20>, Rd <11:15>, and Imm16 <0:15>. The fetch logic (PC, two adders, PC Ext, and a mux controlled by nPC_sel) produces the instruction address. The register file, Extender, ALU, and Data Memory are connected through muxes controlled by RegDst, ALUSrc, and MemtoReg. Remaining control signals: RegWr, ExtOp, ALUctr, MemWr. The datapath returns the Equal condition]

An Abstract View of the Critical Path
° Register file and ideal memory:
  • The CLK input is a factor ONLY during write operation
  • During read operation, behave as combinational logic:
    - Address valid => Output valid after “access time.”
° Critical Path (Load Operation) =
    PC’s Clk-to-Q
  + Instruction Memory’s Access Time
  + Register File’s Access Time
  + ALU to Perform a 32-bit Add
  + Data Memory Access Time
  + Setup Time for Register File Write
  + Clock Skew
[Figure: PC and Next Address logic feed the Ideal Instruction Memory; its Instruction output supplies Rd, Rs, Rt, and Imm16 to the register file (32 32-bit registers); register outputs A and B feed the ALU, whose result is the Data Address of the Ideal Data Memory]
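Because the load path sets the minimum cycle time, the clock period is simply the sum of the delays listed above. The sketch below adds them up; every delay value is hypothetical (the lecture gives no numbers) and the dictionary keys are names invented for this example.

```python
# Hypothetical delays in nanoseconds; the lecture gives no numeric values.
load_path_ns = {
    "pc_clk_to_q": 0.5,
    "instruction_memory_access": 2.0,
    "register_file_access": 1.0,
    "alu_32bit_add": 2.0,
    "data_memory_access": 2.0,
    "register_file_setup": 0.25,
    "clock_skew": 0.25,
}

def min_cycle_time(path):
    """Return the minimum clock period as the sum of all delays on the path."""
    return sum(path.values())

cycle_ns = min_cycle_time(load_path_ns)
max_clock_mhz = 1000.0 / cycle_ns   # a period in ns gives a frequency in MHz
```

With these made-up values the period is 8.0 ns, or 125 MHz. Every instruction pays this full period in a single cycle design, even instructions that never touch the data memory.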

Binary Arithmetic for the Next Address
° In theory, the PC is a 32-bit byte address into the instruction memory:
  • Sequential operation: PC<31:0> = PC<31:0> + 4
  • Branch operation: PC<31:0> = PC<31:0> + 4 + SignExt[Imm16] * 4
° The magic number “4” always comes up because:
  • The 32-bit PC is a byte address
  • And all our instructions are 4 bytes (32 bits) long
° In other words:
  • The 2 LSBs of the 32-bit PC are always zeros
  • There is no reason to have hardware to keep the 2 LSBs
° In practice, we can simplify the hardware by using a 30-bit PC<31:2>:
  • Sequential operation: PC<31:2> = PC<31:2> + 1
  • Branch operation: PC<31:2> = PC<31:2> + 1 + SignExt[Imm16]
  • In either case: Instruction Memory Address = PC<31:2> concat “00”
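The claim that the 30-bit word-address PC computes the same next address as the 32-bit byte-address PC can be checked numerically. This is a sketch with hypothetical function names; the starting PC and immediates are arbitrary test values.

```python
def sign_ext16(imm16):
    """Interpret a 16-bit value as a signed integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def next_pc_32(pc, imm16, taken):
    """32-bit byte-address form: PC + 4 (+ SignExt(imm16) * 4 if taken)."""
    offset = sign_ext16(imm16) * 4 if taken else 0
    return (pc + 4 + offset) & 0xFFFFFFFF

def next_pc_30(pc30, imm16, taken):
    """30-bit word-address form: PC<31:2> + 1 (+ SignExt(imm16) if taken)."""
    offset = sign_ext16(imm16) if taken else 0
    return (pc30 + 1 + offset) & 0x3FFFFFFF

pc = 0x00400010   # arbitrary word-aligned starting address
cases = [(3, True), (0xFFFE, True), (7, False)]
results = [
    (next_pc_30(pc >> 2, imm, taken) << 2, next_pc_32(pc, imm, taken))
    for imm, taken in cases
]
```

Each pair in `results` holds the 30-bit result with “00” appended and the 32-bit result; the two agree in every case, including the backward branch with immediate 0xFFFE (-2).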

Next Address Logic: Expensive and Fast Solution
° Using a 30-bit PC:
  • Sequential operation: PC<31:2> = PC<31:2> + 1
  • Branch operation: PC<31:2> = PC<31:2> + 1 + SignExt[Imm16]
  • In either case: Instruction Memory Address = PC<31:2> concat “00”
[Figure: two 30-bit adders. One adds “1” to the PC; the other adds SignExt(imm16), taken from Instruction<15:0>, to that sum. A mux controlled by Branch and Zero selects which result is loaded into the PC. The Instruction Memory receives the PC as Addr<31:2> and “00” as Addr<1:0>, and outputs Instruction<31:0>]

Next Address Logic: Cheap and Slow Solution
° Why is this slow?
  • Cannot start the address add until Zero (output of ALU) is valid
° Does it matter that this is slow in the overall scheme of things?
  • Probably not here. Critical path is the load operation.
[Figure: a single 30-bit adder with “1” on its Carry In. A mux controlled by Branch and Zero selects either “0” or SignExt(imm16), taken from Instruction<15:0>, as the value added to the PC. The Instruction Memory receives the PC as Addr<31:2> and “00” as Addr<1:0>, and outputs Instruction<31:0>]

RTL: The Jump Instruction
° j target (J-type: op | target address)
  • mem[PC]    Fetch the instruction from memory
  • PC<31:2> <- PC<31:28> concat target<25:0>    Calculate the next instruction’s address
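The concatenation in the jump RTL keeps the top four bits of the PC and replaces the rest with the 26-bit target, followed by the two implicit zero bits. A short sketch using a 32-bit byte address follows; the function name is an assumption, and it follows the slide in taking the upper bits from the current PC.

```python
def jump_target(pc, target26):
    """Return the 32-bit byte address reached by 'j target' from pc."""
    upper = pc & 0xF0000000                 # PC<31:28>
    word_address = target26 & 0x3FFFFFF     # target<25:0>
    return upper | (word_address << 2)      # append "00" as the 2 LSBs

new_pc = jump_target(0x10000008, 0x0000004)
```

Here the upper four bits (0x1) are kept and the target word address 4 becomes byte address 0x10, giving 0x10000010.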

Instruction Fetch Unit
° j target
  • PC<31:2> <- PC<31:28> concat target<25:0>
[Figure: instruction fetch unit with a 30-bit PC. Adders compute PC + 1 and PC + 1 + SignExt(imm16), with imm16 taken from Instruction<15:0>; a mux controlled by Branch and Zero chooses between them. A second mux controlled by Jump selects instead the 4 bits PC<31:28> concatenated with the 26-bit Target from Instruction<25:0>. The Instruction Memory receives the PC as Addr<31:2> and “00” as Addr<1:0>, and outputs Instruction<31:0>]

Putting it All Together: A Single Cycle Datapath
° We have everything except control signals (underlined)
[Figure: complete datapath. The Instruction Fetch Unit (inputs Jump, Branch, Zero, Clk) supplies Instruction<31:0>, which provides Rs <21:25>, Rt <16:20>, Rd <11:15>, and Imm16 <0:15>. The register file, Extender, ALU, and Data Memory are connected through muxes controlled by RegDst, ALUSrc, and MemtoReg. Remaining control signals: RegWr, ExtOp, ALUctr, MemWr. The ALU returns Zero to the fetch unit]

An Abstract View of the Implementation
[Figure: Control receives the Instruction from the Ideal Instruction Memory and Conditions from the Datapath, and sends Control Signals to the Datapath. The Datapath contains the PC, Next Address logic, register file (Rw, Ra, Rb; 32 32-bit registers), ALU, and the Ideal Data Memory (Data Address, Data In, Data Out, Clk)]
° Logical vs. Physical Structure

Summary
° 5 steps to design a processor
  • 1. Analyze instruction set => datapath requirements
  • 2. Select set of datapath components & establish clock methodology
  • 3. Assemble datapath meeting the requirements
  • 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer
  • 5. Assemble the control logic
° MIPS makes it easier
  • Instructions same size
  • Source registers always in same place
  • Immediates same size, location
  • Operations always on registers/immediates
° Single cycle datapath => CPI = 1, clock cycle time (CCT) => long
° Next time: implementing control (Steps 4 and 5)