Designing a Single Cycle Processor
- Slides: 70
Prof. TingTing Hwang (黃婷婷教授), Department of Computer Science, National Tsing Hua University (國立清華大學資訊工程學系)
Outline
- Introduction to designing a processor
- Analyzing the instruction set (step 1)
- Building the datapath (steps 2 and 3)
- A single-cycle implementation
- Control for the single-cycle CPU (steps 4 and 5)
  - Control of CPU operations
  - ALU controller
  - Main controller
- Adding jump instruction
Introduction
- CPU performance factors
  - Instruction count: determined by ISA and compiler
  - CPI and cycle time: determined by CPU hardware
- We will examine two MIPS implementations
  - A simplified version
  - A more realistic pipelined version
- Simple subset, shows most aspects
  - Memory reference: lw, sw
  - Arithmetic/logical: add, sub, and, or, slt
  - Control transfer: beq, j
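The performance factors above combine multiplicatively in the standard equation CPU time = instruction count x CPI x clock cycle time. A minimal Python sketch; the workload size and the 800 ps clock period are made-up illustration values, not figures from these slides:

```python
def cpu_time_ps(instruction_count: int, cpi: float, cycle_time_ps: float) -> float:
    """CPU time in picoseconds = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_ps


# Hypothetical workload: 1,000,000 instructions on a single-cycle CPU (CPI = 1)
# clocked at an assumed 800 ps period.
print(cpu_time_ps(1_000_000, 1, 800))  # 800000000 (ps), i.e. 0.8 ms
```

The ISA and compiler set `instruction_count`; the hardware sets `cpi` and `cycle_time_ps`. A single-cycle design fixes CPI at 1 but pays for it with a long cycle time.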
Instruction Execution
- PC -> instruction memory, fetch instruction
- Register numbers -> register file, read registers
- Depending on instruction class:
  - Use ALU to calculate:
    - Arithmetic result
    - Memory address for load/store
    - Branch target address
  - Access data memory for load/store
  - PC <- target address or PC + 4
CPU Overview
Multiplexers
- Can't just join wires together
  - Use multiplexers
Control
Logic Design Basics
- Information encoded in binary
  - Low voltage = 0, High voltage = 1
  - One wire per bit
  - Multi-bit data encoded on multi-wire buses
- Combinational element
  - Operate on data
  - Output is a function of input
- State (sequential) elements
  - Store information
Combinational Elements
- AND-gate: Y = A & B
- Adder: Y = A + B
- Multiplexer: Y = S ? I1 : I0
- Arithmetic/Logic Unit: Y = F(A, B)
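The four combinational elements can be modeled as pure functions of their inputs, which is what makes them combinational: no stored state, output determined only by the current inputs. A Python sketch assuming 32-bit unsigned words; the function names and the small set of ALU functions are illustrative choices, not a defined interface:

```python
import operator

MASK32 = 0xFFFF_FFFF  # keep results to 32 bits, like a 32-bit bus


def and_gate(a: int, b: int) -> int:
    """Y = A & B"""
    return a & b & MASK32


def adder(a: int, b: int) -> int:
    """Y = A + B (carry out of bit 31 is discarded)"""
    return (a + b) & MASK32


def mux(s: int, i0: int, i1: int) -> int:
    """Y = S ? I1 : I0"""
    return i1 if s else i0


ALU_FUNCS = {"and": operator.and_, "or": operator.or_,
             "add": operator.add, "sub": operator.sub}


def alu(f: str, a: int, b: int) -> int:
    """Y = F(A, B), wrapped to 32 bits (two's complement for sub)"""
    return ALU_FUNCS[f](a, b) & MASK32


print(adder(0xFFFF_FFFF, 1))    # 0 (wraps around)
print(mux(1, 10, 20))           # 20
print(hex(alu("sub", 0, 1)))    # 0xffffffff, i.e. -1 in two's complement
```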
Sequential Elements
- Register: stores data in a circuit
  - Uses a clock signal to determine when to update the stored value
  - Edge-triggered: update when Clk changes from 0 to 1
[Figure: register with input D, clock Clk, and output Q, with timing waveform]
Sequential Elements
- Register with write control
  - Only updates on clock edge when write control input is 1
  - Used when stored value is required later
[Figure: register with inputs D, Write, and Clk and output Q, with timing waveform]
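A behavioral Python sketch (not an HDL model) of a register with write control; the class and method names are hypothetical. Each call to `clock_edge` stands for one rising edge of Clk, and between calls the stored value is simply read:

```python
class Register:
    """Edge-triggered register with a write-control input (behavioral model)."""

    def __init__(self, value: int = 0) -> None:
        self.q = value  # output Q: the currently stored value

    def clock_edge(self, d: int, write: bool) -> None:
        """Model one rising edge of Clk: Q <- D only when Write is 1."""
        if write:
            self.q = d


r = Register()
r.clock_edge(d=42, write=False)  # Write = 0: Q keeps its old value (0)
r.clock_edge(d=42, write=True)   # Write = 1: Q becomes 42
print(r.q)  # 42
```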
Clocking Methodology
- Combinational logic transforms data during clock cycles
  - Between clock edges
  - Input from state elements, output to state element
  - Longest delay determines clock period
How to Design a Processor?
1. Analyze instruction set (datapath requirements)
   - The meaning of each instruction is given by the register transfers
   - Datapath must include storage elements
   - Datapath must support each register transfer
2. Select set of datapath components and establish clocking methodology
3. Assemble datapath meeting the requirements
4. Analyze implementation of each instruction to determine setting of control points effecting register transfer
5. Assemble the control logic
Outline
- Introduction to designing a processor
- Analyzing the instruction set (step 1)
- Building the datapath (steps 2 and 3)
- A single-cycle implementation
- Control for the single-cycle CPU
  - Control of CPU operations
  - ALU controller
  - Main controller
Step 1: Analyze Instruction Set
- All MIPS instructions are 32 bits long, with 3 formats:
  - R-type: op (31:26, 6 bits) | rs (25:21, 5 bits) | rt (20:16, 5 bits) | rd (15:11, 5 bits) | shamt (10:6, 5 bits) | funct (5:0, 6 bits)
  - I-type: op (31:26, 6 bits) | rs (25:21, 5 bits) | rt (20:16, 5 bits) | immediate (15:0, 16 bits)
  - J-type: op (31:26, 6 bits) | target address (25:0, 26 bits)
- The different fields are:
  - op: operation of the instruction
  - rs, rt, rd: source and destination registers
  - shamt: shift amount
  - funct: selects the variant of the operation in the "op" field
  - address / immediate: address offset or immediate value
  - target address: target address of jump
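Because every format keeps op in bits 31:26 and rs/rt in the same positions, all fields can be extracted with fixed shifts and masks before the instruction type is known. A Python sketch; the `decode` name and dictionary keys are hypothetical. The example word 0x012A4020 encodes `add $8, $9, $10` (rd = 8, rs = 9, rt = 10):

```python
def decode(instr: int) -> dict:
    """Split a 32-bit MIPS instruction word into the fields of all three formats."""
    return {
        "op": (instr >> 26) & 0x3F,     # bits 31:26
        "rs": (instr >> 21) & 0x1F,     # bits 25:21
        "rt": (instr >> 16) & 0x1F,     # bits 20:16
        "rd": (instr >> 11) & 0x1F,     # bits 15:11 (R-type)
        "shamt": (instr >> 6) & 0x1F,   # bits 10:6  (R-type)
        "funct": instr & 0x3F,          # bits 5:0   (R-type)
        "imm16": instr & 0xFFFF,        # bits 15:0  (I-type)
        "target": instr & 0x3FF_FFFF,   # bits 25:0  (J-type)
    }


fields = decode(0x012A4020)
print(fields["op"], fields["rs"], fields["rt"], fields["rd"], hex(fields["funct"]))
# 0 9 10 8 0x20
```

Extracting every field in parallel mirrors the hardware, where the fields are just wires; the op code then decides which of them are meaningful.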
Our Example: A MIPS Subset
- R-type:
  - add rd, rs, rt
  - sub rd, rs, rt
  - and rd, rs, rt
  - or rd, rs, rt
  - slt rd, rs, rt
- Load/Store:
  - lw rt, rs, imm16
  - sw rt, rs, imm16
- Immediate operand:
  - addi rt, rs, imm16
- Branch:
  - beq rs, rt, imm16
- Jump:
  - j target
Logical Register Transfers
- RTL gives the meaning of the instructions
- All start by fetching the instruction, reading registers, then using the ALU => simplicity and regularity help
MEM[PC] = op | rs | rt | rd | shamt | funct
     or = op | rs | rt | Imm16
     or = op | Imm26 (added at the end)
Register transfers per instruction (|| denotes bit concatenation, so appending 00 multiplies the offset by 4):
- ADD: R[rd] <- R[rs] + R[rt]; PC <- PC + 4
- SUB: R[rd] <- R[rs] - R[rt]; PC <- PC + 4
- LOAD: R[rt] <- MEM[R[rs] + sign_ext(Imm16)]; PC <- PC + 4
- STORE: MEM[R[rs] + sign_ext(Imm16)] <- R[rt]; PC <- PC + 4
- ADDI: R[rt] <- R[rs] + sign_ext(Imm16); PC <- PC + 4
- BEQ: if (R[rs] == R[rt]) then PC <- PC + 4 + (sign_ext(Imm16) || 00) else PC <- PC + 4
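The register transfers above are precise enough to execute directly. A minimal instruction-level Python sketch of the six instructions, assuming the standard MIPS encodings (op 0 with funct 0x20/0x22 for add/sub, 0x23 for lw, 0x2B for sw, 0x08 for addi, 0x04 for beq) and dictionaries standing in for instruction and data memory. Register $0 is not forced to zero here, and all names are hypothetical:

```python
MASK32 = 0xFFFF_FFFF


def sign_ext16(imm: int) -> int:
    """sign_ext(Imm16): the 16-bit field as a two's-complement integer."""
    return imm - 0x1_0000 if imm & 0x8000 else imm


def step(pc: int, regs: list, imem: dict, dmem: dict) -> int:
    """Execute one instruction per the register transfers; return the next PC."""
    instr = imem[pc]                                   # MEM[PC]
    op, rs, rt = instr >> 26, (instr >> 21) & 0x1F, (instr >> 16) & 0x1F
    rd, funct = (instr >> 11) & 0x1F, instr & 0x3F
    imm = sign_ext16(instr & 0xFFFF)
    next_pc = (pc + 4) & MASK32                        # default: PC <- PC + 4

    if op == 0 and funct == 0x20:                      # ADD
        regs[rd] = (regs[rs] + regs[rt]) & MASK32
    elif op == 0 and funct == 0x22:                    # SUB
        regs[rd] = (regs[rs] - regs[rt]) & MASK32
    elif op == 0x23:                                   # LOAD (lw)
        regs[rt] = dmem[(regs[rs] + imm) & MASK32]
    elif op == 0x2B:                                   # STORE (sw)
        dmem[(regs[rs] + imm) & MASK32] = regs[rt]
    elif op == 0x08:                                   # ADDI
        regs[rt] = (regs[rs] + imm) & MASK32
    elif op == 0x04:                                   # BEQ
        if regs[rs] == regs[rt]:
            next_pc = (pc + 4 + (imm << 2)) & MASK32   # sign_ext(Imm16) || 00
    else:
        raise ValueError(f"unsupported instruction {instr:#010x}")
    return next_pc


# addi $1, $0, 5 ; addi $2, $0, 5 ; beq $1, $2, 1 (taken: skips one word)
regs = [0] * 32
imem = {0: 0x20010005, 4: 0x20020005, 8: 0x10220001}
pc = 0
for _ in range(3):
    pc = step(pc, regs, imem, {})
print(regs[1], regs[2], pc)  # 5 5 16
```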
Requirements of Instruction Set
After checking the register transfers, we can see that the datapath needs the following:
- Memory
  - Store instructions and data
- Registers (32 x 32)
  - Read RS
  - Read RT
  - Write RT or RD
- PC
- Extender for zero- or sign-extension
- Add and sub register or extended immediate (ALU)
- Add 4 or extended immediate to PC
Outline
- Introduction to designing a processor
- Analyzing the instruction set (step 1)
- Building the datapath (steps 2, 3)
- A single-cycle implementation
- Control for the single-cycle CPU
  - Control of CPU operations
  - ALU controller
  - Main controller
- Adding jump instruction
Step 2a: Combinational Components for Datapath
- Basic building blocks of combinational logic elements:
  - Adder: 32-bit inputs A and B plus CarryIn; 32-bit Sum and CarryOut
  - MUX: 32-bit inputs A and B and Select; 32-bit output Y
  - ALU: 32-bit inputs A and B and 4-bit ALU control; 32-bit Result
Step 2b: Sequential Components for Datapath
Storage elements:
- Register:
  - Similar to the D flip-flop except:
    - N-bit input and output
    - Write Enable input
  - Write Enable:
    - negated (0): Data Out will not change
    - asserted (1): Data Out will become Data In
[Figure: register with Write Enable, N-bit Data In, N-bit Data Out, and Clk]
Storage Element: Register File
- Consists of 32 registers (Appendix B.8):
  - Two 32-bit output buses: busA and busB
  - One 32-bit input bus: busW
- Register is selected by:
  - RA (5 bits) selects the register to put on busA (data)
  - RB (5 bits) selects the register to put on busB (data)
  - RW (5 bits) selects the register to be written via busW (data) when Write Enable is 1
- Clock input (CLK):
  - The CLK input is a factor ONLY during write operation
  - During read, behaves as a combinational circuit
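A behavioral Python sketch of the register file's timing behavior: reads are plain lookups (combinational, no clock), while the write port only takes effect at a clock edge with Write Enable = 1. Class and method names are hypothetical; real MIPS also hardwires register 0 to zero, which this sketch omits:

```python
class RegisterFile:
    """32 x 32-bit register file: read ports A and B, write port W."""

    def __init__(self) -> None:
        self.regs = [0] * 32

    def read(self, ra: int, rb: int) -> tuple:
        """busA <- R[RA], busB <- R[RB]; combinational, no clock involved."""
        return self.regs[ra & 0x1F], self.regs[rb & 0x1F]

    def clock_edge(self, rw: int, bus_w: int, write_enable: bool) -> None:
        """At the clock edge, R[RW] <- busW only if Write Enable is 1."""
        if write_enable:
            self.regs[rw & 0x1F] = bus_w & 0xFFFF_FFFF


rf = RegisterFile()
rf.clock_edge(rw=5, bus_w=123, write_enable=True)   # written
rf.clock_edge(rw=6, bus_w=999, write_enable=False)  # ignored
print(rf.read(5, 6))  # (123, 0)
```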
Storage Element: Memory
- Memory (idealized) (Appendix B.8):
  - One input bus: Data In
  - One output bus: Data Out
- Word is selected by:
  - Address selects the word to put on Data Out
  - Write Enable = 1: address selects the memory word to be written via the Data In bus
- Clock input (CLK):
  - The CLK input is a factor ONLY during write operation
  - During read operation, behaves as a combinational logic block:
    - Address valid => Data Out valid after access time
    - No need for read control
Step 3a: Datapath Assembly
- Instruction fetch unit: common operations
  - Fetch the instruction: mem[PC]
  - Update the program counter:
    - Sequential code: PC <- PC + 4
    - Branch and jump: PC <- "something else"
Step 3b: Add and Subtract
- R[rd] <- R[rs] op R[rt]   Ex: add rd, rs, rt
  - Ra, Rb, and Rw come from the instruction's rs, rt, and rd fields
  - ALU operation and RegWrite: control logic after decode
- Instruction format: op (31:26) | rs (25:21) | rt (20:16) | rd (15:11) | shamt (10:6) | funct (5:0)
[Figure: register file (Read register 1, Read register 2, Write register, Write data, Read data 1, Read data 2, RegWrite) feeding the ALU, whose 4-bit ALU operation comes from funct; outputs Zero and ALU result]
Step 3c: Store/Load Operations
- R[rt] <- Mem[R[rs] + SignExt[imm16]]   Ex: lw rt, rs, imm16
- Instruction format: op (31:26) | rs (25:21) | rt (20:16) | immediate (15:0, 16 bits)
R-Type/Load/Store Datapath
Step 3d: Branch Operations
- beq rs, rt, imm16
  - mem[PC]: fetch the instruction from memory
  - Equal <- R[rs] == R[rt]: calculate the branch condition
  - if (Equal) calculate the next instruction address: PC <- PC + 4 + (SignExt(imm16) x 4)
    else PC <- PC + 4
- Instruction format: op (31:26) | rs (25:21) | rt (20:16) | immediate (15:0, 16 bits)
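The next-PC computation for beq can be written out directly. The offset is a signed word count relative to PC + 4, so it is sign-extended and multiplied by 4 (shifted left by 2). A Python sketch with hypothetical function names:

```python
def sign_ext16(imm16: int) -> int:
    """Interpret a 16-bit field as a two's-complement integer."""
    return imm16 - 0x1_0000 if imm16 & 0x8000 else imm16


def next_pc_beq(pc: int, rs_val: int, rt_val: int, imm16: int) -> int:
    """PC + 4 + SignExt(imm16) x 4 if Equal, else PC + 4 (32-bit wrap)."""
    equal = rs_val == rt_val                      # branch condition
    if equal:
        return (pc + 4 + sign_ext16(imm16) * 4) & 0xFFFF_FFFF
    return (pc + 4) & 0xFFFF_FFFF


print(hex(next_pc_beq(0x100, 1, 1, 2)))       # 0x10c: taken, 2 words past PC + 4
print(hex(next_pc_beq(0x100, 3, 3, 0xFFFF)))  # 0x100: offset -1 branches to itself
print(hex(next_pc_beq(0x100, 3, 4, 0xFFFF)))  # 0x104: not taken
```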
Datapath for Branch Operations
- beq rs, rt, imm16
Outline
- Introduction to designing a processor
- Analyzing the instruction set
- Building the datapath
- A single-cycle implementation
- Control for the single-cycle CPU
  - Control of CPU operations
  - ALU controller
  - Main controller
A Single Cycle Datapath
Data Flow during add
- Clocking
- Data flows in other paths
Clocking Methodology
- Define when signals are read and written
- Assume edge-triggered:
  - Values in storage (state) elements are updated only on a clock edge => the clock edge should arrive only after the input signals are stable
  - Any combinational circuit must have inputs from and outputs to storage elements
  - Clock cycle: time for signals to propagate from one storage element, through the combinational circuit, to reach the second storage element
  - A register can be read, its value propagated through some combinational circuit, and a new value written back to the same register, all in the same cycle => no feedback within a single cycle
Register-Register Timing
[Timing diagram: after the clock edge, PC changes after Clk-to-Q; Rs, Rt, Rd, Op, and Func change after the instruction memory access time; ALUctr and RegWr settle after the delay through control logic; busA and busB after the register file access time; busW after the ALU delay. The register write occurs at the next clock edge.]
[Figure: PC, ideal instruction memory, register file (Rw, Ra, Rb; 32 32-bit registers; busW), and ALU]
The Critical Path
- Register file and ideal memory:
  - During read, behave as combinational logic: address valid => output valid after access time
- Critical path (load operation) = PC's Clk-to-Q + instruction memory's access time + register file's access time + ALU time to perform a 32-bit add + data memory access time + setup time for register file write + clock skew
[Figure: PC feeding the ideal instruction memory; Rs, Rt, Rd, and Imm16 feeding the register file (Ra, Rb, Rw; 32 32-bit registers) and the ALU; ALU output addressing the ideal data memory; Next Address logic]
Worst Case Timing (Load)
[Timing diagram: after the clock edge, PC changes after Clk-to-Q; Rs, Rt, Rd, Op, and Func after the instruction memory access time; ALUctr, ExtOp, ALUSrc, MemtoReg, and RegWr after the delay through control logic; busA after the register file access time; busB after the delay through the extender and mux; the address after the ALU delay; busW after the data memory access time. The register write occurs at the next clock edge.]
Outline
- Introduction to designing a processor
- Analyzing the instruction set
- Building the datapath
- A single-cycle implementation
- Control for the single-cycle CPU
  - Control of CPU operations (step 4)
  - ALU controller (step 5a)
  - Main controller (step 5b)
- Adding jump instruction
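Step 4: Control Points and Signals
- Instruction<31:0> is read from instruction memory (Addr) and split into fields: Op <31:26>, Rs <25:21>, Rt <20:16>, Rd <15:11>, Funct <5:0>, Imm16 <15:0>
- Control (from Op and Funct) drives the datapath control points: PCSrc, RegDst, ALUSrc, MemWr, RegWr, MemRd, ALUctr, MemtoReg
- The datapath returns the Equal condition to Control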
Datapath with Mux and Control Points
Designing Main Control
- Some observations:
  - The opcode (Op[5-0]) is always in bits 31-26
Datapath with Control Unit
Instruction Fetch at Start of Add
- instruction <- mem[PC]; PC + 4
Instruction Decode of Add
- Fetch the two operands and decode the instruction
ALU Operation during Add
- R[rs] + R[rt]
Write Back at the End of Add
- R[rd] <- ALU; PC <- PC + 4
Datapath Operation for lw
- R[rt] <- Memory{R[rs] + SignExt[imm16]}
Datapath Operation for beq
- if (R[rs] - R[rt] == 0) then Zero <- 1 else Zero <- 0
- if (Zero == 1) then PC <- PC + 4 + SignExt[imm16] * 4 else PC <- PC + 4
Outline
- Designing a processor
- Analyzing the instruction set
- Building the datapath
- A single-cycle implementation
- Control for the single-cycle CPU
  - Control of CPU operations (step 4)
  - ALU controller (step 5a)
  - Main controller (step 5b)
- Adding jump instruction
Datapath with Control Unit
Step 5a: ALU Control
- ALU used for:
  - Load/Store: F = add
  - Branch: F = subtract
  - R-type: F depends on funct field

ALU control | Function
----------- | --------
0000 | AND
0001 | OR
0010 | add
0110 | subtract
0111 | set-on-less-than
1100 | NOR
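Our Plan for the Controller
- R-type format: op (31:26) | rs (25:21) | rt (20:16) | rd (15:11) | shamt (10:6) | funct (5:0)
- Main Control takes the 6-bit op code and produces, among other signals, the 2-bit ALUop; the local ALU Control takes ALUop and the 6-bit func field and produces the 4-bit ALUctr
- ALUop is 2 bits wide to represent:
  - load/store, requiring the ALU to perform add (00)
  - beq, requiring the ALU to perform sub (01)
  - "R-type", which needs to reference the func field (10)

Instruction | ALUop (symbolic) | ALUop<1:0>
----------- | ---------------- | ----------
R-type | "R-type" | 10
lw | Add | 00
sw | Add | 00
beq | Subtract | 01
jump | don't care | xx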
ALU Control
- Assume 2-bit ALUOp derived from opcode
  - Combinational logic derives ALU control

opcode | ALUOp | Operation | funct | ALU function | ALU control
------ | ----- | --------- | ----- | ------------ | -----------
lw | 00 | load word | XXXXXX | add | 0010
sw | 00 | store word | XXXXXX | add | 0010
beq | 01 | branch equal | XXXXXX | subtract | 0110
R-type | 10 | add | 100000 | add | 0010
R-type | 10 | subtract | 100010 | subtract | 0110
R-type | 10 | AND | 100100 | AND | 0000
R-type | 10 | OR | 100101 | OR | 0001
R-type | 10 | set-on-less-than | 101010 | set-on-less-than | 0111
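The ALU control table maps directly onto a small lookup: ALUOp alone decides for lw/sw and beq, and only the R-type case consults funct. A Python sketch transcribing the table; the names are hypothetical:

```python
# funct -> 4-bit ALU control for R-type instructions (from the table above)
FUNCT_TO_ALUCTR = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # set-on-less-than
}


def alu_control(alu_op: int, funct: int) -> int:
    """Return the 4-bit ALU control for a 2-bit ALUOp and a 6-bit funct field."""
    if alu_op == 0b00:          # lw / sw: funct is ignored (XXXXXX)
        return 0b0010           # add
    if alu_op == 0b01:          # beq: funct is ignored
        return 0b0110           # subtract
    if alu_op == 0b10:          # R-type: decode funct
        return FUNCT_TO_ALUCTR[funct]
    raise ValueError("ALUOp 11 is not used in this design")


print(format(alu_control(0b10, 0b101010), "04b"))  # 0111 (slt)
```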
Logic Equation for ALUctr

ALUop<1:0> | func<5:0> | ALUctr<3:0>
---------- | --------- | -----------
00 | xxxxxx | 0010
x1 | xxxxxx | 0110
1x | xx0000 | 0010
1x | xx0010 | 0110
1x | xx0100 | 0000
1x | xx0101 | 0001
1x | xx1010 | 0111
Logic Equation for ALUctr<2>

ALUop<1:0> | func<5:0> | ALUctr<2>
---------- | --------- | ---------
x1 | xxxxxx | 1
1x | xx0010 | 1
1x | xx1010 | 1

- This makes func<3> a don't care
ALUctr<2> = ALUop<0> + ALUop<1> · func<2>' · func<1> · func<0>'
Logic Equation for ALUctr<1>

ALUop<1:0> | func<5:0> | ALUctr<1>
---------- | --------- | ---------
00 | xxxxxx | 1
x1 | xxxxxx | 1
1x | xx0000 | 1
1x | xx0010 | 1
1x | xx1010 | 1

ALUctr<1> = ALUop<1>' + ALUop<1> · func<2>' · func<0>'
Logic Equation for ALUctr<0>

ALUop<1:0> | func<5:0> | ALUctr<0>
---------- | --------- | ---------
1x | xx0101 | 1
1x | xx1010 | 1

ALUctr<0> = ALUop<1> · func<3>' · func<2> · func<1>' · func<0> + ALUop<1> · func<3> · func<2>' · func<1> · func<0>'
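The sum-of-products equations for ALUctr<2>, ALUctr<1>, and ALUctr<0> can be checked exhaustively against the ALU control table. Both product terms of ALUctr<0> must be gated by ALUop<1>, since OR and set-on-less-than are both R-type; a term gated by ALUop<1>' would let the immediate bits of lw, sw, or beq select the wrong operation. A Python sketch (hypothetical helper names) comparing the equations with the table for every input that can occur:

```python
from itertools import product

FUNCT_TO_ALUCTR = {0b100000: 0b0010, 0b100010: 0b0110, 0b100100: 0b0000,
                   0b100101: 0b0001, 0b101010: 0b0111}


def bit(x: int, i: int) -> int:
    return (x >> i) & 1


def inv(v: int) -> int:
    """Complement of a single bit (v')."""
    return 1 - v


def alu_ctr_equations(alu_op: int, f: int) -> int:
    """ALUctr<3:0> from the sum-of-products equations (ALUctr<3> is always 0)."""
    op0, op1 = bit(alu_op, 0), bit(alu_op, 1)
    f0, f1, f2, f3 = (bit(f, i) for i in range(4))
    c2 = op0 | (op1 & inv(f2) & f1 & inv(f0))
    c1 = inv(op1) | (op1 & inv(f2) & inv(f0))
    c0 = (op1 & inv(f3) & f2 & inv(f1) & f0) | (op1 & f3 & inv(f2) & f1 & inv(f0))
    return (c2 << 2) | (c1 << 1) | c0


def alu_ctr_table(alu_op: int, f: int) -> int:
    """Reference behavior straight from the ALU control table."""
    if alu_op == 0b00:
        return 0b0010
    if alu_op == 0b01:
        return 0b0110
    return FUNCT_TO_ALUCTR[f]


# lw/sw and beq with any funct bits (they are really immediate bits),
# plus R-type with the five funct codes of the subset.
cases = [(op, f) for op, f in product((0b00, 0b01), range(64))]
cases += [(0b10, f) for f in FUNCT_TO_ALUCTR]
mismatches = [c for c in cases if alu_ctr_equations(*c) != alu_ctr_table(*c)]
print(len(cases), mismatches)  # 133 []
```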
The Resultant ALU Control Block
[Figure: gate-level ALU control block producing Operation<3:0>]
Outline
- Introduction to designing a processor
- Analyzing the instruction set
- Building the datapath
- A single-cycle implementation
- Control for the single-cycle CPU
  - Control of CPU operations
  - ALU controller
  - Main controller (step 5b)
- Adding jump instruction
Datapath with Control Unit
Step 5b: The Main Control Unit
- Control signals derived from instruction:
  - R-type: 0 (31:26) | rs (25:21) | rt (20:16) | rd (15:11) | shamt (10:6) | funct (5:0)
  - Load/Store: 35 or 43 (31:26) | rs (25:21) | rt (20:16) | address (15:0)
  - Branch: 4 (31:26) | rs (25:21) | rt (20:16) | address (15:0)
- Field roles: bits 31:26 are the opcode; rs is always read; rt is read, except for load; the destination register is written for R-type and load; the 16-bit address is sign-extended and added
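Truth Table of Control Signals (6 inputs and 9 outputs)
See Appendix A.

Op code | funct | Instr | RegDst | ALUSrc | MemtoReg | RegWrite | MemRead | MemWrite | Branch | ALUOp1 | ALUOp0
------- | ----- | ----- | ------ | ------ | -------- | -------- | ------- | -------- | ------ | ------ | ------
000000 | 100000 | add | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0
000000 | 100010 | sub | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0
100011 | xxxxxx | lw | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0
101011 | xxxxxx | sw | x | 1 | x | 0 | 0 | 1 | 0 | 0 | 0
000100 | xxxxxx | beq | x | 0 | x | 0 | 0 | 0 | 1 | 0 | 1

- The 6-bit op code drives Main Control, which produces RegDst, ALUSrc, ..., and the 2-bit ALUop; ALUop and the 6-bit func field drive the local ALU Control, which produces the 4-bit ALUctr
- x: "We don't care :-)"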
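In software terms, the main control unit is a lookup from the 6-bit op code to the nine output signals. A Python sketch of the truth table, using `None` for the don't-care entries (x); the names are hypothetical:

```python
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite", "MemRead",
           "MemWrite", "Branch", "ALUOp1", "ALUOp0")

# op code -> values of the nine control signals (None = don't care)
MAIN_CONTROL = {
    0b000000: (1, 0, 0, 1, 0, 0, 0, 1, 0),        # R-type (add, sub, ...)
    0b100011: (0, 1, 1, 1, 1, 0, 0, 0, 0),        # lw
    0b101011: (None, 1, None, 0, 0, 1, 0, 0, 0),  # sw
    0b000100: (None, 0, None, 0, 0, 0, 1, 0, 1),  # beq
}


def main_control(op: int) -> dict:
    """Decode a 6-bit op code into the nine control signals."""
    return dict(zip(SIGNALS, MAIN_CONTROL[op]))


ctrl = main_control(0b100011)  # lw
print(ctrl["MemRead"], ctrl["MemtoReg"], ctrl["RegDst"])  # 1 1 0
```

Note that the ALUOp1/ALUOp0 columns reproduce the ALUop encoding of the controller plan (10 for R-type, 00 for lw/sw, 01 for beq).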
Truth Table for RegWrite

Op code | 000000 | 100011 | 101011 | 000100
------- | ------ | ------ | ------ | ------
Instruction | R-type | lw | sw | beq
RegWrite | 1 | 1 | 0 | 0

RegWrite = R-type + lw = op5'·op4'·op3'·op2'·op1'·op0' (R-type) + op5·op4'·op3'·op2'·op1·op0 (lw)
[Figure: gates decoding op<5>..op<0> into R-type, lw, sw, beq, and jump, with RegWrite formed from the R-type and lw terms]
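The RegWrite equation is a sum of two product terms, one per instruction that writes a register. A Python sketch that evaluates the equation bit by bit (helper names are hypothetical) and reproduces the truth table:

```python
def op_bits(op: int) -> tuple:
    """Return (op5, op4, op3, op2, op1, op0) for a 6-bit op code."""
    return tuple((op >> i) & 1 for i in range(5, -1, -1))


def reg_write(op: int) -> int:
    """RegWrite = R-type + lw, as a sum of products of the op bits."""
    op5, op4, op3, op2, op1, op0 = op_bits(op)
    r_type = (1 - op5) & (1 - op4) & (1 - op3) & (1 - op2) & (1 - op1) & (1 - op0)
    lw = op5 & (1 - op4) & (1 - op3) & (1 - op2) & op1 & op0
    return r_type | lw


for name, op in [("R-type", 0b000000), ("lw", 0b100011),
                 ("sw", 0b101011), ("beq", 0b000100)]:
    print(name, reg_write(op))  # R-type 1, lw 1, sw 0, beq 0 (one per line)
```

Because each product term matches exactly one op code, any op code outside the two terms (including j, op 000010) yields RegWrite = 0.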
PLA Implementing Main Control
Outline
- Introduction to designing a processor
- Analyzing the instruction set (step 1)
- Building the datapath (steps 2, 3)
- A single-cycle implementation
- Control for the single-cycle CPU
  - Control of CPU operations
  - ALU controller
  - Main controller
- Adding jump instruction
Implementing Jumps
- Jump format: 2 (31:26) | address (25:0)
- Jump uses word address
- Update PC with concatenation of:
  - Top 4 bits of old PC
  - 26-bit jump address
  - 00
- Need an extra control signal decoded from opcode
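The jump target is pure bit concatenation: no adder is needed, only wiring. A Python sketch following the slide's description; the function name is hypothetical:

```python
def jump_target(pc: int, target26: int) -> int:
    """New PC for j: PC<31:28> || target26 || 00."""
    upper4 = pc & 0xF000_0000            # top 4 bits of the old PC
    word_addr = target26 & 0x3FF_FFFF    # 26-bit word address from the instruction
    return upper4 | (word_addr << 2)     # append 00 to form a byte address


print(hex(jump_target(0x1000_0000, 0x3)))         # 0x1000000c
print(hex(jump_target(0x0040_0000, 0x010_0008)))  # 0x400020
```

This follows the slide in taking the upper bits from the old PC. The MIPS architecture takes them from PC + 4 (the address of the following instruction); the two differ only when the jump is the last word of a 256 MB region.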
Putting It All Together (+ jump instruction)
Drawback of Single-Cycle Design
- Long cycle time:
  - Cycle time must be long enough for the load instruction: PC's Clock-to-Q + Instruction Memory Access Time + Register File Access Time + ALU Delay (address calculation) + Data Memory Access Time + Register File Setup Time + Clock Skew
- Cycle time for load is much longer than needed for all other instructions
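A small worked example makes the drawback concrete: sum the delays along each instruction class's path, then note that the single clock must cover the longest one. A Python sketch; all delay values are made up for illustration, and the per-class paths are simplified (for example, the PC adder/mux for beq and memory write setup for sw are ignored):

```python
# Hypothetical component delays in picoseconds (illustrative values only).
DELAY = {
    "clk_to_q": 30, "imem": 200, "regfile_read": 100, "alu": 150,
    "dmem": 200, "regfile_setup": 20, "skew": 10,
}

# Simplified path through the single-cycle datapath for each instruction class.
PATHS = {
    "R-type": ["clk_to_q", "imem", "regfile_read", "alu", "regfile_setup", "skew"],
    "lw": ["clk_to_q", "imem", "regfile_read", "alu", "dmem", "regfile_setup", "skew"],
    "sw": ["clk_to_q", "imem", "regfile_read", "alu", "dmem", "skew"],
    "beq": ["clk_to_q", "imem", "regfile_read", "alu", "skew"],
}

path_delay = {cls: sum(DELAY[u] for u in units) for cls, units in PATHS.items()}
cycle_time = max(path_delay.values())  # every instruction gets the longest path

for cls, d in path_delay.items():
    print(f"{cls:6s} needs {d} ps, gets {cycle_time} ps")
# R-type needs 510 ps, lw needs 710 ps, sw needs 690 ps, beq needs 490 ps; all get 710 ps
```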
Summary
- Single-cycle datapath => CPI = 1, long clock cycle time
- MIPS makes control easier:
  - Instructions same size
  - Source registers always in same place
  - Immediates same size, location
  - Operations always on registers/immediates