Processor
Hakim Weatherspoon
CS 3410, Spring 2013
Computer Science, Cornell University
See P&H Chapter 2.16-20, 4.1-4

Big Picture: Building a Processor
[Figure: A single-cycle processor. PC with +4 adders, instruction memory (inst), register file, control, imm extend, ALU, =? comparator (cmp), offset/target logic for the new pc, and data memory (addr, din, dout)]

Goal for Today
Understanding the basics of a processor
We now have enough building blocks to build machines that can perform non-trivial computational tasks
Putting it all together:
• Arithmetic Logic Unit (ALU): Lab 0 & 1, Lecture 2 & 3
• Register File: Lecture 4 and 5
• Memory: Lecture 5
  – SRAM: cache
  – DRAM: main memory
• Instruction types
• Instruction datapaths

MIPS Register File
[Figure: A single-cycle processor. PC with +4 adders, instruction memory (inst), register file, control, imm extend, ALU, =? comparator (cmp), offset/target logic for the new pc, and data memory (addr, din, dout)]

MIPS Register File
• 32 registers, 32 bits each (with r0 wired to zero)
• Write port indexed via RW
  – Writes occur on falling edge, but only if WE is high
• Read ports indexed via RA, RB
[Figure: dual-read-port, single-write-port 32 x 32 register file. Inputs: DW (32 bits), WE (1 bit), RW, RA, RB (5 bits each). Outputs: QA, QB (32 bits each). Inside, registers r1, r2, …, r31 sit between write input W and read outputs A and B]
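The register file behavior described above can be sketched in a few lines of Python. This is a toy model, not the course's circuit: the class name and method names are made up for illustration, and the clock edge is not modeled (a call to write stands in for one falling edge).

```python
class RegisterFile:
    """Toy model of a 32 x 32-bit register file with r0 wired to zero."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra, rb):
        # Two read ports, indexed by the 5-bit values RA and RB.
        return self.regs[ra], self.regs[rb]

    def write(self, rw, dw, we):
        # One write port: takes effect only when WE is high; r0 stays zero.
        if we and rw != 0:
            self.regs[rw] = dw & 0xFFFFFFFF


rf = RegisterFile()
rf.write(rw=5, dw=42, we=1)
rf.write(rw=0, dw=99, we=1)   # ignored: r0 is wired to zero
rf.write(rw=6, dw=7, we=0)    # ignored: WE is low
qa, qb = rf.read(5, 0)
```

Reading r5 and r0 here gives 42 and 0, since the two other writes are dropped.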

MIPS Register File: Registers
• Numbered from 0 to 31
• Each register can be referred to by number or name
• By number: $0, $1, $2, $3, …, $31
• Or, by convention, each register has a name
  – $16 - $23 are $s0 - $s7
  – $8 - $15 are $t0 - $t7
  – $0 is always $zero
  – See Patterson and Hennessy, p. 121

MIPS Memory
[Figure: A single-cycle processor. PC with +4 adders, instruction memory (inst), register file, control, imm extend, ALU, =? comparator (cmp), offset/target logic for the new pc, and data memory (addr, din, dout)]

MIPS Memory
• Up to 32-bit address
• 32-bit data (but byte addressed)
• Enable + 2-bit memory control (mc)
  – 00: read word (4-byte aligned)
  – 01: write byte
  – 10: write halfword (2-byte aligned)
  – 11: write word (4-byte aligned)
[Figure: memory block with 32-bit addr, 32-bit data in and data out, 2-bit mc, and enable E]
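As a rough illustration of the mc encoding above, the following sketch models a small byte-addressed memory. The class and constant names are hypothetical, and big-endian byte order is an assumption made here (the slide does not say which order the course memory uses).

```python
READ_WORD = 0b00
WRITE_WIDTH = {0b01: 1, 0b10: 2, 0b11: 4}   # byte, halfword, word


class Memory:
    """Toy byte-addressed memory driven by the 2-bit mc code."""

    def __init__(self, size=64):
        self.data = bytearray(size)

    def access(self, addr, mc, din=0, enable=True):
        if not enable:
            return None
        width = 4 if mc == READ_WORD else WRITE_WIDTH[mc]
        if addr % width != 0:
            raise ValueError("unaligned access")
        if mc == READ_WORD:
            return int.from_bytes(self.data[addr:addr + 4], "big")
        # Writes keep only the low `width` bytes of din (big-endian assumed).
        self.data[addr:addr + width] = (din % (1 << 8 * width)).to_bytes(width, "big")
        return None


mem = Memory()
mem.access(8, 0b11, 0x11223344)    # write word
mem.access(8, 0b01, 0xAA)          # write byte over the first byte
mem.access(10, 0b10, 0xBEEF)       # write halfword over the last two bytes
word = mem.access(8, 0b00)         # read word back
```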

Putting it all together: Basic Processor
[Figure: A single-cycle processor. PC with +4 adders, instruction memory (inst), register file, control, imm extend, ALU, =? comparator (cmp), offset/target logic for the new pc, and data memory (addr, din, dout)]

Putting it all together: Basic Processor
Let’s build a MIPS CPU
• …but using (modified) Harvard architecture
[Figure: CPU (Registers, ALU, Control) connected by data, address, and control lines to a Program Memory and a separate Data Memory, each holding binary words]

Takeaway
A processor executes instructions
• Processor has some internal state in storage elements (registers)
A memory holds instructions and data
• Harvard architecture: separate instructions and data
• von Neumann architecture: combined instructions and data
A bus connects the two

Next Goal How do we create computer programs and execute machine instructions?

Levels of Interpretation: Instructions
Programs written in a High Level Language
• C, Java, Python, Ruby, …
• Loops, control flow, variables
    for (i = 0; i < 10; i++)
      printf("go cucs");
Need translation to a lower-level, computer-understandable format
• Assembly is human-readable machine language
    main: addi r2, r0, 10
          addi r1, r0, 0
    loop: slt r3, r1, r2
          ...
• Processors operate on Machine Language
    00100000000000100000000000001010
    00100000000000010000000000000000
    00000000001000100001100000101010
Machine Implementation: ALU, Control, Register File, …

Levels of Interpretation: Instructions
High Level Language
• C, Java, Python, Ruby, …
• Loops, control flow, variables
    for (i = 0; i < 10; i++)
      printf("go cucs");
Assembly Language
• No symbols (except labels)
• One operation per statement
    main: addi r2, r0, 10
          addi r1, r0, 0
    loop: slt r3, r1, r2
          ...
Machine Language
• Binary-encoded assembly
• Labels become addresses
    00100000000000100000000000001010
    00100000000000010000000000000000
    00000000001000100001100000101010
Machine Implementation
• ALU, Control, Register File, …

Instruction Usage
Instructions are stored in memory, encoded in binary
    op=addi  r0     r2     10
    001000   00000  00010  0000000000001010
A basic processor
• fetches
• decodes
• executes
one instruction at a time
[Figure: pc drives the memory addr; the returned data becomes cur inst, which flows through decode, regs, and execute, while an adder updates pc]
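To make the binary encoding concrete, here is a minimal sketch that packs the fields of addi r2, r0, 10 into a 32-bit word. The helper name encode_i_type is invented for this example; the opcode value 0x08 for addi comes from the standard MIPS encoding rather than from the slide.

```python
def encode_i_type(op, rs, rt, imm):
    """Pack I-type fields: op (6 bits) | rs (5) | rt (5) | immediate (16)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


ADDI = 0x08  # standard MIPS opcode for addi (assumed, not shown on the slide)

word = encode_i_type(ADDI, rs=0, rt=2, imm=10)   # addi r2, r0, 10
bits = format(word, "032b")
```

The resulting bit string is 00100000000000100000000000001010, i.e. the word 0x2002000A.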

Instruction Types
Arithmetic
• add, subtract, shift left, shift right, multiply, divide
Memory
• load value from memory to a register
• store value to memory from a register
Control flow
• unconditional jumps
• conditional jumps (branches)
• jump and link (subroutine call)
Many other instructions are possible
• vector add/sub/mul/div, string operations
• manipulate coprocessor
• I/O

Instruction Set Architecture
The types of operations permissible in machine language define the ISA
• MIPS: load/store, arithmetic, control flow, …
• VAX: load/store, arithmetic, control flow, strings, …
• Cray: vector operations, …
Two classes of ISAs
• Reduced Instruction Set Computers (RISC)
• Complex Instruction Set Computers (CISC)
We’ll study the MIPS ISA in this course

Instruction Set Architecture (ISA)
• Different CPU architectures specify different sets of instructions: Intel x86, IBM PowerPC, Sun SPARC, MIPS, etc.
MIPS
• ≈ 200 instructions, 32 bits each, 3 formats
  – mostly orthogonal
• all operands in registers
  – almost all are 32 bits each, can be used interchangeably
• ≈ 1 addressing mode: Mem[reg + imm]
x86 = Complex Instruction Set Computer (CISC)
• > 1000 instructions, 1 to 15 bytes each
• operands in special registers, general purpose registers, memory, on stack, …
  – can be 1, 2, 4, 8 bytes, signed or unsigned
• 10s of addressing modes
  – e.g. Mem[segment + reg*scale + offset]

Instructions
Load/store architecture
• Data must be in registers to be operated on
• Keeps hardware simple
Emphasis on efficient implementation
Integer data types:
• byte: 8 bits
• half-words: 16 bits
• words: 32 bits
MIPS supports signed and unsigned data types

MIPS instruction formats
All MIPS instructions are 32 bits long and use one of 3 formats
R-type: op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | func (6 bits)
I-type: op (6 bits) | rs (5 bits) | rt (5 bits) | immediate (16 bits)
J-type: op (6 bits) | immediate (target address) (26 bits)
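Because every format uses fixed bit positions, decoding reduces to shifts and masks. The sketch below (function name decode is hypothetical) extracts every field of all three formats at once; which fields are meaningful depends on the opcode. The sample word is the standard encoding of slt r3, r1, r2.

```python
def decode(word):
    """Split a 32-bit instruction into the fields used by the three formats."""
    return {
        "op":     (word >> 26) & 0x3F,   # bits 31..26, all formats
        "rs":     (word >> 21) & 0x1F,   # bits 25..21, R- and I-type
        "rt":     (word >> 16) & 0x1F,   # bits 20..16, R- and I-type
        "rd":     (word >> 11) & 0x1F,   # bits 15..11, R-type
        "shamt":  (word >> 6) & 0x1F,    # bits 10..6,  R-type
        "func":   word & 0x3F,           # bits 5..0,   R-type
        "imm":    word & 0xFFFF,         # bits 15..0,  I-type
        "target": word & 0x3FFFFFF,      # bits 25..0,  J-type
    }


fields = decode(0x0022182A)   # slt r3, r1, r2
```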

MIPS Design Principles
Simplicity favors regularity
• 32-bit instructions
Smaller is faster
• Small register file
Make the common case fast
• Include support for constants
Good design demands good compromises
• Support for different types of interpretations/classes

Takeaway

Next Goal
How are instructions executed? What are the datapaths for different instruction types?

Five Stages of MIPS Datapath
[Figure: A single-cycle processor divided into five stages: Fetch (PC, +4, Prog. Mem, inst), Decode (Reg. File with three 5-bit register indices, control), Execute (ALU), Memory (Data Mem), WB]

Five Stages of MIPS datapath
Basic CPU execution loop
1. Instruction Fetch
2. Instruction Decode
3. Execution (ALU)
4. Memory Access
5. Register Writeback
Instruction types/format
• Arithmetic/Register: addu $s0, $s2, $s3
• Arithmetic/Immediate: slti $s0, $s2, 4
• Memory: lw $s0, 20($s3)
• Control/Jump: j 0xdeadbeef

Stages of datapath (1/5)
Stage 1: Instruction Fetch
• Fetch 32-bit instruction from memory (instruction cache or memory)
• Increment PC accordingly
  – +4, byte addressing
  – +N
[Figure: PC feeds Prog. Mem, which outputs inst; a +4 adder feeds back into PC]

Stages of datapath (2/5)
Stage 2: Instruction Decode
• Gather data from the instruction
• Read opcode to determine instruction type and field length
• Read in data from register file
  – for addu, read two registers
  – for addi, read one register
  – for jal, read no registers
[Figure: Reg. File with three 5-bit register indices, and control]

Stages of datapath (2/5)
All MIPS instructions are 32 bits long and use one of 3 formats
R-type: op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | func (6 bits)
I-type: op (6 bits) | rs (5 bits) | rt (5 bits) | immediate (16 bits)
J-type: op (6 bits) | immediate (target address) (26 bits)

Stages of datapath (3/5)
Stage 3: Execution (ALU)
• Useful work is done here: arithmetic (+, -, *, /), shift, logic operation, comparison (slt)
• Load/Store?
  – lw $t2, 32($t3)
  – Compute the memory address
[Figure: ALU]

Stages of datapath (4/5)
Stage 4: Memory access
• Used by load and store instructions only
• Other instructions will skip this stage
• This stage is expected to be fast. Why?
[Figure: Data Mem takes the target addr from the ALU and an R/W control; data from memory flows out]

Stages of datapath (5/5)
Stage 5: Register Writeback
• For instructions that need to write a value to a register
• Examples: arithmetic, logic, shift, etc., and load
• What about stores, branches, and jumps?
[Figure: WriteBack from ALU or Memory into Reg. File; new instruction address into PC if branch or jump]
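The five stages can be tied together in a small software sketch. This is only an illustration under simplifying assumptions: it handles just ADDU, ADDIU, LW, and SW, uses Python dictionaries as stand-ins for instruction and data memory, and ignores branches, jumps, and delay slots. The function names are made up for the example.

```python
def sign_extend16(x):
    return x - 0x10000 if x & 0x8000 else x


def step(pc, regs, imem, dmem):
    """Run one instruction through the five stages; return the next PC."""
    # 1. Fetch: read the instruction word and compute PC + 4.
    inst = imem[pc]
    next_pc = pc + 4
    # 2. Decode: split fields and read the register file.
    op, rs, rt = inst >> 26, (inst >> 21) & 31, (inst >> 16) & 31
    rd, func = (inst >> 11) & 31, inst & 63
    assert op in (0x09, 0x23, 0x2B) or (op == 0 and func == 0x21)
    a, b, imm = regs[rs], regs[rt], sign_extend16(inst & 0xFFFF)
    # 3. Execute: the ALU adds two registers (ADDU) or register + immediate.
    result = (a + (b if op == 0 else imm)) & 0xFFFFFFFF
    # 4. Memory: only loads and stores touch data memory.
    if op == 0x23:                # LW
        result = dmem.get(result, 0)
    elif op == 0x2B:              # SW
        dmem[result] = b
    # 5. Writeback: arithmetic and loads write a register; stores do not.
    dest = rd if op == 0 else rt
    if op != 0x2B and dest != 0:
        regs[dest] = result
    return next_pc


imem = {
    0:  0x24010005,   # addiu r1, r0, 5
    4:  0x24020007,   # addiu r2, r0, 7
    8:  0x00221821,   # addu  r3, r1, r2
    12: 0xAC030010,   # sw    r3, 16(r0)
    16: 0x8C040010,   # lw    r4, 16(r0)
}
regs, dmem, pc = [0] * 32, {}, 0
while pc in imem:
    pc = step(pc, regs, imem, dmem)
```

After the loop, r3 and r4 both hold 12 and data memory address 16 holds 12.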

Datapath and Clocking
[Figure: datapath divided into five stages: Fetch (PC, +4, Prog. Mem, inst), Decode (Reg. File with three 5-bit register indices, control), Execute (ALU), Memory (Data Mem), WB]

Takeaway

Next Goal MIPS Instruction datapaths

MIPS Instruction Types
Arithmetic/Logical
• R-type: result and two source registers, shift amount
• I-type: 16-bit immediate with sign/zero extension
Memory Access
• load/store between registers and memory
• word, half-word and byte operations
Control flow
• conditional branches: pc-relative addresses
• jumps: fixed offsets, register absolute

MIPS instruction formats
All MIPS instructions are 32 bits long and use one of 3 formats
R-type: op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | func (6 bits)
I-type: op (6 bits) | rs (5 bits) | rt (5 bits) | immediate (16 bits)
J-type: op (6 bits) | immediate (target address) (26 bits)

Arithmetic Instructions
0000000110000000100110
R-Type: op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | - (5 bits) | func (6 bits)
op    func   mnemonic          description
0x0   0x21   ADDU rd, rs, rt   R[rd] = R[rs] + R[rt]
0x0   0x23   SUBU rd, rs, rt   R[rd] = R[rs] - R[rt]
0x0   0x25   OR rd, rs, rt     R[rd] = R[rs] | R[rt]
0x0   0x26   XOR rd, rs, rt    R[rd] = R[rs] ^ R[rt]
0x0   0x27   NOR rd, rs, rt    R[rd] = ~ ( R[rs] | R[rt] )
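The table above maps each func code to one ALU operation. A minimal sketch of that mapping follows; the names ALU_OPS and alu are invented here, and results are masked to 32 bits because Python integers are unbounded.

```python
MASK = 0xFFFFFFFF

# func code -> operation on two 32-bit register values
ALU_OPS = {
    0x21: lambda a, b: (a + b) & MASK,     # ADDU
    0x23: lambda a, b: (a - b) & MASK,     # SUBU
    0x25: lambda a, b: a | b,              # OR
    0x26: lambda a, b: a ^ b,              # XOR
    0x27: lambda a, b: ~(a | b) & MASK,    # NOR
}


def alu(func, a, b):
    return ALU_OPS[func](a, b)


wrapped = alu(0x21, 0xFFFFFFFF, 1)   # unsigned add wraps around to 0
```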

Instruction Fetch Circuit
• Fetch instruction from memory
• Calculate address of next instruction
• Repeat
[Figure: 32-bit PC feeds Program Memory, which outputs the 32-bit inst; a +4 adder computes the next PC; the low 2 bits of the PC are 00]

Arithmetic and Logic
[Figure: datapath with PC, +4, Prog. Mem (inst), Reg. File with three 5-bit register indices, control, and ALU]

Arithmetic Instructions: Shift
0000000100000110000011
R-Type: op (6 bits) | - (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | func (6 bits)
op    func   mnemonic            description
0x0   0x0    SLL rd, rt, shamt   R[rd] = R[rt] << shamt
0x0   0x2    SRL rd, rt, shamt   R[rd] = R[rt] >>> shamt (zero ext.)
0x0   0x3    SRA rd, rt, shamt   R[rd] = R[rt] >> shamt (sign ext.)
ex: r5 = r3 * 8
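The difference between the logical and arithmetic right shifts shows up only when the top bit is set. A short sketch, with helper names chosen for this example, models the three shifts on 32-bit values; the multiply-by-8 example corresponds to a left shift by 3.

```python
MASK = 0xFFFFFFFF


def sll(x, shamt):
    return (x << shamt) & MASK


def srl(x, shamt):
    return (x & MASK) >> shamt           # zero-fills from the left


def sra(x, shamt):
    x &= MASK
    signed = x - (1 << 32) if x & 0x80000000 else x
    return (signed >> shamt) & MASK      # copies the sign bit from the left


r3 = 5
r5 = sll(r3, 3)                          # r5 = r3 * 8
```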

Shift
[Figure: datapath with PC, +4, Prog. Mem (inst), Reg. File with three 5-bit register indices, control, ALU, and a shamt path from the instruction to the ALU]

Arithmetic Instructions: Immediates
001001001010000000101
I-Type: op (6 bits) | rs (5 bits) | rd (5 bits) | immediate (16 bits)
op    mnemonic             description
0x9   ADDIU rd, rs, imm    R[rd] = R[rs] + sign_extend(imm)
0xc   ANDI rd, rs, imm     R[rd] = R[rs] & zero_extend(imm)
0xd   ORI rd, rs, imm      R[rd] = R[rs] | zero_extend(imm)
ex: r5 += 5
ex: r9 = -1
ex: r9 = 65535
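The two r9 examples hinge on how the same 16-bit pattern 0xFFFF is extended. The sketch below is one plausible reading, assuming ADDIU r9, r0, -1 for the first and ORI r9, r0, 0xFFFF for the second; the slide does not spell out which instructions it intends, and the helper names are hypothetical.

```python
MASK = 0xFFFFFFFF


def sign_extend(imm16):
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16


def zero_extend(imm16):
    return imm16 & 0xFFFF


def addiu(rs_val, imm16):
    return (rs_val + sign_extend(imm16)) & MASK


def ori(rs_val, imm16):
    return rs_val | zero_extend(imm16)


r5 = addiu(5, 5)                  # r5 += 5, starting from r5 = 5
r9_minus_one = addiu(0, 0xFFFF)   # sign extension gives 0xFFFFFFFF, i.e. -1
r9_65535 = ori(0, 0xFFFF)         # zero extension gives 65535
```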

Immediates
[Figure: datapath with PC, +4, Prog. Mem (inst), Reg. File with three 5-bit register indices, control, ALU, shamt path, and an imm extend unit feeding the ALU]

Arithmetic Instructions: Immediates
00111100000001010000000101
I-Type: op (6 bits) | - (5 bits) | rd (5 bits) | immediate (16 bits)
op    mnemonic       description
0xF   LUI rd, imm    R[rd] = imm << 16
ex: r5 = 0xdeadbeef
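A 32-bit constant does not fit in a 16-bit immediate, so it is usually built in two steps. The sketch below assumes the common LUI followed by ORI pairing to reach 0xdeadbeef; the slide only states the goal, so the second instruction is an assumption, and the function names are made up.

```python
def lui(imm16):
    return (imm16 & 0xFFFF) << 16        # upper half set, lower half zero


def ori(rs_val, imm16):
    return rs_val | (imm16 & 0xFFFF)     # zero-extended immediate


r5 = lui(0xDEAD)          # LUI r5, 0xdead      -> 0xdead0000
r5 = ori(r5, 0xBEEF)      # ORI r5, r5, 0xbeef  -> 0xdeadbeef
```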

Immediates
[Figure: datapath with PC, +4, Prog. Mem (inst), Reg. File with three 5-bit register indices, control, ALU, shamt path, imm extend unit, and a constant 16 as an alternative shift amount]

MIPS Instruction Types
Arithmetic/Logical
• R-type: result and two source registers, shift amount
• I-type: 16-bit immediate with sign/zero extension
Memory Access
• load/store between registers and memory
• word, half-word and byte operations
Control flow
• conditional branches: pc-relative addresses
• jumps: fixed offsets, register absolute

Memory Instructions
1010010010100000000010
I-Type: op (6 bits) | rs (5 bits) | rd (5 bits) | offset (16 bits)
base + offset addressing, signed offsets
op     mnemonic             description
0x20   LB rd, offset(rs)    R[rd] = sign_ext(Mem[offset+R[rs]])
0x24   LBU rd, offset(rs)   R[rd] = zero_ext(Mem[offset+R[rs]])
0x21   LH rd, offset(rs)    R[rd] = sign_ext(Mem[offset+R[rs]])
0x25   LHU rd, offset(rs)   R[rd] = zero_ext(Mem[offset+R[rs]])
0x23   LW rd, offset(rs)    R[rd] = Mem[offset+R[rs]]
0x28   SB rd, offset(rs)    Mem[offset+R[rs]] = R[rd]
0x29   SH rd, offset(rs)    Mem[offset+R[rs]] = R[rd]
0x2b   SW rd, offset(rs)    Mem[offset+R[rs]] = R[rd]
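Base + offset addressing and the sign/zero extension of loaded bytes can be sketched as follows. A bytearray stands in for data memory, and the function names are hypothetical; only the byte loads are modeled.

```python
def effective_address(base, offset16):
    # The 16-bit offset is signed, so 0xFFFE means -2.
    signed = offset16 - 0x10000 if offset16 & 0x8000 else offset16
    return (base + signed) & 0xFFFFFFFF


def lb(mem, base, offset16):
    byte = mem[effective_address(base, offset16)]
    return (byte | 0xFFFFFF00) if byte & 0x80 else byte   # sign_ext


def lbu(mem, base, offset16):
    return mem[effective_address(base, offset16)]         # zero_ext


mem = bytearray(16)
mem[6] = 0xF0
signed_load = lb(mem, 8, 0xFFFE)      # address 8 + (-2) = 6
unsigned_load = lbu(mem, 8, 0xFFFE)
```

The same stored byte 0xF0 loads as 0xFFFFFFF0 with LB and as 0x000000F0 with LBU.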

Memory Operations
[Figure: datapath with PC, +4, Prog. Mem (inst), Reg. File with three 5-bit register indices, control, imm ext unit, ALU, and Data Mem whose addr input comes from the ALU]

MIPS Instruction Types
Arithmetic/Logical
• R-type: result and two source registers, shift amount
• I-type: 16-bit immediate with sign/zero extension
Memory Access
• load/store between registers and memory
• word, half-word and byte operations
Control flow
• conditional branches: pc-relative addresses
• jumps: fixed offsets, register absolute

Control Flow: Absolute Jump
00001010100001001000011000000011
J-Type: op (6 bits) | immediate (26 bits)
op    mnemonic    description
0x2   J target    PC = (PC+4)[31..28] || target || 00
Absolute addressing for jumps
• Jump from 0x30000000 to 0x20000000? NO. Reverse? NO
  – But: jumps from 0x2FFFFFFF to 0x3xxxxxxx are possible, but not the reverse
• Trade-off: out-of-region jumps vs. 32-bit instruction encoding
MIPS Quirk:
• Jump targets are computed using the already incremented PC
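The jump target formula can be checked with a few lines of Python. The function name is invented, and the second example uses the word-aligned address 0x2FFFFFFC (the last instruction slot of the 0x2 region) as a stand-in for the slide's 0x2FFFFFFF.

```python
def jump_target(pc, target26):
    """PC = (PC+4)[31..28] || target || 00."""
    region = (pc + 4) & 0xF0000000               # top 4 bits of the incremented PC
    return region | ((target26 & 0x3FFFFFF) << 2)


same_region = jump_target(0x20000000, 0x10)      # stays in region 0x2
next_region = jump_target(0x2FFFFFFC, 0x0)       # PC+4 is already in region 0x3
```

Because the region bits come from PC+4, the last instruction of one 256 MB region can reach the next region, while no jump inside a region can reach a different one.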

Absolute Jump
[Figure: datapath with PC, +4, Prog. Mem (inst), Reg. File with three 5-bit register indices, control, imm ext unit, ALU, Data Mem (addr), and a || unit that concatenates the jump target (tgt) into the new PC]

Control Flow: Jump Register
00000110000000001000
R-Type: op (6 bits) | rs (5 bits) | - (5 bits) | - (5 bits) | - (5 bits) | func (6 bits)
op    func   mnemonic   description
0x0   0x08   JR rs      PC = R[rs]

Jump Register
[Figure: datapath with PC, +4, Prog. Mem (inst), Reg. File with three 5-bit register indices, control, imm ext unit, ALU, Data Mem (addr), the || jump target (tgt) unit, and a path from the register file output to the PC]

Control Flow: Branches
00010100000000011
I-Type: op (6 bits) | rs (5 bits) | rd (5 bits) | offset (16 bits), signed offsets
op    mnemonic              description
0x4   BEQ rs, rd, offset    if R[rs] == R[rd] then PC = PC+4 + (offset<<2)
0x5   BNE rs, rd, offset    if R[rs] != R[rd] then PC = PC+4 + (offset<<2)
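The PC-relative branch computation can be sketched directly from the description column. The helper names are hypothetical, and the not-taken case simply falls through to PC+4.

```python
def branch_target(pc, offset16):
    # The offset counts instructions (words), is signed, and is relative to PC+4.
    signed = offset16 - 0x10000 if offset16 & 0x8000 else offset16
    return (pc + 4 + (signed << 2)) & 0xFFFFFFFF


def beq(pc, rs_val, rd_val, offset16):
    return branch_target(pc, offset16) if rs_val == rd_val else pc + 4


def bne(pc, rs_val, rd_val, offset16):
    return branch_target(pc, offset16) if rs_val != rd_val else pc + 4


taken = beq(0x100, 1, 1, 3)          # 0x100 + 4 + 12
not_taken = beq(0x100, 1, 2, 3)      # falls through to 0x104
backward = bne(0x100, 1, 2, 0xFFFF)  # offset -1 branches back to 0x100
```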

Absolute Jump
[Figure: datapath with PC, +4, Prog. Mem (inst), Reg. File with three 5-bit register indices, control, imm ext unit, ALU, Data Mem (addr), the || jump target (tgt) unit, a separate + adder for the branch offset, and a =? comparator (cmp). Notes on the figure: could have used ALU for branch add; could have used ALU for branch cmp]

Control Flow: More Branches
0000010010100000000010
almost I-Type: op (6 bits) | rs (5 bits) | subop (5 bits) | offset (16 bits), signed offsets
op    subop   mnemonic           description
0x1   0x0     BLTZ rs, offset    if R[rs] < 0 then PC = PC+4 + (offset<<2)
0x1   0x1     BGEZ rs, offset    if R[rs] ≥ 0 then PC = PC+4 + (offset<<2)
0x6   0x0     BLEZ rs, offset    if R[rs] ≤ 0 then PC = PC+4 + (offset<<2)
0x7   0x0     BGTZ rs, offset    if R[rs] > 0 then PC = PC+4 + (offset<<2)

Absolute Jump
[Figure: datapath with PC, +4, Prog. Mem (inst), Reg. File with three 5-bit register indices, control, imm ext unit, ALU, Data Mem (addr), the || jump target (tgt) unit, a + adder for the branch offset, and a =? comparator (cmp). Note on the figure: could have used ALU for branch cmp]

Control Flow: Jump and Link
00001100000001001000011000000010
J-Type: op (6 bits) | immediate (26 bits)
op    mnemonic      description
0x3   JAL target    r31 = PC+8
                    PC = (PC+4)[31..28] || target || 00
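Jump and link performs the same target computation as J and additionally saves a return address in r31. A minimal sketch, with a plain Python list standing in for the register file and an invented function name:

```python
def jal(pc, target26, regs):
    """Link (r31 = PC+8), then jump to (PC+4)[31..28] || target || 00."""
    regs[31] = (pc + 8) & 0xFFFFFFFF
    return ((pc + 4) & 0xF0000000) | ((target26 & 0x3FFFFFF) << 2)


regs = [0] * 32
new_pc = jal(0x00400000, 0x0100040, regs)   # call a routine at 0x00400100
```

A later JR r31 would then resume execution at the saved address 0x00400008.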

Absolute Jump
[Figure: datapath with PC, two +4 adders, Prog. Mem (inst), Reg. File with three 5-bit register indices, control, imm ext unit, ALU, Data Mem (addr), the || jump target (tgt) unit, a + adder for the branch offset, and a =? comparator (cmp). Note on the figure: could have used ALU for link add]

Next Time CPU Performance Pipelined CPU