Introduction to VLSI Programming
Lecture 7: Introduction to the DLX
(course 2IN30)
Prof. dr. ir. Kees van Berkel

Time table 2005
date      class | lab     subject
Aug. 30   2 | 0 hours     intro; VLSI
Sep. 6    3 | 0 hours     handshake circuits
Sep. 13   3 | 0 hours     handshake circuits
Sep. 20   3 | 0 hours     Tangram
Sep. 27   no lecture
Oct. 4    no lecture      assignment
Oct. 11   1 | 2 hours     demo, fifos, registers | deadline assignment
Oct. 18   1 | 2 hours     design cases
Oct. 25   1 | 2 hours     DLX introduction
Nov. 1    1 | 2 hours     low-cost DLX
Nov. 8    1 | 2 hours     high-speed DLX
Nov. 29                   deadline final report
9/15/2021 Kees van Berkel

VLSI programming for …
• Low costs:
  – introduce resource sharing.
• Low delay (high throughput):
  – introduce parallelism.
• Low energy (low power):
  – reduce activity; …

VLSI programming for low costs
• Keep it simple!!
• Introduce resource sharing: commands, auxiliary variables, expressions, operators.
• Enable resource sharing, by:
  – reducing parallelism
  – making similar commands equal

Procedure definition vs declaration
Procedure definition: P = proc (). S
  – provides a textual shorthand (expansion)
  – each call generates a copy of the resource, i.e. no sharing
Procedure declaration: P : proc (). S
  – defines a sharable resource
  – each call generates access to this resource

Hints and Tips: optimization
• When asked to optimize for area (low cost), it is allowed to invest time (execution time, extra iterations, …).
• When asked to optimize for speed, it is allowed to invest area (pipeline stages, parallelism, …).

Hints and Tips: a known bug
• Statement of the form: if ¬x then S0 else S1 fi
• During simulation the wrong alternative is selected (e.g. S0 when x = true).
• Workaround: remove the negation and swap the alternatives: if x then S1 else S0 fi

Instruction Set Architecture
The ISA is the interface between hardware and software. Hence, a good ISA:
• allows easy programming (compilers, OS, ...);
• allows efficient implementations (hardware);
• has a long lifetime (survives many HW generations);
• is general purpose.

ISA classification
Code sequence for C := A+B
[Figure: code sequences for C := A+B per ISA class]

Reduced Instruction Set Computer
1980: Patterson and Ditzel: “The Case for RISC”
• fixed 32-bit instruction size, with few formats
• load-store architecture
• large register bank (32 registers), all general purpose
On processor organization:
• hard-wired decode logic
• pipelined execution
• single clock-cycle execution

RISC processors
Advantages:
• smaller die size (single chip processor)
• shorter development time (simplicity)
• higher performance
Disadvantages:
• poor code density
• cannot execute x86 code

A “Typical” RISC
• 32-bit instructions, 3 fixed formats
• 32 general purpose registers, 32-bit
• 3-address arithmetic instructions, reg-reg
• single address mode for load/store: “address + displacement”
• simple branch conditions; delayed branch

DLX (“Deluxe”)
(AMD 29K + DECstation 3100 + HP 850 + IBM 801 + Intel i860 + MIPS M/120A + MIPS M/1000 + Motorola 88K + RISC I + SGI 4D/60 + SPARCstation-1 + Sun-4/110 + Sun-4/260) / 13 = DLX
Other RISC examples include: Cray-1, 2, 3, AMD 2900, DEC Alpha, ARM.

DLX instruction formats
bits     31-26    25-21   20-16   15-11   10-0
R-type   Opcode   rs1     rs2     rd      function     Reg-reg ALU operations
I-type   Opcode   rs1     rd      Immediate (15-0)     loads, stores, conditional branch, ..
J-type   Opcode   offset (25-0)                        Jump, jump and link, trap, return from exception
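The three formats can be illustrated with a small field extractor. This is only a sketch in Python, not part of the lecture material: the helper names (bits, sign_extend, decode_r, decode_i, decode_j) are hypothetical, the opcode values used in the usage lines are arbitrary placeholders, and treating the immediate and offset fields as signed two's-complement values is an assumption.

```python
def bits(word, hi, lo):
    """Return bits hi..lo (inclusive) of a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value, width):
    """Interpret a width-bit field as a signed two's-complement number (assumption)."""
    sign = 1 << (width - 1)
    return (value ^ sign) - sign


def decode_r(word):
    """R-type: opcode | rs1 | rs2 | rd | function."""
    return {
        "opcode": bits(word, 31, 26),
        "rs1": bits(word, 25, 21),
        "rs2": bits(word, 20, 16),
        "rd": bits(word, 15, 11),
        "function": bits(word, 10, 0),
    }


def decode_i(word):
    """I-type: opcode | rs1 | rd | 16-bit immediate."""
    return {
        "opcode": bits(word, 31, 26),
        "rs1": bits(word, 25, 21),
        "rd": bits(word, 20, 16),
        "immediate": sign_extend(bits(word, 15, 0), 16),
    }


def decode_j(word):
    """J-type: opcode | 26-bit offset."""
    return {
        "opcode": bits(word, 31, 26),
        "offset": sign_extend(bits(word, 25, 0), 26),
    }


# Usage: an I-type word with placeholder opcode 0x23, rs1 = 0, rd = 1, immediate = 4.
example_word = (0x23 << 26) | (0 << 21) | (1 << 16) | 4
example_fields = decode_i(example_word)
```

Because every format keeps the opcode in bits 31-26 and rs1 in bits 25-21, a decoder can read those fields before it knows which format applies.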

Example instructions
[Figure: table of example DLX instructions]

GCD in GCL
  x, y := X, Y
; do x ≠ y →
     if x > y → x := x-y
     [] x < y → y := y-x
     fi
  od
{ R: x = gcd(X, Y) }
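The same subtraction-based algorithm can be written as a short Python function. This is a sketch for illustration only: the function name gcd_by_subtraction is hypothetical, and it assumes X and Y are positive integers, since the loop does not terminate when either is zero or negative.

```python
def gcd_by_subtraction(X, Y):
    """Greatest common divisor by repeated subtraction (assumes X > 0 and Y > 0)."""
    x, y = X, Y
    while x != y:          # guard of the do ... od loop
        if x > y:          # first alternative of the if ... fi
            x = x - y
        elif x < y:        # second alternative
            y = y - x
    return x               # postcondition R: x = gcd(X, Y)
```

The loop guard x ≠ y guarantees that exactly one of the two alternatives is enabled in each iteration, which is why the if ... fi never aborts.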

GCD in DLX assembler
pre:    LW    R1, 4(R0)       R1 := Mem[4+0]
        LW    R2, 8(R0)       R2 := Mem[8+0]
loop:   SUB   R3, R1, R2      R3 := R1-R2
        BEQZ  R3, "exit"      if (R3=0) then PC := "exit"
        SLT   R4, R1, R2      R4 := (R1<R2)
        BEQZ  R4, "pos2"      if (R4=0) then PC := "pos2"
pos1:   SUB   R2, R2, R1      R2 := R2-R1
        J     "loop"          PC := "loop"
pos2:   SUB   R1, R1, R2      R1 := R1-R2
        J     "loop"          PC := "loop"
exit:   SW    20(R0), R1      Mem[20+0] := R1
        HLT
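To check the listing, the program can be run on a tiny interpreter. The Python sketch below is not a DLX implementation; it makes several simplifying assumptions: instructions are tuples, the program counter is an instruction index rather than a byte address, branches take effect immediately (no delay slot), and memory is a dictionary keyed by address. The names run_dlx, GCD_PROGRAM, LABELS and gcd_on_dlx are hypothetical, and the placement of the labels is inferred from the branch targets.

```python
def run_dlx(program, labels, mem, max_steps=10_000):
    """Execute a list of instruction tuples until HLT; returns the memory dict."""
    reg = [0] * 32
    pc = 0                                  # instruction index (simplification)
    for _ in range(max_steps):
        op, *args = program[pc]
        pc += 1
        if op == "LW":                      # LW rd, disp(rs1)
            rd, disp, rs1 = args
            reg[rd] = mem[disp + reg[rs1]]
        elif op == "SW":                    # SW disp(rs1), rs
            disp, rs1, rs = args
            mem[disp + reg[rs1]] = reg[rs]
        elif op == "SUB":                   # SUB rd, ra, rb
            rd, ra, rb = args
            reg[rd] = reg[ra] - reg[rb]
        elif op == "SLT":                   # SLT rd, ra, rb
            rd, ra, rb = args
            reg[rd] = 1 if reg[ra] < reg[rb] else 0
        elif op == "BEQZ":                  # BEQZ rs, label
            rs, target = args
            if reg[rs] == 0:
                pc = labels[target]
        elif op == "J":                     # J label
            pc = labels[args[0]]
        elif op == "HLT":
            return mem
        else:
            raise ValueError(f"unknown instruction {op!r}")
        reg[0] = 0                          # R0 always reads as zero
    raise RuntimeError("step limit exceeded")


GCD_PROGRAM = [
    ("LW", 1, 4, 0),        # pre:  R1 := Mem[4+0]
    ("LW", 2, 8, 0),        #       R2 := Mem[8+0]
    ("SUB", 3, 1, 2),       # loop: R3 := R1-R2
    ("BEQZ", 3, "exit"),
    ("SLT", 4, 1, 2),       #       R4 := (R1<R2)
    ("BEQZ", 4, "pos2"),
    ("SUB", 2, 2, 1),       # pos1: R2 := R2-R1
    ("J", "loop"),
    ("SUB", 1, 1, 2),       # pos2: R1 := R1-R2
    ("J", "loop"),
    ("SW", 20, 0, 1),       # exit: Mem[20+0] := R1
    ("HLT",),
]
LABELS = {"pre": 0, "loop": 2, "pos1": 6, "pos2": 8, "exit": 10}


def gcd_on_dlx(x, y):
    """Place x and y at addresses 4 and 8, run the program, read address 20."""
    mem = {4: x, 8: y}
    run_dlx(GCD_PROGRAM, LABELS, mem)
    return mem[20]
```

For example, gcd_on_dlx(12, 18) walks through pos1 once and pos2 once before the BEQZ at loop jumps to exit.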

DLX instruction mixes [from H&P, Figs 2.26, 2.27]

DLX interface, state
[Figure: DLX CPU with state pc and Reg (r0, r1, r2, …, r31); signals: address and instruction (instruction memory), address, data and r/w (data memory), clock, interrupt.]

DLX: “Moore machine” (ignoring interrupts)
  Reg[0], pc := 0, 0
; do Mem[Reg[rs1]+immediate], pc, Reg[rd] :=
       if SW → Reg[rd] fi
     , if J → pc+4+offset
       [] BEQZ → if Reg[rs1]=0 → pc+4+immediate
                 [] Reg[rs1]#0 → pc+4
                 fi
       [] else → pc+4
       fi
     , if LW → Mem[Reg[rs1]+immediate]
       [] ADD → ALU(add, Reg[rs1], Reg[rs2])
       fi
  od
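The multiple assignment can be read as a pure next-state function: all three right-hand sides are evaluated on the old state before anything is updated. The Python sketch below illustrates that reading under stated assumptions: Instr and next_state are hypothetical names, only the five instructions named on the slide are covered, the value stored by SW is taken to be Reg[rd] (consistent with the SW line of the GCD listing), and 32-bit wrap-around is ignored.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    """Already-decoded instruction (hypothetical representation)."""
    kind: str              # "SW", "LW", "ADD", "BEQZ" or "J"
    rs1: int = 0
    rs2: int = 0
    rd: int = 0
    immediate: int = 0
    offset: int = 0


def next_state(reg, pc, mem, i):
    """Return (mem_write, new_pc, reg_write), all computed from the old state.

    mem_write is (address, value) or None; reg_write is (register, value) or None.
    """
    address = reg[i.rs1] + i.immediate

    # First component: the memory update (only SW writes memory).
    mem_write = (address, reg[i.rd]) if i.kind == "SW" else None

    # Second component: the next program counter.
    if i.kind == "J":
        new_pc = pc + 4 + i.offset
    elif i.kind == "BEQZ" and reg[i.rs1] == 0:
        new_pc = pc + 4 + i.immediate
    else:
        new_pc = pc + 4

    # Third component: the register update (LW and ADD write Reg[rd]).
    if i.kind == "LW":
        reg_write = (i.rd, mem[address])
    elif i.kind == "ADD":
        reg_write = (i.rd, reg[i.rs1] + reg[i.rs2])
    else:
        reg_write = None

    return mem_write, new_pc, reg_write


# Usage: a load of Mem[Reg[1] + 4] into Reg[2] at pc = 100.
demo_reg = [0] * 32
demo_reg[1] = 5
demo_result = next_state(demo_reg, 100, {9: 42}, Instr("LW", rs1=1, rd=2, immediate=4))
```

Returning the updates instead of applying them keeps the function free of side effects, which mirrors the simultaneous assignment in the notation above.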

DLX: 5-step sequential execution
[Figure: datapath with the five steps IF, ID, EX, MM, WB; labelled elements: pc, npc, constant 4, Instr. mem, ir, Reg, A, B, Imm, aluo, cond, 0?, Mem, lmd.]

Bibliography
• Computer Architecture: A Quantitative Approach (3rd Ed.); John L. Hennessy & David A. Patterson; Morgan Kaufmann Publishers Inc., 1996.
• ARM System Architecture; Steve Furber; Addison Wesley, 1996.
• DSP Processor Fundamentals: Architectures and Features; Phil Lapsley et al. (Berkeley Design Technology Inc.); IEEE, 1996.
• www.handshakesolutions.com
• www.arm.com/news/6936.html
• www.research.philips.com/newscenter/archive/2004/handshake.html

Next week: lecture 8
Outline:
• VLSI programming for high performance.
• Pipelining the DLX.
• Lab work: Assignment 4 (improve the performance of the Tangram DLX).