CS184a: Computer Architecture (Structure and Organization)
Day 10: January 28, 2005
Empirical Comparisons
Caltech CS184 Winter 2005 -- DeHon

Last Time
• Instruction Space Modeling

Today
• Empirical Data
  – Processors
  – FPGAs
  – Custom
    • Gate Array
    • Std. Cell
    • Full
  – Tasks

Empirical Comparisons

Empirical
• Ground the modeling in some concrete data
• Start sorting out
  – custom vs. configurable
  – spatial configurable vs. temporal

Full Custom
• Get to define all layers
• Use any geometry you like
• Only rules are process design rules
• CS 181

Standard Cell Area
[Figure: row of standard cells (inv, nand3, AOI4, nor3) showing cell area and routing channel]
• All cells uniform height
• Width of channel determined by routing
Identify the full custom and standard cell regions on the 386DX die:
http://microscope.fsu.edu/chipshots/intel/386dxlarge.html

MPGA
• Metal Programmable Gate Array
• Gates pre-placed (poly, diffusion)
• Only get to define metal connections
  – Cheap: only have to pay for metal mask(s)

MPGA vs. Custom?
• AMI CICC’83
  – MPGA 1.0
  – Std-Cell 0.7
  – Custom 0.5
• Toshiba DSP
  – Custom 0.3
• Mosaid RAM
  – Custom 0.2
• GE CICC’86
  – MPGA 1.0
  – Std-Cell 0.4--0.7
    • FF/counter 0.7
    • FullAdder 0.4
    • RAM 0.2
MPGA = Metal Programmable Gate Array (traditional Gate Array)

Metal Programmable Gate Arrays

MPGAs
• Modern -- “Sea of Gates”
• yield 35--70%
• maybe 5 Kλ²/gate?
  – (quite a bit of variance)

FPGA Table

Modern FPGAs
• APEX 20K1500E
  – 52K LEs
  – 0.18 µm
  – 24 mm × 22 mm
  – 1.25 Mλ²/LE
• XC2V1000
  – 10.44 mm × 9.90 mm [source: Chipworks]
  – 0.15 µm
  – 11,520 4-LUTs
  – 1.5 Mλ²/4-LUT
[Both also have RAM in cited area]
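The per-cell figures on this slide can be roughly reproduced from the die dimensions. The sketch below assumes λ is half the quoted feature size, which the slide does not state explicitly; the function name is only illustrative.

```python
def lambda_sq_per_cell(die_w_mm, die_h_mm, feature_um, cells):
    """Die area per logic cell, in units of lambda^2.

    Assumption (not stated on the slide): lambda = feature size / 2.
    """
    lam_um = feature_um / 2.0
    die_area_um2 = (die_w_mm * 1e3) * (die_h_mm * 1e3)  # mm -> um on each side
    return die_area_um2 / (lam_um ** 2) / cells


# Numbers taken from the slide.
apex = lambda_sq_per_cell(24.0, 22.0, 0.18, 52_000)
virtex2 = lambda_sq_per_cell(10.44, 9.90, 0.15, 11_520)

print(f"APEX 20K1500E: {apex / 1e6:.2f} M lambda^2 per LE")
print(f"XC2V1000:      {virtex2 / 1e6:.2f} M lambda^2 per 4-LUT")
```

Under that assumption the APEX figure comes out at about 1.25 Mλ²/LE and the XC2V1000 figure at about 1.6 Mλ²/4-LUT, close to the slide's 1.5.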

Conventional FPGA Tile
K-LUT (typical k=4) w/ optional output Flip-Flop

Toronto FPGA Model

How many gates?

“gates” in 2-LUT

Now how many?

Which gives: More usable gates? More gates/unit area?

Gates Required? Depth=3, Depth=2048?

Gate metric for FPGAs?
• Day 8: several components for computations
  – compute element
  – interconnect:
    • space
    • time
  – instructions
• Not all applications need these in the same balance
• Assigning a single “capacity” number to a device is an oversimplification

MPGA vs. FPGA
• MPGA (SOG GA)
  – 5 Kλ²/gate
  – 35--70% usable (50%)
  – 7--17 Kλ²/gate net
• Xilinx XC4K
  – 1.25 Mλ²/CLB
  – 17--48 gates (26?)
  – 26--73 Kλ²/gate net
• Ratio: 2--10 (5)
Adding ~2× Custom/MPGA, Custom/FPGA ~10×
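The net area-per-gate figures follow from dividing raw area by the usable gate count. A minimal sketch of that arithmetic, using only the slide's numbers (the helper name is hypothetical):

```python
def net_area_per_gate(raw_area, gates_per_unit=1.0, usable_fraction=1.0):
    """Area per usable gate, in the same units as raw_area (lambda^2 here)."""
    return raw_area / (gates_per_unit * usable_fraction)


# MPGA: 5 K lambda^2 per gate, 35-70% usable, 50% nominal.
mpga_best = net_area_per_gate(5e3, usable_fraction=0.70)
mpga_nominal = net_area_per_gate(5e3, usable_fraction=0.50)
mpga_worst = net_area_per_gate(5e3, usable_fraction=0.35)

# XC4K: 1.25 M lambda^2 per CLB, 17-48 gates per CLB, 26 nominal.
fpga_best = net_area_per_gate(1.25e6, gates_per_unit=48)
fpga_nominal = net_area_per_gate(1.25e6, gates_per_unit=26)
fpga_worst = net_area_per_gate(1.25e6, gates_per_unit=17)

nominal_ratio = fpga_nominal / mpga_nominal
print(f"MPGA net: {mpga_best / 1e3:.1f} to {mpga_worst / 1e3:.1f} K lambda^2/gate")
print(f"FPGA net: {fpga_best / 1e3:.1f} to {fpga_worst / 1e3:.1f} K lambda^2/gate")
print(f"Nominal FPGA/MPGA ratio: {nominal_ratio:.1f}")
```

Direct division gives roughly 7 to 14 Kλ²/gate for the MPGA, somewhat below the slide's upper figure of 17, and about 26 to 74 Kλ²/gate for the FPGA. The nominal ratio lands near the slide's typical value of 5.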

MPGA vs. FPGA
• MPGA (SOG GA)
  – λ = 0.6 µm
  – t_gd ~ 1 ns
• Xilinx XC4K
  – λ = 0.6 µm
  – 1--7 gates in 7 ns
  – 2--3 gates typical
• Ratio: 1--7 (2.5)

Processors vs. FPGAs

Processors and FPGAs

Component Example
• Single die in 0.35 µm
• XC4085XL-09
  – 3,136 CLBs
  – 4.6 ns
  – 682 Bit Ops/ns
• Alpha 1996
  – 2 64b ALUs
  – 2.3 ns
  – 55.7 Bit Ops/ns
[1 “bit op” = 2 gate evaluations]
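The two peak figures can be checked by dividing bit operations per cycle by cycle time. The sketch assumes one bit op per CLB per cycle for the FPGA, which is inferred from the slide's numbers rather than stated on it.

```python
def bit_ops_per_ns(bit_ops_per_cycle, cycle_ns):
    """Peak bit operations delivered per nanosecond."""
    return bit_ops_per_cycle / cycle_ns


# Assumption: each of the 3,136 CLBs contributes one bit op per 4.6 ns cycle.
fpga_peak = bit_ops_per_ns(3136, 4.6)

# Two 64-bit ALUs give 128 bit ops per 2.3 ns cycle.
alpha_peak = bit_ops_per_ns(2 * 64, 2.3)

peak_ratio = fpga_peak / alpha_peak
print(f"XC4085XL-09: {fpga_peak:.0f} bit ops/ns")
print(f"Alpha:       {alpha_peak:.1f} bit ops/ns")
print(f"FPGA/processor peak ratio: {peak_ratio:.1f}")
```

On these numbers the FPGA's peak is roughly 12 times the processor's, in line with the order-of-magnitude gap quoted later in the lecture.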

Processors and FPGAs

Raw Density Summary
• Area
  – MPGA 2--3× Custom
  – FPGA 5× MPGA
• Area-Time
  – Gate Array 6--10× Custom
  – FPGA 15--20× Gate Array
  – Processor 10× FPGA
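Chaining the area-time multipliers gives each technology's cost relative to full custom. This is a small sketch of that multiplication; the function and variable names are illustrative only.

```python
def chain_multipliers(steps):
    """Accumulate (low, high) multipliers into cumulative ranges vs. custom."""
    low, high = 1.0, 1.0
    cumulative = []
    for name, (step_low, step_high) in steps:
        low *= step_low
        high *= step_high
        cumulative.append((name, low, high))
    return cumulative


# Area-time multipliers from the slide, each relative to the previous entry.
area_time_steps = [
    ("gate array", (6, 10)),
    ("FPGA", (15, 20)),
    ("processor", (10, 10)),
]

relative_to_custom = chain_multipliers(area_time_steps)
for name, low, high in relative_to_custom:
    print(f"{name}: {low:.0f}x to {high:.0f}x custom")
```

The cumulative ranges come out at 6 to 10× for gate arrays, 90 to 200× for FPGAs, and 900 to 2000× for processors, so FPGAs sit roughly two orders of magnitude from custom and processors roughly three.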

Raw Density Caveats
• Processor/FPGA may solve more specialized problem
• Problems have different resource balance requirements
  – …can lead to low yield of raw density

Homework
• Day behind
• Current assignment
  – Involves cascaded PLAs

Task Comparisons

Broadening Picture
• Compare larger computations
• For comparison
  – throughput density metric: results/area-time
    • normalizes out area-time point selection
    • high throughput density gives the most results in a fixed area, or the least area to satisfy a fixed throughput target
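A short sketch of the throughput density metric, with made-up design points chosen only to show how the metric normalizes out the choice of area-time point:

```python
def throughput_density(results, area, time):
    """Results produced per unit area per unit time."""
    return results / (area * time)


# Hypothetical design points for the same architecture:
# a small implementation producing 1 result, and a 4x larger one producing 4
# results in the same time. The numbers are illustrative, not from the lecture.
small = throughput_density(results=1, area=1.0, time=4.0)
large = throughput_density(results=4, area=4.0, time=4.0)

# Area needed to hit a fixed throughput target (results per unit time).
target_throughput = 10.0
area_needed = target_throughput / small

print(small, large, area_needed)
```

Both design points have the same density, so the metric compares technologies rather than particular implementations. A higher density directly lowers the area needed for a given throughput target.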

Multiply

Example: FIR Filtering
$Y_i = w_1 x_i + w_2 x_{i+1} + \dots$
Application metric: TAPs = filter taps multiply accumulate
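A minimal software sketch of the FIR sum above, counting one tap (multiply-accumulate) per weight per output. The function name and test data are illustrative.

```python
def fir(x, w):
    """Compute y[i] = w[0]*x[i] + w[1]*x[i+1] + ... for every full window.

    Each output costs len(w) multiply-accumulates (taps).
    """
    n_out = len(x) - len(w) + 1
    return [sum(w[k] * x[i + k] for k in range(len(w))) for i in range(n_out)]


samples = [1, 2, 3, 4]
weights = [1, 1]
filtered = fir(samples, weights)
taps_executed = len(filtered) * len(weights)
print(filtered, taps_executed)
```

Dividing the taps executed per unit time by the area used gives the throughput density for this task.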

IIR/Biquad
Simplest IIR: $Y_i = A X_i + B Y_{i-1}$
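The first-order recurrence can be sketched as follows. The initial condition of zero for the output before the first sample is an assumption; the slide does not specify one.

```python
def simple_iir(x, a, b, y_init=0.0):
    """Compute y[i] = a*x[i] + b*y[i-1], with y[-1] = y_init (assumed 0)."""
    y_prev = y_init
    out = []
    for sample in x:
        y_prev = a * sample + b * y_prev
        out.append(y_prev)
    return out


impulse_response = simple_iir([1.0, 0.0, 0.0], a=1.0, b=0.5)
print(impulse_response)
```

Unlike the FIR case, each output depends on the previous output, so the feedback loop limits how far the computation can be pipelined.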

DES Keysearch
<http://www.cs.berkeley.edu/~iang/isaac/hardware/>

DNA Sequence Match
• Problem: “cost” of transform S1 → S2
• Given: cost of insertion, deletion, substitution
• Relevance: similarity of DNA sequences
  – evolutionary similarity
  – structure predicts function
• Typically: new sequence compared to large database
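The transform cost described above is commonly computed with a dynamic-programming edit distance. The sketch below is one standard software formulation with unit default costs, not necessarily the formulation used in the implementations the lecture compares.

```python
def edit_cost(s1, s2, ins=1, dele=1, sub=1):
    """Minimum cost to transform s1 into s2 via insert, delete, substitute."""
    # prev[j] holds the cost of turning the first i-1 chars of s1
    # into the first j chars of s2.
    prev = [j * ins for j in range(len(s2) + 1)]
    for i, a in enumerate(s1, start=1):
        curr = [i * dele]  # turning i chars of s1 into an empty string
        for j, b in enumerate(s2, start=1):
            curr.append(min(
                prev[j] + dele,                          # delete a
                curr[j - 1] + ins,                       # insert b
                prev[j - 1] + (0 if a == b else sub),    # match or substitute
            ))
        prev = curr
    return prev[-1]


print(edit_cost("ACGT", "AGT"))
```

Each table cell depends only on its left, upper, and upper-left neighbours, so the cells along an anti-diagonal are independent of one another and can be evaluated in parallel by spatial implementations.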

DNA Sequence Match

Floating-Point Add (single prec.)

Floating-Point Mpy (single prec.)

FPGA vs. Processor FP (Double Precision FP MAC)
[Underwood/FPGA’2004]

Degrade from Peak

Degrade from Peak: FPGAs
• Long path length (not run at peak cycle rate)
• Limited throughput requirement
  – bottlenecks elsewhere limit throughput req.
• Insufficient interconnect
• Insufficient retiming resources (bandwidth)

Degrade from Peak: Processors
• Ops w/ no gate evaluations (interconnect)
• Ops use limited word width
• Stalls waiting for retimed data

Degrade from Peak: Custom/MPGA
• Solve more general problem than required
  – (more gates than really need)
• Long path length
• Limited throughput requirement
• Not needed or applicable to a problem

Degrade Notes
• We’ll cover these issues in more detail as we get into them later in the course

Big Ideas [MSB Ideas]
• Raw densities: custom:ga:fpga:processor
  – 1:5:100:1000
  – close gap with specialization