

Vector Processors
Ryan McPherson
ELEC 6200, Fall 2007
ELEC 6200, Fall 07, Oct 29 McPherson: Vector Processors


Overview
• History
• Description
• Advantages
• Disadvantages
• Applications
• Conclusions


What is a Vector Processor?
• Also called an array processor.
• Runs multiple mathematical operations on multiple data elements simultaneously.
• Common in supercomputers of the 1970s, 80s, and 90s.
• Today most CPU designs contain at least some vector processing instructions, typically referred to as SIMD.
• Vector processors typically operate on a few vector elements per clock cycle in a pipeline, whereas SIMD instructions operate on all elements at once.
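
The difference between the last two bullets can be sketched with a toy model. This is only an illustration, not a description of any real machine: the function name `vector_add` and the `lanes` parameter are hypothetical, and one loop iteration stands in for one clock cycle.

```python
def vector_add(a, b, lanes):
    """Add two equal-length vectors, handling `lanes` elements per simulated cycle.

    `lanes` is a hypothetical parameter: a small value models a pipelined
    vector unit, while lanes == len(a) models an all-at-once SIMD operation.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if lanes < 1:
        raise ValueError("lanes must be at least 1")
    result = []
    cycles = 0
    for start in range(0, len(a), lanes):
        # One simulated cycle: this group of elements is processed together.
        chunk_a = a[start:start + lanes]
        chunk_b = b[start:start + lanes]
        result.extend(x + y for x, y in zip(chunk_a, chunk_b))
        cycles += 1
    return result, cycles


a = list(range(8))
b = [10] * 8

# A few elements per cycle, as in a pipelined vector unit.
pipelined, pipelined_cycles = vector_add(a, b, lanes=2)

# Every element in one step, as in a single wide SIMD instruction.
simd, simd_cycles = vector_add(a, b, lanes=len(a))
```

Both calls produce the same sums; only the number of simulated cycles differs (four versus one for these eight-element vectors).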


History
• (1962) University of Illinois ILLIAC IV: completed 1972 with 64 ALUs, 100-150 MFlops (massively parallel computer)
• (1973) TI's Advanced Scientific Computer (ASC): 20-80 MFlops
• (1975) Cray-1: first to have vector registers instead of keeping data in memory (8 registers with 64 64-bit words in each)
• The Cray-1 had separate pipelines for different instruction types, allowing vector chaining. 80-240 MFlops


How It Works
• Typical Vector Processor (Cell Processor) [3]


Cell Processor
• Result of a partnership between IBM, SCEI/SONY, and TOSHIBA
• Parallelism at all levels
  - Thread: multicore design (8 processors)
  - Instruction: statically scheduled and power aware
  - Data: data-parallel instructions
• Contains a data processor instead of a control system
• Statistics:
  - Observed clock speed: > 4 GHz
  - Peak performance (single precision): > 256 GFlops
  - Peak performance (double precision): > 26 GFlops
  - Local storage size per SPU: 256 KB
  - Total number of transistors: 234 M
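
As a rough sanity check on the single-precision figure, the sketch below multiplies out one plausible set of factors. The slide gives only the 8 processors and the clock speed; the 128-bit register width comes from the VRF description elsewhere in this deck, and counting a fused multiply-add as 2 floating-point operations per lane per cycle is an assumption made here, not something the slides state. The function name `peak_gflops` is hypothetical.

```python
def peak_gflops(cores, lanes, flops_per_lane_per_cycle, clock_ghz):
    """Theoretical peak = cores x lanes x flops per lane per cycle x clock (GHz)."""
    return cores * lanes * flops_per_lane_per_cycle * clock_ghz


# Assumption: a 128-bit register holds four 32-bit single-precision values.
lanes_single = 128 // 32

single_precision_peak = peak_gflops(
    cores=8,                     # "multicore design (8 processors)" from the slide
    lanes=lanes_single,
    flops_per_lane_per_cycle=2,  # assumption: one multiply-add counted as 2 flops
    clock_ghz=4.0,               # "> 4 GHz" from the slide, taken as exactly 4
)

# Total local storage, assuming all 8 processors are SPUs with 256 KB each.
total_local_store_kb = 8 * 256
```

Under these assumptions the product is 256 GFlops, consistent with the "> 256 GFlops" figure when the clock exceeds 4 GHz.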


How It Works (cont'd) [3]
• VRF is dynamic: 128 entries, each 128 b wide (128 x 1, 64 x 2, 32 x 4, 16 x 8, 8 x 16, 1 x 128)
• Stores scalar and vector data
• Computes all answers, then sorts them to reduce latency.
• Accesses memory in blocks.
• Operates on low-latency SRAM
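
The list of layouts in the first bullet (element width x element count) can be reproduced with a short sketch. The helper names `lane_count` and `split_lanes` are hypothetical, a Python integer stands in for a 128-bit register, and the least-significant-lane-first ordering is an assumption chosen for illustration, not taken from the slides.

```python
REGISTER_BITS = 128   # width of one VRF entry, from the slide
VRF_ENTRIES = 128     # number of entries, from the slide
ELEMENT_WIDTHS = (128, 64, 32, 16, 8, 1)  # element widths listed on the slide


def lane_count(element_bits):
    """Number of elements of the given width that fit in one register."""
    if REGISTER_BITS % element_bits != 0:
        raise ValueError("element width must divide the register width")
    return REGISTER_BITS // element_bits


def split_lanes(value, element_bits):
    """Split a 128-bit integer into lanes, least significant lane first (assumed order)."""
    mask = (1 << element_bits) - 1
    return [(value >> (i * element_bits)) & mask
            for i in range(lane_count(element_bits))]


# Width -> element count, matching "128 x 1, 64 x 2, 32 x 4, 16 x 8, 8 x 16, 1 x 128".
layouts = {width: lane_count(width) for width in ELEMENT_WIDTHS}

# The same 128 bits viewed as four 32-bit lanes or two 64-bit lanes.
register = 0x00000004000000030000000200000001
as_words = split_lanes(register, 32)
as_doublewords = split_lanes(register, 64)

# Total register file capacity in bytes.
vrf_bytes = VRF_ENTRIES * REGISTER_BITS // 8
```

The same register contents serve as one 128-bit value, four words, or sixteen bytes depending on the instruction, which is also how one file can hold both scalar and vector data.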


Advantages
• Each result is independent of previous results, allowing deep pipelines and high clock rates.
• A single vector instruction performs a great deal of work, meaning fewer instruction fetches and fewer branches (and in turn fewer mispredictions).
• Vector instructions access memory a block at a time, which allows memory latency to be amortized over many elements.
• Vector instructions access memory with known patterns, which allows multiple memory banks to supply operands simultaneously.
• Fewer memory accesses = faster processing time.
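
The amortization claim can be made concrete with a simple cost model. This is a sketch under stated assumptions: a block access pays one fixed startup latency and then delivers a fixed number of elements per cycle. The function name `cycles_per_element` and the 100-cycle latency are hypothetical values chosen only for illustration.

```python
def cycles_per_element(n_elements, startup_latency, elements_per_cycle=1):
    """Average cycles per element when one access fetches n_elements as a block.

    Assumed model: total cycles = startup latency + transfer cycles, where the
    transfer takes ceil(n_elements / elements_per_cycle) cycles.
    """
    if n_elements < 1 or elements_per_cycle < 1:
        raise ValueError("element counts must be at least 1")
    transfer_cycles = -(-n_elements // elements_per_cycle)  # integer ceiling
    return (startup_latency + transfer_cycles) / n_elements


LATENCY = 100  # hypothetical memory latency in cycles

# One element per access: every element pays the full latency.
scalar_cost = cycles_per_element(1, LATENCY)

# A 64-element block: the latency is paid once and shared by all 64 elements.
vector_cost = cycles_per_element(64, LATENCY)
```

In this model the per-element cost falls from 101 cycles to about 2.6 cycles, and it keeps approaching the pure transfer rate as the block gets longer.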


Disadvantages
• Not as fast with scalar instructions
• Complexity of the multi-ported VRF
• Difficulties implementing precise exceptions
• High price of on-chip vector memory systems
• Increased code complexity


Applications
• Servers
• Home cinema
• Supercomputing
• Cluster computing
• Mainframes
“Astrophysicist Replaces Supercomputer With 8 PS3’s” [2]


References
1. Kozyrakis, C.; Patterson, D. Overcoming the limitations of conventional vector processors. Proceedings of the 30th Annual International Symposium on Computer Architecture, 9-11 June 2003, pp. 399-409.
2. Astrophysicist Replaces Supercomputer with Eight PlayStation 3. Wired Magazine, October 17, 2007.
3. A novel SIMD architecture for the Cell heterogeneous chip-multiprocessor. Hot Chips 17 (Aug 15, 2005).
4. Patterson, David and Hennessy, John. Computer Organization and Design. Chapter 9.11, pp. 48-51. Morgan Kaufmann Publishers, 2005.
5. Gschwind, M. Chip Multiprocessing and the Cell Broadband Engine. Computing Frontiers 2006. © 2006 IBM Corporation.