Vector Processors
Ryan McPherson
ELEC 6200, Fall 2007
ELEC 6200, Fall 07, Oct 29 McPherson: Vector Processors
Overview
• History
• Description
• Advantages
• Disadvantages
• Applications
• Conclusions
What is a Vector Processor?
• Also called an array processor.
• Performs mathematical operations on multiple data elements simultaneously.
• Common in supercomputers of the 1970s, 1980s, and 1990s.
• Today most CPU designs contain at least some vector processing instructions, typically referred to as SIMD.
• A vector processor typically operates on a few vector elements per clock cycle in a pipeline, whereas SIMD operates on all elements at once.
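The difference between element-at-a-time and vector-style execution can be sketched in a few lines of Python. This is only an illustrative model, not real hardware behavior: the `lanes` parameter and the step counters are assumptions introduced here to show how many issue steps each style needs.

```python
def scalar_add(a, b):
    """Add two lists one element per step, like a scalar loop."""
    out, steps = [], 0
    for x, y in zip(a, b):
        out.append(x + y)
        steps += 1
    return out, steps


def vector_add(a, b, lanes=4):
    """Add two lists `lanes` elements per step.

    `lanes` is a hypothetical parameter standing in for the number of
    elements the hardware handles per step.
    """
    out, steps = [], 0
    for i in range(0, len(a), lanes):
        # One step covers up to `lanes` element pairs.
        out.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        steps += 1
    return out, steps


a = list(range(16))
b = [10 * x for x in a]
s_out, s_steps = scalar_add(a, b)
v_out, v_steps = vector_add(a, b, lanes=4)
```

Both functions produce the same result; only the number of steps differs (16 versus 4 for these inputs).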
History
• (1962) University of Illinois ILLIAC IV: completed in 1972 with 64 ALUs, 100-150 MFlops (massively parallel computer)
• (1973) TI's Advanced Scientific Computer (ASC): 20-80 MFlops
• (1975) Cray-1: first to have vector registers instead of keeping data in memory (8 registers of 64 64-bit words each)
• The Cray-1 had separate pipelines for different instruction types, allowing vector chaining. 80-240 MFlops
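The benefit of vector chaining can be illustrated with a simple textbook-style timing model. This is a sketch under stated assumptions, not Cray-1 measurements: the startup latencies (7 and 6 cycles) and the function names are hypothetical values chosen for illustration, and the model assumes one element completes per cycle once a pipeline is full.

```python
def unchained_cycles(n, startups):
    """Each dependent vector instruction waits for the previous one
    to finish all n elements before it starts."""
    return sum(s + n for s in startups)


def chained_cycles(n, startups):
    """Results are forwarded element by element, so the pipelines
    overlap and only the startup latencies add up."""
    return sum(startups) + n


n = 64               # one full 64-element vector register
startups = (7, 6)    # assumed multiply and add pipeline latencies
without_chaining = unchained_cycles(n, startups)
with_chaining = chained_cycles(n, startups)
```

Under these assumptions the dependent multiply-add pair takes 141 cycles unchained and 77 cycles chained.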
How It Works
• Typical Vector Processor (Cell Processor) [3]
Cell Processor
• Result of a partnership between IBM, SCEI/Sony, and Toshiba
• Parallelism at all levels
  - Thread: multicore design (8 processors)
  - Instruction: statically scheduled and power aware
  - Data: data-parallel instructions
• Contains a data processor instead of a control system
• Statistics:
  - Observed clock speed: > 4 GHz
  - Peak performance (single precision): > 256 GFlops
  - Peak performance (double precision): > 26 GFlops
  - Local storage size per SPU: 256 KB
  - Total number of transistors: 234 M
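The single-precision peak figure can be reproduced with back-of-the-envelope arithmetic. The sketch below assumes that each of the 8 processors issues one 4-wide (128-bit / 32-bit) fused multiply-add per cycle, counted as 2 floating-point operations per lane, and uses 4 GHz as the clock; the multiply-add assumption and the function name are not stated on the slide.

```python
def peak_gflops(cores, lanes, flops_per_lane, clock_ghz):
    """Peak throughput = cores x lanes x flops per lane per cycle x GHz."""
    return cores * lanes * flops_per_lane * clock_ghz


single_precision = peak_gflops(
    cores=8,            # 8 processors, from the slide
    lanes=128 // 32,    # four 32-bit lanes in a 128-bit register
    flops_per_lane=2,   # assumed: multiply-add counts as 2 operations
    clock_ghz=4.0,      # lower bound of the observed clock speed
)
```

With these inputs the product is 256 GFlops, which lines up with the quoted single-precision peak.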
How It Works (cont'd) [3]
• The vector register file (VRF) is dynamic: 128 entries, each 128 bits wide (128 x 1, 64 x 2, 32 x 4, 16 x 8, 8 x 16, 1 x 128)
• Stores scalar and vector data
• Computes all answers, then sorts them to reduce latency
• Accesses memory in blocks
• Operates on low-latency SRAM
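The register partitions listed above all multiply out to 128 bits. The following sketch reads each pair as element width in bits times number of lanes, which is an interpretation rather than something the slide states; the constant and function names are hypothetical.

```python
REGISTER_BITS = 128    # width of one register, from the slide
REGISTER_COUNT = 128   # number of entries, from the slide


def lanes_for(element_bits):
    """Number of elements of the given width that fit in one register."""
    if REGISTER_BITS % element_bits != 0:
        raise ValueError("element width must divide the register width")
    return REGISTER_BITS // element_bits


# Map each element width (bits) to its lane count.
layouts = {width: lanes_for(width) for width in (128, 64, 32, 16, 8, 1)}

# Total capacity of the register file in bytes.
total_bytes = REGISTER_COUNT * REGISTER_BITS // 8
```

For example, `layouts[32]` is 4, the familiar four single-precision values per register, and the whole file holds 2048 bytes.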
Advantages
• Each result is independent of previous results, allowing deep pipelines and high clock rates.
• A single vector instruction performs a great deal of work, meaning fewer instruction fetches and fewer branches (and in turn fewer mispredictions).
• Vector instructions access memory a block at a time, which allows memory latency to be amortized over many elements.
• Vector instructions access memory with known patterns, which allows multiple memory banks to supply operands simultaneously.
• Fewer memory accesses mean faster processing.
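The amortization of memory latency over a block can be shown with a toy cost model. This is a rough sketch, assuming a fixed latency per memory request and one element delivered per cycle once a block transfer has started; the latency of 100 cycles and the function names are hypothetical.

```python
def scalar_access_cycles(n, latency):
    """Every element is a separate request and pays the full latency."""
    return n * latency


def block_access_cycles(n, latency):
    """One request pays the latency once, then streams n elements
    at one element per cycle."""
    return latency + n


n, latency = 64, 100   # assumed vector length and memory latency
scalar_cost = scalar_access_cycles(n, latency)
block_cost = block_access_cycles(n, latency)
per_element = block_cost / n   # amortized cycles per element
```

In this model the block access costs 164 cycles instead of 6400, so the latency per element drops from 100 cycles to about 2.6.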
Disadvantages
• Not as fast with scalar instructions
• Complexity of the multi-ported VRF
• Difficulties implementing precise exceptions
• High price of on-chip vector memory systems
• Increased code complexity
Applications
• Servers
• Home Cinema
• Supercomputing
• Cluster Computing
• Mainframes
“Astrophysicist Replaces Supercomputer With 8 PS3s” [2]
References
1. Kozyrakis, C.; Patterson, D. Overcoming the limitations of conventional vector processors. Proceedings of the 30th Annual International Symposium on Computer Architecture, 9-11 June 2003, pp. 399-409.
2. Astrophysicist Replaces Supercomputer with Eight PlayStation 3. Wired Magazine, October 17, 2007.
3. A novel SIMD architecture for the Cell heterogeneous chip-multiprocessor. Hot Chips 17 (Aug 15, 2005).
4. Patterson, David and Hennessy, John. Computer Organization and Design. Chapter 9.11, pp. 48-51. Morgan Kaufmann Publishers, 2005.
5. M. Gschwind. Chip Multiprocessing and the Cell Broadband Engine. Computing Frontiers 2006. © 2006 IBM Corporation.