Evolution of Glen Culler’s Architectures for Interactive Scientific Computing
David Culler
SC 2000 Masterworks
Nov. 7, 2000
11/7/2000 GJC Evolution

Plan for this session
• Evolution of GJC's Array Processors
  – “Internal architecture that expresses algebra of bilinear forms”
• Video of GJC presentation at ACM Conference on the History of the Personal Computer

Timeline
• 1951 Rad. Lab
• 1954 Ramo Wooldridge
• 1959 UCSB
• 1961 RW-400 Culler-Fried System
• 1963 UCSB On-Line Computer Classroom
• 1964 Teleputer
• 1966 IBM 360 On-Line System
• 1969 Culler-Harrison Inc
• 1970 MP 32 A Sonar Signal Processor
• 1972 CHI AP 120 Array Processor
• 1974 CHI AP 90 B
• 1975 FPS AP 120 B
• 1976 UCLA Plasma Simulation System
• 1979 LPCAP
• 1981 CHI-5 General Purpose AP
• 1982 Motorola Single Chip APU
• 1985 Culler-7 MP AP Unix Computer Server
• 1986 Personal Supercomputer
• 1990 Star 910/VP Vector Workstation

MP 32 A - 1970
• 16-bit fixed-point processor @ 6 MHz
• Multiple operations per microinstruction
  – 28-bit instructions
  – 2 cycle multiply
• Parallel memories
  – 64-word scratch pad
  – 512-word fixed + 64-word writable instruction memory
  – 64 KW instruction & data memory
• SONAR signal processing

AP 120: 1972-73
• CHI Serial 1
• DARPA acoustical research center at Moffett Field

AP 120 (CHI Serial 2) – 1974
• Constructed to perform signal analysis and speech compression
• Used for real-time digital speech transmission on ARPANET
  – with SRI, Lincoln Labs, ISI
• Basis for Floating Point Systems AP 120 B

Floating Point Systems AP 120 B (1975)
• 6 MHz (167 ns), 38-bit floating point, 64-bit instructions
• Independent floating Add (2 stage) and Mult (3 stage)
  – peak 12 MFLOPS
• Memories
  – Two 32-word data pads (DX, DY): 2 per cycle
  – 2560 word fixed table memory: 1 per cycle, 2 cycle delay
  – 64 KW data memory: ½ per cycle, 3 cycle delay
  – 512 word instruction memory
• Two blocks of 32 word accumulators (dx, dy)
• Address indexing & counting (SPAD & ALU)
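The 167 ns cycle time and the 12 MFLOPS peak both follow from the 6 MHz clock. The sketch below checks that arithmetic. It assumes the peak figure counts one floating add and one floating multiply started every cycle, which is consistent with the independent Add and Mult pipelines on the slide but is not stated there. The function names are illustrative only.

```python
# Back-of-the-envelope check of the AP 120 B figures quoted on the slide.
CLOCK_MHZ = 6.0        # from the slide
FLOPS_PER_CYCLE = 2    # assumption: one add + one multiply issued per cycle


def cycle_time_ns(clock_mhz):
    """Cycle time in nanoseconds for a clock given in MHz."""
    return 1000.0 / clock_mhz


def peak_mflops(clock_mhz, flops_per_cycle):
    """Peak rate in MFLOPS if every cycle starts `flops_per_cycle` operations."""
    return clock_mhz * flops_per_cycle


if __name__ == "__main__":
    print(f"cycle time: {cycle_time_ns(CLOCK_MHZ):.0f} ns")
    print(f"peak rate:  {peak_mflops(CLOCK_MHZ, FLOPS_PER_CYCLE):.0f} MFLOPS")
```

The pipeline depths (2 and 3 stages) affect latency, not the peak rate, as long as a new operation can enter each pipeline every cycle.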

UCLA Plasma Simulation Interactive (PSI) System - 1976
• MP 32 A: Scheduling and Control
• FPS AP 120 B: most calculation
  – 6 MHz, parallel pipelined Multiply Add
• Four CHI IOPs: data movement
  – Fixed pgm microprocessors, 4 way xfer at ¼ MW/s
• Math System Language interactive interpreter
• 2-1/2 D Million Particle Simulation: 6 MFLOPS “out of core”
• 3 D MagnetoHydrodynamics @ 4 MFLOPS
• Particle-by-particle or grid-by-grid, not vectorization
• 4 x IBM 360/91 at 1/160th the cost

LPCAP - 1979
• 12/24-bit fixed point speech processor
• Statistical models of speech
  – Linear Predictive Coding
• Very small form factor (large shoe box)
• Used in ground-air comm
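For readers unfamiliar with Linear Predictive Coding, here is a minimal floating-point sketch of the textbook autocorrelation method with the Levinson-Durbin recursion. It is an illustration of the general technique, not a description of what LPCAP ran: the slide does not say which LPC formulation the machine used, and the real hardware worked in 12/24-bit fixed point. The function names are hypothetical.

```python
def autocorrelation(x, max_lag):
    """r[k] = sum over n of x[n] * x[n + k], for k = 0..max_lag."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for prediction-error filter coefficients a[0..order], a[0] = 1.

    The predictor implied by the result is
        x[n] ~= -(a[1]*x[n-1] + ... + a[order]*x[n-order]).
    Returns (a, err), where err is the final prediction error energy.
    Assumes r[0] > 0 (a non-silent frame).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err


if __name__ == "__main__":
    # Synthetic test signal: a decaying exponential, x[n] = 0.9 * x[n-1].
    signal = [0.9 ** n for n in range(200)]
    coeffs, residual = levinson_durbin(autocorrelation(signal, 2), 2)
    print(coeffs, residual)
```

On the synthetic signal, the first coefficient comes out close to -0.9 and the second close to zero, since one previous sample predicts the next one almost exactly.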

CHI-5 General-Purpose AP (1980)
• 16/32/48-bit fixed point speech processor with parallel memories
• Stand-alone or hosted operation
  – Very fast macro-micro dispatch
• Program sequencer (80-bit x 3 KW)
• Three 16-bit adders (linked to form 32 or 48)
• Parallel storage
  – Four accumulators
  – 16/32 bit main memory + 16 address registers
  – Two 1024 x 16-bit array memories
  – 32-bit ROM table memory
• Extensive bussing
• Host block transfer, A/D D/A 8 kHz, Serial ports
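The idea of linking 16-bit adders into a 32- or 48-bit adder can be sketched in a few lines. The model below assumes the link is a simple carry chain from each adder to the next; the slide does not say how the CHI-5 actually connected its adders, and the names `add16` and `linked_add` are hypothetical.

```python
MASK16 = 0xFFFF


def add16(a, b, carry_in=0):
    """One 16-bit adder: returns (16-bit sum, carry out)."""
    total = a + b + carry_in
    return total & MASK16, total >> 16


def linked_add(x, y, n_adders):
    """Add two unsigned values using n_adders chained 16-bit adders.

    The carry out of each adder feeds the carry in of the next one.
    Returns (result truncated to 16 * n_adders bits, final carry out).
    """
    result = 0
    carry = 0
    for i in range(n_adders):
        a = (x >> (16 * i)) & MASK16
        b = (y >> (16 * i)) & MASK16
        s, carry = add16(a, b, carry)
        result |= s << (16 * i)
    return result, carry


if __name__ == "__main__":
    # 32-bit add from two adders, 48-bit add from three.
    print(hex(linked_add(0x0000FFFF, 1, 2)[0]))
    print(hex(linked_add(0x0001FFFFFFFF, 1, 3)[0]))
```

With the link disabled, the same three adders would perform three independent 16-bit additions per cycle, which is the trade-off the 16/32/48-bit modes suggest.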

Motorola APU: 1982
• 3 micron CMOS platinum silicide, 4 MHz, 100 pin
  – 16 MHz multiplexed instruction port (78-bit instr)
  – 30.5 K transistors, 296 x 305 mils
  – 20 16-bit data buses, 184 control lines
• 16/32-bit fixed or floating point array and signal processor
• Data arithmetic processor
  – 1 Multiply, 3 Add, 4 accum, multiplier storage
• Array memory address controllers
  – 2 D 9-point stencil matrix addressing
• External X, Y, R busses
• Control => Micro-nets of array processors
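To make "2 D 9-point stencil matrix addressing" concrete, the sketch below generates the nine memory addresses a 3 x 3 stencil touches around one matrix element. It assumes a row-major matrix stored at a base address and handles interior points only. The slide does not describe the APU's actual storage layout or boundary handling, so the layout, the function name, and its parameters are all assumptions.

```python
def stencil9_addresses(base, row, col, n_rows, n_cols):
    """Addresses of the 3 x 3 neighbourhood centred on (row, col).

    Assumes a row-major n_rows x n_cols matrix starting at `base`,
    one word per element. Only interior points are supported.
    """
    if not (1 <= row <= n_rows - 2 and 1 <= col <= n_cols - 2):
        raise ValueError("stencil centre must be an interior point")
    return [base + (row + dr) * n_cols + (col + dc)
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)]


def apply_stencil(memory, addresses, weights):
    """Weighted sum of the nine addressed words (one stencil evaluation)."""
    return sum(w * memory[a] for w, a in zip(weights, addresses))


if __name__ == "__main__":
    # A 3 x 4 matrix holding the values 0..11 at addresses 100..111.
    memory = {100 + i: float(i) for i in range(12)}
    addrs = stencil9_addresses(100, 1, 2, 3, 4)
    print(addrs)
    print(apply_stencil(memory, addrs, [1.0 / 9.0] * 9))  # 3 x 3 average
```

A hardware address controller would step through this pattern without occupying the arithmetic units, which is presumably why the slide lists it separately from the data arithmetic processor.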

Culler-7 (1985 – PC AT)
• 2-16 MFLOPS Linpack @ 250 K$ - 1 M$
• Bipolar TTL
• 1-4 Computer Processors + Kernel Proc
• A, XY, & D machine per processor
• Dual 64-bit data busses
• 96-bit instructions (48 A, 48 XY)
• Memories
  – Kernel memory (2 MB)
  – Global Data memory (5-42 MB, 32-bit VAS)
  – Program memory (256 KB real, 32 MB virtual)
  – Array memory: 4 x 16 KB

Personal Supercomputer (1986)
• ¼ Cray 1 S under 100 K$ (< 6 k$ per MIPS)
  – PC-AT / Sun 3 days
• 3-4 MFLOPS DP Linpack (387 does 0.02)
• 200-bit wide instruction
• Multiple levels of parallelism
  – Multiple processors
  – XY and A Machines per processor
  – Multiple operations per instruction in each
• Very high delivered/peak

Star 910/VP (1990)
• 40 MHz SPARC (Cypress chip-set)
• TI 8847 CMOS vector processor
  – 80 MFLOPS SP, 160 MFLOPS SP
• Vector DMA, Vector Cache
  – 1.3 GB/s
• 320 MB/s shared memory system
• 18 MFLOPS Linpack for 200 K$