Evolution of Glen Culler’s Architectures for Interactive Scientific Computing
David Culler
SC 2000 Masterworks, Nov. 7, 2000
11/7/2000 GJC Evolution
Plan for this session
• Evolution of GJC’s Array Processors
  – “Internal architecture that expresses algebra of bilinear forms”
• Video of GJC presentation at the ACM Conference on the History of the Personal Computer
Timeline
• 1951 Rad. Lab
• 1954 Ramo Wooldridge
• 1959 UCSB
• 1961 RW-400 Culler-Fried System
• 1963 UCSB On-Line Computer Classroom
• 1964 Teleputer
• 1966 IBM 360 On-Line System
• 1969 Culler-Harrison Inc.
• 1970 MP 32A Sonar Signal Processor
• 1972 CHI AP 120 Array Processor
• 1974 CHI AP 90B
• 1975 FPS AP 120B
• 1976 UCLA Plasma Simulation System
• 1979 LPCAP
• 1981 CHI-5 General Purpose AP
• 1982 Motorola Single Chip APU
• 1985 Culler-7 MP AP Unix Computer Server
• 1986 Personal Supercomputer
• 1990 Star 910/VP Vector Workstation
MP 32A - 1970
• 16-bit fixed-point processor @ 6 MHz
• Multiple operations per microinstruction
  – 28-bit instructions
  – 2-cycle multiply
• Parallel memories
  – 64-word scratch pad
  – 512-word fixed + 64-word writable instruction memory
  – 64 KW instruction & data memory
• SONAR signal processing
AP 120: 1972-73
• CHI Serial 1
• DARPA acoustical research center at Moffett Field
AP 120 (CHI Serial 2) – 1974
• Constructed to perform signal analysis and speech compression
• Used for real-time digital speech transmission on the ARPANET
  – with SRI, Lincoln Lab, ISI
• Basis for the Floating Point Systems AP 120B
Floating Point Systems AP 120B (1975)
• 6 MHz (167 ns), 38-bit floating point, 64-bit instructions
• Independent floating Add (2-stage) and Mult (3-stage)
  – peak 12 MFLOPS
• Memories
  – Two 32-word data pads (DX, DY): 2 per cycle
  – 2560-word fixed table memory: 1 per cycle, 2-cycle delay
  – 64 KW data memory: ½ per cycle, 3-cycle delay
  – 512-word instruction memory
• Two blocks of 32-word accumulators (dx, dy)
• Address indexing & counting (SPAD & ALU)
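The AP 120B figures tie together with simple arithmetic: a 6 MHz clock gives a period of about 167 ns, and one add plus one multiply starting every cycle gives 2 × 6 = 12 MFLOPS peak. The pipeline depths (2-stage add, 3-stage multiply) set latency, not peak rate. The sketch below is an idealized model, not a description of the real microcode: it assumes each product flows straight from the multiplier into the adder with no extra data-pad or memory cycles, and the function names are hypothetical.

```python
def cycle_time_ns(clock_mhz):
    """Clock period in nanoseconds for a clock given in MHz."""
    return 1000.0 / clock_mhz

def peak_mflops(clock_mhz, flops_per_cycle):
    """Peak rate when every floating unit starts one operation per cycle."""
    return clock_mhz * flops_per_cycle

def chained_multiply_add_cycles(n, mult_stages=3, add_stages=2):
    """Cycles to finish n independent multiply-add pairs, assuming (as an
    idealization) that each product feeds the adder pipeline directly.
    After the combined pipeline fills, one result completes per cycle."""
    if n <= 0:
        return 0
    depth = mult_stages + add_stages
    return depth + n - 1

def sustained_mflops(n, clock_mhz=6.0):
    """Delivered rate for n multiply-adds (2 flops each) under the same model."""
    return 2 * n * clock_mhz / chained_multiply_add_cycles(n)

print(round(cycle_time_ns(6)))            # 167
print(peak_mflops(6, 2))                  # 12
print(chained_multiply_add_cycles(1000))  # 1004
print(sustained_mflops(1000))             # just under 12
```

For long vectors the 4-cycle fill cost is amortized, so the delivered rate approaches the 12 MFLOPS peak; short vectors pay proportionally more for pipeline fill.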
UCLA Plasma Simulation Interactive (PSI) System - 1976
• MP 32A: scheduling and control
• FPS AP 120B: most of the calculation
  – 6 MHz, parallel pipelined multiply-add
• Four CHI IOPs: data movement
  – Fixed-program microprocessors, 4-way transfer at ¼ MW/s
• Math System Language interactive interpreter
• 2-1/2D million-particle simulation: 6 MFLOPS “out of core”
• 3D magnetohydrodynamics @ 4 MFLOPS
• Particle-by-particle or grid-by-grid, not vectorization
• 4 x IBM 360/91 at 1/160th the cost
LPCAP - 1979
• 12/24-bit fixed-point speech processor
• Statistical models of speech
  – Linear Predictive Coding
• Very small form factor (large shoe box)
• Used in ground-air communications
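Linear Predictive Coding models each speech sample as a weighted sum of the previous few samples; the weights come from the signal’s autocorrelation. The sketch below shows one textbook way to compute them (the autocorrelation method with Levinson-Durbin recursion). It is a generic floating-point illustration, not the LPCAP’s 12/24-bit fixed-point implementation, whose details the slide does not give; the function names are hypothetical.

```python
def autocorrelation(signal, max_lag):
    """Biased autocorrelation r[0..max_lag] of a sequence of samples."""
    n = len(signal)
    return [sum(signal[t] * signal[t - lag] for t in range(lag, n))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations by Levinson-Durbin recursion.

    Returns (a, error), where the predictor is
        x_hat[n] = a[0]*x[n-1] + a[1]*x[n-2] + ... + a[order-1]*x[n-order]
    and `error` is the final prediction-error energy.
    """
    if r[0] == 0:
        raise ValueError("zero-energy signal")
    a = []
    error = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient for stage m.
        acc = r[m] - sum(a[i] * r[m - 1 - i] for i in range(m - 1))
        k = acc / error
        # Update the lower-order coefficients, then append the new one.
        a = [a[i] - k * a[m - 2 - i] for i in range(m - 1)] + [k]
        error *= 1.0 - k * k
    return a, error

# Autocorrelation of a first-order process with coefficient 0.5:
print(levinson_durbin([1.0, 0.5, 0.25], 2))  # ([0.5, 0.0], 0.75)
```

In the usage line the recursion correctly finds that one past sample explains the signal, so the second coefficient is zero.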
CHI-5 General-Purpose AP (1980)
• 16/32/48-bit fixed-point speech processor with parallel memories
• Stand-alone or hosted operation
  – Very fast macro-micro dispatch
• Program sequencer (80-bit x 3 KW)
• Three 16-bit adders (linked to form 32 or 48)
• Parallel storage
  – Four accumulators
  – 16/32-bit main memory + 16 address registers
  – Two 1024 x 16-bit array memories
  – 32-bit ROM table memory
• Extensive bussing
• Host block transfer, A/D D/A 8 kHz, serial ports
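“Three 16-bit adders (linked to form 32 or 48)” describes carry chaining: each 16-bit slice passes its carry-out to the next slice’s carry-in, so the same hardware adds 16-, 32-, or 48-bit words. The sketch below models that idea for unsigned operands; it illustrates the general technique rather than the CHI-5’s actual circuit, and the names are hypothetical.

```python
MASK16 = 0xFFFF

def add16(a, b, carry_in):
    """One 16-bit adder slice: returns (16-bit sum, carry out)."""
    total = a + b + carry_in
    return total & MASK16, total >> 16

def linked_add(a, b, width):
    """Add the low `width` bits (16, 32 or 48) of two unsigned integers by
    chaining 16-bit slices, least-significant slice first.
    Returns (sum, final carry out)."""
    if width not in (16, 32, 48):
        raise ValueError("width must be 16, 32 or 48")
    result, carry = 0, 0
    for slice_index in range(width // 16):
        shift = 16 * slice_index
        s, carry = add16((a >> shift) & MASK16, (b >> shift) & MASK16, carry)
        result |= s << shift
    return result, carry

print(linked_add(0xFFFF, 1, 16))  # (0, 1): overflows a single slice
print(linked_add(0xFFFF, 1, 32))  # (65536, 0): carry ripples into slice 2
```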
Motorola APU: 1982
• 3-micron CMOS, platinum silicide, 4 MHz, 100 pins
  – 16 MHz multiplexed instruction port (78-bit instructions)
  – 30.5 K transistors, 296 x 305 mils
  – 20 16-bit data buses, 184 control lines
• 16/32-bit fixed- or floating-point array and signal processor
• Data arithmetic processor
  – 1 multiply, 3 add, 4 accumulators, multiplier storage
• Array memory address controllers
  – 2-D 9-point stencil matrix addressing
• External X, Y, R buses
• Control => micro-nets of array processors
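A 2-D 9-point stencil touches an element and its eight neighbours, so the address controller must generate nine addresses per output point. For a row-major matrix these are the centre address offset by ±1 (columns) and ±row stride (rows). The sketch below is a generic illustration under that row-major assumption, not the APU’s actual addressing logic; the function names are hypothetical.

```python
def stencil9_addresses(base, row_stride, i, j):
    """Addresses of the 3x3 neighbourhood centred on element (i, j) of a
    row-major matrix stored at `base` with `row_stride` words per row,
    ordered row by row from (i-1, j-1) to (i+1, j+1)."""
    centre = base + i * row_stride + j
    return [centre + di * row_stride + dj
            for di in (-1, 0, 1)
            for dj in (-1, 0, 1)]

def apply_stencil9(memory, base, row_stride, i, j, weights):
    """Weighted sum of the 9 neighbours; `weights` is a 9-element list in the
    same row-by-row order. (i, j) must be an interior point."""
    addresses = stencil9_addresses(base, row_stride, i, j)
    return sum(w * memory[addr] for w, addr in zip(weights, addresses))

memory = list(range(100))  # a 10 x 10 matrix at address 0
print(stencil9_addresses(0, 10, 1, 1))  # [0, 1, 2, 10, 11, 12, 20, 21, 22]
laplacian = [0, 1, 0, 1, -4, 1, 0, 1, 0]
print(apply_stencil9(memory, 0, 10, 1, 1, laplacian))  # 0 for a linear ramp
```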
Culler-7 (1985 – PC AT)
• 2-16 MFLOPS Linpack @ 250 K$ - 1 M$
• Bipolar TTL
• 1-4 Computer Processors + Kernel Processor
• A, XY, & D machines per processor
• Dual 64-bit data buses
• 96-bit instructions (48 A, 48 XY)
• Memories
  – Kernel memory (2 MB)
  – Global data memory (5-42 MB, 32-bit VAS)
  – Program memory (256 KB real, 32 MB virtual)
  – Array memory: 4 x 16 KB
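The “96-bit instructions (48 A, 48 XY)” line means each instruction word carries a 48-bit field for the A machine and a 48-bit field for the XY machine, so both are controlled in the same cycle. The sketch below shows only that split; which half holds which field, and the helper names, are assumptions for illustration, since the slide does not give the encoding.

```python
FIELD_BITS = 48
FIELD_MASK = (1 << FIELD_BITS) - 1

def pack_instruction(a_field, xy_field):
    """Pack two 48-bit fields into one 96-bit word.
    Assumed layout (hypothetical): A field in the high half, XY in the low."""
    if not (0 <= a_field <= FIELD_MASK and 0 <= xy_field <= FIELD_MASK):
        raise ValueError("each field must fit in 48 bits")
    return (a_field << FIELD_BITS) | xy_field

def unpack_instruction(word):
    """Split a 96-bit word back into (a_field, xy_field)."""
    return (word >> FIELD_BITS) & FIELD_MASK, word & FIELD_MASK

word = pack_instruction(0xABC, 0x123)
print(hex(word))                # 0xabc000000000123
print(unpack_instruction(word)) # (2748, 291)
```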
Personal Supercomputer (1986)
• ¼ Cray-1S under 100 K$ (< 6 k$ per MIPS)
  – PC-AT / Sun-3 days
• 3-4 MFLOPS DP Linpack (387 does 0.02)
• 200-bit-wide instruction
• Multiple levels of parallelism
  – Multiple processors
  – XY and A machines per processor
  – Multiple operations per instruction in each
• Very high delivered-to-peak ratio
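Reading the slide’s “387 does 0.02” as 0.02 MFLOPS on the same double-precision Linpack measure (an assumption, since no unit is given), the Personal Supercomputer’s advantage over a 387 coprocessor works out to:

$$
\frac{3\ \text{MFLOPS}}{0.02\ \text{MFLOPS}} = 150,
\qquad
\frac{4\ \text{MFLOPS}}{0.02\ \text{MFLOPS}} = 200
$$

That is, roughly two orders of magnitude.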
Star 910/VP (1990)
• 40 MHz SPARC (Cypress chip set)
• TI 8847 CMOS vector processor
  – 80 MFLOPS SP, 160 MFLOPS SP
• Vector DMA, vector cache
  – 1.3 GB/s
• 320 MB/s shared memory system
• 18 MFLOPS Linpack for 200 K$
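The Linpack and price figures allow a rough cost-per-MFLOPS comparison with the Culler-7. For the Culler-7 this assumes the low end of the 2-16 MFLOPS range corresponds to the 250 K$ configuration and the high end to the 1 M$ configuration; the slides do not state this pairing.

$$
\text{Star 910/VP: } \frac{200\ \text{K\$}}{18\ \text{MFLOPS}} \approx 11.1\ \text{K\$/MFLOPS}
$$

$$
\text{Culler-7: } \frac{250\ \text{K\$}}{2\ \text{MFLOPS}} = 125\ \text{K\$/MFLOPS},
\qquad
\frac{1000\ \text{K\$}}{16\ \text{MFLOPS}} = 62.5\ \text{K\$/MFLOPS}
$$

Under that assumption, delivered Linpack performance per dollar improved by roughly 5-11x between 1985 and 1990.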