Limits of Parallel/Distributed Computing
Prof. Erik DIRKX, VUB-INFO-PADS
Erik.Dirkx@vub.ac.be
http://parallel.vub.ac.be
2005 ©Erik F. Dirkx
Introduction
• (Cluster) computers : a tool for a new way of doing science & engineering (cheap : BYO !!!)
• “Hardware”
• “Software”
Need for Speed
• Processing :
  signal : structured (e.g. MP3)
  dynamic : unstructured
• Data : pictures, movies, simulation, …
• Interconnect : bandwidth >< latency
Scalable computing : COW
• Cluster Of Workstations (TX : Ranch.OW)
Fundamental Observation (Erik’s law) :
• Only general purpose programmable devices will survive in the long term
  (>< marginal production cost = 0)
  (remember : 20%+ of earth = Si …)
yet … “programmable” = ??
The original cluster (neo-cortex)
• 10**11 general purpose neurons => compute & memory = “gray” matter
• 10**5 connections / neuron => interconnect = “white” matter
• Switching time > 1 ms (digital PPM)
• Input ~ 100 Mbps (pre-thalamus)
• Output << Input
• Storage : ~ 10**17 bits (do not drink & think …)
• ~ 20 W, electro-chemical, carbohydrate powered
General Purpose (neo)Cortex
• General purpose “cellular columns” (e.g. blind musician)
• 6 layers : 1 in, 1 out, 4 compute
• 4 A4 pages, constant density
• Tuned by “emotional” subsystem : real time, pre-emptive priorities
• Hierarchy root = “prefrontal cortex” (L=+, R=-)
Comparison
• Human (general purpose)
  speed -- [12 km/h]
  endurance -- [42.195 km]
  power -- [200 W @ 120 km]
  force -- [52*13 ???]
  accuracy --
• Other predator (special purpose)
  speed ++ (e.g. cheetah)
  endurance ++ (e.g. orca)
  power ++ (e.g. hyena)
  force ++ (e.g. shark)
  accuracy ++ (e.g. eagle)
++ @ price of general purposeness
(re?)-configurability +++
  => Learning (Software ?)
  => Genetics (Hardware ?)
Fundamental Bound (to Your enthusiasm ?)
• (physical) technology
• problem
Granularity
• Critical parameter, informally :
  Gray > (compute cap.)
  White < (communication cap.)
  => Hard <> Easy problems
Granularity (II)
• Experience : situation, optimum :
  too coarse => sub-optimal : !
  too fine => comm bottleneck : !
• Tcomp = #instr * CPI * 1/f = Rproblem * Rmachine
• Tcomm = latency + #bits/bandwidth ?=? Cproblem * Cmachine
• Cproblem = #databits
• Cmachine = …
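The two timing formulas on this slide can be tried out numerically. The sketch below is only an illustration: the function names and the machine parameters (1 GHz clock, CPI of 1, 1 Gb/s link, 100 µs latency, 8000-bit messages) are assumptions chosen for the example, not values from the slides. It reports the ratio Tcomp/Tcomm as one possible informal measure of granularity.

```python
def t_comp(n_instr: float, cpi: float, f_hz: float) -> float:
    """Compute time in seconds: #instr * CPI * 1/f."""
    return n_instr * cpi / f_hz


def t_comm(n_bits: float, latency_s: float, bandwidth_bps: float) -> float:
    """Communication time in seconds: latency + #bits/bandwidth."""
    return latency_s + n_bits / bandwidth_bps


def granularity(n_instr, cpi, f_hz, n_bits, latency_s, bandwidth_bps) -> float:
    """Ratio Tcomp / Tcomm for one compute-then-communicate step."""
    return t_comp(n_instr, cpi, f_hz) / t_comm(n_bits, latency_s, bandwidth_bps)


if __name__ == "__main__":
    # Hypothetical machine: 1 GHz, CPI = 1, 1 Gb/s link, 100 us latency.
    for n_instr in (1e3, 1e5, 1e7):
        g = granularity(n_instr, 1.0, 1e9,
                        n_bits=8_000, latency_s=1e-4, bandwidth_bps=1e9)
        print(f"{n_instr:>10.0f} instr per message: Tcomp/Tcomm = {g:.3f}")
```

A ratio well below 1 corresponds to the "too fine => comm bottleneck" case: the processor spends most of each step waiting for the interconnect.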
Granularity (III)
• Cmachine =
• bandwidth ~ 10**12 b/s => 1/bw ~ 1 ps
• latency : 10 GHz = 0.1 ns = 3 cm (vacuum); 3 mm (Si) => 1 ps ~ 30 µm
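The distances on this slide follow from multiplying a time interval by the signal propagation speed. The sketch below reproduces that arithmetic. The velocity factor of 0.1 for silicon is an assumption inferred from the slide's own 3 cm versus 3 mm figures, and the function name is hypothetical.

```python
C_VACUUM_M_PER_S = 299_792_458.0  # speed of light in vacuum


def distance_m(time_s: float, velocity_factor: float = 1.0) -> float:
    """Distance a signal covers in time_s at velocity_factor * c."""
    return C_VACUUM_M_PER_S * velocity_factor * time_s


if __name__ == "__main__":
    # One clock period at 10 GHz is 0.1 ns.
    print(f"0.1 ns, vacuum : {distance_m(1e-10) * 100:.1f} cm")
    # Assumed factor 0.1 for on-chip (Si) wiring, as the slide's numbers imply.
    print(f"0.1 ns, Si     : {distance_m(1e-10, 0.1) * 1e3:.1f} mm")
    # One bit time at 10**12 b/s is 1 ps.
    print(f"1 ps, Si       : {distance_m(1e-12, 0.1) * 1e6:.0f} um")
```

The point of the slide survives the rounding: at these rates a single bit time corresponds to a distance far smaller than any practical machine, so latency is bounded by physics rather than by engineering.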
Granularity (IV)
• Amdahl sections (i.e. bottlenecks, rest = “easy” parallelism) : #bits ~ 1 !!
• ?? How to construct a “compiler”/computer system with dynamically tunable machine granularity to adapt to dynamically varying demands on R and C from application(s)
• Structured ?! [ad hoc] / Unstructured ??
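The "Amdahl sections" refer to Amdahl's law, which bounds the speed-up of a program whose serial part cannot be parallelised. A minimal sketch of the standard formula follows; the function name and the 90% parallel fraction are illustrative assumptions, not figures from the slides.

```python
def amdahl_speedup(parallel_fraction: float, n_procs: int) -> float:
    """Amdahl's law: speed-up = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_procs)


if __name__ == "__main__":
    # Hypothetical program that is 90% parallelisable.
    for n in (1, 10, 100, 1000):
        print(f"{n:>5} processors: speed-up = {amdahl_speedup(0.9, n):.2f}")
```

However many processors are added, the speed-up in this example stays below 1 / (1 - 0.9) = 10, which is why the serial bottlenecks dominate the design question raised on the slide.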
Fine Grain Parallelism
• FPGA implementation (NOT automatic)
• ATM switch sim @ faster than real time …
• Speed-up = traffic pattern dependent
Conclusion
• Cluster computing is here to stay
• Cluster computing is a vehicle for a new way of doing science & engineering (for the masses)
• COW is only one example of compute engines satisfying fundamental laws
• (Digital) hardware : understood & economically sound
• “Software” : cf. 1950’s ad hoc; need for language(s), theoretical support, runtime, fault tolerance, …
• VUB (INFO) : “Advanced Computer Architecture” + “Concurrente Systemen” (NL) / “Parallel Systems” (E)
• http://parallel.vub.ac.be
General Purpose “Computer”
• Amplifying elements : transistor (n*1000 atoms + quantum mechanics)
• Connecting elements : wire/fibre/wireless (Maxwell equations)
Cluster : BYO 101
• Step 1 : design Your “compute element” : e.g. look-up table, ALU, … and build a BIG factory
• Step 2 : design Your “memory element” : e.g. a capacitor, MRAM, … and build a BIG factory
• Step 3 : design Your “switch” : e.g. cross-bar and build a BIG factory
Generic Multiprocessor
[Diagram labels : Interconnect (1), Processors/Memory, Interconnect (2), Front-end Processors/Memory, VUB/Internet]
• Interconnect 1 : high bandwidth, low latency, DL-free (!!)
• Interconnect 2 : OTS TCP/IP
Cluster BYO (1+2)
• Step 1 // Step 2
  PC based COW : B(uild) Y(our) O(wn)
  Motherboards : bi- or quad-CPU ?!
  CPU, Memory : OTS
  Disk : RAID 5
  Hierarchical control (remember pre-frontal cortex …)
  Bottleneck !!!!
Cluster BYO (3)
• Step 3
  Buy a few km of wire/fibre
  Buy switches
  => compute switches (PVM/MPI)
  => diagnostic out-of-band (TCP/IP)
  => KVM switches
VUB INFO : BYO (1995)
• Design & Test ! Experience ! Students
• Blue Gene (2004)
  256000 OTS CPUs
  4 GB/CPU DRAM
  10 Gbps/node
  Power : ??
  MTBF : ??
  job run-time : months …
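The slide leaves MTBF as an open question, and no figure is supplied here either. The sketch below only shows why the question matters at this scale, under two loudly stated assumptions: node failures are independent with a constant failure rate, and the per-node MTBF of 100000 hours is a purely hypothetical placeholder, not a Blue Gene specification. The function names are also hypothetical.

```python
import math


def system_mtbf_hours(node_mtbf_hours: float, n_nodes: int) -> float:
    """System MTBF when any single node failure counts as a system failure.

    Assumes independent nodes with constant (exponential) failure rates,
    so the rates add up and the MTBF divides by the node count.
    """
    return node_mtbf_hours / n_nodes


def p_no_failure(job_hours: float, mtbf_hours: float) -> float:
    """Probability that a job of job_hours sees no failure at all."""
    return math.exp(-job_hours / mtbf_hours)


if __name__ == "__main__":
    HYPOTHETICAL_NODE_MTBF_H = 100_000.0  # assumption, not a measured value
    mtbf = system_mtbf_hours(HYPOTHETICAL_NODE_MTBF_H, 256_000)
    print(f"system MTBF ~ {mtbf * 60:.0f} minutes")
    print(f"P(24 h job runs without failure) = {p_no_failure(24.0, mtbf):.2e}")
```

Under these assumptions a job lasting months cannot rely on failure-free hardware, which ties in with the fault-tolerance item in the Conclusion slide.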
Other Examples
• Field Programmable Gate Array
  Satisfies description, Fundamental Limits !
  “Program”/(Re)configure ??
• Hybrid : COW cluster + accelerator in each node
  e.g. Deep Blue : 32 * (1 + 8)
  => Variable Granularity … (someone interested in an interesting Ph.D topic ??)
  => VUB / Erasmus hogeschool
Lookahead Accumulation in Discrete Event Simulation
• Improve Gproblem through compile time aggregation
• Asynchronous synchronization system (!)
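The slides do not describe the accumulation technique itself. As background only, the sketch below shows the generic role of lookahead in conservative parallel discrete event simulation, not the author's method: each neighbour promises to send no event stamped earlier than its own clock plus its lookahead, so a logical process may safely handle every pending event below the smallest such bound. All names and numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def safe_horizon(neighbours: Dict[str, Tuple[float, float]]) -> float:
    """Earliest timestamp any neighbour could still send.

    neighbours maps a name to (local_clock, lookahead); each neighbour
    guarantees it will send nothing stamped before clock + lookahead.
    """
    return min(clock + lookahead for clock, lookahead in neighbours.values())


def safe_events(pending: List[float], horizon: float) -> List[float]:
    """Pending event timestamps that can be processed without rollback."""
    return sorted(t for t in pending if t < horizon)


if __name__ == "__main__":
    # Hypothetical state of two neighbouring logical processes.
    neighbours = {"a": (10.0, 2.0), "b": (8.0, 5.0)}
    horizon = safe_horizon(neighbours)
    print("horizon:", horizon)
    print("safe   :", safe_events([9.0, 11.5, 12.0, 14.0], horizon))
```

A larger lookahead pushes the horizon further out, so more events are processed per synchronization step; in the vocabulary of the Granularity slides, this coarsens the problem granularity.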
System Software
• Compute => Sequential languages (?? non-determinism, synchronization)
• Storage => Virtual memory, RAID, …
• Communication => Communication library
  e.g. “Parallel Virtual Machine” : Open
  e.g. “Message Passing Interface” : Standard
• Fundamental issue : Parallel Operating System
  n*Linux + MPI … (21st century Microsoft/Intel ?)
Application Software
• BYO : course “Parallel Systems”, VUB
• Public domain packages => granularity !!
  Numerical, well structured
  non-numerical, dynamic, ill structured
  databases (e.g. Google)
History
• 1985 :
  B Army mainframe + staff : < 100 Kops, x00 MB, n*4800 bps
  PC to “fine tune” + 1 temporary mil. service : > 1 Mips, 20 MB, 10 Mbps
  + 1 EE/CS student in search of a Ph.D topic
• 1990 : “A Parallel Simulation Testbed for Computer Networks” : solved 0.1, posed 10 questions …
• 1992 : IBM T.J. Watson : Vulcan/Deep Blue
• 1993 : ETL, Tsukuba : Heterogeneous granularity
• 1999 : Xilinx, San Jose : Reconfiguration
Vrije Universiteit Brussel : location
• Belgium : an EU experiment avant la lettre ??
• Holland (A’dam)
• France (Paris)
• ° [Alamo – 6]
• 3 languages (NL, F, D)
• 5 governments (w/o county, city !)
• NO supercomputer
• (Meta) stable ??
• ~ Free education
• 60 km coast / 10 M people, 1 2 L highway
• No capital gains tax …
• L&H … (Martha ??)
• Airforce : F16 – ECM
• CEC location & 1 of the capitals …
Alternative
• Special purpose device => temporary => point solution ???
  $$$$ [design, debug, …]
  Dynamic environment !?
  Power (cf. context) ?!
  Patent (cf. EU software patent dispute …)
General Purpose Hardware
• P(rocessor) => ALU (compute) + CU (control)
• M(emory) => as much as possible => as fast as possible
• S(witch) => throughput (telecom !) => latency (telecom ?)