
The Parallel Computing Laboratory (Parallel Hardware, Parallel Applications, Parallel Software)
Krste Asanovic, Ras Bodik, Jim Demmel, Tony Keaveny, Kurt Keutzer, John Kubiatowicz, Edward Lee, Nelson Morgan, George Necula, Dave Patterson, Koushik Sen, John Wawrzynek, David Wessel, and Kathy Yelick
March 17, 2008

A Parallel Revolution, Ready or Not
• Embedded: per-product ASIC to programmable platforms
  - Multicore chip is the most competitive path: amortize design costs + reduce design risk + flexible platforms
• PC, Server: Power Wall + Memory Wall = Brick Wall
  - End of the way we have built microprocessors for the last 40 years
  - New Moore’s Law is 2X processors (“cores”) per chip every technology generation, but the same clock rate
  - “This shift toward increasing parallelism is not a triumphant stride forward based on breakthroughs …; instead, this … is actually a retreat from even greater challenges that thwart efficient silicon implementation of traditional solutions.” (The Parallel Computing Landscape: A Berkeley View, Dec 2006)
• Sea change for the HW and SW industries, since this changes the model of programming and debugging
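The “New Moore’s Law” above is plain geometric growth. After g generations, a chip that starts with c0 cores has c0 · 2^g cores, and the clock rate stays flat. The Python sketch below tabulates this. The starting values (a 2-core, 3.0 GHz chip) are hypothetical and do not come from the slide, so treat the output as an illustration rather than a projection.

```python
BASE_CORES = 2      # hypothetical starting core count, not from the slide
CLOCK_GHZ = 3.0     # hypothetical clock; held fixed, per "same clock rate"


def cores_at(generation: int, base_cores: int = BASE_CORES) -> int:
    """Cores per chip after `generation` technology generations,
    doubling every generation ("2X processors per chip")."""
    return base_cores * 2 ** generation


for g in range(6):
    n = cores_at(g)
    # Peak throughput grows with the core count, but a single thread still
    # runs on one core at the same clock, so it sees no speedup.
    print(f"gen {g}: {n:2d} cores @ {CLOCK_GHZ} GHz -> peak {n}x one core, single thread 1x")
```

By generation 4 or 5 this reaches the 32 and 64 cores per chip mentioned on the next slide. Only software that can spread its work across those cores benefits.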

P.S. Parallel Revolution May Fail
• John Hennessy, President, Stanford University, 1/07: “…when we start talking about parallelism and ease of use of truly parallel computers, we're talking about a problem that's as hard as any that computer science has faced. … I would be panicked if I were in industry.”
  - “A Conversation with Hennessy & Patterson,” ACM Queue Magazine, 4:10, 1/07.
• 100% failure rate of parallel computer companies
  - Convex, Encore, Inmos (Transputer), MasPar, NCUBE, Kendall Square Research, Sequent, (Silicon Graphics), Thinking Machines, …
• What if IT goes from a growth industry to a replacement industry?
  - If SW can’t effectively use 32, 64, … cores per chip, SW is no faster on a new computer
  - People only buy a new computer when the old one wears out
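The claim that software which cannot use the extra cores is “no faster on a new computer” can be quantified with Amdahl’s law. The slide does not cite it. If a fraction p of the work scales perfectly over n cores and the rest stays serial, the speedup is 1 / ((1 - p) + p / n). The Python sketch below assumes exactly that idealized model and ignores communication and synchronization costs.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup of a program whose `parallel_fraction` of the
    work scales perfectly across `cores`, while the rest stays serial."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


for p in (0.0, 0.5, 0.9, 0.99):
    row = ", ".join(f"{n} cores: {amdahl_speedup(p, n):5.2f}x" for n in (32, 64))
    print(f"p = {p:.2f} -> {row}")
```

With p = 0 (purely serial code), the speedup is exactly 1 no matter how many cores the chip has. Even with p = 0.9, 64 cores give under 9x, because the serial 10% caps the speedup at 10x.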

Par Lab Research Overview
Goal: Easy to write correct programs that run efficiently on manycore
• Applications: Personal Health, Image Retrieval, Hearing/Music, Speech, Parallel Browser
• Motifs
• Productivity Layer: Composition & Coordination Language (C&CL), C&CL Compiler/Interpreter, Parallel Libraries, Parallel Frameworks
• Efficiency Layer: Efficiency Languages, Sketching, Autotuners, Legacy Code, Schedulers, Communication & Synch. Primitives, Efficiency Language Compilers
• OS: OS Libraries & Services, Legacy OS, Hypervisor
• Arch.: Multicore/GPGPU, RAMP Manycore
• Correctness (spans the layers): Static Verification, Type Systems, Directed Testing, Dynamic Checking, Debugging with Replay
• Diagnosing Power/Performance (spans the layers)

Compelling Client Applications
• Image Query by Example: image database with 1000’s of images
• Music/Hearing
• Robust Speech Input
• Parallel Browser
• Personal Health

“Motif” Popularity (Red = Hot, Blue = Cool)
[Figure: color-coded chart of motif popularity across the compelling applications]
How do the compelling apps relate to the 13 motifs?

Developing Parallel Software
• 2 types of programmers, so 2 layers
• Efficiency Layer (10% of today’s programmers)
  - Expert programmers build Frameworks & Libraries, Hypervisors, …
  - “Bare metal” efficiency is possible at the Efficiency Layer
• Productivity Layer (90% of today’s programmers)
  - Domain experts / naïve programmers productively build parallel apps using frameworks & libraries
  - Frameworks & libraries are composed to form app frameworks
• Effective composition techniques allow the efficiency programmers to be highly leveraged
  - Create a language for Composition and Coordination (C&C)
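The two-layer split can be sketched in a few lines. An efficiency-layer expert writes a reusable parallel pattern once. A productivity-layer programmer then supplies only domain code and never touches threads or locks. In this minimal Python sketch, `parallel_map_reduce` is a hypothetical stand-in for such a framework, not a Par Lab API, and the word-count task is purely illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


# Efficiency layer: a hypothetical framework written once by an expert.
# It owns the worker pool and the combination step.
def parallel_map_reduce(
    fn: Callable[[T], R],
    combine: Callable[[R, R], R],
    items: Iterable[T],
    initial: R,
    workers: int = 4,
) -> R:
    """Apply `fn` to each item on a pool of worker threads, then fold the
    results (in input order) with `combine`, starting from `initial`."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(fn, items))
    return reduce(combine, mapped, initial)


# Productivity layer: domain code that never touches threads or locks.
def word_count(doc: str) -> int:
    return len(doc.split())


docs = ["parallel hardware", "parallel applications", "parallel software the parallel"]
total = parallel_map_reduce(word_count, lambda a, b: a + b, docs, 0)
print(total)  # 8
```

The sketch shows the division of labor, not real speedup. In CPython, threads do not run pure-Python CPU-bound code in parallel. The efficiency programmer could switch the framework internals to `ProcessPoolExecutor` without changing any productivity-layer code.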

Par Lab OS Research
[Figure: logical system view and physical system view. Logical view: Root Partition Manager, Web Browser, Plug-in 1, Plug-in 2, a suspended plug-in, and a device driver service, each in its own partition. Physical view: the partitions run on a Partition Hypervisor over manycore hardware.]
• The Root partition implements the policy to timeshare partitions
• A suspended partition (a passive data structure in memory) is not mapped by Root onto physical cores
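The slide states the timesharing policy only in words. The minimal Python sketch below assumes a simple rotating greedy policy. In each time slice, the root walks the runnable partitions from a rotating starting point and grants each one its requested cores if they still fit. Suspended partitions are filtered out first, so they are never mapped onto physical cores. Every name here, including `Partition` and `timeshare`, is hypothetical and is not Par Lab’s interface.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    name: str
    cores_wanted: int
    suspended: bool = False  # suspended = passive data structure in memory


def timeshare(partitions: list[Partition], physical_cores: int, slices: int) -> list[dict[str, int]]:
    """Hypothetical root-partition policy: in each time slice, walk the
    runnable partitions starting at a rotating offset and give each one its
    requested cores if they still fit. Suspended partitions are skipped, so
    they are never mapped onto physical cores."""
    runnable = [p for p in partitions if not p.suspended]
    plan = []
    for t in range(slices):
        free = physical_cores
        mapping: dict[str, int] = {}
        for i in range(len(runnable)):
            p = runnable[(t + i) % len(runnable)]
            if p.cores_wanted <= free:
                mapping[p.name] = p.cores_wanted
                free -= p.cores_wanted
        plan.append(mapping)
    return plan


parts = [
    Partition("Web Browser", 4),
    Partition("Plug-in 1", 4),
    Partition("Plug-in 2", 2),
    Partition("Suspended Plug-in", 8, suspended=True),
]
for t, mapping in enumerate(timeshare(parts, physical_cores=8, slices=3)):
    print(t, mapping)
# 0 {'Web Browser': 4, 'Plug-in 1': 4}
# 1 {'Plug-in 1': 4, 'Plug-in 2': 2}
# 2 {'Plug-in 2': 2, 'Web Browser': 4}
```

Rotating the starting point keeps any single runnable partition from being starved across slices. A real policy would also weigh priorities, deadlines, and the cost of remapping cores.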

InfiniCore Architecture Overview
Four separate on-chip network types:
• Control networks combine 1-bit signals in a combinational tree for interrupts & barriers
• Active message networks carry register-to-register messages between cores
• The L2/Coherence network connects L1 caches to L2 slices and indirectly to memory
• The Memory network connects L2 slices to memory controllers
I/O and accelerators potentially attach to all network types. Flash replaces rotating disks. The only high-speed I/O is network & display.
[Figure: each core with its L1 I$ and L1 D$ attaches to the Control/Barrier Network, the Active Message Network, and the L2/Coherence Network. L2 slices (L2 Cntl., L2 Tags, L2 RAM) connect through the Memory Network to memory controllers (MEMC), DRAM, and Flash. Accelerators and/or I/O interfaces and the I/O pins attach to the networks.]
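Here is how a combining tree can implement a barrier. Each core raises a 1-bit “arrived” signal, and the tree combines the bits pairwise, so the root goes high only when every core has arrived, after about log2(n) levels of gates. The slide does not say which gate the tree uses. AND for barriers is an assumption here, and an interrupt tree would more naturally OR its inputs. The Python sketch below only models the logic. Real hardware evaluates the whole tree combinationally, not level by level in a loop.

```python
def combine_tree(bits: list[int]) -> tuple[int, int]:
    """Reduce per-core 1-bit 'arrived' signals with a binary AND tree,
    one tree level at a time. Returns (root_bit, levels). An unpaired
    signal at the end of a level is passed up unchanged."""
    if not bits:
        raise ValueError("need at least one core")
    level = list(bits)
    levels = 0
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            nxt.append(pair[0] & pair[-1])  # for a single leftover, x & x == x
        level = nxt
        levels += 1
    return level[0], levels


arrived = [1] * 64
print(combine_tree(arrived))  # (1, 6): all 64 cores arrived, release the barrier
arrived[17] = 0
print(combine_tree(arrived))  # (0, 6): core 17 has not arrived, keep waiting
```

The tree depth grows only logarithmically, from 6 levels for 64 cores to 10 levels for 1024. That is part of why a dedicated combining network is attractive for barriers on manycore chips.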

1008-Core “RAMP Blue”
• 1008 = 12 32-bit RISC cores/FPGA × 4 FPGAs/board × 21 boards
  - Simple MicroBlaze soft cores @ 90 MHz
• Full star connection between modules
• NASA Advanced Supercomputing (NAS) Parallel Benchmarks (all class S)
  - UPC versions (C plus shared-memory abstraction): CG, EP, IS, MG
• RAMPants creating HW & SW for the manycore community using next-generation FPGAs
  - Chuck Thacker & Microsoft designing the next boards
  - A 3rd party to manufacture and sell boards: 1H08
  - Gateware and software are BSD open source
• RAMP Gold for Par Lab: new CPU
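The 1008-core figure is the product 12 × 4 × 21. The Python sketch below checks that arithmetic. It then maps a flat core ID to its board, FPGA, and core position, assuming a hypothetical board-major numbering that the slide does not specify.

```python
CORES_PER_FPGA = 12
FPGAS_PER_BOARD = 4
BOARDS = 21
CORES_PER_BOARD = CORES_PER_FPGA * FPGAS_PER_BOARD  # 48
TOTAL_CORES = CORES_PER_BOARD * BOARDS               # 1008


def locate(core_id: int) -> tuple[int, int, int]:
    """Map a flat core ID (0..1007) to (board, fpga, core-on-fpga) under a
    hypothetical board-major numbering."""
    if not 0 <= core_id < TOTAL_CORES:
        raise ValueError(f"core_id must be in [0, {TOTAL_CORES})")
    board, rest = divmod(core_id, CORES_PER_BOARD)
    fpga, core = divmod(rest, CORES_PER_FPGA)
    return board, fpga, core


print(TOTAL_CORES)    # 1008
print(locate(0))      # (0, 0, 0)
print(locate(1007))   # (20, 3, 11)
```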

Physical Par Lab - 5th Floor, Soda Hall

Par Lab Summary
Goal: Easy to write correct programs that run efficiently and scale up on manycore
• The whole IT industry has bet its future on parallelism (!)
• Try Apps-Driven vs. CS Solution-Driven Research
• Motifs as anti-benchmarks
• Efficiency layer for ≈10% of today’s programmers
• Productivity layer for ≈90% of today’s programmers
• C&C language to help compose and coordinate
• Autotuners vs. Compilers
• OS & HW: Primitives; Diagnose Power/Perf.
• March 19 announcement: UPCRC winner from top 25 CS departments
[Figure: the research overview stack again, with Apps (Personal Health, Image Retrieval, Hearing/Music, Speech, Parallel Browser), Motifs, Productivity, Efficiency, OS, and Arch. layers, plus Correctness and Diagnosing Power/Performance Bottlenecks]
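“Autotuners vs. Compilers” contrasts two ways of choosing low-level parameters. A compiler predicts the best choice from a static model. An autotuner runs candidate versions on the target machine and keeps the fastest. Below is a minimal Python sketch of that empirical search. It assumes a toy `blocked_sum` kernel whose only tunable is a block size. The kernel, the candidate list, and the function names are all hypothetical.

```python
import time
from typing import Callable, Sequence


def blocked_sum(data: Sequence[float], block: int) -> float:
    """Toy kernel with one tunable parameter: the block size."""
    total = 0.0
    for start in range(0, len(data), block):
        total += sum(data[start:start + block])
    return total


def autotune(
    kernel: Callable[[Sequence[float], int], float],
    data: Sequence[float],
    candidates: Sequence[int],
    trials: int = 3,
) -> tuple[int, dict[int, float]]:
    """Empirical search: time every candidate parameter on this machine
    (best of `trials` runs) and return the fastest plus all timings."""
    timings: dict[int, float] = {}
    for c in candidates:
        best = float("inf")
        for _ in range(trials):
            start = time.perf_counter()
            kernel(data, c)
            best = min(best, time.perf_counter() - start)
        timings[c] = best
    winner = min(timings, key=timings.__getitem__)
    return winner, timings


data = [float(i) for i in range(100_000)]
best_block, timings = autotune(blocked_sum, data, [16, 256, 4096, 65536])
print("fastest block size on this machine:", best_block)
```

The winning block size can differ from one machine or interpreter to the next, which is the point. The search measures performance rather than predicting it, and it is rerun whenever the platform changes.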