HPC IN THE CLOUDS
David Mayhew, AMD Fellow

Disclaimer
• I am not announcing, implying, subtly suggesting, or otherwise even hinting at anything that has anything to do with anything that AMD may, will, may not, absolutely will not, or possibly might be thinking about, working on, building products to support, and/or releasing products in the future, the present, or, just to be safe, the past. [Engineering Legalese]

HPC System Classification
• Big-dollar HPC: largely government-funded, occasionally-created, super-scale, top-10 systems
  – Exascale is an emerging example
  – Technology focused
  – Big dollars buy expensive solutions
    • Large NRE
    • Very little amortizable volume
• Big-volume HPC
  – Compute/dollar is the primary focus
    • Compute per cap-ex dollar
    • Compute per op-ex dollar
  – Many systems per year
    • May represent a reasonable to significant market opportunity, depending on how such systems are defined moving forward

Non-CPU Acceleration
• Four major computational components:
  – Compute
  – Storage
  – Network
  – User interface
• All four use acceleration
  – Compute
    • FPU
    • Large-core / small-core, ISA homogeneous / heterogeneous
  – Storage
    • Embedded processors on network cards
    • With the exception of XOR accelerations, these are standard architectures
  – Network
    • Full-stack offload
    • With a few exceptions, these are standard architectures with sophisticated/accelerated memory structures
  – User interface
    • GPU
      – Custom, graphics-specific acceleration
    • Everything else
      – Standard embedded processor architectures

GPU Acceleration
• The GPU is not just one non-standard-architecture accelerator among many; it is the only non-standard-architecture accelerator
  – Hardware is expensive
  – General-purpose solutions offer the advantage of being general purpose
    • All architectures have Turing power; the only issue is efficiency:
      – Speed
      – Cost
        » Cap-ex
        » Op-ex
  – Special-purpose solutions must offer a significant speed and/or cost advantage over general-purpose ones
    • General-purpose processing is very inexpensive
    • NREs (design and implementation costs) tend to be very expensive, and are getting more so
• GPUs are the only accelerators with sufficient volume and performance to challenge general-purpose architectures
  – Cost-driven reality
    • The PC market pays the NRE
    • 300M-unit volume keeps prices down
  – GPUs are ridiculously fast at parallel computation, and they have excellent energy-usage profiles
    • They consume lots of energy, but very little per FLOP
    • High efficiency requires that most of the FLOPs get used
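To make "very little per FLOP" concrete, a back-of-envelope sketch (the wattage and throughput figures are hypothetical, not from this presentation): a GPU drawing 250 W while sustaining 2 TFLOP/s spends

$$\frac{250\ \text{W}}{2\times 10^{12}\ \text{FLOP/s}} = 125\ \text{pJ per FLOP},$$

but that figure assumes full utilization; at 10% utilization the effective energy per useful FLOP rises tenfold, to 1.25 nJ, which is the sense in which high efficiency requires that most of the FLOPs actually get used.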

GPUs are Good at Graphics
• GPU architectures are driven by the need to be good at graphics
  – HPC will not pay the bills, not even a small part of them
    • GP-GPU may pay something
    • Single precision is overkill for graphics
    • Accurate calculation is a luxury for graphics
      – One bad pixel don't spoil the whole bunch
• The dominant users of GPUs (graphics) will not support architectures that sacrifice too much graphics performance to secondary calculation models
  – Until graphics becomes "fast enough"
    • 1080p may represent a slowing of screen-resolution increases
    • 90% solutions will offer silicon opportunities

GPUs are Not Ideal for HPC
• Large set-up costs
  – Create a minimum useful problem scale
    • Fusion addresses this issue, but at the cost of maximum performance
    • SIMD partitioning addresses this issue
      – Sacrifices pure graphics to GPGPU
        » Along with better double-precision performance and correct full-precision results
    • Shrinking SIMD size addresses this issue
      – Individually programmable FPUs are arguably the ultimate theoretical solution
        » Except that argument would almost certainly be lost
  – Create addressing issues
    • GPUs create independent address spaces
      – These require management of data location and coherence
    • AMD's Fusion System Architecture (FSA) addresses this issue
      – A pointer is a pointer; a coherency solution
• Complex programming model
  – Programming a GPU is per-implementation "black magic"
    • OpenCL addresses this issue
      – Not easy, but usable (a kernel-level sketch follows this list)
      – The primary building block for an emerging set of GPGPU-enabled libraries and application-specific programming languages and environments
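For flavor, a minimal OpenCL C kernel sketch (the kernel name and arguments are illustrative, not from this presentation): SAXPY, the canonical data-parallel operation. The same source is compiled at run time by the OpenCL driver for whatever device is present, which is what makes OpenCL a plausible foundation for portable GPGPU libraries.

    // SAXPY: y = a*x + y, one array element per work-item.
    // The host compiles this at run time and launches one work-item
    // per element with clEnqueueNDRangeKernel.
    __kernel void saxpy(const float a,
                        __global const float *x,
                        __global float *y,
                        const unsigned int n)
    {
        size_t i = get_global_id(0);   // this work-item's global index
        if (i < n)                     // guard: global size may be rounded up
            y[i] = a * x[i] + y[i];
    }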

GPGPU Conjecture
• There is enough work that can be significantly accelerated by continually improving GP-GPU architectures to make GPUs an important and indispensable building block for computation (as opposed to graphical display management) on standard systems, both client (desktop, laptop, netbook) and server
• The conjecture is currently unproven, particularly for servers
  – The issue is less significant for clients because they need graphics in any case, though the characteristics of a graphics-only GPU may differ
• The validity of the GP-GPU Conjecture will have a significant impact on the future of HPC, particularly the "Big Volume" variety
  – The availability of cost-effective, massively parallel, floating-point and scalar accelerators is crucial to many HPC workloads
• I apologize profusely to the FPGA crowd who believe that FPGAs are going to somehow become relevant in this space, but I believe that it is GPUs or nothing (nothing meaning that general-purpose processors do everything)
  – FPGAs have been 3 years away from being standard system components for the last 10 years, and will be for the next 10 years
  – Die-stacking may affect this bit of cynicism/pessimism
    » A layer of FPGA in a standard, vertical processor/memory stack may make FPGAs inexpensive enough and generally useful enough to achieve general system integration
      • I got nothing here, but could not resist adding one more level of indentation

Servers, HPC, and GPUs
• The GP-GPU conjecture, even if true, may not yield the optimum HPC-centric CPU/GPU ratio in standard server-centric Fusion APUs (APU, for Accelerated Processing Unit, is the AMD term for fused CPU-GPU components)
  – HPC may benefit from GP-GPU to a much greater degree than standard servers
    • If a ratio of 1 GPU SIMD per CPU is effective for servers, a ratio of 8 GPU SIMDs per CPU may be better for HPC
    • The issue becomes a socket problem, not a SIMD problem
      – The required number of SIMDs can be marshaled; it just takes more sockets to marshal them (see the sketch below)
      – Interconnect becomes a more significant issue
• Client-centric APUs may have superior HPC GPU/CPU ratios
  – Clients will have relatively more need for graphics and relatively less need for compute as more computation shifts to the cloud
  – Massed client parts may be an interesting HPC solution
    • Lower-cost components with superior GPU ratios
    • Can HPC live with client RAS?
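A toy illustration of that socket trade-off (every number here is hypothetical, chosen only to make the arithmetic visible, not an AMD product figure):

    #include <stdio.h>

    /* If an APU fixes the GPU-SIMD : CPU ratio, reaching a target SIMD
       count for HPC is a matter of buying sockets. Illustrative numbers:
       8 CPUs per socket, a 4096-SIMD target, server vs. HPC ratios. */
    int main(void) {
        const int cpus_per_socket = 8;
        const int target_simds = 4096;
        const int simd_ratios[] = { 1, 8 };    /* SIMDs per CPU */

        for (int r = 0; r < 2; r++) {
            int simds_per_socket = cpus_per_socket * simd_ratios[r];
            int sockets = (target_simds + simds_per_socket - 1) / simds_per_socket;
            printf("%d SIMD(s)/CPU -> %d sockets for %d SIMDs\n",
                   simd_ratios[r], sockets, target_simds);
        }
        return 0;   /* prints 512 sockets at 1:1 vs. 64 sockets at 8:1 */
    }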

Scale-out-Server (Cloud) and GPUs
• If the GP-GPU conjecture is true for servers, then:
  – Standard server components will have massive parallel-computation capability
  – Standard programming tools will be available for those computational capabilities:
    • Accelerated standard libraries
    • General-purpose and domain-specific languages
    • Development environments
    • Debugging tools
    • Profilers
• Analogy:
  – GPUs are engines
  – System software tools are the tires
  – It is the tires that determine how much of the engine's power can be usefully applied
• Server APUs will mean that HPC-ish system characteristics become standard, and they will get the attention of the masses

System Architectures
• HPC
  – Lots of APUs
    • Preferably high-performance (really would like them to finish calculations somewhat before they are given the problem)
      – Cost issue (cooling is a cost)
    • Preferably low-power (really would like them to generate power)
      – Performance issue
    • Preferably low-cost (really would like to be paid to take them)
  – Lots of memory
  – Lots of interconnect
  – Lots of storage
• Scale-Out Server (cloud)
  – Lots of APUs
    • Preferably high-performance, low-power, and low-cost
  – Lots of memory
  – Lots of interconnect
  – Lots of storage
  – This is the same list as the one on the left, minus the additional commentary
  – More, slower processors require more interconnect

Cloud / HPC Architectural Delta
• Processors
  – High speed is critical to both camps
  – Some data points toward "faster is better" again
    • Fewer, faster processors may be more cost-effective (cap-ex and op-ex) than more, slower processors
    • High-speed interconnect is problematic
      – The fewer the connections, the easier the problem
• Memory
  – Virtualization demands maximally sized per-processor memory footprints
• Interconnect
  – Distributed storage and Hadoop-style processing are advantaged by high-speed interconnect with large cross-sectional bandwidth
• Storage
  – HPC can frequently live without per-node persistent storage
  – If the storage is there, and plentiful for other reasons, perhaps HPC could employ local persistent storage to improve its set-up/tear-down performance

How Big is Big?
• HPC wants huge
  – HPC's size goals are typically restrained by budgetary considerations
• Hadoop wants 4K cores
  – Unfortunately, with modern CPUs this is only about 500 processor sockets (4096 cores / 8 cores per socket = 512 sockets)
  – Conversely, 500-1000 node server clusters should be commonplace and relatively inexpensive to use
• Hadoop benefits from bandwidth within the cluster
• HPC also needs bandwidth between clusters to create virtual supercomputers
  – Cluster size is large enough that the number of inter-cluster links necessary to create a virtual supercomputer is low, but the bandwidth required on those links is very high
• How big does a virtual supercomputer have to be to be interesting if it can be rented by the hour?
  – Normal supercomputer metrics probably apply: the bigger the machine, the higher the per-hour cost

I Have a $1M Home Theater and 1 Blu-ray Disc
• The solution-problem dichotomy
  – The people interested in problem-solving hardware are frequently much less interested in the problems their hardware solves than in the hardware itself
    • The hardware/system becomes the solution
  – The people with problems to solve are frequently not particularly interested in the system characteristics of the solution platform
    • But they are too often forced to become experts in the arcane system characteristics of one-off solutions
• Who will drive HPC moving forward?
  – The people with HPC hardware?
  – The people with HPC software?
• If Amazon/Google/HP/Microsoft/Verizon/… become the dominant providers of 500-1000 node clusters for the cloud, but those cloud infrastructures can be repurposed to perform HPC calculations, what happens to the relevance of dedicated HPC hardware systems?
  – If I connect 10 clusters of 1000 processors/cluster, with 8 CPUs and 8 GPU SIMDs per processor, with a 100 Gb/s mesh, then I have an 80,000-CPU, 80,000-SIMD HPC cluster that I can rent by the hour
    • 10K nodes * 8 cores * 3 GHz/core = 240K GHz; 240K GHz-hours * $0.05 per off-peak GHz-hour = $12,000 (spelled out below)
    • No capital acquisition
      – No machine room
      – No coolers
      – No rack after rack of hardware
    • No power bill
    • No IT staff
    • Nothing for cool "show-and-tell" sessions with funding agencies and colleagues/students
      – Yours will no longer be bigger than mine
      – Can bragging about virtual hardware ever approach the fun of bragging about real hardware? "I set up a 24K-node virtual supercomputer on Googizon and completed my calculation in 2 hours"; "Oh yeah? I set up a 40K-node virtual supercomputer on Amasoft and did that calculation in 1 hour"
• Migration of HPC system architects to cloud system architects?
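Spelling out the slide's rental arithmetic (the $0.05 per GHz-hour off-peak rate is the slide's hypothetical, not a quoted price):

    #include <stdio.h>

    int main(void) {
        /* The slide's hypothetical machine: 10 clusters x 1000 processors,
           8 cores per processor, 3 GHz per core, rented off-peak
           at $0.05 per GHz-hour. */
        const double processors = 10 * 1000;
        const double cores_per_processor = 8;
        const double ghz_per_core = 3;
        const double dollars_per_ghz_hour = 0.05;
        const double hours = 1;

        double total_ghz = processors * cores_per_processor * ghz_per_core;
        double cost = total_ghz * hours * dollars_per_ghz_hour;
        printf("%.0f GHz for %.0f hour(s) = $%.0f\n",
               total_ghz, hours, cost);   /* 240000 GHz ... $12000 */
        return 0;
    }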

Why will Amasoft and Googizon care about HPC?
• Perhaps more importantly, why will they provide the inter-cluster interconnect that will make their base clusters usable by HPC clients?
• Money
  – They want the money you pour into your custom HPC hardware systems
  – They can extend their solutions to satisfy the demands of HPC (from an interconnect perspective) for a fraction of their total system cost
  – If HPC represents a reasonable, off-peak processing load that allows them to recover cost for otherwise unused processing cycles, then it is a win-win for HPC problem solvers and scale-out-server hardware providers
  – Hardware dollars become largely software dollars

HPC in the Cloud
• Does HPC really need dedicated HPC hardware?
  – APUs may address, to a significant degree, the performance demands of HPC at the micro level
  – Massive-scale cloud-computing infrastructures may address the demands of HPC at the macro level
• The real problem moving forward may be perceptual
  – The shift to virtual supercomputers may challenge funding models
  – The cost of computation is no less real, but without real hardware costs driving funding proposals, dollars may be harder to acquire
  – Conversely, more work may be possible with fewer dollars using virtual systems
• The cost of software mistakes is exacerbated when the virtual system is leased
  – I would hate to be the graduate student who kicked off the creation of a $10K virtual supercomputer only to find that the software had a bug in it

Trademark Attribution
AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names used in this presentation are for identification purposes only and may be trademarks of their respective owners. © 2011 Advanced Micro Devices, Inc. All rights reserved.