Sierra: The LLNL IBM CORAL System
Bronis R. de Supinski, Chief Technology Officer, Livermore Computing
October 6, 2017
LLNL-PRES-738369
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC
Advanced Technology Systems (ATS): Sierra will be the next ASC ATS platform
[Timeline chart, FY'13–FY'23: ATS platforms — Sequoia (LLNL), ATS 1 Trinity (LANL/SNL), ATS 2 Sierra (LLNL), ATS 3 Crossroads (LANL/SNL), ATS 4 (LLNL), ATS 5 (LANL/SNL); Commodity Technology Systems (CTS) — Tri-lab Linux Capacity Cluster II (TLCC II), CTS 1, CTS 2. Each system progresses through System Delivery, Procure & Deploy, Use, and Retire phases.]
Sequoia and Sierra are the current and next-generation Advanced Technology Systems at LLNL
Sierra is part of CORAL, the Collaboration of Oak Ridge, Argonne and Livermore
§ Modeled on the successful LLNL/ANL/IBM Blue Gene partnership (Sequoia/Mira) — LLNL's IBM Blue Gene systems: BG/L, BG/P (Dawn), BG/Q (Sequoia)
§ Long-term contractual partnership with 2 vendors: 2 awardees for 3 platform acquisition contracts plus 2 nonrecurring engineering (NRE) contracts
§ From a single RFP: ORNL Summit contract (2017 delivery), LLNL Sierra contract (2017 delivery), ANL Aurora contract (2018 delivery)
CORAL is the next major phase in the U.S. Department of Energy's scientific computing roadmap and path to exascale computing
The Sierra system that will replace Sequoia features a GPU-accelerated architecture
§ Compute Node: 2 IBM POWER9 CPUs; 4 NVIDIA Volta GPUs; 256 GiB DDR4; 16 GiB globally addressable HBM2 associated with each GPU; NVMe-compatible PCIe 1.6 TB SSD; coherent shared memory
§ Compute Rack: standard 19"; warm-water cooling
§ Compute System: 4,320 nodes; 240 compute racks; 1.29 PB memory; 125 PFLOPS; ~12 MW
§ Components: IBM POWER9 (Gen2 NVLink); NVIDIA Volta (7 TFlop/s, HBM2, Gen2 NVLink); Mellanox interconnect (single-plane EDR InfiniBand, 2-to-1 tapered fat tree); GPFS file system (154 PB usable storage, 1.54 TB/s R/W bandwidth)
Outstanding benchmark analysis by IBM and NVIDIA demonstrates the system's usability
Projections included code changes demonstrating that a tractable annotation-based approach (i.e., OpenMP) will be competitive
Sierra system architecture details have recently been finalized with the Go decision

                                      Sierra       uSierra
Nodes                                 4,320        684
POWER9 processors per node            2            2
GV100 (Volta) GPUs per node           4            4
Node Peak (TFLOP/s)                   29.1         29.1
System Peak (PFLOP/s)                 125          19.9
Node Memory (GiB)                     320          320
System Memory (PiB)                   1.29         0.209
Interconnect                          2x IB EDR    2x IB EDR
Off-Node Aggregate b/w (GB/s)         45.5         45.5
Compute racks                         240          38
Network and infrastructure racks      13           4
Storage racks                         24           4
Total racks                           277          46
Peak Power (MW)                       ~12          ~1.8

These are working numbers; the final configuration will only be set once the system is fully installed
LLNL and the ASC platform team chose a tapered fat tree for Sierra's network
§ Full bandwidth from dual-ported Mellanox EDR HCAs to TOR switches
§ Half bandwidth from TOR switches to director switches
§ An economic trade-off that provides approximately 5% more nodes
This decision, counter to prevailing wisdom for system design, benefits Sierra's planned UQ workload: 5% more UQ simulations at a performance loss of < 1%
NVIDIA Volta GPUs (GV100) provide the bulk of Sierra's compute capability

SMs                                  80
FP64 units (per SM)                  32
FP32 units (per SM)                  64
Tensor cores (per SM)                8
Register file (per SM)               256 KiB
L1/shared memory (per SM)            128 KiB
Double-precision peak (TFlop/s)      7 (7.5)
Single-precision peak (TFlop/s)      14 (15)
Tensor op peak (TOp/s)               120
HBM2 bandwidth (GB/s)                898
NVLink BW to CPU/other GPU (GB/s)    75 (60)

To realize Sierra's full potential, we must exploit the tensor operations. The commoditization of machine learning will make this an enduring challenge.
Sierra and its EA systems are beginning an accelerator-based computing era at LLNL
§ The advantages that led us to select Sierra generalize
  — Power efficiency
  — Network advantages of "fat nodes"
  — Balancing capabilities/costs implies complex memory hierarchies
§ Planning a similar, unclassified, M&IC resource
  — Same architecture as Sierra
  — Up to 25% of Sierra's capability
§ Exploring possibilities for other GPU-based resources
  — Not necessarily NVIDIA-based
  — May support higher single-precision performance
We have multiple projects planned to foster a healthy ecosystem