HW-SW Co-design Activities in Cray EMEA, Adrian Tate

  • Slides: 36
HW-SW Co-design Activities in Cray EMEA Adrian Tate, Director CERL

1. What do we mean by co-design?
2. Co-design Activities at Cray EMEA

Research talk. For info on Cray products, see Eric Aulagne.
Cray Inc.

What do we mean by co-design?
● Design through multiple stakeholders
● General design term
● Some informal usage in HPC also
● Insufficient

Embedded Systems Co-design
W. H. Wolf, Hardware-software co-design of embedded systems

Co-design in HPC?
● What is the HPC design space?
● US exascale programs
● Applications
  ● Vast range
  ● Existing code-base
● Hardware
  ● Commodity
● Software
  ● Many layers of complexity
● Resulting optimization problem is not feasible
● Instead, informal, multi-stakeholder, inter-connected design
Ang et al., Exascale Computing and the role of Co-design

Domain Model of Co-design
● Multiple-stakeholder design
  ● Applications Developers
  ● Hardware Architects
  ● Programming Models
  ● Systems Software Architects
● True Co-design in HPC is too hard
● Break the infeasible problem into related feasible problems
● Develop a project management framework using this
Domains: Applications, Programming Models, Systems Software, Hardware

Domains: Applications (A), Programming Models (P), System Software (S), Hardware (H)
[Diagram: A, P, S and H tasks linked by inputs and outputs]

[Diagram: co-design tasks A0, A1, A2, P0, P1, P2, S0, S1 and H0 across the A, P, S and H domains]

Co-design results
● Enforce interactivity at various levels
  ● Between phases
  ● Between co-design tasks
  ● Between domains
● Assess design trade-offs between activities
● Solve a feasible optimization problem
● Inherit structure into related co-design project management
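The idea of breaking co-design into inter-dependent tasks that a project plan can inherit might be sketched as a dependency graph. This is only an illustration: the task labels reuse the A/P/S/H domain letters, but the dependency edges, the `plan` helper and the `domains_coupled` helper are all assumptions invented here, not taken from the slides. It needs Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each task maps to the tasks whose outputs it needs.
# The edges are invented for illustration and are not read off the slides.
DEPENDS_ON = {
    "H0": set(),
    "S0": {"H0"},
    "P0": {"S0"},
    "A0": {"P0"},
    "S1": {"S0", "A0"},
    "P1": {"P0", "S1"},
    "A1": {"A0", "P1"},
}


def plan(depends_on):
    """Return one valid execution order for the co-design tasks."""
    return list(TopologicalSorter(depends_on).static_order())


def domains_coupled(depends_on):
    """Return (consumer domain, producer domain) pairs that cross domains."""
    return {
        (task[0], dep[0])
        for task, deps in depends_on.items()
        for dep in deps
        if task[0] != dep[0]
    }


if __name__ == "__main__":
    print(plan(DEPENDS_ON))
    print(sorted(domains_coupled(DEPENDS_ON)))
```

The cross-domain pairs are one simple way to see where interaction between domains would have to be enforced in such a plan.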

Cray EMEA Research Lab
Hardware, Advanced Workloads, Memory Hierarchy
HPC Software: Slow-moving, Secure, Stable, Perf-Optimized
Data-intensive Software: Fast-moving, Productivity-focused, Open
Co-design Activities: Collaborative R&D projects, Centres of Excellence, Education programs

Co-design Activity: Running Data-intensive Workflows on Supercomputers
Phase I: Define Co-design Hardware and Software Environment
Phase II: Co-design Applications and Programming Models
Phase III: Co-design Next-gen Hardware Environment
[Diagram: A, P, S and H domain boxes highlighted per phase]

Cray’s Vision: The Fusion of Supercomputing and Big & Fast Data
Modeling The World: Cray Supercomputers solving “grand challenges” in science, engineering and analytics
Math Models: Modeling and simulation to provide the highest fidelity virtual reality results
Data-Intensive Processing: High throughput event processing & data capture from sensors, data feeds and instruments
Data Models: Integration of datasets for search, analysis, predictive modeling and knowledge discovery
Compute, Store, Analyze
HPC workloads, “mixed” workloads, Data-intensive workloads
Copyright 2014 Cray Inc.

General Mixed Workload: Scientific Data Workflow
[Diagram: VDL ... VDL; Source, Capture, Store, Processing, Segment, Feature Extraction, Analysis]

Memory Hierarchy

Cray Shasta
● Current Supercomputer
  ● Not configurable
  ● Closed network interfaces
  ● Limited support of exotic SW
  ● HW optimized for HPC
● Cray Shasta, first truly configurable system
● Choice of
  ● Cabinet type
  ● Network
  ● Processor type
  ● Operating system
  ● Memory hierarchy

[Figure: memory hierarchy, from compute-intensive to read/write-intensive]
Tiers: Cache, HBM, DDR, NVDIMMs, Phase Change / 3D XPoint, On-node SSD / Flash, Network-attached NVRAM, HDD
Values shown in the figure: 2 M, 64 GB, 1 TB/s, 150 GB/s, 500 GB, ~10 GB/s, 64 GB, 50 ns, 256 GB, 60 ns, 0.5 TB, 5 μs, 5 TB, 10 TB, 5 GB/s, 5 TB

Data-intensity model
Compute bound, Memory bound
Today, Tomorrow
[Diagram labels: Read / Write Complexity, Data dependencies, Structure, Reuse Potential, Coarse Data-Intensity, Access Pattern, Operations, Practical Reuse, Ordering, Memory]
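The compute-bound versus memory-bound distinction can be illustrated with a roofline-style check. This is a standard textbook technique substituted here for illustration, not the data-intensity model from the talk. The function names and the machine figures (1 Tflop/s peak, 100 GB/s bandwidth) are hypothetical.

```python
def arithmetic_intensity(flops, bytes_moved):
    """Floating-point operations performed per byte moved to or from memory."""
    return flops / bytes_moved


def classify(flops, bytes_moved, peak_flops, peak_bandwidth):
    """Compare a kernel's intensity with the machine balance (flops per byte)."""
    balance = peak_flops / peak_bandwidth
    intensity = arithmetic_intensity(flops, bytes_moved)
    return "compute bound" if intensity >= balance else "memory bound"


def attainable_flops(flops, bytes_moved, peak_flops, peak_bandwidth):
    """Roofline bound: the lower of peak compute and bandwidth times intensity."""
    intensity = arithmetic_intensity(flops, bytes_moved)
    return min(peak_flops, intensity * peak_bandwidth)


if __name__ == "__main__":
    PEAK_FLOPS = 1e12      # hypothetical machine: 1 Tflop/s
    PEAK_BANDWIDTH = 1e11  # hypothetical machine: 100 GB/s
    n = 1000
    # y = a*x + y on doubles: 2n flops, 3 transfers of 8 bytes per element.
    print(classify(2 * n, 24 * n, PEAK_FLOPS, PEAK_BANDWIDTH))
    # n x n matrix multiply with ideal reuse: 2n^3 flops, 3 matrices of 8n^2 bytes.
    print(classify(2 * n**3, 24 * n**2, PEAK_FLOPS, PEAK_BANDWIDTH))
```

A single intensity number like this says nothing about access pattern, ordering or practical reuse, which are among the labels the slide lists.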

Co-design Model
Phase I: Co-design of Programming Models and System Software
Phase II: Influencing Next-gen Hardware Environment
[Diagram: A, P, S and H domain boxes highlighted per phase]

Memory-centric Optimization

[Figure panels: LRU; Bit-PLRU; Tree-PLRU; No HW prefetch; Tree-PLRU]
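To make the difference between the replacement policies named in the figure concrete, the following sketch counts misses for a single cache set under true LRU and under bit-PLRU (one MRU bit per way). It is a simplified model written for illustration, not the simulator behind the figure. The function names are hypothetical, and tags in the trace are assumed never to be `None`.

```python
from collections import OrderedDict


def misses_lru(trace, ways):
    """Count misses for one cache set under true LRU replacement."""
    lines = OrderedDict()
    misses = 0
    for tag in trace:
        if tag in lines:
            lines.move_to_end(tag)  # mark as most recently used
        else:
            misses += 1
            if len(lines) == ways:
                lines.popitem(last=False)  # evict the least recently used line
            lines[tag] = True
    return misses


def misses_bit_plru(trace, ways):
    """Count misses for one cache set under bit-PLRU (one MRU bit per way)."""
    tags = [None] * ways
    mru = [False] * ways
    misses = 0
    for tag in trace:
        if tag in tags:
            way = tags.index(tag)
        else:
            misses += 1
            way = mru.index(False)  # victim: first way whose MRU bit is clear
            tags[way] = tag
        mru[way] = True
        if all(mru):
            # Every bit is set: clear them, keeping only the way just touched.
            mru = [False] * ways
            mru[way] = ways > 1
    return misses


if __name__ == "__main__":
    sweep = [0, 1, 2, 3, 4] * 3  # cyclic sweep over one more line than there are ways
    print(misses_lru(sweep, ways=4), misses_bit_plru(sweep, ways=4))
```

A cyclic sweep slightly larger than the set is a worst case for true LRU, so the approximate policy can come out ahead on this trace even though it tracks less history.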

Adjiashvili et al., Model-driven tiling using Associativity Lattice
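Why associativity matters for tiling can be illustrated by looking at which cache sets a strided access touches. The sketch below is not the lattice method of the cited work. It only shows the underlying set-mapping arithmetic, with a hypothetical cache geometry (64-byte lines, 64 sets) and hypothetical helper names.

```python
def set_index(address, line_bytes, num_sets):
    """Cache set that a byte address maps to in a set-associative cache."""
    return (address // line_bytes) % num_sets


def sets_touched(base, stride_bytes, count, line_bytes=64, num_sets=64):
    """Distinct sets touched by `count` accesses spaced `stride_bytes` apart."""
    return {
        set_index(base + i * stride_bytes, line_bytes, num_sets)
        for i in range(count)
    }


if __name__ == "__main__":
    # Walking down a column of a row-major matrix of 8-byte doubles.
    # Leading dimension 512 gives a 4096-byte stride: every access hits one set.
    print(len(sets_touched(0, 512 * 8, 16)))
    # Padding the leading dimension to 520 spreads the same walk over 16 sets.
    print(len(sets_touched(0, 520 * 8, 16)))
```

When all accesses land in one set, only as many lines as the cache has ways can be resident at once, so tile shapes that look small enough by capacity can still thrash.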

Heterogeneity of Hardware
● Shasta as partitions
● Bad match for bulk synchronous / static scheduled approach

Tasking: towards dynamic execution

HPX
  Interface: C++ (HPX-3); C, C++ (HPX-5)
  Project focus: Complete system- and node-level framework
  Heterogeneous support: CUDA (HPX-3); None (HPX-5)
  Applications: LULESH, CMA, MiniGhost, Nbody
  Support for existing code: Existing OpenMP code will compete for resources.
  Comments:
  • Relatively strong early adoption
  • Two different versions of HPX is counterproductive for everyone

Legion
  Interface: C++ with heavy use of Legion objects
  Project focus: Complete system- and node-level framework
  Heterogeneous support: CUDA
  Applications: S3D, SNAP
  Support for existing code: Can replace parts of existing MPI code with Legion
  Comments:
  • Very unstable API
  • Questions over performance
  • Massive burden on user to learn API

StarPU
  Interface: C
  Project focus: Node-level heterogeneous scheduling
  Heterogeneous support: CUDA, OpenCL
  Applications: Factorisation (LU, Cholesky, QR) utilising CPU+GPU
  Support for existing code: OpenMP to StarPU compiler
  Comments:
  • Simple and powerful heterogeneous scheduler.
  • Little work towards dynamic inter-node load balancing

OmpSs
  Interface: C, C++
  Project focus: Node-level heterogeneity
  Heterogeneous support: CUDA, OpenCL
  Applications: Cholesky
  Support for existing code: Easy translation from OpenMP
  Comments:
  • Use of annotations makes task-based parallelism relatively easy.
  • Limited functionality for distributed workloads.

Tasking requirements:
• Dynamic load balancing across entire system.
• Scheduler awareness of heterogeneity within a node.
• Simple but powerful API, as complex abstractions deter users.
• Support for easily porting or integrating existing parallel code.
• Runtime support and awareness of data locality.

Challenges:
• No single framework has invented a complete solution yet.
• Hard to demonstrate usefulness, task-parallelism better at scale and/or with complex applications.
• Some APIs currently require a large time investment to learn and are extremely unstable.
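The contrast between static scheduling and dynamic, dependency-driven execution can be sketched with the Python standard library alone. This is not the API of HPX, Legion, StarPU or OmpSs. `run_task_graph` and the example tasks are hypothetical, and the sketch covers neither heterogeneity nor data locality. It needs Python 3.9 or later for `graphlib`.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_task_graph(tasks, depends_on, workers=4):
    """Run each task as soon as its dependencies finish; return all results.

    tasks: name -> callable taking a dict of its dependencies' results.
    depends_on: name -> iterable of task names it needs (missing means none).
    """
    sorter = TopologicalSorter({name: depends_on.get(name, ()) for name in tasks})
    sorter.prepare()
    results = {}
    running = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while sorter.is_active():
            # Submit every task whose dependencies have all completed.
            for name in sorter.get_ready():
                inputs = {dep: results[dep] for dep in depends_on.get(name, ())}
                running[pool.submit(tasks[name], inputs)] = name
            # Block until at least one running task finishes, then release it.
            finished, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in finished:
                name = running.pop(future)
                results[name] = future.result()
                sorter.done(name)
    return results


if __name__ == "__main__":
    tasks = {
        "load": lambda inputs: 4,
        "square": lambda inputs: inputs["load"] ** 2,
        "double": lambda inputs: inputs["load"] * 2,
        "combine": lambda inputs: inputs["square"] + inputs["double"],
    }
    depends_on = {
        "square": ("load",),
        "double": ("load",),
        "combine": ("square", "double"),
    }
    print(run_task_graph(tasks, depends_on)["combine"])
```

Here the execution order is decided at run time by which tasks finish first. A real tasking runtime would add load balancing across nodes and placement decisions on top of this.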

MATCHeR: Memory-Aware Task-based Compute on Heterogeneous Resources
[Diagram: C/C++ Application; HPX, StarPU; I/O Complexity; Coarse-Grain Data-Intensity; Data-Centric Transformation; tasks t1 annotated with r, w, p, d and mapped to hardware resources h1 to h5]

Human Brain Project PCP: Interactive Visualization
[Diagram: NVRAM, Simulation, Visualization, Interconnect, interactivity, PFS]

Scalable Interactive Visualization with XC and CS
- Big scientific particle data
- Interactive remote visualization on XC and CS machines
- Exploiting CPU, GPU, KNC or KNL
- Scalable order-integrated splatting in distributed memory environments via Splotch

Parallel Visualization Pipeline
● Distribute
● Render
● Composite
● View

Billion particle galaxy simulation visualized interactively (>10 FPS) on CRAY XC, data courtesy of Lucio Mayer et al, Institute for Computational Science, University of Zurich
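The Distribute, Render and Composite stages can be sketched in a few lines. This is a heavily simplified stand-in and not Splotch's algorithm: each particle adds an integer weight to its nearest pixel, and compositing is a plain per-pixel sum. All function names and the particle format `(x, y, weight)` with coordinates in [0, 1] are assumptions made for illustration.

```python
def distribute(particles, ranks):
    """Split the particle list round-robin across `ranks` workers."""
    return [particles[r::ranks] for r in range(ranks)]


def render(particles, width, height):
    """Splat each (x, y, weight) particle onto its nearest pixel, additively."""
    image = [[0] * width for _ in range(height)]
    for x, y, weight in particles:
        col = min(int(x * width), width - 1)
        row = min(int(y * height), height - 1)
        image[row][col] += weight
    return image


def composite(images):
    """Sum the partial images pixel by pixel into one final image."""
    return [[sum(pixels) for pixels in zip(*rows)] for rows in zip(*images)]


if __name__ == "__main__":
    particles = [(0.1, 0.1, 3), (0.9, 0.2, 1), (0.1, 0.15, 2), (0.5, 0.5, 4), (1.0, 1.0, 5)]
    partial = [render(chunk, 4, 4) for chunk in distribute(particles, ranks=3)]
    for row in composite(partial):
        print(row)
```

Because additive blending does not depend on the order of contributions, the composited result equals a serial render of all particles. That property is what lets each rank render its share independently in this sketch.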

Co-design of a Visualization Platform
Phase I: Co-design of Hardware and Software Environment
Phase II: Co-design Applications and Programming Models
[Diagram: A, P, S and H domain boxes highlighted per phase]
