For Official Use Only HPC Road Map 25 November 2019

Current System

Current System Cont'd
  • Divided into 2 halls: production and non-production
  • Introduced dual hall operation, utilising unused non-production hall capacity for production jobs
  • Driver for dual hall: introduction of higher resolution G3/GE3 models

Mid Term Upgrade (MTU)
  • Business case planned MTU to Australis in current data centre (DC)
  • Driver from Robust to move MTU to new DC
  • Strategic DC plan was defined and endorsed
  • The contract for the MTU was signed with Cray in November
  • Finalising detailed design presently, to be completed by 20 December 2019

PROD Loads

Mid Term Upgrade – Australis II Configuration
  • CPU: Intel Xeon Skylake 8160
  • CPU microarchitecture: Skylake
  • CPU Frequency (GHz): 2.10
  • CPU Count per Node: 2
  • CPU cores per node: 24
  • Node Memory Capacity (GB): 192
  • Node PCIe: Gen 3
  • Interconnection Network: Aries
  • Compute Racks: 12 (liquid cooled) *
  • Number of compute nodes: 1968 **
  • Peak FLOP Rate (TF): 6350
  • Number of service nodes: 96
  • Number of external nodes: 38
  • Storage: Cray ClusterStor L300N (x2)
  • Number of File Systems and Type: 4, Lustre
  • Number of Scalable Storage Units (SSUs): 30
  • Usable Storage Capacity: 20 PB
  • Storage Bandwidth (Simultaneous Read & Write): 81.0 GB/s
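As a rough cross-check of the quoted peak FLOP rate, the figures in the configuration above can be multiplied out. This is a sketch under two assumptions that are not stated on the slide: the "24" is read as cores per CPU (so 48 cores per node), and each core is taken to retire 32 double-precision FLOP per cycle (AVX-512 with two FMA units) at the nominal 2.10 GHz clock. Under those assumptions the result lands close to the quoted 6350 TF.

```python
# Figures taken from the Australis II configuration slide.
COMPUTE_NODES = 1968
CPUS_PER_NODE = 2
CLOCK_GHZ = 2.10

# Assumption: the slide's "24" is interpreted as cores per CPU.
CORES_PER_CPU = 24

# Assumption (not on the slide): 32 double-precision FLOP per cycle per core,
# i.e. 2 FMA units x 8 doubles per 512-bit vector x 2 FLOP per FMA.
FLOP_PER_CYCLE = 32


def peak_teraflops(nodes, cpus_per_node, cores_per_cpu, clock_ghz, flop_per_cycle):
    """Return the theoretical peak rate in TFLOP/s."""
    cores = nodes * cpus_per_node * cores_per_cpu
    gigaflops = cores * clock_ghz * flop_per_cycle
    return gigaflops / 1000.0


total_cores = COMPUTE_NODES * CPUS_PER_NODE * CORES_PER_CPU
peak_tf = peak_teraflops(
    COMPUTE_NODES, CPUS_PER_NODE, CORES_PER_CPU, CLOCK_GHZ, FLOP_PER_CYCLE
)

print(f"cores: {total_cores}, theoretical peak: {peak_tf:.0f} TF")
```

This is a theoretical ceiling only; sustained rates on real workloads, and clock speeds under heavy vector load, will be lower.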

Mid Term Upgrade
  • Double the compute capability of the existing HPC platforms
  • Will operate as a mixed load system, accessing unused capacity across production and non-production
  • Unused capacity will be identified and realised by job scheduler
  • Access to this system will be defined by the appropriate security clearance allocated as per roles
  • Bureau Production workloads destined for new DC; following this, remediation of existing (Australis) platform will occur
  • Allowing Australis to be converted to a full R&D / Development system
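The idea of a scheduler realising unused capacity across production and non-production can be illustrated with a toy placement rule. This is only a sketch: the `Hall` class, the `pick_hall` function and the node counts are hypothetical and do not describe the scheduler or policies actually used on these systems. It simply prefers the production hall and falls back to idle non-production nodes when production is full.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hall:
    """Hypothetical view of one hall's node usage."""
    name: str
    total_nodes: int
    busy_nodes: int

    @property
    def idle_nodes(self) -> int:
        return self.total_nodes - self.busy_nodes


def pick_hall(nodes_needed: int, halls: List[Hall]) -> Optional[Hall]:
    """Place a job in the first hall (in preference order) with enough idle nodes.

    Marks the nodes as busy and returns the chosen hall,
    or returns None if no hall can take the job right now.
    """
    for hall in halls:
        if hall.idle_nodes >= nodes_needed:
            hall.busy_nodes += nodes_needed
            return hall
    return None


# Illustrative numbers only: production is nearly full, non-production is not.
halls = [
    Hall("production", total_nodes=100, busy_nodes=90),
    Hall("non-production", total_nodes=100, busy_nodes=40),
]

chosen = pick_hall(20, halls)
print(chosen.name if chosen else "job must wait")
```

A real scheduler would also account for priorities, reservations, job run times and the clearance-based access rules mentioned above; none of that is modelled here.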

Disaster Recovery option not provided
  • Disaster recovery option planned with the next Supercomputer installation

High Performance Computing (HPC) Reference Group
  • Establish as single point of responsibility to govern the operation of both systems
  • Allocation of HPC resources to projects and management of platform capacity across operations
  • Achieve business outcomes for research and scientific investigations, product development and trials, and production services
  • Will manage all HPC platforms beyond the life of the current Government funding arrangement