SGI Contributions to Supercomputing by 2010 Steve Reinhardt

Steve Reinhardt, Director of Engineering, spr@sgi.com

Supercomputing Aspects of SGI
• HPC: scalable servers and superclusters
  – SGI® Origin® family
  – SGI® Altix™ 3000 family
  – SGI® NUMAflex™
• Visualization: “VAN”
  – Deliver images wherever the users are
  – Enable collaboration
• Data Access
  – Deliver data wherever the users are
  – CXFS/WAN demo at SC’02
  – Each server reads directly, at channel speeds
  – Biggest installed configuration: 0.5 PB
NOTE: No “enterprise” references

SGI in HPC
• Memory is the unifying theme
  – globally addressable up to O(PB)
  – incorporating varied processing types
  – latency (-> 500 ns for 10KP)
  – bandwidth, in bytes per flop (B:F)
    • local stride-1: B:F -> 2.0+
    • local gather/scatter: B:F 0.5-1.0
    • remote bisection BW: B:F -> 0.3
• Sustained performance
  – differentiated scaling (latency & bandwidth)
  – better memory interface
  – new synchronization substrate
• Raise the level of programming abstraction
  – UPC/CAF (near-term)
  – parallel Matlab (radical)

SGI in HPC
• SGI Origin® family
  – MIPS processors, Irix OS
  – exploit low power consumption, ISA control
• SGI Altix™ family
  – IPF processors, Linux OS
  – exploit SGI interconnect, with industry-standard ISA and rapid OS maturation

Balancing High Innovation and Profitability
[Figure: Profitability (low to high) plotted against Differentiation (low to high)]
• “Death Valley”: enough differentiation to have higher cost but not enough to have high value

System / Component Differentiation
[Figure series: each chart rates one system on Processor, Memory, Interconnect, OS, System Cost, and System Value]
• Ideal Differentiation
• SGI Origin series
• Quadrics cluster
• IBM SP3 system
• SGI Altix system

STREAM Triad Results
• World-record result for a µP-based system; fourth overall
• 0.8 B:F (6.4 GB/s shared by 2 x 4 GF processors)
• Single kernel; NUMA placement support in Linux
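The 0.8 B:F figure follows directly from the numbers on the slide. The sketch below reproduces that arithmetic and shows the Triad kernel in its simplest form. The function and constant names are illustrative assumptions, not part of the STREAM source, and the pure-Python kernel is only meant to show the operation, not to measure bandwidth.

```python
# Figures quoted on the slide.
NODE_BANDWIDTH_GB_S = 6.4    # memory bandwidth shared by the processors
PEAK_GFLOPS_PER_PROC = 4.0   # "4 GF" per processor
PROCS_SHARING = 2            # processors sharing that bandwidth


def bytes_per_flop(bandwidth_gb_s, gflops_per_proc, n_procs):
    """Machine balance: bytes of memory traffic available per peak flop."""
    return bandwidth_gb_s / (gflops_per_proc * n_procs)


def triad(b, c, q):
    """STREAM Triad operation: a[i] = b[i] + q * c[i]."""
    return [bi + q * ci for bi, ci in zip(b, c)]


# STREAM counts three 8-byte words moved and two flops per Triad iteration.
TRIAD_BYTES_PER_ITER = 3 * 8
TRIAD_FLOPS_PER_ITER = 2

machine_balance = bytes_per_flop(
    NODE_BANDWIDTH_GB_S, PEAK_GFLOPS_PER_PROC, PROCS_SHARING
)
kernel_demand = TRIAD_BYTES_PER_ITER / TRIAD_FLOPS_PER_ITER

print(f"machine balance: {machine_balance:.1f} B:F")
print(f"Triad demand:    {kernel_demand:.1f} B:F")
```

Because Triad asks for far more bytes per flop than the machine supplies, the kernel is limited by memory bandwidth rather than by peak floating-point rate, which is why B:F is the figure of merit here.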

Interconnect Scaling
• MPI bandwidth versus distance (MB/s)
• Coming soon

Altix 3000 Throughput Performance
• Throughput of 4 jobs, each 8P, crash application
• System: Altix 3000, 32P, 64 GB, XVM, TP900
• Individual jobs in the throughput mix are between 0.4% and 1.8% slower than the standalone case
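A minimal sketch of how a per-job slowdown like the one quoted above could be computed, assuming slowdown is defined as the relative increase in elapsed time over the standalone run. The slide does not state the definition used, and the timings below are hypothetical placeholders, not measured results.

```python
def slowdown_percent(standalone_seconds, in_mix_seconds):
    """Relative increase in elapsed time when run in the mix, in percent."""
    return (in_mix_seconds - standalone_seconds) / standalone_seconds * 100.0


# Hypothetical elapsed times for one standalone 8P job and the same job
# run four at a time on a 32P system. These are NOT figures from the slide.
standalone = 1000.0
jobs_in_mix = [1004.0, 1009.0, 1013.0, 1018.0]

slowdowns = [slowdown_percent(standalone, t) for t in jobs_in_mix]
for job, pct in enumerate(slowdowns, start=1):
    print(f"job {job}: {pct:.1f}% slower than standalone")
```

A small spread across the jobs would suggest that concurrent jobs interfere little with one another in memory and interconnect.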

Summary: SGI for HPC
• Long-term directions
  – Memory: globally addressable, high BW, low latency
  – Strong delivered performance
    • differentiated scaling (latency & bandwidth)
    • better memory interface
    • new synchronization substrate
  – Raise the level of programming abstraction
    • UPC/CAF (near-term); parallel Matlab (radical)
• Near-term deliverables
  – Altix 3000 system
    • distinguished performance