Palacios and Kitten New High Performance Operating Systems

Palacios and Kitten: New High Performance Operating Systems For Scalable Virtualized and Native Supercomputing

John R. Lange and Kevin Pedretti
Trammell Hudson, Peter Dinda, Zheng Cui, Lei Xia, Patrick Bridges, Andy Gocke, Steven Jaconette, Mike Levenhagen and Ron Brightwell
Northwestern University, Sandia National Labs, University of New Mexico

Summary
• Palacios
  – First VMM for scalable HPC
  – Open Source and available
• Kitten
  – First open source Lightweight Kernel for High Performance Computing (HPC)
  – Open Source and available
• Proved HPC virtualization is effective at scale
  – Performance within 5% of native
  – Largest scale study of virtualization

What is a virtual machine?
• Run an OS as an application
  – Run multiple OS environments on a single machine
  – Start, stop, pause
  – Can easily move entire OS environments
[Diagram: native stack (Application, OS, Hardware) beside a virtualized stack (Guest Application, Guest OS, Host OS/VMM, Hardware). The Host OS/VMM emulates hardware for the guest OS, including page tables and CPU state.]

What are VMMs currently used for?
• Server Consolidation
• Fault tolerance
• Legacy application support
• Debugging
• Isolation
• Virtual appliances
• Failover and disaster recovery
• Market size
  – 2007: $5.5 billion
  – 2011: $11.7 billion
[Chart labels: $7.58 Billion, $16.70 Billion]

High Performance Computing (HPC)
• Large scale simulations to solve Big Problems

Virtualization in HPC
• Fault tolerance
  – Red Storm MTBI target: 50 hours
  – Red Storm Min TTR: 30 minutes to 1 hour
  A. B. Nagarajan, F. Mueller, C. Engelmann, and S. L. Scott, "Proactive Fault Tolerance for HPC with Xen Virtualization," ICS 2007
• Broader usage
  – Allow applications to select best OS
• Only if it doesn’t degrade performance…
  – Tightly coupled parallel applications
  – Very large scale
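As a rough worked example of what the two Red Storm figures imply, the sketch below estimates the fraction of wall-clock time a machine spends running if every interrupt costs one restart. It assumes MTBI means mean time between interrupts and TTR means time to restart, and it uses the simple steady-state model run time / (run time + restart time). The function name and the model are illustrative and do not come from the slides or the cited paper.

```python
def available_fraction(mtbi_hours: float, ttr_hours: float) -> float:
    """Fraction of time spent running, assuming one restart per interrupt."""
    if mtbi_hours <= 0 or ttr_hours < 0:
        raise ValueError("mtbi_hours must be positive and ttr_hours non-negative")
    return mtbi_hours / (mtbi_hours + ttr_hours)


MTBI_TARGET_HOURS = 50.0      # MTBI target quoted on the slide
TTR_RANGE_HOURS = (0.5, 1.0)  # 30 minutes to 1 hour, quoted on the slide

best = available_fraction(MTBI_TARGET_HOURS, TTR_RANGE_HOURS[0])
worst = available_fraction(MTBI_TARGET_HOURS, TTR_RANGE_HOURS[1])
print(f"available fraction: {worst:.4f} to {best:.4f}")
# prints: available fraction: 0.9804 to 0.9901
```

Under this model restarts alone cost roughly 1 to 2 percent of machine time. The model ignores work lost since the last checkpoint, so it is an optimistic bound rather than a measurement.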

Palacios VMM
• OS-independent embeddable virtual machine monitor
• Developed at Northwestern and University of New Mexico
• Open source and freely available
  – Downloaded over 1000 times as of July 2009
• Users:
  – Kitten: Lightweight supercomputing OS from Sandia National Labs
  – MINIX 3
  – Modified Linux versions
• Successfully used on supercomputers, clusters (Infiniband, Ethernet), and servers
http://www.v3vee.org/palacios

Palacios as an HPC VMM
• Minimalist interface
  – Suitable for an LWK
• Compile and runtime configurability
  – Create a VMM tailored to specific environments
• Low noise
• Contiguous memory pre-allocation
• Passthrough resources and resource partitioning
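One reason contiguous memory pre-allocation helps can be sketched in a few lines: if a guest's memory is a single contiguous host region, translating a guest-physical address to a host-physical address is a bounds check plus an offset. The class and field names below are hypothetical and chosen for illustration; they are not Palacios data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContiguousGuestMemory:
    """Hypothetical model of a guest backed by one contiguous host region."""
    host_base: int  # host-physical address where the guest's region starts
    size: int       # size of the region in bytes

    def guest_to_host(self, guest_pa: int) -> int:
        # A single range check replaces any per-page lookup structure.
        if not 0 <= guest_pa < self.size:
            raise ValueError(f"guest address {guest_pa:#x} is outside guest memory")
        return self.host_base + guest_pa


# Example values are made up: a 256 MiB guest placed at the 1 GiB mark.
mem = ContiguousGuestMemory(host_base=0x4000_0000, size=0x1000_0000)
print(hex(mem.guest_to_host(0x1234)))  # prints 0x40001234
```

A fragmented allocation would instead need a per-page or per-region lookup on the same path, which is the kind of overhead and variability a low-noise VMM tries to avoid.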

Lightweight Kernel Timeline
1991 – Sandia/UNM OS (SUNMOS), nCube-2
1991 – Linux 0.02
1993 – SUNMOS ported to Intel Paragon (1800 nodes)
1993 – SUNMOS experience used to design Puma
       First implementation of Portals communication architecture
1994 – Linux 1.0
1995 – Puma ported to ASCI Red (4700 nodes)
       Renamed Cougar, productized by Intel
1997 – Stripped down Linux used on Cplant (2000 nodes)
       Difficult to port Puma to COTS Alpha server
       Included Portals API
2002 – Cougar ported to ASC Red Storm (13000 nodes)
       Renamed Catamount, productized by Cray
       Host and NIC-based Portals implementations
2004 – IBM develops LWK (CNK) for BG/L/P (106000 nodes)
2005 – IBM & ETI develop LWK (C64) for Cyclops64 (160 cores/die)

Kitten: An Open Source LWK
• Better match for user expectations
  – Provides mostly Linux-compatible user environment
    • Including threading
  – Supports unmodified compiler toolchains and ELF executables
• Better match for vendor expectations
  – Modern code-base with familiar Linux-like organization
    • Drop-in compatible with Linux
  – Infiniband support
• End-goal is deployment on future capability system
http://software.sandia.gov/trac/kitten

Complexity
• Scalable HPC performance requires minimal overhead

Component   Lines of code
Kitten      ~33,000
Palacios    ~28,000
Total       ~61,000

Xen: 580k lines (50k – 80k core)
KVM: 50k-60k lines + Kernel dependencies (??) + User level devices (180k)

HPC Performance Evaluation
• Virtualization is very useful for HPC, but…
  Only if it doesn’t hurt performance
• Virtualized Red Storm with Palacios
  – Evaluated with Sandia’s system evaluation benchmarks
Red Storm: 17th fastest supercomputer, Cray XT3, 38208 cores, ~3500 sq ft, 2.5 MegaWatts, $90 million

Virtualized performance (Catamount)
[Plot: HPCCG (conjugate gradient solver); annotations: Within 5%, Scalable]

Comparison of Operating Systems
[Plots: HPCCG (conjugate gradient solver); label: Shadow Paging; panels: Compute Node Linux, Catamount]

Comparison of Operating Systems
[Plots: CTH (multi-material, large deformation, strong shockwave simulation); panels: Compute Node Linux, Catamount]

Large Scale Study
• Evaluation on full Red Storm system
  – 12 hours of dedicated system time on full machine
  – Largest virtualization performance scaling study to date
• Measured performance at exponentially increasing scales
  – Up to 4096 nodes
• Publicity
  – New York Times
  – Slashdot
  – HPCWire
  – Communications of the ACM
  – PC World
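A scaling study of this shape can be sketched as a sweep over node counts plus a relative-overhead calculation at each point. The slide does not state the base or starting size of the sweep, so the powers of two starting at one node below are an assumption, and both function names are hypothetical.

```python
def node_counts(max_nodes: int) -> list:
    """Node counts 1, 2, 4, ... up to and including max_nodes (assumed base 2)."""
    counts = []
    n = 1
    while n <= max_nodes:
        counts.append(n)
        n *= 2
    return counts


def relative_overhead_percent(native_seconds: float, virtualized_seconds: float) -> float:
    """Extra run time of the virtualized run, as a percentage of the native run."""
    if native_seconds <= 0:
        raise ValueError("native_seconds must be positive")
    return 100.0 * (virtualized_seconds - native_seconds) / native_seconds


print(node_counts(4096))
# prints: [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096]
```

With this definition, a claim such as "within 5% of native" means `relative_overhead_percent(native, virtualized) <= 5.0` at every node count in the sweep.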

Scalability at Large Scale (Catamount)
[Plot: CTH (multi-material, large deformation, strong shockwave simulation); annotations: Within 3%, Scalable]

Commodity Systems
• Kitten and Palacios fully support commodity systems
  – Infiniband clusters
  – Ethernet servers
  – Generic PC hardware
• Palacios embeddable in many OSes
  – Kitten
  – MINIX 3
  – Linux
  – GeekOS

Infiniband on Commodity Linux (Linux guest on IB cluster)
[Plot: 2 node Infiniband Ping Pong bandwidth measurement]

Summary
• Virtualization can scale
  – Near native performance for optimized VMM/guest (within 5%)
• VMM needs to know about guest internals
  – Should modify behavior for each guest environment
  – Example: Paging method to use depends on guest
• Black Box inference is not desirable in HPC environment
  – Unacceptable performance overhead
  – Convergence time
  – Mistakes have large consequences
• Need guest cooperation
  – Guest and VMM relationship should be symbiotic
  – Paper forthcoming (4096 scaling results and techniques)
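The paging example above can be illustrated with a small sketch in which a cooperating guest tells the VMM how it behaves instead of the VMM inferring it. Everything here is hypothetical: the hint structure, the selection rule, and the names are assumptions for illustration and are not the Palacios interface. Nested paging appears only as the common hardware-assisted alternative; the slides name shadow paging alone.

```python
from dataclasses import dataclass
from enum import Enum


class PagingMode(Enum):
    SHADOW = "shadow"
    NESTED = "nested"


@dataclass(frozen=True)
class GuestHints:
    """Hypothetical information a cooperating guest passes to the VMM."""
    name: str
    updates_page_tables_often: bool


def choose_paging_mode(hints: GuestHints) -> PagingMode:
    # Assumed rule: shadow paging pays a VMM intervention on guest
    # page-table updates, so it suits guests with mostly static mappings.
    # Guests that update page tables often are steered to nested paging.
    if hints.updates_page_tables_often:
        return PagingMode.NESTED
    return PagingMode.SHADOW


static_guest = GuestHints(name="static-mapping guest", updates_page_tables_often=False)
dynamic_guest = GuestHints(name="dynamic-mapping guest", updates_page_tables_often=True)
print(choose_paging_mode(static_guest).value)   # prints shadow
print(choose_paging_mode(dynamic_guest).value)  # prints nested
```

The point of the sketch is the direction of information flow: the decision is made once from an explicit hint, with no convergence period and no risk of a wrong guess, which is the contrast the slide draws with black box inference.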

Future Work
• Continue exploring virtualization in HPC
  – NU, UNM and SNL collaboration
  – Granted 5 million hours on Jaguar
    • Current fastest supercomputer in the world
Jaguar: Oak Ridge National Labs, Cray XT5, 224,256 cores, 4352 sq. ft, 6.95 MegaWatts, $104 million

Conclusion
• Palacios and Kitten
  – Two open source tools for HPC
  – Proved virtualization of HPC systems can scale
• Contributions Welcome!!
• http://www.v3vee.org
• http://software.sandia.gov/trac/kitten