
Partition and Isolate: Approaches for Consolidating HPC and Commodity Workloads
Jack Lange, Assistant Professor, University of Pittsburgh

Summary
• Commodity and HPC systems have been converging
  – Commodity off-the-shelf components
  – Linux-based HPC systems
  – Cloud computing
• Problem: Real HPC applications need HPC environments
  – Tightly coupled, massively parallel, and synchronized
  – Current services must provide dedicated HPC systems
• Can we co-host HPC applications on commodity systems?
• Dual Stack Approach
  – Provision the underlying software stack along with the application
  – Commodity stack should handle commodity applications
  – HPC stack can provide an HPC environment

User Space Partitioning
[Figure: a two-socket node (Socket 1, Socket 2) whose cores (1-8) and memory are divided into a Commodity Partition and an HPC Partition]
• Current systems do support this, but…
• Interference still exists inside the system software
  – Inherent feature of commodity systems

HPC vs. Commodity Systems
• Commodity systems have a fundamentally different focus than HPC systems
  – Amdahl’s vs. Gustafson’s laws
  – Commodity: Optimized for the common case
• HPC: The common case is not good enough
  – At large (tightly coupled) scales, percentiles lose meaning
  – Collective operations must wait for the slowest node
  – 1% of nodes can make 99% suffer
  – HPC systems must optimize outliers (worst case)
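
The claim that "1% of nodes can make 99% suffer" can be made concrete. Assume each node independently suffers a delay with probability p during a collective; this is an idealized noise model, not one taken from the slides. Because the collective finishes only when its slowest node does, it is delayed with probability 1 - (1 - p)^N. The sketch below computes that probability. It also simulates a collective whose completion time is the maximum over all nodes. The function names and the toy delay model are illustrative assumptions.

```python
import random


def prob_collective_delayed(p: float, nodes: int) -> float:
    """Probability that at least one of `nodes` independent nodes is delayed.

    A collective completes only when its slowest participant does, so one
    delayed node delays the whole operation.
    """
    return 1.0 - (1.0 - p) ** nodes


def simulate_collective(nodes: int, p: float, base_us: float, delay_us: float,
                        trials: int, seed: int = 0) -> float:
    """Mean completion time of a collective under a toy (hypothetical) noise model.

    Each node takes `base_us`, plus `delay_us` with probability `p`.
    The collective's time in each trial is the maximum over all nodes.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        slowest = max(
            base_us + (delay_us if rng.random() < p else 0.0)
            for _ in range(nodes)
        )
        total += slowest
    return total / trials


if __name__ == "__main__":
    for n in (1, 100, 1000, 10000):
        print(f"{n:>6} nodes: P(collective delayed) = {prob_collective_delayed(0.01, n):.4f}")
```

With p = 0.01, a 100-node collective is already delayed about 63% of the time. At 1,000 nodes it is delayed more than 99.99% of the time. This is why tuning for the common case stops working at scale.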

High Performance Computing (HPC)
• Large-scale simulations to solve Big Problems

Dual Stack Approach
• Partition
  – Segment the underlying hardware resources
  – Assign them exclusively to specific workloads
• Isolate
  – Prevent interference from other workloads
  – Hardware: partitions must be coarse grained
  – Software: eliminate shared state
• Implementation
  – Independent system software running on isolated resources
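
The "Partition" step amounts to an exclusivity invariant: no CPU, memory unit, or device may belong to two zones. Below is a minimal sketch of that check. The `Partition` record and `find_conflicts` helper are hypothetical illustrations, not part of Palacios or Kitten. The 12-core split loosely mirrors the dual 6-core evaluation node, but the specific assignment is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """Hypothetical description of the hardware assigned to one zone."""
    name: str
    cpus: frozenset            # core IDs
    memory_blocks: frozenset   # memory block indices (coarse-grained units)
    pci_devices: frozenset = frozenset()  # PCI addresses, e.g. "0000:03:00.0"


def find_conflicts(partitions):
    """Return (resource, first_owner, second_owner) for every shared resource.

    An empty result means every CPU, memory block, and device is assigned
    to exactly one partition.
    """
    owner = {}
    conflicts = []
    for part in partitions:
        claimed = (
            [("cpu", c) for c in sorted(part.cpus)]
            + [("memory", m) for m in sorted(part.memory_blocks)]
            + [("pci", d) for d in sorted(part.pci_devices)]
        )
        for resource in claimed:
            first = owner.setdefault(resource, part.name)
            if first != part.name:
                conflicts.append((resource, first, part.name))
    return conflicts


if __name__ == "__main__":
    # Illustrative split of a 12-core node into two zones.
    commodity = Partition("commodity", frozenset(range(0, 6)), frozenset(range(0, 16)))
    hpc = Partition("hpc", frozenset(range(6, 12)), frozenset(range(16, 32)))
    print(find_conflicts([commodity, hpc]))  # [] -> fully exclusive
```

Coarse granularity (whole cores, whole memory blocks, whole devices) is what keeps a check like this simple. Finer-grained sharing would reintroduce the shared state that the "Isolate" step tries to eliminate.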

HPC in the cloud
• Clouds are starting to look like supercomputers…
  – Are we seeing a convergence?
• Not yet
  – Noise issues
  – Poor isolation
  – Resource contention
  – Lack of control over topology
• Very bad for tightly coupled parallel apps
  – Require specialized environments that solve these problems
• Approaching convergence
  – Vision: Dynamically partition cloud resources into HPC and commodity zones
  – This talk: partitioning compute nodes with performance isolation

Commodity VMMs
• Virtualization is considered an “enterprise” technology
  – Designed for commodity environments
  – Fundamentally different, but not wrong!
• Example: KVM architecture issues
  – Userspace handlers
  – Fairly complex memory management
  – Locking and periodic optimizations
  – Presence of system noise

Palacios VMM
• OS-independent embeddable virtual machine monitor
  – Established compatibility with Linux, Kitten, and Minix
• Specifically targets HPC applications and environments
  – Consistent performance with very low variance
• Deployable on supercomputers, clusters (InfiniBand/Ethernet), and servers
  – 0-3% overhead at large scales (thousands of nodes)
• VEE 2011, IPDPS 2010, ROSS 2011
Open source and freely available: http://www.v3vee.org/palacios

Palacios/Linux
• Palacios/Linux provides lightweight, high-performance virtualized environments
  – Internally manages dedicated resources
    • Memory and CPU scheduling
  – Does not bother with “enterprise features”
    • Page sharing/merging, swapping, overcommitting resources
• Palacios enables scalable HPC performance on commodity platforms

VMM Comparison
• Primary difference: Consistency
  – Requirement for tightly coupled performance at large scale
• Example: KVM nested paging architecture
  – Maintains free page caches to optimize performance
    • Requires cache management
  – Shares page tables to optimize memory usage
    • Requires synchronization

VMM        % of exits   Mean    Std Dev   # NPFs
KVM        52%          8804    5232      3,265,156
Palacios   50%          10876   2685      1,872,017
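
The VMM comparison table makes a consistency point. Palacios has a somewhat higher mean but a much smaller standard deviation. One way to quantify this is the coefficient of variation, the standard deviation divided by the mean. That metric is derived here and is not reported on the slide. The sketch computes it from the table's Mean and Std Dev values; the slide does not state their units.

```python
# Values from the VMM comparison table (units as on the original slide).
MEASUREMENTS = {
    "KVM": {"mean": 8804, "std_dev": 5232},
    "Palacios": {"mean": 10876, "std_dev": 2685},
}


def coefficient_of_variation(mean: float, std_dev: float) -> float:
    """Relative spread: standard deviation expressed as a fraction of the mean."""
    return std_dev / mean


if __name__ == "__main__":
    for vmm, m in MEASUREMENTS.items():
        cv = coefficient_of_variation(m["mean"], m["std_dev"])
        print(f"{vmm:<9} CV = {cv:.2f}")
```

The result is roughly 0.59 for KVM and 0.25 for Palacios. KVM's relative spread is therefore more than twice that of Palacios. For tightly coupled codes, that spread matters more than the mean.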

Dual Stack Architecture
• Partitioning at the OS level
[Figure: layered diagram with Commodity Application(s), HPC Application, Commodity Linux, HPC Linux, KVM, Linux Kernel, Palacios VMM, Linux Module Interface, Palacios Resource Managers, Hardware]
• Enable the cloud to host both commodity and HPC apps
  – Each zone optimized for the target applications

Evaluation
• Goal: Measure VM isolation properties
• Partitioned a single node into HPC and commodity zones
  – Commodity zone: Parallel kernel compilation
  – HPC zone: Set of standard HPC benchmarks
  – System:
    • Dual 6-core AMD Opteron with NUMA topology
    • Linux guest environments (HPC and commodity)
• Important: Local node only
  – Does not promise good performance at scale
  – But poor performance will be magnified at large scales

Results
• Commodity VMMs degrade with contention
• Palacios delivers consistent performance
[Figure: MiniFE performance results]
MiniFE: Unstructured implicit finite element solver, from the Mantevo Project (https://software.sandia.gov/mantevo/index.html)

Discussion
• A dual stack approach can provide HPC environments on commodity clouds
  – HPC and commodity workloads can dynamically share resources
  – HPC requirements can be met without fully dedicated resources
• Networking is still an open issue
  – Need mechanisms for isolation and partitioning
  – Need high-performance networking architectures
    • 1 Gbit/s is not good enough
    • 10 Gbit/s is good, InfiniBand is better
  – Need control over placement and topologies

Multi-stack Operating Systems
• Future Exascale systems are moving toward in situ organization
• Applications have traditionally used their own platforms
  – Visualization, storage, analysis, etc.
• Everything must now collapse onto a single platform

What this means for the OS
• At Petascale we could optimize each environment separately
  – Each had its own OS and hardware
• At Exascale, workloads will be co-located
  – Can a single OS handle all workloads effectively?
• Probably not
  – Each has different resource requirements and behaviors
  – Exascale will need to support multiple OS environments on the same hardware

Current Supercomputer OS architectures
[Figure: operating systems used on supercomputers (UNIX, LINUX)]
Source: http://en.wikipedia.org/wiki/File:Operating_systems_used_on

Will Linux continue to dominate?
• An open question at this point
  – Exascale systems may mark a radical departure from traditional architectures
• Kitten: Open-source lightweight kernel from Sandia
  – Minimal compute node OS
• Provides a mostly Linux-compatible user environment
  – Supports unmodified compiler toolchains and ELF executables
• But doesn’t include the enterprise Linux features
  – Simple memory management, application-managed I/O, etc.

Dual Stack Architecture
[Figure: layered diagram with HPC Application, Commodity Application(s), Linux Kernel, Kitten, Palacios VMM, Linux Module Interface, Palacios Resource Managers, Hardware]
• Provide a lightweight kernel (LWK) environment on a commodity system
  – Each zone optimized for the target applications

STREAM
• Palacios/Kitten provides higher memory throughput than Linux
  – 400 MB/s higher (4.74%)
• Palacios/Kitten provides more consistent memory performance than Linux
  – 340 MB/s lower standard deviation
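
The two STREAM figures together imply approximate absolute throughputs. The sketch assumes the 4.74% is relative to the Linux result, which the slide does not say explicitly. Under that assumption, Linux delivers about 400 / 0.0474 ≈ 8,439 MB/s and Palacios/Kitten about 8,839 MB/s. Both numbers are only as precise as the rounded figures on the slide. The helper name is illustrative.

```python
def implied_baseline(absolute_gain: float, relative_gain: float) -> float:
    """Solve absolute_gain = relative_gain * baseline for the baseline."""
    return absolute_gain / relative_gain


# Assumption: the 4.74% gain is measured relative to the Linux throughput.
linux_mb_s = implied_baseline(400.0, 0.0474)
kitten_mb_s = linux_mb_s + 400.0

if __name__ == "__main__":
    print(f"Linux ~ {linux_mb_s:.0f} MB/s, Palacios/Kitten ~ {kitten_mb_s:.0f} MB/s")
```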

Selfish Detour
[Figure: Selfish Detour noise plots, panels: Linux and Virtual Kitten]
• Palacios/Kitten can reduce noise from Linux
  – Eliminates periodic timer interrupts

Better than native?
• Results are preliminary
  – Followed best practices for configuring Linux
  – Didn’t try to optimize VMM performance
• Virtualization can improve system performance
  – when the system is running a commodity OS
B. Kocoloski, J. Lange, “Better than Native: Using Virtualization to Improve Compute Node Performance,” ROSS 2012

Beyond Virtualization
• Virtualization imposes overhead
  – Power: requires transistors
  – Performance: small, but present
  – Interference: still some dependencies on the host OS
• Might not be available on exascale hardware
• Can we provide native partitioning?
  – We think so
  – Linux provides the ability to dynamically remove resources (CPUs, memory, etc.)
  – These can be taken over by a second OS

Dual Stack Architecture
[Figure: layered diagram with Commodity Application(s), Linux, HPC Application, Palacios VMM, Kitten, Hardware]
• Provide an LWK environment on a commodity system
  – Each zone optimized for the target applications

Approach
• OS partition created via offlined resources
  – CPUs, memory, PCI devices
• Secondary OS “booted” on offline resources
• Issues:
  – OS initialization
    • Boot process
    • Resource discovery
  – Coordination and communication
  – Security and safety
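
One plausible way to perform the offlining step from user space uses the standard Linux sysfs hotplug files. The talk does not say this is its mechanism; the sketch is an assumption. It only detaches resources from Linux. Booting the secondary OS on them, and the other issues listed on the slide, are out of scope. The CPU numbers, memory block numbers, and PCI address in the usage line are hypothetical. `dry_run` defaults to True because the writes require root and really take resources away from the running kernel.

```python
from pathlib import Path

SYSFS = Path("/sys")


def offline_plan(cpus, memory_blocks, pci_devices=()):
    """Return (path, value) sysfs writes that would detach resources from Linux.

    Standard Linux hotplug interfaces:
      CPUs:   /sys/devices/system/cpu/cpuN/online      <- "0"
      Memory: /sys/devices/system/memory/memoryN/state <- "offline"
      PCI:    /sys/bus/pci/devices/<BDF>/driver/unbind <- "<BDF>"
    Note: CPU 0 often cannot be offlined, and a memory block can only be
    offlined if its pages can be migrated away.
    """
    writes = []
    for cpu in cpus:
        writes.append((SYSFS / f"devices/system/cpu/cpu{cpu}/online", "0"))
    for block in memory_blocks:
        writes.append((SYSFS / f"devices/system/memory/memory{block}/state", "offline"))
    for bdf in pci_devices:
        writes.append((SYSFS / f"bus/pci/devices/{bdf}/driver/unbind", bdf))
    return writes


def apply_plan(writes, dry_run=True):
    """Print the equivalent shell commands, or perform the writes (needs root)."""
    for path, value in writes:
        if dry_run:
            print(f"echo {value} > {path}")
        else:
            path.write_text(value)


if __name__ == "__main__":
    # Hypothetical resources for an HPC partition.
    apply_plan(offline_plan(cpus=[6, 7], memory_blocks=[40], pci_devices=["0000:03:00.0"]))
```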

Dual Stack Memory
• Maybe we don’t need to provide an entirely separate OS
  – Instead, selectively manage some resources
• Dual stack memory
  – Provide a separate memory management layer to Linux
• Features
  – Selectively manage heap per application
  – Provide applications with direct control over memory layout
  – Transparently back memory using large pages
  – Without overhead added by Linux
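
Two of the listed features are "direct control over memory layout" and "transparently back memory using large pages". The toy per-application heap below illustrates what they could mean. The heap grows only in whole 2 MiB large pages, and the application chooses the alignment of each allocation. This is a hypothetical model with simulated integer addresses, not the dual stack memory implementation described in the talk. A real manager would map physical large pages at these virtual addresses.

```python
from dataclasses import dataclass, field

LARGE_PAGE = 2 * 1024 * 1024  # 2 MiB, the x86-64 large page size


def align_up(value: int, alignment: int) -> int:
    """Round `value` up to the next multiple of `alignment`."""
    return (value + alignment - 1) // alignment * alignment


@dataclass
class LargePageHeap:
    """Hypothetical per-application bump heap that grows in whole large pages."""
    base: int
    top: int = field(init=False)         # next free address (bump pointer)
    mapped_end: int = field(init=False)  # end of the large-page-backed region

    def __post_init__(self):
        if self.base % LARGE_PAGE:
            raise ValueError("heap base must be large-page aligned")
        self.top = self.base
        self.mapped_end = self.base

    def alloc(self, size: int, alignment: int = 64) -> int:
        """Return an aligned address; grow the backing region by whole large pages."""
        addr = align_up(self.top, alignment)
        end = addr + size
        if end > self.mapped_end:
            self.mapped_end = align_up(end, LARGE_PAGE)
        self.top = end
        return addr

    @property
    def large_pages(self) -> int:
        """Number of large pages currently backing the heap."""
        return (self.mapped_end - self.base) // LARGE_PAGE


if __name__ == "__main__":
    heap = LargePageHeap(base=0x4000_0000)
    a = heap.alloc(100)
    b = heap.alloc(3 * 1024 * 1024)
    print(hex(a), hex(b), heap.large_pages)
```

Because growth is always in whole large pages, every allocation is backed by large pages without the application asking for them. The alignment parameter is the application's lever over layout.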

Dual Stack Architecture
[Figure: layered diagram with Commodity Application(s), HPC Application, Linux, Memory Management, Hardware]
• Provide an LWK memory manager on a commodity OS

Conclusion
• Commodity systems are not designed to support HPC workloads
  – Different requirements and behaviors than commodity applications
• A multi-stack approach can provide HPC environments in commodity systems
  – HPC requirements can be met without separate physical systems
  – HPC and commodity workloads can dynamically share resources
  – Isolated system software environments

Thank you
Jack Lange, Assistant Professor, University of Pittsburgh
jacklange@cs.pitt.edu
http://www.cs.pitt.edu/~jacklange