Performance Analysis of Virtualization for High Performance Computing

Performance Analysis of Virtualization for High Performance Computing: A Practical Evaluation of Hypervisor Overheads
Matthew Cawood
Supervised by: Dr. Simon Winberg
University of Cape Town

Overview
1. Background
2. Research Objectives
3. HPC
4. Virtualization
5. Performance Tuning
6. The Research Cluster
7. Benchmark Selection
8. Results
9. Conclusions

1. Background
• BSc (Eng) final year research project
• Based in CHPC’s Advanced Computer Engineering (ACE) Lab
• Access to research cluster currently being commissioned
• Project focused on evaluating cluster hardware and software

2. Research Objectives
1. Present an in-depth report on the current technologies being developed in the field of High Performance Computing.
2. Provide a quantitative performance analysis of the costs associated with virtualization, specifically in the field of HPC.

3. High Performance Computing
• HPC data centres are rapidly growing in size and complexity
• Current emphasis placed on improving efficiency and utilization
• Wide selection of applications/requirements
  • Bioinformatics
  • Astrophysics
  • Simulation
  • Modelling

4. Virtualization

5. Performance Optimizations
• Host memory reservation of Linux huge pages
• KVM vCPU pinning to improve NUMA cell awareness
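As an illustration of the two optimizations above, here is a minimal sketch that sizes a huge page reservation and builds the matching libvirt domain XML elements (`<memoryBacking><hugepages/>` and `<cputune><vcpupin/>`). The slides do not show the actual configuration. The helper names, the 2 MiB page size, the 64 GiB guest size and the one-to-one pinning onto host cores 0-7 are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def hugepages_needed(guest_mib, page_mib=2):
    """Number of huge pages to reserve on the host (rounded up).

    Assumes 2 MiB pages; the result is the kind of value one would
    write to the vm.nr_hugepages sysctl.
    """
    return -(-guest_mib // page_mib)  # ceiling division


def tuning_fragment(vcpu_to_host_cpu):
    """Build libvirt <memoryBacking> and <cputune> elements.

    vcpu_to_host_cpu maps each guest vCPU index to one host CPU index.
    """
    memory_backing = ET.Element("memoryBacking")
    ET.SubElement(memory_backing, "hugepages")

    cputune = ET.Element("cputune")
    for vcpu, host_cpu in sorted(vcpu_to_host_cpu.items()):
        ET.SubElement(cputune, "vcpupin", vcpu=str(vcpu), cpuset=str(host_cpu))
    return memory_backing, cputune


# Hypothetical layout: 8 vCPUs pinned one-to-one onto host cores 0-7,
# assumed here to belong to a single NUMA cell.
pinning = {vcpu: vcpu for vcpu in range(8)}
memory_backing, cputune = tuning_fragment(pinning)

print("huge pages for a 64 GiB guest:", hugepages_needed(64 * 1024))
print(ET.tostring(memory_backing, encoding="unicode"))
print(ET.tostring(cputune, encoding="unicode"))
```

Keeping every vCPU of a guest on cores of one NUMA cell is what lets the guest's memory stay local to those cores. The real host core numbering should be checked before pinning.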

6. The Research Cluster
Compute Nodes:
• 2 x Intel Xeon E5-2690, 20 MB L3 cache, 2.90 GHz
• 256 GB, DDR3-1600, CL11
• Mellanox ConnectX-3 VPI FDR 56 Gbps HCA
• Gigabit Ethernet NIC
Switch Infrastructure:
• Mellanox SX6036 FDR 36-port InfiniBand Switch
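For context when reading HPLinpack figures, the theoretical double-precision peak of one compute node can be estimated from the specification above. This is a rough sketch, not a figure from the presentation. The core count (8 per E5-2690) and the 8 FLOPs per cycle per core (AVX, one 4-wide add plus one 4-wide multiply) are assumptions not stated on the slide, and Turbo Boost is ignored.

```python
SOCKETS = 2            # from the slide: 2 x Intel Xeon E5-2690
CLOCK_GHZ = 2.90       # from the slide: base clock
CORES_PER_SOCKET = 8   # assumption: E5-2690 core count, not on the slide
FLOPS_PER_CYCLE = 8    # assumption: AVX double precision, add + multiply


def peak_gflops(sockets, cores_per_socket, clock_ghz, flops_per_cycle):
    """Theoretical peak of one node in GFLOPS."""
    return sockets * cores_per_socket * clock_ghz * flops_per_cycle


def hpl_efficiency(measured_gflops, peak):
    """Fraction of the theoretical peak reached by a measured HPL run."""
    return measured_gflops / peak


node_peak = peak_gflops(SOCKETS, CORES_PER_SOCKET, CLOCK_GHZ, FLOPS_PER_CYCLE)
print(f"estimated node peak: {node_peak:.1f} GFLOPS")  # 371.2 under these assumptions
```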

6. The Research Cluster
• CentOS 6.4
• OFED 2.0 (with SR-IOV)
• OpenNebula 4.2

7. Performance Benchmarks
• HPC Challenge
  • HPLinpack
  • MPI Random Access
  • STREAM
  • Effective bandwidth & latency
• OpenFOAM
  • 7 million cell, 5 millisecond transient simulation
  • snappyHexMesh
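To make the STREAM entry above concrete, the following is a toy sketch of the triad kernel (a[i] = b[i] + scalar * c[i]) and the way bandwidth is derived from bytes moved over elapsed time. It is not the benchmark used in the study. The function name and array length are chosen for this example, and in pure Python the interpreter overhead dominates, so the printed figure says nothing about real memory bandwidth.

```python
import time
from array import array


def triad(n, scalar=3.0):
    """Run one STREAM-style triad over n doubles and report MB/s."""
    b = array("d", [1.0]) * n
    c = array("d", [2.0]) * n

    start = time.perf_counter()
    a = array("d", (b[i] + scalar * c[i] for i in range(n)))
    elapsed = time.perf_counter() - start

    # Triad traffic: read b, read c, write a (8 bytes per double each).
    bytes_moved = 3 * n * a.itemsize
    return a, bytes_moved / elapsed / 1e6


a, mb_per_s = triad(100_000)
print(f"triad: {mb_per_s:.1f} MB/s (interpreter-bound, illustrative only)")
```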

8. Results

8.1 Software Comparison
Figure: HPLinpack throughput comparison of compiler selection

8.2 Single Node Evaluation
Figure: MPI Random Access performance
Figure: HPLinpack throughput efficiency of virtual machines
Figure: STREAM memory bandwidth

8.3 Cluster Evaluation
Figure: HPLinpack throughput efficiency of virtual machines

8.3 Cluster Evaluation
Figure: OpenFOAM runtime efficiency of virtual machines
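The slides do not define "efficiency". One plausible reading, sketched below as an assumption, is the virtual machine result normalized against the bare-metal result: a ratio of throughputs for HPLinpack (higher is better) and an inverted ratio of runtimes for OpenFOAM (lower is better). The function names are hypothetical and the numbers are placeholders, not measurements from the study.

```python
def throughput_efficiency(vm_throughput, native_throughput):
    """VM throughput as a fraction of native (e.g. HPLinpack GFLOPS)."""
    return vm_throughput / native_throughput


def runtime_efficiency(vm_seconds, native_seconds):
    """Native runtime as a fraction of VM runtime (e.g. OpenFOAM wall time)."""
    return native_seconds / vm_seconds


# Placeholder inputs only; these are NOT results from the presentation.
hpl_eff = throughput_efficiency(vm_throughput=90.0, native_throughput=100.0)
foam_eff = runtime_efficiency(vm_seconds=125.0, native_seconds=100.0)
print(f"throughput efficiency: {hpl_eff:.0%}")
print(f"runtime efficiency: {foam_eff:.0%}")
```

With both metrics defined this way, 100% means no virtualization overhead, which makes throughput-based and runtime-based benchmarks directly comparable.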

8.4 Interconnect Evaluation
Native Verbs vs. IP over InfiniBand
Figure: Typical Verbs latency of virtual machines
Figure: Typical IPoIB latency of virtual machines

8.5 Supplementary Tests
Intel® Hyper-Threading
Figure: HPLinpack throughput

9. Conclusions
• KVM provides good performance for HPC
• Tuning is necessary to further improve performance
• Efficiency is highly application dependent
• SR-IOV for InfiniBand effectively reduced I/O virtualization overheads
• Synthetic and real-world results often contradict each other

Questions?