
A Fast Rejuvenation Technique for Server Consolidation with Virtual Machines
Kenichi Kourai, Shigeru Chiba
Tokyo Institute of Technology

Server consolidation with VMs
• Server consolidation is widely carried out
  • Multiple server machines are integrated on one physical machine
  • Recently, virtual machines (VMs) are used for this
• VMs are run on a virtual machine monitor (VMM)
  • Multiplexing resources
[Figure: multiple VMs on a VMM, which runs on the hardware]

Software aging of VMMs
• Software aging of a VMM is critical
  • Software aging is...
    • The phenomenon that software state degrades with time
    • E.g. exhaustion of system resources
  • Software aging of a VMM affects all VMs on it
    • E.g. performance degradation
[Figure: multiple VMs on one VMM]

Software rejuvenation of VMMs
• Preventive maintenance
  • Performed before software aging of a VMM affects its VMs
  • Occasionally stops a VMM, cleans its internal state, and restarts it
• Typical example: rebooting a VMM
  • Cleans the internal state automatically and completely
  • The easiest way

Drawbacks (1/2): Increasing service downtime
• The VMM reboot needs:
  • Rebooting all OSes running on the VMs
    • The time tends to be long
    • Larger number of VMs
    • Longer startup time of services
  • A hardware reset
    • The BIOS power-on self test is time-consuming
[Figure: timeline of a VMM reboot: OS shutdown, VMM shutdown, hardware reset, VMM boot, OS boot]

Drawbacks (2/2): Performance degradation
• The file cache is lost by the OS reboot
  • OSes cannot restore performance until the file cache is re-filled
    • They strongly rely on the file cache to speed up file accesses
  • The time tends to be long
    • The file cache size is increasing
    • Large amount of memory for a VM
    • Free memory as the file cache
[Figure: process, OS, disk]

Warm-VM reboot
• Fast rejuvenation technique
  • Efficiently reboots only a VMM
    • The VMM reboot causes no OS reboot
  • Basic idea
    • Suspend all VMs before the VMM reboot
    • Resume them after the reboot
  • Challenge
    • How does a VMM efficiently deal with the large memory images of VMs?

On-memory suspend of VMs
• Freezes the memory images of VMs on the main memory
  • That memory area is just reserved
    • The time does not depend on the memory size
  • Saving them into a slow disk is inefficient
• ACPI S3 state for VMs
  • Suspend To RAM
  • Traditional suspend is ACPI S4 state
[Figure: VM, disk, main memory; label: freeze]

On-memory resume of VMs
• Unfreezes the memory images preserved on the main memory
  • They are reused directly as the memory of VMs
    • No need to read them from a slow disk
  • The file cache of OSes is also restored
    • No performance degradation
[Figure: VM, disk, main memory; label: unfreeze]

Quick reload of VMMs
• Directly boots a new VMM without a hardware reset
  • The memory images of VMs are preserved through the VMM reboot
    • Software can keep track of them
    • A hardware reset does not guarantee this
  • A VMM is rebooted quickly
    • No overhead due to a hardware reset
[Figure: main memory holding the VM images, the old VMM, and the preloaded new VMM]
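The interplay of on-memory suspend, quick reload, and on-memory resume can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyVMM`, `quick_reload`, and the frame numbers are hypothetical names invented for illustration and are not taken from RootHammer or Xen. The point it shows is that suspending only records which physical frames belong to a VM, and that the reloaded VMM receives that record so it never hands the frames to anyone else.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class ToyVMM:
    """Hypothetical stand-in for a VMM that tracks physical frame ownership."""
    total_frames: int
    running: dict[str, list[int]] = field(default_factory=dict)  # VM name -> frames
    frozen: dict[str, list[int]] = field(default_factory=dict)   # suspended images

    def suspend_on_memory(self, name: str) -> None:
        # Freeze: move the frame list to the reserved table. No page is copied,
        # so the cost does not grow with the size of the memory image.
        self.frozen[name] = self.running.pop(name)

    def resume_on_memory(self, name: str) -> list[int]:
        # Unfreeze: hand the very same frames back to the VM.
        self.running[name] = self.frozen.pop(name)
        return self.running[name]

    def free_frames(self) -> list[int]:
        # Frames owned by neither a running VM nor a frozen image.
        owned = {f for frames in (*self.running.values(), *self.frozen.values())
                 for f in frames}
        return [f for f in range(self.total_frames) if f not in owned]


def quick_reload(old: ToyVMM) -> ToyVMM:
    # The new VMM starts with fresh internal state, but software passes the
    # reservation table along so frozen frames are not reused or overwritten.
    return ToyVMM(total_frames=old.total_frames, frozen=dict(old.frozen))


old = ToyVMM(total_frames=8, running={"web": [2, 3, 4]})
old.suspend_on_memory("web")
new = quick_reload(old)
frames = new.resume_on_memory("web")
print(frames, new.free_frames())
```

In this model a hardware reset would correspond to constructing the new `ToyVMM` without the `frozen` table, after which nothing protects the old frames.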

Comparison with other methods
• Cold-VM reboot
  • Needs the OS reboot
• Saved-VM reboot
  • A naive implementation of the warm-VM reboot
    • VMs are saved into a disk

Reboot method                 Cold-VM   Saved-VM   Warm-VM
Depends on # of VMs           Yes       No         No
Depends on services           Yes       No         No
Depends on mem size of VMs    No        Yes        No
Performance degradation       Yes       No         No

Model for availability
• Must consider the software rejuvenation of both a VMM and OSes
  • Warm-VM reboot
    • The OS rejuvenation is independent
  • Cold-VM reboot
    • The OS rejuvenation is affected by the VMM rejuvenation
    • The number of OS rejuvenations increases
[Figure: timeline of OS rejuvenation and VMM rejuvenation]

RootHammer
• We have implemented the warm-VM reboot into Xen 3.0.0
  • On-memory suspend/resume
    • Based on Xen's suspend/resume
    • Manages the mapping from the VM memory to the physical memory
  • Quick reload
    • Based on the kexec mechanism in Linux
    • Kexec for a VMM is included in the latest Xen
    • It is not for reusing the memory images
[Figure: mapping from VM memory to physical memory]

Experiments
• Examine whether the warm-VM reboot reduces downtime and performance degradation
  • Comparison
    • Cold-VM reboot with the OS reboot
    • Saved-VM reboot using Xen's suspend/resume
[Figure: experimental setup with Linux server VMs on the VMM and a Linux client. Hardware labels: 2 dual-core Opteron, 12 GB SDRAM, 15,000 rpm SCSI disk, gigabit Ethernet]

Performance of on-memory suspend/resume
• Suspend/resume of one VM with 11 GB of memory
  • Ours: 1 sec
  • Xen's: 280 sec
    • Depends on the memory size
• Suspend/resume of 11 VMs
  • Ours: 4 sec
  • OS reboot: 58 sec
    • Depends on # of VMs

Effect of quick reload
• The time of rebooting a VMM with no VMs
  • Warm-VM reboot
    • 11 sec
    • The time of quick reload is negligible
  • Cold-VM reboot
    • 59 sec
    • The time due to a hardware reset is 48 sec
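The three numbers on this slide are consistent with each other, which a few lines of arithmetic make explicit. The constants are the slide's measurements; the variable names are only illustrative.

```python
COLD_VMM_REBOOT_S = 59   # cold-VM reboot of a VMM with no VMs
HARDWARE_RESET_S = 48    # part of the cold reboot spent in the hardware reset
WARM_VMM_REBOOT_S = 11   # warm-VM reboot of a VMM with no VMs

# Removing the hardware reset from the cold reboot leaves the warm reboot time.
cold_without_reset_s = COLD_VMM_REBOOT_S - HARDWARE_RESET_S
print(cold_without_reset_s == WARM_VMM_REBOOT_S)  # True

# Share of the cold reboot that is spent in the hardware reset.
reset_share = HARDWARE_RESET_S / COLD_VMM_REBOOT_S
print(f"{reset_share:.0%}")  # 81%
```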

Downtime of services
• Warm-VM reboot
  • Always the same
    • 42 sec
• Saved-VM reboot
  • Depends on # of VMs
    • 429 sec (11 VMs)
• Cold-VM reboot
  • Affected by the service type
    • 157 sec (sshd)
    • 241 sec (JBoss)
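As a worked example, the relative downtime reduction of the warm-VM reboot over the cold-VM reboot can be computed from the figures above. The helper name `reduction` is an illustrative choice; the JBoss result matches the "83% at maximum" quoted in the conclusion.

```python
def reduction(baseline_s: float, new_s: float) -> float:
    """Fraction of the baseline downtime that is eliminated."""
    return (baseline_s - new_s) / baseline_s


WARM_VM_S = 42
COLD_VM_S = {"sshd": 157, "JBoss": 241}

for service, downtime_s in COLD_VM_S.items():
    print(f"{service}: {reduction(downtime_s, WARM_VM_S):.0%}")
# sshd: 73%
# JBoss: 83%
```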

Availability of JBoss
• The warm-VM reboot achieves four 9s
  • Assumptions
    • OS rejuvenation every week
      • 34 sec
    • VMM rejuvenation every 4 weeks
      • In 0.5 week after the last OS rejuvenation

Reboot method      Availability
Warm-VM reboot     99.993%
Cold-VM reboot     99.985%
Saved-VM reboot    99.977%
[Figure: timeline with OS rejuvenation every 1 week and VMM rejuvenation 0.5 week after the last OS rejuvenation]
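The warm-VM and saved-VM figures can be reproduced with a simple additive model. This is a sketch under assumptions that the slides do not state explicitly: downtime over one 4-week VMM rejuvenation cycle is taken as four independent 34-second OS rejuvenations plus one VMM rejuvenation, using the 42-second and 429-second downtimes measured earlier. The cold-VM case is deliberately left out, because there the VMM reboot also reboots the OSes and shifts the OS rejuvenation schedule, and the slides do not give the exact formula.

```python
WEEK_S = 7 * 24 * 3600


def availability(vmm_downtime_s: float,
                 os_downtime_s: float = 34.0,
                 os_period_weeks: int = 1,
                 vmm_period_weeks: int = 4) -> float:
    """Availability over one VMM rejuvenation cycle (assumed additive model)."""
    cycle_s = vmm_period_weeks * WEEK_S
    os_rejuvenations = vmm_period_weeks // os_period_weeks
    downtime_s = os_rejuvenations * os_downtime_s + vmm_downtime_s
    return 1.0 - downtime_s / cycle_s


warm = availability(vmm_downtime_s=42)    # warm-VM reboot downtime
saved = availability(vmm_downtime_s=429)  # saved-VM reboot downtime (11 VMs)
print(f"warm-VM:  {warm:.3%}")
print(f"saved-VM: {saved:.3%}")
```

Under these assumptions the results round to the 99.993% and 99.977% shown in the table.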

Performance degradation
• The throughput of the Apache web server before and after the VMM reboot
  • Warm-VM reboot
    • No degradation
  • Cold-VM reboot
    • Degraded by 69%

Software rejuvenation in a cluster environment
• Clustering achieves zero downtime
  • Multiple hosts can provide the same service
• Let us consider the total throughput of all hosts in a cluster
  • Warm-VM reboot
    • (m-1)p
  • Cold-VM reboot
    • (m-1)p
    • (m-0.69)p for a while after the reboot
  • m: # of hosts, p: throughput of one host
[Figure: total throughput over time t; levels mp and (m-1)p; durations 42 sec and 241 sec]
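The throughput levels above can be written as a piecewise function of the time since one host starts rejuvenation. This is a sketch, not the authors' model: the 42-second and 241-second durations come from the figure, while `warmup_s`, the length of the "for a while" period in which the file cache is re-filled, is a hypothetical parameter that the slides do not quantify.

```python
def total_throughput(t: float, m: int, p: float, method: str,
                     warmup_s: float = 600.0) -> float:
    """Cluster throughput t seconds after one of m hosts begins rejuvenation.

    warmup_s is an assumed placeholder for the cache re-fill period.
    """
    if method == "warm":
        # One host is down for 42 sec, then back at full speed.
        return (m - 1) * p if t < 42 else m * p
    if method == "cold":
        if t < 241:
            return (m - 1) * p          # host is down
        if t < 241 + warmup_s:
            return (m - 0.69) * p       # host is up but degraded by 69%
        return m * p
    raise ValueError(f"unknown method: {method}")


for t in (10, 100, 300):
    print(t, total_throughput(t, m=4, p=100.0, method="warm"),
          total_throughput(t, m=4, p=100.0, method="cold"))
```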

Comparison with VM migration in a cluster environment
• VM migration achieves nearly zero downtime
  • VMs are moved to another host
    • Xen's live migration, VMware's VMotion
• Total throughput
  • Normal run
    • (m-1)p
    • One host is reserved for migration
  • Live migration
    • (m-1.12)p
[Figure: total throughput over time t; levels mp and (m-1)p; durations 42 sec and 17 min]
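One way to compare the two approaches is the average total throughput over a rejuvenation period. The sketch below assumes a single host rejuvenation per period and treats `period_s` as a free, hypothetical parameter; the 42 sec, 17 min, and 0.12p figures are taken from the slide.

```python
def avg_throughput_warm(m: int, p: float, period_s: float,
                        downtime_s: float = 42.0) -> float:
    # mp in normal run, (m-1)p while the rebooting host is down.
    return m * p - p * downtime_s / period_s


def avg_throughput_migration(m: int, p: float, period_s: float,
                             migration_s: float = 17 * 60,
                             overhead: float = 0.12) -> float:
    # (m-1)p in normal run because one host is reserved as the migration
    # target, and (m-1-overhead)p while live migration is in progress.
    return (m - 1) * p - overhead * p * migration_s / period_s


period_s = 4 * 7 * 24 * 3600  # assumed: one rejuvenation every 4 weeks
print(avg_throughput_warm(4, 100.0, period_s))
print(avg_throughput_migration(4, 100.0, period_s))
```

In this model the gap between the two is dominated by the reserved host rather than by the rejuvenation itself.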

Related work
• Microreboot [Candea et al. '04]
  • Reboots only a part of subcomponents
    • The warm-VM reboot enables rebooting only a parent component (VMM for VMs)
• Checkpointing/restart [Randell '75]
  • Saves/restores OS processes
    • Similar to suspend/resume of VMs
• Optimizations of suspend/resume
  • Incremental suspend, compression of memory images

Conclusion
• We proposed the warm-VM reboot
  • On-memory suspend/resume
    • Freezes/unfreezes the memory images of VMs
  • Quick reload
    • Preserves the memory images through the VMM reboot
• It achieved fast rejuvenation
  • Downtime reduced by 83% at maximum
  • No performance degradation