The Design & Implementation of Hyperupcalls
Nadav Amit & Michael Wei
July 2018 │ © 2018 VMware, Inc.

Hardware Virtualization
[Diagram: virtual machine, process, OS (guest); OS, hypervisor (host); hardware]
  • Common use-case: server consolidation
  • Mature technology
  • Challenged by alternatives (e.g., containers) since it is:
      • Relatively inefficient
      • Hard to provide services for VMs

The Semantic Gap
  • The VM does not know the physical hardware constraints
  • The hypervisor is oblivious to the VM OS state
  • Degrades performance
  • Prevents VM introspection
  • Makes the hypervisor robust
[Diagram: guest, host, hypervisor; "low on memory?", "is page free?", sensitive architectural events, swap, discard]

Paravirtualization: Extended hypervisor and VM interface
[Diagram: guest, host, hypervisor, hypervisor code; messages "page X is free" and "is page X free?"; mechanisms: hypercalls, upcalls, pre-virtualization]

Paravirtual Interfaces
For hypervisor & virtual machine (VM) coordination

  initiator     execution context: VM                 execution context: hypervisor
  VM            pre-virtualization [LeVasseur ’05]    hypercalls
  hypervisor    upcalls                               ? (hyperupcalls)

[Label: VM logic]
  • Can run privileged operations
  • “Pull” mechanism
  • No context switch overhead

Hyperupcalls: A new Paravirtualization Mechanism
  • Short encapsulated programs
  • Provided by the VM to the hypervisor, registered to certain hypervisor events
  • Invoked on hypervisor events
  • Query VM state or notify it on events
  • Reuses OS code
  • Can run while the VM is suspended
[Diagram: guest, host; "is page X free?", VM code, hypervisor, hyperupcalls]
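To make the register-then-invoke flow on this slide concrete, here is a minimal sketch that models it in Python rather than eBPF. The `Hypervisor` class, the `reclaim_page` event name, and the `guest_state` dictionary are hypothetical names invented for this illustration; they are not part of the hyperupcalls implementation.

```python
from collections import defaultdict


class Hypervisor:
    """Toy model: keeps VM-provided handlers keyed by hypervisor event."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def register(self, event, handler):
        # The VM hands over a short program and names the event it serves.
        self._handlers[event].append(handler)

    def raise_event(self, event, guest_state, arg):
        # The hypervisor runs every registered handler itself, so the VM
        # does not need to be scheduled (it may even be suspended).
        return [handler(guest_state, arg) for handler in self._handlers[event]]


def is_page_free(guest_state, page):
    """Hypothetical hyperupcall: answers "is page X free?" from VM state."""
    return page in guest_state["free_pages"]


hv = Hypervisor()
hv.register("reclaim_page", is_page_free)
guest_state = {"free_pages": {3, 7}}  # assumed snapshot of guest OS state
```

Calling `hv.raise_event("reclaim_page", guest_state, 7)` queries the guest state from the hypervisor side; an event with no registered handler simply yields an empty list.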

Hyperupcalls Safety
The VM cannot be trusted
  • Isolation is a key feature of virtualization
  • Must ensure safety properties:
      • No privileged instructions
      • Safe memory accesses
      • Bounded runtime
Solution: verifiable code - eBPF
  • Originates from the Berkeley Packet Filter
  • Bytecode with provable safety
  • AoT compilation to native code
  • LLVM compiles C to eBPF
  • Supported by Linux, DPDK, etc.
Program/kernel interaction resembles virtual-machine/hypervisor interaction
  • eBPF can verify hyperupcalls
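The three safety properties can be illustrated with a toy checker. This is a sketch over a made-up instruction set (`load`, `add`, `jmp`, `exit`), not the eBPF verifier and not eBPF bytecode; it assumes that forbidding backward jumps is an acceptable stand-in for "bounded runtime".

```python
# Hypothetical opcodes for this illustration only.
ALLOWED_OPS = {"load", "add", "jmp", "exit"}


def verify(program, mem_size):
    """Return True if the toy program passes all three checks.

    program  -- list of (op, arg) tuples
    mem_size -- size of the memory region the program may read
    """
    # The program must end with an explicit exit.
    if not program or program[-1][0] != "exit":
        return False
    for pc, (op, arg) in enumerate(program):
        # No privileged instructions: anything outside the allow-list fails.
        if op not in ALLOWED_OPS:
            return False
        # Safe memory accesses: loads must stay inside the region.
        if op == "load" and not (0 <= arg < mem_size):
            return False
        # Bounded runtime: jumps go strictly forward and stay in the
        # program, so every step advances and the final exit is reached.
        if op == "jmp" and not (pc < arg < len(program)):
            return False
    return True
```

A program with a backward jump, an unknown opcode such as `("wrmsr", 0)`, or an out-of-range load is rejected before it ever runs.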

Using Verifiable Code / eBPF
[Diagram, guest and host. Stages: compilation (once), registration (boot), execution (event). Elements: h-upcall code, compiler, assembler, h-upcall bytecode, safety checker, AoT, h-upcall native code, event, helper functions]

Memory Mappings for Hyperupcalls
[Diagram: virtual machine view (guest virtual) and hypervisor view (host virtual, host physical) of the hyperupcall memory]
  • Host virtual: might be occupied
  • Both should point to the same data

Memory Mappings for Hyperupcalls
[Diagram: virtual machine view (guest virtual, guest base) and hypervisor view (host virtual, host base, host physical)]
  • Cannot use native pointers
  • [guest base] is only known after boot due to address space randomization
  • [hyperupcall address] = [address] - [guest base] + [host base]
  • Extend the compiler to transparently adjust the pointer
  • Annotate host pointers so that they are not adjusted
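A worked instance of the address formula may help. The sketch below applies it in Python; the base addresses are made-up example values, and the `is_host_pointer` flag is a hypothetical stand-in for the compiler annotation mentioned on the slide.

```python
def to_hyperupcall_address(address, guest_base, host_base):
    """[hyperupcall address] = [address] - [guest base] + [host base]."""
    return address - guest_base + host_base


def adjust_pointer(address, guest_base, host_base, is_host_pointer=False):
    """Rebase guest pointers; leave annotated host pointers untouched."""
    if is_host_pointer:
        return address
    return to_hyperupcall_address(address, guest_base, host_base)


# Assumed example values, not taken from the talk.
GUEST_BASE = 0xFFFF888000000000  # only known after boot (randomized)
HOST_BASE = 0x7F0000000000       # where the hypervisor mapped the region

guest_ptr = GUEST_BASE + 0x1234
host_view = adjust_pointer(guest_ptr, GUEST_BASE, HOST_BASE)
```

Here `host_view` equals `HOST_BASE + 0x1234`: the offset inside the region is preserved and only the base changes, which is why the adjustment can be applied mechanically to every guest pointer.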

Additional Issues
Hardware interfaces
  • Interrupt generation
  • Accessing VCPU registers
  • Accessing descheduled VCPUs
Solutions:
  • Helper functions
  • Synchronization points
eBPF limitations
  • No loops, atomic operations, static variables, etc.
  • Frequent verification failures
  • Native assembly is unsupported
  • No linker, so no symbols
Solution:
  • A framework as an in-place replacement for common OS functions

Use-cases
New features
  • Hypervisor event tracing
  • Kernel security hardening
Performance enhancements
  • Free memory discarding
  • TLB shootdowns to inactive cores

Hyperupcalls Performance
[Figure: bar chart of cycles (y-axis, 0 to 1400) for four use cases: tracing, memory discard, TLB shootdown, kernel hardening; series: hyperupcall, context switch, native]

Hypervisor Event Tracing
  • Performance analysis requires tracing and profiling tools
  • Only virtual machine events are traced
  • On the cloud, hypervisor events cannot be traced
[Diagram: virtual machine trace with OS event 1, "time gap?", OS event 2; hypervisor: virtual machine descheduled]

Tracing with Hyperupcalls
[Diagram: guest (OS tracing service, VM OS tracing code [x86], VM event, OS), host (hypervisor, hypervisor event, hyperupcall with VM OS tracing code [eBPF x86]), shared trace buffer, VM-Exit (context switch)]
  • The hypervisor is oblivious
  • Virtual machine and hypervisor are decoupled

Free Memory Reclamation
  • Swap (no paravirtualization)
  • Ballooning (upcall)
  • Free memory discard (hyperupcall)
[Diagram: guest, host, hypervisor]
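The difference between swapping and free memory discard can be sketched as a per-page decision. This Python model is an illustration under stated assumptions: `reclaim` and the `is_page_free` callback are hypothetical names, and the callback stands in for a hyperupcall that answers "is page free?" from guest state.

```python
def reclaim(pages, is_page_free):
    """Split candidate pages into those to discard and those to swap.

    pages        -- iterable of page numbers the hypervisor wants back
    is_page_free -- callable answering whether the guest OS considers
                    the page free (assumed to be a hyperupcall)
    """
    discarded, swapped = [], []
    for page in pages:
        if is_page_free(page):
            # A free page holds no useful data, so it can be dropped
            # without writing it to the swap device.
            discarded.append(page)
        else:
            # Without that knowledge the page must be swapped out.
            swapped.append(page)
    return discarded, swapped


free_pages = {1, 3}  # assumed guest state for the example
result = reclaim([1, 2, 3, 4], lambda page: page in free_pages)
```

With no paravirtualization the callback is unavailable and every page takes the swap path, which corresponds to the first option on the slide.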

Memory Reclamation
When both memory and CPUs are overcommitted
[Figure: time to reclaim 7 GB of free memory on a 16-VCPU VM; y-axis: time [seconds], 0 to 120; x-axis: cores [#], 1 to 16; series: balloon (upcalls), swap, hyperupcalls]

Conclusions
Hyperupcalls
  • Provide a flexible interface for VM-hypervisor cooperation
  • Decouple the VM from the hypervisor
Alternative hyperupcall designs are possible
Programmability is the key for flexible interfaces