OS Virtualization
Outline • Background • What is Virtualization? • Why would we want it? • Why is it hard? • How do we do it? • Choices
What is Virtualization? • OS virtualization – Create a platform that emulates a hardware platform and allows multiple OS instances to use that platform, each as though it had full and exclusive access to the underlying hardware
What is Virtualization? [Figure: layered stack. Applications run on guest OSs (OS 1 to OS 4), which run on the Virtualization Platform, which runs on the Hardware.]
The Problem • The OS uses kernel mode / user mode to protect itself. – System calls, and privileged instructions attempted in user mode, generate a trap (software interrupt) that forces a switch to kernel mode – Sensitive instructions (I/O, MMU control, etc.) must only be executed by the kernel
The Problem • If our VM now runs in user space, the guest OS cannot run sensitive instructions in it, since those must trap to kernel space. • We would like such instructions to force a trap into the hypervisor – The hypervisor is then responsible for carrying out the sensitive instruction on the guest's behalf
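The trap-into-the-hypervisor idea on this slide can be sketched as a toy model. This is only an illustration: the instruction names, the `Trap` exception, and the `Hypervisor` class are hypothetical and do not correspond to any real hypervisor API.

```python
class Trap(Exception):
    """Raised when user-mode code attempts a privileged instruction."""
    def __init__(self, instr):
        super().__init__(instr)
        self.instr = instr

# Illustrative set of privileged instruction names (assumption, not a real ISA list).
PRIVILEGED = {"OUT", "LOAD_CR3"}

def cpu_execute(instr, mode):
    """Toy CPU: privileged instructions trap unless executed in kernel mode."""
    if instr in PRIVILEGED and mode != "kernel":
        raise Trap(instr)
    return f"executed {instr}"

class Hypervisor:
    """Toy hypervisor that runs the guest in user mode and handles its traps."""
    def __init__(self):
        self.emulated = []

    def run_guest(self, instrs):
        for instr in instrs:
            try:
                cpu_execute(instr, mode="user")
            except Trap as trap:
                # The hypervisor steps in and handles the instruction itself.
                self.emulated.append(trap.instr)

hv = Hypervisor()
hv.run_guest(["ADD", "OUT", "LOAD_CR3"])
```

After the run, `hv.emulated` holds only the instructions that trapped; ordinary instructions such as `ADD` ran directly.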
The Problem • Hardware protection rings – Supervisor mode (ring 0) • Can run any instruction. • Trusted not to fail; if it does fail, the system crashes. – User programs run in ring 3 • The hypervisor runs in ring 0; the guest OS does not
The Problem • On x86, some instructions are sensitive but not privileged – Example: POPF • Pops a value from the stack into the EFLAGS register • Can be executed in any protection ring, but behaves differently outside ring 0 • The interrupt flag is part of EFLAGS and only changes in ring 0 • Is not privileged (does not trap), so the hypervisor is never notified
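A toy model of the POPF behaviour described above, as a sketch only. It assumes the common case where only ring 0 may change the interrupt flag (IOPL = 0) and ignores the other special-cased EFLAGS bits; the `popf` function is a hypothetical helper, not a real emulator API.

```python
IF_MASK = 1 << 9  # interrupt flag (IF) is bit 9 of EFLAGS
CF_MASK = 1 << 0  # carry flag (CF) is bit 0 of EFLAGS

def popf(eflags, popped, ring):
    """Return the new EFLAGS after POPF loads `popped` from the stack.

    Assumption: IOPL = 0, so only ring 0 is allowed to change IF.
    """
    if ring == 0:
        return popped
    # Outside ring 0 there is no trap: IF silently keeps its old value,
    # while the remaining bits are taken from the popped word.
    return (popped & ~IF_MASK) | (eflags & IF_MASK)

# A guest kernel running in ring 3 tries to clear IF and set CF.
new_flags = popf(IF_MASK, CF_MASK, ring=3)
```

In this model the guest's attempt to disable interrupts is dropped without any trap, which is exactly why the hypervisor cannot rely on trapping for this instruction.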
Solution – binary translation • Replace problematic instructions dynamically – Read in code, looking for basic blocks – Inspect each basic block for problematic instructions; if any are found, replace them with calls into the VM (this process is called binary translation) – Cache the translated block and execute it – Eventually, most basic blocks will have been translated and cached, and will run at near-native speed • Can force traps on sensitive non-privileged instructions
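The scan, replace, and cache steps above can be sketched on a list of instruction mnemonics. This is a minimal illustration under stated assumptions: real translators work on machine code, and the `SENSITIVE` and `BLOCK_END` sets, the `VMCALL(...)` marker, and the `Translator` class are all hypothetical names chosen for the example.

```python
# Illustrative subset of sensitive, non-privileged x86 instructions.
SENSITIVE = {"POPF", "PUSHF", "SGDT"}
# Assumption: a basic block ends at a control-transfer instruction.
BLOCK_END = {"JMP", "RET", "CALL"}

def split_basic_blocks(code):
    """Split a flat instruction list into basic blocks (tuples)."""
    blocks, current = [], []
    for instr in code:
        current.append(instr)
        if instr in BLOCK_END:
            blocks.append(tuple(current))
            current = []
    if current:
        blocks.append(tuple(current))
    return blocks

class Translator:
    """Translates each basic block once and caches the result."""
    def __init__(self):
        self.cache = {}
        self.translations = 0

    def translate(self, block):
        if block not in self.cache:
            self.translations += 1
            # Replace each sensitive instruction with a call into the VM.
            self.cache[block] = tuple(
                f"VMCALL({instr})" if instr in SENSITIVE else instr
                for instr in block
            )
        return self.cache[block]

blocks = split_basic_blocks(["MOV", "POPF", "JMP", "ADD", "RET"])
translator = Translator()
first = translator.translate(blocks[0])
again = translator.translate(blocks[0])  # served from the cache
```

The second call does no translation work, which mirrors the slide's point that cached blocks eventually run at near-native speed.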
Solution – VM hardware • Systems with Intel VT-x or AMD SVM (since 2005) – New instructions to enter VM mode – Hypervisor runs in ring 0 under root mode – Guest OS runs in ring 0 under non-root mode – Changes are made within VM-specific state called the VMCS (Virtual Machine Control Structure) • Even with VM hardware support, binary translation can still be used to improve performance
Implementation • Type 1 Hypervisor • Type 2 Hypervisor • Paravirtualization
Type 1 Hypervisor • Runs on “bare metal” – The hypervisor is the machine’s kernel – Made for servers; includes an interface for remote / admin access – Examples: Xen, VMware vSphere, etc.
Type 2 Hypervisor • Runs from within an OS. – Supports guest OSs above it. – The VM software must include a kernel module – Examples: Oracle VirtualBox, VMware Player, etc.
Paravirtualization • Modify the guest OS so that all uses of non-privileged sensitive instructions are replaced with hypervisor calls. • Much easier (and more efficient) to modify source code than to emulate hardware instructions (as in binary translation).
Problems with Paravirtualization • Paravirtualized systems won’t run on native hardware • There are many different paravirtualization systems that use different commands, etc. – VMware, Xen, etc. • Proposed solution: – Modify the OS kernel so that it calls a special set of procedures to execute sensitive instructions (Virtual Machine Interface) • On bare metal: link to a library that implements these procedures • On a VM: link to a VM-specific library
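The proposed solution amounts to programming the kernel against one interface and choosing the implementation at link time. A minimal sketch of that idea follows; the class and method names are hypothetical and are not the actual Virtual Machine Interface procedures.

```python
class NativeBackend:
    """Hypothetical bare-metal library: would execute the real instructions."""
    def __init__(self):
        self.log = []

    def disable_interrupts(self):
        self.log.append("cli")

    def enable_interrupts(self):
        self.log.append("sti")

class HypervisorBackend:
    """Hypothetical VM-specific library: turns each call into a hypervisor call."""
    def __init__(self):
        self.log = []

    def disable_interrupts(self):
        self.log.append("hypercall:disable_interrupts")

    def enable_interrupts(self):
        self.log.append("hypercall:enable_interrupts")

def kernel_critical_section(vmi):
    """Kernel code is written once against the interface, not the hardware."""
    vmi.disable_interrupts()
    # ... critical work would happen here ...
    vmi.enable_interrupts()
    return vmi.log

native_log = kernel_critical_section(NativeBackend())
guest_log = kernel_critical_section(HypervisorBackend())
```

The same `kernel_critical_section` runs unchanged in both settings; only the library behind the interface differs.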
Memory Virtualization • The OS tracks the mapping of virtual memory pages to physical memory page frames. • It builds page tables, then updates the paging register (which traps). • Let the hypervisor manage the page mapping, and use shadow page tables for the VMs
Shadow Page Table • Guest page tables map: Guest VA → Guest PA • Shadow tables map: Guest VA → Host PA
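One way to picture a shadow table is as the composition of the guest's mapping with the hypervisor's own guest-to-host mapping, computed in software. The sketch below assumes single-level tables represented as dictionaries of page numbers; `build_shadow` and both argument names are hypothetical.

```python
def build_shadow(guest_pt, host_map):
    """Compose two mappings into a shadow page table.

    guest_pt: guest virtual page number  -> guest physical frame number
    host_map: guest physical frame number -> host physical frame number
    Entries whose guest frame has no host backing are left out.
    """
    return {
        vpn: host_map[gpfn]
        for vpn, gpfn in guest_pt.items()
        if gpfn in host_map
    }

guest_pt = {0: 5, 1: 7, 2: 9}   # maintained by the guest OS
host_map = {5: 100, 7: 42}      # maintained by the hypervisor
shadow = build_shadow(guest_pt, host_map)
```

Because the shadow table is derived data, the hypervisor has to rebuild or patch it whenever the guest changes its own page tables.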
Nested/extended page tables • Requires hardware support – Two “CR3”s (CR3 and EPTP) – The MMU translates each guest mapping level [Figure: guest OS page directory and page table (via CR3); hypervisor/VMM host page table (via EPTP); SW/HW boundary; CPU with TLB.]
Nested page tables • Guest page table maps: Guest VA → Guest PA • Nested page table maps: Guest PA → Host PA
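With nested tables the two mappings stay separate and are applied one after the other on each translation. A minimal sketch, assuming 4 KiB pages and single-level tables stored as dictionaries (real hardware walks multi-level tables); the `translate` function is hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

def translate(guest_va, guest_pt, nested_pt):
    """Translate a guest virtual address to a host physical address.

    guest_pt:  guest virtual page number   -> guest physical frame number
    nested_pt: guest physical frame number -> host physical frame number
    A missing entry raises KeyError, standing in for a page fault.
    """
    vpn, offset = divmod(guest_va, PAGE_SIZE)
    gpfn = guest_pt[vpn]     # step 1: Guest VA -> Guest PA
    hpfn = nested_pt[gpfn]   # step 2: Guest PA -> Host PA
    return hpfn * PAGE_SIZE + offset

host_pa = translate(PAGE_SIZE + 12, {1: 7}, {7: 42})
```

Unlike the shadow approach, the guest can change its own table freely here, since the hypervisor's table is consulted separately on every walk.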
I/O Virtualization • Each guest OS holds its own “partition”. – Typically implemented as a file or region on disk – The hypervisor must convert a guest OS address (block #) into a physical address in that region – May convert between storage types – Must deal with DMA (direct memory access) requests
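The block-number conversion can be sketched as simple offset arithmetic with a bounds check. This assumes 512-byte blocks and a contiguous region on the host disk; the function name and parameters are hypothetical.

```python
BLOCK_SIZE = 512  # assumption: 512-byte blocks

def guest_block_to_host_offset(block_no, region_start, region_blocks):
    """Map a guest block number to a byte offset inside the guest's region.

    region_start:  byte offset where the guest's region begins on the host
    region_blocks: number of blocks the guest's virtual disk contains
    """
    if not 0 <= block_no < region_blocks:
        # The guest must never reach outside its own "partition".
        raise ValueError(f"block {block_no} is outside the guest disk")
    return region_start + block_no * BLOCK_SIZE

offset = guest_block_to_host_offset(3, region_start=1_000_000, region_blocks=2048)
```

The bounds check is what keeps one guest from reading or writing another guest's region.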
Question (Moed B 2017) [Figure: shadow paging. Guest OS page directory and page table (G-CR3); hypervisor/VMM shadow page table; SW/HW boundary; CPU with TLB. Annotation: Interrupt & VMM corrects page table.]
Question (Moed B 2017) [Figure: VM memory layout. Guest OS page directory and page table (CR3); hypervisor/VMM; SW/HW boundary; CPU with TLB. Annotation: Define these pages as not R/W.]