Virtualization

What is virtualization?
• Creating a virtual version of something
  o Hardware, operating system, application, network, memory, storage
• "The construction of an isomorphism between a guest system and a host" [Popek, Goldberg, '74]

Example: virtual disk
• Partition a single hard disk into multiple virtual disks
  o Each virtual disk has virtual tracks and sectors
• Implement a virtual disk as a file
• Map between virtual disk contents and real disk contents
• Virtual disk writes/reads are mapped to file writes/reads in the host system
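
The mapping from virtual sectors to file offsets can be sketched as follows. This is only an illustration: the class name `VirtualDisk`, the 512-byte sector size, and the file name are assumptions, not taken from the slides.

```python
import os
import tempfile

SECTOR_SIZE = 512  # assumed sector size in bytes


class VirtualDisk:
    """Hypothetical virtual disk backed by a single file in the host system."""

    def __init__(self, path, num_sectors):
        self.path = path
        self.num_sectors = num_sectors
        # Pre-size the backing file so every virtual sector has a host location.
        with open(path, "wb") as f:
            f.truncate(num_sectors * SECTOR_SIZE)

    def _offset(self, sector):
        # Map a virtual sector number to a byte offset in the backing file.
        if not 0 <= sector < self.num_sectors:
            raise ValueError("sector out of range")
        return sector * SECTOR_SIZE

    def write_sector(self, sector, data):
        # A virtual disk write becomes a file write in the host.
        if len(data) != SECTOR_SIZE:
            raise ValueError("data must be exactly one sector")
        with open(self.path, "r+b") as f:
            f.seek(self._offset(sector))
            f.write(data)

    def read_sector(self, sector):
        # A virtual disk read becomes a file read in the host.
        with open(self.path, "rb") as f:
            f.seek(self._offset(sector))
            return f.read(SECTOR_SIZE)


path = os.path.join(tempfile.mkdtemp(), "disk.img")
disk = VirtualDisk(path, num_sectors=8)
disk.write_sector(3, b"A" * SECTOR_SIZE)
```

The guest only ever sees sector numbers; the file offset is a detail of the host-side implementation.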

What is virtualization? (continued)
• A way to run multiple operating systems (and their applications) on the same hardware (virtual machines)
• Only the virtual machine manager (a.k.a. hypervisor) has full system control
• Virtual machines are completely isolated from each other (or so we hope)

Operating Systems, Spring 2018, I. Dinur, D. Hendler and R. Iakobashvili

Basic concepts
• Virtual Machine (VM)
• Host
• Guest
• Hypervisor (type II) / Virtual Machine Monitor

Types of virtualization
• Full virtualization: the guest OS runs unmodified
• Para-virtualization: the guest OS must be aware of virtualization; source-code modifications are required
Hardware virtualization support may be used for both.
Our focus is on full virtualization.

Virtualization advantages
• Cost-effectiveness: less hardware
  o Multiple virtual machines / operating systems / services on a single physical machine (server consolidation)
  o Various forms of computation as a service
• Isolation
  o Good for security
  o Great for reliability and recovery: if a VM crashes it can be rebooted without affecting other services (fault containment)
  o VM migration
• Development tool
  o Work on multiple OSs in parallel
  o Develop and debug an OS in user mode
  o Origins of VMware as a tool for developers

Virtualization vs. multiprocessing
[Figure: two software stacks side by side. Multiprocessing: Process 1, Process 2, ... run on an OS, which runs on the HW (disk, NIC, ...). Virtualization: each VM runs its own processes (Pr 1, Pr 2) on its own OS (OS 1, OS 2, ...) above a VMM/hypervisor, which runs on the HW (disk, NIC, ...). Labeled boundaries: user space/kernel separation, HW interface, virtual HW interface, real HW interface.]

Type 1 and type 2 hypervisors
• Type 1: VMware ESX, Microsoft Hyper-V, Xen
• Type 2: VMware Workstation, Microsoft Virtual PC, Sun VirtualBox, QEMU, KVM
Figure 7-1. Location of type 1 and type 2 hypervisors.

Type 1 and type 2 hypervisors (continued)
Figure 7-2. Examples of the various combinations of virtualization type and hypervisor. Type 1 hypervisors always run on the bare metal, whereas type 2 hypervisors use the services of an existing host operating system.

What's required of a (classic) hypervisor
A hypervisor should provide the following:
• Safety: have full control of virtualized resources
• Fidelity: program behavior on a VM should be identical to its behavior on bare hardware
• Efficiency: as much as possible, run directly on hardware without hypervisor intervention
• Full interpretation isn't efficient

Classic virtualization: trap and emulate
[Figure: VM 1 and VM 2 run above the VMM, which runs on the HW. Steps: (1) trap, (2) interrupt handler performing HW emulation in the VMM, (3) return to process.]
Emulation is the process of implementing the functionality/interface of one system on a system that has a different functionality/interface.
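
The three steps (trap, handle and emulate, return) can be illustrated with a toy simulation. Everything here is an assumption made for illustration: the three-instruction set, the dictionaries standing in for CPU state, and the function names.

```python
class Trap(Exception):
    """Raised by the simulated CPU when a privileged instruction runs in user mode."""

    def __init__(self, instr):
        super().__init__(instr)
        self.instr = instr


PRIVILEGED = {"cli", "sti"}  # assumed toy instruction set


def cpu_execute(instr, user_mode, real_state):
    """Simulated hardware: privileged instructions trap in user mode."""
    if instr in PRIVILEGED and user_mode:
        raise Trap(instr)              # (1) trap
    if instr == "inc":
        real_state["acc"] += 1         # ordinary instructions run directly


def vmm_emulate(instr, vcpu):
    """(2) VMM handler: apply the instruction's effect to virtual state only."""
    if instr == "cli":
        vcpu["interrupts_enabled"] = False
    elif instr == "sti":
        vcpu["interrupts_enabled"] = True


def run_guest(program, vcpu, real_state):
    for instr in program:
        try:
            cpu_execute(instr, user_mode=True, real_state=real_state)
        except Trap as trap:
            vmm_emulate(trap.instr, vcpu)
        # (3) execution resumes with the next guest instruction


vcpu = {"interrupts_enabled": True}
real_state = {"acc": 0}
run_guest(["inc", "cli", "inc"], vcpu, real_state)
```

Note that the guest's `cli` never touches the real interrupt state; only the VM's virtual flag changes.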

Trap and emulate: difficulties on x86
• Sensitive instructions: provide control over HW resources, or behave differently in kernel/supervisor and user modes
  o I/O instructions, enabling/disabling interrupts, accessing the CR3 register, ...
• Privileged instructions: cause a trap if executed in user mode
Theorem [Popek and Goldberg, 1974]: A machine can be virtualized [using trap and emulate] if every sensitive instruction is privileged.
• This condition was not met by x86 processors prior to 2005.
• In 2005, Intel and AMD introduced HW support for virtualization.
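
The condition in the theorem is a subset test: the set of sensitive instructions must be contained in the set of privileged ones. A minimal sketch, with illustrative instruction names chosen for this example rather than a complete x86 classification:

```python
def unprivileged_sensitive(sensitive, privileged):
    """Return the sensitive instructions that would NOT trap in user mode."""
    return set(sensitive) - set(privileged)


def trap_and_emulate_virtualizable(sensitive, privileged):
    """Popek-Goldberg condition: every sensitive instruction is privileged."""
    return not unprivileged_sensitive(sensitive, privileged)


# Illustrative sets only, not a full list of x86 instructions.
sensitive = {"cli", "mov_to_cr3", "popf"}
privileged = {"cli", "mov_to_cr3"}
offenders = unprivileged_sensitive(sensitive, privileged)
```

With these sets, `offenders` contains only `popf`, so the toy machine fails the condition.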

What is sensitive?
• CPU: some registers
• MMU
  o Page table
  o Segments
• Interrupts
• Timers
• I/O devices

x86 virtualization problem I
• The x86 architecture (without virtualization extensions) can't be virtualized by trap and emulate.
• Some sensitive instructions are not privileged.
• Example: the popf instruction
  o Pops 16 bits from the stack into the flags register
  o One of the flags masks (i.e., disables) interrupts
  o The instruction is not privileged
  o What happens if the OS of a VM runs popf?
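
A toy model of the problem: when popf runs in less privileged code, the interrupt flag silently keeps its old value and no trap occurs, so the hypervisor never learns that the guest tried to change it. The function below is a deliberate simplification (it models only the interrupt flag, and the parameter names are assumptions); it is not a full description of the instruction.

```python
IF_MASK = 1 << 9  # the interrupt-enable flag (IF) is bit 9 of the flags register


def popf(current_flags, popped_value, cpl, iopl=0):
    """Toy popf: IF can be changed only when CPL <= IOPL; otherwise it is kept."""
    if cpl <= iopl:
        return popped_value
    # Less privileged code: IF silently keeps its old value, and nothing traps.
    return (popped_value & ~IF_MASK) | (current_flags & IF_MASK)


flags = IF_MASK                           # interrupts currently enabled
in_ring0 = popf(flags, 0, cpl=0)          # a native kernel disables interrupts
in_ring1 = popf(flags, 0, cpl=1)          # a deprivileged guest kernel: no effect
```

The guest OS believes it disabled interrupts, yet the flag is unchanged and the hypervisor was never invoked to emulate the intent.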

x86 virtualization problem II
• Some instructions (push, pop, mov) can take segment selectors (cs, ds, ss) as arguments even in user mode, so the selectors can be read
• The selectors contain two bits that hold a privilege level
  o x86 (beginning with the 386) has four privilege levels (ring 0 to ring 3)
  o The two lower bits of the cs register are the Current Privilege Level (CPL) of the code
  o The guest OS thinks that it is in ring 0
  o The guest OS is actually in ring 1
• Result: guest OS confusion
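
Reading the CPL amounts to masking the two low bits of the cs selector, which is why a guest can discover that it is not in ring 0. The selector values below are illustrative assumptions, not values from any particular OS.

```python
def current_privilege_level(cs_selector):
    """The two lowest bits of the cs selector hold the CPL (ring 0 to ring 3)."""
    return cs_selector & 0b11


expected_by_guest = current_privilege_level(0x08)  # selector with low bits 00
seen_when_deprivileged = current_privilege_level(0x09)  # same index, low bits 01
```

A guest kernel that checks this value expects ring 0 but observes ring 1, with no trap that would give the hypervisor a chance to hide the difference.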

Implementation options
• Avoid executing sensitive instructions
  o Interpretation (Bochs, JSLinux)
  o Binary translation: change the executed code (VMware, QEMU)
• Para-virtualization: re-compile the guest OS (Xen, Denali)
• Hardware assistance: Intel VT-x and AMD-V (used by KVM, Xen, VMware)

Outline
• Concepts, classical CPU virtualization
  o Binary translation
• Memory virtualization

Binary translation
• Binary translation is the process of translating one instruction set to another.
• Approach I: translate the entire OS when it is loaded into the VM
  o Key problem: indirect control flow

Dynamic binary translation
• Approach II: translate code on the fly
• Simplest approach
  o Keep a table mapping old instructions to new instructions
  o Fetch an old instruction
  o Use the table to translate it
  o Execute the new instruction(s)
• Problem: performance
  o Overhead for every instruction, similar to interpretation

Dynamic BT with caching
• Cache translated code regions:
  o After translation, run from the cache
  o Translation occurs only once
• Static translation cannot handle dynamic control transfer, such as:
  o A jump that depends on the content of a memory address
  o An indirect function call (through a function pointer)
• Translation of dynamic control transfers must be done at execution time
• User code does not have to be translated
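
The caching idea can be sketched with a toy guest instruction set. The three opcodes (`add`, `jlt`, `halt`), the tuple encoding, and the "translator" that merely copies a basic block are all assumptions made for illustration; a real translator would emit host machine code.

```python
def translate_block(guest_code, pc):
    """Hypothetical translator: collect one basic block starting at pc.

    The block ends at the first control-flow instruction. Returns the block
    and the fall-through address that follows it.
    """
    block = []
    while True:
        instr = guest_code[pc]
        block.append(instr)
        pc += 1
        if instr[0] in ("jlt", "halt"):
            return block, pc


def run(guest_code):
    cache = {}          # guest PC -> (translated block, fall-through PC)
    translations = 0
    pc, acc = 0, 0
    while True:
        if pc not in cache:                 # translate only on a cache miss
            cache[pc] = translate_block(guest_code, pc)
            translations += 1
        block, next_pc = cache[pc]
        for instr in block:                 # run the cached translation
            if instr[0] == "add":
                acc += instr[1]
            elif instr[0] == "jlt" and acc < instr[1]:
                next_pc = instr[2]          # branch target decided at run time
            elif instr[0] == "halt":
                return acc, translations
        pc = next_pc


# Loop: add 1 until acc reaches 3, then halt.
program = [("add", 1), ("jlt", 3, 0), ("halt",)]
result = run(program)
```

The loop body executes three times but is translated once; only two translations happen in total (the loop block and the halt block).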

Virtualization prior to HW support
Figure 7-4. The binary translator rewrites the guest operating system running in ring 1, while the hypervisor runs in ring 0.

VMware binary translation: example
[Figure: C code of the example and its 64-bit binary in binary (hex) representation. Annotation: invoking isPrime(49), logging all code translated.]

VMware binary translation: example (continued)
• The translator reads guest memory at the address indicated by the guest PC
• It decodes instructions and creates Intermediate Representation (IR) objects
• It accumulates IR objects into translation units (TUs)
  o Basic blocks (BB); a TU stops upon control flow
[Figure: the first TU and its compiled code fragment (CCF). Successive annotations: identical code; translation of jump BB; translation of fall-through BB.]

VMware binary translation: example (continued)
[Figure: C code and 64-bit binary of the example (invoking isPrime(49), logging all code translated).]
Which basic block will be translated next?

VMware binary translation example: output
[Figure: the translated output code.]
These continuations remain because the respective basic blocks were not executed.

VMware binary translation operation
• The translation cache (TC) stores the translations done so far
• A hash table tracks the input-to-output correspondence
• A chaining optimization allows one CCF to jump directly to another without calling out of the translation cache
• As the TC gradually captures the guest's working set, the proportion of time spent translating decreases
User code does not have to be translated.

Dealing with privileged instructions: example
• The cli (clear interrupt flag) instruction is privileged
• It is translated to: "vcpu.flags.IF=0"
• Much faster than the source binary!
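
A minimal sketch of this kind of translation: the privileged instruction is replaced, at translation time, by code that updates the virtual CPU's state, so no trap is taken at run time. The `VCpu` class, its field names, and the toy instruction names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class VCpu:
    interrupts_enabled: bool = True  # stands in for the virtual interrupt flag
    acc: int = 0


def translate(instr):
    """Return a Python callable standing in for the translated code."""
    if instr == "cli":
        def translated(vcpu):
            vcpu.interrupts_enabled = False   # a plain store to virtual state
    elif instr == "sti":
        def translated(vcpu):
            vcpu.interrupts_enabled = True
    elif instr == "inc":
        def translated(vcpu):
            vcpu.acc += 1                     # non-sensitive: same effect as before
    else:
        raise ValueError(f"unknown instruction: {instr}")
    return translated


vcpu = VCpu()
for instr in ["inc", "cli", "inc"]:
    translate(instr)(vcpu)
```

Because the translated form is an ordinary memory update rather than a trap into the hypervisor, it avoids the cost of a mode switch.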

Outline
• Concepts, classical CPU virtualization
  o Binary translation
• Memory virtualization

Memory allocation
• Each VM usually receives a contiguous set of physical addresses.
  o 1 GB to 4 GB are typical values.
• As far as the VM is concerned, this is the physical memory of the machine.
• The guest OS allocates pages to guest processes.

Memory management
• Assumptions of the OS in a VM:
  o Physical memory is a contiguous block of addresses from 0 to some n.
  o The OS can map any virtual page to any page frame.
• The hypervisor must:
  o Partition memory among VMs.
  o Ensure that virtual pages are mapped only to assigned page frames.
• TLB miss: a miss in a HW-managed TLB (e.g., x86) causes the HW to look up the page in the page table.
• The guest OS must not manage the real page table.

Option 1: brute force
[Figure: the guest OS's page directory (pointed to by CR3) and page table are defined as not R/W. SW side: guest OS, hypervisor (VMM) holding the VM memory layout. HW side: CPU and TLB. Caption: interrupt, and the VMM corrects the address.]

Brute force: description
• Guest page tables are read- and write-protected in the host system.
• If the guest OS reads a page table (e.g., for page eviction), writes a page table (e.g., after a page fault), or changes CR3, the system traps.
• The hypervisor then uses a VM memory layout to:
  o Return answers to the VM
  o Update the layout
• The hypervisor switches the VM memory layout when a new VM is scheduled.

Option 2: shadow page tables
[Figure: SW side: guest OS with its page directory (pointed to by G-CR3) and page table; hypervisor (VMM) with the shadow page table. HW side: CPU and TLB. Caption: interrupt, and the VMM corrects the page table.]

Shadow page tables: description
• The hypervisor maintains "shadow page tables".
• Guest page tables map Guest VA (GVA) → Guest PA (GPA).
• Shadow tables map Guest VA → Host PA (HPA).
• The hypervisor does not trap guest updates to the guest's page table.
  o Result: the guest page table and the shadow page table may be inconsistent.
• When a guest process accesses a virtual address:
  o The physical address is taken not from the guest page table but from the shadow page table.
  o The HW translates correctly, because it is aware only of the shadow tables.

Shadow page tables: description (continued)
• If the address is in the TLB: TLB hit, and no problem.
• When a guest process causes a page fault:
  o The hypervisor begins execution.
  o If required, the hypervisor updates the shadow page table.
• Performance is as good as native execution as long as there are no page faults.
• Shadow page tables should be cached, so that once a VM is re-scheduled the page table does not have to be rebuilt from scratch.

Shadow page tables: page faults (continued)
There are two scenarios when handling a page fault. The hypervisor "walks" the guest page table to determine which one applies.
1. Guest page fault: there is no translation in the guest page tables, so the hypervisor "injects" a page fault for the guest to handle.
2. Guest translation found: the hypervisor updates the shadow table accordingly.
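
The two scenarios can be sketched with page tables modeled as dictionaries from page number to page number. This is a simplification under stated assumptions: single-level tables, a hypothetical `gpa_to_hpa` map standing in for the hypervisor's record of which host frame backs each guest frame, and an exception standing in for fault injection.

```python
class GuestPageFault(Exception):
    """Fault injected into the guest, to be handled by the guest OS."""


def handle_shadow_fault(gva_page, guest_pt, gpa_to_hpa, shadow_pt):
    """Hypervisor handler for a miss in the shadow page table."""
    # Walk the guest page table to decide which scenario applies.
    if gva_page not in guest_pt:
        raise GuestPageFault(gva_page)           # scenario 1: inject the fault
    gpa_page = guest_pt[gva_page]
    shadow_pt[gva_page] = gpa_to_hpa[gpa_page]   # scenario 2: fill the shadow entry
    return shadow_pt[gva_page]


def access(gva_page, guest_pt, gpa_to_hpa, shadow_pt):
    """The hardware consults only the shadow table; misses go to the hypervisor."""
    if gva_page in shadow_pt:
        return shadow_pt[gva_page]
    return handle_shadow_fault(gva_page, guest_pt, gpa_to_hpa, shadow_pt)


guest_pt = {0: 5}        # GVA page 0 -> GPA page 5
gpa_to_hpa = {5: 42}     # GPA page 5 -> HPA page 42
shadow_pt = {}
first = access(0, guest_pt, gpa_to_hpa, shadow_pt)
```

After the first access the shadow table holds the composed mapping GVA page 0 → HPA page 42, so later accesses to that page no longer involve the hypervisor.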

Shadow page tables: updating CR3
[Figure: virtual CR3 and real CR3. Slide taken from a presentation by VMware.]

Undiscovered guest page table
[Figure: virtual CR3 and real CR3. Slide taken from a presentation by VMware.]

Option 3: extended/nested page tables
[Figure: SW side: guest OS with its page directory (pointed to by CR3) and page table; hypervisor (VMM) with the host page table (pointed to by EPTP). HW side: CPU and TLB.]

Nested/extended page tables: description
• The name implies having page tables within page tables.
• The essence of the idea is a hardware assist.
  o The hardware has an extra pointer and the ability to walk an extra set of page tables.
  o Intel calls the idea Extended Page Tables (EPT).
• Guest page tables hold the Guest VA → Guest PA mapping, accessed through the standard CR3.
• Extended page tables hold the Host VA → Host PA mapping, accessed through the EPTP (EPT pointer).
• Host VA = Guest PA

Walking extended page tables
[Figure: walking the extended page tables.]

Extended page tables: description (continued)
• The TLB, as usual, holds Guest VA → Host PA.
• On a memory access:
  o If found in the TLB: no problem.
  o If not in the TLB but there is no page fault, the hardware walks both tables and updates the TLB.
  o If there is a page fault, the hypervisor gets the host virtual page (guest physical page) and maps it to a host physical page.
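
The three cases on a memory access can be sketched as a two-stage lookup. The tables are modeled as flat dictionaries, which is a simplification (real walks are multi-level, and faults raised by the guest's own page table are not modeled here); the callback `allocate_host_page` is a hypothetical stand-in for the hypervisor's allocator.

```python
def translate(gva_page, tlb, guest_pt, ept, allocate_host_page):
    """Toy two-stage translation: Guest VA -> Guest PA -> Host PA."""
    if gva_page in tlb:                    # case 1: TLB hit
        return tlb[gva_page]
    gpa_page = guest_pt[gva_page]          # first walk: guest table (via CR3)
    if gpa_page not in ept:                # case 3: fault handled by the hypervisor
        ept[gpa_page] = allocate_host_page()
    hpa_page = ept[gpa_page]               # second walk: extended table (via EPTP)
    tlb[gva_page] = hpa_page               # the TLB caches Guest VA -> Host PA
    return hpa_page


free_pages = iter([100, 101])              # assumed pool of free host pages
tlb, guest_pt, ept = {}, {7: 3}, {}
first = translate(7, tlb, guest_pt, ept, lambda: next(free_pages))
second = translate(7, tlb, guest_pt, ept, lambda: next(free_pages))
```

The first access takes the fault path and fills both the extended table and the TLB; the second is a TLB hit and allocates nothing.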

Sources
• "Modern Operating Systems", 4th edition, A. Tanenbaum and H. Bos
• "Virtual Machines", J. E. Smith and R. Nair
• A presentation by Niv Gilboa from CSE@BGU
• "Formal requirements for virtualizable third generation architectures", G. J. Popek and R. P. Goldberg, CACM, 1974
• "A comparison of software and hardware techniques for x86 virtualization", K. Adams and O. Agesen, ASPLOS 2006
• A presentation by VMware