Virtualization

Creating a virtual (i.e. not really existing) version of something:
- Hardware
- Network
- Memory
- Storage
- …

Operating Systems, Spring 2020, I. Dinur, D. Hendler and M. Kogan-Sadetsky

Basic concepts
- Virtual Machine (VM)
- Host
- Guest
- Hypervisor / Virtual Machine Monitor (VMM), i.e. the VM software
  - Type 2: VMware Workstation, Microsoft Virtual PC, Sun VirtualBox, QEMU, KVM
  - Type 1: VMware ESX, Microsoft Hyper-V, Xen

Why use virtualization?
- Server consolidation
  - Multiple VMs / OSs / services on one physical machine
- Isolation
  - VMs are completely isolated from each other (multi-user security)
  - Only the VMM has full control of the VMs
- Fault containment (recovery)
  - If a VM crashes it can be rebooted without affecting other VMs
- VM migration
  - Moving a VM to another server is easy, for example when some VM needs more HW resources
- Virtual testing of non-existing (novel) HW architectures, e.g. a virtual "x90" architecture

[Figure: a host OS runs the VM software, which hosts Unix, Linux and Windows 10 guests.]

Types of virtualization
- Full virtualization: the guest OS runs unmodified
  - The guest OS does not know that it runs on a VM rather than on a real machine
  - Our focus in the course is on full virtualization
- Para-virtualization: the guest OS must be aware of virtualization, so modifications to the guest OS source code are required
  - The guest OS can cooperate directly with the VMM, and thus VM performance may be better

[Figure: host OS, VM software and a Windows 10 guest (unmodified for full virtualization, modified for para-virtualization).]

Hypervisor (VMM) must provide:
- Safety: the hypervisor should have full control of virtualized resources
- Fidelity: program behavior on a VM should be identical to its behavior on bare hardware
- Efficiency: as much as possible, guest code should run directly on the hardware, without hypervisor intervention

Classic virtualization: trap-and-emulate
- A trap is caused when the guest OS (or any other process that is not the host OS) tries to run a privileged instruction.
- A trap is not caused when it executes a sensitive instruction that is not privileged.
- After a trap, the hypervisor updates the state it keeps for the VM as required by the instruction.

[Figure: VM 1 and VM 2 run on top of the VMM, which provides HW emulation (i.e. virtual hardware) above the real HW. Steps: 1) trap, 2) interrupt handler, 3) return to process.]

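The trap-and-emulate cycle can be sketched as a toy simulation. This is a minimal sketch under stated assumptions, not how a real hypervisor is written: the instruction names, the `VirtualCPU` fields and the helper functions are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

# Toy set of privileged instructions (an assumption for this sketch).
PRIVILEGED = {"cli", "sti"}


class Trap(Exception):
    """Raised by the toy CPU when de-privileged code issues a privileged instruction."""


@dataclass
class VirtualCPU:
    """Per-VM state that the hypervisor keeps (hypothetical fields)."""
    interrupts_enabled: bool = True
    acc: int = 0


def cpu_execute(instr: str, vcpu: VirtualCPU) -> None:
    """Toy hardware: guest code always runs de-privileged."""
    if instr in PRIVILEGED:
        raise Trap(instr)          # 1) trap
    if instr == "inc":
        vcpu.acc += 1              # harmless instruction runs directly


def vmm_emulate(instr: str, vcpu: VirtualCPU) -> None:
    """2) Handler in the VMM: apply the instruction to the virtual CPU state."""
    if instr == "cli":
        vcpu.interrupts_enabled = False
    elif instr == "sti":
        vcpu.interrupts_enabled = True


def run_guest(program: list[str], vcpu: VirtualCPU) -> int:
    """Run a guest program and return the number of traps taken."""
    traps = 0
    for instr in program:
        try:
            cpu_execute(instr, vcpu)
        except Trap as trap:
            vmm_emulate(trap.args[0], vcpu)
            traps += 1             # 3) return to the guest at the next instruction
    return traps
```

Only the privileged instructions cost a round trip through the VMM; everything else runs directly, which is the efficiency requirement stated earlier.
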
x86 virtualization problem I
- Some sensitive instructions are not privileged
- Example: the popf instruction
  - Pops 16 bits from the stack into the flags register
  - The flags include the interrupt flag (IF); clearing IF masks (i.e. disables) interrupts
  - When popf is executed in user space, IF stays unchanged
  - When popf is executed in kernel space, IF is changed
  - Since popf may be executed in both modes, this instruction is not privileged
  - Since popf has a different result in each mode, this instruction is sensitive (to execution mode)
- What happens when the guest OS runs popf? IF on the host CPU remains unchanged and no trap occurs, although the intent was to change IF on the VM's virtual CPU.

[Figure: the host CPU and the VM's virtual CPU each have their own IP and EFLAGS registers.]

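The mode-dependent behavior of popf can be illustrated with a small model. This is a simplified sketch: it ignores IOPL and the reserved flag bits, and the function name and signature are made up for illustration. The bit positions (IF is bit 9, ZF is bit 6) are those of the x86 FLAGS register.

```python
IF_MASK = 0x0200  # bit 9 of FLAGS: interrupt-enable flag
ZF_MASK = 0x0040  # bit 6 of FLAGS: zero flag


def popf(current_flags: int, popped: int, kernel_mode: bool) -> int:
    """Toy model of popf: return the new FLAGS value.

    In kernel mode every bit is loaded from the popped value.
    In user mode the IF bit is silently preserved: no trap, no error.
    """
    if kernel_mode:
        return popped
    return (popped & ~IF_MASK) | (current_flags & IF_MASK)
```

A de-privileged guest kernel that tries to disable interrupts with popf therefore gets neither the effect nor a trap the VMM could emulate, which is exactly why the instruction breaks classic trap-and-emulate.
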
Trap-and-emulate: difficulties on x86
- Sensitive instructions: provide control over real (i.e. not virtual) HW resources
  - Access to some CPU registers
    - CR3 register
    - CS, DS, SS registers
  - Access to the MMU
    - Page table
  - Enable / disable CPU interrupts
  - Timers
  - I/O devices
  - By executing a sensitive instruction directly, the guest OS might run incorrectly
- Privileged instructions: cause a trap if executed in user mode
- Theorem [Popek and Goldberg, 1974]: a machine can be virtualized (using trap-and-emulate) if every sensitive instruction is privileged.
  - This was not supported by x86 processors prior to 2005
  - In 2005, Intel/AMD introduced virtualization HW support
- Two solutions are possible:
  1. Make the guest OS aware of its "guestness"
  2. Make all sensitive instructions privileged (so that they cause a trap when the guest OS runs them)

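The Popek and Goldberg condition is a subset test, which can be written down directly. The instruction sets below are a tiny illustrative sample, not a full classification of x86, and the function names are hypothetical.

```python
def is_classically_virtualizable(sensitive: set[str], privileged: set[str]) -> bool:
    """Trap-and-emulate works if every sensitive instruction is privileged."""
    return sensitive <= privileged


def untrapped_sensitive(sensitive: set[str], privileged: set[str]) -> set[str]:
    """Sensitive instructions that would run without trapping to the VMM."""
    return sensitive - privileged


# Illustrative sample only (an assumption for this sketch).
SENSITIVE = {"cli", "mov_to_cr3", "popf"}
PRIVILEGED = {"cli", "mov_to_cr3", "hlt"}
```

With this sample, `untrapped_sensitive(SENSITIVE, PRIVILEGED)` yields `{"popf"}`: a single such instruction is enough to violate the condition.
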
x86 virtualization problem II
- Some instructions can take segment selectors (CS, DS, SS) as arguments even in user mode, so segment selectors can be read
- The selectors contain two bits that hold the current privilege level
  - x86 (beginning with the 386) has four privilege levels (ring 0 to ring 3)
  - The guest OS thinks that it is in ring 0
  - The guest OS is actually in ring 1
- Result: guest OS confusion

Outline
- Concepts, classical CPU virtualization
  - Binary translation
- Memory virtualization

Dynamic binary translation (by VMM) of guest OS machine code
- Binary translation is the process of translating one instruction set to another
- Translate guest OS code on the fly
  - Overhead for every instruction, similar to an interpreter
- The translator reads the next Basic Block (BB) of the guest OS
  - It stops upon a control flow instruction (e.g. jump, call, loop, ret)
- Decodes the BB instructions and creates Intermediate Representation (IR) objects for them
- Replaces all sensitive and privileged instructions with changes to the appropriate virtual data structure
- IR objects are gathered into a Translation Unit (TU)
- Execute only translated code
  - Both privileged and sensitive instructions are translated; this is much faster than trapping on privileged instructions
- At the end of each TU there is a trap instruction, which activates the VMM to choose the next TU to run

[Figure: the guest OS executable is split into basic blocks BB 1 to BB 5, some of which contain cli, int 0x2F or popf. Binary translation produces TU 1 to TU 5 in RAM, where cli becomes vcpu.flags.IF=0 and popf becomes vcpu.popf.]

Example:
- The cli (clear interrupt flag) instruction is privileged (on a real CPU)
- It is translated to "vcpu.flags.IF=0" (on the VM's virtual CPU)

[Figure: CPUs with IP and EFLAGS registers, labeled P 1 and P 2.]

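The translation steps can be sketched on instructions represented as plain strings. This is only an illustration under simplifying assumptions: a real translator decodes machine code and builds IR objects, whereas here the "IR" is the text itself, and `REWRITES`, `trap_to_vmm` and the function names are hypothetical.

```python
# Mnemonics that end a basic block (control flow instructions).
CONTROL_FLOW = {"jmp", "jz", "call", "loop", "ret"}

# Sensitive/privileged instructions and their replacement on the virtual CPU
# (hypothetical names, following the cli example above).
REWRITES = {"cli": "vcpu.flags.IF = 0", "popf": "vcpu.popf()"}


def split_basic_blocks(code: list[str]) -> list[list[str]]:
    """Cut the instruction stream after every control flow instruction."""
    blocks, current = [], []
    for instr in code:
        current.append(instr)
        if instr.split()[0] in CONTROL_FLOW:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def translate_block(block: list[str]) -> list[str]:
    """Build a translation unit: rewrite sensitive instructions, copy the rest."""
    tu = [REWRITES.get(instr.split()[0], instr) for instr in block]
    tu.append("trap_to_vmm")  # lets the VMM choose the next TU to run
    return tu
```

Most instructions pass through unchanged; only the few sensitive ones are replaced by updates to the virtual CPU state.
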
Dynamic binary translation with caching
- The Translation Cache (TC) stores the translations done so far
- Translation of each block occurs only once
- Static translation cannot handle dynamic control transfers, such as:
  - A jump that depends on the content of a memory address
  - An indirect function call (through a function pointer)
- User code is not translated

[Figure: as in the previous figure, basic blocks BB 1 to BB 5 of the guest OS executable are translated into TU 1 to TU 5, which are now stored in the Translation Cache in RAM.]

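A translation cache can be sketched as a dictionary keyed by the guest address at which a basic block starts. The class, its fields and the `toy_translate` helper are assumptions made for this illustration only.

```python
from typing import Callable, Sequence


class TranslationCache:
    """Maps the guest address of a basic block to its translation unit."""

    def __init__(self, translate: Callable[[Sequence[str]], list[str]]):
        self._translate = translate
        self._cache: dict[int, list[str]] = {}
        self.misses = 0  # number of blocks actually translated

    def lookup(self, guest_addr: int, block: Sequence[str]) -> list[str]:
        """Return the cached translation, translating only on the first use."""
        if guest_addr not in self._cache:
            self.misses += 1
            self._cache[guest_addr] = self._translate(block)
        return self._cache[guest_addr]


def toy_translate(block: Sequence[str]) -> list[str]:
    """Hypothetical translator: rewrite cli, copy everything else."""
    return ["vcpu.flags.IF = 0" if instr == "cli" else instr for instr in block]
```

A loop body that executes many times is translated once and then served from the cache, so the per-instruction translation overhead is paid only on first execution.
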
Virtualization prior to HW support

Figure 7-4. The binary translator rewrites the guest operating system running in ring 1, while the hypervisor runs in ring 0.

VMware binary translation: example

[Figure: the guest OS executable (BB 1 to BB 5) is translated into a compiled code fragment (CCF); the first TU is shown. Most of the code is identical (unchanged); the conditional jump 'jge prime' is translated.]

VMware binary translation example: output

[Figure: translator output. The remaining continuations are present because the respective basic blocks were not executed.]

Memory allocation
- Each VM usually receives a contiguous set of physical addresses
  - Usually 1-4 GB
- The guest OS allocates pages to guest processes
- The VMM must ensure that the virtual pages of guest OS processes are mapped only to the page frames assigned to that VM
- The VMM partitions memory among the VMs

[Figure: RAM partitioned among VMs, with guest processes P 1, P 2 and P 3.]

Memory management
- Assumptions of the guest OS:
  - Physical memory is a contiguous block of addresses from 0 to some n
  - The guest OS can map any virtual page to any page frame
- The hypervisor must:
  - Partition memory among VMs
  - Ensure that virtual pages are mapped only to assigned page frames
- TLB miss
  - A miss in a HW-managed TLB (e.g. x86) causes the HW to look up the page in the page table
- The guest OS must not manage the real (host) page table

Option 1: brute force
- Guest page tables are read- and write-protected in the host system
  - R/W access is disabled so that a trap occurs when the guest OS tries to read or edit the page directory or a page table
- If the guest OS reads a page table (e.g. for page eviction), writes a page table (e.g. after a page fault) or changes CR3, the system traps
- The hypervisor then uses a VM memory layout in RAM to:
  - Return answers to the VM (e.g. which page to evict)
  - Update the RAM layout (e.g. add a mapping from a virtual page to an allocated physical frame)
- The hypervisor switches the VM memory layout when a new VM is scheduled

[Figure: the guest OS page table sits above the hypervisor (VMM), which holds the VM memory layout; the HW contains the CPU with CR3 and the TLB, and the MMU. An access raises an interrupt, and then the VMM treats the page table access correctly.]

Option 2: shadow page tables
(VA: virtual address, PA: physical address)
- The hypervisor maintains "shadow page tables"
- Guest page tables map: Guest VA (GVA) → Guest PA (GPA)
- Shadow page tables map: Guest VA → Host PA (HPA); this is the real mapping
- When a guest process accesses a virtual address, the MMU translates the GVA to the HPA correctly, because it is aware only of the shadow page table
- The hypervisor is not involved when the guest OS updates its page table
  - Result: the guest page table and the shadow page table may become inconsistent

[Figure: the guest OS page table is pointed to by G-CR3; the hypervisor (VMM) keeps the shadow page table, which the CPU's MMU and TLB use. On an interrupt the VMM corrects the page table.]

Option 2: shadow page tables
- If the address is in the TLB (i.e. a TLB hit), the mapping found is correct and there is nothing to do
- When a guest process causes a page fault (i.e. the virtual page is marked "absent" in the shadow page table):
  - The hypervisor begins execution
  - The hypervisor must check the guest page table to detect which of two possible scenarios happened:
    - Guest page fault: there is no translation in the guest page tables, so the guest OS must handle the page fault (running its page replacement algorithm if needed)
    - Guest translation found: the hypervisor must update the shadow page table accordingly

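The two page-fault scenarios can be sketched with dictionaries standing in for page tables. This is a deliberately flat model: real page tables are multi-level, and the names `guest_pt`, `gpa_to_hpa`, `shadow_pt` and `GuestPageFault` are hypothetical.

```python
class GuestPageFault(Exception):
    """The fault must be forwarded to the guest OS, which handles it itself."""


def handle_shadow_fault(gva_page: int, guest_pt: dict[int, int],
                        gpa_to_hpa: dict[int, int], shadow_pt: dict[int, int]) -> int:
    """Hypervisor handler for a page marked absent in the shadow page table."""
    if gva_page not in guest_pt:
        # Scenario 1: no translation in the guest page table either.
        raise GuestPageFault(gva_page)
    # Scenario 2: the guest has a translation; fill in the shadow entry.
    hpa_page = gpa_to_hpa[guest_pt[gva_page]]
    shadow_pt[gva_page] = hpa_page
    return hpa_page


def access(gva_page: int, guest_pt: dict[int, int],
           gpa_to_hpa: dict[int, int], shadow_pt: dict[int, int]) -> int:
    """Toy MMU: use the shadow table, and enter the hypervisor on a miss."""
    if gva_page in shadow_pt:
        return shadow_pt[gva_page]
    return handle_shadow_fault(gva_page, guest_pt, gpa_to_hpa, shadow_pt)
```

After the first fault on a page, later accesses are served from the shadow table without involving the hypervisor.
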
Option 2: shadow page tables
- There is no performance overhead as long as there are no page faults
- Shadow page tables should be cached, so that when a VM is re-scheduled its shadow page table does not have to be rebuilt from scratch

Shadow page tables: updating CR3

[Figure: virtual CR3 and real CR3.]

Undiscovered guest page table

[Figure: virtual CR3 and real CR3.]

Option 3: Extended/nested page tables
- The name implies having page tables within page tables
- The essence of the idea is a hardware assist
  - The hardware has an extra pointer and the ability to walk an extra set of page tables
  - The idea is called Extended Page Tables (EPT) by Intel
- Guest page tables hold the Guest VA → Guest PA mapping, accessed through the standard CR3
- Extended page tables hold the Host VA → Host PA mapping, accessed through the EPTP (EPT pointer)
- Host VA = Guest PA

[Figure: the guest OS page table is reached through CR3; the hypervisor (VMM) keeps a host page table for each of VM 1, VM 2 and VM 3, reached through EPTP. The CPU contains the TLB and the MMU.]

Walking extended page tables

Option 3: Extended/nested page tables
- The TLB, as usual, holds Guest VA → Host PA
- On memory access:
  - If found in the TLB: no problem
  - If not in the TLB but there is no page fault, the hardware walks both tables and updates the TLB
  - If there is a page fault, the hypervisor gets the host virtual page (guest physical page) and maps it to a host physical page

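The three memory access cases can be sketched as a two-stage lookup. This is a simplified model: each table is a flat dictionary from page number to page number, whereas real hardware walks multi-level tables and also translates every guest page-table access through the EPT. The `free_frames` list and all names are assumptions for illustration.

```python
PAGE_SIZE = 4096


def translate(gva: int, guest_pt: dict[int, int], ept: dict[int, int],
              tlb: dict[int, int], free_frames: list[int]) -> int:
    """Translate a guest virtual address to a host physical address."""
    page, offset = divmod(gva, PAGE_SIZE)
    if page in tlb:
        # Case 1: TLB hit, Guest VA -> Host PA directly.
        return tlb[page] * PAGE_SIZE + offset
    # Stage 1 (guest page table via CR3): Guest VA -> Guest PA.
    # A missing entry (KeyError) stands for a guest page fault in this toy.
    gpa_page = guest_pt[page]
    if gpa_page not in ept:
        # Case 3: fault in the extended table; the hypervisor maps the
        # guest physical page to a free host physical frame.
        ept[gpa_page] = free_frames.pop()
    # Stage 2 (extended page table via EPTP): Guest PA -> Host PA.
    hpa_page = ept[gpa_page]
    tlb[page] = hpa_page  # Case 2: after walking both tables, update the TLB
    return hpa_page * PAGE_SIZE + offset
```

Unlike shadow page tables, the guest can edit its own page table freely here, because the second stage is maintained separately by the hypervisor.
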
Sources
- "Modern Operating Systems", 4th edition, A. Tanenbaum and H. Bos
- "Virtual Machines", J. E. Smith and R. Nair
- A presentation by Niv Gilboa from CSE@BGU
- "Formal requirements for virtualizable third generation architectures", G. J. Popek and R. P. Goldberg, CACM, 1974
- "A comparison of software and hardware techniques for x86 virtualization", K. Adams and O. Agesen, ASPLOS 2006
- A presentation by VMware