Hardware Assisted Virtualization
Argentina Software Development Center, Software and Solutions Group
21 July 2008

Agenda
• Challenges of running a VMM
• SW Solution for IA-32 arch without Intel-VT
• Virtualization challenges
• Software workarounds to support Ring Deprivileging
• Intel® Virtualization Technology
• VT-x Modes
• VT-x transition mechanisms
• Virtual Machine Control Structure (VMCS)
• Solving Virtualization Challenges with VT-x
• New instructions
• VT-x Extensions
• VT-d: Intel® Virtualization Technology for Directed I/O
• VT-c: Intel® Virtualization Technology for Connectivity
• Intel® VT vs AMD-V
• Conclusions
INTEL CONFIDENTIAL

Challenges of running a Virtual Machine Monitor (VMM)
• The OS and apps in a VM don’t know that the VMM exists or that they share CPU resources with other VMs.
• The VMM should isolate guest SW stacks from one another.
• The VMM should run protected from all guest software.
• The VMM should present a virtual platform interface to guest SW.

SW Solution for IA-32 arch without Intel-VT
• Ring Deprivileging
  – A technique that runs all guest software at a privilege level greater than 0.
  – Privileged instructions generate faults.
  – The VMM runs in Ring 0 as a collection of fault handlers.
• The VMM interprets in software the privileged instructions that would be executed by an OS.
• Any non-privileged instruction issued by an OS or application environment is executed directly by the machine.
• A guest OS could be deprivileged in two distinct ways:
  – It could run at privilege level 1 (the 0/1/3 model), or
  – It could run at privilege level 3 (the 0/3/3 model).
[Figure: VM 0 and VM 1, each running apps (Ring 3) on a guest OS (Ring 1), on top of the VM Monitor (Ring 0) and the platform hardware.]

Virtualization challenges
• Ring Aliasing
  – Problems that arise when software runs at a privilege level other than the one for which it was written.
    • Example: the CS register, which points to the code segment. When the PUSH instruction is executed with the CS register, the contents of that register (which include the current privilege level) are pushed on the stack. A guest OS could easily determine that it is not running at privilege level 0.
• Address-Space Compression
  – OSs expect to have access to the processor’s full virtual address space (in IA-32, the linear address space).
    • The VMM could run entirely within the guest’s virtual address space, but its instructions and data structures would use a substantial amount of that space.
    • The VMM could run in a separate address space, but it must still use a minimal amount of the guest’s virtual address space for the control structures that manage transitions between guest software and the VMM (the IDT and GDT for IA-32).
  – The VMM must prevent guest access to the portions of the guest’s virtual address space that the VMM is using. Otherwise the VMM’s integrity could be compromised.
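The ring-aliasing example can be sketched in a few lines. The low two bits of a segment selector carry the privilege level, so a guest kernel that pushes CS and inspects the value sees a nonzero level when it has been deprivileged. The selector values and function names below are illustrative assumptions, not taken from the slides.

```python
def privilege_level(selector: int) -> int:
    """Return the privilege level held in bits 1:0 of a segment selector."""
    return selector & 0x3


# Hypothetical selectors: same descriptor index (1), different privilege bits.
CS_NATIVE = (1 << 3) | 0        # kernel running natively in ring 0
CS_DEPRIVILEGED = (1 << 3) | 1  # same kernel moved to ring 1 (0/1/3 model)


def guest_detects_deprivileging(pushed_cs: int) -> bool:
    """A guest kernel expects ring 0; anything else reveals the VMM."""
    return privilege_level(pushed_cs) != 0
```

With ring deprivileging, `guest_detects_deprivileging(CS_DEPRIVILEGED)` is true, which is exactly the leak described above.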

Virtualization challenges
• Excessive Faulting
  – Ring deprivileging can interfere with the effectiveness of facilities in the IA-32 architecture that accelerate the delivery and handling of transitions to OS software.
    • For example: the IA-32 SYSENTER and SYSEXIT instructions support low-latency system calls. SYSENTER always effects a transition to privilege level 0, and SYSEXIT faults if executed outside that ring.
  – The VMM must emulate every execution of SYSENTER and SYSEXIT, causing serious performance problems.
• Non-Trapping Instructions
  – Some instructions access privileged state and do not fault when executed with insufficient privilege.
    • For example, the IA-32 registers GDTR, IDTR, LDTR, and TR contain pointers to data structures that control CPU operation. Software can execute the instructions that read (store) from these registers at any privilege level.

Virtualization challenges
• Interrupt Virtualization
  – An OS masks external interrupts to prevent their delivery when it is not ready for them, and virtualizing this mechanism is a big challenge for VMM design. The VMM must manage interrupt masking itself, so that one guest OS masking external interrupts cannot prevent other guests from receiving them.
    • For example: IA-32 uses the interrupt flag (IF) in the EFLAGS register to control interrupt masking. A value of 0 indicates that interrupts are masked.
• Access to Hidden State
  – Some components of the processor state are not represented in any software-accessible register.
    • For example: IA-32 has hidden descriptor caches for the segment registers. A segment-register load copies the referenced descriptor from the GDT or LDT into this cache, which is not modified if software later writes to the descriptor tables.
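One way to picture the interrupt-masking problem is a per-guest virtual copy of EFLAGS.IF. The sketch below is a toy model under that assumption: the class and method names are hypothetical, and a real VMM would intercept the guest’s CLI/STI-style operations rather than have them called directly.

```python
IF_MASK = 1 << 9  # EFLAGS.IF is bit 9 of EFLAGS


class VirtualInterruptFlag:
    """Per-guest virtual copy of EFLAGS.IF kept by a hypothetical VMM."""

    def __init__(self):
        self.eflags = IF_MASK  # guest starts with interrupts enabled
        self.pending = []      # vectors held back while this guest is masked

    def masked(self):
        return (self.eflags & IF_MASK) == 0

    def guest_cli(self):
        """Guest masks interrupts; only its own virtual flag changes."""
        self.eflags &= ~IF_MASK

    def guest_sti(self):
        """Guest unmasks; return the vectors that can now be delivered."""
        self.eflags |= IF_MASK
        delivered, self.pending = self.pending, []
        return delivered

    def external_interrupt(self, vector):
        """Deliver now if unmasked, otherwise queue for this guest only."""
        if self.masked():
            self.pending.append(vector)
            return []
        return [vector]
```

Because each guest has its own flag, one guest masking interrupts leaves delivery to the other guests untouched.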

Virtualization challenges
• Ring Compression
  – Ring deprivileging uses privilege-based mechanisms to protect the VMM from guest software. IA-32 includes two such mechanisms, segment limits and paging:
    • Segment limits do not apply in 64-bit mode.
    • Paging must be used.
      – Problem: IA-32 paging does not distinguish privilege levels 0-2.
        » The guest OS must run at privilege level 3 (the 0/3/3 model).
        » The guest OS is not protected from the guest applications.
• Frequent Access to Privileged Resources
  – Performance suffers when privileged resources are accessed frequently, because each access generates a fault that the VMM must intercept.
    • For example: the task-priority register (TPR), located in IA-32 in the advanced programmable interrupt controller (APIC), is accessed with very high frequency by some OSs.

Intel® Virtualization Technology
• VT-x: Support for IA-32 processor virtualization
• VT-i: Support for Itanium processor virtualization

VT-x Modes
• VMX root operation:
  – Fully privileged, intended for the Virtual Machine Monitor
• VMX non-root operation:
  – Not fully privileged, intended for guest software
• Both forms of operation support all four privilege levels, from 0 to 3.

VT-x transition mechanisms
• VM exit
  – From VMX non-root operation to VMX root operation
• VM entry
  – From VMX root operation to VMX non-root operation
[Figure: VM 0 and VM 1, each with apps (VT-x non-root mode / Ring 3) on a guest OS (VT-x non-root mode / Ring 0); VM exits and VM entries connect the guests to the VMM (VT-x root mode / Ring 0), which runs on the platform hardware.]

Virtual Machine Control Structure (VMCS)
• Data structure that manages VM entries and VM exits.
• The VMCS is logically divided into:
  – Guest-state area
  – Host-state area
  – VM-execution control fields
  – VM-exit control fields
  – VM-entry control fields
  – VM-exit information fields
• VM entries load processor state from the guest-state area.
• VM exits save processor state to the guest-state area, record the exit reason, and then load processor state from the host-state area.
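The load/save behavior of VM entries and VM exits can be modeled with a small data structure. This is a toy sketch: processor state is a plain dict, the field names are hypothetical, and the real VMCS layout is implementation-specific.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Vmcs:
    """Toy VMCS holding only the guest-state and host-state areas."""
    guest_state: dict = field(default_factory=dict)
    host_state: dict = field(default_factory=dict)
    exit_reason: Optional[str] = None


def vm_entry(vmcs):
    """VM entry: processor state is loaded from the guest-state area."""
    return dict(vmcs.guest_state)


def vm_exit(cpu_state, vmcs, reason):
    """VM exit: save guest state and the exit reason, then load host state."""
    vmcs.guest_state = dict(cpu_state)
    vmcs.exit_reason = reason
    return dict(vmcs.host_state)
```

A round trip (entry, some guest progress, exit) leaves the updated guest state in the VMCS and hands the processor back with the host state.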

VT-x Operations
[Figure: VMXON takes the processor from IA-32 operation into VMX root operation. VMLAUNCH and VMRESUME enter VMX non-root operation in VM 1, VM 2, ..., VM n, each with its own Ring 0 and Ring 3 and its own VMCS (VMCS 1, VMCS 2, ..., VMCS n). A VM exit returns to VMX root operation.]

Solving Virtualization Challenges with VT-x
• Address-Space Compression
  – With VT-x, every transition between guest software and the VMM can change the linear-address space, allowing guest software full use of its own address space.
  – VMX transitions are managed by the VMCS, which resides in the physical-address space, not the linear-address space.
• Ring Aliasing and Ring Compression
  – VT-x allows the VMM to run guest software at its intended privilege level. This:
    • Eliminates ring-aliasing problems: an instruction such as PUSH (of CS) cannot reveal that software is running in a VM.
    • Eliminates the ring-compression problems that arise when a guest OS executes at the same privilege level as guest applications.

Solving Virtualization Challenges with VT-x
• Nonfaulting Access to Privileged State
  – VT-x avoids this problem in two ways:
    • It can generate a VM exit on each execution of a sensitive instruction.
    • It lets the VMM configure the disposition of interrupts and exceptions.
• Guest System Calls
  – Problems occur with the IA-32 instructions SYSENTER and SYSEXIT when a guest OS runs outside privilege level 0. VT-x solves this because a guest OS can run at privilege level 0.

Solving Virtualization Challenges with VT-x
• Interrupt Virtualization
  – VT-x provides explicit support for interrupt virtualization.
    • It includes an external-interrupt exiting VM-execution control.
      – When this control is set to 1, a VMM prevents guest control of interrupt masking without gaining control on every guest attempt to modify EFLAGS.IF.
    • It includes an interrupt-window exiting VM-execution control.
      – When this control is set to 1, a VM exit occurs whenever guest software is ready to receive interrupts. A VMM can set this control when it has a virtual interrupt to deliver to a guest.
• Access to Hidden State
  – VT-x includes, in the guest-state area of the VMCS, fields corresponding to CPU state not represented in any software-accessible register.
    • The processor loads values from these VMCS fields on every VM entry and saves into them on every VM exit.

Solving Virtualization Challenges with VT-x
• Frequent Access to Privileged Resources
  – VT-x allows a VMM to avoid the overhead of high-frequency guest access to the TPR.
    • A VMM can configure the VMCS so that the VMM is invoked only when required: when the value of the TPR shadow associated with the VMCS drops below that of a TPR threshold in the VMCS.
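The TPR-shadow rule reduces to one comparison per guest write. In the sketch below the VMCS is a plain dict and the key names are hypothetical; it only illustrates that most writes complete without invoking the VMM.

```python
def guest_writes_tpr(vmcs, new_tpr):
    """Update the TPR shadow; return True if the write causes a VM exit."""
    vmcs["tpr_shadow"] = new_tpr
    return new_tpr < vmcs["tpr_threshold"]


def count_vm_exits(vmcs, writes):
    """Count how many of a sequence of guest TPR writes reach the VMM."""
    return sum(guest_writes_tpr(vmcs, value) for value in writes)
```

For a threshold of 4, the write sequence 8, 6, 5, 3, 9, 2 invokes the VMM only for the two writes that drop below the threshold, instead of all six.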

VT-x New instructions
• VMXON and VMXOFF
  – Enter and exit VMX root mode.
• VMLAUNCH: used on the initial transition from VMM to guest
  – Enters VMX non-root operation
• VMRESUME: used on subsequent entries
  – Enters VMX non-root operation
  – Loads guest state and exit criteria from the VMCS
• VM exit (a hardware transition rather than an instruction)
  – Used on the transition from guest to VMM
  – Enters VMX root operation
  – Saves guest state in the VMCS
  – Loads VMM state from the VMCS
• VMPTRST and VMPTRLD
  – Read and write the VMCS pointer.
• VMREAD, VMWRITE, VMCLEAR
  – Read from, write to, and clear a VMCS.
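The split between VMLAUNCH (first entry) and VMRESUME (later entries) can be sketched as a small state machine. The "launched" flag models the per-VMCS launch state that VMCLEAR resets; the class and function names are illustrative assumptions, and no real instruction is executed.

```python
class VmcsLaunchState:
    """Tracks whether a VMCS has been launched since it was last cleared."""

    def __init__(self):
        self.launched = False

    def vmclear(self):
        self.launched = False

    def vmlaunch(self):
        if self.launched:
            raise RuntimeError("VMLAUNCH on a launched VMCS; use VMRESUME")
        self.launched = True

    def vmresume(self):
        if not self.launched:
            raise RuntimeError("VMRESUME before VMLAUNCH")


def enter_guest(vmcs):
    """Pick the entry instruction the way a VMM run loop might."""
    if vmcs.launched:
        vmcs.vmresume()
        return "VMRESUME"
    vmcs.vmlaunch()
    return "VMLAUNCH"
```

Three consecutive calls to `enter_guest` on a fresh VMCS use VMLAUNCH once and VMRESUME twice; after `vmclear()` the next entry uses VMLAUNCH again.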

VT-x Extensions
• CPUID spoofing (Flex Migration)
• Extended Page Table (EPT)
• Virtual Processor IDs (VPID)
• Guest Preemption Timer

VT-x extension: CPUID spoofing (Flex Migration)
• Allows software to “spoof” the CPUID feature bits (e.g., make the CPUID feature bits appear different from what they really are).
• This is the same as the CPUID spoofing feature that current VT processors have.
[Figure: Live VM migration between older / existing servers (Pre 2004+; 32-bit single core, 64-bit single core) and newer servers (2006+, Intel® Core™; 64-bit dual- and quad-core).]

VT-x extension: Extended Page Table (EPT)
• All guest-physical addresses go through the extended page tables.
• This includes the address in CR3, the address in a PDE, the address in a PTE, etc.
• Reduces the frequency of VM exits to the VMM.
• The net effect of both implementations (Intel EPT or AMD nested page tables, NPT) is to let the guest OS own and manage its own page tables without forcing the host to get involved.
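The two translation stages can be illustrated with a deliberately simplified model. Each "page table" here is a flat dict from page number to page number, which is an assumption for brevity; real guest page tables and EPT are multi-level structures walked by hardware.

```python
PAGE_SIZE = 4096


def translate(address, table):
    """Map an address through a flat page-number table, keeping the offset."""
    page, offset = divmod(address, PAGE_SIZE)
    return table[page] * PAGE_SIZE + offset


def guest_virtual_to_host_physical(gva, guest_page_table, ept):
    """Stage 1 is owned by the guest OS; stage 2 (EPT) is owned by the VMM."""
    gpa = translate(gva, guest_page_table)  # guest-virtual -> guest-physical
    return translate(gpa, ept)              # guest-physical -> host-physical
```

The guest can rewrite `guest_page_table` freely without a VM exit, because the VMM constrains memory only through `ept`.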

VT-x extension: Virtual Processor IDs (VPID)
• The idea of a tagged TLB is that each TLB entry is “tagged” with an identifier.
• With such a tag, TLB entries do not need to be flushed when switching between the host and a guest.
• VPID is activated if the new “enable VPID” control bit is set in the VMCS.
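A tagged TLB can be sketched as a cache keyed by (tag, virtual page) rather than by virtual page alone. The class below is a toy model with hypothetical names; it shows why entries for different VPIDs can coexist and survive a switch.

```python
class TaggedTlb:
    """Toy TLB whose entries are keyed by (vpid, virtual_page)."""

    def __init__(self):
        self.entries = {}

    def insert(self, vpid, virtual_page, physical_page):
        self.entries[(vpid, virtual_page)] = physical_page

    def lookup(self, vpid, virtual_page):
        """Return the cached physical page, or None on a TLB miss."""
        return self.entries.get((vpid, virtual_page))
```

Entries inserted under the host’s tag and a guest’s tag for the same virtual page do not collide, so nothing has to be flushed when execution moves between them.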

VT-x extension: Guest Preemption Timer
• Allows the VMM to preempt guest execution.
• Can bound guest execution time.
• Programmable by the VMM.
• Causes a VM exit when the timer expires.
• No impact on interrupt architecture.
• VMM-specific and platform-independent.
• No need to share with guest OS.
It can help a lot when you need to switch tasks or must allocate a certain amount of CPU power to a task. For telecom and networking applications, it makes virtualization a useful tool and possibly a must-have feature. At the other end of the spectrum, it can help media applications such as media PCs and TiVo-type devices. For the business world, it doesn't buy you all that much.
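Bounding guest execution time with a timer that forces a VM exit can be sketched as a round-robin loop. Work and timer values are abstract units, and the function names are hypothetical; this is only a model of the scheduling effect, not of the hardware timer.

```python
def run_with_preemption_timer(remaining_work, timer_value):
    """Run one guest until it finishes or the timer expires.

    Returns (work_done, exit_reason).
    """
    if remaining_work > timer_value:
        return timer_value, "preemption_timer_expired"
    return remaining_work, "guest_finished"


def round_robin(workloads, timer_value):
    """Give each guest one timer period in turn; return the finishing order."""
    remaining = dict(workloads)
    finished = []
    while remaining:
        for name in list(remaining):
            done, _ = run_with_preemption_timer(remaining[name], timer_value)
            remaining[name] -= done
            if remaining[name] == 0:
                finished.append(name)
                del remaining[name]
    return finished
```

No guest can run for more than `timer_value` units before control returns to the VMM, regardless of what the guest does with its own interrupts.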

VT-d: Intel® Virtualization Technology for Directed I/O
• Improves isolation of I/O resources for greater reliability, security, and availability.
• Supports the remapping of I/O DMA transfers and device-generated interrupts.
• Provides flexibility to support multiple usage models that may run unmodified, special-purpose, or "virtualization aware" guest OSs.

VT-d Feature: DMA Remapping
• DMA remapping translates the address of an incoming DMA request to the correct physical memory address and checks permissions to access that physical address.
• The DMA-remapping hardware logic in the chipset sits between the DMA-capable peripheral I/O devices and the computer’s physical memory.
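The translate-and-check step can be sketched as a per-device lookup. The table shape (a flat dict per device) and all names are assumptions for illustration; the real hardware uses multi-level tables programmed by the VMM.

```python
PAGE_SIZE = 4096


class DmaRemapper:
    """Toy DMA-remapping unit sitting between devices and memory."""

    def __init__(self):
        # device_id -> {io_page: (host_page, writable)}
        self.tables = {}

    def map(self, device_id, io_page, host_page, writable):
        self.tables.setdefault(device_id, {})[io_page] = (host_page, writable)

    def translate(self, device_id, address, write):
        """Return the host-physical address or raise PermissionError."""
        page, offset = divmod(address, PAGE_SIZE)
        entry = self.tables.get(device_id, {}).get(page)
        if entry is None:
            raise PermissionError("no mapping for this device and address")
        host_page, writable = entry
        if write and not writable:
            raise PermissionError("write to a read-only mapping")
        return host_page * PAGE_SIZE + offset
```

A device can reach only the pages mapped for it, so a DMA request from one guest’s device cannot touch another guest’s memory.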

VT-d Feature: Interrupt Remapping
• The interrupt requests generated by I/O devices must be controlled by the VMM. When an interrupt occurs, the VMM must present it to the guest. This is not accomplished through hardware.
• The VT-d interrupt-remapping architecture addresses this problem by redefining the interrupt-message format. Interrupt requests specify a requester-ID and an interrupt-ID, and the remapping hardware transforms these requests into physical interrupts.
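The remapping step can be pictured as a table lookup keyed by the two identifiers in the request. The table contents (target CPU and vector) and the names below are illustrative assumptions, not the actual table-entry format.

```python
class InterruptRemapper:
    """Toy interrupt-remapping table programmed by a VMM."""

    def __init__(self):
        # (requester_id, interrupt_id) -> (target_cpu, vector)
        self.table = {}

    def program(self, requester_id, interrupt_id, target_cpu, vector):
        self.table[(requester_id, interrupt_id)] = (target_cpu, vector)

    def remap(self, requester_id, interrupt_id):
        """Return (target_cpu, vector), or None if the request is blocked."""
        return self.table.get((requester_id, interrupt_id))
```

A request whose requester-ID and interrupt-ID pair has not been programmed is simply dropped, which keeps a device from raising interrupts it was never assigned.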

VT-d Feature: Address Translation Services
• The VT-d architecture defines a multi-level page-table structure for DMA address translation.
• The multi-level page tables are similar to IA-32 processor page tables, enabling software to manage memory at 4 KB or larger page granularity.

VT-c: Intel® Virtualization Technology for Connectivity
• Improves overall system performance by improving communication between the host CPU and I/O devices within the virtual server.
  – This lowers CPU utilization, reduces system latency, and improves networking and I/O throughput.
• VT-c includes:
  – Intel® I/O Acceleration Technology.
  – Virtual Machine Device Queues (VMDq).
  – Single Root I/O Virtualization (SR-IOV) implementation in Intel® devices.

VT-c: Intel® I/O Acceleration Technology
• Intel® I/O Acceleration Technology (Intel® I/OAT) is a suite of features that accelerates data movement across the platform, from I/O and networking devices to memory and processors, to improve system performance.
• Intel® QuickData Technology: designed to maximize throughput of server data traffic across a broader range of configurations and server environments to achieve faster, scalable, and more reliable I/O.
• Direct Cache Access (DCA): enables the CPU to pre-fetch data, avoiding cache misses and improving application response times.
• MSI-X: helps load-balance I/O network interrupts.
• Low-latency interrupts: automatically tune interrupt interval times depending on the latency sensitivity of the data.
• Receive Side Coalescing (RSC): provides lightweight coalescing of receive packets, which increases the efficiency of the host network stack.

VT-c: Virtual Machine Device Queues (VMDq)
• In addition to consolidating CPU processes, server virtualization also consolidates I/O bandwidth and switch processing onto the same platform.
• The overhead of this switching limits bandwidth, adds CPU overhead, and effectively reduces the benefits of server virtualization. In some cases it creates a new problem: an I/O bottleneck.

VT-c: Virtual Machine Device Queues (VMDq)
• On the receive path, VMDq provides a hardware “sorter” or classifier that does the pre-work for the VMM of deciding which VM each packet should go to. The NIC or LAN silicon performs a hardware assist for the VMM layer.
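The receive-path sorter can be sketched as a function that places each packet in a per-VM queue. Classifying by destination MAC address is an assumption made for this example, as are the packet format and the name of the fallback queue.

```python
from collections import defaultdict


def sort_packets(packets, mac_to_vm, default_queue="vmm"):
    """Place each received packet in the queue of its destination VM.

    Packets whose destination is unknown go to a default queue so the
    VMM can handle them in software.
    """
    queues = defaultdict(list)
    for packet in packets:
        queue = mac_to_vm.get(packet["dst_mac"], default_queue)
        queues[queue].append(packet)
    return dict(queues)
```

With the sorting done before the VMM sees the traffic, the VMM only has to hand each pre-built queue to its VM instead of inspecting every packet.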

VT-c: Single Root I/O Virtualization
• The VI switches and manages data streams between System Images (SIs) and I/O devices.
• The VI has to:
  – configure and set up I/O devices
  – copy data streams SI ↔ VI ↔ I/O devices
  – switch I/O access from and to SIs
  – handle messages/interrupts I/O ↔ VI ↔ SI
  – ensure secure data streams and messages between SIs
• SW-based virtualization of I/O is time-consuming, which limits performance.

VT-c: Single Root I/O Virtualization
• Single Root I/O Virtualization (SR-IOV) is a Peripheral Component Interconnect Special Interest Group (PCI-SIG) specification.
• SR-IOV provides a standard mechanism for devices to advertise their ability to be simultaneously shared among multiple virtual machines.
• SR-IOV allows a PCI function to be partitioned into many virtual interfaces so that the resources of a PCI Express* (PCIe) device can be shared in a virtual environment.
• With SR-IOV:
  – SIs get direct access to PCIe device functions
  – The hypervisor (VI) no longer needs to manage all system resources
• PCIe devices will have multiple virtual functions (VFs)
  – usable by multiple SIs
  – a single SI may also use multiple virtual functions
• Security of I/O streams is ensured by
  – Independence of control structures between VFs within one PCIe device
  – I/O address translation services
  – Interrupt remapping mechanisms

Intel® VT vs AMD-V
• Although the architectures are different, AMD’s virtualization technology provides a level of assistance to VMMs equivalent to that of Intel® VT.
• Intel® and AMD’s virtualization technology roadmaps include equivalent extensions to accelerate and optimize virtualization software.
• AMD-V Rapid Virtualization Indexing improves performance in virtualized environments and is equivalent to Intel® VT Extended Page Tables.
• AMD-V Extended Migration is equivalent to VT Flex Migration.

Conclusions
• VT reduces guest OS dependency
  – Eliminates the need for binary patching / translation
  – Facilitates support for legacy OSs
• VT improves robustness
  – Eliminates the need for complex SW techniques
  – Simpler and smaller VMMs
  – Smaller trusted-computing base
• VT improves performance
  – Fewer unwanted guest-to-VMM transitions

Backup

The VMCS guest area
• Contains elements of the state of the virtual CPU associated with that VMCS:
  – The segment registers: to map from logical to linear addresses
  – CR3: to map from linear to physical addresses
  – IDTR: for event delivery
• It contains fields that are not held in any software-accessible register:
  – The processor’s interruptibility state: indicates whether external interrupts are temporarily masked and whether non-maskable interrupts (NMIs) are masked because software is handling an earlier NMI.
• It does not contain fields corresponding to registers that can be saved and loaded by the VMM itself.

The VMCS control fields
• The VMCS contains a number of fields that control VMX non-root operation by specifying the instructions and events that cause VM exits.
• The VMCS includes controls that support interrupt virtualization:
  – External-interrupt exiting: if set, all external interrupts cause VM exits. The guest is not able to mask these interrupts.
  – Interrupt-window exiting: if set, a VM exit occurs whenever guest software is ready to receive interrupts.
  – Use TPR shadow: if set, accesses to the APIC’s TPR through control register CR8 are handled in a special way: executions of MOV CR8 access a TPR shadow referenced by a pointer in the VMCS. The VMCS also includes a TPR threshold; a VM exit occurs after any instruction that reduces the TPR shadow below the TPR threshold. (Flex Priority)
  – CR0 and CR4 virtualization

The VMCS control fields
• Exception bitmap: 32 entries for the IA-32 exceptions, specifying which exceptions should cause VM exits and which should not.
• I/O bitmaps: one entry for each port in the 16-bit I/O space. An I/O instruction causes a VM exit if it attempts to access a port whose entry is set in the I/O bitmap.
• MSR bitmaps: two entries (read and write) for each model-specific register (MSR) currently in use. An execution of RDMSR (or WRMSR) causes a VM exit if it attempts to read (or write) an MSR whose read bit (or write bit) is set in the MSR bitmaps.
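The bitmap controls all follow the same pattern: one bit per event, and a set bit means "cause a VM exit". The sketch below models the I/O bitmap as a single flat bit array and the exception bitmap as a 32-bit integer; this flat layout and the names are assumptions for illustration, not the hardware layout.

```python
IO_PORTS = 1 << 16  # the 16-bit I/O space has 65536 ports


class IoBitmap:
    """One bit per I/O port; a set bit means the access causes a VM exit."""

    def __init__(self):
        self.bits = bytearray(IO_PORTS // 8)

    def intercept(self, port):
        self.bits[port >> 3] |= 1 << (port & 7)

    def causes_vm_exit(self, port):
        return bool(self.bits[port >> 3] & (1 << (port & 7)))


def exception_causes_vm_exit(exception_bitmap, vector):
    """Check bit `vector` (0-31) of a 32-bit exception bitmap."""
    return bool((exception_bitmap >> vector) & 1)
```

One bit per port means the whole 16-bit I/O space is covered by 8192 bytes, and the VMM can let a guest touch most ports directly while intercepting only the ones it virtualizes.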

VMCS location
• The VMCS is referenced with a physical address.
  – This eliminates the need to locate it in the guest’s linear-address space (which may be different from that of the VMM).
• The format and layout of the VMCS in memory are not architecturally defined.
  – This allows implementation-specific optimizations to improve performance in VMX non-root operation.
  – It also reduces the latency of VM entries and VM exits.
• VT-x defines a set of new instructions that allow software to access the VMCS in an implementation-independent manner. (coming soon)