Disco: Running Commodity Operating Systems on Scalable Multiprocessors
Disco: Running Commodity Operating Systems on Scalable Multiprocessors Presented by: Pierre La. Borde, Jordan Deveroux, Imran Ali, Yazen Ghannam, Tzu-Wei Kuo Paper by: Edouard Bugnion, Scott Devine, Kinshuk Govil, Mendel Rosenblum
Introduction Pierre La. Borde
Introduction • CC-NUMA o Cache-Coherent Non-Uniform Memory Access • Coupling with standard distributed protocols o TCP/IP o NFS • Global Buffer Cache
Introduction • Hide NUMA-ness o Page placement o Dynamic page migration o Dynamic page replication
Problem • Operating systems for innovative hardware o Scalable shared memory multiprocessors • Significant changes required o An OS typically has millions of lines of code
Solution
Virtual Machine Monitors • Instead of modifying the existing OS o Additional layer of software between hardware and OS o Multiple copies of existing operating systems § Support a variety of workloads o Virtualizes all of the resources § Exports a conventional hardware interface o Schedules virtual resources on the physical resources § Processor § Memory
Virtual Machine Monitor • Monitor and distributed protocols need to scale o Simplicity of the monitor o Fault-containment o NUMA memory management issues • Global policies o Fine-grained resource sharing
Challenges • Overheads o Privileged instructions o I/O Devices • Resource Management o Instruction execution stream § Idle loop § Lock busy-waiting • Communication and Sharing o Virtual disk
Disco: A Virtual Machine Monitor Jordan Deveroux
Disco's Interface • Processors o Abstraction of a MIPS R10000 processor o Does not support complete virtualization of the kernel virtual address space o Extends the architecture to support efficient access to some processor functions • Physical Memory o Abstraction of main memory that resides in a contiguous physical address space o Uses dynamic page migration and replication to export a nearly uniform memory architecture to the software • I/O Devices o Each virtual machine has a specified set of I/O devices o Intercepts communication from all of its I/O devices for translation or emulation o Virtualizes access to the networking devices of the underlying system
Implementing Disco • Multithreaded, shared memory program • Compared with other systems, pays special attention to o NUMA memory placement o Cache-aware data structures o Interprocessor communication patterns • NUMA memory management o The Disco code is replicated into all memories of the FLASH machine • Cache-aware data structures o Partitioned so that parts accessed only by a certain processor are in memory near that processor • Interprocessor communication patterns o Very few locks o Wait-free synchronization
Implementing Disco: Virtual CPUs • Emulates virtual CPUs by direct execution on the real CPUs • Most operations run at the same speed as on the real CPUs • Each virtual CPU has a data structure similar to a process table entry in a traditional OS o Contains the saved state of the virtual CPU • Disco itself runs in kernel mode with full access to the hardware • A simple scheduler allows virtual processors to be time-shared across the physical processors
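The per-virtual-CPU data structure and the simple scheduler might look roughly like the sketch below. The field names, the class names `VirtualCPU` and `RoundRobinScheduler`, and the round-robin policy are all assumptions made for illustration; the slides only say that the structure holds the virtual CPU's state and that the scheduler is simple.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class VirtualCPU:
    """Saved state of one virtual CPU (hypothetical fields)."""
    vm_id: int
    vcpu_id: int
    pc: int = 0
    # 32 general-purpose registers, as on a MIPS processor.
    registers: list = field(default_factory=lambda: [0] * 32)


class RoundRobinScheduler:
    """Time-shares physical processors among virtual CPUs (assumed policy)."""

    def __init__(self):
        self.run_queue = deque()

    def add(self, vcpu):
        self.run_queue.append(vcpu)

    def next_vcpu(self):
        """Return the next virtual CPU to run and move it to the back of the queue."""
        if not self.run_queue:
            return None
        vcpu = self.run_queue.popleft()
        self.run_queue.append(vcpu)
        return vcpu
```

In this sketch, calling `next_vcpu()` repeatedly cycles through every registered virtual CPU in order, which is the simplest way to let several virtual processors share one physical processor.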
Implementing Disco: Virtual Physical Memory • Adds a level of address translation and maintains physical-to-machine address mappings • Translation is performed using the translation-lookaside buffer (TLB) • Once a corrected entry is in the TLB, memory references through that mapping are translated with no additional overhead • Each TLB entry is tagged with an address space identifier (ASID) to avoid flushing the TLB on context switches • Each miss is more expensive because of o emulation of the trap architecture o emulation of privileged instructions o remapping of physical addresses
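The extra translation level can be illustrated with a small model. This is only a sketch under stated assumptions: the class name `Monitor`, the dictionary-based `pmap` and `tlb`, and the 4 KB page size are hypothetical and stand in for hardware and monitor structures that the slides do not describe in detail.

```python
PAGE_SIZE = 4096  # assumed page size for the sketch


class Monitor:
    """Toy model of physical-to-machine remapping on TLB inserts."""

    def __init__(self):
        # (vm_id, physical page number) -> machine page number
        self.pmap = {}
        # (asid, virtual page number) -> machine page number
        self.tlb = {}

    def map_physical(self, vm_id, ppn, mpn):
        """Record which machine page backs a VM's physical page."""
        self.pmap[(vm_id, ppn)] = mpn

    def handle_tlb_insert(self, vm_id, asid, vpn, ppn):
        """The guest OS asks for vpn -> ppn; the monitor installs vpn -> mpn."""
        mpn = self.pmap[(vm_id, ppn)]
        self.tlb[(asid, vpn)] = mpn
        return mpn

    def translate(self, asid, vaddr):
        """Translate a virtual address using only the (corrected) TLB entry."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if (asid, vpn) not in self.tlb:
            raise LookupError("TLB miss")
        return self.tlb[(asid, vpn)] * PAGE_SIZE + offset
```

Because the TLB is keyed by `(asid, vpn)`, entries belonging to different address spaces can coexist, which is why a context switch does not have to flush it in this model.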
Implementing Disco: NUMA Memory Management • Optimization that enhances data locality • Fast translation of virtual-to-physical addresses • Allocation of real memory to virtual machines • Only moves pages that will have performance benefit • Contains a memmap data structure with an entry for each real machine memory page
Two different virtual processors of the same virtual machine logically read-share the same physical page, but each virtual processor accesses a local copy.
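Read-sharing through per-node local copies can be sketched as follows. The class `ReplicatedPage`, its methods, and the rule of collapsing all replicas to a single copy on a write are assumptions of this sketch, not a description of Disco's actual data structures or policy.

```python
class ReplicatedPage:
    """One physical page backed by per-node machine copies (toy model)."""

    def __init__(self, home_node, data):
        # node id -> local machine copy of the page contents
        self.copies = {home_node: bytes(data)}

    def read(self, node):
        """Serve a read from the node's local copy, replicating it on first use."""
        if node not in self.copies:
            any_copy = next(iter(self.copies.values()))
            self.copies[node] = bytes(any_copy)
        return self.copies[node]

    def write(self, node, data):
        """Assumed policy: a write collapses all replicas to one copy on the writer's node."""
        self.copies = {node: bytes(data)}
```

Every reader sees the same contents, yet each node touches only its own copy, which is the locality benefit that replication is meant to provide for read-shared pages.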
Implementing Disco: Virtual I/O • Intercepts all device accesses from the virtual machine and forwards them to the physical devices • Each Disco device defines a monitor call used by the device driver to pass all command arguments • Disks and network interfaces include a DMA map as part of their arguments o A list of address pairs that specify the source and destination of I/O operations
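A rough sketch of how a monitor could rewrite such a map before handing it to the real device is shown below. The pair layout `(device_offset, guest_physical_address)`, the function name `translate_dma_map`, and the 4 KB page size are assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed page size for the sketch


def translate_dma_map(dma_map, pmap):
    """Rewrite guest physical addresses in a DMA map into machine addresses.

    dma_map: list of (device_offset, guest_physical_address) pairs (assumed layout).
    pmap:    dict mapping physical page number -> machine page number.
    """
    translated = []
    for device_offset, paddr in dma_map:
        ppn, offset = divmod(paddr, PAGE_SIZE)
        machine_addr = pmap[ppn] * PAGE_SIZE + offset
        translated.append((device_offset, machine_addr))
    return translated
```

The device then operates on machine addresses only, while the guest's driver continues to believe it is working with its own physical address space.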
VM Sharing Imran Ali
Copy-on-Write Disks • Uses virtual memory addressing to map disk data into physical memory • Multiple virtual machines (VMs) share the same machine memory pages • Copy-on-write keeps each VM unaware that the machine memory is being shared
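The idea can be illustrated with a small model in which VMs that read the same disk block share one in-memory copy, and a write gives the writing VM a private copy. The class `CowDiskCache`, its fields, and the dictionary standing in for the physical disk are all hypothetical.

```python
class CowDiskCache:
    """Toy copy-on-write cache of disk blocks shared across VMs."""

    def __init__(self, disk):
        self.disk = disk      # block number -> bytes (stands in for the real disk)
        self.shared = {}      # block number -> shared read-only page
        self.private = {}     # (vm_id, block number) -> private writable copy
        self.disk_reads = 0   # counts accesses that actually reached the disk

    def read(self, vm_id, block):
        if (vm_id, block) in self.private:
            return bytes(self.private[(vm_id, block)])
        if block not in self.shared:
            # First reader pays for the disk access; later readers share the page.
            self.disk_reads += 1
            self.shared[block] = bytes(self.disk[block])
        return self.shared[block]

    def write(self, vm_id, block, offset, data):
        key = (vm_id, block)
        if key not in self.private:
            # Copy on write: the writer gets its own copy, others keep the shared page.
            self.private[key] = bytearray(self.read(vm_id, block))
        self.private[key][offset:offset + len(data)] = data
```

In this model a second VM reading the same block never touches the disk, and a write by one VM is invisible to the others, so no VM can tell that the page was ever shared.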
VM Sharing Pages
Virtual Network Interfaces • Virtual machines do not communicate with each other directly • They use standard protocols to communicate through Ethernet-type addressing • All read-only pages can be shared between virtual machines, reducing memory overhead • Pages are shared whenever possible and are replicated when needed to improve performance
Transparent Sharing of Pages
Experimental Results Yazen Ghannam
Experimental Setup • Experiments are simulated rather than run on real hardware • Used four different workloads o Software Development (Pmake) § OS and I/O intensive o Hardware Development (Engineering) § OS light; large memory footprint o Scientific Computing (Raytrace, Radix) § OS light; uses shared memory regions o Commercial Database § I/O light; single memory-intensive process
Execution Overheads
Memory Overheads
Scalability
Page Migration and Replication
Experiences and Related Work Tzu-Wei Kuo
Experiences on Real Hardware • Disco was ported to run on real hardware in order to confirm the simulation test results • Runs on an SGI Origin 200 board, which forms the basis of the FLASH machine o Single 180 MHz MIPS R10000 processor o 128 MB of memory
Experiences on Real Hardware • Overheads of Virtualization • Two workloads o Pmake: compiles Disco itself using the SGI development tools, two files at a time o Engineering: simulates the memory system of the FLASH machine
Experiences on Real Hardware • This table shows a breakdown of the execution time for the two workloads and a comparison between IRIX and Disco running IRIX. The execution time is broken down into the user, system, and idle time.
Related Work • System Software for Scalable Shared Memory Machines • Virtual Machine Monitors • Other System Software Structuring Techniques • CC-NUMA Memory Management
Conclusion • System software for scalable shared memory multiprocessors can be developed without a massive development effort • Experimental results show that the overhead of virtualization is modest in both processing time and memory footprint • Disco provides a simple solution for scalability and reliability • Lower implementation cost