
The Performance of Micro-Kernel-Based Systems
H. Haertig, M. Hohmuth, J. Liedtke, S. Schoenberg, J. Wolter
Presentation by: Seungweon Park

Introduction
• μ-kernels have a reputation for being too slow and inflexible
• Can a 2nd-generation μ-kernel (L4) overcome these limitations?
• Experiment:
  – Port Linux to run on L4
  – Compare it to native Linux and to MkLinux (Linux on a 1st-generation, Mach-derived μ-kernel, Mach 3.0)

Introduction (cont.)
• Test the speed of a standard OS personality on top of a fast μ-kernel: Linux implemented on L4
• Test the extensibility of the system:
  – pipe-based communication implemented directly on the μ-kernel
  – mapping-related OS extensions implemented as user tasks
  – user-level real-time memory management implemented
• Test whether the L4 abstractions are independent of the platform

L4 Essentials
• Based on threads and address spaces
• Recursive construction of address spaces by user-level servers
  – Initial address space σ0 represents physical memory
  – Basic operations: granting, mapping, and unmapping
• The owner of an address space can grant or map a page to another address space
• All address spaces are maintained by user-level servers (pagers)

L4Linux – Design & Implementation
• Fully binary compatible with Linux/x86
• Modifications restricted to the architecture-dependent part of Linux
• No Linux-specific modifications to the L4 kernel

L4Linux – Design & Implementation
• Address Spaces
  – Initial address space σ0 represents physical memory
  – Basic operations: granting, mapping, and unmapping
  – L4 uses “flexpages”: logical memory ranging from one physical page up to a complete address space
  – An invoker can only map and unmap pages that have been mapped into its own address space
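The three address-space operations above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class `AddressSpace`, its fields, and the page numbers are hypothetical names chosen for illustration and are not the L4 interface. Flexpages are simplified to single pages, and target pages are assumed to be unmapped beforehand.

```python
class AddressSpace:
    """Toy model of an L4-style address space (hypothetical, not the L4 API)."""

    def __init__(self, name, frames=()):
        self.name = name
        self.pages = {f: f for f in frames}  # virtual page -> physical frame
        self.children = {}                   # virtual page -> [(space, vpage)] derived from it
        self.parent = {}                     # virtual page -> (space, vpage) it came from

    def _require(self, vpage):
        # An invoker may only act on pages mapped into its own space.
        if vpage not in self.pages:
            raise PermissionError(f"{self.name} does not hold page {vpage}")

    def map(self, vpage, target, target_vpage):
        """Share a page with target; the mapper keeps its own mapping."""
        self._require(vpage)
        target.pages[target_vpage] = self.pages[vpage]
        target.parent[target_vpage] = (self, vpage)
        self.children.setdefault(vpage, []).append((target, target_vpage))

    def grant(self, vpage, target, target_vpage):
        """Hand a page over to target; the granter loses it."""
        self._require(vpage)
        target.pages[target_vpage] = self.pages.pop(vpage)
        kids = self.children.pop(vpage, [])
        target.children[target_vpage] = kids
        for space, v in kids:                # derived mappings now hang off the grantee
            space.parent[v] = (target, target_vpage)
        origin = self.parent.pop(vpage, None)
        if origin is not None:               # the original mapper now tracks the grantee
            target.parent[target_vpage] = origin
            ospace, ov = origin
            entries = ospace.children[ov]
            entries[entries.index((self, vpage))] = (target, target_vpage)

    def unmap(self, vpage):
        """Revoke the page from every space that received it via this one."""
        self._require(vpage)
        for space, v in self.children.pop(vpage, []):
            space.unmap(v)                   # recursive revocation
            space.pages.pop(v)
            space.parent.pop(v, None)


# sigma0 holds all physical frames; a pager and an application build on it.
sigma0 = AddressSpace("sigma0", frames=range(4))
pager = AddressSpace("pager")
app = AddressSpace("app")

sigma0.map(1, pager, 10)     # sigma0 -> pager
pager.map(10, app, 20)       # pager -> app (recursive construction)
frame_seen_by_app = app.pages[20]
sigma0.unmap(1)              # revokes the page from pager and app, not from sigma0
```

The point of the sketch is the asymmetry: `map` and `grant` push a page downward, while `unmap` lets the original owner pull it back from every space derived from its mapping.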


L4Linux – Design & Implementation
• Address Spaces (cont.)
  – I/O ports are parts of address spaces
  – Hardware interrupts are handled by user-level processes; the L4 kernel delivers each interrupt as a message via IPC

L4Linux – Design & Implementation
• The Linux server
  – L4Linux uses a single-server approach: a single Linux server runs on top of L4, multiplexing a single thread for system calls and page faults
  – The Linux server maps physical memory into its address space and acts as the pager for any user processes it creates
  – The server cannot directly access the hardware page tables and must maintain logical page tables in its own address space

L4Linux – Design & Implementation
• Interrupt Handling
  – All interrupt handlers are mapped to messages
  – The Linux server contains threads that do nothing but wait for interrupt messages
  – Interrupt threads have a higher priority than the main thread
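The "thread that does nothing but wait for interrupt messages" pattern can be sketched as follows. This is an illustration only, assuming a `queue.Queue` as a stand-in for blocking L4 IPC; the function name `interrupt_thread`, the IRQ numbers, and the `None` shutdown marker are hypothetical. Python threads have no priorities, so the higher priority of interrupt threads is not modelled.

```python
import queue
import threading


def interrupt_thread(irq_queue, handled):
    """Block until an interrupt message arrives, then run the handler."""
    while True:
        irq = irq_queue.get()   # stands in for a blocking IPC receive
        if irq is None:         # hypothetical shutdown marker, not part of L4
            break
        handled.append(irq)     # stands in for the Linux interrupt handler


irq_queue = queue.Queue()
handled = []
worker = threading.Thread(target=interrupt_thread, args=(irq_queue, handled))
worker.start()

# The "kernel" turns each hardware interrupt into a message.
for irq in (14, 1, 14):
    irq_queue.put(irq)
irq_queue.put(None)
worker.join()
```

The handler thread never polls: it sleeps inside the receive call and wakes only when a message is delivered, which is the behaviour the slide describes.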

L4Linux – Design & Implementation
• User Processes
  – Each user process is implemented as a separate L4 task with its own address space and threads
  – The Linux server is the pager for these processes
  – Any fault in a user-level process is sent by RPC from the L4 kernel to the server

L4Linux – Design & Implementation
• System Calls
  – Three system call interfaces:
    • A modified version of libc.so that uses L4 primitives
    • A modified version of libc.a
    • A user-level exception handler (trampoline) that calls the corresponding routine in the modified shared library
  – The first two options are the fastest; the third is maintained for compatibility

L4Linux – Design & Implementation
• Signalling
  – Each user-level process has an additional thread for signal handling
  – The main server thread sends a message to the signal-handling thread, telling the user thread to save its state and enter Linux

L4Linux – Design & Implementation
• Scheduling
  – All thread scheduling is done by the L4 kernel
  – The Linux server’s schedule() routine is only used for multiplexing its single thread
  – After each system call, if no other system call is pending, the server simply resumes the user process thread and sleeps

L4Linux – Design & Implementation
• Tagged TLB & Small Space
  – To reduce TLB conflicts, L4Linux has a special library to customize code and data for communicating with the Linux server
  – The emulation library and signal thread are mapped close to the application instead of in the default high-memory area

Performance
• What is the penalty of using L4Linux?
  – Compare L4Linux to native Linux
• Does the performance of the underlying micro-kernel matter?
  – Compare L4Linux to MkLinux
• Does co-location improve performance?
  – Compare L4Linux to an in-kernel version of MkLinux

Microbenchmarks
• Measured system call overhead on the shortest system call, “getpid()”
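The measurement method can be sketched as a tight loop around the call, dividing the elapsed time by the iteration count. This is not the harness used in the paper: the function name `time_getpid` and the iteration count are assumptions, and in Python the interpreter overhead is included in the result, so the number only illustrates the method and is not comparable to the figures on the slides.

```python
import os
import time


def time_getpid(iterations=100_000):
    """Return the average cost of one os.getpid() call in nanoseconds."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        os.getpid()
    elapsed = time.perf_counter_ns() - start
    return elapsed / iterations


ns_per_call = time_getpid()
print(f"{ns_per_call:.0f} ns per getpid() call (includes interpreter overhead)")
```

getpid() is a useful probe because it does almost no work inside the kernel, so the measured time is dominated by the cost of entering and leaving the system.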

Microbenchmarks (cont.)
• Measured specific system calls to determine basic performance

Macrobenchmarks
• Measured the time to recompile the Linux server

Macrobenchmarks (cont.)
• Next, a commercial test suite was used to simulate a system under full load

Performance Analysis
• L4Linux is, on average, 8.3% slower than native Linux, and only 6.8% slower at maximum load
• MkLinux: 49% average, 60% at maximum
• Co-located MkLinux: 29% average, 37% at maximum
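To put the three penalties side by side, the short calculation below divides each MkLinux penalty by the corresponding L4Linux penalty. The percentages are the ones listed above; the dictionary layout and variable names are only illustrative.

```python
# Slowdown relative to native Linux, in percent: (average, at maximum load).
penalty = {
    "L4Linux": (8.3, 6.8),
    "MkLinux": (49.0, 60.0),
    "MkLinux (co-located)": (29.0, 37.0),
}

base_avg, base_max = penalty["L4Linux"]

# How many times larger each penalty is than the L4Linux penalty.
ratios = {
    name: (avg / base_avg, at_max / base_max)
    for name, (avg, at_max) in penalty.items()
}

for name, (avg_ratio, max_ratio) in ratios.items():
    print(f"{name}: {avg_ratio:.1f}x average, {max_ratio:.1f}x at maximum load")
```

On these numbers the user-level MkLinux penalty is roughly six times the L4Linux penalty on average and close to nine times at maximum load, and co-location narrows the gap without closing it.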

Extensibility Performance
• A micro-kernel must provide more than just the features of the OS running on top of it
• Specialization – improved implementation of OS functionality
• Extensibility – permits implementation of new services that cannot be easily added to a conventional OS

Pipes and RPC
• (1) The first five measurements use the standard pipe mechanism of the Linux kernel
• (2) Is asynchronous and uses only L4 IPC primitives. It emulates POSIX standard pipes, without signalling, and adds a thread for buffering and cross-address-space communication
• (3) Is synchronous and uses blocking IPC without buffering data
• (4) Maps pages into the receiver’s address space
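A pipe latency measurement of the kind compared here can be sketched as a one-byte ping-pong over two standard pipes. This is a minimal sketch, not the benchmark from the paper: it assumes a POSIX system, uses a thread rather than a second process as the echo partner, and the names `echo` and `pipe_round_trip_us` are hypothetical. It exercises only variant (1), the standard pipe mechanism.

```python
import os
import threading
import time


def echo(read_fd, write_fd, rounds):
    """Echo partner: send every received byte straight back."""
    for _ in range(rounds):
        os.write(write_fd, os.read(read_fd, 1))


def pipe_round_trip_us(rounds=1000):
    """Average round-trip time of one byte over two pipes, in microseconds."""
    to_echo_r, to_echo_w = os.pipe()
    from_echo_r, from_echo_w = os.pipe()
    partner = threading.Thread(target=echo, args=(to_echo_r, from_echo_w, rounds))
    partner.start()

    start = time.perf_counter()
    for _ in range(rounds):
        os.write(to_echo_w, b"x")   # ping
        os.read(from_echo_r, 1)     # blocks until the pong arrives
    elapsed = time.perf_counter() - start

    partner.join()
    for fd in (to_echo_r, to_echo_w, from_echo_r, from_echo_w):
        os.close(fd)
    return elapsed / rounds * 1e6


round_trip_us = pipe_round_trip_us()
print(f"{round_trip_us:.1f} us per round trip (includes interpreter overhead)")
```

Each round trip forces two blocking reads and two writes, so the average reflects communication latency rather than bandwidth; measuring bandwidth would instead push large buffers in one direction.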

Virtual Memory Operations
• The “Fault” operation is an example of extensibility: it measures the time to resolve a page fault by a user-defined pager in a separate address space
• “Trap” – latency between a write operation to a protected page and the invocation of the related exception handler
• “Appel1” – time to access a random protected page, where the fault handler unprotects the page, protects some other page, and resumes
• “Appel2” – time to access a random protected page, where the fault handler only unprotects the page and resumes
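The difference between the two Appel handlers is easiest to see in a small simulation of the protection state. This is a toy model only: it tracks which pages are protected and counts faults, and it measures no time and touches no real page tables. The function names, the page count, and the number of accesses are assumptions made for illustration.

```python
import random


def appel1_handler(page, protected, rng, num_pages):
    """Unprotect the faulting page, then protect some other page."""
    protected.discard(page)
    candidates = [p for p in range(num_pages) if p not in protected and p != page]
    if candidates:
        protected.add(rng.choice(candidates))


def appel2_handler(page, protected, rng, num_pages):
    """Only unprotect the faulting page."""
    protected.discard(page)


def run(handler, num_pages=16, accesses=100, seed=0):
    """Access random protected pages; return (faults taken, pages still protected)."""
    rng = random.Random(seed)
    protected = set(range(num_pages))
    faults = 0
    for _ in range(accesses):
        if not protected:
            break                                # nothing left that can fault
        page = rng.choice(sorted(protected))     # access a random protected page
        faults += 1                              # every such access traps
        handler(page, protected, rng, num_pages)
    return faults, len(protected)


appel1_result = run(appel1_handler)
appel2_result = run(appel2_handler)
```

With the first handler the set of protected pages stays nearly constant, so every access keeps faulting; with the second the protected set shrinks by one page per fault until no faults remain.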

Conclusion
• Using the L4 micro-kernel imposes a 5-10% slowdown relative to native Linux, which is much faster than previous micro-kernels
• Further optimizations, such as co-locating the Linux server, and the extensibility that L4 provides could improve L4Linux even further

Q&A