Extensible Kernels
Ken, with slides by Amar Phanishayee
Traditional OS services – Management and Protection
• Provides a set of abstractions
   • Processes, threads, virtual memory, files, IPC
   • System calls and APIs (e.g., Win32, POSIX)
• Resource allocation and management
• Accounting
• Protection and security
   • Concurrent execution
Context for these papers
• Researchers (mostly) were doing special-purpose O/S hacks
• The commercial market was complaining that the O/S imposed big overheads on it
• The O/S research community began to ask what the best way to facilitate customization might be, in the spirit of the Flux OS toolkit…
Problems (examples coming up)
• Extensibility
   • Abstractions overly general
   • Apps cannot dictate management
   • Implementations are fixed
• Performance
   • Crossing over into the kernel is expensive
   • Generalizations and hiding information affect performance
• Protection and management offered with loss in extensibility and performance
Need for application-controlled management (examples)
• Buffer pool management in DBs (*)
   • LRU, prefetch (locality vs. suggestion), flush (commit)
• Shared virtual memory (+)
   • Use a page fault to retrieve a page from disk / another processor
Examples (cont.)
• Concurrent checkpointing (+)
   • Overlap checkpointing and the program being checkpointed
   • Change rights to read-only on dirty pages
   • Copy each page and reset rights
   • Allow reads; use write faults to {copy, reset rights, restart}
* OS Support for Database Management (Stonebraker)
+ Virtual Memory Primitives for User Programs (Andrew W. Appel and Kai Li)
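
The checkpointing steps above can be sketched as a toy simulation. This is only an illustration under stated assumptions: the names `Page`, `Checkpointer`, and `WriteFault` are hypothetical, Python objects stand in for hardware page protection, and every page is treated as dirty.

```python
# Toy simulation of concurrent checkpointing driven by write faults.
# All names here are hypothetical; a real system would rely on
# page-protection hardware and a fault handler, not Python objects.

class WriteFault(Exception):
    """Raised when a write hits a read-only page."""


class Page:
    def __init__(self, data):
        self.data = data
        self.read_only = False

    def write(self, value):
        if self.read_only:
            raise WriteFault()
        self.data = value


class Checkpointer:
    def __init__(self, pages):
        self.pages = pages
        self.snapshot = {}  # page index -> contents at checkpoint time

    def begin(self):
        # Change rights to read-only (all pages are assumed dirty here).
        self.snapshot = {}
        for page in self.pages:
            page.read_only = True

    def _save(self, idx):
        # Copy the page once, then reset its rights.
        if idx not in self.snapshot:
            self.snapshot[idx] = self.pages[idx].data
        self.pages[idx].read_only = False

    def background_step(self):
        """Copy one page that is not yet saved; return False when done."""
        for idx in range(len(self.pages)):
            if idx not in self.snapshot:
                self._save(idx)
                return True
        return False

    def app_write(self, idx, value):
        # Reads are always allowed; a write fault triggers
        # {copy, reset rights, restart}.
        try:
            self.pages[idx].write(value)
        except WriteFault:
            self._save(idx)
            self.pages[idx].write(value)  # restart the faulting write
```

The application keeps running between `background_step` calls, and the snapshot still reflects the state at `begin()` because any page is copied before its first post-checkpoint write.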
Examples (cont.)
• Feedback for file cache block replacement [Implementation and Performance of Application-Controlled File Caching, Pei Cao et al.]
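
One way to picture application feedback on block replacement is a two-level scheme: the kernel nominates a victim, and the owning application may name a different one of its own blocks instead. The sketch below is a simplified assumption, not the algorithm from the cited paper; `TwoLevelCache` and the `choosers` callbacks are hypothetical names.

```python
from collections import OrderedDict

# Simplified two-level replacement: the kernel picks the LRU block as a
# candidate, then lets the owner's (hypothetical) chooser callback swap
# in another block that the same owner holds.

class TwoLevelCache:
    def __init__(self, capacity, choosers=None):
        self.capacity = capacity
        self.blocks = OrderedDict()     # block id -> owner, oldest first
        self.choosers = choosers or {}  # owner -> f(candidate, own_blocks)

    def access(self, block, owner):
        """Touch a block; return the evicted block id, or None."""
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return None
        evicted = None
        if len(self.blocks) >= self.capacity:
            evicted = self._evict()
        self.blocks[block] = owner
        return evicted

    def _evict(self):
        candidate, owner = next(iter(self.blocks.items()))
        victim = candidate
        chooser = self.choosers.get(owner)
        if chooser is not None:
            own = [b for b, o in self.blocks.items() if o == owner]
            choice = chooser(candidate, own)
            # Protection: an app may only give up one of its own blocks.
            if choice in own:
                victim = choice
        del self.blocks[victim]
        return victim
```

A database that knows it is scanning sequentially could, for example, register a chooser that returns its most recently used block, which plain LRU would keep.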
Down with monarchy!
[Image: French Revolution, execution of Louis XVI]
Challenges
• Extensibility
• Security
• Performance
Extensible Kernels
• Exokernel (SOSP 1995): safely exports machine resources
   • Higher-level abstractions in library OS
   • Secure binding, visible resource revocation, abort
   • Apps link with the library OS of their choice
• SPIN (SOSP 1995): kernel extensions (imported) safely specialize OS services
   • Extensions dynamically linked into OS kernel
   • Safety ensured by programming language facilities
Notice the difference in point of view
• Exokernel assumes that very significant extensions to the kernel are needed in many settings and that home-brew kernels may remain common long into the future
   • Goal is to enable this sort of work while reducing the risk that the developer will trash the file system, debugging tools, etc.
• SPIN is more focused on protecting a standard O/S against a device driver run amok
   • Sees this as the more common need…
Exokernels – Motivation
• Existing systems offer fixed high-level abstractions, which is bad
   • Hurt app performance (generalization, e.g., LRU)
   • Hide information (e.g., page fault)
   • Limit functionality (infrequent changes; cool ideas don't make it through)
Motivation (cont.)
• Separate protection from management; management in user space
• Apps should use domain-specific knowledge to influence OS services
• Small and simple kernel: adaptable and maintainable
OS Component Layout
[Figure: Exokernel]
Lib OS and the Exokernel
• Lib OS (untrusted) can implement traditional OS abstractions (compatibility)
• Efficient (Lib OS in user space)
• Apps link with the Lib OS of their choice
• Kernel allows each Lib OS to manage resources and protects Lib OSes from one another
Exokernel: Design Principles
• Securely expose hardware
   • Minimal resource management as required by protection (allocation, revocation)
• Expose allocation
   • No implicit allocation
• Expose names
• Expose revocation
   • E.g., two-level replacement
Exokernel: Secure Bindings
• Lib OSes are untrusted
• Authorization at bind time
• Authentication at access time (no need to understand semantics, e.g., FS permissions, groups)
• Techniques
   • Hardware (TLB)
   • Software (STLB – Kavita Bala!)
   • Downloaded code (direct procedure call, sandboxing, type-safe language)
Secure Bindings
• Multiplexing memory
   • Record capabilities (ownership, read/write) at bind time
   • Check capability at access time
   • Capability passing to share resources
• Multiplexing the network
   • Application-specific Safe Handler (ASH)
   • Download code into kernel (compiled to machine code at runtime)
   • No kernel crossing; procedure call instead of scheduling (low RTT)
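
The memory-multiplexing bullets can be illustrated with a small model: a capability is recorded when a page is bound, checked on every access, and a weaker capability can be derived for sharing. This is a minimal sketch under assumptions; `PageKernel` and its methods are hypothetical names and do not reproduce the Aegis interface.

```python
import secrets

# Minimal model of secure bindings for memory pages. The kernel knows
# nothing about what the pages mean; it only compares capabilities.
# All names are hypothetical.

class PageKernel:
    def __init__(self, n_pages):
        self.free = list(range(n_pages))
        self.caps = {}  # page -> {capability token: set of rights}

    def allocate(self):
        """Bind time: record ownership and read/write rights."""
        page = self.free.pop()
        cap = secrets.token_hex(8)
        self.caps[page] = {cap: {"r", "w"}}
        return page, cap

    def grant(self, page, cap, rights):
        """Derive a capability with a subset of rights, for sharing."""
        held = self.caps.get(page, {}).get(cap)
        if held is None or not set(rights) <= held:
            raise PermissionError("cannot grant rights that are not held")
        new_cap = secrets.token_hex(8)
        self.caps[page][new_cap] = set(rights)
        return new_cap

    def access(self, page, cap, right):
        """Access time: a cheap check against the recorded binding."""
        if right not in self.caps.get(page, {}).get(cap, set()):
            raise PermissionError("capability does not allow this access")
        return True
```

The design point the sketch tries to show is that the expensive decision (who may use the page) happens once, while the per-access check is a simple lookup.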
Resource Revocation
• Visible revocation
   • “Please return a memory page”
   • “Return a page within 50 microseconds”
   • CPU revocation at the end of a time slice
   • Invisible revocation is better when revocations are frequent (due to f/b)
• Abort
   • To revoke resources “by force” from misbehaving processes
   • Repossession vector, repossession exception
   • Worst-case repossession (guarantee)
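
The two stages above (ask first, then take by force) can be sketched as follows. This is an assumption-laden illustration: `LibOS`, `revoke_page`, and the list standing in for the repossession vector are hypothetical, and the time bound on the request is left out.

```python
# Sketch of visible revocation followed by the abort protocol.
# Names are hypothetical; the deadline ("within 50 microseconds")
# is not modeled.

class LibOS:
    def __init__(self, pages, cooperative=True):
        self.pages = list(pages)
        self.cooperative = cooperative
        self.repossessed = []  # stand-in for the repossession vector

    def handle_revoke_request(self):
        """Visible revocation: the library OS chooses which page to return."""
        if self.cooperative and self.pages:
            return self.pages.pop(0)
        return None

    def handle_repossession(self, page):
        """Repossession exception: the library OS learns what was taken."""
        self.repossessed.append(page)


def revoke_page(lib_os):
    """Kernel side: ask politely, then repossess by force."""
    page = lib_os.handle_revoke_request()
    if page is not None:
        return page, "returned"
    if not lib_os.pages:
        return None, "nothing to revoke"
    page = lib_os.pages.pop()
    lib_os.handle_repossession(page)
    return page, "repossessed"
```

Recording the repossessed page lets the library OS repair its own bookkeeping afterwards, which is why the revocation is still "visible" even when it is forced.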
ExOS + Aegis
• Platform: MIPS-based DECstation
• Aegis: exokernel
• ExOS: library OS
   • Processes, virtual memory, IPC, network protocols (ARP/RARP, IP, UDP)
• Comparison with Ultrix (tuned monolithic kernel)
Base Cost (in microseconds)
[Table; machine columns: 12.5 MHz (~11 MIPS), 16.6 MHz (~15 MIPS), 25 MHz (~25 MIPS)]
• Demultiplexing system calls is expensive in Ultrix
• May have a TLB miss in a system call!
“Barebone” unidirectional Protected Control Transfer (in microseconds)
• Types
   1. Asynchronous (donate only current time slice to callee)
   2. Synchronous
• L3
   • Entering kernel: 71 cycles
   • Exiting kernel: 36 cycles
   • TLB flush on context switch
Key to Aegis’ Performance
• Easy to keep track of ownership
• Provides very little apart from low-level multiplexing
• Caching secure bindings (STLB)
• Dynamic code generation
ExOS IPC
• Pipe: shared memory; yield
• Pipe’: has code inlining
• Shm: yield to switch (ExOS), signals (Ultrix)
• RPC: single function, no look-up
• Cost of emulation in Ultrix using pipes or signals is high
ExOS Virtual Memory
+ Fast system call
- Half the time in look-up (vector); repeated access to Aegis STLB and ExOS page table
ASH and scalability
• Ping-pong of a counter in a 60-byte UDP packet 4096 times between 2 processes in user space on a DECstation 5000/125
• Without ASH: response on being scheduled. Round-robin scheduling -> linear increase in RTT.
Exokernel: Summary
• Minimal kernel
   • Secure multiplexing of resources
   • Bind-time authorization
   • Portability
• OS abstractions in user space (Lib OS)
   • VM, IPC
   • Apps link with the OS of their choice
SPIN
• Use of language features for extensions
• Extensibility
   • Dynamic linking and binding of extensions
• Safety
   • Interfaces. Type safety. Extensions verified by compiler
• Performance
   • Extensions not interpreted; run in kernel space
Language: Modula-3
• Interfaces
• Type safety
• Array bounds checking
• Storage management
• Threads
• Exceptions
Motivation
• Can we have all 3 in a single OS?
[Figure from Stefan Savage’s SOSP 95 presentation]
SPIN structure
[Figure from Stefan Savage’s SOSP 95 presentation]
Protection model
• Capabilities
   • Pointer as capability
   • Type safe (compile-time check)
   • Externalized reference
Protection model (cont.)
• Protection “domain”
   • Exported interfaces of safe object files
   • Safe object file = verified by compiler or asserted by the kernel
• In-kernel name server
• Optional authorization for importing interfaces
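
The "externalized reference" bullet under the protection model can be pictured as a per-application table: code outside the type-safe kernel never holds a kernel pointer, only an index that the kernel translates back. The sketch below is an assumption for illustration; `ExternRefTable` and its method names are hypothetical.

```python
# Sketch of externalized references: in-kernel pointers act as
# capabilities, and untrusted code receives an opaque index instead.
# Hypothetical names throughout.

class ExternRefTable:
    def __init__(self):
        self._table = []  # one table per application

    def externalize(self, obj):
        """Hand out an index in place of the in-kernel reference."""
        self._table.append(obj)
        return len(self._table) - 1

    def internalize(self, index):
        """Recover the reference; a forged or stale index is rejected."""
        if not isinstance(index, int) or not 0 <= index < len(self._table):
            raise KeyError("invalid externalized reference")
        return self._table[index]
```

Because the table is per application, an index guessed by one application cannot name another application's kernel objects.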
Events and Handlers
• Events
   • Message announcing
      • Change in state
      • Request for service
   • Procedure exported from an interface
• Handlers register for events
   • Multiple handlers
Dispatcher
• Central dispatcher: event router
   • Primary handler
• Handler invocation
   • Synchronous/asynchronous
   • Bounded time
   • Ordered/unordered
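
The event and dispatcher slides can be summarized with a small model: an event has a primary handler, further handlers may be installed, and raising the event invokes all of them. This is a sketch under assumptions, covering only synchronous, ordered invocation; `Dispatcher`, the `authorize` callback, and the event name used in the test are hypothetical and are not SPIN's actual interface.

```python
# Minimal event dispatcher: one primary handler per event plus any
# number of installed handlers, called synchronously in install order.
# Names and the authorize hook are illustrative assumptions.

class Dispatcher:
    def __init__(self):
        self._events = {}

    def define_event(self, name, primary, authorize=lambda handler: True):
        """Register an event together with its primary handler."""
        self._events[name] = {"handlers": [primary], "authorize": authorize}

    def install(self, name, handler):
        """Add a handler if the event's owner authorizes it."""
        event = self._events[name]
        if not event["authorize"](handler):
            raise PermissionError(name)
        event["handlers"].append(handler)

    def raise_event(self, name, *args):
        """Route the event to every handler and collect the results."""
        return [handler(*args) for handler in self._events[name]["handlers"]]
```

Since raising an event is just a sequence of procedure calls, an extension that handles it runs without a protection-domain crossing, which is the performance argument made for in-kernel extensions.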
Handler Installation
[Figure from Brian Bershad’s OSDI 96 presentation]
Handler Installation (cont.)
[Figure from Brian Bershad’s OSDI 96 presentation]
Event Handling
[Figure from Stefan Savage’s SOSP 95 presentation]
Core Services: Memory Management
• Services
   • Physical storage: allocate, deallocate, “reclaim” (returns capability)
   • Naming (virtual): allocate, deallocate
   • Translation (mapping): add/remove/check mapping
• Exceptions
   • BadAddress
   • PageNotPresent
• Extensions use these primitives to define an address space model
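
To make the last bullet concrete, here is a toy extension that builds demand paging out of the physical-storage and translation primitives. It is a sketch under assumptions: the class and method names are hypothetical, frames are plain integers, and `PageNotPresent` is modeled as a Python exception.

```python
# Toy model: an extension defines an address space policy (demand
# paging) on top of physical allocation and translation primitives.
# Names are hypothetical, loosely following the slide's vocabulary.

class PageNotPresent(Exception):
    pass


class PhysicalStorage:
    def __init__(self, n_frames):
        self.free = list(range(n_frames))

    def allocate(self):
        return self.free.pop(0)

    def deallocate(self, frame):
        self.free.append(frame)


class Translation:
    def __init__(self):
        self.mappings = {}  # virtual page -> physical frame

    def add_mapping(self, vpage, frame):
        self.mappings[vpage] = frame

    def remove_mapping(self, vpage):
        self.mappings.pop(vpage, None)

    def translate(self, vpage):
        if vpage not in self.mappings:
            raise PageNotPresent(vpage)
        return self.mappings[vpage]


class DemandPagedSpace:
    """Extension: allocate a frame the first time a page is touched."""

    def __init__(self, phys, trans):
        self.phys = phys
        self.trans = trans
        self.faults = 0

    def touch(self, vpage):
        try:
            return self.trans.translate(vpage)
        except PageNotPresent:
            self.faults += 1
            self.trans.add_mapping(vpage, self.phys.allocate())
            return self.trans.translate(vpage)
```

A different extension could handle the same exception by fetching the page from another machine or from a compressed store, without changing the core services.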
Core Services: Thread Management
• Strand interface
   • block/unblock
   • checkpoint/resume
• Global and application-specific schedulers
• Fault isolation
• Thread model can be defined using these primitives
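
The four strand operations are enough to express a simple scheduler, as the sketch below tries to show. It is an assumption for illustration only: `Strand`, `RoundRobinScheduler`, and the string used as saved state are hypothetical and do not reproduce SPIN's interfaces.

```python
from collections import deque

# Sketch of an application-specific round-robin scheduler built from
# block/unblock and checkpoint/resume. Hypothetical names throughout.

class Strand:
    def __init__(self, name):
        self.name = name
        self.saved_state = None


class RoundRobinScheduler:
    def __init__(self):
        self.ready = deque()
        self.current = None

    def unblock(self, strand):
        """The strand became runnable."""
        if strand not in self.ready:
            self.ready.append(strand)

    def block(self, strand):
        """The strand is no longer runnable."""
        if strand in self.ready:
            self.ready.remove(strand)
        if self.current is strand:
            self.current = None

    def checkpoint(self, strand, state):
        """Save the strand's execution state before it loses the CPU."""
        strand.saved_state = state

    def resume(self, strand):
        """Return the state to restore when the strand runs again."""
        return strand.saved_state

    def switch(self, state_of_current=None):
        """Preempt the current strand and pick the next ready one."""
        if self.current is not None:
            self.checkpoint(self.current, state_of_current)
            self.ready.append(self.current)
        self.current = self.ready.popleft() if self.ready else None
        return self.current
```

Swapping `switch` for a priority or deadline policy changes the thread model without touching the four primitives.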
Microbenchmarks
[Table; labels: IPC (sockets, SUN RPC; messages; in-kernel call), thread management]
All numbers are in microseconds
Performance: Virtual Memory
• In-kernel calls are more efficient than traps or messages
All numbers are in microseconds
Performance: Networking
• Lower RTT because of in-kernel extension
Time in microseconds, bandwidth in Mbps
End-to-End Performance
• Networked video server: CPU utilization (network interface supports DMA)
Issues
• Dispatcher scalability
• Handler scheduling
• Garbage collection
Conclusion
• Extensibility without loss of security or performance
• Exokernels
   • Safely export machine resources
   • Decouple protection from management
• SPIN
   • Kernel extensions (imported) safely specialize OS services
   • Safety ensured by programming language facilities