
Improving IPC by Kernel Design (Jochen Liedtke) • Shane Matthews • Portland State University

Summary • Review • Performance improved – Architecture Level – Algorithmic Level – Interface Level – Coding Level 3/12/2004 Portland State University

Micro-kernels • Minimal OS kernel providing a small set of primitives for thread management, address-space management, and IPC [1] • Everything else is moved to user-space servers

Terminology (L3) • Dataspace – Memory object mapped into an address space • Task – Composed of threads, dataspaces, and an address space • Message – String or memory object

L3 Architecture & IPC • Active components communicate via messages • Applies to: – Device drivers • Implemented as user-level tasks – Hardware interrupts • Delivered as interrupt messages from the micro-kernel to a thread

L3 Redesign Principles • IPC performance is the master – Security and functionality must not be affected • Synergetic effects must be taken into consideration – (Think combined effects) – Combining techniques may lead to reinforcement or diminution • Design must aim at a concrete performance goal – Per short message transfer: 350 cycles (7 microseconds)
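
The 7-microsecond figure is simply the cycle budget divided by the clock rate. The sketch below assumes the 50 MHz clock of the Intel 486 DX-50 listed with the results; the function name is illustrative only.

```python
def cycles_to_microseconds(cycles: int, clock_hz: float) -> float:
    """Convert a cycle count to microseconds at a given clock frequency."""
    return cycles * 1_000_000 / clock_hz


# Assumption: the 50 MHz clock of the Intel 486 DX-50 used for the measurements.
CLOCK_HZ = 50_000_000.0

goal_us = cycles_to_microseconds(350, CLOCK_HZ)
print(f"350 cycles at 50 MHz = {goal_us:.1f} microseconds")
```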

Architectural Level • Messages • Process Structure • Control Blocks

Compound Messages • Multiple sends/receives -> 1 send/receive • Messages consist of direct strings, indirect strings, and memory objects

Twofold message copy • [A space] -> [kernel] -> [B space] • Cost: about 20 + 0.75n cycles, where n = message length in bytes • Good for small messages • Need something better as n grows
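
A minimal sketch of the twofold copy, reading the slide's estimate as 20 + 0.75n cycles for n bytes. The copies are simulated with Python byte arrays; the function names are hypothetical.

```python
from __future__ import annotations


def twofold_copy_cycles(n: int) -> float:
    """Slide's cost estimate for twofold copying: about 20 + 0.75n cycles for n bytes."""
    return 20 + 0.75 * n


def twofold_copy(message: bytes) -> tuple[bytearray, int]:
    """Simulate A space -> kernel buffer -> B space; return B's copy and the copy count."""
    kernel_buffer = bytearray(message)   # copy 1: sender's address space into the kernel
    b_space = bytearray(kernel_buffer)   # copy 2: kernel buffer into the receiver's space
    return b_space, 2


for n in (8, 512, 4096):
    print(f"{n:5d} bytes -> about {twofold_copy_cycles(n):.0f} cycles")

received, copies = twofold_copy(b"hello, B")
```

The linear term dominates quickly. Under this model, copying a 4 KB message already costs nearly nine times the 350-cycle budget for a whole short IPC.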

LRPC and SRC RPC • Client and server share user-level memory – Sender writes into a shared buffer • Problems – When one server serves many clients, shared regions of the address space become critical resources – Shared regions require explicit opens (unlike L3) – A message may change during or after the receiver checks it

Direct Message Copy Via Windows • L3's method – Destination region mapped into a communication window – Message copied directly into the window • Window – Per address space – Accessed exclusively by the kernel
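
As an analogy for the window, a Python memoryview slice aliases the receiver's buffer: writes through it land directly in B, so the message is copied once instead of twice. This only mimics the effect of the hardware mapping; AddressSpace, map_window and direct_copy are hypothetical names.

```python
class AddressSpace:
    """Toy address space: one flat byte array (hypothetical, for illustration only)."""

    def __init__(self, size: int) -> None:
        self.memory = bytearray(size)


def map_window(dest: AddressSpace, offset: int, length: int) -> memoryview:
    """Kernel-only step: create a 'window' that aliases the receiver's buffer."""
    return memoryview(dest.memory)[offset:offset + length]


def direct_copy(src: AddressSpace, src_off: int,
                dest: AddressSpace, dest_off: int, length: int) -> None:
    """Copy a message once, from the sender's memory straight into the receiver's."""
    window = map_window(dest, dest_off, length)        # destination mapped into the window
    window[:] = src.memory[src_off:src_off + length]   # single copy through the window
    window.release()                                   # "unmap" the window afterwards


a = AddressSpace(64)
b = AddressSpace(64)
a.memory[0:5] = b"hello"
direct_copy(a, 0, b, 16, 5)
print(bytes(b.memory[16:21]))  # b'hello'
```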

Communication Windows • Problems – Window setup must be fast – Different threads coexisting within the same address space • L3 implementation – Copy one word (a page-directory entry) from B's page directory into A's
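
A sketch of why one word is enough, assuming 32-bit x86 two-level paging (4 KB pages, 1024 entries per table, so one page-directory entry covers 4 MB). Copying B's entry into a reserved slot of A's page directory makes that 4 MB of B visible in A's window. WINDOW_SLOT and the class layout are hypothetical; a real kernel would also have to deal with stale TLB entries for the window.

```python
PAGE_SIZE = 4096                 # x86 4 KB pages
ENTRIES = 1024                   # entries per page directory / page table on 32-bit x86
REGION = PAGE_SIZE * ENTRIES     # 4 MB of virtual memory per page-directory entry
WINDOW_SLOT = 1000               # hypothetical directory slot reserved for the window


class PageFault(Exception):
    pass


class AddressSpace:
    def __init__(self) -> None:
        # Each entry is either None or a page table: a list of frame numbers (or None).
        self.page_directory = [None] * ENTRIES

    def translate(self, vaddr: int) -> int:
        page_table = self.page_directory[vaddr // REGION]
        if page_table is None:
            raise PageFault(hex(vaddr))
        frame = page_table[(vaddr // PAGE_SIZE) % ENTRIES]
        if frame is None:
            raise PageFault(hex(vaddr))
        return frame * PAGE_SIZE + vaddr % PAGE_SIZE


def open_window(a: AddressSpace, b: AddressSpace, b_vaddr: int) -> int:
    """Copy ONE page-directory entry from B into A's window slot.

    Returns the address inside A's window that aliases b_vaddr.
    """
    a.page_directory[WINDOW_SLOT] = b.page_directory[b_vaddr // REGION]
    return WINDOW_SLOT * REGION + b_vaddr % REGION


a, b = AddressSpace(), AddressSpace()
table = [None] * ENTRIES
table[5] = 77                         # B: page 5 of its third 4 MB region -> frame 77
b.page_directory[2] = table
b_vaddr = 2 * REGION + 5 * PAGE_SIZE + 123
window_vaddr = open_window(a, b, b_vaddr)
print(a.translate(window_vaddr) == b.translate(b_vaddr))  # True
```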

Process Structure • Threads running in kernel mode have 1 kernel stack per thread – Efficient since interrupts, page faults, and IPC already save state on the kernel stack • Continuations – Pro: • Reduce kernel stack usage – Cons: • Require additional copies between the kernel stack and the continuation • Interfere with other optimizations

Thread Control Blocks • Implemented as a large array in the kernel – Fast TCB access • Address = array base + TCB number × TCB size – Saves TLB misses during IPC • Kernel stacks of sender and receiver located in their TCB pages – Locking done by unmapping the TCB
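
A sketch of the TCB array, reading the address formula as base plus TCB number times TCB size. It also counts the distinct pages (and thus TLB entries) an IPC path touches when each kernel stack shares a slot with its TCB, compared with a layout where stacks live elsewhere. TCB_SIZE, TCB_ARRAY_BASE, STACK_BASE and the slot layout are all hypothetical.

```python
from __future__ import annotations

PAGE_SIZE = 4096
TCB_SIZE = 1024                  # hypothetical: TCB plus its kernel stack, a power of two
TCB_ARRAY_BASE = 0xE0000000      # hypothetical virtual base of the TCB array
STACK_BASE = 0xD0000000          # hypothetical separate stack area, for comparison only


def tcb_address(tcb_number: int) -> int:
    """TCB address = array base + TCB number * TCB size."""
    return TCB_ARRAY_BASE + tcb_number * TCB_SIZE


def colocated_ipc_addresses(sender: int, receiver: int) -> list[int]:
    """TCB fields at the start of each slot, kernel stack top at the end of the same slot."""
    tcbs = [tcb_address(t) for t in (sender, receiver)]
    stacks = [tcb_address(t) + TCB_SIZE - 4 for t in (sender, receiver)]
    return tcbs + stacks


def separate_ipc_addresses(sender: int, receiver: int) -> list[int]:
    """Same TCBs, but each kernel stack lives in its own page elsewhere."""
    tcbs = [tcb_address(t) for t in (sender, receiver)]
    stacks = [STACK_BASE + t * PAGE_SIZE + PAGE_SIZE - 4 for t in (sender, receiver)]
    return tcbs + stacks


def pages_touched(addresses: list[int]) -> set[int]:
    """Distinct virtual pages, i.e. TLB entries the IPC path would need."""
    return {addr // PAGE_SIZE for addr in addresses}


print(len(pages_touched(colocated_ipc_addresses(3, 9))))  # 2
print(len(pages_touched(separate_ipc_addresses(3, 9))))   # 4
```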

Algorithmic Level • Thread Identifier • Lazy Scheduling • Short Messages Via Registers

Thread Identifier • Threads are addressed by a 64-bit UID in user mode • Thread number in the lower 32 bits of the UID – AND with a bit mask, then add the TCB array base
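
A sketch of the AND-and-add lookup, assuming a hypothetical UID layout in which the thread number is stored already scaled by the TCB size, so that masking yields the byte offset into the TCB array directly. The bit widths, base address, and contents of the upper 32 bits are assumptions, not L3's actual layout.

```python
TCB_SIZE_BITS = 10               # hypothetical: 1 KB TCBs (2**10)
THREAD_BITS = 16                 # hypothetical: up to 65536 threads
TCB_ARRAY_BASE = 0xE0000000      # hypothetical virtual base of the TCB array

# Hypothetical UID layout: the thread number sits in bits 10..25 of the lower
# 32 bits, already scaled by the TCB size, so masking yields the byte offset.
TCB_MASK = ((1 << THREAD_BITS) - 1) << TCB_SIZE_BITS


def make_uid(upper: int, thread_number: int) -> int:
    """Build a 64-bit UID: arbitrary upper 32 bits, thread number in the lower 32."""
    if not 0 <= thread_number < (1 << THREAD_BITS):
        raise ValueError("thread number out of range")
    return ((upper & 0xFFFFFFFF) << 32) | (thread_number << TCB_SIZE_BITS)


def tcb_address(uid: int) -> int:
    """One AND and one ADD: no multiplication, no table lookup."""
    return TCB_ARRAY_BASE + (uid & TCB_MASK)


uid = make_uid(upper=7, thread_number=42)
print(hex(tcb_address(uid)))  # 0xe000a800
```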

Lazy Scheduling • An IPC operation (call, or reply & receive next) requires: – Deleting the sending thread from the ready queue – Inserting it into the waiting queue – Deleting the receiving thread from the waiting queue – Inserting it into the ready queue • Too many queue operations!

Lazy Scheduling cont. • L3 queue invariants – Ready queue contains at least all ready threads – Waiting queue contains at least all waiting threads • TCB holds the thread's state (ready/waiting) • While parsing a queue, the scheduler removes all threads that no longer belong to it
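
A simplified simulation of lazy scheduling for a client/server ping-pong. IPC only flips the state words in the TCBs; stale ready-queue entries are removed when the scheduler walks the queue. The eager count uses the four queue operations per IPC listed for the eager approach. The `queued` flag and the class names are hypothetical, and the waiting queue is omitted for brevity.

```python
from collections import deque

EAGER_OPS_PER_IPC = 4  # two deletes and two inserts per IPC, as listed for eager queuing


class Thread:
    def __init__(self, name: str) -> None:
        self.name = name
        self.state = "ready"   # kept in the TCB: "ready" or "waiting"
        self.queued = False    # hypothetical flag: currently linked into the ready queue?


class LazyScheduler:
    """Ready queue holds at least all ready threads; stale entries are removed lazily."""

    def __init__(self) -> None:
        self.ready_queue = deque()
        self.queue_ops = 0     # insertions and deletions actually performed

    def enqueue(self, t: Thread) -> None:
        if not t.queued:                 # preserve the invariant without duplicates
            self.ready_queue.append(t)
            t.queued = True
            self.queue_ops += 1

    def ipc(self, sender: Thread, receiver: Thread) -> None:
        """Call, or reply & receive next: only the TCB state words change."""
        sender.state = "waiting"         # sender stays in the ready queue for now
        receiver.state = "ready"
        self.enqueue(receiver)           # usually a no-op: receiver is still queued

    def parse(self) -> None:
        """Scheduler pass: drop threads that no longer belong in the ready queue."""
        for _ in range(len(self.ready_queue)):
            t = self.ready_queue.popleft()
            if t.state == "ready":
                self.ready_queue.append(t)   # keep it (just walking the queue)
            else:
                t.queued = False             # stale entry: removed now, not at IPC time
                self.queue_ops += 1


sched = LazyScheduler()
client, server = Thread("client"), Thread("server")
sched.enqueue(client)
sched.enqueue(server)
rounds = 1000
for _ in range(rounds):
    sched.ipc(client, server)   # call
    sched.ipc(server, client)   # reply & receive next
sched.parse()
print("eager queue ops:", 2 * rounds * EAGER_OPS_PER_IPC)
print("lazy queue ops: ", sched.queue_ops)
```

In this scenario the lazy version performs only the two initial insertions plus one cleanup removal, against thousands of queue operations for the eager version.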

Short Messages Via Registers • A high proportion of messages are short – E.g., driver acks/errors, hardware interrupts • 486 – 7 general registers – 3 needed: sender ID (64 bits, 2 registers) and result code – 4 available • 8-byte messages passed in registers using a coding scheme
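
The slides do not spell out L3's coding scheme, so the sketch below substitutes plain little-endian packing: an 8-byte message is split into two 32-bit values that would travel in two of the free general registers. The function names are hypothetical.

```python
from __future__ import annotations

import struct


def pack_short_message(msg: bytes) -> tuple[int, int]:
    """Split a message of up to 8 bytes into two 32-bit register values."""
    if len(msg) > 8:
        raise ValueError("short messages are limited to 8 bytes")
    return struct.unpack("<II", msg.ljust(8, b"\0"))


def unpack_short_message(reg0: int, reg1: int) -> bytes:
    """Reassemble the 8 message bytes from the two register values."""
    return struct.pack("<II", reg0, reg1)


regs = pack_short_message(b"ACK:0042")
print([hex(r) for r in regs])
print(unpack_short_message(*regs))
```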

Interface Level • Simple RPC stubs – Load registers, issue system call, check success – Stubs generated inline by the compiler • Parameter passing – Use registers when possible
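
A sketch of the three-step stub shape (load registers, trap, check success). The kernel trap is simulated by a toy function standing in for an "adder" server; the register names, thread id 5, and the result codes are all hypothetical.

```python
from __future__ import annotations


class IPCError(Exception):
    """Raised by the stub when the kernel reports a failed IPC."""


def fake_ipc_trap(regs: dict[str, int]) -> None:
    """Stand-in for the kernel: a toy 'adder' server behind thread id 5 (hypothetical)."""
    if regs["dest"] != 5:
        regs["result"] = 1               # hypothetical error code: no such thread
        return
    regs["w0"] = (regs["w0"] + regs["w1"]) & 0xFFFFFFFF
    regs["w1"] = 0
    regs["result"] = 0                   # hypothetical success code


def add_stub(dest: int, a: int, b: int) -> int:
    """Inline-style stub: load registers, trap, check success."""
    regs = {"dest": dest, "w0": a, "w1": b}    # 1. load registers
    fake_ipc_trap(regs)                         # 2. system call
    if regs["result"] != 0:                     # 3. check success
        raise IPCError(regs["result"])
    return regs["w0"]


print(add_stub(5, 2, 3))
```

Because the stub is this small, generating it inline avoids a call/return and lets the compiler keep parameters in registers.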

Coding Level • Reduce cache and TLB misses – Short kernel code • Short jumps, use registers, short address displacements – IPC kernel code in one page – Save/restore coprocessor (FPU) state lazily • Delayed until a different thread needs to use it
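
A sketch of lazy coprocessor switching: the kernel tracks which thread's state is currently loaded and saves/restores only when another thread actually executes an FPU instruction. On x86 this is commonly arranged by setting the TS flag in CR0 so the next FPU instruction raises a device-not-available trap; here that trap is modeled as a plain method call, and the class is hypothetical.

```python
class LazyFPU:
    """Save/restore FPU state only when a different thread actually uses the FPU."""

    def __init__(self) -> None:
        self.owner = None      # thread whose state is loaded in the FPU
        self.registers = 0.0   # toy FPU register file (a single value)
        self.saved = {}        # per-thread saved contexts (would live in TCBs)
        self.switches = 0      # save/restore pairs actually performed

    def use(self, thread: str) -> None:
        """Called when `thread` executes an FPU instruction."""
        if self.owner == thread:
            return                                    # state already loaded: nothing to do
        if self.owner is not None:
            self.saved[self.owner] = self.registers   # save previous owner's state
        self.registers = self.saved.get(thread, 0.0)  # restore (or init) this thread's state
        self.owner = thread
        self.switches += 1


fpu = LazyFPU()
for t in ["A", "B", "A", "B", "A"]:   # context switches between A and B
    if t == "A":                       # only A uses the FPU
        fpu.use(t)
print(fpu.switches)  # 1

fpu.registers = 1.5                    # A leaves a value in the FPU
fpu.use("B")                           # B starts using the FPU
fpu.registers = 2.5
fpu.use("A")                           # A's value comes back
print(fpu.registers)
```

An eager kernel would save and restore on each of the four context switches in the loop, even though only one thread ever touches the FPU.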

Results • An increase of 100% means the IPC time doubles • Removing all of the optimizations increases IPC time by 134% for an 8-byte message
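
To read these percentages: an increase of p% multiplies the original time by (1 + p/100), so a 134% increase means an 8-byte IPC takes about 2.34 times as long. The helper name is illustrative.

```python
def slowdown_factor(percent_increase: float) -> float:
    """An increase of p% multiplies the original time by (1 + p/100)."""
    return 1 + percent_increase / 100


print(slowdown_factor(100))   # 2.0: time doubles
print(slowdown_factor(134))   # about 2.34
```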

Results • L3 vs. Mach • System – Intel 486 DX-50 – 256 KB external cache – 16 MB memory

Results cont.

Conclusions • IPC improved by applying – Performance-based reasoning – Attention to synergetic effects – Design at every level, from architecture down to coding

References • [1] http://en.wikipedia.org/wiki/Micro_kernel • [2] Jochen Liedtke, Improving IPC by Kernel Design