The 27 Year Old Microkernel Sebastien Marineau-Mes & Colin Burgess
Agenda
- Background and timeline
- Hybrid Software Model
- Anatomy of the microkernel
- How things work: system calls, process manager
- Q&A

All content copyright QNX Software Systems
A History of Software Innovation
- 1980: First commercially available microkernel OS
- 1982: First RTOS to support a hard disk on a PC
- 1985: First memory-protected RTOS
- 1990: First POSIX-certified RTOS
- 1992: First RTOS to offer built-in fault-tolerant networking
- 1994: US patent for scalable microkernel windowing system
- 1997: First RTOS to support symmetric multiprocessing (SMP)
- 2002: First RTOS vendor to deliver Eclipse-based IDE
- 2005: First to offer “bound” multiprocessing
- 2007: Introduces hybrid software model and opens source code

[Timeline axis, 1980 to 2005, marking three product generations: QNX 2, QNX 4, QNX 6]
Hybrid Software Model
- Developer Enablement
  - Published source code: runtime components
  - Transparent development
    - QNX development teams working in the open
    - Live check-ins for features and bugfixes
- Community Enablement
  - Foundry27 developer portal
  - Initial projects: OS, Tools, BSPs, Bazaar
- Business Enablement
  - Free access to development tools for non-commercial and partners
  - Free access to source for development
    - Standard business model & pricing for commercial projects
  - Ability to create and distribute derivative works
  - Flexible contribution model
Microkernel Architecture
[Diagram: Process Manager, File System, Networking, Windowing and Multi-media plug into a message bus on top of the microkernel (µK). CPUs listed: ARM, MIPS, SH4, PowerPC, XScale, x86]
- Microkernel + Process Manager are the only trusted components
- Applications and drivers are processes which plug into a message bus. They:
  - Reside in their own memory-protected address space
  - Have a well-defined message interface
  - Cannot corrupt other software components
  - Can be started, stopped and upgraded on the fly
Separation of Duties: Process Manager vs. Microkernel

Microkernel        Process Manager
-----------        ---------------
Messages           Pathname
Threads            Process
Synchronization    Virtual Memory
Scheduling         procfs
Signals            Debug
Timers             Resources
Channels           Loader
Connections        Named Sems
Interrupts         imagefs

Together the two make up procnto.
Microkernel Services
- Simple pre-emptable operations
- Provides basic system services
  - Implements much of the POSIX thread and realtime standard
  - Interrupt and exception redirection
  - IPC primitives
- Most of the microkernel is hardware independent
  - CPU-dependent layer for low-level CPU interfaces
  - CPU-specific optimized routines
- The only code that runs with full system privilege
- Microkernel does not run “on its own”
  - Only reacts to external events: system calls, interrupts, exceptions

Services: Messages, Threads, Synchronization, Scheduling, Signals, Timers, Channels, Connections, Interrupts
Process Manager Services
- Implements long, complex operations and services
  - Ex: process creation and memory management
- Is a multi-threaded process that is scheduled at normal priority
  - Competes for CPU with all other threads in the system
- Message-driven server
  - More on this later

Services: Pathname, Process, Virtual Memory, procfs, Debug, Resources, Loader, Named Sem, imagefs
Procnto Source Layout
/services/system/
  ker       Microkernel services
  memmgr    Virtual memory management
  procmgr   Process lifecycle management
  pathmgr   Pathname management
  proc      Support functions
CPU-specific subdirectories shown in the diagram: arm, mips, ppc, sh, x86, …

Looking for source code? Go to www.foundry27.com -> Projects -> Core Operating System -> Source Guide
Kernel call operations sequence
1. Kernel entry: interrupts off
2. Unlocked: interrupts on, pre-emptable
3. Locked: interrupts on, no pre-emption
4. Unlocked: interrupts on, pre-emptable
5. Kernel exit: interrupts off
Steps 2 to 4 are the kernel operation, which may include a message pass.
Anatomy of a Kernel Call
- A user-mode thread makes a call to a system call stub located in libc
  - Ex: MsgSendvnc()
- The system call stub executes a TRAP instruction (or whatever instruction is appropriate for the particular hardware).

    MsgSendvnc:
        lw      $8, 16($29)
        addiu   $2, $0, 12
        syscall
        jr      $31
        nop

- The processor changes privilege state, interrupts are disabled
- Execution resumes at the appropriate vector
  - services/system/ker/<cpu>/kernel.s
  - One of the few pieces of code that is all assembly: see __ker_entry, __ker_sysenter

    /*
     * r4k_syscall_handler()
     *  Streamlined path for our most common operation--kernel calls
     */
    FRAME(r4k_syscall_handler, sp, 0, ra)
        .set noat
        /*
         * Coming from user mode.  Save user registers, and get
         * a fresh kernel stack.  Move GP to our own short data
         * area.
         */
        LD_ACTIVE_AND_KERSTACK(k0, k1)
        addiu   k0, REG_OFF
        SAVE_REGS(1)
Kernel Entry
- System call entry
- Load kernel stack
- Save thread context (register set)
- Acquire the kernel lock
  - On uniprocessor, atomically set the INKERNEL_NOW bit
  - On SMP systems, spinlock on INKERNEL_NOW
- Enable interrupts
- Transfer to kernel call implementation
  - services/system/ker_call_table.c

    /*
     * r4k_syscall_handler()
     *  Streamlined path for our most common operation--kernel calls
     */
    FRAME(r4k_syscall_handler, sp, 0, ra)
        /*
         * Coming from user mode.  Save user registers, and get
         * a fresh kernel stack.  Move GP to our own short data
         * area.
         */
        LD_ACTIVE_AND_KERSTACK(k0, k1)
        SAVE_REGS(1)
        ACQUIRE_KERNEL(INKERNEL_NOW, zero, 1)

        /*
         * Interrupts are now OK again
         */
        STI

        /*
         * Kernel call number should still be intact in v0.
         * Save the kernel call number.
         */
        sw      v0, SYSCALL(s0)
    #if defined(VARIANT_instr)
        la      t1, _trace_call_table
    #else
        la      t1, ker_call_table
    #endif
        /*
         * Index the call table and run the C code
         */
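The entry steps can be mimicked in portable C. What follows is only a sketch under stated assumptions: the flag value, the `sketch_*` names, the two-entry table and the choice of ENOSYS for a bad call number are all hypothetical, and a C11 atomic stands in for the assembly macro ACQUIRE_KERNEL. It shows the general shape (set the INKERNEL_NOW bit, index a table of function pointers, run the C handler), not the actual procnto code.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

#define INKERNEL_NOW 0x1u          /* name from the slide; the value is an assumption */

static atomic_uint inkernel;       /* hypothetical global kernel-state word */

typedef int (*kercall_fn)(void *args);

/* Two toy "kernel calls" so the table has something to dispatch to. */
static int sketch_nop(void *args)  { (void)args; return 0; }
static int sketch_add1(void *args) { ++*(int *)args; return 0; }

/* Hypothetical call table, indexed by kernel call number. */
static const kercall_fn sketch_call_table[] = { sketch_nop, sketch_add1 };

/* Spin until the INKERNEL_NOW bit was clear and has now been set by us. */
static void acquire_kernel(void)
{
    while (atomic_fetch_or(&inkernel, INKERNEL_NOW) & INKERNEL_NOW) {
        /* someone else is in the kernel: keep spinning */
    }
}

static void release_kernel(void)
{
    assert(atomic_load(&inkernel) & INKERNEL_NOW);
    atomic_fetch_and(&inkernel, ~INKERNEL_NOW);
}

/* Entry path: validate the call number, take the lock, index the table. */
int sketch_kernel_entry(unsigned call_number, void *args)
{
    size_t n = sizeof sketch_call_table / sizeof sketch_call_table[0];

    if (call_number >= n)
        return ENOSYS;
    acquire_kernel();
    int rc = sketch_call_table[call_number](args);
    release_kernel();
    return rc;
}
```

Calling `sketch_kernel_entry(1, &x)` increments `x` through the table; an out-of-range call number is rejected before the lock is touched.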
Kernel Function Implementation
- Entry from kercall table
- Validate parameters
- Verify pointers referenced by kernel are valid
  - RD_PROBE/WR_PROBE functions
  - RD_VERIFY_*/WR_VERIFY_* functions
  - If addresses are not accessible, a fault will be generated and the kernel call will return with EFAULT
- Up-front work all done, ready to do the real work

    int kdecl
    ker_timer_create(THREAD *act, struct kera… *kap) {
        VALID_CLOCKID(kap);
        if(kap->event) {
            RD_VERIFY_PTR(act, kap->event, si…
            RD_PROBE_INT(act, kap->event, siz…
        }
        prp = act->process;
        …

It’s very important to get the validation right, as a fault (due to an invalid or malicious parameter passed in to the call) could be catastrophic.
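The idea of rejecting a bad user pointer before touching it can be illustrated with a plain bounds check. This is a sketch only: the slides describe fault-based probing (RD_PROBE/WR_PROBE), whereas the code below swaps in a simple range comparison, and `struct user_range` and `sketch_verify_ptr` are hypothetical names that do not appear in the QNX source.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the caller's accessible user range. */
struct user_range {
    uintptr_t base;
    size_t    len;
};

/* Return 0 if [addr, addr + size) lies inside the range, else EFAULT. */
int sketch_verify_ptr(const struct user_range *r, uintptr_t addr, size_t size)
{
    assert(r != NULL);

    if (addr < r->base)
        return EFAULT;

    uintptr_t off = addr - r->base;

    /* Written so that neither comparison can overflow. */
    if (off > r->len || size > r->len - off)
        return EFAULT;
    return 0;
}
```

A kernel call written in this style would run the check first and return the error to the caller before any kernel state is modified.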
Microkernel Pre-emption
- Kernel call preemption is important
  - Interrupt activity may READY a higher-priority thread to run while a kernel call is in progress
  - We want to immediately schedule this higher-priority THREAD (minimize scheduling latency)
- QNX does this in a novel way: preemptable kernel
  - Defer changing global kernel state
  - Implementation of kernel ops is 2 stages: do the work followed by a “commit”
  - On preemption, the active thread’s IP is backed up to re-execute the SYSENTER instruction
  - Allows us to only have one kernel stack, not one per thread
  - Any memory references/calculations done before locking kernel must be restartable

[Diagram: kernel call sequence. Entry (interrupts off), unlocked (interrupts on, pre-emptable), locked (no pre-emption), unlocked (pre-emptable), exit (interrupts off)]
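The two-stage "work, then commit" pattern can be sketched as a toy simulation. Everything here is an assumption made for illustration: the `sketch_*` names, the counter that simulates pre-emption, and the doubling used as the "work". The point is only that stage 1 writes no global state, so an attempt that is pre-empted can be thrown away and re-executed from the start.

```c
#include <assert.h>
#include <stdbool.h>

struct sketch_kernel {
    int  committed_value;    /* global state, changed only after locking */
    bool locked;
    int  pending_preempts;   /* simulated interrupts that ready a higher-priority thread */
};

enum { SKETCH_DONE = 0, SKETCH_RESTART = 1 };

/* One attempt at the call: stage 1 is restartable, stage 2 commits. */
static int sketch_attempt(struct sketch_kernel *k, int input)
{
    int result = input * 2;           /* stage 1: local computation, no global writes */

    if (k->pending_preempts > 0) {    /* pre-empted before the lock was taken */
        k->pending_preempts--;
        return SKETCH_RESTART;        /* the call is re-executed from the start */
    }

    assert(!k->locked);
    k->locked = true;                 /* stage 2: no pre-emption from here on */
    k->committed_value = result;
    k->locked = false;
    return SKETCH_DONE;
}

/* Re-issue the call until it commits; returns the number of attempts. */
int sketch_call(struct sketch_kernel *k, int input)
{
    int attempts = 1;

    while (sketch_attempt(k, input) == SKETCH_RESTART)
        attempts++;
    return attempts;
}
```

With two simulated pre-emptions pending, the call commits on its third attempt and the committed value is the same as if it had never been interrupted.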
Lock Kernel
Most of the preparatory work is done before locking the kernel, if possible.

    int kdecl
    ker_sched_get(THREAD *act, struct kerargs_sched_get *kap) {
        PROCESS *prp;
        THREAD  *thp;

        // Argument verification
        // Verify the target process exists.
        if((prp = (kap->pid ? lookup_pid(kap->pid) : act->process)) == NULL) return ESRCH;

        // Verify the target thread exists.
        if((thp = (kap->tid ? vector_lookup(&prp->threads, kap->tid-1) : act)) == NULL) return ESRCH;

        // Verify we have the right to examine the target process
        if(!kerisusr(act, prp)) return ENOERROR;

        // User pointer verification
        if(kap->param) {
            WR_VERIFY_PTR(act, kap->param, sizeof(*kap->param));
            WR_PROBE_OPT(act, kap->param, sizeof(*kap->param) / sizeof(int));
            kap->param->sched_curpriority = thp->priority;
            kap->param->sched_priority = thp->real_priority;
        }

        // Lock kernel to change status
        lock_kernel();
        SETKSTATUS(act, thp->policy);
        return ENOERROR;
    }
Exit Kernel
- Exit kernel to run user-space thread
  - Note that the currently scheduled thread may have changed
    - Ex: entered kernel due to HW interrupt, interrupt readied high-prio thread (that high-prio thread becomes RUNNING)
    - Ex: blocking kernel call causes current thread to be blocked, another to be made RUNNING
  - __ker_exit implements this
- Adjust the address space if needed
  - memmgr.aspace()
- Do special return processing
  - Deliver signals, pulses etc.
  - This may cause a reschedule which could cause another loop through __ker_exit
- Restore the context of the (newly) active thread
- Call SYSEXIT
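The "another loop through __ker_exit" behaviour can be pictured with a small simulation. This is a sketch under assumptions, not the real exit path: the thread structure, the priority-only scheduler, and the `on_signal` hook are all hypothetical, and address-space switching and context restore are reduced to comments.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_thread {
    int priority;
    int pending_signals;   /* special return processing still owed */
};

/* Pick the highest-priority thread (ties go to the lowest index). */
static int sketch_pick(const struct sketch_thread *t, int n)
{
    int best = 0;

    for (int i = 1; i < n; i++)
        if (t[i].priority > t[best].priority)
            best = i;
    return best;
}

/*
 * Choose the active thread, do its special return processing, and go
 * around again if that processing changed who should run.
 * on_signal is a hypothetical hook that may alter priorities.
 */
int sketch_ker_exit(struct sketch_thread *t, int n,
                    void (*on_signal)(struct sketch_thread *t, int n))
{
    assert(t != NULL && n > 0 && on_signal != NULL);

    for (;;) {
        int active = sketch_pick(t, n);

        if (t[active].pending_signals == 0)
            return active;             /* here: restore context, SYSEXIT */
        t[active].pending_signals--;   /* deliver one signal or pulse */
        on_signal(t, n);               /* may ready a higher-priority thread */
    }
}

/* Example hook: delivering a signal readies the last thread at high priority. */
void sketch_boost_last(struct sketch_thread *t, int n)
{
    t[n - 1].priority = 100;
}
```

In the example hook, delivering a signal to the first thread readies a different thread, so the loop runs a second time and the thread that finally returns to user space is not the one that entered.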
What about “Non Kernel” System Calls?
- In many cases, traditional UNIX system calls are not implemented by the microkernel on QNX.
  - They are implemented in the process manager or in external servers that extend procnto
- In general, many of the lengthier core POSIX operations are done by the process manager
Process Manager
- First process in system
  - Created by kernel (init_objects)
- Provides core services to other processes
- Multi-threaded process
  - First <ncpus> threads are IDLE threads
  - Additional threads are threadpool worker threads
- Message-driven server
- Actually a collection of (almost) independent servers
- 4 message handlers
- 11(!) resource managers
  - These resource managers are actually mini filesystems.
Process Manager
Message handlers (connection: SYSMGR_COID):
- proc/rsrcdbmgr_*
- proc/sysmgr_*
- procmgr/*
- memmgr/*

Resource managers (pathmgr/*):
- /dev/mem
- /dev/null
- /dev/text
- /dev/tty
- /dev/zero
- /dev/shmem
- /dev/tymem
- /dev/sem
- /proc/boot
- /proc
- /
Process Manager
- Normal process… but
- It has certain privileges
  - Executes at higher processor privilege level
    - This varies depending on processor architecture
  - Executes in kernel address space
    - Not quite true
    - Because proc’s address space and user address spaces don’t overlap, it may adopt a user’s address space. This makes for faster message passes between proc and user applications.
  - Has permission to use the __Ring0() kernel call
Process Manager
- __Ring0 kernel call
  - Used by proc when it needs to execute code in the kernel context
    - Mostly used when manipulation of kernel structures is required
    - Provides atomicity of kernel state modifications to ensure consistency
  - Occasionally used when processor privilege is required
    - Ex: manipulate privileged CPU registers
  - Arguments are a function pointer and a data pointer
    - Remember process manager shares address space with kernel
  - _NTO_PF_RING0 flag needed to use this kernel call
  - Only process manager has this flag set
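The shape of such a call (a function pointer plus a data pointer, gated by a per-process flag) can be sketched as below. This is an illustration under assumptions: `SKETCH_PF_RING0` merely stands in for _NTO_PF_RING0 with an invented value, the structures are hypothetical, and returning EPERM for an unprivileged caller is a guess rather than documented behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_PF_RING0 0x1u   /* stands in for _NTO_PF_RING0; value is an assumption */

struct sketch_process {
    unsigned flags;
};

typedef void (*ring0_fn)(void *data);

/* Run fn(data) "in kernel context" only if the caller holds the flag. */
int sketch_ring0(const struct sketch_process *caller, ring0_fn fn, void *data)
{
    assert(caller != NULL && fn != NULL);

    if (!(caller->flags & SKETCH_PF_RING0))
        return EPERM;          /* error code chosen for the sketch */

    /* A real kernel would raise privilege and lock kernel state here. */
    fn(data);
    return 0;
}

/* Example payload: bump a counter that represents a kernel structure. */
void sketch_bump(void *data)
{
    ++*(int *)data;
}
```

Passing a data pointer only works because caller and callee can see the same memory, which is the reason the slide reminds the reader that the process manager shares its address space with the kernel.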
Process Manager Example
- The process manager implements many services which would actually be “kernel calls” in traditional UNIX
- Example: mmap()
  - mmap() is the API through which all mappings are set up by user processes
  - malloc uses mmap() to allocate heap memory, also known as “anonymous” memory, since it is not a mapping of a named object.
  - Not implemented as a kernel call, but rather a message that is sent to the process manager
mmap()

    void *
    _mmap(void *addr, size_t len, int prot, int flags, int fd, off64_t off,
          unsigned align, unsigned preload, void **base, size_t *size) {
        mem_map_t msg;

        // Type of operation
        msg.i.type = _MEM_MAP;

        // Parameters
        msg.i.zero = 0;
        msg.i.addr = (uintptr_t)addr;
        msg.i.len = len;
        msg.i.prot = prot;
        msg.i.flags = flags;
        msg.i.fd = fd;
        msg.i.offset = off;
        msg.i.align = align;
        msg.i.preload = preload;
        msg.i.reserved1 = 0;

        // Send message to procnto requesting operation be done
        if(MsgSendnc(MEMMGR_COID, &msg.i, sizeof msg.i, &msg.o, sizeof msg.o) == -1) {
            return MAP_FAILED;
        }
        …
memmgr_handler()
- The _MEM_MAP message type is picked up and passed to the memmgr message handler

    switch(msg->type) {
    …
    case _MEM_MAP:
        proc_wlock_adp(prp);
        status = memmgr_map(ctp, prp, &msg->map);
        break;
    case _MEM_CTRL:
        proc_wlock_adp(prp);
        status = memmgr_ctrl(prp, &msg->ctrl);
        break;
Process Manager
User:            malloc() -> mmap() -> MsgSendv() [_MEM_MAP message] … return msg.o.addr;
Process Manager: MsgReceivev() -> memmgr_map() -> vmm_mmap() -> map_create() -> pa_alloc() -> pte_manipulate() -> MsgReplyv()
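The round trip from client stub to handler and back can be sketched in a single process. This is a simulation under assumptions: a direct function call stands in for MsgSendnc()/MsgReceivev()/MsgReplyv(), the message layout and type codes are invented, the 4 KiB page size and bump allocator are placeholders for the real memory manager, and the sketch returns NULL on failure where the real stub returns MAP_FAILED.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum { SKETCH_MEM_MAP = 1, SKETCH_MEM_CTRL = 2 };   /* hypothetical type codes */

/* Request/reply union, loosely modeled on the msg.i / msg.o pair. */
union sketch_mem_msg {
    struct { uint16_t type; uint32_t prot; uint64_t len; } i;
    struct { uint64_t addr; } o;
};

static uint64_t sketch_next_addr = 0x10000;   /* toy bump-allocator state */

/* Server side: dispatch on the message type and fill in the reply. */
int sketch_memmgr_handler(union sketch_mem_msg *msg)
{
    assert(msg != NULL);

    switch (msg->i.type) {
    case SKETCH_MEM_MAP: {
        uint64_t len = msg->i.len;            /* read the request before writing the reply */

        if (len == 0)
            return EINVAL;
        len = (len + 0xfff) & ~(uint64_t)0xfff;   /* round up to assumed 4 KiB pages */
        msg->o.addr = sketch_next_addr;
        sketch_next_addr += len;
        return 0;
    }
    default:
        return ENOSYS;
    }
}

/* Client side: build the request, "send" it, read the reply. */
void *sketch_mmap(size_t len, int prot)
{
    union sketch_mem_msg msg;

    msg.i.type = SKETCH_MEM_MAP;
    msg.i.prot = (uint32_t)prot;
    msg.i.len  = len;

    if (sketch_memmgr_handler(&msg) != 0)     /* stands in for the message send */
        return NULL;
    return (void *)(uintptr_t)msg.o.addr;
}
```

The client never touches allocator state directly: it only fills in a request and reads back a reply, which is what lets the real service live in a separate process.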
Other Process Manager Services
- Creating processes!
  - spawn() sends a _PROC_SPAWN message to create a new process
  - The exec() ‘system call’ is actually a spawn message with a SPAWN_EXEC flag set!
  - The fork() ‘system call’ is a _PROC_FORK message
- Procfs debug filesystem
  - Similar to UNIX procfs
  - Used by debugger/pidin/ps
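The relationship between the three calls can be shown by building the messages. The sketch below is hypothetical throughout: the type codes, the flag value and the message structure are invented for illustration, and only the names _PROC_SPAWN, _PROC_FORK and SPAWN_EXEC come from the slide.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message type codes; the real values are not shown in the slides. */
enum sketch_proc_type { SKETCH_PROC_SPAWN = 1, SKETCH_PROC_FORK = 2 };

#define SKETCH_SPAWN_EXEC 0x1u   /* stands in for SPAWN_EXEC; value is an assumption */

struct sketch_proc_msg {
    enum sketch_proc_type type;
    unsigned              flags;
    const char           *path;   /* NULL for fork */
};

/* spawn(): a spawn message with no special flags. */
struct sketch_proc_msg sketch_spawn_msg(const char *path)
{
    assert(path != NULL);

    struct sketch_proc_msg m = { SKETCH_PROC_SPAWN, 0u, path };
    return m;
}

/* exec(): the same spawn message, with the exec flag set. */
struct sketch_proc_msg sketch_exec_msg(const char *path)
{
    struct sketch_proc_msg m = sketch_spawn_msg(path);

    m.flags |= SKETCH_SPAWN_EXEC;
    return m;
}

/* fork(): a separate message type with no path. */
struct sketch_proc_msg sketch_fork_msg(void)
{
    struct sketch_proc_msg m = { SKETCH_PROC_FORK, 0u, NULL };
    return m;
}
```

The spawn and exec messages share a type and differ only in one flag bit, while fork uses its own type.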
Ongoing kernel development
- The kernel team is currently working on our next release
  - Codename “trinity2”
- Features include:
  - Memory management enhancements such as variable page support (~15% improvement in system performance)
  - POSIX PSE52 certification
  - PPC 9xx processor support
  - ARMv6 support
  - Cross-endian QNET capabilities
- Trinity2 is currently feature complete: bugfixing/release process underway
  - Builds available on Foundry27: http://community.qnx.com/sf/wiki/do/viewPage/projects.core_os/wiki/Trinity2
Roadmap: QNX Source Postings

Source Bundle         Release Date   Description
-------------         ------------   -----------
Networking            Nov 2007       Next Generation Networking stack, protocols, drivers (io-pkt)
Block Filesystems     March 2008     Block Filesystems and Utilities
Flash Filesystems     March 2008     Flash (NOR/NAND) Filesystems and Utilities
Network Filesystems   March 2008     Block/Flash/Network Filesystems and Utilities
Devices and Drivers   June 2008      Serial, Audio, USB, PCI frameworks and drivers
System Services       June 2008      Additional system service managers
Window Systems        Sept 2008      High level Photon server and services
Graphics System       Sept 2008      Lower level graphics libraries and drivers
Multimedia            Nov 2008       Full multimedia stack
Brainteasers
- Need something to chew on?
  - Try to figure out the questions below and post the answer to the OS_Tech forum on the OS project
  - Prize for the first to answer each question
  - QNX employees not eligible

1. What does STI expand to in the MIPS kernel?
2. NEED_PREEMPT(act) checks queued_event_priority. What sets queued_event_priority?
3. In the memmgr message handler, what is the purpose of “proc_wlock_adp(prp);”?
Want to learn more?
- Check out the projects on www.foundry27.com
- Download the QNX Momentics suite on www.qnx.com
- Download the microkernel source from the QNX operating system project
- Read the tech articles and wiki pages (linked off the project)
- Participate in the forums on the QNX operating system project
Questions?