Windows Kernel Internals
Synchronization Mechanisms
David B. Probert, Ph.D.
Windows Kernel Development, Microsoft Corporation
© Microsoft Corporation
Kernel synchronization mechanisms
• Pushlocks
• Fastref
• Rundown protection
• Spinlocks
• Queued spinlocks
• IPI
• SLISTs
• DISPATCHER_HEADER
• KQUEUEs
• KEVENTs
• Guarded mutexes
• Mutants
• Semaphores
• EventPairs
• ERESOURCEs
• Critical Sections
Push Locks
• Acquired shared or exclusive
• NOT recursive
• Locks granted in order of arrival
• Fast non-contended / Slow contended
• sizeof(pushlock) == sizeof(void*)
• Pageable
• Acquire/release are lock-free
• Contended case blocks using local stack
Pushlock format
Cache Aware Push Locks
[Figure: boxes labeled Push Lock]
Pushlock non-contended cases
• Exclusive acquire: (SC=0, E=0, W=0) -> (SC=0, E=1, W=0)
• Exclusive release: (SC=0, E=1, W=0) -> (SC=0, E=0, W=0)
• Shared acquire: (SC=n, E=0, W=0) -> (SC=n+1, E=0, W=0)
• Shared release: (SC=n+1, E=0, W=0) -> (SC=n, E=0, W=0)
Pushlock contended cases
• Exclusive acquires:
  (SC=0, E=1, W=0) -> (P=wb(ssc=0, e=1), W=1)
  (SC=n, E=0, W=0) -> (P=wb(ssc=n, e=1), W=1)
• Shared acquire:
  (SC=0, E=1, W=0) -> (P=wb(ssc=0, e=0), W=1)
wb is a stack-allocated waitblock
ssc and e are the shared count and exclusive bit saved in the wb
Pushlock contended cases
Shared releasing threads:
– Search wb list for a wb' with ssc > 0 (or e == 1)
– If (InterlockedDecrement(wb'.ssc) == 0), fall through to the exclusive case
– Note that multiple threads may release but only one will decrement to 0
ExfAcquirePushLockExclusive
    while (1)
        if PushLock FREE, confirm by setting to Exclusive
            DONE
        if PushLock.Waiting
            set WB (SSC=0, E=1, next=PushLock.Next)
        else
            n = PushLock.ShareCount
            set WB (SSC=n, E=1, next=NULL)
        Attempt to set PushLock.Next = WB, PushLock.Waiting = 1
        Loop on failure
        Wait for event
ExfAcquirePushLockShared
    while (1)
        if PushLock FREE or shared (no waiters)
            count++. DONE
        if PushLock.Exclusive OR PushLock.Waiting   // E or SSC
            if PushLock.Waiting
                set WB (SSC=0, E=0, Next=PushLock.Next)
            else
                set WB (SSC=0, E=0, Next=NULL)
            Attempt to set PushLock.Next = WB, PushLock.Waiting = 1
            Loop on failure
            Wait for event
Pushlock contended cases
Exclusive releasing threads:
– Search wb list for:
  • continuous chain of wb with ssc > 0
  • or, a wb' with e == 1
– Can then split the list one of two ways, giving away either shared (s) or exclusive (e) ownership
Fast Referencing
• Used to protect rarely changing reference-counted data
• Small pageable structure that's the size of a pointer
• Scalable since it requires no lock acquires in over 99% of calls
Fast Referencing Example
    // Get a reference to the token using the fast path if we can
    Token = ObFastReferenceObject (&Process->Token);
    if (Token == NULL) {
        // The fast path failed so we have to obtain the lock first
        PspLockProcessSecurityShared (Process);
        Token = ObFastReferenceObjectLocked (&Process->Token);
        PspUnlockProcessSecurityShared (Process);
    }
Fast Referencing Internals
[Figure: Object Pointer holding Object: R; RefCnt: R + 1 + N]
Obtaining a Fast Reference
[Figure: Object Pointer with count 3; Reference; Dereference; Object Pointer with count 2]
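The idea behind the fast path can be sketched as a pointer whose low bits cache R pre-charged references, so that taking a reference is one compare-and-exchange that decrements R. The sketch below is an illustration under assumptions, not the Ob implementation: `object_t`, `fast_ref_t`, the three-bit mask, and the `fast_*` functions are hypothetical names, and the behavior of handing a reference back to the pointer on dereference is assumed. The object's count follows the R + 1 + N accounting from the internals slide.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define FASTREF_MASK ((uintptr_t)7)   /* assumes 8-byte aligned objects */

typedef struct object {
    _Alignas(8) atomic_long ref_count;   /* R + 1 + N */
} object_t;

typedef struct { _Atomic uintptr_t value; } fast_ref_t;   /* pointer | R */

/* Install obj with R = 7 cached references, plus one reference that
 * belongs to the fast_ref_t itself (N = 0 outstanding). */
static void fast_ref_init(fast_ref_t *fr, object_t *obj)
{
    assert(((uintptr_t)obj & FASTREF_MASK) == 0);
    atomic_init(&obj->ref_count, (long)FASTREF_MASK + 1);
    atomic_init(&fr->value, (uintptr_t)obj | FASTREF_MASK);
}

/* Fast path: consume one cached reference (R -> R - 1, N -> N + 1).
 * Returns NULL when R is zero; the caller then takes the locked path. */
static object_t *fast_reference(fast_ref_t *fr)
{
    uintptr_t old = atomic_load(&fr->value);
    while ((old & FASTREF_MASK) != 0) {
        if (atomic_compare_exchange_weak(&fr->value, &old, old - 1))
            return (object_t *)(old & ~FASTREF_MASK);
    }
    return NULL;
}

/* Give a reference back to the pointer if it still names obj and has
 * room; otherwise drop it from the object's own count. */
static void fast_dereference(fast_ref_t *fr, object_t *obj)
{
    uintptr_t old = atomic_load(&fr->value);
    while ((old & ~FASTREF_MASK) == (uintptr_t)obj &&
           (old & FASTREF_MASK) != FASTREF_MASK) {
        if (atomic_compare_exchange_weak(&fr->value, &old, old + 1))
            return;
    }
    atomic_fetch_sub(&obj->ref_count, 1);
}
```

Note that neither fast path touches the object's own count, which is why the common case scales: only the pointer-sized word is contended.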
Replacing Fast Referenced Objects
    // Swap new token for old
    OldToken = ObFastReplaceObject (&Process->Token, NewToken);
    // Force any threads out of the slow ref path
    PspLockProcessSecurityExclusive (Process);
    PspUnlockProcessSecurityExclusive (Process);
Fast Referencing Internals
[Figure: Object Pointer holding Object: R; RefCnt: R + 1 + N]
Rundown Protection
• Protects structures that last a long time but are eventually run down (destroyed)
• Small (the size of a pointer) and can be used in pageable data structures
• Acquire and release are fast and lock-free in the non-rundown case
• Rundown protection can be reset but we don't use that currently
Rundown Protection Example
    if (ExAcquireRundownProtection (&Parent->RundownProtect)) {
        SectionObject = Parent->SectionObject;
        if (SectionObject != NULL) {
            ObReferenceObject (SectionObject);
        }
        ExReleaseRundownProtection (&Parent->RundownProtect);
    }
    if (SectionObject == NULL) {
        Status = STATUS_PROCESS_IS_TERMINATING;
        goto exit_and_deref;
    }
Rundown Protection Internals
[Figure: pointer-sized word holding either an Access Count (low bit 0) or a Wait Block Pointer (low bit 1); the wait block holds an Access Count and a KEVENT]
Spinlocks
Spinlock Acquire:
    A:  lock bts dword ptr [LockAddress], 0
        jnc  done                    ; carry clear: bit was 0, lock acquired
    S:  test dword ptr [LockAddress], 1
        jz   A                       ; lock looks free, retry the interlocked op
        pause
        jmp  S
    done:
Spinlock Release:
        lock and byte ptr [LockAddress], 0
Queued Spinlocks
• To acquire, processor queues to lock
• At release, lock passes to queued processor
• Waiting processors spin on local flag
Advantages:
• Reduced coherency traffic
• FIFO queuing of waiters
Kernel Queued Lock use
DispatcherLock, PfnLock, SystemSpaceLock, VacbLock, MasterLock, NonPagedPoolLock, IoCancelLock, WorkQueueLock, IoVpbLock, IoDatabaseLock, IoCompletionLock, NtfsStructLock, AfdWorkQueueLock, BcbLock, MmNonPagedPoolLock
Kept per-processor: PRCB->LockQueue[]
e.g. for each processor's control block:
    Prcb->LockQueue[idxDispatcherLk].Next = NULL;
    Prcb->LockQueue[idxDispatcherLk].Lock = &KiDispatcherLk
Queued Spinlocks
KiAcquireQueuedLock(pQL)
    prev = Exch(pQL->pLock, pQL)
    if (prev == NULL)
        pQL->pLock |= LQ_OWN
        return
    pQL->pLock |= LQ_WAIT
    prev->pNext = pQL
    while (pQL->pLock & LQ_WAIT)
        KeYieldProcessor()
    return
KiReleaseQueuedLock(pQL)
    pQL->pLock &= ~(LQ_OWN | LQ_WAIT)
    lockval = *pQL->pLock
    if (lockval == pQL)
        lockval = CmpExch(pQL->pLock, NULL, pQL)
    if (lockval == pQL)
        return
    while (!(pWaiter = pQL->pNext))
        KeYieldProcessor()
    pWaiter->pLock ^= (LQ_OWN | LQ_WAIT)
    pQL->pNext = NULL
    return
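The acquire and release routines follow the general shape of an MCS-style queue lock. The sketch below shows that shape in C11 under stated assumptions: it keeps the wait flag in a separate field rather than in the low bits of the lock pointer as the slides do, and `qnode_t`, `qlock_t` and the `qlock_*` functions are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct qnode {
    struct qnode *_Atomic next;   /* successor in the queue, if any */
    atomic_int waiting;           /* local flag this waiter spins on */
} qnode_t;

typedef struct { qnode_t *_Atomic tail; } qlock_t;

static void qlock_init(qlock_t *l)
{
    assert(l != NULL);
    atomic_init(&l->tail, (qnode_t *)NULL);
}

static void qlock_acquire(qlock_t *l, qnode_t *me)
{
    atomic_store(&me->next, (qnode_t *)NULL);
    atomic_store(&me->waiting, 1);
    /* Swap ourselves in as the new tail of the queue. */
    qnode_t *prev = atomic_exchange(&l->tail, me);
    if (prev == NULL)
        return;                         /* lock was free: we own it */
    atomic_store(&prev->next, me);      /* link behind the previous tail */
    while (atomic_load(&me->waiting)) {
        /* spin on our own flag only */
    }
}

static void qlock_release(qlock_t *l, qnode_t *me)
{
    qnode_t *succ = atomic_load(&me->next);
    if (succ == NULL) {
        qnode_t *expected = me;
        /* No visible successor: try to mark the lock free. */
        if (atomic_compare_exchange_strong(&l->tail, &expected,
                                           (qnode_t *)NULL))
            return;
        /* A waiter swapped the tail but has not linked itself yet. */
        while ((succ = atomic_load(&me->next)) == NULL) {
        }
    }
    atomic_store(&succ->waiting, 0);    /* hand the lock to the successor */
}
```

Because each waiter spins on a flag in its own node, a release touches only the successor's cache line, which is the source of the reduced coherency traffic and FIFO ordering.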
QueuedLock examples
[Table: columns Action, spinlk, QL-P0, QL-P1, QL-P2; rows Initial, A-P0, A-P1, A-P2, R-P0, R-P1, R-P2; cells hold the lock value and each processor's queue entry as next-state pairs such as null-own, QL1-own, null-wait]
InterProcessor Interrupts (IPIs)
Synchronously execute a function on every processor
KeIpiGenericCall(fcn, arg)
    oldIrql = RaiseIrql(SYNCH_LEVEL)
    Acquire(KiReverseStallIpiLock)
    Count = KeNumberOfProcessors()
    KiIpiSendPacket(targetprocs, fcn, arg, &Count)
    while (Count != 1)
        KeYieldProcessor()
    RaiseIrql(IPI_LEVEL)
    Count = 0    // all processors will now proceed
    fcn(arg)
    KiIpiStallOnPacketTargets(targetprocs)
    Release(KiReverseStallIpiLock)
    LowerIrql(oldIrql)
KiIpiSendPacket
    me = PCR->PRCB
    me->PbTargetSet = targetset
    me->PbWorkerRoutine = fcn
    me->PbCurrentPacket = arg
    for each p in targetset
        them = KiProcessorBlock[p]->PRCB
        while (CMPEXCH(them->PbSignalDone, me, 0))
            YIELD
        HalRequestIpi(p)
    return
// the IPI service routine will invoke KiIpiGenericCallTarget
// on each processor with fcn and arg as parameters
KiIpiGenericCallTarget
    InterlockedDecrement(Count)
    while (Count > 0)
        KeYieldProcessor()
    fcn(arg)
    KiIpiSignalPacketDone()
    return
Interlocked Sequenced Lists
• Singly linked lists
• Header contains depth and a sequence no.
• Allows for lock-free pushes/pops
• Used primarily in memory/heap management
PushEntrySList(slh, s)
    do {
        SLIST_HEAD nslh, oslh
        oslh->seqdepth = slh->seqdepth
        oslh->next = slh->next
        nslh->seqdepth = oslh->seqdepth + 0x10001
        nslh->next = s
        s->next = oslh->next
    } until CmpExch64(slh, nslh, oslh) succeeds
    return
s = PopEntrySList(slh)
    do {
        SLIST_HEAD nslh, oslh, *s
        oslh->seqdepth = slh->seqdepth
        s = oslh->next = slh->next
        if (!s) return NULL
        nslh->seqdepth = oslh->seqdepth - 1    // depth--
        nslh->next = s->next                   // can fault!
    } until CmpExch64(slh, nslh, oslh) succeeds
    return s
PopEntrySList faults
s->next may cause a fault:
• top entry is allocated (popped) on another processor
• top entry is freed before s->next is referenced
Access fault code special-cases this:
• the faulting s->next reference is skipped
• the compare/exchange fails
• the pop is retried
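The push and pop loops can be sketched with a single 64-bit compare-and-exchange if the list links are 32-bit indices into a fixed pool instead of pointers. That substitution is an assumption made for this sketch so that the header (next index plus the 32-bit sequence/depth word) fits in 64 bits on any platform; it also sidesteps the fault case, since pool entries are never unmapped. `entry_t`, `pool`, `slist_header_t` and the `slist_*` functions are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define POOL_SIZE 16u
#define NIL       0u    /* index 0 is reserved to mean "no entry" */

typedef struct { uint32_t next; int value; } entry_t;
static entry_t pool[POOL_SIZE];

/* High 32 bits: index of first entry. Low 32 bits: seqdepth, with the
 * depth in bits 0-15 and the sequence number in bits 16-31. */
typedef struct { _Atomic uint64_t word; } slist_header_t;

static uint64_t slist_pack(uint32_t next, uint32_t seqdepth)
{
    return ((uint64_t)next << 32) | seqdepth;
}

static void slist_push(slist_header_t *slh, uint32_t s)
{
    assert(s != NIL && s < POOL_SIZE);
    uint64_t old = atomic_load(&slh->word);
    uint64_t desired;
    do {
        pool[s].next = (uint32_t)(old >> 32);
        /* + 0x10001 bumps both the depth and the sequence number */
        desired = slist_pack(s, (uint32_t)old + 0x10001u);
    } while (!atomic_compare_exchange_weak(&slh->word, &old, desired));
}

static uint32_t slist_pop(slist_header_t *slh)
{
    uint64_t old = atomic_load(&slh->word);
    uint64_t desired;
    uint32_t s;
    do {
        s = (uint32_t)(old >> 32);
        if (s == NIL)
            return NIL;
        /* depth--; the sequence number left by pushes defeats ABA */
        desired = slist_pack(pool[s].next, (uint32_t)old - 1u);
    } while (!atomic_compare_exchange_weak(&slh->word, &old, desired));
    return s;
}

static uint16_t slist_depth(slist_header_t *slh)
{
    return (uint16_t)atomic_load(&slh->word);
}
```

If another thread pops and re-pushes the same entry between the load and the compare-and-exchange, the sequence bits differ and the exchange fails, so the stale `next` value is never installed.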
DISPATCHER_HEADER
• Fundamental kernel synchronization mechanism
• Equivalent to a KEVENT
WAIT fields
KTHREAD WAIT fields
• LONG_PTR WaitStatus;
• PRKWAIT_BLOCK WaitBlockList;
• BOOLEAN Alertable, WaitNext;
• UCHAR WaitReason;
• LIST_ENTRY WaitListEntry; // Prcb->WaitListHead
• KWAIT_BLOCK WaitBlock[4]; // 0, 1, event/sem, timer
KWAIT_BLOCK – represents waiting thread
• LIST_ENTRY WaitListEntry; // Object->WaitListHead
• PKTHREAD Thread;
• PVOID Object;
• PKWAIT_BLOCK NextWaitBlock; // Thread->WaitBlockList
• USHORT WaitKey, WaitType;
WAIT notes
• KPRCB WaitList used by balance manager for swapping out stacks
• Several WaitBlocks available in thread itself to save allocation time
• Waitable objects all begin with standard dispatcher header
KeSetEvent(KEVENT Event)
    // while holding DispatcherDatabaseLock
    old = Event->SignalState
    Event->SignalState = 1
    if old == 0 && !Empty(Event->WaitList)
        if Event->Type == Notification
            wake up every thread in the WaitList
        else // Synchronization type
            wake up one thread in the WaitList
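The difference between the two event types can be seen in a toy, single-threaded model in which the wait list is reduced to a count of waiters. Everything here is hypothetical (`toy_event_t`, `toy_set_event`), and the step that clears the signal state after a synchronization event satisfies one waiter is an assumption added for the model, reflecting auto-reset behavior that the slide does not spell out.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { EVENT_NOTIFICATION, EVENT_SYNCHRONIZATION } event_type_t;

typedef struct {
    event_type_t type;
    int signal_state;   /* 0 = not signaled, 1 = signaled */
    int waiters;        /* stand-in for the wait list */
} toy_event_t;

/* Signal the event; returns how many waiters were released. */
static int toy_set_event(toy_event_t *ev)
{
    assert(ev != NULL);
    int old = ev->signal_state;
    int woken = 0;

    ev->signal_state = 1;
    if (old == 0 && ev->waiters > 0) {
        if (ev->type == EVENT_NOTIFICATION) {
            woken = ev->waiters;      /* wake every waiter */
            ev->waiters = 0;
        } else {
            woken = 1;                /* wake exactly one waiter */
            ev->waiters -= 1;
            ev->signal_state = 0;     /* assumed auto-reset */
        }
    }
    return woken;
}
```

A notification event therefore stays signaled until explicitly reset, while a synchronization event behaves like a single-permit handoff.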
Kernel Queue Object (KQUEUE)
• Mechanism for thread work queues
• Used by worker threads and I/O completion ports
KQUEUE Operations:
• KeInitializeQueue(pQ, MaxThreads)
• KeInsertQueue(pQ, pEntry)
• KeInsertHeadQueue(pQ, pEntry)
• KeRemoveQueue(pQ, WaitMode, Timeout)
• KeRundownQueue(pQ)
KQUEUE
Each _KTHREAD structure contains a field KQUEUE Queue for use with the kernel queue mechanism
KiInsertQueue(pQ, pEntry, bHead)
    // implements both KeInsert routines
    // called holding the DispatcherDatabaseLock
    oldState = pQ->Header.SignalState
    if pQ->Header points to a waiting thread
       && CurrentThreads < MaximumThreads
       && the current thread isn't waiting on this queue
        Call KiReadyThread(lastThreadQueued)
    else
        pQ->Header.SignalState++
        Insert pEntry in pQ->EntryListHead according to bHead
    return oldState
entry = KeRemoveQueue(pQ)
    // holding the DispatcherDatabaseLock
    if (oldQ = thread->Queue) != pQ
        if (oldQ)
            remove thread from oldQ
            try to unwait a thread from oldQ->Header's waitlist
        insert thread in pQ->ThreadListHead
    else
        pQ->CurrentCount--
    while (1)
        if the queue isn't empty and Current < Max
            remove Entry from head of EntryListHead
            start a waiting thread
            return Entry
        else
            queue current thread to pQ->Header
Guarded Mutexes
Guarded Mutexes
KeAcquireGuardedMutex()
    KeEnterGuardedRegionThread(Thread)
    if (InterlockedDecrement(&Mutex->Count) != 0)
        Mutex->Contention += 1
        KeWaitForSingleObject(&Mutex->Event)
    Mutex->Owner = Thread
KeReleaseGuardedMutex()
    Mutex->Owner = NULL
    if (InterlockedIncrement(&Mutex->Count) <= 0)
        KeSetEventBoostPriority(&Mutex->Event)
    KeLeaveGuardedRegionThread(Thread)
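The counting convention is the core of the fast path: Count is 1 when the mutex is free, 0 when held, and negative when held with waiters. The sketch below models only that arithmetic and is not a working mutex; it reports whether the caller would have to wait or wake someone instead of actually blocking. `toy_guarded_mutex_t` and the `gm_*` functions are hypothetical names, and the initial value of 1 is an assumption consistent with the comparisons in the pseudocode.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    atomic_long count;   /* 1 = free, 0 = held, < 0 = held with waiters */
    long contention;     /* statistics only, updated without interlock */
} toy_guarded_mutex_t;

static void gm_init(toy_guarded_mutex_t *m)
{
    assert(m != NULL);
    atomic_init(&m->count, 1);
    m->contention = 0;
}

/* Acquire side: returns true if the caller would have to wait on the
 * event. atomic_fetch_sub returns the old value, so subtract one more
 * to get the new value, as InterlockedDecrement would return. */
static bool gm_acquire_must_wait(toy_guarded_mutex_t *m)
{
    long new_count = atomic_fetch_sub(&m->count, 1) - 1;
    if (new_count != 0) {
        m->contention += 1;
        return true;
    }
    return false;
}

/* Release side: returns true if the releaser must set the event to
 * wake a waiter. */
static bool gm_release_must_wake(toy_guarded_mutex_t *m)
{
    long new_count = atomic_fetch_add(&m->count, 1) + 1;
    return new_count <= 0;
}
```

In the non-contended case both sides complete with one interlocked operation and never touch the event.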
KMUTANT
• Exposed to usermode via CreateMutex
• Differences from KEVENT: checks ownership and can be abandoned
KSEMAPHORE
KEVENTPAIRS
KiSetServerWaitClientEvent()
• sets the specified server event
• waits on specified client event
• wait is performed such that an optimal switch to the waiting thread occurs if possible
ERESOURCE
kernel reader/writer lock
User-mode Critical Sections
• Primary Win32 locking primitive
• Backed by kernel semaphore as needed, i.e. in the non-contended case no semaphore!
• Because the semaphore is allocated on demand, acquiring/releasing a critical section can fail
  – fixed in Windows Server 2003
  – under low memory, replaced with a keyed event
CRITICAL_SECTION
CRITICAL_SECTION_DEBUG
Keyed Events
• These are a way to solve the problem of EnterCriticalSection/LeaveCriticalSection raising exceptions
• Require no storage allocation so they can't fail
• Reuse existing LPC structures so don't require additions to the thread
• Don't prevent the kernel stack from being swapped like push locks etc. do
Keyed Event Internals
[Figure: Keyed Event; Thread; Wait Key]
• Keyed Event: only one of these in the entire system
• Wait key is the critical section address
Summary
• Pushlocks
• Fastref
• Rundown protection
• Spinlocks
• Queued spinlocks
• IPI
• SLISTs
• DISPATCHER_HEADER
• KQUEUEs
• KEVENTs
• Guarded mutexes
• Mutants
• Semaphores
• EventPairs
• ERESOURCEs
• Critical Sections
Discussion