Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation
David F. Bacon
T. J. Watson Research Center
Part 2: Trace (aka Mark)
• Initiation
  – Setup
  – turn double barrier on
• Root Scan
  – Active Finalizer scan
  – Class scan
  – Thread scan**
    • switch to single barrier, color to black
  – Debugger, JNI, Class Loader scan
• Trace
  – Trace*
  – Trace Terminate***
• Re-materialization 1
  – Weak/Soft/Phantom Reference List Transfer
  – Weak Reference clearing** (snapshot)
• Re-Trace 1
  – Trace Master
    • (Trace*)
    • (Trace Terminate***)
• Clearing
  – Monitor Table clearing
  – JNI Weak Global clearing
  – Debugger Reference clearing
  – JVMTI Table clearing
  – Phantom Reference clearing
• Re-Trace 2
  – Trace Master
    • (Trace*)
    • (Trace Terminate***)
  – Class Unloading
• Re-materialization 2
  – Finalizable Processing
• Flip
  – Move Available Lists to Full List* (contention)
  – turn write barrier off
  – Flush Per-thread Allocation Pages**
  – switch allocation color to white
  – switch to temp full list
• Sweeping
  – Sweep*
  – Switch to regular Full List**
  – Move Temp Full List to regular Full List* (contention)
• Completion
  – Finalizer Wakeup
  – Class Unloading Flush
  – Clearable Compaction**
  – Book-keeping
* Parallel   ** Callback   *** Single actor symmetric
Let’s Assume a Stack Snapshot
[Figure: object graph with labels T, U, W, X, Y, Z (fields a and b) and a stack holding p and r]
Yuasa Algorithm Review: 2(a): Copy Over-written Pointers
[Figure: object graph with labels T, U, W, X, Y, Z (fields a and b) and a stack holding p and r]
Yuasa Algorithm Review: 2(b): Trace
[Figure: object graph with labels T, U, W, X, Y, Z (fields a and b) and a stack holding p]
* Color is per-object mark bit
Yuasa Algorithm Review: 2(c): Allocate “Black”
[Figure: object graph with labels T, U, V, W, X, Y, Z (fields a and b) and a stack holding p and s]
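The three review steps can be sketched as a small single-threaded model. This is only an illustration under stated assumptions: the class and method names (`Obj`, `SnapshotCollector`, `shade`, `demo`) are hypothetical and do not come from the slides, and real collectors run these steps concurrently with the application.

```python
class Obj:
    """Heap object with two pointer fields (a, b) and a per-object mark bit."""
    def __init__(self, name, a=None, b=None):
        self.name, self.a, self.b = name, a, b
        self.marked = False


class SnapshotCollector:
    """Hypothetical model of a Yuasa-style snapshot-at-the-beginning collector."""
    def __init__(self):
        self.collecting = False
        self.gray = []  # marked objects whose fields have not been traced yet

    def shade(self, obj):
        # Mark an object once and queue it for tracing.
        if obj is not None and not obj.marked:
            obj.marked = True
            self.gray.append(obj)

    def start(self, stack):
        # Take the stack snapshot: every root is shaded.
        self.collecting = True
        for obj in stack:
            self.shade(obj)

    def write(self, obj, field, new):
        # 2(a): copy the over-written pointer before the store destroys it.
        if self.collecting:
            self.shade(getattr(obj, field))
        setattr(obj, field, new)

    def allocate(self, name):
        # 2(c): objects allocated during collection start out marked ("black").
        obj = Obj(name)
        obj.marked = self.collecting
        return obj

    def trace(self):
        # 2(b): trace from the shaded objects.
        while self.gray:
            obj = self.gray.pop()
            self.shade(obj.a)
            self.shade(obj.b)


def demo():
    z = Obj("Z")
    w = Obj("W", a=z)
    x = Obj("X", b=w)
    gc = SnapshotCollector()
    gc.start([x])
    gc.write(x, "b", None)  # the mutator unlinks W after the snapshot
    v = gc.allocate("V")
    gc.trace()
    return {o.name: o.marked for o in (x, w, z, v)}
```

In `demo()`, W and Z stay marked even though the mutator unlinked W before tracing reached it, because the barrier logged the over-written pointer.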
Non-monotonicity in Tracing
[Figure label: 1 GB]
Which Design Pattern is This?
[Figure labels: Shared Monotonic Work Pool (pool of work); Per-Thread State Update]
Trace is Non-Monotonic… and requires thread-local data
[Figure: GC Master Thread, GC Worker Threads, and Application Threads around a pool of non-monotonic work]
Basic Solution
• Check if there are more work packets
  – If some found, trace is not done yet
  – If none found, “probably done”
    • Pause all threads
    • Re-scan for non-empty buffers
    • Resume all threads
  – If none, done
  – Otherwise, try again later
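A minimal sketch of the check above, assuming that "pausing" a thread can be modeled by taking a per-thread lock that the thread holds while it appends to its local buffer. The names `Mutator`, `log`, and `try_terminate` are hypothetical and chosen only for this illustration.

```python
import threading
from collections import deque


class Mutator:
    """Application thread state: a thread-local buffer guarded by a lock."""
    def __init__(self):
        self.lock = threading.Lock()  # taken by the collector to "pause" this thread
        self.buffer = []

    def log(self, obj):
        with self.lock:
            self.buffer.append(obj)


def try_terminate(pool, mutators):
    """Return True if tracing is done, False if work remains."""
    if pool:
        return False  # work packets found: trace is not done yet
    # None found: "probably done". Pause all threads and re-scan.
    for m in mutators:
        m.lock.acquire()
    try:
        found = False
        for m in mutators:
            if m.buffer:
                pool.append(m.buffer)  # hand the non-empty buffer to the tracer
                m.buffer = []
                found = True
    finally:
        for m in mutators:
            m.lock.release()  # resume all threads
    return not found  # if none, done; otherwise the caller tries again later
```

The cost of this approach is the global pause itself.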
Ragged Barriers: How to Stop without Stopping
[Figure: threads A, B, C shown across Epoch 1 to Epoch 4; legend: Local Epoch, Global Min, Global Max]
“Trace” Phase
The Thread’s Full Monty
[Figure: global state (BarrierOn = true; PhaseInfo: phase = trace, workers = 3; Epoch: agreed = 35, newest = 43) and the thread list. Each thread (Thread 1, Thread 2) has a pthread_t, a writebuf, per-size-class (16, 64, 256) sweep and next pointers, and the fields epoch, color, dblb, inVM]
Work Packet Data Structures
[Figure: globals WBufEpoch and WBufCount; buffer lists totrace, wbuf-trace, wbuf-fill, and free; condition shown: wbuf.epoch < Epoch.agreed; epoch labels 39, 37, 39]
    trace() {
        thread->epoch = Epoch.newest;
        bool canTerminate = true;

        if (WBufCount > 0) {
            getWriteBuffers();
            canTerminate = false;
        }

        while (b = wbuf-trace.pop()) {
            if (!moreTime()) return;
            int traceCount = traceBufferContents(b);
            canTerminate &= (traceCount == 0);
        }

        while (b = totrace.pop()) {
            if (!moreTime()) return;
            int traceCount = traceBufferContents(b);
            canTerminate &= (traceCount == 0);
        }

        if (canTerminate)
            traceTerminate();
    }
Getting Write Buffer Roots
    getWriteBuffers() {
        thread->epoch = fetchAndAdd(Epoch.newest, 1);
        WBufEpoch = thread->epoch;   // mutators will dump wbufs
        LOCK(wbuf-fill);
        LOCK(wbuf-trace);
        for each (wbuf in wbuf-fill) {
            if (wbuf.epoch < Epoch.agreed) {
                remove wbuf from wbuf-fill;
                add wbuf to wbuf-trace;
            }
        }
        UNLOCK(wbuf-trace);
        UNLOCK(wbuf-fill);
    }
Write Barrier
    writeBarrier(Object object, Field field, Object new) {
        if (BarrierOn) {
            Object old = object[field];
            if (old != null && !old.marked)
                outOfLineBarrier(old);
            if (thread->dblb)              // double barrier
                outOfLineBarrier(new);
        }
    }
Write Barrier Slow Path
    outOfLineBarrier(Object obj) {
        if (obj == null || obj.marked)
            return;
        obj.marked = true;
        bool epochOK  = thread->wbuf->epoch == WBufEpoch;
        bool haveRoom = thread->wbuf->data < thread->wbuf->end;
        if (!(epochOK && haveRoom))
            thread->wbuf = flushWBufAndAllocNew(thread->wbuf);   // Updates WBufEpoch, Epoch.newest
        *thread->wbuf->data++ = obj;
    }
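The fast path and slow path together can be modeled in a few lines. This is a simplified sketch, not the collector's implementation: objects have a single field `f`, the classes `Obj`, `WBuf`, `Collector`, and `Mutator` are hypothetical, the flush simply appends the buffer to a `wbuf_fill` list without updating any epochs, and the store after the barrier is added here because the slide shows only the barrier.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.marked = False
        self.f = None  # single pointer field, enough for this sketch


class WBuf:
    """Write buffer stamped with the epoch in which it was allocated."""
    def __init__(self, epoch, capacity):
        self.epoch = epoch
        self.capacity = capacity
        self.data = []


class Collector:
    def __init__(self):
        self.barrier_on = True
        self.wbuf_epoch = 0
        self.wbuf_fill = []  # flushed buffers waiting for the tracer


class Mutator:
    def __init__(self, gc, capacity=2, dblb=False):
        self.gc = gc
        self.capacity = capacity
        self.dblb = dblb  # double barrier flag
        self.wbuf = WBuf(gc.wbuf_epoch, capacity)

    def out_of_line_barrier(self, obj):
        if obj is None or obj.marked:
            return
        obj.marked = True
        epoch_ok = self.wbuf.epoch == self.gc.wbuf_epoch
        have_room = len(self.wbuf.data) < self.wbuf.capacity
        if not (epoch_ok and have_room):
            # Assumption: flushing just publishes the buffer and starts a new one.
            self.gc.wbuf_fill.append(self.wbuf)
            self.wbuf = WBuf(self.gc.wbuf_epoch, self.capacity)
        self.wbuf.data.append(obj)

    def write(self, obj, new):
        if self.gc.barrier_on:
            old = obj.f
            if old is not None and not old.marked:
                self.out_of_line_barrier(old)
            if self.dblb:
                self.out_of_line_barrier(new)
        obj.f = new  # the actual store (not shown on the slide)
```

A stale epoch forces a flush even when the buffer still has room, which is how the collector gets partially filled buffers out of mutators without stopping them.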
“Trace Terminate” Phase
Trace Termination
[Figure: WBufEpoch = 39, WBufCount = 0; buffer lists totrace, wbuf-trace, wbuf-fill, and free]
Asynchronous Agreement
[Figure: global state: BarrierOn = true; PhaseInfo: phase = trace terminate, workers = 1; Epoch: agreed = 37, newest = 42]
    desiredEpoch = Epoch.newest;
    …
    WAIT FOR Epoch.agreed == desiredEpoch
    if (WBufCount == 0)
        DONE
    else
        RESUME TRACING
Ragged Barrier
    bool raggedBarrier(desiredEpoch, urgent) {
        if (Epoch.agreed >= desiredEpoch)
            return true;
        LOCK(threadlist);
        int latest = MAXINT;
        for each (Thread thread in threadlist)
            latest = min(latest, thread.epoch);
        Epoch.agreed = latest;
        UNLOCK(threadlist);
        if (Epoch.agreed >= desiredEpoch)
            return true;
        else {
            doCallbacks(RAGGED_BARRIER, true, urgent);
            return false;
        }
    }
* Non-locking implementation?
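A runnable model of the agreement check, for illustration only. The names `Epochs`, `WorkerThread`, and `checkpoint` are hypothetical; the callback mechanism (`doCallbacks`) is replaced by threads calling `checkpoint` themselves, and the sketch assumes a non-empty thread list.

```python
import threading


class Epochs:
    def __init__(self):
        self.newest = 0  # most recent epoch handed out
        self.agreed = 0  # minimum epoch that every thread has reached


class WorkerThread:
    def __init__(self):
        self.epoch = 0

    def checkpoint(self, epochs):
        # Stand-in for the callback: the thread adopts the newest epoch
        # at a point that is convenient for it.
        self.epoch = epochs.newest


def ragged_barrier(epochs, threads, lock, desired_epoch):
    """Return True once every thread has reached desired_epoch."""
    if epochs.agreed >= desired_epoch:
        return True
    with lock:
        # The agreed epoch is the minimum over all local epochs.
        epochs.agreed = min(t.epoch for t in threads)
    return epochs.agreed >= desired_epoch
```

No thread is ever stopped: each one passes the barrier at its own pace, and the caller simply polls until the global minimum catches up.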
Part 1: Scan Roots
Fuzzy Snapshot
• Finally, we assume no magic
• Initiate Collection
“Initiate Collection” Phase
Initiate: Color Black, Double Barrier
[Figure: global state (BarrierOn = true; PhaseInfo: phase = initiate, workers = 3; Epoch: agreed = 37, newest = 42) and the thread list. Each thread (Thread 1, Thread 2) has a pthread_t, a writebuf, per-size-class (16, 64, 256) sweep and next pointers, and the fields epoch, color, dblb, inVM]
What is a Double Barrier?
Store both Old and New Pointers
[Figure: object graph with labels T, U, W, X, Y, Z (fields a and b) and a stack holding p and r]
Why Double Barrier?
    T2: m.b = n   (writes X.b = W)
    T3: j.b = k   (writes X.b = V)
    T1: q = p.b   (reads X.b: V, W, or Z??)
“Snapshot” = { V, W, X, Z }
[Figure: object X (fields a, b) and objects W, V, Z; stacks of T1 (p, q), T2 (m, n), and T3 (j, k)]
Yuasa (Single) Barrier with 2 Writers
    T2: m.b = n   (X.b = W)
    T3: j.b = k   (X.b = V)
[Figure: object X (fields a, b) and objects W, V, Z; stacks of T1 (p, q), T2 (m, n), and T3 (j, k)]
Yuasa Barrier Lost Update
    T2: m.b = n   (X.b = W)
    T3: j.b = k   (X.b = V)
[Figure: object X (fields a, b) and objects W, V, Z; stacks of T1 (p, q), T2 (m, n), and T3 (j, k)]
    T1: Scan Stack
    T2: m.b = n   (X.b = W)
    T3: j.b = k   (X.b = V)
    T1: q = p.b   (q <- W)
    T2: n = null
    T2: Scan Stack
Hosed!
[Figure: object X (fields a, b) and objects W, V, Z; stacks of T1 (p, q), T2 (m, n), and T3 (j, k)]
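The lost update can be replayed deterministically. The sketch below is a single-threaded simulation of one possible interleaving, chosen as an assumption: both writers read the old value Z before either store lands, and T1's load falls between the two stores so that it sees W. The function `run` and its helpers are hypothetical names for this illustration.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.b = None
        self.marked = False


def run(double_barrier):
    """Replay the interleaving; return names of live objects left unmarked."""
    X, W, V, Z = (Obj(n) for n in "XWVZ")
    X.b = Z
    t1 = {"p": X, "q": None}
    t2 = {"m": X, "n": W}
    t3 = {"j": X, "k": V}
    log = []  # objects the collector still has to trace

    def shade(o):
        if o is not None and not o.marked:
            o.marked = True
            log.append(o)

    def scan_stack(stack):
        for o in stack.values():
            shade(o)

    scan_stack(t1)            # T1: Scan Stack (q is still empty)
    old2 = t2["m"].b          # T2's barrier reads the old value Z
    old3 = t3["j"].b          # T3's barrier also reads Z: the race
    shade(old2)
    shade(old3)
    if double_barrier:
        shade(t2["n"])        # double barrier also records the new pointer W
    t2["m"].b = t2["n"]       # T2: X.b = W
    t1["q"] = t1["p"].b       # T1: q <- W, after T1's stack was scanned
    if double_barrier:
        shade(t3["k"])
    t3["j"].b = t3["k"]       # T3: X.b = V, W is over-written but never logged
    t2["n"] = None            # T2: n = null
    scan_stack(t2)            # T2: Scan Stack
    scan_stack(t3)
    while log:                # trace
        shade(log.pop().b)

    # Compute what is really reachable from the three stacks.
    live = []
    todo = [o for s in (t1, t2, t3) for o in s.values() if o is not None]
    while todo:
        o = todo.pop()
        if o not in live:
            live.append(o)
            if o.b is not None:
                todo.append(o.b)
    return sorted(o.name for o in live if not o.marked)
```

With the single barrier, W ends up reachable only from T1's already-scanned stack and is never marked; with the double barrier, T2's store records W as the new pointer.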
“Thread Stack Scan” Phase
Scan Stacks (double barrier off)
[Figure: global state (BarrierOn = true; PhaseInfo: phase = initiate, workers = 3; Epoch: agreed = 37, newest = 42) and the thread list, with Thread 1 annotated "Scan Stack 1" and Thread 2 annotated "Scan Stack 2"]
All Done!
Boosting: Ensuring Progress
[Figure: GC Master Thread, GC Worker Threads, Application Threads (may do GC work); annotations: Boost to master priority, Revert(?)]
Part 4: Defragmentation
The phase outline gains one phase, placed between Sweeping and Completion:
• Sweeping
• Defragmentation
• Completion
[Figure: size classes 16, 64, 256 with allocated pages; free lists labeled free 16, free 64, free 256, and free; sweep]
Two-way Communication
[Figure: GC Master Thread, GC Worker Threads, Application Threads (may do GC work); messages: pointers have changed, objects have moved]
Defragmentation
[Figure: objects B, C, D, E, T, U, X, Z (fields a and b), with T copied to T’]
Staccato Algorithm
[Figure: object state machine with states FREE, NORMAL, COPYING, MOVED; edge labels: create, allocate, defragment, commit, reap, access (abort); operations shown: load, store, cas, nop; object T and its copy T’]
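The state machine can be sketched with an emulated compare-and-swap. This is a hedged illustration, not the algorithm's actual implementation: the `Header`, `Obj`, `resolve`, and `defragment` names are hypothetical, CAS is emulated with a lock, the sketch assumes any mutator access aborts an in-progress copy (as the "access (abort)" edge suggests), and the FREE state and the reap step are omitted.

```python
import threading

NORMAL, COPYING, MOVED = "NORMAL", "COPYING", "MOVED"


class Header:
    """State plus forwarding pointer, updated together by compare-and-swap."""
    def __init__(self):
        self._lock = threading.Lock()  # emulates the atomicity of a hardware CAS
        self.value = (NORMAL, None)

    def cas(self, expected, new):
        with self._lock:
            if self.value == expected:
                self.value = new
                return True
            return False


class Obj:
    def __init__(self, a=None, b=None):
        self.header = Header()
        self.fields = {"a": a, "b": b}


def resolve(obj):
    """Mutator access barrier: return the copy that should be accessed."""
    state, forward = obj.header.value
    if state == COPYING:
        # Try to abort the copy. If this CAS fails, the collector has
        # already committed, so the re-read below sees MOVED.
        obj.header.cas((COPYING, None), (NORMAL, None))
        state, forward = obj.header.value
    return forward if state == MOVED else obj


def defragment(obj, interfere=None):
    """Collector side: try to move obj; return the replica or None if aborted."""
    if not obj.header.cas((NORMAL, None), (COPYING, None)):
        return None
    replica = Obj(**obj.fields)  # copy the fields while in COPYING state
    if interfere is not None:
        interfere()              # test hook: a mutator runs during the copy
    if obj.header.cas((COPYING, None), (MOVED, replica)):
        return replica           # commit succeeded
    return None                  # a mutator aborted the copy; replica is discarded
```

The commit CAS only succeeds if no mutator touched the object during the copy, so a possibly stale replica is never published.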
Scheduling
Guaranteeing Real Time
• Guaranteeing usability without realtime:
  – Must know maximum live memory
    • If fragmentation & metadata overhead bounded
• We also require:
  – Maximum allocation rate (MB/s)
• How does the user figure this out???
  – Very simple programming style
  – Empirical measurement
  – (Research) Static analysis
Conclusions
• Systems are made of concurrent components
• Basic building blocks:
  – Locks
  – Try-locks
  – Compare-and-Swap
  – Non-locking stacks, lists, …
  – Monotonic phases
  – Logical clocks and asynchronous agreement
• Encapsulate so others won’t suffer!
http://www.research.ibm.com/metronome
https://sourceforge.net/projects/tuningforkvp
GC Phases
[Figure: timeline with APP intervals and the GC phases SNAPSHOT, TRACE, FLIP, SWEEP, TERMINATE, with a legend]