An On-the-Fly Reference Counting Garbage Collector for Java
Erez Petrank, Technion – Israel Institute of Technology
Joint work with Yossi Levanoni, Microsoft Corporation
ACM Conference on Object-Oriented Programming, Systems, Languages & Applications (OOPSLA)
Tampa, Florida, October 18, 2001
On-the-Fly Reference Counting, Levanoni & Petrank
Garbage Collection Today
• Two classic approaches:
  – Tracing [McCarthy 1960]: trace the reachable objects and reclaim the objects that were not traced.
  – Reference counting [Collins 1960]: keep a reference count for each object and reclaim objects whose count is 0.
• Today's advanced environments:
  – multiprocessors
  – huge memories
Motivation for RC
• Reference counting work is proportional to the work done on object creations and pointer modifications.
  – Can tracing cope with tomorrow's huge heaps?
• Reference counting has good locality.
• Tracing rules JVMs; is that justified?
• The challenge:
  – RC write barriers seem too expensive.
  – RC seems impossible to "parallelize".
This work
• An improved RC (suitable for Java):
  – Reduced overhead on the write barrier.
  – Concurrent with low overhead: on-the-fly, no synchronization operation in the write barrier, multiprocessor.
  – Thus: low latency and high performance.
• Implementation:
  – JVM: Sun's Java Virtual Machine 1.2.2
  – Platform: 4-way IBM Netfinity 8500R server with 550 MHz Intel Pentium III Xeon processors and 2 GB of memory.
Agenda
✓ Introduction
✓ Motivation
➢ The Algorithm
• Related issues
• Implementation and Measurements
• Conclusions
Terminology
Basic Reference Counting
• Each object has an RC field; a new object o gets o.RC := 1.
• When a pointer p that points to o1 is modified to point to o2, we do: o1.RC--, o2.RC++.
• If o1.RC == 0 afterwards:
  – Delete o1.
  – Decrement o.RC for every child o of o1.
  – Recursively delete objects whose RC is decremented to 0.
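A minimal single-threaded sketch of the basic scheme, assuming a toy heap in which each object keeps its outgoing pointers in a list. The names `Obj`, `alloc`, `update`, `decrement` and `freed` are hypothetical and are not taken from the authors' implementation.

```python
class Obj:
    """Toy heap object: a reference count plus a list of outgoing pointers."""

    def __init__(self, name):
        self.name = name
        self.rc = 1          # a new object gets RC := 1 for its first reference
        self.children = []   # outgoing pointers (the object's "sons")


freed = []  # names of reclaimed objects, in the order they were deleted


def alloc(name, holder):
    """Allocate an object and store its first reference in holder."""
    obj = Obj(name)
    holder.children.append(obj)
    return obj


def decrement(obj):
    """o.RC--; on reaching 0, delete obj and decrement its children recursively."""
    obj.rc -= 1
    if obj.rc == 0:
        freed.append(obj.name)
        for child in obj.children:
            decrement(child)
        obj.children = []


def update(holder, index, new):
    """Redirect holder.children[index] from o1 to o2: o1.RC--, o2.RC++."""
    old = holder.children[index]
    new.rc += 1              # increment first, so storing the same object is safe
    holder.children[index] = new
    decrement(old)


root = Obj("root")
a = alloc("a", root)         # root -> a
b = alloc("b", a)            # a -> b
c = alloc("c", root)         # root -> c
update(root, 0, c)           # root's first pointer now refers to c; a and then b die
```

Every pointer store pays for two counter updates, and a deletion can cascade through an arbitrarily large structure.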
Deferred Reference Counting
• Problem: updating reference counts on every modification of a program variable (local) costs too much.
• Solution [Deutsch & Bobrow]:
  – Don't update RC for locals.
  – "Once in a while": collect all objects with o.RC = 0 that are not referenced from local roots.
• Deferred RC reduces the overhead by 80%. It is used in most modern RC systems.
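The sketch below illustrates deferral, assuming a zero-count table (ZCT) that records objects whose heap count has dropped to 0, the bookkeeping classically associated with Deutsch & Bobrow. Here `rc` counts heap references only; `heap_store`, `collect` and `zct` are hypothetical names chosen for illustration.

```python
class Obj:
    """Toy object whose rc counts heap references only; locals are never counted."""

    def __init__(self, name):
        self.name = name
        self.rc = 0
        self.fields = {}


zct = set()  # zero-count table: objects whose heap RC is 0


def alloc(name):
    obj = Obj(name)
    zct.add(obj)             # no heap reference yet, so it starts in the ZCT
    return obj


def heap_store(holder, field, new):
    """Heap pointer stores still adjust counts; stores to locals cost nothing."""
    old = holder.fields.get(field)
    new.rc += 1
    zct.discard(new)
    holder.fields[field] = new
    if old is not None:
        old.rc -= 1
        if old.rc == 0:
            zct.add(old)


def collect(local_roots):
    """Run once in a while: reclaim ZCT objects that no local root references."""
    roots = set(local_roots)
    freed = []
    pending = [o for o in zct if o not in roots]
    while pending:
        obj = pending.pop()
        if obj in roots or obj.rc != 0 or obj not in zct:
            continue
        zct.discard(obj)
        freed.append(obj.name)
        for child in obj.fields.values():
            child.rc -= 1
            if child.rc == 0:
                zct.add(child)   # a child still held by a local stays in the ZCT
                pending.append(child)
    return freed


a, b, c = alloc("a"), alloc("b"), alloc("c")
heap_store(a, "next", b)     # a -> b
heap_store(b, "next", c)     # b -> c
# Only c is still held in a local variable; the program has dropped a.
freed = collect(local_roots=[c])
```

Object c survives with a heap count of 0 because a local still refers to it; it stays in the ZCT and is reconsidered at the next collection.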
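Multithreaded RC?
• Problem:
  – Parallel updates confuse counts. Initially A.next points to B, and two threads update it concurrently:
    Thread 1: read A.next (B); A.next := C; B.RC--; C.RC++
    Thread 2: read A.next (B); A.next := D; B.RC--; D.RC++
    If both threads read A.next before either stores, B.RC is decremented twice although only one reference to B was removed, and whichever of C and D is overwritten keeps an increment it should not have.
  – (And more: the reference count updates themselves race when executed in parallel.)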
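The bad interleaving can be replayed deterministically. This sketch assumes a toy object with a single `next` field whose `rc` counts heap references only (so C and D start at 0), and it executes the two threads' steps by hand in the problematic order instead of using real threads.

```python
class Obj:
    def __init__(self, name, rc=0):
        self.name = name
        self.rc = rc         # heap references only
        self.next = None


A, B, C, D = Obj("A"), Obj("B", 1), Obj("C"), Obj("D")
A.next = B                   # initially A.next -> B, so B.rc == 1

old1 = A.next                # Thread 1: read A.next (sees B)
old2 = A.next                # Thread 2: read A.next (also sees B)
A.next = C                   # Thread 1: A.next := C
old1.rc -= 1                 # Thread 1: B.RC--
C.rc += 1                    # Thread 1: C.RC++
A.next = D                   # Thread 2: A.next := D
old2.rc -= 1                 # Thread 2: B.RC-- (B is decremented a second time)
D.rc += 1                    # Thread 2: D.RC++

# B.rc is now -1 although exactly one reference to B was removed, and
# C.rc is 1 although nothing in the heap points to C any more.
```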
Multithreaded RC
• Problem:
  – Parallel updates confuse counts.
  – Reference count updates race when performed in parallel.
• [DeTreville]:
  – Lock the heap for each pointer modification.
  – Each thread records its updates in a buffer.
  – Once in a while (snapshot-like):
    • a GC thread processes all buffers to update the reference counts,
    • and reclaims all objects with rc 0 that are not referenced from locals.
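A minimal sketch of the lock-and-log scheme attributed to DeTreville above, assuming one global heap lock (`heap_lock`, hypothetical) and a per-thread list as the update buffer. It covers the logging and the collector's replay of the buffers, not the reclamation step.

```python
import threading


class Obj:
    def __init__(self, name):
        self.name = name
        self.rc = 0          # heap references only
        self.next = None


heap_lock = threading.Lock()   # taken on every pointer modification


def locked_store(buffer, holder, new):
    """Mutator: store holder.next = new and log (old, new), all under the heap lock."""
    with heap_lock:
        old = holder.next
        holder.next = new
        buffer.append((old, new))


def apply_buffers(buffers):
    """Collector, once in a while: replay every logged update into the counts."""
    for buffer in buffers:
        for old, new in buffer:
            if old is not None:
                old.rc -= 1
            if new is not None:
                new.rc += 1
        buffer.clear()


A, B, C, D = (Obj(n) for n in "ABCD")
A.next, B.rc = B, 1
buffers = [[], []]             # one buffer per mutator thread
t1 = threading.Thread(target=locked_store, args=(buffers[0], A, C))
t2 = threading.Thread(target=locked_store, args=(buffers[1], A, D))
t1.start()
t2.start()
t1.join()
t2.join()
apply_buffers(buffers)
```

Whichever thread stores last, B loses exactly one count and only the object A.next finally points to keeps a count, but every pointer store pays for a lock acquisition.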
To Summarize…
• The overhead on the write barrier is considered high,
  – even with the deferred RC of Deutsch & Bobrow.
• Using reference counting concurrently with program threads seems to incur a high synchronization cost:
  – a lock or "compare & swap" for each pointer update.
Improving RC
• Consider a pointer p that takes the following values between GCs: O0, O1, O2, …, On.
• All RC algorithms perform 2n operations:
  O0.RC--; O1.RC++; O1.RC--; O2.RC++; O2.RC--; …; On.RC++;
• But only two operations are needed: O0.RC-- and On.RC++.
(Figure: pointer p successively pointing to O0, O1, O2, O3, O4, …, On.)
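A small worked count, assuming objects are just labels: the per-store scheme issues 2n operations for n stores, while the coalesced pair has the same net effect on every count. The helper names are hypothetical.

```python
from collections import Counter


def per_store_ops(values):
    """One decrement and one increment per store: 2n operations for n stores."""
    ops = []
    for old, new in zip(values, values[1:]):
        ops.append((old, -1))
        ops.append((new, +1))
    return ops


def coalesced_ops(values):
    """Only the value before the first store (O0) and the final value (On) matter."""
    return [(values[0], -1), (values[-1], +1)]


def net_effect(ops):
    """Sum the RC deltas per object, dropping objects whose changes cancel out."""
    total = Counter()
    for obj, delta in ops:
        total[obj] += delta
    return {obj: d for obj, d in total.items() if d != 0}


values = ["O0", "O1", "O2", "O3", "O4"]   # p takes n = 4 new values between GCs
naive = per_store_ops(values)
short = coalesced_ops(values)
```

Here `naive` holds 8 operations and `short` holds 2, yet both change only O0 (by -1) and O4 (by +1); the increments and decrements of O1, O2 and O3 cancel.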
Improving RC cont'd
• Don't record all pointer modifications. Record only the first modification of each pointer between GCs (which reveals O0).
• During the collection, for each recorded pointer p:
  – find O0 by checking the record,
  – find On by reading the heap during the collection.
• Apply only two operations for each such pointer: O0.RC-- and On.RC++.
• This reduces the number of logging operations and counter updates by a factor of 100 to 1000 for normal benchmarks!
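A single-threaded sketch of this logging policy, assuming slots are named dictionary keys and the dirty flag is set membership. The log keeps only the first old value per slot, and the collector reads each slot's final value from the heap. `store`, `collect_counts`, `dirty`, `log` and `rc_delta` are hypothetical names.

```python
heap = {"p": "O0"}   # slot name -> object label currently stored there
dirty = set()        # slots already logged since the last collection
log = []             # (slot, old value) pairs, one per first modification
rc_delta = {}        # object label -> RC change applied by the collector


def store(slot, new):
    """Write barrier: log only the first modification of slot between GCs."""
    if slot not in dirty:
        log.append((slot, heap[slot]))   # remember O0
        dirty.add(slot)
    heap[slot] = new


def collect_counts():
    """Collector: O0.RC-- from the log, On.RC++ from the current heap value."""
    for slot, old in log:
        rc_delta[old] = rc_delta.get(old, 0) - 1
        new = heap[slot]
        rc_delta[new] = rc_delta.get(new, 0) + 1
    log.clear()
    dirty.clear()


for value in ["O1", "O2", "O3", "O4"]:
    store("p", value)            # four stores, but only one log entry
entries_logged = len(log)
collect_counts()
```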
Improving Synchronization Overhead
• Simple solutions incur unacceptable overhead:
  – DeTreville uses a lock for every pointer modification.
  – Simple alternatives require 3 compare-and-swap operations.
• Our second contribution:
  – A carefully designed write barrier (and an observation) allows eliminating all synchronization operations from the write barrier.
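The write barrier
    Update(Object **slot, Object *new) {
        Object *old = *slot;
        if (!IsDirty(slot)) {
            log(slot, old);
            SetDirty(slot);
        }
        *slot = new;
    }
• Observation: if two threads (1) invoke the write barrier in parallel and (2) both log an old value, then both record the same old value.
  – Reason: if both threads log, each found the slot clean before the other's SetDirty, which precedes the other's store; hence each read *slot before either store, and both read the same value (assuming memory operations are not reordered).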
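The observation can be checked by replaying interleavings of the barrier. In this sketch (an illustration, not the authors' code) each barrier is a Python generator that yields wherever another thread could run; `barrier`, `run` and the slot name `"A.next"` are hypothetical, and sequentially consistent memory is assumed.

```python
def barrier(heap, dirty, log, slot, new):
    """Update(slot, new) from the slide; each yield is a possible thread switch."""
    old = heap[slot]                 # Object *old = *slot
    yield
    if slot not in dirty:            # if (!IsDirty(slot))
        yield
        log.append((slot, old))      # log(slot, old)
        yield
        dirty.add(slot)              # SetDirty(slot)
        yield
    heap[slot] = new                 # *slot = new


def run(schedule, new_values):
    """Run one barrier per thread, stepping threads in the order given by schedule."""
    heap, dirty, log = {"A.next": "B"}, set(), []
    threads = [barrier(heap, dirty, log, "A.next", v) for v in new_values]
    for tid in schedule:
        next(threads[tid], None)     # advance thread tid by one step
    return heap, log


# Fully interleaved: both threads pass the dirty check before either sets it.
heap1, log1 = run([0, 1] * 5, ["C", "D"])
# Sequential: thread 1 starts only after thread 0 has finished.
heap2, log2 = run([0] * 5 + [1] * 5, ["C", "D"])
```

In both runs every logged old value is B, the value A.next held before either store, so the log stays consistent without a lock or compare-and-swap; the duplicate entry in the interleaved run carries the same value.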
Intermediate Algorithm: Snapshot-Oriented, Concurrent
• Use the write barrier with the program threads.
• To collect:
  – Stop all threads.
  – Scan the roots (locals).
  – Get the buffers with the modified slots.
  – Clear all dirty bits.
  – Resume the threads.
  – For each modified slot:
    • decrement rc for the old value (written in the buffer),
    • increment rc for the current value ("read heap").
  – Reclaim non-local objects with rc 0.
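A single-threaded sketch of one collection cycle, assuming a toy heap of named slots with heap-only counts and one update buffer per thread. The class `Heap` and its methods are hypothetical, and the sketch does not model the mutators actually running after they are resumed.

```python
class Heap:
    """Toy heap: named pointer slots, heap-only RCs, per-thread update buffers."""

    def __init__(self):
        self.slots = {}      # slot name -> object name or None
        self.rc = {}         # object name -> RC from heap slots (locals not counted)
        self.fields = {}     # object name -> names of the slots inside that object
        self.dirty = set()   # slots logged since the last collection
        self.buffers = {}    # thread id -> list of (slot, old value)

    def alloc(self, name, field_slots):
        self.rc[name] = 0
        self.fields[name] = list(field_slots)
        for slot in field_slots:
            self.slots[slot] = None

    def update(self, tid, slot, new):
        """The write barrier, executed by mutator thread tid."""
        old = self.slots[slot]
        if slot not in self.dirty:
            self.buffers.setdefault(tid, []).append((slot, old))
            self.dirty.add(slot)
        self.slots[slot] = new

    def collect(self, thread_locals):
        # Threads stopped: scan roots, take the buffers, clear the dirty bits.
        roots = {obj for locs in thread_locals.values() for obj in locs}
        logged = [entry for buf in self.buffers.values() for entry in buf]
        self.buffers = {}
        self.dirty.clear()
        # Threads would resume here (not modelled: this sketch is single-threaded).
        for slot, old in logged:
            if old is not None:
                self.rc[old] -= 1            # old value, taken from the buffer
            new = self.slots[slot]
            if new is not None:
                self.rc[new] += 1            # current value, read from the heap
        freed = []
        pending = [o for o, c in self.rc.items() if c == 0 and o not in roots]
        while pending:
            obj = pending.pop()
            if obj not in self.rc:           # already reclaimed
                continue
            del self.rc[obj]
            freed.append(obj)
            for slot in self.fields.pop(obj):
                child = self.slots.pop(slot)
                if child is not None:
                    self.rc[child] -= 1
                    if self.rc[child] == 0 and child not in roots:
                        pending.append(child)
        return freed


h = Heap()
h.alloc("A", ["A.f"])
h.alloc("B", ["B.f"])
h.alloc("C", [])
h.update(1, "A.f", "B")    # A.f: None -> B (first modification, logged)
h.update(1, "B.f", "C")    # B.f: None -> C (logged)
h.update(2, "A.f", "C")    # A.f: B -> C (slot already dirty, not logged)
freed = h.collect({1: ["A"], 2: []})   # thread 1 still holds A in a local
```

B is reclaimed without ever having been counted, because its increment and decrement on A.f were coalesced away; A survives as a local root even though its heap count is 0.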
The Sliding View Algorithm: On-the-Fly
• Do all collection work while the threads run:
  – Read the threads' buffers (one thread at a time),
  – Clear all dirty bits,
  – Update the reference counts,
  – Read the roots of each thread, one at a time,
  – Reclaim (recursively) objects with rc 0.
• Note: the rc's are not correct for any specific point in time; yet, with care, most dead objects can be reclaimed!
• Borrows ideas from [Lamport et al.]
Cycles Collection
• Our solution: use a tracing algorithm infrequently.
• Currently this is the most efficient solution; cycle collectors have a high cost.
• We propose a new on-the-fly mark & sweep algorithm that works best with the same sliding view. It can also be used on its own.
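Why a backup tracer is needed shows up already on a two-object cycle. This sketch assumes a plain stop-the-world mark & sweep over a toy object graph; it stands in for, and does not implement, the on-the-fly sliding view tracer proposed above. All names are hypothetical.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.rc = 0            # heap references only
        self.refs = []


def link(src, dst):
    """Add a heap pointer src -> dst and count it."""
    src.refs.append(dst)
    dst.rc += 1


def mark_sweep(heap, roots):
    """Mark everything reachable from roots, then return the unmarked objects."""
    marked, stack = set(), list(roots)
    while stack:
        obj = stack.pop()
        if obj.name not in marked:
            marked.add(obj.name)
            stack.extend(obj.refs)
    return [o.name for o in heap if o.name not in marked]


root, x, y = Obj("root"), Obj("x"), Obj("y")
link(root, x)
link(x, y)
link(y, x)                     # x <-> y form a cycle
root.refs.remove(x)            # the program drops its only external pointer to x
x.rc -= 1
garbage = mark_sweep([root, x, y], [root])
```

After the external pointer is dropped, x and y still hold each other's counts at 1, so reference counting alone never reclaims them; the tracing pass finds both unreachable.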
Implementation for Java
• Based on Sun's JDK 1.2.2 for Windows NT.
• Main features:
  – 2-bit RC field per object (à la [Wise et al.])
  – A supplemental sliding view tracing algorithm
  – A custom allocator for on-the-fly RC:
    • multi-level, fine-grained locking,
    • supports sporadic reclamation of objects,
    • supports sweeping the heap.
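A sketch of how a 2-bit count can work, assuming the common "sticky count" convention: the field saturates at its largest value, 3, and a stuck count is no longer adjusted by reference counting, leaving the object to the supplemental tracing collector. This is an assumption about the general technique associated with [Wise et al.], not a description of the exact encoding in this implementation.

```python
STUCK = 3  # largest value of a 2-bit field; an object at 3 is treated as "stuck"


def inc(rc):
    """Increment a 2-bit count, saturating at STUCK."""
    return rc if rc == STUCK else rc + 1


def dec(rc):
    """Decrement a 2-bit count; a stuck count is left for the tracing collector."""
    return rc if rc == STUCK else rc - 1


rc = 1                      # a new object's count
for _ in range(5):          # five more references are created
    rc = inc(rc)
for _ in range(5):          # and all five are dropped again
    rc = dec(rc)
```

The design bet is that few objects ever reach the stuck value; those that do stay at 3 until a tracing pass recomputes their counts.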
Performance Measurements
• First multiprocessor measurements in a “normal” environment!
  – (Previously reported measurements assumed one CPU is free for GC all the time.)
• Benchmarks:
  – Server benchmarks:
    • SPECjbb2000: simulates business-like transactions in a large firm
    • MTRT: a multi-threaded ray tracer
  – Client benchmarks:
    • SPECjvm98: a suite of mostly single-threaded client benchmarks
Improved RC
• How many RC updates are eliminated?

  Benchmark | No. of stores | No. of “first” stores | Ratio of “first” stores
  jbb       |    71,011,357 |               264,115 | 1/269
  Compress  |        64,905 |                    51 | 1/1273
  Db        |    33,124,780 |                30,696 | 1/1079
  Jack      |   135,174,775 |                 1,546 | 1/87435
  Javac     |    22,042,028 |               535,296 | 1/41
  Jess      |    26,258,107 |                27,333 | 1/961
  mpegaudio |     5,517,795 |                    51 | 1/108192
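The ratio column is the number of stores divided by the number of “first” stores, rounded; the short check below recomputes it from the counts in the table.

```python
# (benchmark, number of stores, number of "first" stores), copied from the table
rows = [
    ("jbb", 71_011_357, 264_115),
    ("Compress", 64_905, 51),
    ("Db", 33_124_780, 30_696),
    ("Jack", 135_174_775, 1_546),
    ("Javac", 22_042_028, 535_296),
    ("Jess", 26_258_107, 27_333),
    ("mpegaudio", 5_517_795, 51),
]

# One "first" store per `ratio` stores, i.e. a ratio of 1/ratio.
ratios = {name: round(stores / first) for name, stores, first in rows}
for name, ratio in ratios.items():
    print(f"{name:10s} 1/{ratio}")
```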
SPECjbb Latency (Max Transaction Time)
SPECjbb Throughput
MTRT Throughput
SPECjbb Heap Utilization
Client Performance
Related Work
• On-the-fly tracing:
  – Dijkstra et al. (1976), Steele (1976), Lamport (1976),
  – Kung & Song (1977), Gries (1977), Ben-Ari (1982, 1984), Huelsbergen et al. (1993, 1998),
  – Doligez-Gonthier-Leroy (1993-4), Domani-Kolodner-Petrank (2000)
• Concurrent reference counting:
  – DeTreville (1990), Martinez et al. (1990), Lins (1992),
  – Plakal & Fischer (2001), Bacon et al. (2001)
Conclusions
• A new algorithm for reference counting:
  – low overhead on pointer modification,
  – on-the-fly.
• Implementation for Java.
• Measurements show high throughput and low latency.
• To appear soon: a companion paper on the sliding view tracing collector.