Reducing Pause Time of Conservative Collectors
Toshio Endo (National Institute of Informatics), Kenjiro Taura (Univ. of Tokyo)

Incremental GC for soft-realtime applications [Steele 75] [Yuasa 90] [Doligez 93]
• Target: multimedia, games, etc.
  – Pauses should be <10 ms
• Collection tasks are divided into small pieces
• Success: pauses of <5 ms [Cheng 01]
  – They assume compiler cooperation
• Pause reduction for ‘conservative’ GCs is still insufficient

Conservative GC [Boehm et al. 88]
• Mark-sweep GC for C/C++ programs
• No compiler cooperation (e.g., write barriers)
• Mostly parallel GC [Boehm et al. 91]
  – Incremental, conservative
  – Pauses >100 ms are fairly common

Write barriers in conservative GCs
• No fine-grained write barrier from the compiler
• VM write protection is used instead
  – Coarse grain: page level
  – Detects only the first update after protection
  – Restricts the collector's design
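
To make the page-level restriction concrete, here is a toy Python model of a VM write barrier. It does no real `mprotect` or signal handling, and the class and method names are hypothetical. The first store to a protected page "faults", and the handler records the page as dirty and unprotects it. Any later store to that page goes unobserved.

```python
class VMWriteBarrier:
    """Toy model of a page-level VM write barrier (no real page protection)."""

    def __init__(self, num_pages: int) -> None:
        self.num_pages = num_pages
        self.protected: set[int] = set()
        self.dirty: set[int] = set()
        self.faults = 0

    def protect_heap(self) -> None:
        # GC start: write-protect every page and forget earlier dirty bits.
        self.protected = set(range(self.num_pages))
        self.dirty.clear()

    def store(self, page: int) -> None:
        # The mutator writes a pointer somewhere on `page`.
        if page in self.protected:
            # First update since protection: the write faults, the trap
            # handler remembers the dirty page and then unprotects it.
            self.faults += 1
            self.dirty.add(page)
            self.protected.discard(page)
        # Later writes to an unprotected page proceed silently.


barrier = VMWriteBarrier(num_pages=4)
barrier.protect_heap()
for page in [1, 1, 1, 2]:  # three stores to page 1, one store to page 2
    barrier.store(page)
print(barrier.faults, sorted(barrier.dirty))  # 2 [1, 2]
```

Only two of the four stores fault. The collector learns which pages changed, but not which words changed or how many times.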

Incremental mark-sweep algorithms
• Snapshot-at-beginning & DLG [Yuasa 90] [Doligez 93]
  – Make a (conceptual) heap snapshot before marking
  – Promise short pauses
  – Large space overhead with a VM write barrier
• Incremental update [Steele 75] [Dijkstra 78]
  – Maintain consistency for objects updated after they are marked
  – Need a final marking before finishing: can be unboundedly long!
  – The only choice with a VM write barrier
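
For contrast, the two families differ in what a fine-grained barrier would record on each pointer store. The Python sketch below is a generic textbook-style illustration, not code from the talk. The function names and the dict-as-object representation are assumptions. A conservative collector without compiler support cannot run such code on every store, which is why it falls back on the page-level VM write protection described above.

```python
from typing import Optional

Obj = dict  # a heap object modeled as {field_name: target_or_None}


def satb_store(obj: Obj, field: str, new: Optional[str], gray: list) -> None:
    """Snapshot-at-beginning (Yuasa-style): shade the pointer being overwritten."""
    old = obj.get(field)
    if old is not None:
        gray.append(old)  # the old target stays live for this cycle
    obj[field] = new


def incremental_update_store(obj: Obj, field: str, new: Optional[str], gray: list) -> None:
    """Incremental update (Dijkstra-style): shade the pointer being stored."""
    if new is not None:
        gray.append(new)  # the new target will be marked
    obj[field] = new


a, gray_satb = {"f": "x"}, []
satb_store(a, "f", "y", gray_satb)
b, gray_iu = {"f": "x"}, []
incremental_update_store(b, "f", "y", gray_iu)
print(gray_satb, gray_iu)  # ['x'] ['y']
```

With only a page-fault hook, the snapshot variant cannot see the old value of a single field. Presumably it would have to copy the whole page at its first fault, which is consistent with the large space overhead listed above.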

Contributions
• Analyze why previous algorithms fail
• Propose techniques to bound pauses & guarantee progress
• Show a ‘stress-test’ benchmark: iukiller
• Demonstrate experimental results
  – <5 ms in applications
  – <12 ms in the stress-test benchmark (constant across all heap sizes)
(This talk omits parallel issues)

Overview of presentation
• Mostly parallel GC
• Techniques to reduce pause time
• Experimental results
• Related work
• Summary

Mostly parallel garbage collector (1)
Start GC → Write-protect heap → Incremental mark → Final marking → Incremental sweep → End GC
• The user program runs during incremental marking and sweeping
• Write fault → trap handler: remember the dirty (= updated) page address, then unprotect the page

Mostly parallel garbage collector (2)
• The second update to a page is un-trapped
  – r must be marked in the final phase
(Figure: objects p, q and r before and after the mutator's write)
• Need final marking
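
The cycle and the p/q/r scenario can be simulated in a few lines. This is a single-threaded Python toy, not the Boehm collector. Mutator actions are scripted calls, pages are integers, and the class and method names are hypothetical. Because the second store to page 0 does not fault, incremental marking misses r. Rescanning the dirty pages in final marking recovers it.

```python
class MostlyParallelGC:
    """Toy single-threaded model of a mostly parallel mark phase."""

    def __init__(self, page_of, refs, roots):
        self.page_of = page_of  # object -> page number
        self.refs = refs        # object -> set of objects it points to
        self.roots = roots      # objects referenced from stacks/registers
        self.marked, self.dirty, self.protected = set(), set(), set()
        self.stack, self.faults = [], 0

    def start(self):
        # Start GC: write-protect the heap and push the roots.
        self.protected = set(self.page_of.values())
        self.marked.clear()
        self.dirty.clear()
        self.stack = list(self.roots)

    def write(self, src, dst, add=True):
        # Mutator store into `src`; only the first store to a page traps.
        page = self.page_of[src]
        if page in self.protected:
            self.faults += 1
            self.dirty.add(page)          # trap handler: remember dirty page
            self.protected.discard(page)  # ...and unprotect it
        if add:
            self.refs[src].add(dst)
        else:
            self.refs[src].discard(dst)

    def mark_step(self, budget):
        # Incremental marking: mark at most `budget` objects, then yield.
        while self.stack and budget > 0:
            obj = self.stack.pop()
            if obj not in self.marked:
                self.marked.add(obj)
                self.stack.extend(self.refs[obj])
                budget -= 1
        return not self.stack

    def final_mark(self):
        # Mutator stopped: rescan roots and marked objects on dirty pages.
        self.stack = list(self.roots)
        for obj in list(self.marked):
            if self.page_of[obj] in self.dirty:
                self.stack.extend(self.refs[obj])
        self.mark_step(float("inf"))


collector = MostlyParallelGC(
    page_of={"p": 0, "q": 1, "r": 1, "s": 2},
    refs={"p": {"q"}, "q": {"r"}, "r": set(), "s": set()},
    roots=["p"],
)
collector.start()
collector.write("p", "s")             # first update to page 0: traps
collector.mark_step(1)                # collector scans p (sees q and s)
collector.write("p", "r")             # second update to page 0: NOT trapped
collector.write("q", "r", add=False)  # first update to page 1: traps
collector.mark_step(10)               # incremental marking finishes
print("r" in collector.marked)        # False: r escaped incremental marking
collector.final_mark()
print("r" in collector.marked, collector.faults)  # True 2
```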

Final marking
1. Scan all dirty pages + roots
2. Mark all unmarked objects reachable from the scanned region
• The amount of work is unbounded:
  – # of dirty pages
  – Objects reachable from a dirty page
• Makes pauses >100 ms
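
A rough way to see why this work is unbounded is to count it. The Python sketch below uses hypothetical names and a hypothetical heap layout, and it omits roots for brevity. It charges one unit per dirty page scanned and one per newly marked object. The first dirty page points at an unmarked linked list, so the work grows with both the number of dirty pages and the list length.

```python
def final_marking(dirty_pages, objs_on_page, refs, marked):
    """Rescan marked objects on dirty pages, then mark everything they reach.

    Returns (pages_scanned, newly_marked) as a crude measure of pause work.
    """
    stack = []
    for page in dirty_pages:
        for obj in objs_on_page[page]:
            if obj in marked:
                stack.extend(refs[obj])
    newly_marked = 0
    while stack:
        obj = stack.pop()
        if obj not in marked:
            marked.add(obj)
            newly_marked += 1
            stack.extend(refs[obj])
    return len(dirty_pages), newly_marked


def scenario(num_dirty, list_len):
    # Hypothetical layout: one already-marked header object per dirty page;
    # the header on page 0 was updated to point at an unmarked list.
    objs_on_page = {pg: [f"h{pg}"] for pg in range(num_dirty)}
    refs = {f"h{pg}": [] for pg in range(num_dirty)}
    marked = set(refs)
    nodes = [f"n{i}" for i in range(list_len)]
    for i, name in enumerate(nodes):
        refs[name] = [nodes[i + 1]] if i + 1 < list_len else []
    if nodes:
        refs["h0"] = [nodes[0]]
    return final_marking(list(range(num_dirty)), objs_on_page, refs, marked)


print(scenario(4, 10))      # (4, 10)
print(scenario(4, 10_000))  # (4, 10000)
print(scenario(1_000, 0))   # (1000, 0)
```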

Goal of our collector
• Bound pause time (< constant)
  – Mutator utilization is important, but we focus on pauses
• Guarantee progress of collection
Combine two techniques:
• Bound dirty pages (BD)
• Retry incremental marking (RI)

Bounding dirty pages (1)
• The basic collector produces many dirty pages
• Keep the # of dirty pages < a given limit
  – If it exceeds the limit, choose a dirty page
  – Re-protect, scan, and clean it
• Good: reduces the work in final marking
• Bad: more protection cost

Bounding dirty pages (2)
• Is the pause now bounded? … No!
• Unmarked objects reachable from a dirty page are not bounded
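
Below is a minimal sketch of BD on top of a toy page-protection model, in Python. The class, parameter, and callback names are hypothetical. The slides only say "choose a dirty page", so the FIFO victim choice here is an assumption. `rescan` stands in for scanning the page's marked objects during incremental marking, while the mutator keeps running.

```python
from collections import deque
from typing import Callable


class BoundedDirtyBarrier:
    """Toy page-protection barrier that keeps at most `limit` pages dirty."""

    def __init__(self, num_pages: int, limit: int, rescan: Callable[[int], None]) -> None:
        self.protected = set(range(num_pages))  # heap is write-protected at GC start
        self.dirty: deque[int] = deque()        # dirty pages, oldest first
        self.limit = limit
        self.rescan = rescan
        self.reprotections = 0                  # extra protection cost paid by BD

    def store(self, page: int) -> None:
        if page not in self.protected:
            return  # page already dirty (unprotected): the write is not trapped
        if len(self.dirty) >= self.limit:
            victim = self.dirty.popleft()  # choose a dirty page (FIFO: an assumption)
            self.protected.add(victim)     # re-protect it,
            self.rescan(victim)            # scan it concurrently,
            self.reprotections += 1        # and it is clean again
        self.protected.discard(page)       # trap handler unprotects the new page
        self.dirty.append(page)


scanned = []
barrier = BoundedDirtyBarrier(num_pages=100, limit=16, rescan=scanned.append)
for page in range(100):
    barrier.store(page)
print(len(barrier.dirty), barrier.reprotections)  # 16 84
```

In this run, 84 of the 100 first-time stores pay for an extra re-protection and concurrent scan: that is the "more protection cost". In exchange, final marking would face only 16 dirty pages. The sketch does not bound the number of unmarked objects reachable from those 16 pages.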

Retrying incremental marking (1)
Keep the work of final marking < a given limit
Start GC → Write-protect heap → Incremental mark → Final marking: finished before limit?
  – Yes: Incremental sweep → End GC
  – No: Retry! (back to incremental mark; the user program resumes)
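
The retry loop depends on a final marking that can give up. Below is a hedged Python sketch with a hypothetical function name. Work is counted as objects popped from the mark stack, and the mutator is not modeled. The first final marking hits its limit and aborts, keeping the marks it made. Incremental marking then drains the rest concurrently, and the retried final marking finishes almost immediately.

```python
def mark_with_limit(stack, refs, marked, limit):
    """Process the mark stack until it is empty or `limit` objects were popped.

    Returns (finished, work). Marks made before an abort are kept.
    """
    work = 0
    while stack:
        if work >= limit:
            return False, work  # abort: resume the mutator, retry incremental marking
        obj = stack.pop()
        work += 1
        if obj not in marked:
            marked.add(obj)
            stack.extend(refs[obj])
    return True, work


# Hypothetical layout: a dirty page points at an unmarked list of 25 objects.
refs = {f"n{i}": ([f"n{i + 1}"] if i < 24 else []) for i in range(25)}
marked = set()
stack = ["n0"]

first = mark_with_limit(stack, refs, marked, limit=10)          # final marking #1: aborts
incr = mark_with_limit(stack, refs, marked, limit=float("inf"))  # incremental marking, unbounded
stack = ["n0"]  # final marking #2 rescans the dirty page's pointer again
second = mark_with_limit(stack, refs, marked, limit=10)
print(first, incr, second, len(marked))  # (False, 10) (True, 15) (True, 1) 25
```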

Retrying incremental marking (2)
• Good: bounds the length of a single final marking
• Bad: risk of starvation (no progress)
  – Final marking may abort before it finishes scanning the (unbounded) dirty pages
  – Unmarked objects may ‘escape’ from the collector

The worst case
• A final marking aborts with no progress
(Timeline: Incr. finishes → Final aborts → mutator writes → Incr. finishes → Final aborts → …)
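
The difference between RI alone and BD+RI shows up in a deliberately crude counting model, written in Python. Every name and number here is hypothetical and none comes from the talk's experiments. Each final marking must first scan all dirty pages (one unit each). Only then does it mark escaped objects (one unit each), all within a fixed limit, and marks survive an abort. The model also assumes the mutator dirties the same number of pages before every attempt and hides no new objects between attempts. Without BD, the mutator here dirties more pages than the limit allows, so every final marking aborts mid-scan. That is the starvation pattern above.

```python
from typing import Optional


def attempts_until_done(limit: int, dirtied_per_phase: int,
                        dirty_bound: Optional[int] = None,
                        escaped: int = 30, max_attempts: int = 100) -> Optional[int]:
    """Return the final-marking attempt that completes, or None on starvation."""
    for attempt in range(1, max_attempts + 1):
        dirty = dirtied_per_phase
        if dirty_bound is not None:
            dirty = min(dirty, dirty_bound)  # BD: extra pages were cleaned concurrently
        budget = limit - dirty               # work left after scanning dirty pages
        if budget < 0:
            continue                         # aborted mid-scan: nothing marked
        if escaped <= budget:
            return attempt                   # everything marked within the limit
        escaped -= budget                    # aborted, but these marks are kept
    return None


print(attempts_until_done(limit=20, dirtied_per_phase=30))                  # None
print(attempts_until_done(limit=20, dirtied_per_phase=30, dirty_bound=16))  # 8
```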

Ensuring bounded pause and progress
• Either technique alone is insufficient…
• Need both:
  – Bounding dirty pages (BD)
  – Retrying incremental marking (RI)
• With BD, every final marking can scan all dirty pages
  – So it finds some unmarked objects, if any
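
The progress argument can be written as a short bound. The symbols are introduced only for this sketch and do not come from the talk:
• $D$ is the dirty-page bound kept by BD.
• $c_r$, $c_p$ and $c_o$ are the costs of scanning the roots, scanning one dirty page, and marking one object.
• $L$ is the work limit RI places on each final marking.
• $N_0$ is the number of objects unmarked when the cycle starts.
• $A$ is the number of aborted final markings in one cycle.

The sketch also assumes that objects allocated during the cycle are allocated already marked. Under the condition below, each final marking gets through the roots and every dirty page and can still mark at least one object. When it aborts, that object stays marked, and there are only $N_0$ candidates.

```latex
\text{pause per final marking} \le L \quad (\text{RI}),
\qquad
c_r + c_p D + c_o \le L \;\Longrightarrow\; A \le N_0 \quad (\text{BD + RI})
```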

Experimental environments
• 400 MHz UltraSPARC, Solaris 8
• Four GCs
  – Stop: stop-the-world GC
  – Basic: basic incremental GC
  – BD: uses bounding dirty pages
  – BD+R: uses bounding dirty pages + retrying incremental marking
• Basic/BD/BD+R: GC starts when heap usage > 75%
• BD/BD+R: # of dirty pages < 16

The iukiller synthetic benchmark
(Figure: large binary trees, repeatedly created under the root)
• ‘Stress-test’ benchmark for mostly parallel GC
• Trees tend to escape from the collector
  – Final marking tends to be long

Results of iukiller benchmark: the maximum pause time
• Previous collectors fail
  – > 1.8 seconds
  – The larger the heap, the longer the pause
• BD+R achieves <12 ms pause
  – Independent of heap size

Application benchmarks
• Programs written in C/C++
  – deltablue: an incremental constraint solver (25 MB)
  – espresso: a logic optimizer for PLA (10 MB)
  – N-Body: an N-Body solver with Barnes-Hut (15 MB)
  – CKY: a context-free grammar parser (40 MB)
  – Cube: a Rubik’s cube puzzle solver (8 MB)

Results of application benchmarks: the maximum pause time
(Chart: maximum pause per application and collector; two bars are labeled 215 ms and 283 ms)
• BD+R achieves <5 ms pause in all five applications
• BD is also OK (<16 ms)

Results of application benchmarks: overhead
(Chart: total execution times, normalized to ‘Stop’ = 1)
• BD/BD+R is <9% slower than Basic
  – More protection
• All incremental GCs are 1–53% slower than Stop
  – VM write barrier
  – Floating garbage
  – More GC cycles

Related work
• [Appel et al. 88]
  – Copy GC with VM read barrier. Slower than a write barrier
• [Furuso et al. 91]
  – Snapshot-at-beginning on VM. Large space overhead
• Recent version of [Boehm et al. 91]
  – Time limit on final marking. Risk of starvation
• [Printezis et al. 00] [Ossia et al. 02]
  – Keep # of dirty cards small. Final marking is still unbounded

Summary
An incremental conservative GC with
• Short pauses (<5 ms in 5 applications)
• Guaranteed GC progress
Use both techniques:
  – Bounding dirty pages
  – Retrying incremental marking

Future direction
• Reducing overhead of BD
  – Strategy for a proper limit on dirty pages
• Bounding roots to be scanned
  – Protect stacks partially