Reducing Pause Time of Conservative Collectors Toshio Endo

Reducing Pause Time of Conservative Collectors
Toshio Endo (National Institute of Informatics)
Kenjiro Taura (Univ. of Tokyo)

Incremental GC for soft-realtime applications [Steele 75] [Yuasa 90] [Doligez 93]
  • Target: multimedia, games, etc.
    – Pauses should be <10 ms
  • Collection tasks are divided into small pieces
  • Success: pauses of <5 ms [Cheng 01]
    – They assume compiler cooperation
  • Reduction of pause for 'conservative' GCs is insufficient

Conservative GC [Boehm et al. 88]
  • Mark-sweep GC for C/C++ programs
  • No compiler cooperation (e.g., write barriers)
Mostly parallel GC [Boehm et al. 91]
  • Incremental, conservative
  • Pauses >100 ms fairly common

Write barriers in conservative GCs
  • No fine-grain write barrier inserted by the compiler
  • Instead: the virtual memory (VM) system's write protection
  • Coarse grain
    – Page level
    – Detects only the first update after protection
  • This restricts the collector design
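The page-level behavior can be illustrated with a small simulation. This is only a sketch, not the collector's code: `PageWriteBarrier`, `PAGE_SIZE`, and the addresses are hypothetical names and values chosen for illustration, and a real collector would rely on the operating system's page protection and a fault handler rather than a Python flag per page.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (illustrative)


class PageWriteBarrier:
    """Toy model of a VM write barrier: one 'protected' flag per page."""

    def __init__(self, num_pages):
        self.protected = [False] * num_pages
        self.dirty = set()   # pages whose first write was trapped
        self.faults = 0      # number of simulated write faults

    def protect_all(self):
        """Write-protect the whole heap and forget old dirty pages."""
        self.protected = [True] * len(self.protected)
        self.dirty.clear()

    def write(self, addr):
        page = addr // PAGE_SIZE
        if self.protected[page]:
            # Simulated trap handler: remember the page, then unprotect it.
            self.faults += 1
            self.dirty.add(page)
            self.protected[page] = False
        # Later writes to the same page proceed without any trap.


barrier = PageWriteBarrier(num_pages=4)
barrier.protect_all()
barrier.write(100)    # first write to page 0: trapped
barrier.write(200)    # second write to page 0: not trapped
barrier.write(5000)   # first write to page 1: trapped
print(sorted(barrier.dirty), barrier.faults)  # [0, 1] 2
```

The collector learns only which pages changed, not which words, and it sees nothing after the first write to a page.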

Incremental mark sweep algorithms
  • Snapshot-at-beginning & DLG [Yuasa 90] [Doligez 93]
    – Make a (conceptual) heap snapshot before marking
    – Promises short pauses
    – Large space overhead with a VM write barrier
  • Incremental update [Steele 75] [Dijkstra 78]
    – Maintain consistency after marking
    – Needs a final marking before finishing: unboundedly long!
    – The only choice with a VM write barrier

Contributions
  • Analyze why previous algorithms fail
  • Propose techniques to bound pauses & guarantee progress
  • Show a 'stress-test' benchmark: iukiller
  • Demonstrate experimental results
    – <5 ms in applications
    – <12 ms in the stress-test benchmark (constant across all heap sizes)
(This talk omits parallel issues)

Overview of presentation
  • Mostly parallel GC
  • Techniques to reduce pause time
  • Experimental results
  • Related work
  • Summary

Mostly parallel garbage collector (1)
[Diagram: one GC cycle, interleaved with user code]
  • Start GC -> write-protect heap -> incremental mark -> final marking -> incremental sweep -> End GC
  • On a write fault, the trap handler remembers the address of the dirty (= updated) page and unprotects it

Mostly parallel garbage collector (2)
  • A second update is not trapped
    – Mark r in the final phase
  • Need final marking
[Figure: objects p, q, and r before and after a write]

Final marking
  1. Scan all dirty pages + root
  2. Mark all unmarked objects from the scanned region
The amount of work is unbounded:
  • # of dirty pages
  • Objects reachable from a dirty page
This makes pauses >100 ms
[Figure: root and heap]
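The two steps can be sketched on a toy object graph. This is a hedged illustration only: `final_marking`, the dictionary-based heap, and the page assignment are assumptions made for the example, not the collector's data structures. The returned count shows that the work depends on what is reachable from the dirty pages, not only on how many pages are dirty.

```python
def final_marking(heap, page_of, roots, dirty_pages, marked):
    """Toy stop-the-world final marking; returns how many objects it marked.

    heap:    dict mapping object name -> list of referenced object names
    page_of: dict mapping object name -> page number
    marked:  set of already-marked objects (updated in place)
    """
    # Step 1: scan the roots and every marked object on a dirty page.
    stack = list(roots)
    for obj in marked:
        if page_of[obj] in dirty_pages:
            stack.extend(heap[obj])
    # Step 2: mark every unmarked object reachable from the scanned region.
    work = 0
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        work += 1
        stack.extend(heap[obj])
    return work


# p and q were marked incrementally; the user then stored a pointer to r in p.
heap = {"p": ["r"], "q": [], "r": ["s"], "s": []}
page_of = {"p": 0, "q": 0, "r": 1, "s": 1}
marked = {"p", "q"}
work = final_marking(heap, page_of, roots=["p"], dirty_pages={0}, marked=marked)
print(work, sorted(marked))  # 2 ['p', 'q', 'r', 's']
```

Only one page is dirty here, yet the pause must also mark everything hanging off r; a larger structure behind r would lengthen the pause accordingly.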

Goal of our collector
  • Bound pause time (< constant)
    – Mutator utilization is important, but we focus on pauses
  • Guarantee progress of collection
Combine two techniques:
  • Bound dirty pages (BD)
  • Retry incremental marking (RI)

Bounding dirty pages (1)
  • The basic collector produces many dirty pages
  • Keep # of dirty pages < a given limit
    – If the limit is exceeded, choose a dirty page
    – Re-protect, scan, and clean it
  • Good: reduces the work in final marking
  • Bad: more protection cost
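A minimal simulation of the bounding rule follows. It is a sketch under stated assumptions: the class `BoundedDirtyPages` and the `scan_page` callback are hypothetical, and the victim policy (evict the oldest dirty page) is an assumption, since the slide only says to "choose a dirty page". The sketch also assumes a limit of at least 2, so that the page just written is never the victim.

```python
from collections import OrderedDict


class BoundedDirtyPages:
    """Toy model: keep the number of dirty pages below `limit` (limit >= 2)."""

    def __init__(self, num_pages, limit, scan_page):
        self.protected = [True] * num_pages  # heap starts write-protected
        self.dirty = OrderedDict()           # dirty pages, oldest first
        self.limit = limit
        self.scan_page = scan_page           # callback: scan one page incrementally
        self.faults = 0

    def write(self, page):
        if not self.protected[page]:
            return                           # untrapped write to a dirty page
        # Simulated write fault on a protected page.
        self.faults += 1
        self.protected[page] = False
        self.dirty[page] = True
        while len(self.dirty) >= self.limit:
            # Assumed policy: pick the oldest dirty page as the victim.
            victim, _ = self.dirty.popitem(last=False)
            self.protected[victim] = True    # re-protect first ...
            self.scan_page(victim)           # ... then scan, so the page is clean


scanned = []
bd = BoundedDirtyPages(num_pages=8, limit=3, scan_page=scanned.append)
for page in [0, 1, 1, 2, 3, 0]:
    bd.write(page)
print(list(bd.dirty), scanned, bd.faults)  # [3, 0] [0, 1, 2] 5
```

Page 0 faults twice in this run, which is the extra protection cost noted as the downside.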

Bounding dirty pages (2)
  • Is the pause now bounded? … No!
  • Unmarked objects reachable from a dirty page are not bounded
[Figure: root and heap]

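Retrying incremental marking (1)
Keep the work of final marking < a given limit
[Diagram: GC cycle with retry, interleaved with user code and the trap handler]
  • Start GC -> write-protect heap -> incremental mark -> final marking -> finished before limit?
    – Yes: incremental sweep -> End GC
    – No: retry! Return to incremental marking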
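The retry loop can be sketched as follows. This is an illustrative model, not the authors' implementation: `bounded_final_marking`, `incremental_mark`, `collect`, and `mutator_step` are hypothetical names, the work limit is counted in marked objects rather than time, and the concurrent incremental phase is replaced by a simple sequential stand-in.

```python
def bounded_final_marking(heap, pending, marked, budget):
    """Mark from `pending`, but stop once `budget` objects have been marked.

    Returns True if marking finished, False if it aborted at the limit.
    Whatever is left in `pending` goes back to incremental marking.
    """
    work = 0
    while pending:
        if work >= budget:
            return False              # abort: the pause would exceed the limit
        obj = pending.pop()
        if obj in marked:
            continue
        marked.add(obj)
        work += 1
        pending.extend(heap[obj])
    return True


def incremental_mark(heap, pending, marked):
    """Stand-in for incremental marking done in small pieces between user code."""
    while pending:
        obj = pending.pop()
        if obj not in marked:
            marked.add(obj)
            pending.extend(heap[obj])


def collect(heap, roots, mutator_step, budget):
    """Mark phase with retries; returns (marked set, number of retries)."""
    marked, pending, retries = set(), list(roots), 0
    while True:
        incremental_mark(heap, pending, marked)
        pending.extend(mutator_step(marked))  # objects found via dirty pages
        if bounded_final_marking(heap, pending, marked, budget):
            return marked, retries            # "Yes": go on to the sweep
        retries += 1                          # "No. Retry!"


heap = {"root": [], "t1": ["t2"], "t2": ["t3"], "t3": []}
steps = iter([["t1"], []])  # first pause finds t1 on a dirty page, second finds nothing


def mutator_step(marked):
    found = next(steps)
    heap["root"].extend(found)  # the user's pointer write that dirtied root's page
    return found


marked, retries = collect(heap, ["root"], mutator_step, budget=2)
print(sorted(marked), retries)  # ['root', 't1', 't2', 't3'] 1
```

With a budget of 2, the first final marking aborts after marking t1 and t2, the leftover work is finished incrementally, and the second final marking completes.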

Retrying incremental marking (2)
  • Good: bounds the length of a single final marking
  • Bad: risk of starvation (no progress)
    – Final marking may abort before it finishes scanning the (unbounded) dirty pages
    – Unmarked objects may 'escape' from the collector

The worst case
  • A final marking aborts with no progress
[Figure: repeating timeline in which incremental marking finishes, the user writes, and final marking aborts]

Ensuring bounded pause and progress
  • Either technique alone is insufficient…
  • Need both techniques:
    – Bounding dirty pages (BD)
    – Retrying incremental marking (RI)
  • With BD, every final marking can scan all dirty pages
    – So it finds some unmarked objects, if any

Experimental Environments
  • 400 MHz UltraSPARC, Solaris 8
  • Four GCs
    – Stop: stop-the-world GC
    – Basic: basic incremental GC
    – BD: uses bounding dirty pages
    – BD+R: uses bounding dirty pages + retrying incremental marking
  • Basic/BD/BD+R: GC starts when heap usage > 75%
  • BD/BD+R: # of dirty pages < 16

The iukiller synthetic benchmark
[Figure: a root and large binary trees, labeled "repeat"]
  • 'Stress-test' benchmark for mostly parallel GC
  • Trees tend to escape from the collector
  • Final marking tends to be long

Results of iukiller benchmark: the maximum pause time
  • Previous collectors fail
    – >1.8 seconds
    – The larger the heap, the longer the pause
  • BD+R achieves <12 ms pause
    – Independent of heap size

Application benchmarks
  • Programs written in C/C++
    – deltablue: an incremental constraint solver (25 MB)
    – espresso: a logic optimizer for PLA (10 MB)
    – N-Body: an N-body solver with Barnes-Hut (15 MB)
    – CKY: a context-free grammar parser (40 MB)
    – Cube: a Rubik's cube puzzle solver (8 MB)

Results of application benchmarks: the maximum pause time
[Chart: maximum pause time per application; two off-scale bars are labeled 215 ms and 283 ms]
  • BD+R achieves <5 ms pause in five applications
  • BD is also OK (<16 ms)

Results of application benchmarks: overhead
[Chart: total execution times, normalized so that 'Stop' = 1]
  • BD/BD+R is <9% slower than Basic
    – More protection
  • All incremental GCs are 1 to 53% slower than Stop
    – VM write barrier
    – Floating garbage
    – More GC cycles

Related work
  • [Appel et al. 88]
    – Copy GC with VM read barrier. Slower than write barrier
  • [Furuso et al. 91]
    – Snapshot-at-beginning on VM. Large space overhead
  • Recent version of [Boehm et al. 91]
    – Time limit on final marking. Risks of starvation
  • [Printezis et al. 00] [Ossia et al. 02]
    – Keep # of dirty cards small. Final marking is still unbounded

Summary
An incremental conservative GC
  • Short pause (<5 ms in 5 applications)
  • GC progress
Use both techniques:
  – Bounding dirty pages
  – Retrying incremental marking

Future direction
  • Reducing overhead of BD
    – Strategy for proper limit for dirty pages
  • Bounding roots to be scanned
    – Protect stacks partially