Garbage Collection: Introduction and Overview
Excerpted from a presentation by Christian Schulte, Programming Systems Lab, Universität des Saarlandes, Germany (schulte@ps.uni-sb.de)

Garbage Collection…
…is concerned with the automatic reclamation of dynamically allocated memory after its last use by a program

Garbage collection…
• Dynamically allocated memory
• Last use by a program
• Examples of automatic reclamation

Kinds of Memory Allocation
    static int i;
    void foo(void) {
        int j;
        int* p = (int*) malloc(…);
    }

Static Allocation
    static int i;
• By compiler (in the static data area)
• Available through entire runtime
• Fixed size

Automatic Allocation
    int j;    /* declared inside foo */
• Upon procedure call (on stack)
• Available during execution of call
• Fixed size

Dynamic Allocation
    int* p = (int*) malloc(…);    /* the block p points to, not p itself */
• Dynamically allocated at runtime (on heap)
• Available until explicitly deallocated
• Dynamically varying size

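The three kinds of allocation can be observed side by side in one small function. The following is a minimal sketch and not part of the original slides; the names `calls` and `count_call` are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

static int calls;   /* static: one instance, available through the entire run */

/* Returns a heap block holding the number of calls made so far.
   The caller owns the block and must free() it. */
int *count_call(void) {
    int local = 0;                 /* automatic: created anew on every call */
    local++;
    assert(local == 1);            /* always 1: nothing survives from earlier calls */

    calls += local;                /* the static counter does remember earlier calls */

    int *p = malloc(sizeof *p);    /* dynamic: the block outlives this call */
    if (p == NULL)
        return NULL;               /* allocation failed */
    *p = calls;
    return p;                      /* p itself is automatic, the block is not */
}
```

Calling `count_call()` twice yields blocks holding 1 and 2. The static counter carries over between calls, the automatic variable does not, and both heap blocks stay valid until the caller frees them.
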
Dynamically Allocated Memory
• Also: heap-allocated memory
• Allocation: malloc, new, …
  – before first usage
• Deallocation: free, delete, dispose, …
  – after last usage
• Needed for
  – C++, Java: objects
  – SML: datatypes, procedures
  – anything that outlives procedure call

Getting it Wrong
• Forget to free (memory leak)
  – program eventually runs out of memory
  – long-running programs: OSs, servers, …
• Free too early (dangling pointer)
  – lucky: illegal access detected by OS
  – horror: memory reused, in simultaneous use
    · programs can behave arbitrarily
    · crashes might happen much later
• Estimates of effort
  – Up to 40% of development time! [Rovner, 1985]

Nodes and Pointers
• Node n
  – memory block, cell
• Pointer p
  – link to node
  – node access: *p
• Children children(n)
  – set of pointers to nodes referred to by n

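In C, a node and its children could be represented as follows. This is a sketch under assumptions the slides do not make: a fixed maximum of two child fields (`MAX_CHILDREN`), unused fields set to NULL, and a mark bit included because a collector later needs one. The helper name `child_count` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 2   /* assumption: fixed fan-out keeps the sketch small */

typedef struct node {
    bool marked;                        /* mark bit, unused in this example */
    struct node *child[MAX_CHILDREN];   /* children(n); NULL means "no child" */
} node;

/* Number of pointers in children(n), that is, non-NULL child fields. */
size_t child_count(const node *n) {
    assert(n != NULL);
    size_t count = 0;
    for (size_t i = 0; i < MAX_CHILDREN; i++)
        if (n->child[i] != NULL)
            count++;
    return count;
}
```

With `node *p` pointing to a node, `*p` is the node access from the slide and `p->child[i]` is one element of children(n).
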
Mutator
• Abstraction of program
  – introduces new nodes with pointers
  – redirects pointers, creating garbage

Shared Nodes
• Nodes referred to by several pointers
• Makes manual deallocation hard
  – local decision impossible
  – must respect other pointers to node
• Cycles are an instance of sharing

Last Use by a Program
• Question: When is node M no longer used by the program?
  – Let P be any program not using M
  – New program sketch: Execute P; Use M;
  – Hence: M is used if and only if P terminates
  – We are doomed: halting problem!
• So “last use” is undecidable!

Safe Approximation
• Decidable and also simple
• What does safe mean?
  – only unused nodes are freed
• What does approximation mean?
  – some unused nodes might not be freed
• Idea
  – approximate “used” by “can be accessed by the mutator”

Reachable Nodes
• Reachable from root set
  – processor registers
  – static variables
  – automatic variables (stack)
• Reachable from reachable nodes

Summary: Reachable Nodes
• A node n is reachable iff
  – n is an element of the root set, or
  – n is an element of children(m) and m is reachable
• A reachable node is also called “live”

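The two rules of the definition can be applied literally, repeating the second rule until nothing changes. The sketch below does exactly that; it is meant to illustrate the definition, not to be an efficient collector. The node layout, the array-based heap, and the name `compute_reachable` are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 2

typedef struct node {
    bool reachable;
    struct node *child[MAX_CHILDREN];   /* NULL means "no child" */
} node;

/* Sets the reachable flag of every node in heap[0..heap_size) according to
   the definition: rule 1 for the root set, rule 2 until a fixpoint. */
void compute_reachable(node *heap, size_t heap_size,
                       node **roots, size_t root_count) {
    assert(heap != NULL && roots != NULL);

    for (size_t i = 0; i < heap_size; i++)
        heap[i].reachable = false;

    /* Rule 1: nodes referred to from the root set are reachable. */
    for (size_t r = 0; r < root_count; r++)
        if (roots[r] != NULL)
            roots[r]->reachable = true;

    /* Rule 2: children of reachable nodes are reachable. */
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t i = 0; i < heap_size; i++) {
            if (!heap[i].reachable)
                continue;
            for (size_t c = 0; c < MAX_CHILDREN; c++) {
                node *m = heap[i].child[c];
                if (m != NULL && !m->reachable) {
                    m->reachable = true;
                    changed = true;
                }
            }
        }
    }
}
```

A cycle that no root leads to stays unreachable, because neither rule ever applies to any of its nodes.
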
Mark and Sweep
• Compute the set of reachable nodes
• Free nodes known to be unreachable

Reachability: Safe Approximation
• Safe
  – access to an unreachable node is impossible
  – depends on language semantics
  – but C/C++? later…
• Approximation
  – a reachable node might never be accessed
  – programmer must know about this!
  – have you been aware of this?

Example Garbage Collectors
• Mark-Sweep
• Others
  – Mark-Compact
  – Reference Counting
  – Copying
  – see Chapters 1 and 2 of [Jones & Lins, 1996]

The Mark-Sweep Collector
• Compute reachable nodes: Mark
  – tracing garbage collector
• Free unreachable nodes: Sweep
• Run when out of memory: Allocation
• First used with LISP [McCarthy, 1960]

Allocation
    node* new() {
        if (free_pool is empty)
            mark_sweep();       /* collect only when nothing is left to hand out */
        return allocate();
    }

The Garbage Collector
    void mark_sweep() {
        for (r in roots)
            mark(r);
        /* here: all live nodes marked */
        …
    }

Recursive Marking
    void mark(node* n) {
        if (!is_marked(n)) {
            set_mark(n);
            /* i-th recursion: nodes on path with length i marked */
            for (m in children(n))
                mark(m);
        }
        /* on return: nodes reachable from n marked */
    }

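A compilable version of the marking step might look like this. It is a sketch under assumptions not stated on the slide: nodes have at most two child fields, unused fields are NULL (hence the extra NULL check), and the mark bit is stored inside the node. The helper `mark_from_roots` is a hypothetical name for the loop over the root set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 2

typedef struct node {
    bool marked;
    struct node *child[MAX_CHILDREN];   /* NULL means "no child" */
} node;

/* Marks n and everything reachable from n. */
void mark(node *n) {
    if (n == NULL || n->marked)
        return;                 /* nothing to do, or already visited */
    n->marked = true;           /* set before recursing, so cycles terminate */
    for (size_t i = 0; i < MAX_CHILDREN; i++)
        mark(n->child[i]);
}

/* Marks everything reachable from the given root set. */
void mark_from_roots(node **roots, size_t root_count) {
    assert(roots != NULL);
    for (size_t r = 0; r < root_count; r++)
        mark(roots[r]);
}
```

Setting the mark before visiting the children is what makes shared nodes and cycles harmless: a node that is reached a second time is already marked and the recursion stops there.
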
The Garbage Collector
    void mark_sweep() {
        for (r in roots)
            mark(r);
        sweep();
        /* here: all nodes on heap live and not marked */
        …
    }

Eager Sweep
    void sweep() {
        node* n = heap_bottom;
        while (n < heap_top) {
            if (is_marked(n))
                clear_mark(n);   /* live: reset mark for the next collection */
            else
                free(n);         /* garbage: return node to the free pool */
            n++;                 /* next node; pointer arithmetic already steps by sizeof(*n) */
        }
    }

The Garbage Collector
    void mark_sweep() {
        for (r in roots)
            mark(r);
        sweep();
        if (free_pool is empty)
            abort("Memory exhausted");
    }

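Putting allocation, mark, and sweep together gives a complete, if tiny, collector. The following is a sketch under several assumptions that are not in the slides: a fixed heap of four equal-sized nodes, an explicit `roots` array standing in for registers, statics, and stack, an `in_use` flag so that sweep can tell garbage from nodes already in the free pool, and `new_node` returning NULL where the slide version aborts. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HEAP_SIZE    4
#define MAX_CHILDREN 2
#define MAX_ROOTS    4

typedef struct node {
    bool marked;
    bool in_use;                        /* false while the node is in the free pool */
    struct node *child[MAX_CHILDREN];
    struct node *next_free;             /* link used only inside the free pool */
} node;

node  heap[HEAP_SIZE];                  /* contiguous heap of equal-sized nodes */
node *free_pool;                        /* singly linked list of free nodes */
node *roots[MAX_ROOTS];                 /* explicit root set (assumption) */

void heap_init(void) {
    free_pool = NULL;
    for (size_t i = 0; i < HEAP_SIZE; i++) {
        heap[i] = (node){0};
        heap[i].next_free = free_pool;
        free_pool = &heap[i];
    }
    for (size_t r = 0; r < MAX_ROOTS; r++)
        roots[r] = NULL;
}

static void mark(node *n) {
    if (n == NULL || n->marked)
        return;
    n->marked = true;
    for (size_t i = 0; i < MAX_CHILDREN; i++)
        mark(n->child[i]);
}

static void sweep(void) {
    for (node *n = heap; n < heap + HEAP_SIZE; n++) {
        if (n->marked) {
            n->marked = false;          /* live: reset for the next collection */
        } else if (n->in_use) {
            n->in_use = false;          /* garbage: return to the free pool */
            n->next_free = free_pool;
            free_pool = n;
        }
    }
}

void mark_sweep(void) {
    for (size_t r = 0; r < MAX_ROOTS; r++)
        mark(roots[r]);
    sweep();
}

/* Allocates a zeroed node, collecting first if the free pool is empty.
   Returns NULL if memory is exhausted even after collection. */
node *new_node(void) {
    if (free_pool == NULL)
        mark_sweep();
    if (free_pool == NULL)
        return NULL;
    node *n = free_pool;
    assert(!n->in_use);                 /* pool must only contain free nodes */
    free_pool = n->next_free;
    *n = (node){0};
    n->in_use = true;
    return n;
}
```

Note that ordinary C local variables are not roots in this sketch. A node stays alive across a collection only if it is reachable from the `roots` array, so a caller has to register its pointers there before allocating again.
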
Assumptions
• Nodes can be marked
• Size of nodes known
• Heap contiguous
• Memory for recursion available
• Child fields known!

Assumptions: Realistic
• Nodes can be marked
• Size of nodes known
• Heap contiguous
• Memory for recursion available
• Child fields known

Assumptions: Conservative
• Nodes can be marked
• Size of nodes known
• Heap contiguous
• Memory for recursion available
• Child fields known

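The slides do not spell out what the conservative variant gives up. One common reading is that child fields are not known exactly, so the collector treats every word that could be a pointer into the heap as if it were one. The sketch below shows such a test for a contiguous heap of equal-sized nodes. The names `maybe_node` and `HEAP_SIZE` are hypothetical, and the integer comparison of addresses assumes a flat address space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HEAP_SIZE 8

typedef struct node {
    bool marked;
    struct node *child[2];
} node;

node heap[HEAP_SIZE];   /* contiguous heap of equal-sized nodes */

/* Returns the node that word would refer to if it were a pointer,
   or NULL if word cannot be the address of a node in the heap. */
node *maybe_node(uintptr_t word) {
    uintptr_t bottom = (uintptr_t)heap;
    uintptr_t top    = (uintptr_t)(heap + HEAP_SIZE);
    assert(top > bottom);

    if (word < bottom || word >= top)
        return NULL;                          /* outside the heap */
    if ((word - bottom) % sizeof(node) != 0)
        return NULL;                          /* not at the start of a node */
    return &heap[(word - bottom) / sizeof(node)];
}
```

An integer that happens to equal a node address passes this test, so the node is kept even if no real pointer refers to it. That is still safe in the sense used earlier: only unused nodes are freed, but some unused nodes might not be.
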
Mark-Sweep Properties
• Covers cycles and sharing
• Time depends on
  – live nodes (mark)
  – live and garbage nodes (sweep)
• Computation must be stopped
  – non-interruptible stop/start collector
  – long pause
• Nodes remain unchanged (as not moved)
• Heap remains fragmented

Software Engineering Issues
• Design goal in SE:
  – decompose systems
  – into orthogonal components
• Clashes with letting each component do its own memory management
  – liveness is a global property
  – leads to “local leaks”
  – lacks the power of modern gc methods

Typical Cost
• Early systems (LISP): up to 40% [Steele, 75] [Gabriel, 85]
  – “garbage collection is expensive” myth
• Well-engineered system of today: 10% of entire runtime [Wilson, 94]

Areas of Usage
• Programming languages and systems
  – Java, C#, Smalltalk, …
  – SML, Lisp, Scheme, Prolog, …
  – Perl, Python, PHP, JavaScript
  – Modula 3, Microsoft .NET
• Extensions
  – C, C++ (Conservative)
• Other systems
  – Adobe Photoshop
  – Unix filesystem
  – Many others in [Wilson, 1996]

Understanding Garbage Collection: Benefits
• Programming garbage collection
  – programming systems
  – operating systems
• Understand systems with garbage collection (e.g. Java)
  – memory requirements of programs
  – performance aspects of programs
  – interfacing with garbage collection (finalization)

References
• Garbage Collection. Richard Jones and Rafael Lins, John Wiley & Sons, 1996.
• Uniprocessor garbage collection techniques. Paul R. Wilson, ACM Computing Surveys. To appear.
  – Extended version of IWMM 92, St. Malo.