Distributed Garbage Collection: An Overview
Presented by Dotan Adler
Presentation Outline
• Present the need for Distributed GC
• Present the Distributed Model
• Present the problem of Distributed GC
• Present some outlines of solutions to the problem (one Direct, and two Indirect)
• Conclusion
01/11/2021 Distributed Garbage Collection 2
The Need for a Distributed GC
• Where do we need Distributed GC?
  – Distributed applications (DCOM, CORBA, etc.)
  – Internet applications (Java extensions)
  – Distributed file servers
  – HTML pages & links
The Distributed Computing Model
A distributed system is a set of autonomous systems connected by a network.
The Classical GC Problem
Distributed Garbage Collection
• Each computer in a distributed system has:
  – A local store (memory or disk)
  – A stack
  – Local running programs (or RPCs)
  – A GC algorithm (a part of a global GC algorithm)
• Each computer has direct access only to its local store
• Access to a remote store is achieved by message passing
Distributed Garbage Collection
• Because there is no global address space, references to remote cells are necessarily indirect
Problems With Distributed GC
• Unreliability (Relaxed)
  – Duplicated messages
  – Out-of-order delivery
  – Communication failures
• Latency - the elapsed time between the issue of a task and its execution is nondeterministic
• Synchronization - it takes a lot of resources to synchronize all computers in a network
Direct Approach
• The direct approach identifies the garbage itself and marks it, so that it can be removed
• Reference counting is a direct GC approach, since it identifies garbage as the objects whose count equals 0
Reference Counting
• Every object has a reference count
• An Inc(@a) message increases a’s count
• A Dec(@a) message decreases the count
• “Inc” and “Dec” messages are sent to the CPU that hosts the object
• Deletion occurs when the reference count reaches 0
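The message handling on the hosting site can be sketched in Python as follows. This is a minimal illustration, not the presenter's implementation: the `Site` class, its method names, and the choice to start a new object at count 1 are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A site (the slide's 'CPU') that hosts objects and their reference counts."""
    name: str
    counts: dict = field(default_factory=dict)     # object id -> reference count
    reclaimed: list = field(default_factory=list)  # ids of deleted objects

    def create(self, obj_id: str) -> None:
        # Assumption: a new object starts with one reference, held by its creator.
        self.counts[obj_id] = 1

    def receive(self, msg: str, obj_id: str) -> None:
        """Handle an Inc or Dec message addressed to a hosted object."""
        if obj_id not in self.counts:
            return  # object already reclaimed; this sketch simply ignores the message
        if msg == "Inc":
            self.counts[obj_id] += 1
        elif msg == "Dec":
            self.counts[obj_id] -= 1
            if self.counts[obj_id] == 0:
                del self.counts[obj_id]
                self.reclaimed.append(obj_id)


host = Site("B")
host.create("b")            # count = 1
host.receive("Inc", "b")    # a second site now holds @b, count = 2
host.receive("Dec", "b")    # one holder drops @b, count = 1
host.receive("Dec", "b")    # the last holder drops @b, count = 0, object deleted
print(host.reclaimed)       # ['b']
```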
Reference Counting (Cont’d)
• Problem: if A or C is responsible for incrementing b’s count when A passes a copy of @b to C, latency becomes an important factor: a Dec(@b) can reach B before the corresponding Inc(@b), so b may be reclaimed while C still refers to it
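The effect of message ordering can be shown with a small Python replay, assuming the scenario in which A holds the only reference to b, copies @b to C, and then drops its own reference. The `replay` function and the message lists are hypothetical names introduced for this sketch.

```python
def replay(messages, initial_count=1):
    """Apply Inc/Dec messages in arrival order.

    Returns (final count, whether the object was reclaimed).
    """
    count = initial_count
    reclaimed = False
    for msg in messages:
        if reclaimed:
            break  # the object is gone; later messages find nothing to update
        if msg == "Inc":
            count += 1
        elif msg == "Dec":
            count -= 1
            if count == 0:
                reclaimed = True
    return count, reclaimed


in_order = ["Inc", "Dec"]    # Inc for C's copy arrives before A's Dec
reordered = ["Dec", "Inc"]   # network latency lets A's Dec overtake the Inc

print(replay(in_order))      # (1, False): b survives, C's reference is valid
print(replay(reordered))     # (0, True): b is reclaimed although C holds @b
```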
Reference Counting (Cont’d)
• Solution: when A duplicates @b, it first sends an ack-request to B and a copy message to C
• After receiving the ack-request, B accepts no other request until it has sent an ack to C
• C does not use @b until it receives ack(@b)
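One possible reading of this handshake is sketched below in Python. The class names `Host` and `Holder` are hypothetical, and the sketch assumes that B accounts for the new reference when it processes the ack-request, which the slide does not state. B's blocking of other requests and message reordering between A and B are not modelled.

```python
class Host:
    """Site B: owns object b and its reference count."""

    def __init__(self):
        self.count = 1  # the reference held by A

    def on_ack_req(self, new_holder):
        # Assumption: B records the new reference before acknowledging it.
        self.count += 1
        new_holder.on_ack("@b")

    def on_dec(self):
        self.count -= 1


class Holder:
    """Site C: receives a copy of @b but may not use it until B acknowledges."""

    def __init__(self):
        self.copied = set()
        self.acked = set()

    def on_copy(self, ref):
        self.copied.add(ref)

    def on_ack(self, ref):
        self.acked.add(ref)

    def can_use(self, ref):
        return ref in self.copied and ref in self.acked


b_host, c = Host(), Holder()
c.on_copy("@b")                        # the copy message from A arrives first
print(c.can_use("@b"))                 # False: no ack from B yet
b_host.on_ack_req(c)                   # A's ack-request reaches B, B acks to C
print(c.can_use("@b"), b_host.count)   # True 2
b_host.on_dec()                        # A drops its own reference afterwards
print(b_host.count)                    # 1
```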
Reference Counting (Cont’d)
• Problems
  – Does not reclaim inter-site cyclic structures
  – Relies on blocking of operations
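The cycle problem can be seen in a few lines of Python. The example assumes two hypothetical cells, x on one site and y on another, that reference each other after every outside reference to them has been dropped.

```python
# x (on site 1) points to y, and y (on site 2) points back to x.
# No root and no other cell references either of them.
refs = {"x": ["y"], "y": ["x"]}

# Count the incoming references of each cell.
counts = {cell: 0 for cell in refs}
for source, targets in refs.items():
    for target in targets:
        counts[target] += 1

# Reference counting reclaims only cells whose count is 0.
reclaimable = [cell for cell, count in counts.items() if count == 0]

print(counts)        # {'x': 1, 'y': 1}
print(reclaimable)   # []: the dead cycle is never reclaimed
```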
Indirect Approach
• The indirect approach identifies all live objects first, and then reclaims all space not used by live objects
• Examples of indirect GC algorithms:
  – Mark-scan
  – Generation scavenging
  – Dijkstra’s 3-color algorithm
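A minimal single-site mark-scan sketch in Python illustrates the idea: mark everything reachable from the roots, then reclaim the rest. The heap layout (a dictionary from cell name to the cells it references) and the function names are assumptions made for the example.

```python
def mark(heap, roots):
    """Return the set of cells reachable from the roots."""
    live = set()
    stack = list(roots)
    while stack:
        cell = stack.pop()
        if cell in live:
            continue
        live.add(cell)
        stack.extend(heap[cell])
    return live


def sweep(heap, live):
    """Delete every cell that was not marked; return the reclaimed cells."""
    garbage = sorted(set(heap) - live)
    for cell in garbage:
        del heap[cell]
    return garbage


# c and d form a cycle that no root reaches.
heap = {"a": ["b"], "b": [], "c": ["d"], "d": ["c"]}
live = mark(heap, ["a"])
print(sorted(live))        # ['a', 'b']
print(sweep(heap, live))   # ['c', 'd']
```

Unlike reference counting, this approach reclaims the local cycle between c and d, because liveness is decided by reachability rather than by counts.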
Centralized Indirect Solution
• Let every site run a GC locally on its private memory.
RCT = remote cell table
ERT = exit reference table
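A local collection of this kind might look like the Python sketch below. It assumes that the RCT lists local cells that other sites may reference, so they are treated as extra roots, and that the ERT lists references leaving the site, modelled here as names with a hypothetical `ert:` prefix. Both readings are assumptions about the tables, which the slide only names.

```python
def reachable(heap, starts):
    """Return every cell reachable from the given starting cells."""
    seen, stack = set(), list(starts)
    while stack:
        cell = stack.pop()
        if cell not in seen:
            seen.add(cell)
            stack.extend(heap.get(cell, []))  # ERT entries have no local successors
    return seen


def local_gc(heap, roots, rct):
    """Keep every cell reachable from the local roots or from an RCT entry."""
    live = reachable(heap, set(roots) | set(rct))
    return {cell: refs for cell, refs in heap.items() if cell in live}


heap = {
    "r": ["p"],            # reachable from a local root
    "p": [],
    "q": ["ert:S2/z"],     # listed in the RCT, references a cell on site S2
    "junk": ["p"],         # unreachable from roots and not in the RCT
}
after = local_gc(heap, roots=["r"], rct=["q"])
print(sorted(after))   # ['p', 'q', 'r']
```

The cell `q` survives only because it is in the RCT. The local collector cannot tell whether any remote site still references it, which is why inter-site cycles need extra machinery.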
Centralized Indirect Solution (Cont’d)
• Inter-site cycles are still a problem.
• To handle them, we add a centralized service:
  – Whenever a site finishes its local GC, it sends the centralized service a list of the RCT-to-ERT paths that are inaccessible from its roots.
  – The centralized service can then find dead inter-site cycles and report them to the sites.
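The central service's analysis can be sketched in Python under one additional assumption that the slide does not state: each site also reports which remote cells are referenced from its own roots (`live_exits` below), so the service knows where liveness enters the reported graph. The function name and the `site/cell` naming are hypothetical.

```python
def find_dead_entries(reported_paths, live_exits):
    """Return RCT entries that no site's roots can reach.

    reported_paths: {rct_entry: [remote cells reachable from it via the ERT]},
                    for entries that are inaccessible from their own site's roots.
    live_exits:     remote cells referenced from some site's local roots
                    (assumed to be reported as well).
    """
    live, stack = set(), list(live_exits)
    while stack:
        cell = stack.pop()
        if cell not in live:
            live.add(cell)
            stack.extend(reported_paths.get(cell, []))
    return sorted(set(reported_paths) - live)


reports = {
    "S1/x": ["S2/y"],   # x and y form an inter-site cycle
    "S2/y": ["S1/x"],
    "S2/w": ["S3/v"],   # w is referenced from another site's roots
    "S3/v": [],
}
print(find_dead_entries(reports, live_exits=["S2/w"]))   # ['S1/x', 'S2/y']
```

The service would then tell S1 and S2 to drop the RCT entries for x and y, after which each site's next local GC reclaims them.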
Cell Migration
• We can solve the inter-site cycle problem by migrating cells, so that inter-site cycles become local cycles.
• We need to define an order among the sites so that migration itself cannot become cyclic: a cell migrates only to an inferior site.
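The role of the site order can be illustrated with a small Python simulation. It assumes that every cell in the example is already suspected garbage (unreachable from any local root), that a cell moves to the site of a cell referencing it when that site is inferior, and that `RANK` is a hypothetical fixed order. Because every move strictly lowers a cell's rank, the loop must terminate.

```python
RANK = {"S1": 1, "S2": 2, "S3": 3}   # hypothetical order: lower rank is inferior


def migrate(edges, location, rank):
    """Move referenced cells down to inferior referencing sites until stable.

    edges:    {cell: [cells it references]}
    location: {cell: site currently hosting the cell}
    """
    location = dict(location)
    changed = True
    while changed:
        changed = False
        for source, targets in edges.items():
            for target in targets:
                source_site = location[source]
                if rank[source_site] < rank[location[target]]:
                    location[target] = source_site  # target moves to the inferior site
                    changed = True
    return location


# A three-cell cycle spread over three sites.
edges = {"x": ["y"], "y": ["z"], "z": ["x"]}
start = {"x": "S2", "y": "S3", "z": "S1"}
print(migrate(edges, start, RANK))   # {'x': 'S1', 'y': 'S1', 'z': 'S1'}
```

Once all three cells live on S1, the cycle is local and an ordinary local collector can reclaim it.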
Cell Migration
• Problems:
  – Copy operations can be very slow, especially with big networks and big data structures
  – An order among the sites must be fixed before the algorithm runs
Empirical Results
• Little work has been done in the field of DGC because it is hard to compare the executions of two distributed systems (delay times, synchronization, etc.).
• Tests, however, show that Direct methods give better pause times than Indirect methods.
Future Directions of Research
• Use some hybrid methods (Direct & Indirect)
• Try to create a framework for testing Distributed GC algorithms
Conclusions
• Distributed GC adds a new dimension to old-style GC: asynchrony and latency.
• Although standard GCs suffer from the same problems (memory leaks, pause time), creating a robust DGC is much more complicated.
The End