Directions for Distributed Garbage Collection
Microsoft Research, Cambridge
Monday 7 August 2000
Richard Jones
Computing Laboratory, University of Kent at Canterbury
© Richard Jones, 2000. All rights reserved.
Outline
• Motivation
• Model
• An ideal GC
• Why conventional taxonomies are unsatisfactory
• A new taxonomy
• What this taxonomy offers
• Examples
• DGC in practice
• Research directions
Motivation: GC
Why GC?
• “Illusion of infinite memory”?
• A safety net?
Language requirement
Problem requirement (ownership)
Software engineering
• Liveness is a global question
• Modularity
• Abstraction
Motivation: DGC
Arguments above apply
• Liveness is now an even harder problem
Open systems
• Location transparency
• Lack of control over components
Fault tolerance
Motivation: this talk
Several previous attempts at a DGC survey
• [abdu 92, abdu 98] – Quite full, little structure or rationale, accuracy questionable
• [plai 95] – Better structure but incomplete
• Lins in [jone 96] – Short on detail
Towards a well-structured, complete survey
Avoid centralised GC legacy
Insight into new areas for research
References: http://www.cs.ukc.ac.uk/people/staff/rej/gcbib.html
Model and terminology
Processes exchange messages
Failure model is fail-stop: no Byzantine failures
Mutators, local collectors, distributed collectors
Liveness by reachability
Entry and exit items
Local and global roots
• Local: roots for the process
• Global: entry items which may be reachable from a local root of another process
Root set
[Figure: root set, with labels “Local roots”, “Global roots” and “Remotely reachable”]
Properties of an ideal DGC
• safety: only garbage should be reclaimed.
• completeness: all objects, including components of distributed cycles, that are garbage at the start of a collection cycle should be reclaimed by its end.
• concurrency: neither mutator nor local collector processes should be suspended; distinct distributed collection processes should run concurrently.
• promptness: garbage should be reclaimed promptly.
• efficiency: time and space costs should be minimised.
• locality: inter-process communication should be minimised.
• expediency: garbage should be reclaimed despite the unavailability of parts of the system.
• scalability: the collector should scale to networks of many processes.
• fault tolerance: it should be robust against message delay, loss or replication, or process failure.
Strategy/policy/mechanism
Wilson suggests classification by strategy, policy and mechanism [wils 95].
Malloc example:
• Strategy: “don’t let small objects prevent reclamation of a larger contiguous area”
• Policy: best-fit
Most taxonomies are based on mechanisms
Conventional taxonomy
Conventional mechanisms
• Reference counting
• Mark-sweep
• Mark-compact
• Copying
Region organisation
• Single
• Large object area
• Generational
Parallelism
• Sequential
• Concurrent
• Incremental
[Figure: three-axis diagram with axes labelled by mechanism (RC, MS, MC, Copy), region organisation (Single, LOA, Generations) and parallelism]
Consequences
(Almost) all direct mechanisms are variants of simple reference counting.
All indirect mechanisms are tracing collectors.
Conventional conclusion: RC cannot reclaim garbage cycles.
Conventional conclusion: all complete algorithms are indirect.
What’s the problem?
Indirect collectors are better called “live object detectors”
They are set-difference algorithms: they must provide an estimate of the set of live objects.
Depending on conservatism of this estimate
• Not scalable: every site must participate, or
• Not complete: assume live if no other information
Synchronisation of phases is a bottleneck
Direct algorithms are inherently scalable.
• E.g. simple RC requires cooperation of only 3 objects
• Only necessary to visit objects that might be garbage
It is always safe for a direct algorithm to ‘give up’ early in discovering garbage
• At worst this defers reclamation (e.g. [weiz 69])
Barrier technology
Appropriateness of barrier technology changes as we move from centralised to distributed systems.
Read barriers are conventionally held to be expensive (as reads are much more common than writes).
But this overhead is diminished in the context of message passing.
Combinations of read and write barriers become viable.
Tricolour abstraction
Black
• object and its immediate descendants have been visited
• GC has finished with black objects and need not visit them again.
Grey
• object has been visited but its components may not have been scanned.
• or, for an incremental/concurrent GC, the mutator has rearranged connectivity of the graph.
• in either case, the collector must visit the object again.
White
• object is unvisited and, at the end of the phase, garbage.
A collection terminates when no grey objects remain, i.e. all live objects have been blackened.
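The colour rules above can be illustrated with a small worklist marker. This is a minimal single-process sketch, not a distributed collector; the name `tricolour_mark` and the dict-based heap representation are assumptions made for the example.

```python
from enum import Enum


class Colour(Enum):
    WHITE = 0  # unvisited; garbage if still white when marking ends
    GREY = 1   # visited, but its components are not yet scanned
    BLACK = 2  # visited, and all immediate descendants visited


def tricolour_mark(heap, roots):
    """Mark a heap given as {object: [referenced objects]} (hypothetical layout)."""
    colour = {obj: Colour.WHITE for obj in heap}
    grey = []  # worklist of grey objects
    for root in roots:
        if colour[root] is Colour.WHITE:
            colour[root] = Colour.GREY
            grey.append(root)
    # Marking terminates when no grey objects remain.
    while grey:
        obj = grey.pop()
        for child in heap[obj]:
            if colour[child] is Colour.WHITE:
                colour[child] = Colour.GREY
                grey.append(child)
        colour[obj] = Colour.BLACK
    return colour
```

In this sketch an unreachable cycle simply stays white, which is why a tracing collector can reclaim cyclic garbage.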
Barrier technology
There are two ways to prevent the mutator from interfering with a collection by writing white pointers into black objects.
1) Ensure the mutator never sees a white object
• when the mutator attempts to access a white object, the object is visited by the collector
• protect white objects with a read-barrier
2) Record where the mutator writes black-white pointers
• GC can (re)visit modified objects
• protect objects with a write-barrier
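The second option, a write barrier, might be sketched as follows. This is an illustrative, single-process model only; the class `IncrementalMarker` and its method names are hypothetical. The barrier reverts a black object to grey when a white pointer is stored into it, so the collector revisits it.

```python
class IncrementalMarker:
    """Incremental marker with a write barrier (illustrative sketch)."""

    def __init__(self, heap, roots):
        self.heap = heap  # {object: [referenced objects]}, shared with the mutator
        self.colour = {obj: "white" for obj in heap}
        self.grey = []
        for root in roots:
            self._shade(root)

    def _shade(self, obj):
        if self.colour[obj] == "white":
            self.colour[obj] = "grey"
            self.grey.append(obj)

    def step(self):
        """Scan one grey object; return False once no grey objects remain."""
        if not self.grey:
            return False
        obj = self.grey.pop()
        for child in self.heap[obj]:
            self._shade(child)
        self.colour[obj] = "black"
        return True

    def write(self, src, target):
        """Mutator stores a reference to target in src, through the barrier."""
        self.heap[src].append(target)
        if self.colour[src] == "black" and self.colour[target] == "white":
            # Record the black-white pointer: src must be visited again.
            self.colour[src] = "grey"
            self.grey.append(src)
```

Without the check in `write`, an object stored into an already-black object after its scan would stay white and be reclaimed while still reachable.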
Additional goals
Necessity for compromises
• scalability, fault-tolerance and efficiency may only be achievable at the expense of completeness,
• concurrency introduces synchronisation overheads.
Lack of empirical data
A further goal:
• Flexibility: the collector should be configurable, guided by heuristics or hints
A more appropriate taxonomy
A simple, orthogonal taxonomy that nevertheless captures all proposed DGC algorithms
Indirect
• Non-tracing
• Tracing
Direct
• Non-tracing
• Tracing
Note also Louboutin’s Proactive/Reactive taxonomy [loub 98] (more later)
Indirect, non-tracing DGC
Global-root graph reconstruction
• Liskov & Ladin algorithms [lisk 86, ladi 92]
• (replicated) Central service + Clients
• Client local GC passes Service lists
– Acc: all non-resident objects reachable from local roots
– Paths: all pairs (g1, g2) where g2 is a remote global root reachable from locally unreachable global root g1
– Trans: references in transit
• Service reconstructs graph of global roots
• Periodically each Client queries Service, asking which of its global roots are no longer globally reachable
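A rough sketch of the Service's reconstruction step: treat everything in Acc and Trans as live, follow Paths transitively, and report the remaining global roots. The function name and the flat, already-merged inputs are assumptions for illustration; the published algorithms also deal with replication and message ordering, which this ignores.

```python
def unreachable_global_roots(global_roots, acc, paths, trans):
    """Return the global roots not reachable in the reconstructed graph.

    acc:   global roots reachable from some process's local roots
    paths: (g1, g2) pairs, g2 reachable from locally unreachable g1
    trans: references in transit, conservatively treated as live
    """
    successors = {}
    for g1, g2 in paths:
        successors.setdefault(g1, []).append(g2)

    live = set(acc) | set(trans)
    work = list(live)
    while work:
        g = work.pop()
        for h in successors.get(g, []):
            if h not in live:
                live.add(h)
                work.append(h)
    return set(global_roots) - live
```

For instance, two global roots that reference only each other appear in Paths but never in Acc, so both are reported as unreachable.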
Leases
• Provide fault tolerance by preventing leaks in the face of remote process failure
• Clients take out a lease on a remote object
• Until this lease expires, object is protected from local collector
• Java RMI:
– Lease default is 10 minutes
– Leases renewed every 5 minutes
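The lease idea can be sketched with an owner-side table of expiry times. This is not the Java RMI API; `LeaseTable` and its methods are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
class LeaseTable:
    """Owner-side record of leases held by remote clients (illustrative)."""

    def __init__(self, duration=600.0):
        self.duration = duration  # lease length in seconds (10 minutes here)
        self.expiry = {}          # (client, object) -> expiry time

    def renew(self, client, obj, now):
        """Grant or renew a lease starting at time `now`."""
        self.expiry[(client, obj)] = now + self.duration

    def protected(self, now):
        """Drop expired leases and return the objects still protected."""
        self.expiry = {key: t for key, t in self.expiry.items() if t > now}
        return {obj for (_client, obj) in self.expiry}
```

If a client crashes it stops renewing, its lease lapses, and the object is no longer protected from the local collector, so the failure does not cause a permanent leak.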
Indirect, tracing DGC
Classify by the degree of synchronisation required.
Centralised: single initiating process
• Examples: [huda 82, augu 87, juul 92]
Partitioned: a partition of processes cooperates to collect independently of other processes
• Example: [lang 92 a]
Autonomous: multiple, simultaneous collections
• Timestamp propagation: pipelined collections
• Examples: [hugh 85, fess 98]
Augusteijn
Process initiates collection (becoming active disquiet)
• Sends scan request to remote processes for which it holds references
On receipt of a scan request, a process that is
• disquiet: adds request to its work queue, ACKs immediately
• quiet: processes request (becoming disquiet), ACKs on completion
Stable algorithm.
• Only disquiet processes can send requests.
• Always a chain of responsibility from each disquiet process back to the active process.
Marking terminates when initiator has received ACK of each scan request sent
[Figure: state diagram with states “Active Disquiet”, “Passive Disquiet” and “Passive Quiet”; transitions labelled “received mark request” and “received all ACKs”]
Garbage collecting the world
1. Processes negotiate partitions
2. Processes send decrement messages from each exit-item
• At end, entry-items with positive counters (hard) are reachable from outside the group; other entry-items are soft.
3. Global mark within the group from
• local roots and black entry-items, propagating black
• soft entry-items, marking unvisited items soft
4. Detect termination and reclaim soft entry-items
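The group-collection steps can be approximated in a sequential sketch. The function name, the data layout and the use of a single "hard" mark are assumptions; in this simplified model an entry-item that never receives a hard mark is exactly one that stays soft, so the separate soft-marking pass is omitted. Negotiation, messaging and termination detection are not modelled.

```python
def collect_group(group, roots, edges, owner, entry_counts):
    """Return the entry-items of `group` that can be reclaimed.

    group:        set of process ids forming the group
    roots:        objects held by local roots
    edges:        {object: [referenced objects]}
    owner:        {object: process id}
    entry_counts: {entry-item: number of remote references to it}
    """
    # Step 2: on a copy of the counters, discount references from inside the group.
    counters = {e: c for e, c in entry_counts.items() if owner[e] in group}
    for src, targets in edges.items():
        if owner[src] not in group:
            continue
        for dst in targets:
            if dst in counters and owner[dst] != owner[src]:
                counters[dst] -= 1
    # Entry-items still positive are referenced from outside the group.
    hard = {e for e, c in counters.items() if c > 0}

    # Step 3: mark within the group from local roots and hard entry-items.
    marked = set()
    stack = [o for o in roots if owner[o] in group] + list(hard)
    while stack:
        obj = stack.pop()
        if obj in marked or owner[obj] not in group:
            continue
        marked.add(obj)
        stack.extend(edges.get(obj, []))

    # Step 4: entry-items never marked hard are garbage within the group.
    return {e for e in counters if e not in marked}
```

A two-process cycle with no outside referrer has both counters driven to zero and is reclaimed, while one extra reference from outside the group keeps the whole cycle alive.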
Timestamp algorithms
All global objects contain time-stamps.
Time-stamp of new object is local time.
Local GC propagates time-stamps to remote objects
• Time-stamp of remote object is increased if lower than value in message
Intuition: time-stamp of garbage never increases
Process p has time-stamp redo_p ≤ time-stamp of any live object in this process
minredo = min { redo_p | p ∈ processes }
Any object with time-stamp < minredo is garbage.
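A small sketch of the two rules on this slide: raising a remote object's time-stamp on receipt of a message, and testing against the global minimum. The function names are hypothetical, and the strict comparison against minredo is an assumption about the threshold test.

```python
def receive_stamp(stamps, obj, stamp):
    """Raise obj's time-stamp if the propagated value is higher."""
    if stamps[obj] < stamp:
        stamps[obj] = stamp


def timestamp_garbage(stamps, redo):
    """Return objects whose time-stamp lies below the global minimum.

    stamps: {object: time-stamp}
    redo:   {process: redo value}, assumed not to exceed the time-stamp
            of any live object in that process
    """
    minredo = min(redo.values())
    # Assumption: the garbage test is a strict comparison.
    return {obj for obj, stamp in stamps.items() if stamp < minredo}
```

Because stamps only ever increase for objects that are still being reached, an object whose stamp has fallen behind every process's redo value cannot be live.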
Direct, non-tracing DGC
Reference Counting
• Standard RC algorithm insufficient: race conditions
• Simple protocol to avoid premature reclamation [lerm 86]
• Weighted RC avoids race: doesn’t send INC messages [beva 87, wats 87]
• Diffusion tree algorithms [piqu 91, more 98 a]
Reference Listing
• Maintain lists of processes holding reference to global root rather than a count
• More fault tolerant
• Examples: Network Objects [birr 93], SSP chains [shap 92 a]
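Weighted reference counting can be sketched as below. Copying a reference splits its weight locally, so no increment message is sent to the owner; only deletion contacts the target. The class names, the initial weight of 64 and the error on exhausted weight are illustrative assumptions.

```python
class Target:
    """A remotely referenced object; `weight` is the total held by references."""

    def __init__(self, weight=64):
        self.weight = weight


class WeightedRef:
    def __init__(self, target, weight):
        self.target = target
        self.weight = weight

    def copy(self):
        """Duplicate the reference by splitting its weight; target untouched."""
        if self.weight < 2:
            # Assumption: a full scheme would introduce an indirection here.
            raise ValueError("reference weight exhausted")
        half = self.weight // 2
        self.weight -= half
        return WeightedRef(self.target, half)

    def drop(self):
        """Delete the reference; models the decrement message to the owner."""
        self.target.weight -= self.weight
        self.weight = 0


def new_object(weight=64):
    target = Target(weight)
    return target, WeightedRef(target, weight)


def is_garbage(target):
    return target.weight == 0
```

The invariant is that the target's weight equals the sum of the weights of all outstanding references, so a weight of zero means no reference remains.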
Copying
• ‘Move’ locally unreachable global objects to site that references them; requires an ordering on sites
• ‘Move’ may be real or virtual [bish 77, vest 87, shap 90, huds 97]
Causal dependency tracking
• Analyse mutator’s computation graph directly [sche 89, loub 98]
Direct, tracing DGC
Completeness requires tracing.
Direct algorithms offer scalability.
How can we combine these ideas to produce effective DGCs?
Back-tracing
• Examples: [fuch 95, mahe 97]
Partial tracing
• Example: [rodr 98]
Back-tracing
Identify suspects
Back-step(o) for some exit-item o
Back-step(e)
• If e is not a suspect, return Live
• If e is marked, return Garbage
• Mark e
• For each remote object r pointing to e: if Back-step(r) is Live, return Live
• Return Garbage
Problem of multiple overlapping traces
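The Back-step procedure translates almost directly into a recursive function. This sketch assumes a single trace over a static graph with a precomputed `referrers` map (both assumptions); it does not address the overlapping-trace problem.

```python
def back_step(e, suspects, referrers, marked=None):
    """Return True if e is found to be live, False if it is garbage.

    suspects:  set of objects suspected to be garbage
    referrers: {object: [objects pointing to it]} (hypothetical inverse map)
    marked:    objects already visited by this back-trace
    """
    if marked is None:
        marked = set()
    if e not in suspects:
        return True    # reached a non-suspect: Live
    if e in marked:
        return False   # already visited on this trace: Garbage
    marked.add(e)
    for r in referrers.get(e, ()):
        if back_step(r, suspects, referrers, marked):
            return True
    return False
```

The trace walks references backwards, so it only touches objects that can reach the suspect rather than the whole live graph.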
Partial tracing
Identify suspects
1. Mark red from these suspects
• Construct ‘red sets’ (akin to ‘client sets’)
• Dynamically forms a group
2. Scan suspects whose red and client sets differ
• Rescues objects inadvertently marked red: mark them green
• Run all scans concurrently
3. Reclaim any red objects
Group merger scheme permits multiple, overlapping collections
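The three phases might be modelled as follows. This is a simplified, sequential sketch: red sets and client sets are kept per object as sets of referring objects, which is an assumption made for brevity, and mutator activity, barriers and group merging are not modelled.

```python
def partial_trace(suspect, edges, clients):
    """Return the objects reclaimed by one partial trace from `suspect`.

    edges:   {object: [referenced objects]}
    clients: {object: set of all objects known to reference it}
    """
    # Phase 1: mark red from the suspect, recording who reached each object.
    red = {suspect}
    red_sets = {suspect: set()}
    stack = [suspect]
    while stack:
        obj = stack.pop()
        for target in edges.get(obj, []):
            red_sets.setdefault(target, set()).add(obj)
            if target not in red:
                red.add(target)
                stack.append(target)

    # Phase 2: an object whose red set differs from its client set has a
    # referrer outside the red subgraph; rescue it and its red descendants.
    green = set()
    stack = [o for o in red if red_sets[o] != clients.get(o, set())]
    while stack:
        obj = stack.pop()
        if obj in green:
            continue
        green.add(obj)
        stack.extend(t for t in edges.get(obj, []) if t in red)

    # Phase 3: whatever is still red is garbage.
    return red - green
```

Only objects reachable from the suspect are ever visited, which is what lets the trace stay local to the suspected garbage.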
[Figure: Mark-red]
[Figure: Scan]
Benefits
Both schemes
• Are direct: attempt to trace only garbage
• Are scalable: can limit extent of trace (by process, by object, by hop-count, …)
• No global synchronisation
• Can take advantage of heuristics
Partial tracing
• Mark-red does not have to synchronise with mutators
• Scan synchronisation through read and write barriers, DGC piggy-backs on mutator messages for fault tolerance
• [rodr 98] shows how to manage overlapping traces
What this taxonomy offers
• Simple and orthogonal approach
• Offers complete taxonomy
• Not distracted by legacy of centralised GC
• Identifies new, scalable approaches
Louboutin taxonomy
Global garbage detection
• Proactive
– In-situ graph colouring
– Global-root graph reconstruction comprehensive
• Reactive
– Time-stamp packet distribution
– IRC
– WRC
– RL
In practice
Many direct, acyclic schemes
Tracing is only needed to recover cycles
How do these arise in practice?
• Stereotypes
– A holds reference to B, and B holds reference to A
– E.g. callbacks
– Use a Design Pattern to manage these by explicitly dropping references, e.g. Client sends Disconnect; Server drops callback
• General patterns
– Do these arise?
Further research
Object demographics
• Study real applications
• How do distributed objects behave? How much of this behaviour is imposed by limits of DGC technology?
Frameworks for DGC
• Build a framework into which component DGCs could be plugged, cf. Sun’s RVM for Java
Comparative analysis
• How do different DGCs perform against different applications?
• Allow developers to pick