- Slides: 24
Chameleon: Automatic Selection of Collections
Ohad Shacham (Tel Aviv University); Martin Vechev and Eran Yahav (IBM T. J. Watson Research Center)
Presented by: Yingyi Bu
Collections
- Abstract data types
- Many implementations
- Different space/time tradeoffs
- Incompatible selection might lead to
  - Runtime degradation
  - Space bloat (wasted space)
- Set: HashSet, LinkedSet, ArraySet, LazySet
- Map: HashMap, LinkedMap, ArrayMap, LazyMap
- List: LinkedList, ArrayList, LazyList
Collection Bloat
- Collection bloat is unjustified space overhead for storing data in collections
- Example: List s = new ArrayList(); s.add(1);
  - The backing array holds the single element 1
  - Bloat for s is 9
Collection Bloat
- Collection bloat is a serious problem in practice
  - Observed to occupy 90% of the heap in real-world applications
- Hard to detect and fix
  - Accumulation: death by a thousand cuts
  - Correction: need to correlate bloat to program code
- How to pick the right implementation?
  - Minimize bloat
  - But without degrading running time
Our Vision
- Programmer declares the ADT to be used
  - Set s = new Set();
- Programmer defines what metric to optimize
  - e.g. space-time
- Runtime automatically selects implementation based on metric
  - Online: detect application usage of Set
  - Online: select appropriate implementation of Set (HashSet, ArraySet, LinkedSet, …)
This Work
- Programmer defines the implementation to be used
  - Set s = new HashSet();
- Programmer defines what metric to optimize
  - Space-time product
  - Space = bloat
- Runtime suggests implementation based on metric
  - Online: automatically detect application usage of HashSet
  - Online: automatically suggest alternative to HashSet
  - Offline: programmer modifies program accordingly
    - e.g. Set s = new ArraySet();
How Can We Calculate Bloat?
- Data structure bloat
  - Bloat = Occupied Data - Used Data
- Example: List s = new ArrayList(); s.add(1);
  - The backing array holds the single element 1
  - Bloat for s is 9
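The subtraction on this slide can be made concrete with a toy array-backed list. This is a minimal sketch, not the JDK's ArrayList: the class SlotList, its methods occupiedSlots, usedSlots, and bloat, and the explicit capacity of 10 are all assumptions chosen so the numbers match the slide's example. Bloat is counted in slots here, not bytes.

```java
import java.util.Arrays;

// Hypothetical array-backed list used only to illustrate the bloat formula.
final class SlotList {
    private Object[] slots;
    private int size;

    SlotList(int capacity) {
        slots = new Object[capacity];
    }

    void add(Object value) {
        if (size == slots.length) {
            // Grow by doubling when full (an arbitrary policy for this sketch).
            slots = Arrays.copyOf(slots, Math.max(1, slots.length * 2));
        }
        slots[size++] = value;
    }

    int occupiedSlots() { return slots.length; } // occupied data
    int usedSlots()     { return size; }         // used data
    int bloat()         { return occupiedSlots() - usedSlots(); }
}

final class BloatDemo {
    public static void main(String[] args) {
        SlotList s = new SlotList(10); // assumed capacity of 10 slots
        s.add(1);
        System.out.println("Bloat for s is " + s.bloat()); // 10 - 1 = 9
    }
}
```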
How to Detect Collection Bloat?
- Each collection maintains a field for used data
- Language runtime can find out actually occupied data
  - Bloat = Occupied Data - Used Data
- Solution: garbage collector computes bloat online
  - Reads used-data fields from collections
  - Low overhead: can work online in production
Semantic Maps
- How collections communicate information to the GC
  - Includes size and pointers to actual data fields
  - Allows for trivial support of custom collections
- ArrayList semantic map covers the fields int size and Object[] array
- HashMap semantic map covers the fields elementCount and elementData
- The GC derives used data and occupied data from the semantic maps
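One way to picture a semantic map is as a small per-class descriptor that tells a heap walker which field gives the used data and which gives the occupied data. The sketch below is an illustration under assumptions, not the mechanism inside IBM's JVM: SemanticMap, SemanticMapRegistry, ArrayBag, and bloatOf are hypothetical names, and the "GC" is simulated by an ordinary method call.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

// Hypothetical descriptor: how to read used and occupied data from a collection.
final class SemanticMap<T> {
    final ToIntFunction<T> usedData;
    final ToIntFunction<T> occupiedData;

    SemanticMap(ToIntFunction<T> usedData, ToIntFunction<T> occupiedData) {
        this.usedData = usedData;
        this.occupiedData = occupiedData;
    }
}

// Hypothetical custom collection with a fixed backing array (no growth; demo only).
final class ArrayBag {
    final Object[] array = new Object[10];
    int size;

    void add(Object o) { array[size++] = o; }
}

// Stand-in for the part of the collector that consults semantic maps.
final class SemanticMapRegistry {
    private final Map<Class<?>, SemanticMap<?>> maps = new HashMap<>();

    <T> void register(Class<T> type, SemanticMap<T> map) {
        maps.put(type, map);
    }

    @SuppressWarnings("unchecked")
    int bloatOf(Object collection) {
        SemanticMap<Object> map = (SemanticMap<Object>) maps.get(collection.getClass());
        if (map == null) {
            return 0; // no semantic map registered: not treated as a collection
        }
        return map.occupiedData.applyAsInt(collection) - map.usedData.applyAsInt(collection);
    }
}

final class SemanticMapDemo {
    public static void main(String[] args) {
        SemanticMapRegistry registry = new SemanticMapRegistry();
        // Supporting a custom collection only requires registering its map.
        registry.register(ArrayBag.class,
                new SemanticMap<ArrayBag>(b -> b.size, b -> b.array.length));

        ArrayBag bag = new ArrayBag();
        bag.add("x");
        System.out.println("bloat = " + registry.bloatOf(bag));
    }
}
```

Keeping the descriptor separate from the collection class is what makes custom collections cheap to support in this sketch: the walker never needs to know the class in advance.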
Example: Collections Bloat in TVLA (figure slides; the chart is annotated with a lower bound for bloat)
Fixing Bloat
- Must correlate all bloat stats to program point
  - Need trace information
- Remember: do not want to degrade time
Correlating Code and Bloat
- Aggregate bloat potential per allocation context
- Done by the garbage collector

    public final class ConcreteKAryPredicate extends ConcretePredicate {
        …
        public void modify() {
            …
            values = HashMapFactory.make(this.values);
        }
        …
    }

    public class GenericBlur extends Blur {
        …
        public void blur(TVS structure) {
            …
            Map invCanonicName = HashMapFactory.make(structure.nodes().size());
            …
        }
    }

    public class HashMapFactory {
        public static Map make(int size) {
            return new HashMap(size);
        }
    }

Per-context figures shown on the slide: Ctx 1 40%, Ctx 2 11%, Ctx 3 5%, Ctx 4 7%, Ctx 5 5%, Ctx 6 3%, Ctx 7 7%, Ctx 8 3%
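Aggregating bloat per allocation context amounts to summing occupied-minus-used over all collections allocated from the same context. The sketch below is a hedged illustration: BloatByContext, record, and sharePercent are hypothetical names, contexts are plain strings rather than real call stacks, and expressing each context as a share of the total recorded bloat is an assumption about how the slide's percentages could be computed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical accumulator: total bloat per allocation context.
final class BloatByContext {
    private final Map<String, Long> bloatPerContext = new HashMap<>();
    private long totalBloat;

    // Called once per live collection, e.g. during a heap traversal.
    void record(String context, long occupied, long used) {
        long bloat = occupied - used;
        bloatPerContext.merge(context, bloat, Long::sum);
        totalBloat += bloat;
    }

    // Share of the total recorded bloat attributed to one context, in percent.
    double sharePercent(String context) {
        if (totalBloat == 0) {
            return 0.0;
        }
        return 100.0 * bloatPerContext.getOrDefault(context, 0L) / totalBloat;
    }
}

final class BloatByContextDemo {
    public static void main(String[] args) {
        BloatByContext stats = new BloatByContext();
        // Context labels are illustrative placeholders, not real TVLA call sites.
        stats.record("ctx 1", 100, 60);
        stats.record("ctx 2", 80, 20);
        System.out.println("ctx 1 share: " + stats.sharePercent("ctx 1") + "%");
    }
}
```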
Trace Information
- Track collection usage in library:
  - Distribution of operations
  - Distribution of size
  - Aggregated per allocation context
- ctx 1: Size = 7, Get = 3, Add = 9, …
- ctx 2: Size = 1, Contains = 100, Insert = 1, …
- ctx 3: Size = 103, Contains = 10041, Insert = 140, Remove = 20, …
- ctx i: …
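The per-context counters on this slide could be collected by having each instrumented collection report its operations to a shared profile keyed by allocation context. The following is a minimal sketch under assumptions: UsageStats, TraceProfile, and the string operation names are hypothetical, and the context is passed in explicitly instead of being captured from the call stack.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-context record: operation counts and maximal observed size.
final class UsageStats {
    private final Map<String, Long> opCounts = new HashMap<>();
    private int maxSize;

    void recordOp(String op, int sizeAfter) {
        opCounts.merge(op, 1L, Long::sum);
        maxSize = Math.max(maxSize, sizeAfter);
    }

    long count(String op) { return opCounts.getOrDefault(op, 0L); }
    int maxSize()         { return maxSize; }
}

// Hypothetical profile: one UsageStats per allocation context.
final class TraceProfile {
    private final Map<String, UsageStats> byContext = new HashMap<>();

    UsageStats forContext(String context) {
        return byContext.computeIfAbsent(context, k -> new UsageStats());
    }
}

final class TraceDemo {
    public static void main(String[] args) {
        TraceProfile profile = new TraceProfile();
        UsageStats ctx2 = profile.forContext("ctx 2");

        // Reproduce the shape of the slide's ctx 2 row: one insert, 100 lookups.
        ctx2.recordOp("insert", 1);
        for (int i = 0; i < 100; i++) {
            ctx2.recordOp("contains", 1);
        }
        System.out.println("Size = " + ctx2.maxSize()
                + ", Contains = " + ctx2.count("contains")
                + ", Insert = " + ctx2.count("insert"));
    }
}
```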
But how to choose the new Collection?
- Rule engine: user-defined rules
  - Input: heap and trace statistics per context
  - Output: suggested collection for that context
- Rules based on trace and heap information
  - HashMap: #contains < X and maxSize < Y → ArrayMap
  - HashMap: #contains < X and maxSize < Y+10 and %liveHeap > Z → ArrayMap
- Example rule-engine contents
  - HashMap: maxSize < X → ArrayMap
  - LinkedList: NoListOp → ArrayList
  - HashMap: (#contains < X and maxSize < Y+10 and %liveHeap > Z) → ArrayMap
  - …
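Rules of the form "type: condition → suggestion" map naturally onto predicates over per-context statistics. The sketch below is one possible encoding, not Chameleon's rule language: ContextProfile, SelectionRule, and RuleEngine are hypothetical names, and the concrete thresholds standing in for X, Y, and Z are placeholders, since the slides leave them unspecified.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical per-context input: trace and heap statistics.
final class ContextProfile {
    final String collectionType;
    final long containsCount;
    final int maxSize;
    final double liveHeapPercent;

    ContextProfile(String collectionType, long containsCount, int maxSize, double liveHeapPercent) {
        this.collectionType = collectionType;
        this.containsCount = containsCount;
        this.maxSize = maxSize;
        this.liveHeapPercent = liveHeapPercent;
    }
}

// One rule: applies to a source type, fires when the condition holds.
final class SelectionRule {
    final String collectionType;
    final Predicate<ContextProfile> condition;
    final String suggestion;

    SelectionRule(String collectionType, Predicate<ContextProfile> condition, String suggestion) {
        this.collectionType = collectionType;
        this.condition = condition;
        this.suggestion = suggestion;
    }
}

final class RuleEngine {
    private final List<SelectionRule> rules = new ArrayList<>();

    void addRule(String collectionType, Predicate<ContextProfile> condition, String suggestion) {
        rules.add(new SelectionRule(collectionType, condition, suggestion));
    }

    // Returns the suggestion of the first matching rule, if any.
    Optional<String> suggest(ContextProfile profile) {
        for (SelectionRule rule : rules) {
            if (rule.collectionType.equals(profile.collectionType) && rule.condition.test(profile)) {
                return Optional.of(rule.suggestion);
            }
        }
        return Optional.empty();
    }
}

final class RuleEngineDemo {
    // Placeholder thresholds; the slides do not give values for X, Y, Z.
    static final long X = 1000;
    static final int Y = 16;

    public static void main(String[] args) {
        RuleEngine engine = new RuleEngine();
        engine.addRule("HashMap", p -> p.containsCount < X && p.maxSize < Y, "ArrayMap");

        ContextProfile small = new ContextProfile("HashMap", 100, 1, 5.0);
        System.out.println(engine.suggest(small).orElse("keep current implementation"));
    }
}
```

Taking the first matching rule is a design choice of this sketch; a real engine might instead rank all matching rules by expected saving.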
Overall Picture (diagram)
- Components: Program, Semantic maps, Semantic Profiler, Rules, Rule Engine
- Outputs: Potential report, Recommendations
- Per-context statistics shown in the diagram
  - ctx 1: Size = 7, Get = 3, Add = 9, …
  - ctx 2: Size = 1, Contains = 100, Insert = 1, …
- Rules shown in the diagram
  - HashMap: maxSize < X → ArrayMap
  - LinkedList: NoListOp → ArrayList
  - HashMap: (#contains < X and maxSize < Y+10 and %liveHeap > Z) → ArrayMap
  - …
Correct Collection Bloat – Typical Usage
- Step 1: Profile for bloat without context
  - Low overhead, can run in production
  - If a problem is detected, go to step 2
- Step 2: Combine heap information with trace information per context
  - Can switch automatically to step 2 from step 1
  - Higher overhead than step 1
  - Automatic: prior to Chameleon, a manual step (very hard)
- Step 3: Suggest fixes to user based on rules
  - Automatic
- Step 4: Programmer applies suggested fixes
  - Manual
Chameleon on TVLA
- 1: HashMap: tvla...HashMapFactory:31; tvla.core.base.BaseTVS:50
  - Suggestion: replace with ArrayMap
- …
- 4: ArrayList: BaseHashTVSSet:112; tvla...base.BaseHashTVSSet:60
  - Suggestion: set initial capacity

              Max   Avg     Stddev
  Size        15    11.33   1.36
  Operations  26    6.31    5.05
  Potential   7     4.8     1.17
Implementation
- Built on top of IBM's JVM
- Modifications to the parallel mark-and-sweep GC
  - Modular changes, readily applicable to other GCs
- Modifications to collection libraries
- Runtime overhead
  - Detection phase: negligible
  - Correction phase: ~2x (due to cost of getting context)
    - Can use PCC by Bond & McKinley
Experimental Results – Memory
Experimental Results – Time
Related Work
- Large volume of work on SETL
  - Automatic data structure selection in SETL [Schonberg et al., POPL'79]
  - SETL representation sublanguage [Dewar et al., TOPLAS'79]
  - …
- Bloat
  - The Causes of Bloat, The Limits of Health [Mitchell and Sevitsky, OOPSLA'07]
Summary
- Collection selection is a real problem
  - Runtime penalty
  - Bloat
- Chameleon integrates trace and heap information for choosing a collection implementation
  - Based on predefined rules
- Using Chameleon, the footprint of several applications was reduced
  - Never degrading running time, often improving it
- First step towards automatic collection selection as part of the runtime system