No Bit Left Behind: The Limits of Heap Data Compression
Jennifer B. Sartor*, Martin Hirzel†, Kathryn S. McKinley*
*U Texas at Austin, †IBM Watson
Department of Computer Sciences
ISMM 2008

Current State
o Managed languages ubiquitous
o Embedded devices
o Multicore
[Diagram labels: two CPUs, two L1 caches, one L2 cache]
o Need memory efficiency!

Memory Efficiency of Managed Languages
COST
o 8-94% information content in heap in 37 benchmarks [Mitchell & Sevitsky, OOPSLA 07]
o Boxed objects
o Trailing zeros in arrays
o Redundant objects
o Extra bit-width
o Data structure back-bones
bzip2: 86%
OPPORTUNITY
o Memory layout abstraction
o (Location + size) != identity

Related Work
Ananian & Rinard. LCTES 03: Dom value field hash
Appel & Goncalves. Tech Report 93: Eql obj sharing, Const field elide, Bit-width reduction
Chen, Kandemir & Irwin. VEE 05: Dom value field elide
Chen, et al. OOPSLA 03: Zero compr, Trail zero trim
Cooprider & Regehr. PLDI 07: Value set indirection
Marinov & O’Callahan. OOPSLA 03: Eql obj sharing
Stephenson, Babb & Amarasinghe. PLDI 00: Const field elide, Bit-width reduction
Titzer, et al. PLDI 07: Value set indirection
Zilles. ISMM 07: Bit-width reduction

Limit Study
o Quantitatively compare heap data compression
  - Surveyed literature
  - Savings equations
  - Methodology for evaluation
  - Apples-to-apples comparison
  - Future work: implementation
o Hybrid techniques
[Callout: 58%]
o Findings: array & hybrid compression

Hybrid Array Compression
Example array: 0x0001 0x0058 0x0001 0x0004 0x0001 0x0000 0x0001
o Redundancy
  - Equal array sharing

Equal Object Sharing
o Marinov & O’Callahan. OOPSLA 03; Appel & Goncalves. Tech Report 93
[Callout: 14%]
o Two objects are equal if both have
  - Same class & all fields have same value
o Strictly-equal: pointer fields identical
o Deep-equal: the objects' pointer targets are equal
o JVM stores only 1 copy in hashtable
o Class C, N objects, D distinct; save: [equation image not recovered]

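The strictly-equal case above can be sketched in a few lines of Python. This is a minimal illustration, not the mechanism from the cited papers: objects are modeled as hypothetical (class name, field tuple) pairs, pointer fields are assumed to be plain integers standing in for addresses, and the savings estimate of (N - D) times the object size is an assumption that ignores hashtable overhead, since the slide's own savings equation did not survive.

```python
from collections import defaultdict

def strict_key(obj):
    """Key for strict equality: same class and identical field values.

    obj is a hypothetical (class_name, fields) pair; pointer fields are
    modeled as integers, so they match only if they are identical.
    """
    class_name, fields = obj
    return (class_name, tuple(fields))

def share_equal_objects(heap):
    """Keep one canonical copy per distinct object in a hash table.

    Returns the table plus, per class, the pair (N objects, D distinct).
    """
    canonical = {}                              # key -> first object seen
    per_class = defaultdict(lambda: [0, set()])  # class -> [N, distinct keys]
    for obj in heap:
        key = strict_key(obj)
        canonical.setdefault(key, obj)
        stats = per_class[obj[0]]
        stats[0] += 1
        stats[1].add(key)
    counts = {c: (s[0], len(s[1])) for c, s in per_class.items()}
    return canonical, counts

def estimated_savings(n, d, object_size):
    # Assumption: every duplicate beyond the D distinct objects frees
    # object_size bytes; the cost of the hash table is not modeled.
    return (n - d) * object_size
```

For example, a heap holding three Point objects of which two are equal gives N = 3 and D = 2 for that class, so under the stated assumption one object's worth of space is saved.
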
Hybrid Array Compression
Example array: 0x0001 0x0058 0x0001 0x0004 0x0001 0x0000 0x0001
o Redundancy
  - Equal array sharing
  - Value set indirection
Dictionary: 0x0001 0x0058 0x0004 0x0000
Indices: 0 1 0 2 0 3 0

Value Set Indirection & Caching
o Cooprider & Regehr / Titzer, et al. PLDI 07
o For object fields or array elements with a large range of values
  - Dictionary (or cache) of 256 most frequent values; instance stores small 1-byte indices
  - If > 256 values, 255 in dictionary; 256th index says to store rest (M) in hashtable w/ objectID
[Callout: 14%]

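The dictionary scheme above can be sketched as follows. This is an illustrative Python model under stated assumptions, not the implementation from the cited papers: the dictionary is built per array rather than per field or type, the escape index is taken to be 255, and the overflow hashtable is keyed by a hypothetical (object_id, position) pair.

```python
from collections import Counter

ESCAPE = 255  # assumed index meaning "value lives in the overflow table"

def encode(values, object_id, overflow):
    """Replace each value by a 1-byte index into a frequency dictionary."""
    counts = Counter(values)
    if len(counts) <= 256:
        # Every distinct value fits; all 256 indices are real entries.
        dictionary = [v for v, _ in counts.most_common()]
    else:
        # Keep the 255 most frequent; index 255 escapes to the hashtable.
        dictionary = [v for v, _ in counts.most_common(255)]
    index_of = {v: i for i, v in enumerate(dictionary)}
    indices = bytearray()
    for pos, v in enumerate(values):
        if v in index_of:
            indices.append(index_of[v])
        else:
            indices.append(ESCAPE)
            overflow[(object_id, pos)] = v
    return dictionary, indices

def decode(dictionary, indices, object_id, overflow):
    """Invert encode: look up each index, or the overflow table on escape."""
    out = []
    for pos, i in enumerate(indices):
        if i < len(dictionary):
            out.append(dictionary[i])
        else:
            out.append(overflow[(object_id, pos)])
    return out
```

Applied to the example array from the hybrid compression slide, the sketch yields the four-entry dictionary and one byte per element in place of each two-byte value.
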
Hybrid Array Compression 2
Example array: 0x00A0 0x0073 0x0002 0x0001 0x0101 0x0000
o Remove zeros
  - Trim trailing zeros: 0x00A0 0x0073 0x0002 0x0001 0x0101 [8 5]
  - Bit width reduce: 0x0A0 0x073 0x002 0x001 0x101 [8 5]
  - Zero compress: [8 5] 10101111 = 0xAF, 0x0A 0x73 0x2 0x001 0x101

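The first two steps on this slide, trailing zero trimming and bit-width reduction, can be sketched in Python. The sketch assumes unsigned element values; the helper names are hypothetical, and rounding the width up to a 4-bit granule is an assumption chosen only because the slide draws the reduced elements with three hex digits.

```python
def trim_trailing_zeros(values):
    """Drop zero elements at the end; keep the original length so the
    array can be restored by padding with zeros."""
    end = len(values)
    while end > 0 and values[end - 1] == 0:
        end -= 1
    return len(values), values[:end]

def min_bit_width(values):
    """Smallest number of bits that can hold every (unsigned) element."""
    return max((v.bit_length() for v in values), default=0)

def round_up_bits(bits, granule=4):
    # Assumption: widths are rounded up to a multiple of `granule` bits.
    return (bits + granule - 1) // granule * granule

def restore(original_length, kept):
    """Undo trimming by re-appending the removed zeros."""
    return kept + [0] * (original_length - len(kept))
```

For the example array, trimming removes the final 0x0000, and the largest remaining element, 0x0101, needs 9 bits, which rounds up to the 12 bits (three hex digits) shown on the slide.
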
Zero-based Object Compression
o Chen, et al. OOPSLA 03
o Remove bytes that are entirely zero
o Per-object bit-map: 1 bit per byte
o Store only non-zero bytes
o Savings: [equation image not recovered]
[Callout: 45%]

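The bitmap scheme in these bullets can be sketched as below. This is a minimal Python model, assuming the object is available as a flat byte string and that bit i of the bitmap (most significant bit first) marks whether byte i is non-zero; the bit ordering and function names are illustrative choices, not details taken from Chen, et al.

```python
def zero_compress(data: bytes):
    """Return (length, bitmap, payload): one bitmap bit per original
    byte, and only the non-zero bytes in the payload."""
    bitmap = bytearray((len(data) + 7) // 8)
    payload = bytearray()
    for i, b in enumerate(data):
        if b != 0:
            bitmap[i // 8] |= 0x80 >> (i % 8)  # mark byte i as non-zero
            payload.append(b)
    return len(data), bytes(bitmap), bytes(payload)

def zero_decompress(length, bitmap, payload):
    """Rebuild the original bytes from the bitmap and payload."""
    out = bytearray(length)          # starts as all zeros
    stored = iter(payload)
    for i in range(length):
        if bitmap[i // 8] & (0x80 >> (i % 8)):
            out[i] = next(stored)
    return bytes(out)
```

In this model an 8-byte object with four zero bytes shrinks to 1 bitmap byte plus 4 payload bytes; objects with few zero bytes can grow slightly because the bitmap is always stored.
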
Hybrid Array Compression 2
Example array: 0x00A0 0x0073 0x0002 0x0001 0x0101 0x0000
o Remove zeros
  - Trim trailing zeros: 0x00A0 0x0073 0x0002 0x0001 0x0101 [8 5]
  - Bit width reduce: 0x0A0 0x073 0x002 0x001 0x101 [8 5]
  - Zero compress: [8 5] 0xAF, 0x0A 0x73 0x2 0x001 0x101

Methodology
[Diagram labels: Program run; Garbage Collection; Heap dump series (snapshot at time t); Analysis representation; Model 1 … Model n; Limit savings]

Experimental Details
o Jikes Research Virtual Machine
  - Java-in-Java
o DaCapo benchmarks + pseudojbb
o 20-25 heap snapshots per benchmark
  - MarkSweep with 2x min heap
o Analysis
  - Per class
  - Objects and arrays separated
  - JVM+app vs application (separated in paper)
  - Per heap snapshot, and over all snapshots

Columns: Technique, Class, Array, GC/Run
Lempel-Ziv compression: X, GC
Strictly-equal object sharing: Obj, Type, GC
Deep-equal object sharing: Obj, Type, GC
Zero-based object compression: Obj, Inst, GC/Run
Trailing zero array trimming:
Bit-width reduction: Fld
Dominant-value field hashing: Fld, GC
Lazy invariant computation: Fld, GC
Value set indirection: Fld, Type, GC
Value set caching: Fld, Type, GC
Constant field elision: Fld, Run
Dominant-value field elision: Fld, Run

[Figure panels: Value Indirection & Cache; Deep Equal Sharing; Zero Compression; Hybrid Compression]

Stability of Savings
[Figure: fop, snapshots over time]

Conclusions
o Limit study compares heap data compression techniques apples-to-apples
o Potential to reduce memory inefficiencies in managed languages
  - Arrays
  - Hybrids
o Future: save space
  - Challenge: efficient detection & recovery
Thank you!