Crinkler - compressing Windows 4k intros to EXE files

- Slides: 22
Crinkler - compressing Windows 4k intros to EXE files
Aske Simon Christensen, Rune L. H. Stubbe
Assembly 2005, Helsinki, July 2005

Overview
• Background
• Compression method
• Function import
• Header layout
• Demo
• Future plans

Why another one?
• Most common method: CAB dropping
  EXE file -> EXE optimizer -> CAB compressor -> BAT inserter -> BAT file
• Dropping is a mess
• We want EXE files!

How is Crinkler different?
• The normal build process:
  C/C++ files -> Compiler -> object / library files
  ASM files -> Assembler -> object / library files
  object / library files -> Linker -> Cruncher -> EXE file

How is Crinkler different?
• The Crinkler way:
  C/C++ files -> Compiler -> object / library files
  ASM files -> Assembler -> object / library files
  object / library files -> Crinkler -> EXE file

Why another one?
• Control over code and data placement
  – Choose base address
  – Optimize order for best compression
  – Separate code and data
  – Put in extra code
    • Import code
    • Code transformations

Compression method
• Context modelling
  + Much better compression ratio than LZX
  + Well suited for small amounts of data
  + Small decompression code (< 250 bytes)
  + Pays off even with the extra header
  - Extremely slow
  - Very memory-hungry

Data compression basics
• Take advantage of self-similarity
• Find patterns and eliminate them
• Dictionary compression
• Statistical compression

Dictionary compression
• LZ77: Refer repetitions back to original
  M I S S I P P I
• Reasonable compression ratio
• Fast compression
• Very fast decompression

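The back-reference idea on this slide can be sketched as follows. This is a minimal greedy LZ77 written for illustration, not the coder used by any particular cruncher; the function names, the token format, the 255-byte window and the minimum match length of 2 are all assumptions.

```python
def lz77_compress(data: str, window: int = 255, min_match: int = 2):
    """Greedy LZ77 sketch: emit literals or (offset, length) back-references."""
    out = []
    i = 0
    while i < len(data):
        best_len, best_off = 0, 0
        # Try every earlier start position inside the window.
        for j in range(max(0, i - window), i):
            length = 0
            # A match may run past position i (overlapping copy), as in classic LZ77.
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= min_match:
            out.append((best_off, best_len))
            i += best_len
        else:
            out.append(data[i])
            i += 1
    return out


def lz77_decompress(tokens):
    """Rebuild the text; copies go symbol by symbol so overlapping references work."""
    buf = []
    for token in tokens:
        if isinstance(token, tuple):
            offset, length = token
            for _ in range(length):
                buf.append(buf[-offset])
        else:
            buf.append(token)
    return "".join(buf)


print(lz77_compress("abcabcabc"))  # ['a', 'b', 'c', (3, 6)]
```

The decompressor is only a copy loop, which matches the slide's point that decompression is very fast while the match search dominates compression time.
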
Statistical compression
• Estimate probability distribution of each symbol based on earlier data
• PPM: M I S S I P P I
• Problem: local

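To make "estimate the probability of each symbol from earlier data" concrete, here is a small sketch of an adaptive order-k model that reports the ideal coded size. It is a simplification and not real PPM: it uses add-one smoothing instead of PPM's escape mechanism, and the function name and the 256-symbol alphabet are assumptions.

```python
import math
from collections import defaultdict


def code_length_bits(data: str, order: int, alphabet_size: int = 256) -> float:
    """Ideal coded size in bits under an adaptive order-`order` model."""
    # counts[context][symbol] = how often `symbol` has followed `context` so far
    counts = defaultdict(lambda: defaultdict(int))
    total_bits = 0.0
    for i, sym in enumerate(data):
        ctx = data[max(0, i - order):i]
        seen = counts[ctx]
        total = sum(seen.values())
        # Add-one smoothing keeps unseen symbols codable (assumption, not PPM escapes).
        p = (seen[sym] + 1) / (total + alphabet_size)
        total_bits += -math.log2(p)  # an arithmetic coder spends about this many bits
        seen[sym] += 1               # learn only after coding, so a decoder can follow
    return total_bits


text = "ab" * 50
print(code_length_bits(text, order=0), code_length_bits(text, order=1))
```

On strongly repetitive input the order-1 figure comes out lower than the order-0 figure, because the previous symbol determines the next one.
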
Context modelling
• Generalization of PPM
• Look at combinations of recent symbols
• A bit mask describes a model
  0 0 0 1 0 0
  M I S S I P P I
• Problem: Many masks to choose from

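A sketch of how a bit mask can select which recent symbols form the context. The bit order (bit 0 = most recent byte), the 8-byte width and the function name are assumptions made for this illustration; the slide does not specify them.

```python
def sparse_context(history: bytes, mask: int, width: int = 8) -> bytes:
    """Return the bytes of `history` selected by `mask`.

    Bit k of `mask` selects the byte k+1 positions back (k = 0 is the most
    recent byte). Positions before the start of the data are skipped.
    """
    selected = []
    for k in range(width):
        if mask & (1 << k) and k < len(history):
            selected.append(history[-1 - k])
    return bytes(selected)


history = b"MISSIPP"
print(sparse_context(history, 0b001))  # b'P'  (previous byte only)
print(sparse_context(history, 0b101))  # b'PI' (previous byte and the one 3 back)
```

With a width of 8 there are 2^8 = 256 possible masks, and a model set is any subset of them, which is why choosing among them is a search problem in its own right.
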
Implementation
• Estimation for each single bit
• Context is current byte + selection of the last 8 bytes
• Estimate the best collection of masks
• Estimate the best weights of the masks
• Keep track of contexts in a hash table
• Ignore hash collisions
• Find hash table size with few collisions

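The points above can be combined into a rough size estimator. This is a sketch under stated assumptions, not Crinkler's actual code: the weighted count mixing formula, the use of Python's built-in `hash`, the table layout and the function name are all invented for illustration. It does follow the slide in predicting one bit at a time, using the already-known bits of the current byte plus mask-selected earlier bytes as context, and in letting colliding contexts share a table slot.

```python
import math


def estimate_size_bits(data: bytes, models, table_bits: int = 16) -> float:
    """Ideal coded size in bits. `models` is a list of (mask, weight) pairs."""
    table = [[0, 0] for _ in range(1 << table_bits)]  # [zero count, one count]
    index_mask = (1 << table_bits) - 1
    total = 0.0
    for pos, byte in enumerate(data):
        partial = 1  # leading 1 followed by the bits of the current byte seen so far
        for bitpos in range(7, -1, -1):
            bit = (byte >> bitpos) & 1
            slots = []
            n0 = n1 = 0.0
            for mask, weight in models:
                # Bit k of the mask selects the byte k+1 positions back (0 before start).
                ctx = tuple(data[pos - 1 - k] if pos - 1 - k >= 0 else 0
                            for k in range(8) if mask & (1 << k))
                # Collisions are ignored: colliding contexts simply share a slot.
                slot = table[hash((mask, partial, ctx)) & index_mask]
                slots.append(slot)
                n0 += weight * slot[0]
                n1 += weight * slot[1]
            p1 = (n1 + 0.5) / (n0 + n1 + 1.0)  # weighted counts -> probability of a 1
            total += -math.log2(p1 if bit else 1.0 - p1)
            for slot in slots:
                slot[bit] += 1
            partial = (partial << 1) | bit
    return total


data = b"abcd" * 64
print(estimate_size_bits(data, [(0b0, 1.0)]))  # current-byte bits only
print(estimate_size_bits(data, [(0b1, 1.0)]))  # plus the previous byte
```

Searching for the best collection of masks and weights then amounts to calling an estimator like this repeatedly with different `models` and keeping the smallest result, which is consistent with the earlier slide's remark that the method is extremely slow.
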
Function import
• Import by name: Name of each function
  – The import table is a big part of an EXE file
• Import by ordinal: Number instead of name
  – Much smaller but quite incompatible
• Import by hash: Hash code of each function
  – Small and compatible
  – Not supported directly
• Import by hashed ordinal range

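Import by hash can be illustrated with a small simulation. The EXE stores only a hash code per function; at startup, custom import code walks the DLL's export names, hashes each one, and picks the entries whose hash matches. The hash function below (rotate left by 5, then add) and the four-entry export list are assumptions for illustration; the slides do not say which hash Crinkler uses, and a real DLL exports far more names.

```python
def name_hash(name: str) -> int:
    """Hypothetical 32-bit hash: rotate left by 5, then add the next character."""
    h = 0
    for ch in name.encode("ascii"):
        h = ((h << 5) | (h >> 27)) & 0xFFFFFFFF
        h = (h + ch) & 0xFFFFFFFF
    return h


def resolve_imports(export_names, wanted_hashes):
    """Map each wanted hash to the index of the first export whose name hashes to it."""
    by_hash = {}
    for index, name in enumerate(export_names):
        by_hash.setdefault(name_hash(name), index)
    return [by_hash[h] for h in wanted_hashes]


# Toy stand-in for a DLL export table.
exports = ["ExitProcess", "GetTickCount", "LoadLibraryA", "Sleep"]

# What the EXE would store: one 4-byte hash per function instead of its name.
wanted = [name_hash("Sleep"), name_hash("ExitProcess")]

print([exports[i] for i in resolve_imports(exports, wanted)])
```

Because lookup still goes through the exported names, the result does not depend on ordinal numbering, which is the compatibility advantage the slide contrasts with import by ordinal. The cost is the extra lookup code, since the Windows loader does not support this directly.
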
Header optimization
[Diagram: DOS header | PE offset | DOS stub | PE header | Data directories | Section header]
544 bytes!

Header optimization
[Diagram: DOS header | PE offset | DOS stub | PE header | Data directories | Section header; legend: Ignored]

Header optimization
[Diagram: DOS header | PE offset | DOS stub | PE header | Data directories | Section header; legend: Ignored]
196 bytes!

Header optimization
[Diagram: DOS header | PE offset | DOS stub | PE header | Data directories | Section header; legend: Hash code]
124 bytes + 18 hash codes!

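The "PE offset" box in these diagrams is the field that makes shrinking the header possible: the DOS header starts with the "MZ" signature, and the 32-bit value at file offset 0x3C tells the loader where the PE header begins. The sketch below builds only that skeleton to show that the PE header may sit right after a conventional stub or inside the DOS header area itself. The helper names and the two offsets used (0x80 and 4) are illustrative assumptions; the slides do not state which offsets Crinkler uses.

```python
import struct

DOS_HEADER_SIZE = 64    # size of the DOS header in bytes
E_LFANEW_OFFSET = 0x3C  # the "PE offset" field: file offset of the PE header


def build_skeleton(pe_offset: int) -> bytes:
    """Build just the signatures and the PE offset field; everything else is zero."""
    size = max(DOS_HEADER_SIZE, pe_offset + 4)
    image = bytearray(size)
    image[pe_offset:pe_offset + 4] = b"PE\0\0"
    image[0:2] = b"MZ"
    struct.pack_into("<I", image, E_LFANEW_OFFSET, pe_offset)
    return bytes(image)


def pe_header_offset(image: bytes) -> int:
    """Follow the PE offset field the way a loader would."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (offset,) = struct.unpack_from("<I", image, E_LFANEW_OFFSET)
    if image[offset:offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    return offset


conventional = build_skeleton(0x80)  # PE header after a DOS stub
collapsed = build_skeleton(4)        # PE header overlapping the DOS header
print(len(conventional), pe_header_offset(conventional))  # 132 128
print(len(collapsed), pe_header_offset(collapsed))        # 64 4
```

In a complete file with a collapsed layout, the bytes at 0x3C also fall inside the PE header, so one header field has to hold a value that works for both roles. The sketch leaves that out and only demonstrates the offset mechanism.
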
Demo

Future plans
• Windows 2000 compatibility
• Even better compression
• Section reordering
• Transformations
• More feedback
• 64k specialized version

Thank you
Questions? Comments? Suggestions?