Code Compression
• Motivations
• Data compression techniques
• Code compression options and methods
• Comparison


Motivations for Code Compression
• Code storage is a significant fraction of the cost of an embedded system, ranging from 10% to 50%
• Instruction fetch bandwidth is a significant part of performance, e.g. 5% to 15% of execution time
• Code size increase can be attributed to:
  - Embedded applications becoming more complex
  - VLIW/EPIC instructions being explicitly less dense
  - Aggressive (VLIW) compiler optimizations for code speed (ILP enhancement), which also increase code size


Data Compression Techniques
• We can view code sequences as "random" sources of symbols from an alphabet of instructions
• Instructions have non-uniform frequency distributions, e.g. reuse of opcodes and registers
• The entropy H(X) of a stochastic source X measures the information content of X
• Suppose the alphabet of X is A_X = {a_1, …, a_n} with probabilities {p_1, …, p_n} in the source X; then H(X) = Σ_{1≤i≤n} p_i log2(1/p_i)


Examples
• Take a sequence of letters from the alphabet {A, B, …, Z} such that the probabilities are uniform, {1/26, …, 1/26}; then H(X) = Σ_{1≤i≤26} p_i log2(1/p_i) = Σ_{1≤i≤26} log2(26)/26 = log2(26) ≈ 4.7 bits
• Take X = {a, b, a, c, a, a, b, c} with A_X = {a, b, c}; the probabilities of the symbols in X are {1/2, 1/4, 1/4}, and thus H(X) = Σ_{1≤i≤3} p_i log2(1/p_i) = 1.5 bits, so any sequence with the same symbol frequencies as X can theoretically be compressed to 8 × 1.5 bits = 12 bits
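Both entropy calculations above can be reproduced with a few lines of Python (a minimal sketch; the string "abacaabc" is just one sequence with the symbol frequencies {1/2, 1/4, 1/4} used in the example):

```python
from collections import Counter
from math import log2

def entropy(seq):
    """First-order entropy: H(X) = sum over symbols of p_i * log2(1/p_i)."""
    counts = Counter(seq)
    n = len(counts if hasattr(seq, "__next__") else seq) if False else sum(counts.values())
    return sum((c / n) * log2(n / c) for c in counts.values())

uniform = entropy(range(26))   # 26 distinct symbols, one each -> log2(26)
skewed = entropy("abacaabc")   # frequencies a:1/2, b:1/4, c:1/4
print(round(uniform, 1))       # 4.7
print(skewed)                  # 1.5
print(len("abacaabc") * skewed)  # 12.0 -> theoretical minimum in bits
```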


Huffman Encoding
[figure: Huffman tree — the two least-frequent symbols b (.25) and c (.25) merge into a node of weight .5, which joins a (.5) at the root (weight 1.0); labeling the edges 0/1 yields the codes]

  Symb.  Prob.  Code
  a      .5     1
  b      .25    01
  c      .25    00

• Optimal compression is achieved for 2^-k symbol frequency distributions
• Take X = {a, b, a, c, a, a, b, c} with A_X = {a, b, c}; then the probabilities are {1/2, 1/4, 1/4}
• Huffman encoding uses 12 bits total to encode X: 101100110100
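The tree construction above can be sketched with Python's heapq (a minimal illustration, not a production encoder; heap tie-breaking may swap the 0/1 edge labels relative to the slide's table, but the code lengths, and hence the 12-bit total, are invariant):

```python
import heapq
from collections import Counter

def huffman_code(freqs):
    """Build a Huffman code from {symbol: weight}; returns {symbol: bitstring}."""
    # Heap entries are (weight, tiebreak, tree); a tree is a symbol or a (left, right) pair.
    heap = [(w, i, sym) for i, (sym, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)  # two lightest subtrees...
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, count, (t1, t2)))  # ...merge into one
        count += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"  # single-symbol alphabet edge case
    walk(heap[0][2], "")
    return codes

seq = "abacaabc"                      # frequencies a:1/2, b:1/4, c:1/4
codes = huffman_code(Counter(seq))
encoded = "".join(codes[s] for s in seq)
print(len(encoded))  # 12 -> matches the entropy bound for this distribution
```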


Code Compression Issues
[figure: a branch "br B7" in the decompressed code must locate block B7 in the compressed code]
• To execute the branch, we need to obtain the compressed code for B7 and decompress it
• Runtime on-the-fly decoding requires random access into the compressed program to support branching
• Not a big problem with Huffman encoding (e.g. use padding to align branch targets)
• Coarse-grain compression methods that require decompression from the beginning of the code are not acceptable


Compression Options
• Code compression can take place in three different places:
  1. Instructions can be decompressed on fetch from the cache
  2. Instructions can be decompressed when refilling the cache from memory
  3. The program can be decompressed when loaded into memory


Decompression on Fetch
[figure: pipeline — fetch → I-cache → decompression → decode (instruction decoder) → execute]
• Decompress each instruction on instruction fetch (IF)
• Advantage:
  - Increased I-cache efficiency
• Disadvantages:
  - Decompression occurs on the critical timing path!
  - Requires additional pipeline stage(s)
  - Compression method must be simple to reduce overhead, e.g. MIPS16 and ARM Thumb use simple encodings with fewer bits


Decompression on Refill
[figure: pipeline — memory → decompression → I-cache → decode (instruction decoder) → execute]
• Fills I-cache lines with decompressed code
• Advantages:
  - No circuitry on the critical path
  - Enhanced memory bandwidth
• Disadvantages:
  - Increased cache miss latency
  - Must preserve the random-access property of the program


Load-time Decompression
• Program is decompressed when loaded into memory
• Advantages:
  - Compressing the entire code at once is more efficient
  - No random-access requirement, e.g. can use Lempel-Ziv
  - Can also compress data in the data and code segments
• Disadvantage:
  - Code in ROM must be duplicated to RAM on embedded systems


Code Compression Methods
• Five major categories:
  1. Hand-tuned ISAs
  2. Ad-hoc compression schemes
  3. RAM decompression
  4. Dictionary-based software compression
  5. Cache-based compression


Hand-tuned ISAs
• Most commonly used in the CISC and DSP worlds
• Reduce instruction size by designing a compact ISA based on operation frequencies
• Disadvantages:
  - Makes the ISA more complex and the decode stage more expensive
  - Makes the ISA non-orthogonal, hampering compiler optimizations, and inflexible for future extensions of the ISA


Ad-hoc Compression Schemes
• Typically specify two instruction modes: compressed and uncompressed, e.g. MIPS16 and ARM Thumb
• Advantages:
  - Instructions stay compressed in the cache
  - Decode is simple
• Disadvantages:
  - Decompression is on the critical path
  - Compression rates are low


RAM Decompression
• Stores the compressed program in ROM and decompresses it to RAM at load time
• Used by the Linux boot loader
• Rarely used in embedded systems
• See load-time decompression for pros and cons


Dictionary-based Software Compression
[figure: two occurrences of the sequence "add r1, #8; ldw r0, 0[r1]; ldw r2, 4[r1]; add r0, r2; stw r0, 0[r3]; add r3, #4" are factored into a subroutine labeled L17, and each occurrence is replaced by "call L17"]
• Identifies code sequences that can be factored out into "subroutines"
• Comparable to microcode and nanocode techniques from the microprogramming era
• Advantage:
  - No specialized hardware needed
• Disadvantages:
  - Invasive to compiler tools, debuggers, profilers, etc.
  - Slow with no hardware support for fast lookup
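The factoring idea can be sketched as a greedy search for repeated instruction windows (a toy illustration: the instruction strings and the label L17 mirror the slide's example, the window size is fixed at 3, and real dictionary compressors select sequences far more carefully):

```python
from collections import Counter

def factor_repeats(code, window=3, min_uses=2):
    """Find the most frequent window of instructions and factor it into a
    'subroutine', replacing each occurrence with a call. Returns (new_code, body)."""
    grams = Counter(tuple(code[i:i + window]) for i in range(len(code) - window + 1))
    seq, uses = grams.most_common(1)[0]
    if uses < min_uses:
        return code, None          # nothing worth factoring
    out, i = [], 0
    while i < len(code):
        if tuple(code[i:i + window]) == seq:
            out.append("call L17")  # L17: label for the factored-out body
            i += window
        else:
            out.append(code[i])
            i += 1
    return out, list(seq)

code = ["add r1,#8", "ldw r0,0[r1]", "ldw r2,4[r1]",
        "stw r0,0[r3]",
        "add r1,#8", "ldw r0,0[r1]", "ldw r2,4[r1]",
        "ret"]
compressed, body = factor_repeats(code)
print(compressed)  # ['call L17', 'stw r0,0[r3]', 'call L17', 'ret']
```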


Cache-based Compression
[figure: on a miss, the cache line address (address >> 5) is looked up in the cache line lookaside buffer (CLB), backed by the line address table (LAT) in memory, yielding the address of the corresponding compressed cache line; the cache is refilled with the decompressed line]
• Uses software compression and simple hardware decompression to refill cache lines with decompressed code
• The cache line address is translated to the memory address of the compressed code using the line address table (LAT)
• The cache-line look-aside buffer (CLB) caches the LAT
• The technique is the basis of IBM CodePack for the PowerPC
  - The MMU has a bit per page to indicate a compressed page
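The LAT/CLB translation performed on a cache miss can be sketched as follows (a toy model: the 32-byte cache line size is assumed from the ">> 5" in the figure, and the LAT contents are hypothetical addresses):

```python
LINE_SHIFT = 5  # assumed 32-byte cache lines, matching the ">> 5" in the figure

# Hypothetical LAT: maps each cache-line index to the memory address of its
# compressed bytes (entries are variable-length, hence the irregular spacing).
lat = {0: 0x1000, 1: 0x1013, 2: 0x1029}
clb = {}  # cache line lookaside buffer: a small cache of recently used LAT entries

def compressed_addr(miss_addr):
    """Translate a missing cache line address to its compressed-code address."""
    line = miss_addr >> LINE_SHIFT
    if line not in clb:        # CLB miss: fetch the LAT entry from memory
        clb[line] = lat[line]
    return clb[line]           # decompression then refills the line from here

print(hex(compressed_addr(0x20)))  # line 1 -> 0x1013
```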


Compression Benefits
• Ad-hoc compression schemes:
  - ARM Thumb compression rate 30%
  - MIPS16 compression rate 40%
• LAT-based compression:
  - IBM CodePack compression rate is 47%
• These numbers are near the first-order entropy of the programs tested
• However, compression can be improved by exploiting cross-correlation between two or more instructions
• Note: compression rate = (uncompressed_size - compressed_size) / uncompressed_size
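The compression-rate definition in the note can be captured directly (the 100 KB program size below is a hypothetical illustration of the slide's 30% ARM Thumb figure):

```python
def compression_rate(uncompressed_size, compressed_size):
    """Fraction of the original size saved: (u - c) / u."""
    return (uncompressed_size - compressed_size) / uncompressed_size

# E.g. at a 30% rate, 100 KB of code compresses to roughly 70 KB.
print(compression_rate(100, 70))  # 0.3
```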