Lempel-Ziv Compression Techniques

• Classification of Lossless Compression techniques
• Introduction to Lempel-Ziv Encoding: LZ77 & LZ78
• LZ78
  – Encoding Algorithm
  – Decoding Algorithm
• LZW
  – Encoding Algorithm
  – Decoding Algorithm

Classification of Lossless Compression Techniques
Recall what we studied before:
• Lossless compression techniques are classified into static, adaptive (or dynamic), and hybrid.
• Static coding requires two passes: one pass to compute probabilities (or frequencies) and determine the mapping, and a second pass to encode.
• Example of a static technique: Static Huffman Coding
• All of the adaptive methods are one-pass methods; only one scan of the message is required.
• Examples of adaptive techniques: LZ77, LZ78, LZW, and Adaptive Huffman Coding

Introduction to Lempel-Ziv Encoding
• Data compression up until the late 1970s was mainly directed towards creating better methodologies for Huffman coding.
• An innovative, radically different method was introduced in 1977 by Abraham Lempel and Jacob Ziv.
• This technique (called Lempel-Ziv) actually consists of two considerably different algorithms, LZ77 and LZ78.
• Due to patents, LZ77 and LZ78 led to many variants:
  – LZ77 variants: LZR, LZSS, LZB, LZH
  – LZ78 variants: LZW, LZC, LZT, LZMW, LZJ, LZFG
• The zip and unzip utilities use the LZH technique, while UNIX's compress methods belong to the LZW and LZC classes.

LZ78 Encoding Algorithm
LZ78 inserts one- or multi-character, non-overlapping, distinct patterns of the message to be encoded in a Dictionary.
The multi-character patterns are of the form: C0 C1 ... Cn-1 Cn. The prefix of a pattern consists of all the pattern characters except the last: C0 C1 ... Cn-1
LZ78 output: a sequence of (codeword for the pattern's prefix, last character of the pattern) pairs, where codeword 0 stands for an empty prefix.
Note: The dictionary is usually implemented as a hash table.

LZ78 Encoding Algorithm (cont’d)
Dictionary ← empty ; Prefix ← empty ; DictionaryIndex ← 1 ;
while (characterStream is not empty) {
    Char ← next character in characterStream ;
    if (Prefix + Char exists in the Dictionary)
        Prefix ← Prefix + Char ;
    else {
        if (Prefix is empty)
            CodeWordForPrefix ← 0 ;
        else
            CodeWordForPrefix ← DictionaryIndex for Prefix ;
        Output: (CodeWordForPrefix, Char) ;
        insertInDictionary( (DictionaryIndex, Prefix + Char) ) ;
        DictionaryIndex++ ;
        Prefix ← empty ;
    }
}
if (Prefix is not empty) {
    CodeWordForPrefix ← DictionaryIndex for Prefix ;
    Output: (CodeWordForPrefix, ) ;
}
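
The encoding loop above can be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: the function name `lz78_encode` is hypothetical, a plain `dict` stands in for the hash table, and an empty string marks the missing character of a final pair.

```python
def lz78_encode(message):
    """Return a list of (codeword, char) pairs; codeword 0 means an empty prefix."""
    dictionary = {}   # pattern -> index; indices start at 1
    next_index = 1
    prefix = ""
    output = []
    for char in message:
        if prefix + char in dictionary:
            # Known pattern: keep extending the prefix.
            prefix += char
        else:
            # New pattern: emit (index of prefix, char) and record the pattern.
            output.append((dictionary.get(prefix, 0), char))
            dictionary[prefix + char] = next_index
            next_index += 1
            prefix = ""
    if prefix:
        # Input ended inside a known pattern: emit its index with no character.
        output.append((dictionary[prefix], ""))
    return output


print(lz78_encode("BABAABRRRA"))
# [(0, 'B'), (0, 'A'), (1, 'A'), (2, 'B'), (0, 'R'), (5, 'R'), (2, '')]
```

The final pair `(2, '')` corresponds to the trailing `(2, )` pair produced when the input ends on a pattern that is already in the dictionary.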

Example 1: LZ78 Encoding
Encode (i.e., compress) the string ABBCBCABABCAABCAAB using the LZ78 algorithm.
The compressed message is: (0,A)(0,B)(2,C)(3,A)(2,A)(4,A)(6,B)
Note: The above is just a representation; the commas and parentheses are not transmitted. The actual form of the compressed message is discussed later, under "LZ78 Encoding: Number of bits transmitted".

Example 1: LZ78 Encoding (cont’d)
1. A is not in the Dictionary; insert it.
2. B is not in the Dictionary; insert it.
3. B is in the Dictionary. BC is not in the Dictionary; insert it.
4. B and BC are in the Dictionary. BCA is not in the Dictionary; insert it.
5. B is in the Dictionary. BA is not in the Dictionary; insert it.
6. B, BC, and BCA are in the Dictionary. BCAA is not in the Dictionary; insert it.
7. B, BC, BCA, and BCAA are in the Dictionary. BCAAB is not in the Dictionary; insert it.

Example 2: LZ78 Encoding
Encode (i.e., compress) the string BABAABRRRA using the LZ78 algorithm.
The compressed message is: (0,B)(0,A)(1,A)(2,B)(0,R)(5,R)(2, )

Example 2: LZ78 Encoding (cont’d)
1. B is not in the Dictionary; insert it.
2. A is not in the Dictionary; insert it.
3. B is in the Dictionary. BA is not in the Dictionary; insert it.
4. A is in the Dictionary. AB is not in the Dictionary; insert it.
5. R is not in the Dictionary; insert it.
6. R is in the Dictionary. RR is not in the Dictionary; insert it.
7. A is in the Dictionary and it is the last input character; output a pair containing its index: (2, )

Example 3: LZ78 Encoding
Encode (i.e., compress) the string AAAAAAAAA using the LZ78 algorithm.
1. A is not in the Dictionary; insert it.
2. A is in the Dictionary. AA is not in the Dictionary; insert it.
3. A and AA are in the Dictionary. AAA is not in the Dictionary; insert it.
4. A, AA, and AAA are in the Dictionary, and AAA is the last pattern; output a pair containing its index: (3, )

LZ78 Encoding: Number of bits transmitted
• Example: Uncompressed string: ABBCBCABABCAABCAAB
  Number of bits = Total number of characters * 8 = 18 * 8 = 144 bits
• Suppose the codewords are indexed starting from 1:
  Compressed string (codewords): (0,A) (0,B) (2,C) (3,A) (2,A) (4,A) (6,B)
  Codeword index:                1     2     3     4     5     6     7
• Each codeword consists of an integer and a character:
  – The character is represented by 8 bits.
  – The number of bits n required to represent the integer part of the codeword with index i is given by: $n = 1$ if $i = 1$, and $n = \lceil \log_2 i \rceil$ if $i > 1$.
  – Alternatively, the number of bits required to represent the integer part of the codeword with index i is the number of significant bits required to represent the integer i - 1.

LZ78 Encoding: Number of bits transmitted (cont’d)
Codeword:  (0,A)   (0,B)   (2,C)   (3,A)   (2,A)   (4,A)   (6,B)
Index:     1       2       3       4       5       6       7
Bits:      (1+8) + (1+8) + (2+8) + (2+8) + (3+8) + (3+8) + (3+8) = 71 bits
The actual compressed message is: 0A0B10C11A010A100A110B
where each character is replaced by its binary 8-bit ASCII code.
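
The bit count above can be checked with a short Python sketch. The helper names `lz78_index_bits` and `lz78_bit_count` are hypothetical, and the sketch assumes that the integer part of the codeword with index i takes as many bits as are needed to write i - 1 (at least one), and that a final pair without a character contributes no character bits.

```python
def lz78_index_bits(pairs):
    """Binary text of each pair's integer part, sized by the pair's position."""
    fields = []
    for i, (codeword, _char) in enumerate(pairs, start=1):
        # Bits needed to write i - 1, with a minimum of one bit.
        width = max(1, (i - 1).bit_length())
        fields.append(format(codeword, f"0{width}b"))
    return fields


def lz78_bit_count(pairs, char_bits=8):
    """Total bits: variable-width integer parts plus char_bits per character."""
    index_bits = sum(len(field) for field in lz78_index_bits(pairs))
    # Assumption: a trailing pair with an empty character sends no character bits.
    char_count = sum(1 for _codeword, char in pairs if char)
    return index_bits + char_bits * char_count


pairs = [(0, "A"), (0, "B"), (2, "C"), (3, "A"), (2, "A"), (4, "A"), (6, "B")]
print(lz78_index_bits(pairs))  # ['0', '0', '10', '11', '010', '100', '110']
print(lz78_bit_count(pairs))   # 71
```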

LZ78 Decoding Algorithm
Dictionary ← empty ; DictionaryIndex ← 1 ;
while (there are more (CodeWord, Char) pairs in codestream) {
    CodeWord ← next CodeWord in codestream ;
    Char ← character corresponding to CodeWord ;
    if (CodeWord == 0)
        String ← empty ;
    else
        String ← string at index CodeWord in Dictionary ;
    Output: String + Char ;
    insertInDictionary( (DictionaryIndex, String + Char) ) ;
    DictionaryIndex++ ;
}
Summary:
• input: (CW, character) pairs
• output: if (CW == 0) output: currentCharacter; else output: stringAtIndex CW + currentCharacter
• insert: current output in dictionary
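
A minimal Python sketch of the decoding loop above, under the same conventions as the pseudocode. The name `lz78_decode` is hypothetical, and a final pair with no character is assumed to be written with an empty string.

```python
def lz78_decode(pairs):
    """Rebuild the message from (codeword, char) pairs."""
    dictionary = {}   # index -> pattern; indices start at 1
    next_index = 1
    pieces = []
    for codeword, char in pairs:
        # Codeword 0 means "no prefix"; otherwise look the prefix up.
        string = "" if codeword == 0 else dictionary[codeword]
        entry = string + char
        pieces.append(entry)
        # The decoder rebuilds the same dictionary the encoder built.
        dictionary[next_index] = entry
        next_index += 1
    return "".join(pieces)


pairs = [(0, "B"), (0, "A"), (1, "A"), (2, "B"), (0, "R"), (5, "R"), (2, "")]
print(lz78_decode(pairs))  # BABAABRRRA
```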

Example 1: LZ78 Decoding
Decode (i.e., decompress) the sequence (0,A) (0,B) (2,C) (3,A) (2,A) (4,A) (6,B)
The decompressed message is: ABBCBCABABCAABCAAB

Example 2: LZ78 Decoding
Decode (i.e., decompress) the sequence (0,B) (0,A) (1,A) (2,B) (0,R) (5,R) (2, )
The decompressed message is: BABAABRRRA

Example 3: LZ78 Decoding
Decode (i.e., decompress) the sequence (0,A) (1,A) (2,A) (3, )
The decompressed message is: AAAAAAAAA

LZW Encoding Algorithm
• If the message to be encoded consists of only one character, LZW outputs the code for this character; otherwise it inserts two- or multi-character, overlapping*, distinct patterns of the message to be encoded in a Dictionary.
  *The last character of a pattern is the first character of the next pattern.
• The patterns are of the form: C0 C1 ... Cn-1 Cn. The prefix of a pattern consists of all the pattern characters except the last: C0 C1 ... Cn-1
LZW output if the message consists of more than one character:
• If the pattern is not the last one; output: the code for its prefix.
• If the pattern is the last one:
  – If the last pattern exists in the Dictionary; output: the code for the pattern.
  – If the last pattern does not exist in the Dictionary; output: code(lastPrefix), then output: code(lastCharacter)
Note: LZW outputs codewords that are 12 bits each. Since there are 2^12 = 4096 codeword possibilities, the minimum size of the Dictionary is 4096; however, since the Dictionary is usually implemented as a hash table, its size is larger than 4096.

LZW Encoding Algorithm (cont’d)
Initialize Dictionary with 256 single character strings and their corresponding ASCII codes ;
Prefix ← first input character ;
CodeWord ← 256 ;
while (not end of character stream) {
    Char ← next input character ;
    if (Prefix + Char exists in the Dictionary)
        Prefix ← Prefix + Char ;
    else {
        Output: the code for Prefix ;
        insertInDictionary( (CodeWord, Prefix + Char) ) ;
        CodeWord++ ;
        Prefix ← Char ;
    }
}
Output: the code for Prefix ;
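
The LZW encoding loop above could be written in Python along these lines. This is a sketch under stated assumptions: the name `lzw_encode` is hypothetical, input characters are assumed to have codes below 256, and the dictionary is allowed to grow without the 4096-entry limit discussed elsewhere in these slides.

```python
def lzw_encode(message):
    """Return the list of integer codewords for message."""
    if not message:
        return []
    # Codes 0..255 are preloaded with the single-character strings.
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    prefix = message[0]
    output = []
    for char in message[1:]:
        if prefix + char in dictionary:
            prefix += char
        else:
            # Emit the code of the longest known prefix, then learn prefix + char.
            output.append(dictionary[prefix])
            dictionary[prefix + char] = next_code
            next_code += 1
            # The next pattern starts with the character that ended this one.
            prefix = char
    output.append(dictionary[prefix])
    return output


print(lzw_encode("BABAABAAA"))  # [66, 65, 256, 257, 65, 260]
```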

Example 1: Compression using LZW
Encode the string BABAABAAA by the LZW encoding algorithm.
1. BA is not in the Dictionary; insert BA, output the code for its prefix: code(B)
2. AB is not in the Dictionary; insert AB, output the code for its prefix: code(A)
3. BA is in the Dictionary. BAA is not in the Dictionary; insert BAA, output the code for its prefix: code(BA)
4. AB is in the Dictionary. ABA is not in the Dictionary; insert ABA, output the code for its prefix: code(AB)
5. AA is not in the Dictionary; insert AA, output the code for its prefix: code(A)
6. AA is in the Dictionary and it is the last pattern; output its code: code(AA)
• The compressed message is: <66><65><256><257><65><260>
• Note: The characters < and > are not sent; each codeword is sent using 12 bits.

Example 2: Compression using LZW
Encode the string BABAABRRRA by the LZW encoding algorithm.
1. BA is not in the Dictionary; insert BA, output the code for its prefix: code(B)
2. AB is not in the Dictionary; insert AB, output the code for its prefix: code(A)
3. BA is in the Dictionary. BAA is not in the Dictionary; insert BAA, output the code for its prefix: code(BA)
4. AB is in the Dictionary. ABR is not in the Dictionary; insert ABR, output the code for its prefix: code(AB)
5. RR is not in the Dictionary; insert RR, output the code for its prefix: code(R)
6. RR is in the Dictionary. RRA is not in the Dictionary and it is the last pattern; insert RRA, output the code for its prefix: code(RR), then output the code for the last character: code(A)
The compressed message is: <66><65><256><257><82><260><65>

LZW: Number of bits transmitted
Example: Uncompressed string: aaabbbbbbaabaaba
Number of bits = Total number of characters * 8 = 16 * 8 = 128 bits
Compressed string (codewords): <97><256><98><258><259><257><261>
Number of bits = Total number of codewords * 12 = 7 * 12 = 84 bits
Note: Each codeword is 12 bits because the minimum Dictionary size is taken as 4096, and 2^12 = 4096
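
To make the fixed 12-bit codeword size concrete, the following Python sketch packs a list of codewords into bytes and unpacks them again. The helper names `pack_codes` and `unpack_codes` are hypothetical, and the big-endian bit order and zero padding of the last byte are assumptions of this sketch, not something the slides specify.

```python
def pack_codes(codes, width=12):
    """Concatenate codes as fixed-width bit fields and return them as bytes."""
    if not codes:
        return b""
    bits = "".join(format(code, f"0{width}b") for code in codes)
    # Assumption: pad with zero bits up to a whole number of bytes.
    bits += "0" * (-len(bits) % 8)
    return int(bits, 2).to_bytes(len(bits) // 8, "big")


def unpack_codes(data, count, width=12):
    """Read back `count` fixed-width codes from packed bytes."""
    bits = "".join(format(byte, "08b") for byte in data)
    return [int(bits[i * width:(i + 1) * width], 2) for i in range(count)]


codes = [66, 65, 256, 257, 65, 260]   # codewords for BABAABAAA
packed = pack_codes(codes)
print(len(codes) * 12)                # 72 bits of codewords
print(len(packed))                    # 9 bytes
print(unpack_codes(packed, len(codes)) == codes)  # True
```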

LZW Decoding Algorithm
The LZW decompressor creates the same string table during decompression.
Initialize Dictionary with 256 ASCII codes and corresponding single character strings as their translations ;
PreviousCodeWord ← first input code ;
Output: string(PreviousCodeWord) ;
Char ← character(first input code) ;
CodeWord ← 256 ;
while (not end of code stream) {
    CurrentCodeWord ← next input code ;
    if (CurrentCodeWord exists in the Dictionary)
        String ← string(CurrentCodeWord) ;
    else
        String ← string(PreviousCodeWord) + Char ;
    Output: String ;
    Char ← first character of String ;
    insertInDictionary( (CodeWord, string(PreviousCodeWord) + Char) ) ;
    PreviousCodeWord ← CurrentCodeWord ;
    CodeWord++ ;
}

LZW Decoding Algorithm (cont’d)
Summary of LZW decoding algorithm:
output: string(first CodeWord) ;
codeWord = 256 ;
while (there are more CodeWords) {
    if (CurrentCodeWord is in the Dictionary)
        output: string(CurrentCodeWord) ;
    else
        output: PreviousOutput + PreviousOutput first character ;
    insert at Dictionary[codeWord++]: PreviousOutput + CurrentOutput first character ;
}
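
The decoding summary above maps onto Python fairly directly. The sketch below is illustrative only: the name `lzw_decode` is hypothetical, codes 0 to 255 are assumed to stand for single characters, and no limit is placed on the dictionary size.

```python
def lzw_decode(codes):
    """Rebuild the message from a list of LZW codewords."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    pieces = [previous]
    for code in codes[1:]:
        if code in dictionary:
            current = dictionary[code]
        else:
            # The code refers to the entry the encoder had only just created:
            # it must be the previous output plus its own first character.
            current = previous + previous[0]
        pieces.append(current)
        # Same entry the encoder inserted one step earlier.
        dictionary[next_code] = previous + current[0]
        next_code += 1
        previous = current
    return "".join(pieces)


print(lzw_decode([66, 65, 256, 257, 65, 260]))  # BABAABAAA
```

The `else` branch is the case where the received code is not yet in the decoder's dictionary, as with code 260 in this example.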

Example 1: LZW Decompression
Use LZW to decompress the output sequence <66> <65> <256> <257> <65> <260>
1. 66 is in Dictionary; output string(66) i.e. B
2. 65 is in Dictionary; output string(65) i.e. A, insert BA
3. 256 is in Dictionary; output string(256) i.e. BA, insert AB
4. 257 is in Dictionary; output string(257) i.e. AB, insert BAA
5. 65 is in Dictionary; output string(65) i.e. A, insert ABA
6. 260 is not in Dictionary; output previous output + previous output first character: AA, insert AA

Example 2: LZW Decompression
Decode the sequence <67> <70> <256> <258> <259> <257> by the LZW decoding algorithm.
1. 67 is in Dictionary; output string(67) i.e. C
2. 70 is in Dictionary; output string(70) i.e. F, insert CF
3. 256 is in Dictionary; output string(256) i.e. CF, insert FC
4. 258 is not in Dictionary; output previous output + C i.e. CFC, insert CFC
5. 259 is not in Dictionary; output previous output + C i.e. CFCC, insert CFCC
6. 257 is in Dictionary; output string(257) i.e. FC, insert CFCCF

LZW: Limitations
• What happens when the dictionary gets too large?
• One approach is to clear entries 256-4095 and start building the dictionary again.
• The same approach must also be used by the decoder.
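
One way the clear-and-rebuild approach might look is sketched below in Python. Everything here is an assumption made for illustration: the function names are hypothetical, and the rule "reset at the moment a new entry would not fit" is just one possible convention that keeps encoder and decoder in step without any extra signalling.

```python
def lzw_encode_with_reset(message, max_codes=4096):
    """LZW encoding that clears entries 256..max_codes-1 when the table is full."""
    if max_codes < 256:
        raise ValueError("max_codes must be at least 256")
    if not message:
        return []
    dictionary = {chr(i): i for i in range(256)}
    prefix = message[0]
    output = []
    for char in message[1:]:
        if prefix + char in dictionary:
            prefix += char
            continue
        output.append(dictionary[prefix])
        if len(dictionary) < max_codes:
            dictionary[prefix + char] = len(dictionary)
        else:
            # Table full: drop learned entries and start building again.
            dictionary = {chr(i): i for i in range(256)}
        prefix = char
    output.append(dictionary[prefix])
    return output


def lzw_decode_with_reset(codes, max_codes=4096):
    """Decoder that mirrors the reset rule of lzw_encode_with_reset."""
    if max_codes < 256:
        raise ValueError("max_codes must be at least 256")
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    previous = dictionary[codes[0]]
    pieces = [previous]
    for code in codes[1:]:
        if len(dictionary) == max_codes:
            # The encoder cleared its table at this point, so this code
            # can only be a single-character code.
            dictionary = {i: chr(i) for i in range(256)}
            current = dictionary[code]
        else:
            current = dictionary[code] if code in dictionary else previous + previous[0]
            dictionary[len(dictionary)] = previous + current[0]
        pieces.append(current)
        previous = current
    return "".join(pieces)


# A tiny limit (258 codes) forces a reset even on a short message.
codes = lzw_encode_with_reset("ABABABAB", max_codes=258)
print(codes)                                         # [65, 66, 256, 65, 66, 256]
print(lzw_decode_with_reset(codes, max_codes=258))   # ABABABAB
```

Both sides must agree on `max_codes`; with a different limit on either side, the two dictionaries drift apart and decoding fails.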

Exercises
1. Use LZ78 to trace encoding the string SATATASACITASA.
2. Write a Java program that encodes a given string using LZ78.
3. Write a Java program that decodes a given set of encoded codewords using LZ78.
4. Use LZW to trace encoding the string ABRACADABRA.
5. Write a Java program that encodes a given string using LZW.
6. Write a Java program that decodes a given set of encoded codewords using LZW.