Huffman Coding
CSE 373 Data Structures
CSE 373 AU 04 -- Huffman Coding

Reading
• Goodrich and Tamassia, Chapter 11, section 11.4, pp. 567-569.
8/29/2021 CSE 373 AU 04 -- Huffman Coding

Outline
• Motivation
• Character occurrence in documents: Frequency Tables
• Huffman trees
• Huffman codes
• Decoding

Motivation
• Documents typically contain a certain amount of redundancy in their representation in a computer.
• ASCII and other conventional character codes use the same number of bits for every character.
• If certain symbols (characters) occur much more frequently than others, we can use shorter bit sequences for them (at the cost of longer bit sequences for the less frequent characters).
• Huffman coding provides an optimal assignment of numbers of bits to the different characters: among all prefix codes for the given character frequencies, none encodes the document in fewer bits.

Frequency Tables
Document: “For years an oven course ages”
a: 3   c: 1   e: 4   F: 1   g: 1   n: 2   o: 3
r: 3   s: 3   u: 1   v: 1   y: 1   Blank: 5
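The frequency table can be computed mechanically. A minimal Python sketch, assuming the document is a plain string and counting the blank as the character `' '`; `frequency_table` is a hypothetical helper name, while `collections.Counter` is the standard-library class:

```python
from collections import Counter

def frequency_table(text: str) -> Counter:
    """Count how many times each character (blanks included) occurs in text."""
    return Counter(text)

freq = frequency_table("For years an oven course ages")
print(freq[" "], freq["e"], freq["a"])  # → 5 4 3
print(sum(freq.values()))               # → 29 characters in all
```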

Huffman Trees
A Huffman tree is a binary tree with the following properties:
• Each leaf node contains a symbol (a character) and a frequency value.
• Each internal node contains a frequency (the sum of its children's frequencies).
[Figure: an example Huffman tree, with each leaf labeled by a character and its frequency]
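The two properties can be captured in a small node type. This Python sketch is illustrative only: `HuffmanNode` and `make_internal` are hypothetical names, and storing the symbol only on leaves is a design assumption.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional

@dataclass
class HuffmanNode:
    """Huffman tree node: leaves carry a symbol, internal nodes do not."""
    freq: int
    symbol: Optional[str] = None          # set only on leaves
    left: Optional[HuffmanNode] = None
    right: Optional[HuffmanNode] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None

def make_internal(a: HuffmanNode, b: HuffmanNode) -> HuffmanNode:
    """Internal node whose frequency is the sum of its children's frequencies."""
    return HuffmanNode(freq=a.freq + b.freq, left=a, right=b)

c_f = make_internal(HuffmanNode(1, "c"), HuffmanNode(1, "F"))
print(c_f.freq, c_f.is_leaf(), c_f.left.symbol)  # → 2 False c
```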

Huffman Tree Construction
To build a Huffman tree, do the following:
Step 1: Initialize a forest of binary trees, with one single-node tree per character occurring in the document. Each of these is a leaf node, so it is labeled with the character and its frequency.
Step 2: (a) Select a minimum-frequency Huffman tree from the forest. Call it A. Select a second minimum-frequency Huffman tree from those remaining. Call it B. Remove A and B from the forest.
(b) Combine A and B by creating a new node C, making A the left subtree of C and B the right subtree of C. Label C with the sum of the frequencies of A and B.
(c) Put the new tree, rooted at C, back into the forest.
Step 3: Repeat Step 2 until only a single tree remains. Output this tree.
When several trees share the minimum frequency, any of them may be chosen.
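Steps 1-3 translate almost line for line into code. In this Python sketch (the function name is hypothetical) the forest is a plain list. A leaf is a one-character string, an internal node is a `(left, right)` tuple, and `' '` stands for Blank. Scanning the list for the minimum makes each iteration linear, so this literal version is O(n^2) for n distinct characters. Ties are broken by list position, so the tree it builds may differ from the one in the example. Any choice among equal-frequency trees still gives the same total encoded length.

```python
def build_huffman_tree(freq):
    """Literal version of Steps 1-3. Returns (total frequency, tree), where a
    tree is a symbol (leaf) or a (left, right) tuple (internal node)."""
    # Step 1: one single-leaf tree per character, stored as (frequency, tree).
    forest = [(w, sym) for sym, w in freq.items()]
    # Step 3: repeat Step 2 until only one tree remains.
    while len(forest) > 1:
        # Step 2(a): remove a minimum-frequency tree A, then a second one B.
        a = min(forest, key=lambda t: t[0])
        forest.remove(a)
        b = min(forest, key=lambda t: t[0])
        forest.remove(b)
        # Step 2(b) and 2(c): new root C with A on the left and B on the right,
        # labeled with the summed frequency and put back into the forest.
        forest.append((a[0] + b[0], (a[1], b[1])))
    return forest[0]

FREQ = {"a": 3, "c": 1, "e": 4, "F": 1, "g": 1, "n": 2, "o": 3, "r": 3,
        "s": 3, "u": 1, "v": 1, "y": 1, " ": 5}
weight, tree = build_huffman_tree(FREQ)
print(weight)  # → 29
```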

Example, Step 1
The initial forest contains 13 trees of one leaf node each:
a: 3   c: 1   e: 4   F: 1   g: 1   n: 2   o: 3   r: 3   s: 3   u: 1   v: 1   y: 1   Blank: 5

Example, Step 2(a)
Removal of two minimum-frequency trees from the forest: c (1) and F (1).
Remaining forest: a: 3, e: 4, g: 1, n: 2, o: 3, r: 3, s: 3, u: 1, v: 1, y: 1, Blank: 5

Example, Step 2(b)
Combining the two, using a new root node with frequency 2: c (1) becomes its left child and F (1) its right child. The rest of the forest is unchanged.

Example, Step 2(c)
Adding the new tree to the forest. There are now only 12 trees in the forest (a merged tree is written with its leaves listed from left to right):
[c F]: 2, a: 3, e: 4, g: 1, n: 2, o: 3, r: 3, s: 3, u: 1, v: 1, y: 1, Blank: 5

Example, Next iteration
Two more trees, g (1) and u (1), have been merged. There are now 11 trees in the forest:
[c F]: 2, [g u]: 2, a: 3, e: 4, n: 2, o: 3, r: 3, s: 3, v: 1, y: 1, Blank: 5

Example, Third iteration
v (1) and y (1) are merged. There are now 10 trees in the forest:
[c F]: 2, [g u]: 2, [v y]: 2, a: 3, e: 4, n: 2, o: 3, r: 3, s: 3, Blank: 5

Example, Fourth iteration
[c F] (2) and [g u] (2) are merged into a tree of frequency 4. There are now 9 trees in the forest:
[c F g u]: 4, [v y]: 2, a: 3, e: 4, n: 2, o: 3, r: 3, s: 3, Blank: 5

Example, 5th iteration
n (2) and [v y] (2) are merged into a tree of frequency 4. There are now 8 trees in the forest:
[c F g u]: 4, [n v y]: 4, a: 3, e: 4, o: 3, r: 3, s: 3, Blank: 5

Example, 6th iteration
a (3) and o (3) are merged into a tree of frequency 6. There are now 7 trees in the forest:
[c F g u]: 4, [n v y]: 4, [a o]: 6, e: 4, r: 3, s: 3, Blank: 5

Example, 7th iteration
r (3) and s (3) are merged into a tree of frequency 6. There are now 6 trees in the forest:
[c F g u]: 4, [n v y]: 4, [a o]: 6, [r s]: 6, e: 4, Blank: 5

Example, 8th iteration
[c F g u] (4) and [n v y] (4) are merged into a tree of frequency 8. There are now 5 trees in the forest:
[c F g u n v y]: 8, [a o]: 6, [r s]: 6, e: 4, Blank: 5

Example, 9th iteration
e (4) and Blank (5) are merged into a tree of frequency 9. There are now 4 trees in the forest:
[c F g u n v y]: 8, [e Blank]: 9, [a o]: 6, [r s]: 6

Example, 10th iteration
[a o] (6) and [r s] (6) are merged into a tree of frequency 12. There are now 3 trees in the forest:
[c F g u n v y]: 8, [e Blank]: 9, [a o r s]: 12

Example, 11th iteration
The trees of frequency 8 and 9 are merged into a tree of frequency 17. There are now 2 trees in the forest:
[c F g u n v y e Blank]: 17, [a o r s]: 12

Example, 12th (last) iteration
The trees of frequency 17 and 12 are merged. There is now only one tree in the forest. Its root has frequency 29, the total number of characters in the document.

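Example, Assign Edge Labels
Label left edges 0 and right edges 1. In the finished tree, each line gives the label of the edge into a node, then that node's frequency (and its character, for leaves):
29 (root)
  0: 17
    0: 8
      0: 4
        0: 2
          0: c (1)
          1: F (1)
        1: 2
          0: g (1)
          1: u (1)
      1: 4
        0: n (2)
        1: 2
          0: v (1)
          1: y (1)
    1: 9
      0: e (4)
      1: Blank (5)
  1: 12
    0: 6
      0: a (3)
      1: o (3)
    1: 6
      0: r (3)
      1: s (3)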
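Reading the codes off the tree is a depth-first walk that appends 0 on each left edge and 1 on each right edge. In this Python sketch the example's final tree is transcribed by hand as nested `(left, right)` tuples, with `' '` standing for Blank. The tuple encoding and the name `assign_codes` are illustrative assumptions.

```python
# The example's final tree: internal nodes are (left, right) tuples, leaves are characters.
TREE = (
    (                                   # frequency 17
        (                               # frequency 8
            (("c", "F"), ("g", "u")),   # frequency 4
            ("n", ("v", "y")),          # frequency 4
        ),
        ("e", " "),                     # frequency 9
    ),
    (("a", "o"), ("r", "s")),           # frequency 12
)

def assign_codes(tree, prefix="", table=None):
    """Walk the tree (0 on left edges, 1 on right edges) and record each leaf's path."""
    if table is None:
        table = {}
    if isinstance(tree, str):           # leaf: the path so far is its code
        table[tree] = prefix or "0"     # a one-leaf tree still needs one bit
        return table
    left, right = tree
    assign_codes(left, prefix + "0", table)
    assign_codes(right, prefix + "1", table)
    return table

codes = assign_codes(TREE)
print(codes["c"], codes["n"], codes[" "])  # → 00000 0010 011
```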

Example, Coding Table
a: 100     c: 00000   e: 010     F: 00001   g: 00010
n: 0010    o: 101     r: 110     s: 111     u: 00011
v: 00110   y: 00111   Blank: 011
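The coding table can be checked for the property that makes decoding unambiguous: no code is a prefix of another code. A minimal Python sketch (`is_prefix_free` is a hypothetical name; `' '` stands for Blank) that compares every ordered pair of code words:

```python
from itertools import permutations

CODES = {"a": "100", "c": "00000", "e": "010", "F": "00001", "g": "00010",
         "n": "0010", "o": "101", "r": "110", "s": "111", "u": "00011",
         "v": "00110", "y": "00111", " ": "011"}

def is_prefix_free(codes):
    """True if no code word is a prefix of (or equal to) a different entry's code."""
    return not any(x.startswith(y) for x, y in permutations(codes.values(), 2))

print(is_prefix_free(CODES))                  # → True
print(is_prefix_free({"x": "0", "z": "01"}))  # → False
```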

Example, The Document, Coded
00001101110011001110101001101110111000010011101001100100010011000001010001111011101001110000010010111
Formatted with spaces after each letter and breaks after each blank:
00001 101 110 011                  For
00111 010 100 110 111 011          years
100 0010 011                       an
101 00110 010 0010 011             oven
00000 101 00011 110 111 010 011    course
100 00010 010 111                  ages
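Encoding simply concatenates the code word of each character. A minimal Python sketch using the coding table from the example (`encode` is a hypothetical name; `' '` stands for Blank):

```python
CODES = {"a": "100", "c": "00000", "e": "010", "F": "00001", "g": "00010",
         "n": "0010", "o": "101", "r": "110", "s": "111", "u": "00011",
         "v": "00110", "y": "00111", " ": "011"}

def encode(text, codes):
    """Replace every character by its code word and join the results."""
    return "".join(codes[ch] for ch in text)

bits = encode("For years an oven course ages", CODES)
print(len(bits))  # → 101
```

A character missing from the table raises `KeyError`, which is reasonable here because the table is built from the document itself.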

Example, Decoding
Use the Huffman tree as a decoding aid: start at the root and follow the left or right edge depending on whether the next bit of the code is 0 or 1. When you reach a leaf, output the character there, and start processing the next bit from the root again.
00001101110011001110101001101110111000010011101001100100010011000001010001111011101001110000010010111
Huffman codes are “prefix codes”: no character's code is a prefix of another character's code, so there is never any ambiguity about how to process the next bit.
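The decoding procedure is a loop that walks down the tree one bit at a time and jumps back to the root after each leaf. In this Python sketch the example's tree is transcribed as nested `(left, right)` tuples with `' '` for Blank. The representation and the name `decode` are assumptions for illustration.

```python
# The example's final tree: internal nodes are (left, right) tuples, leaves are characters.
TREE = (
    ((((("c", "F"), ("g", "u")), ("n", ("v", "y")))), ("e", " ")),
    (("a", "o"), ("r", "s")),
)

def decode(bits, tree):
    """Go left on '0' and right on '1'; at a leaf, emit its character and restart at the root."""
    out = []
    node = tree
    for b in bits:
        node = node[0] if b == "0" else node[1]
        if isinstance(node, str):   # reached a leaf
            out.append(node)
            node = tree             # start the next character from the root
    return "".join(out)

print(repr(decode("00001101110011", TREE)))  # → 'For '
```

In this sketch, trailing bits that never reach a leaf are silently ignored. A stricter decoder would report them as an error.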

Compression Ratio
The coded document uses 101 bits*. The 8-bit ASCII version requires 29 × 8 = 232 bits. The compression ratio is 101/232 ≈ 0.4353.
(*Not including the coding table)
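The coded length is the sum, over all characters, of frequency times code length. This Python sketch reproduces the arithmetic from the example's frequency and coding tables, with `' '` standing for Blank. Like the slide, it does not count the coding table.

```python
FREQ = {"a": 3, "c": 1, "e": 4, "F": 1, "g": 1, "n": 2, "o": 3, "r": 3,
        "s": 3, "u": 1, "v": 1, "y": 1, " ": 5}
CODES = {"a": "100", "c": "00000", "e": "010", "F": "00001", "g": "00010",
         "n": "0010", "o": "101", "r": "110", "s": "111", "u": "00011",
         "v": "00110", "y": "00111", " ": "011"}

coded_bits = sum(FREQ[ch] * len(CODES[ch]) for ch in FREQ)  # excludes the table
ascii_bits = 8 * sum(FREQ.values())                          # 8 bits per character
print(coded_bits, ascii_bits, round(coded_bits / ascii_bits, 4))  # → 101 232 0.4353
```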

Efficient Implementation
While constructing the Huffman tree, maintain a priority queue that holds the forest of trees. This makes it easy to obtain a minimum-frequency tree using FINDMIN and DELETEMIN. With a binary heap, each of the n - 1 merges for n distinct characters costs O(log n), so the whole construction takes O(n log n) time.
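A priority-queue version can be sketched with Python's standard `heapq` module. Here `heappop` plays the role of FINDMIN followed by DELETEMIN, and `heappush` puts the merged tree back. The name `huffman_codes` and the tuple tree representation are assumptions. A running counter breaks frequency ties, so the heap never has to compare two trees. Because ties may be broken differently than in the example, individual codes can differ from the coding table, but the total encoded length is the same. The sketch assumes at least one character.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(freq):
    """Build a Huffman tree with a binary min-heap and return {symbol: code}."""
    tiebreak = count()  # unique sequence numbers: equal frequencies never compare trees
    heap = [(w, next(tiebreak), sym) for sym, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        wa, _, a = heapq.heappop(heap)   # DELETEMIN: tree A
        wb, _, b = heapq.heappop(heap)   # DELETEMIN: tree B
        heapq.heappush(heap, (wa + wb, next(tiebreak), (a, b)))  # new root C
    # Read the codes off the final tree: 0 on left edges, 1 on right edges.
    codes = {}
    stack = [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):      # internal node (left, right)
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:                            # leaf symbol
            codes[node] = prefix or "0"  # a one-symbol input still gets one bit
    return codes

freq = Counter("For years an oven course ages")
codes = huffman_codes(freq)
total_bits = sum(freq[s] * len(codes[s]) for s in freq)
print(total_bits)  # → 101
```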

Closing Remarks
Huffman coding is an important data compression method. It can be applied to text, images, or any data that can be described as a sequence of symbols from a fixed set of symbols. It is often used as part of other systems, such as the JPEG image compression method.