Data Compression and Entropy

• General Transmission of Information
• Entropy of a Source as a Measure of Information and Optimal Coding
• Entropy Coding: the Huffman Code

General Transmission of Information
We saw the general structure of an efficient transmission system:
segment → transform → quantize → encode → decode → inverse transform → reconstruct
The symbols to be encoded are quantized and (as much as possible) uncorrelated with each other.
Problem: what is the most efficient way of transmitting them?

Source of Information → encode
Since s[n] is a quantized sequence, each sample can assume values on a finite set (called an alphabet), $\{a_1, \dots, a_m\}$, with probabilities $p_1, \dots, p_m$. Also assume i.i.d. (independent, identically distributed) entries.

Example: a binary source. In a typical realization (a long message), 80% of the symbols are "1"s and 20% are "0"s.
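A typical realization of such a source can be generated with a few lines of Python. This is only an illustrative sketch: the function name `sample_source`, the message length, and the seed are assumptions made here, not part of the slides.

```python
import random

def sample_source(alphabet, probs, n, seed=None):
    """Draw n i.i.d. symbols from `alphabet` with the given probabilities."""
    rng = random.Random(seed)
    return rng.choices(alphabet, weights=probs, k=n)

# Binary source: P("0") = 0.2, P("1") = 0.8 (values taken from the example)
message = sample_source([0, 1], [0.2, 0.8], n=10_000, seed=0)
fraction_of_ones = sum(message) / len(message)

print("".join(map(str, message[:40])))        # first 40 symbols
print(f"fraction of 1s: {fraction_of_ones:.3f}")
```

For long messages the printed fraction should be close to 0.8, which is the sense in which the realization is "typical".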

Example: a signal in a data file with 8 bits/sample, with all 256 possible values equally likely.

Example: Text
… this is a text message and we see a lot of words … (neglect punctuation, upper case, numbers …)

Symbol   Probability
(highest)
Space    0.1859
E        0.1031
T        0.0796
A        0.0642
O        0.0632
N        0.0574
…        …
(lowest)
X        0.0013
J        0.0008
Q        0.0008
Z        0.0005

In this case we know a possible encoding for data transmission: the Morse code.
Text → encode → Morse code
Every letter is encoded with dots (*) and dashes (-).
Notice:
• Shorter codes for the most likely letters (e, t)
• Longer codes for the least likely letters (x, q, z …)

Entropy of a Source
Along similar lines as the Morse code, the encoder assigns a binary code (not necessarily of a fixed length) to every symbol of the alphabet:
Source → s[n] → encode → y[n]
Example codewords: 01, 10, 001, 010, 011
Problem:
• what is the minimum average number of bits/symbol we need?

Now a question: given a source with alphabet $\{a_1, \dots, a_m\}$ and probabilities $p_1, \dots, p_m$, how many messages of a given length N >> m can we generate?
In any message of length N we have approximately $N_i = p_i N$ repetitions of symbol $a_i$.

Therefore the number of all possible combinations (i.e. the number of all possible messages) is given by
$Q = \dfrac{N!}{N_1! \, N_2! \cdots N_m!}$
where $N_i$ is the number of occurrences of the i-th symbol.
Example. Given a binary source with alphabet {0, 1}: a message of length N = 1000 has 800 occurrences of one symbol and 200 of the other. The total number of messages of length N = 1000 is
$Q = \dfrac{1000!}{800! \, 200!}$
which is very tedious to compute directly!
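With arbitrary-precision integers the count for the binary example can be evaluated exactly. The sketch below assumes Python 3.8 or later (for `math.comb`); the variable names are chosen here for illustration.

```python
import math

N = 1000        # message length
n_rare = 200    # occurrences of the less likely symbol (the other 800 are the likely one)

# Exact integer value of 1000! / (800! * 200!)
Q = math.comb(N, n_rare)

# Size of Q expressed in bits and in decimal digits
log2_Q = math.log2(Q)
print(f"Q has {len(str(Q))} decimal digits; log2(Q) = {log2_Q:.2f}")
```

Working with $\log_2 Q$ rather than Q itself is what makes the comparison with the entropy convenient.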

A better way: use the Stirling approximation of n! for large n. Apply it to the number Q and use the fact that the probabilities sum to one to obtain the entropy.
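One way to carry out this step is sketched below. It assumes the leading-order form of the Stirling approximation, $\log_2 n! \approx n \log_2 n - n \log_2 e$, and writes $N_i = p_i N$ for the number of occurrences of the i-th of m symbols; this notation is an assumption made here, since the slide's own formulas are not available.

```latex
\begin{align*}
\log_2 Q &= \log_2 N! - \sum_{i=1}^{m} \log_2 N_i! \\
  &\approx \left( N \log_2 N - N \log_2 e \right)
     - \sum_{i=1}^{m} \left( N_i \log_2 N_i - N_i \log_2 e \right) \\
  &= N \log_2 N - \sum_{i=1}^{m} p_i N \log_2 (p_i N)
     && \text{since } \textstyle\sum_i N_i = N \\
  &= N \log_2 N - N \log_2 N \sum_{i=1}^{m} p_i - N \sum_{i=1}^{m} p_i \log_2 p_i \\
  &= -N \sum_{i=1}^{m} p_i \log_2 p_i
     && \text{since } \textstyle\sum_i p_i = 1 \\
  &= N H, \qquad H = -\sum_{i=1}^{m} p_i \log_2 p_i .
\end{align*}
```

The terms proportional to $\log_2 e$ cancel because the symbol counts add up to N, which leaves $Q \approx 2^{NH}$.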

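This defines the entropy of a source of information:
$H = -\sum_{i=1}^{m} p_i \log_2 p_i$ bits/symbol
Meaning: for sequences of length N, sufficiently large, the total number of all possible realizations in which each symbol occurs with relative frequency $p_i$ is approximately $2^{NH}$.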

Example. Consider again the binary source with probabilities 0.8 and 0.2.
Entropy: $H = -0.8 \log_2 0.8 - 0.2 \log_2 0.2 \approx 0.722$ bits/symbol
There are approximately $2^{NH} \approx 2^{722}$ possible messages of length N = 1000 symbols where 800 symbols are "0"s and 200 symbols are "1"s.
Just to check, compare this number, which approximates the total number of messages, with the original formula (use logarithms and matlab functions).
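The suggested check can also be done in Python instead of matlab. In this sketch `math.lgamma` plays the role of a log-gamma function (using $\ln n! = \ln \Gamma(n+1)$), and the helper name `entropy` is an assumption made for illustration.

```python
import math

def entropy(probs):
    """Entropy in bits/symbol of a discrete distribution."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

N = 1000
H = entropy([0.8, 0.2])

# Entropy estimate of log2(number of messages)
log2_Q_entropy = N * H

# Factorial formula evaluated with logarithms: ln n! = lgamma(n + 1)
log2_Q_exact = (math.lgamma(N + 1) - math.lgamma(800 + 1)
                - math.lgamma(200 + 1)) / math.log(2)

print(f"H = {H:.4f} bits/symbol")
print(f"N*H = {log2_Q_entropy:.1f}, log2(1000!/(800! 200!)) = {log2_Q_exact:.1f}")
```

The two exponents differ by a few bits out of roughly 720; the gap comes from the lower-order terms that the leading-order Stirling approximation drops, and it becomes negligible relative to N as N grows.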

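Now notice that, if we assign L bits/symbol, we can make $2^{NL}$ messages of N symbols, since each symbol can take $2^L$ possible values. Equate this with the expression $2^{NH}$ given by the entropy to obtain $2^{NL} = 2^{NH}$. Therefore $L = H$: the entropy is the minimum average number of bits/symbol.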

Entropy of a Source Example. Consider the source Entropy: which yields

Example. Consider a source with only one symbol, which occurs with probability 1. The entropy: $H = -1 \cdot \log_2 1 = 0$. No information! Nothing is random; the source is fully predictable.

Example. A source of m equally likely symbols, each with probability $1/m$: then $H = \log_2 m$. If the total number of symbols is already a power of 2, as $m = 2^L$, then H = L, the maximum entropy.
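The two limiting cases (a single certain symbol and equally likely symbols) can be checked numerically. A minimal sketch, with the function name `entropy` and the choice m = 8 being assumptions made for illustration:

```python
import math

def entropy(probs):
    """Entropy in bits/symbol; symbols with zero probability contribute nothing."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

h_single = entropy([1.0])          # one symbol with probability 1
h_uniform = entropy([1 / 8] * 8)   # m = 8 = 2**3 equally likely symbols
h_biased = entropy([0.8, 0.2])     # binary source from the earlier example

print(h_single, h_uniform, round(h_biased, 4))
```

The single-symbol source gives zero, the uniform source gives $\log_2 8 = 3$ bits/symbol, and the biased binary source falls below the 1 bit/symbol of a uniform binary source.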

Example: Gray levels only, 8 bits/pixel.
[Figure: image → encode, with the histogram of gray levels]

Example: Gray levels and DCT, 8 bits/pixel.
[Figure: image → DCT2 → encode, with the histogram of the DCT2 coefficients]

Huffman Code
How can we assign a binary code to every symbol? The Huffman code is constructed by a recursive argument:
• consider an alphabet of n symbols ordered with decreasing probabilities
• select the two least likely symbols
• combine them into a single symbol whose probability is the sum of their probabilities
• form a new alphabet of length n-1 and rearrange the probabilities in decreasing order
• assign the codewords: the two combined symbols take the codeword of the combined symbol followed by 0 and 1, respectively
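The recursive construction can be sketched in Python with a priority queue, which keeps the two least likely symbols at hand without re-sorting the whole alphabet. The function name `huffman_code` and the five-symbol distribution are hypothetical choices for illustration; they are not the alphabet used in the slides. Here each merge prepends a bit to the codewords built so far, which produces the same codeword lengths as assigning codewords from the root down.

```python
import heapq
import math
from itertools import count

def huffman_code(probs):
    """Return {symbol: codeword} for a dict {symbol: probability}."""
    tiebreak = count()  # unique counter so equal probabilities never compare dicts
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate alphabet: give the only symbol one bit
        return {sym: "0" for sym in probs}
    while len(heap) > 1:
        p0, _, codes0 = heapq.heappop(heap)  # least likely group
        p1, _, codes1 = heapq.heappop(heap)  # second least likely group
        # Merge the two groups, prepending one bit to tell them apart
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

# Hypothetical example distribution (not taken from the slides)
probs = {"a": 0.4, "b": 0.2, "c": 0.2, "d": 0.1, "e": 0.1}
code = huffman_code(probs)

avg_len = sum(probs[s] * len(code[s]) for s in probs)
H = sum(p * math.log2(1 / p) for p in probs.values())
print(code)
print(f"average length = {avg_len:.2f} bits/symbol, entropy = {H:.2f} bits/symbol")
```

For any distribution the average codeword length of the resulting code is at least the entropy H, and less than H + 1.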

Example:
[Figure: Huffman tree for an example alphabet, with branches labeled 0 and 1 and the resulting codewords]
Average length = 2.2 bits/symbol
Entropy H = 2.15 bits/symbol