
Coding and Entropy
February 3, 2010
Harvard QR48
Squeezing out the “Air”

Suppose you want to ship pillows in boxes and
are charged by the size of the box

Lossless data compression
Entropy = lower limit of compressibility

Claude Shannon (1916-2001)
A Mathematical Theory of Communication (1948)
Communication over a Channel
[Diagram: Source emits symbols S -> encoder produces coded bits X -> Channel -> received bits Y -> decoder produces decoded message T]
Encode the source symbols into bits before putting them into the channel
Decode the bits back into symbols when they come out of the channel
E.g. the transformation from S into X changes
“yea” --> 1
“nay” --> 0
Changing Y into T does the reverse
For now, assume no noise in the channel, i.e. X=Y
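A minimal sketch of this pipeline in Python (the symbols and the one-bit code are the "yea"/"nay" example above; the channel is assumed noiseless, so Y = X):

    # Noiseless channel: symbols S -> coded bits X -> channel -> received bits Y -> decoded symbols T
    encode_map = {"yea": "1", "nay": "0"}
    decode_map = {bits: sym for sym, bits in encode_map.items()}

    def encode(symbols):
        # S -> X: replace each symbol by its codeword
        return "".join(encode_map[s] for s in symbols)

    def channel(x):
        # Noiseless channel: Y = X
        return x

    def decode(bits):
        # Y -> T: each codeword here is a single bit
        return [decode_map[b] for b in bits]

    S = ["yea", "nay", "yea"]
    X = encode(S)      # "101"
    Y = channel(X)     # "101"
    T = decode(Y)      # ["yea", "nay", "yea"]
    assert T == S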
Example: Telegraphy
Source: English letters -> Morse Code
[Diagram: the letter D, encoded in Morse as "-..", sent by telegraph from Baltimore to Washington]
Low and High Information Content
Messages

The more frequent a message is, the less information it
conveys when it occurs
Two weather forecast messages:

[Images: typical weather forecasts for Boston and for LA]
In LA, “Sunny” is a low-information message and “Cloudy” is a high-information message


Harvard Grades
%       A    A-   B+   B    B-   C+
2005    24   25   21   13    6    2
1995    21   23   20   14    8    3
1986    14   19   21   17   10    5

Less information in Harvard grades now than in the recent past
Fixed Length Codes (Block Codes)





Example: 4 symbols, A, B, C, D
A=00, B=01, C=10, D=11
In general, with n symbols, codes need to be of
length lg n, rounded up
For English text, 26 letters + space = 27 symbols,
length = 5 since 2^4 < 27 < 2^5
(replace all punctuation marks by space)
AKA “block codes”
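A minimal Python sketch of a fixed-length (block) code, using the 4-symbol example above; the codeword width follows the ceil(lg n) rule:

    import math

    # Every codeword gets ceil(lg n) bits when there are n symbols
    symbols = ["A", "B", "C", "D"]
    width = math.ceil(math.log2(len(symbols)))              # 2 bits for 4 symbols
    codebook = {s: format(i, "0{}b".format(width)) for i, s in enumerate(symbols)}
    # codebook == {"A": "00", "B": "01", "C": "10", "D": "11"}

    # 27 symbols (26 letters + space) need ceil(lg 27) = 5 bits each
    print(math.ceil(math.log2(27)))                          # 5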
Modeling the Message Source
[Diagram: Source -> Destination]
Characteristics of the stream of messages
coming from the source affect the choice of
the coding method
We need a model for a source of English
text that can be described and analyzed
mathematically
How can we improve on block codes?




Simple 4-symbol example: A, B, C, D
If that is all we know, need 2 bits/symbol
What if we know symbol frequencies?
Use shorter codes for more frequent symbols


Morse Code does something like this
Example:
Symbol      A     B     C     D
Frequency   .7    .1    .1    .1
Code        0     100   101   110
Prefix Codes
No codeword is a prefix of another, so there is only one way to decode left to right
Symbol      A     B     C     D
Frequency   .7    .1    .1    .1
Code        0     100   101   110
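A small sketch of left-to-right decoding with this prefix code; because no codeword is a prefix of another, the decoder can emit a symbol as soon as its buffer matches a codeword:

    # Prefix code from the table above
    code = {"A": "0", "B": "100", "C": "101", "D": "110"}
    reverse = {bits: sym for sym, bits in code.items()}

    def decode(bitstring):
        out, buf = [], ""
        for bit in bitstring:
            buf += bit
            if buf in reverse:            # unambiguous: commit immediately
                out.append(reverse[buf])
                buf = ""
        assert buf == "", "leftover bits: not a valid encoding"
        return "".join(out)

    print(decode("0100101110"))           # "ABCD"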
Minimum Average Code Length?
Average bits per symbol (for the two codes below):
Code 1: .7·1 + .1·3 + .1·3 + .1·3 = 1.6 bits/symbol (down from 2)
Code 2: .7·1 + .1·2 + .1·3 + .1·3 = 1.5 bits/symbol
Symbol      A     B     C     D
Frequency   .7    .1    .1    .1
Code 1      0     100   101   110
Code 2      0     10    110   111
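A quick check of the two averages above (frequencies and codewords as in the table):

    freq = {"A": 0.7, "B": 0.1, "C": 0.1, "D": 0.1}
    code1 = {"A": "0", "B": "100", "C": "101", "D": "110"}
    code2 = {"A": "0", "B": "10", "C": "110", "D": "111"}

    def average_length(code):
        # Expected bits per symbol = sum of frequency * codeword length
        return sum(freq[s] * len(cw) for s, cw in code.items())

    print(round(average_length(code1), 2))   # 1.6
    print(round(average_length(code2), 2))   # 1.5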
Entropy of this source ≤ 1.5 bits/symbol, since this code achieves an average length of
.7·1 + .1·2 + .1·3 + .1·3 = 1.5

Possibly lower? How low?
Symbol      A     B     C     D
Frequency   .7    .1    .1    .1
Code        0     10    110   111
Self-Information

If a symbol S has frequency p, its self-information is H(S) = lg(1/p) = -lg p.
S        A      B      C      D
p        .25    .25    .25    .25
H(S)     2      2      2      2

p        .7     .1     .1     .1
H(S)     .51    3.32   3.32   3.32
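A one-line check of these values (self-information in bits):

    import math

    def self_information(p):
        # H(S) = lg(1/p) = -lg p
        return -math.log2(p)

    for p in (0.25, 0.7, 0.1):
        print(p, round(self_information(p), 2))   # 0.25 -> 2.0, 0.7 -> 0.51, 0.1 -> 3.32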
First-Order Entropy of Source
= Average Self-Information
S          A      B      C      D
p          .25    .25    .25    .25
-lg p      2      2      2      2
-p·lg p    .5     .5     .5     .5         -∑ p·lg p = 2

p          .7     .1     .1     .1
-lg p      .51    3.32   3.32   3.32
-p·lg p    .360   .332   .332   .332       -∑ p·lg p = 1.357
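A short sketch that reproduces both rows:

    import math

    def first_order_entropy(probs):
        # H = -sum of p * lg p over all symbols
        return -sum(p * math.log2(p) for p in probs)

    print(first_order_entropy([0.25, 0.25, 0.25, 0.25]))        # 2.0
    print(round(first_order_entropy([0.7, 0.1, 0.1, 0.1]), 3))  # 1.357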
Entropy, Compressibility,
Redundancy





Lower entropy -> more redundant -> more compressible -> less information
Higher entropy -> less redundant -> less compressible -> more information
A source of “yea”s and “nay”s takes 24 bits per
symbol but contains at most one bit per symbol of
information
010110010100010101000001 = yea
010011100100000110101001 = nay
Entropy and
Compression





Symbol      A     B     C     D
Frequency   .7    .1    .1    .1
Code        0     10    110   111
Average length for this code
=.7·1+.1·2+.1·3+.1·3 = 1.5
No code taking only symbol frequencies into
account can be better than first-order entropy
First-order Entropy of this source =
.7·lg(1/.7) + .1·lg(1/.1) + .1·lg(1/.1) + .1·lg(1/.1) ≈ 1.357
First-order Entropy of English is about 4
bits/character based on “typical” English texts
“Efficiency” of code = (entropy of source)/(average code length) = 1.357/1.5 ≈ 0.90
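Putting the two numbers together in code (same frequencies and codewords as above):

    import math

    freq = {"A": 0.7, "B": 0.1, "C": 0.1, "D": 0.1}
    code = {"A": "0", "B": "10", "C": "110", "D": "111"}

    entropy = -sum(p * math.log2(p) for p in freq.values())       # ~1.357 bits/symbol
    avg_len = sum(freq[s] * len(cw) for s, cw in code.items())    # 1.5 bits/symbol
    print(round(entropy / avg_len, 2))                            # efficiency ~0.90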
A Simple Prefix Code:
Huffman Codes



Suppose we know the symbol frequencies. We
can calculate the (first-order) entropy. Can we
design a code to match?
There is an algorithm that transforms a set of
symbol frequencies into a variable-length, prefix
code that achieves average code length
approximately equal to the entropy.
David Huffman, 1951
Huffman Code Example
Symbol      A     B     C     D     E
Frequency   .35   .05   .2    .15   .25

Merge steps: B + D -> BD (.2);  BD + C -> BCD (.4);  A + E -> AE (.6);  AE + BCD -> ABCDE (1.0)
Huffman Code Example
Symbol      A     B     C     D     E
Frequency   .35   .05   .2    .15   .25
Codeword    00    100   11    101   01

(Tree edges: ABCDE splits 0 -> AE, 1 -> BCD;  AE splits 0 -> A, 1 -> E;  BCD splits 0 -> BD, 1 -> C;  BD splits 0 -> B, 1 -> D)
Entropy = 2.12      Average length = 2.20
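A compact sketch of the Huffman construction for this example (ties between equal frequencies may be broken differently, so the exact codewords can differ from the tree above, but the codeword lengths and average length come out the same):

    import heapq

    def huffman(freqs):
        # Repeatedly merge the two least-frequent subtrees,
        # prepending 0 to one side's codewords and 1 to the other's
        heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(freqs.items())]
        heapq.heapify(heap)
        counter = len(heap)
        while len(heap) > 1:
            p0, _, code0 = heapq.heappop(heap)
            p1, _, code1 = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in code0.items()}
            merged.update({s: "1" + c for s, c in code1.items()})
            heapq.heappush(heap, (p0 + p1, counter, merged))
            counter += 1
        return heap[0][2]

    freqs = {"A": 0.35, "B": 0.05, "C": 0.2, "D": 0.15, "E": 0.25}
    code = huffman(freqs)
    avg = sum(freqs[s] * len(cw) for s, cw in code.items())
    print(code)              # A, C, E get 2-bit codewords; B, D get 3-bit codewords
    print(round(avg, 2))     # 2.2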
Efficiency of Huffman Codes


Huffman codes are as efficient as possible if only
first-order information (symbol frequencies) is
taken into account.
A Huffman code is always within 1 bit/symbol of the
entropy.
Second-Order Entropy



Second-Order Entropy of a source is
calculated by treating digrams as single
symbols according to their frequencies
Occurrences of q and u are not
independent so it is helpful to treat qu as
one
Second-order entropy of English is about
3.3 bits/character
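A rough sketch of this calculation on a sample string (an illustration only; it treats overlapping digrams as single symbols and divides the digram entropy by 2 to get bits per character, which is one common convention):

    import math
    from collections import Counter

    def second_order_entropy(text):
        # Treat each overlapping pair of adjacent characters (digram) as one symbol
        digrams = [text[i:i + 2] for i in range(len(text) - 1)]
        counts = Counter(digrams)
        total = len(digrams)
        h_digram = -sum((c / total) * math.log2(c / total) for c in counts.values())
        return h_digram / 2          # bits per character

    print(round(second_order_entropy("the quick brown fox jumps over the lazy dog"), 3))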
How English Would Look Based
on frequencies alone
0: xfoml rxkhrjffjuj zlpwcfwkcyj
ffjeyvkcqsghyd qpaamkbzaacibzlhjqd
1: ocroh hli rgwr nmielwis eu ll
nbnesebya th eei alhenhttpa oobttva
2: On ie antsoutinys are t inctore st
be s deamy achin d ilonasive tucoowe at
3: IN NO IST LAT WHEY CRATICT FROURE
BIRS GROCID PONDENOME OF DEMONSTURES OF
THE REPTAGIN IS REGOACTIONA
How English Would Look Based
on word frequencies
1) REPRESENTING AND SPEEDILY IS AN
GOOD APT OR COME CAN DIFFERENT NATURAL
HERE HE THE A IN CAME THE TO OF TO
EXPERT GRAY COME TO FURNISHES THE LINE
MESSAGE HAD BE THESE
2) THE HEAD AND IN FRONTAL ATTACK ON
AN ENGLISH WRITER THAT THE CHARACTER OF
THIS POINT IS THEREFORE ANOTHER METHOD
FOR THE LETTERS THAT THE TIME OF WHO
EVER TOLD THE PROBLEM FOR AN UNEXPECTED
What is the entropy of English?




Entropy is the “limit” of the information per
symbol using single symbols, digrams,
trigrams, …
Not really calculable because English is a
finite language!
Nonetheless it can be determined
experimentally using Shannon’s game
Answer: a little more than 1 bit/character
Shannon’s Remarkable 1948 paper
Shannon’s Source Coding
Theorem



No code can achieve efficiency greater
than 1, but
For any source, there are codes with
efficiency as close to 1 as desired.
The proof does not give a method to find
the best codes. It just sets a limit on how
good they can be.
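In symbols, with H the entropy of the source and L the average code length per symbol (a standard way of stating the theorem): every uniquely decodable code has L ≥ H, so efficiency H/L ≤ 1; and for every ε > 0 there is a code, obtained by encoding long enough blocks of symbols, with H ≤ L < H + ε.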
Huffman coding used widely

E.g., JPEGs use Huffman codes for the
pixel-to-pixel changes in color values



Colors usually change gradually, so there are many
small numbers (0, 1, 2, …) in this sequence
JPEGs sometimes use a fancier
compression method called “arithmetic
coding”
Arithmetic coding produces about 5% better
compression than Huffman coding
Why don’t JPEGs use arithmetic
coding?

Because it is patented by IBM
United States Patent 4,905,297
Langdon, Jr., et al.  February 27, 1990
Arithmetic coding encoder and decoder system
Abstract: Apparatus and method for compressing and de-compressing binary decision data by arithmetic coding and decoding wherein the estimated probability Qe of the less probable of the two decision events, or outcomes, adapts as decisions are successively encoded. To facilitate coding computations, an augend value A for the current number line interval is held to approximate …

What if Huffman had patented his code?