Signal Compression

Introduction to Signal Processing
Iasonas Kokkinos
Ecole Centrale Paris
Lecture 9
Linear Models and Signal Compression
$\hat{x} = \mu + w_1 u_1 + w_2 u_2 + w_3 u_3 + w_4 u_4 + \dots$
Lecture 8: Mean Square Estimation
•  Optimality criterion: minimize the mean squared error $E\left[|d - \hat{d}|^2\right]$
•  Solution: the error is orthogonal to the inputs, $E[e\,u_k] = 0$
•  Or, in matrix form: $\mathbf{d} = \mathbf{R}\mathbf{c}$
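A minimal numerical sketch of this solution (the statistics below are illustrative): given the autocorrelation matrix R of the inputs and the cross-correlation vector d with the desired signal, the optimal coefficients solve the normal equations d = Rc.

```python
import numpy as np

# Hypothetical second-order statistics: autocorrelation matrix R of the
# inputs and cross-correlation vector d between inputs and desired signal.
R = np.array([[1.0, 0.5, 0.25],
              [0.5, 1.0, 0.5],
              [0.25, 0.5, 1.0]])
d = np.array([0.9, 0.6, 0.3])

# The MSE-optimal coefficients solve the normal equations d = R c.
c = np.linalg.solve(R, d)
print("optimal coefficients:", c)
```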
Optimal LTI Wiener Filter
[Block diagram: a clean signal plus noise forms the input to a linear time-invariant discrete-time filter; the filter output is subtracted from the desired response, and the difference is the estimation error.]
•  Denoising
•  Smoothing
•  Prediction (with, or without noise)
•  Deconvolution
Depending on the task, the optimal (Wiener) filter may be causal or non-causal.

IIR Non-Causal Wiener Filters
•  Wiener-Hopf equations: $\sum_{k=-\infty}^{\infty} h[k]\, r_x[m-k] = r_{dx}[m]$ for all $m$
•  Correlations: $r_x[m]$ is the autocorrelation of the input, $r_{dx}[m]$ the cross-correlation of the desired response with the input
•  System function: $H(e^{j\omega}) = \dfrac{S_{dx}(e^{j\omega})}{S_x(e^{j\omega})}$
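A minimal sketch of the non-causal Wiener filter in the frequency domain, in a denoising setup where the clean signal and noise are assumed uncorrelated (so $S_{dx} = S_d$ and $S_x = S_d + S_v$). Treating the spectra as known is an idealization; in practice they would be estimated:

```python
import numpy as np

# Non-causal Wiener filter for denoising: H = S_d / (S_d + S_v),
# assuming clean signal and noise are uncorrelated.
n = 512
rng = np.random.default_rng(0)
t = np.arange(n)
clean = np.sin(2 * np.pi * 5 * t / n) + 0.5 * np.sin(2 * np.pi * 12 * t / n)
noise = 0.5 * rng.standard_normal(n)
x = clean + noise

# Idealized power spectra (assumed known here, estimated in practice).
S_d = np.abs(np.fft.fft(clean)) ** 2 / n
S_v = np.full(n, 0.25)            # white noise: flat spectrum, variance 0.25
H = S_d / (S_d + S_v)             # Wiener filter frequency response

d_hat = np.real(np.fft.ifft(H * np.fft.fft(x)))
print("noisy MSE:   ", np.mean((x - clean) ** 2))
print("filtered MSE:", np.mean((d_hat - clean) ** 2))
```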
Ergodic Processes
•  In theory: we model a family of functions with a stochastic process
•  In practice: we only observe a single realization of this process
•  Ergodicity: from a single member, we can determine the properties of the whole family
Ergodic Processes
•  Ergodicity: estimate process statistics from a single realization
•  Mean- and covariance-ergodic process: time averages converge to the ensemble statistics, e.g. $\frac{1}{N}\sum_{n=0}^{N-1} x[n] \to \mu_x$ and $\frac{1}{N}\sum_{n} x[n]\,x[n+k] \to r_x[k]$ as $N \to \infty$
•  The optimal estimator can then be determined from a single realization of the process
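A small sketch of this in practice (the AR(1) process and all names are illustrative): estimate the mean and the first few autocorrelation lags from one long realization and compare with the theoretical values.

```python
import numpy as np

# Estimate mean and autocorrelation of a WSS process from a single
# realization, as ergodicity permits. AR(1) is a stand-in example.
rng = np.random.default_rng(1)
n = 100_000
x = np.zeros(n)
for i in range(1, n):
    x[i] = 0.9 * x[i - 1] + rng.standard_normal()

mean_est = x.mean()                      # time average -> ensemble mean
xc = x - mean_est
r_est = [np.mean(xc[:n - k] * xc[k:]) for k in range(5)]

# Theory for this AR(1): mean 0, r[k] = 0.9**k / (1 - 0.9**2)
print("mean estimate: ", mean_est)
print("r[k] estimates:", np.round(r_est, 2))
print("r[k] theory:   ", np.round([0.9**k / (1 - 0.81) for k in range(5)], 2))
```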
Implications of MSE criterion
•  Optimal estimator: the linear combination of the inputs whose coefficients satisfy the normal equations
•  Estimation error: orthogonal to the inputs, $E[e\,u_k] = 0$ for every input $u_k$
Projection Theorem
•  Hilbert space $H$; $S$: subspace spanned by basis elements
•  Theorem: there is a unique vector $\hat{x} \in S$ that minimizes the error norm $\|x - \hat{x}\|$
•  The corresponding error vector $e = x - \hat{x}$ is orthogonal to $S$, and therefore $\langle e, s\rangle = 0$ for all $s \in S$
Slide credits: P. Maragos
Approximation Problem
•  Problem: approximate $x$ with the linear combination $\hat{x} = \sum_k c_k u_k$ that minimizes the energy $\|x - \hat{x}\|^2$
•  Solution: normal equations, $\sum_k c_k \langle u_k, u_m\rangle = \langle x, u_m\rangle$ for every $m$
•  Approximation error (Pythagorean theorem): $\|e\|^2 = \|x\|^2 - \|\hat{x}\|^2$
Slide credits: P. Maragos
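A minimal numerical sketch of the normal equations for a non-orthogonal basis (all names are illustrative):

```python
import numpy as np

# Project x onto the span of a (not necessarily orthogonal) basis U by
# solving the normal equations (U^T U) c = U^T x; the resulting error
# is orthogonal to every basis vector.
rng = np.random.default_rng(2)
U = rng.standard_normal((8, 3))   # 3 basis vectors in R^8 (columns)
x = rng.standard_normal(8)

G = U.T @ U                        # Gram matrix of inner products
c = np.linalg.solve(G, U.T @ x)    # normal equations
x_hat = U @ c
e = x - x_hat

print("error orthogonal to basis:", np.allclose(U.T @ e, 0))
# Pythagorean theorem: ||x||^2 = ||x_hat||^2 + ||e||^2
print(np.allclose(x @ x, x_hat @ x_hat + e @ e))
```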
Minimum Mean Squared Error Estimation
•  Hilbert space: the `vectors' are random variables, with inner product $\langle x, y\rangle = E[x\,y]$
•  Minimum mean squared error: minimize $E[|d - \hat{d}|^2]$
•  Normal equations: $\mathbf{R}\mathbf{c} = \mathbf{d}$
Slide credits: P. Maragos
Least Squares Error Approximation
•  Hilbert space: the `vectors' are sequences with N elements, with inner product $\langle x, y\rangle = \sum_{n=0}^{N-1} x[n]\,y[n]$
•  Least squares: the deterministic interpretation of the same approximation problem
•  Normal equations: as above, with sums of products in place of expectations
Slide credits: P. Maragos
A geometric interpretation
•  Task: minimize the L2 distance between a target vector and a point on the span of a basis
•  Projection theorem: the minimizer is the orthogonal projection onto the span
•  Reconstruction error: the residual component of the target, orthogonal to the span
Slide credits: P. Maragos
Application: Linear Speech Modeling
[Block diagram: Analysis maps the speech signal to a model and an excitation; Synthesis drives the model with the excitation to reproduce the speech signal.]

Random Signals
•  Speech signal
•  Scatter plots of successive samples
Application: Linear Predictive Model of Speech
•  Time-varying model of speech production
Problem formulation
•  Simplified speech model: an all-pole filter driven by an excitation, $s[n] = \sum_{k=1}^{p} a_k\, s[n-k] + G\,e[n]$
•  Linear predictor with coefficients $\alpha_k$: $\tilde{s}[n] = \sum_{k=1}^{p} \alpha_k\, s[n-k]$
•  How can we find the optimal predictor coefficients?
Linear Model Identification
•  Short-time average prediction error: $E_n = \sum_m \left( s_n[m] - \sum_{k=1}^{p} \alpha_k\, s_n[m-k] \right)^2$
where the summation runs over a limited interval of the speech signal
•  At the minimum: $\partial E_n / \partial \alpha_i = 0$ for $i = 1, \dots, p$, which yields a set of normal equations for the coefficients
Autocorrelation Method for LPC
•  The prediction error can be large at the boundaries of the analysis interval
•  Hamming windowing: multiply the frame by a Hamming window, $s_n[m] = s[n+m]\,w[m]$
•  The squared error is then expressed through the short-time autocorrelation $r_n[k]$
Autocorrelation Method for LPC
•  Normal equations: $\mathbf{R}\,\boldsymbol{\alpha} = \mathbf{r}$, with $\mathbf{R}$ a Toeplitz matrix of autocorrelation values
   –  Solved efficiently by the Levinson-Durbin algorithm (see the sketch below)
•  All poles inside unit circle (stable system)
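A minimal sketch of the autocorrelation method (the toy frame and the order are illustrative; scipy.linalg.solve_toeplitz solves the Toeplitz normal equations with a Levinson-type recursion):

```python
import numpy as np
from scipy.linalg import solve_toeplitz

# Autocorrelation method for LPC: window the frame, compute the
# autocorrelations r[0..p], and solve the Toeplitz system R a = r.
def lpc(frame, p):
    frame = frame * np.hamming(len(frame))      # Hamming windowing
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz(r[:p], r[1:p + 1])       # predictor coefficients
    return a

# Toy 'speech' frame: a decaying resonance (illustrative only).
n = np.arange(240)
frame = np.cos(2 * np.pi * 0.07 * n) * 0.995 ** n
print("LPC coefficients:", np.round(lpc(frame, 4), 3))
```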
Linear Prediction Analysis
[Figure: LPC spectra for the phonemes /IY/ and /AH/]
Analysis Order
[Figure: LPC spectrum as a function of the analysis order]
LPC Vocoder
LPC Synthesizer
[Block diagram: an excitation (periodic pulses for voiced speech, noise for unvoiced) is scaled by a gain and drives the all-pole LPC filter to synthesize speech]
Signal Compression Problem
$\hat{x} = \mu + w_1 u_1 + w_2 u_2 + w_3 u_3 + w_4 u_4 + \dots$
‘λακωνίζειν εστί φιλοσοφείν’
(to be laconic is to philosophize)
Image representations
Canonical basis
Image representations
Harmonic basis (Fourier transform)
Appearance modelling for faces
•  When viewed as vectors of pixel values, face images are extremely high-dimensional
   –  100x100 image = 10,000 dimensions
•  Very few vectors correspond to valid face images
•  Can we model the subspace of faces with a few dimensions?
New subspace: `better' coordinate system
•  New coordinates reflect the distribution of the data
•  Few coordinates suffice to represent a high-dimensional vector
•  They can be viewed as parameters of a model
[Figure: data cloud with its mean and principal axes]
PCA (Karhunen-Loeve transform)
Dimitris Manolakis et al., Statistical and Adaptive Signal Processing, ARTECH House
Covariance matrix reminder
•  Covariance matrix: $\Sigma = E\left[(x - \mu)(x - \mu)^T\right]$
•  Uncorrelated coordinates: diagonal covariance
[Scatter plots: (Height, Income) and (Height, Weight)]
PCA: decorrelation of random variables
•  PCA: projection onto the eigenvectors of the covariance matrix
•  Dimensionality reduction by using only the leading eigenvectors
•  Example: grades in 60 courses -> good in math, physics, computer science
PCA: step by step
1.  Compute the empirical mean, subtract it from the data, and compute the empirical covariance matrix
2.  Compute the eigenvalue decomposition of the covariance matrix, resulting in $\Sigma = U \Lambda U^T$
3.  Retain only the k eigenvectors with the highest corresponding eigenvalues (see the sketch below)
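A direct sketch of the three steps (for very high-dimensional data such as face images one would instead use the SVD of the centered data matrix, but the steps are the same):

```python
import numpy as np

def pca(X, k):
    mu = X.mean(axis=0)                    # 1. empirical mean ...
    Xc = X - mu                            #    ... subtracted from the data
    Sigma = Xc.T @ Xc / (len(X) - 1)       #    empirical covariance matrix
    lam, U = np.linalg.eigh(Sigma)         # 2. eigenvalue decomposition
    order = np.argsort(lam)[::-1]          #    sort by decreasing eigenvalue
    return mu, U[:, order[:k]], lam[order[:k]]  # 3. keep leading k

rng = np.random.default_rng(3)
X = rng.standard_normal((500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])
mu, U, lam = pca(X, 1)
print("leading eigenvector:", U.ravel(), " variance:", lam)
```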
Eigenfaces: Key idea
•  Assume that most face images lie on a low-dimensional subspace determined by the first k (k < d) directions of maximum variance
•  Use PCA to determine the vectors or “eigenfaces” u_1, …, u_k that span that subspace
•  Represent all face images in the dataset as linear combinations of eigenfaces
•  Same idea as shapes
M. Turk and A. Pentland, Face Recognition using Eigenfaces, CVPR 1991
Eigenfaces example
•  Training images x_1, …, x_N
How about natural images?
•  Training images x_1, …, x_N
Eigenfaces example
[Figure: each principal component (eigenvector) u_k visualized as the images µ + 3σ_k u_k and µ − 3σ_k u_k]
Eigenfaces example
•  Face x in “face space” coordinates: $(w_1, \dots, w_k) = (u_1^T(x - \mu), \dots, u_k^T(x - \mu))$
Eigenfaces example
•  Face x in “face space” coordinates: $(w_1, \dots, w_k) = (u_1^T(x - \mu), \dots, u_k^T(x - \mu))$
•  Reconstruction: $\hat{x} = \mu + w_1 u_1 + w_2 u_2 + w_3 u_3 + w_4 u_4 + \dots$
Recognition with eigenfaces
•  Process labeled training images:
   •  Find the mean µ and covariance matrix Σ
   •  Find the k principal components (eigenvectors of Σ) u_1, …, u_k
   •  Project each training image x_i onto the subspace spanned by the principal components: $(w_{i1}, \dots, w_{ik}) = (u_1^T(x_i - \mu), \dots, u_k^T(x_i - \mu))$
•  Given a novel image x:
   •  Project onto the subspace: $(w_1, \dots, w_k) = (u_1^T(x - \mu), \dots, u_k^T(x - \mu))$
   •  Check the reconstruction error $\|x - \hat{x}\|$ to determine whether the image is really a face
   •  Classify as the closest training face in the k-dimensional subspace
M. Turk and A. Pentland, Face Recognition using Eigenfaces, CVPR 1991
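A compact sketch of these steps (X holds one hypothetical vectorized training face per row; mu and U are as returned by the pca sketch above; tau is an illustrative threshold):

```python
import numpy as np

# Recognition in "face space", following the steps above.
def project(x, mu, U):
    return U.T @ (x - mu)                 # (w_1, ..., w_k)

def classify(x, X, mu, U, tau):
    x_hat = mu + U @ project(x, mu, U)    # reconstruction from k weights
    if np.linalg.norm(x - x_hat) > tau:   # large error -> not a face
        return None
    W = (X - mu) @ U                      # training faces in face space
    return np.argmin(np.linalg.norm(W - project(x, mu, U), axis=1))
```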
Can we do it for the ensemble of natural images?
[Figure: example natural images]
PCA of natural image patches
[Figure: principal components of natural image patches]

2nd order statistics, translation invariance & Fourier
For translation-invariant (stationary) second-order statistics the covariance matrix is Toeplitz, and its eigenvectors approach the Fourier basis. Proof: in the appendix.
R. Gray, Toeplitz and Circulant Matrices: A Review, http://ee.stanford.edu/~gray/toeplitz.pdf
Power Spectrum & Autocorrelation
•  Definition of the power spectrum of a WSS stochastic process x[n]: $S_x(e^{j\omega}) = \lim_{N \to \infty} \frac{1}{2N+1}\, E\!\left[\,\Big|\sum_{n=-N}^{N} x[n]\, e^{-j\omega n}\Big|^2\right]$
•  Wiener-Khintchine-Einstein theorem: $S_x(e^{j\omega}) = \sum_{k=-\infty}^{\infty} r_x[k]\, e^{-j\omega k}$
•  The power spectrum of a WSS stochastic process equals the DTFT of its autocorrelation
=> the power spectrum captures all of the 2nd-order statistical information
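A quick numerical check of the theorem on a sketch example: the periodogram (a power-spectrum estimate) equals the DFT of the circular sample autocorrelation.

```python
import numpy as np

# Wiener-Khintchine, finite-sample version: periodogram == DFT of the
# circular sample autocorrelation.
rng = np.random.default_rng(4)
n = 1024
x = rng.standard_normal(n)

periodogram = np.abs(np.fft.fft(x)) ** 2 / n
r_circ = np.array([np.mean(x * np.roll(x, -k)) for k in range(n)])
print(np.allclose(periodogram, np.real(np.fft.fft(r_circ))))
```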
Power spectrum of natural images
[Figure: the power spectrum of natural images; energy is concentrated at low spatial frequencies]
2nd order generative model for images
•  PCA of natural images <-> harmonic basis
•  Fourier transform: change of basis (a rotation)
•  PCA: in the new coordinate system the variables are uncorrelated
•  Gaussian variables: uncorrelated = independent
•  Synthesis equation: signal = sum of (coefficients × basis elements)
•  Fourier synthesis equation: use sinusoids as the signal basis; in 2D, $x[n_1, n_2] = \frac{1}{N_1 N_2} \sum_{k_1, k_2} X[k_1, k_2]\, e^{j 2\pi (k_1 n_1 / N_1 + k_2 n_2 / N_2)}$
•  Image generation: sample Fourier coefficients & invert (see the sketch below)
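A minimal sketch of this generative model, assuming a power-law spectrum for natural images (the 1/f amplitude and random phases are modeling assumptions; taking the real part is a shortcut around enforcing Hermitian symmetry):

```python
import numpy as np

# Sample independent Fourier coefficients whose magnitude follows an
# assumed 1/f amplitude (power spectrum ~ 1/f^2), then invert.
n = 256
fy = np.fft.fftfreq(n)[:, None]
fx = np.fft.fftfreq(n)[None, :]
f = np.sqrt(fx ** 2 + fy ** 2)
f[0, 0] = 1.0                              # avoid division by zero at DC

rng = np.random.default_rng(5)
amplitude = 1.0 / f                        # power spectrum ~ 1/f^2
phase = rng.uniform(0, 2 * np.pi, (n, n))
coeffs = amplitude * np.exp(1j * phase)    # sample Fourier coefficients

image = np.real(np.fft.ifft2(coeffs))      # invert: a '1/f' random image
```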
Maybe we should do it for patches?
And also let’s do it with a real harmonic basis
[Figure: example natural images]
http://en.wikipedia.org/wiki/Discrete_cosine_transform
1D Discrete Cosine Transform (DCT) matrix
$$C = \sqrt{\tfrac{2}{n}}\begin{bmatrix}
\tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} & \cdots & \tfrac{1}{\sqrt{2}} \\
\cos\tfrac{\pi}{2n} & \cos\tfrac{3\pi}{2n} & \cdots & \cos\tfrac{(2n-1)\pi}{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\cos\tfrac{(n-1)\pi}{2n} & \cos\tfrac{(n-1)3\pi}{2n} & \cdots & \cos\tfrac{(n-1)(2n-1)\pi}{2n}
\end{bmatrix}$$

$$C^{-1} = C^{T} = \sqrt{\tfrac{2}{n}}\begin{bmatrix}
\tfrac{1}{\sqrt{2}} & \cos\tfrac{\pi}{2n} & \cdots & \cos\tfrac{(n-1)\pi}{2n} \\
\tfrac{1}{\sqrt{2}} & \cos\tfrac{3\pi}{2n} & \cdots & \cos\tfrac{(n-1)3\pi}{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\tfrac{1}{\sqrt{2}} & \cos\tfrac{(2n-1)\pi}{2n} & \cdots & \cos\tfrac{(n-1)(2n-1)\pi}{2n}
\end{bmatrix}$$
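A short sketch that builds this matrix and verifies its orthogonality, so the inverse transform is simply the transpose:

```python
import numpy as np

# Build the n x n DCT matrix above: row k has entries cos(k(2m+1)pi/2n),
# with the first row scaled by 1/sqrt(2).
def dct_matrix(n):
    C = np.sqrt(2.0 / n) * np.cos(
        np.pi * np.outer(np.arange(n), 2 * np.arange(n) + 1) / (2 * n))
    C[0, :] /= np.sqrt(2.0)                # first row: constant entries
    return C

C = dct_matrix(8)
print(np.allclose(C @ C.T, np.eye(8)))     # C^{-1} = C^T
```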
DCT vs. DFT: better energy concentration
[Figure: comparison of DCT and DFT coefficient magnitudes]
The DCT implicitly extends the signal symmetrically, avoiding the boundary discontinuity of the DFT's periodic extension, so smooth signals concentrate their energy in fewer coefficients.
Image Compression
•  Gray-Scale Example:
•  Value range 0 (black) – 255 (white)

 63  33  36  28  63  81  86  98
 27  18  17  11  22  48 104 108
 72  52  28  15  17  16  47  77
132 100  56  19  10   9  21  55
187 186 166  88  13  34  43  51
184 203 199 177  82  44  97  73
211 214 208 198 134  52  78  83
211 210 203 191 133  79  74  86
Image Compression
•  2D-DCT of the matrix:

-304  210  104  -69   10   20  -12    7
-327 -260   67   70  -10  -15   21    8
  93  -84  -66   16   24   -2   -5    9
  89   33  -19  -20  -26   21   -3    0
  -9   42   18   27   -7  -17   29   -7
  -5   15  -10   17   32  -15   -4    7
  10   12  -12   -1    2    3   -2   -3
   3   30    0   -3   -3   -6   12   -1
Reminder: Power spectrum of natural images
Energy is typically concentrated at low frequencies.
[Figure: DCT coefficient block partitioned into high-energy (low-frequency), mid-energy, and low-energy (high-frequency) regions]
Zig-Zag Scan
•  Group low-frequency coefficients at the top of the vector and high-frequency coefficients at the bottom
•  Maps an 8 x 8 matrix to a 1 x 64 vector (see the sketch below)
[Figure: zig-zag traversal of the 8x8 block into a 1x64 vector]
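A small sketch of the traversal order (the sort key encodes the alternating direction of successive anti-diagonals):

```python
import numpy as np

# Order the 8x8 indices by anti-diagonal (i + j), alternating direction,
# so low-frequency coefficients come first in the 1x64 vector.
def zigzag_indices(n=8):
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def zigzag(block):
    return np.array([block[i, j] for i, j in zigzag_indices(len(block))])

block = np.arange(64).reshape(8, 8)
print(zigzag(block)[:10])   # -> [ 0  1  8 16  9  2  3 10 17 24]
```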
Image Compression
•  Cut the least significant (high-frequency) components:

-304  210  104  -69   10   20  -12    0
-327 -260   67   70  -10  -15    0    0
  93  -84  -66   16   24    0    0    0
  89   33  -19  -20    0    0    0    0
  -9   42   18    0    0    0    0    0
  -5   15    0    0    0    0    0    0
  10    0    0    0    0    0    0    0
   0    0    0    0    0    0    0    0
Reconstructing the Image
•  New matrix and compressed image:

 55  41  27  39  56  69  92 106
 35  22   7  16  35  59  88 101
 65  49  21   5   6  28  62  73
130 114  75  28  -7  -1  33  46
180 175 148  95  33  16  45  59
200 206 203 165  92  55  71  82
205 207 214 193 121  70  75  83
214 205 209 196 129  75  78  85
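The whole pipeline on this block, as a sketch (scipy's orthonormal DCT is assumed; the slide's coefficient values may use a different normalization or level shift, so intermediate numbers can differ slightly):

```python
import numpy as np
from scipy.fft import dctn, idctn

# 2D-DCT, discard the high-frequency triangle of coefficients
# (matching the zeroed pattern above), inverse 2D-DCT.
block = np.array([
    [ 63,  33,  36,  28,  63,  81,  86,  98],
    [ 27,  18,  17,  11,  22,  48, 104, 108],
    [ 72,  52,  28,  15,  17,  16,  47,  77],
    [132, 100,  56,  19,  10,   9,  21,  55],
    [187, 186, 166,  88,  13,  34,  43,  51],
    [184, 203, 199, 177,  82,  44,  97,  73],
    [211, 214, 208, 198, 134,  52,  78,  83],
    [211, 210, 203, 191, 133,  79,  74,  86]], dtype=float)

Y = dctn(block, norm="ortho")          # 2D-DCT of the block
i, j = np.indices(Y.shape)
Y[i + j > 6] = 0                       # keep the first 28 zig-zag coefficients
reconstructed = idctn(Y, norm="ortho") # close to the original block
print(np.round(reconstructed).astype(int))
```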
Can You Tell the Difference?
[Figure: original vs. compressed image]

Image Compression
[Figure: original vs. compressed image]
JPEG Modes
Progressive mode: allows a coarse version of an image to be transmitted at a low rate, which is then progressively improved over subsequent transmissions.
–  Spectral selection: send the DC component and the first few AC coefficients first, then gradually some more ACs.
[Figure: spectral selection; the first scan sends the DC coefficient, the second and third scans send further AC coefficients, and by the Nth scan all image pixels are refined]
Principal Component Analysis (PCA)
•  Find a low-dimensional subspace to reconstruct high-dimensional data
•  Reconstruction on an orthogonal basis: approximation with K terms, $\hat{x} = \mu + \sum_{k=1}^{K} w_k u_k$
Alternative compression scheme: vector quantization
http://www.data-compression.com/vqanim.shtml
K-Means algorithm
–  Coordinate descent on the distortion cost: $F(m, c) = \sum_{i=1}^{N} \left| x_i - c_{m(i)} \right|^2$
–  Local minima (multiple initializations to find a better solution)
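A minimal sketch of the coordinate descent (alternating the two minimizations; all names are illustrative):

```python
import numpy as np

# K-means as coordinate descent on F(m, c): alternately minimize over the
# assignments m (nearest centroid) and the codebook c (cluster means).
# Each step cannot increase F, but only a local minimum is reached, hence
# the multiple random initializations in practice.
def kmeans(X, K, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    c = X[rng.choice(len(X), K, replace=False)]     # initial codebook
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - c[None], axis=2)
        m = d.argmin(axis=1)                        # minimize over m
        c = np.array([X[m == k].mean(axis=0) if np.any(m == k) else c[k]
                      for k in range(K)])           # minimize over c
    return m, c
```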
Additive Image Patch Modeling
[Diagram: an image patch is written as a combination of K dictionary elements; PCA and clustering arise as special cases of the same additive model]
Three Modeling Regimes
[Diagram: modeling regimes ordered by sparsity: PCA, Sparse Coding (next lecture), Clustering]