Homework 1

L. Vandenberghe
EE133A Fall 2014
Due: Wednesday 10/15/2014.
• Data files for homework problems can be found at
www.seas.ucla.edu/~vandenbe/ee133a.
• Homework is due at the start of the lecture (4:00PM) on the due date. Late homework
will not be accepted.
• You are allowed to discuss the homework with each other, but you must write up your
own answers (and for programming assignments, code your own programs) to hand in.
Reading assignment: Chapters 1 and 2 of the course reader
www.seas.ucla.edu/~vandenbe/133A/reader-133A.pdf.
Homework problems
1. Which of the following functions f : R^n → R are linear? Which are affine? If a
function is linear, give its inner product representation, i.e., an n-vector a such that
f(x) = a^T x for all x. If it is affine, give a and b such that f(x) = a^T x + b holds for
all x. If it is neither, give specific x, y, α, β for which superposition fails, i.e.,
f(αx + βy) ≠ αf(x) + βf(y).
(If α + β = 1, this shows the function is neither linear nor affine. If α + β ≠ 1, it shows
the function is not linear.)
(a) The spread of values of the vector, defined as f(x) = max_k x_k − min_k x_k.
(b) The difference of the last element and the first, f(x) = x_n − x_1.
(c) The difference of the squared distances to two fixed vectors c and d, defined as
f(x) = ‖x − c‖^2 − ‖x − d‖^2.
(d) The median of an n-vector, defined as the middle value of the sorted vector, when
n is odd, and the average of the two middle values in the sorted vector, when n
is even.
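A numerical check can help in searching for a counterexample to superposition. The sketch below is only an illustration: the function f(x) = x_1 + 2 and the helper name gap are hypothetical and are not part of the assignment. It evaluates both sides of the superposition equality for one choice of x, y, α, β.

```matlab
% Hypothetical example function (not one of the homework functions):
% f(x) = x_1 + 2 is affine but not linear.
f = @(x) x(1) + 2;

% Difference between the two sides of the superposition equality.
gap = @(f, x, y, alpha, beta) ...
    f(alpha*x + beta*y) - (alpha*f(x) + beta*f(y));

x = [1; 2; 3];
y = [4; 5; 6];

gap_affine = gap(f, x, y, 0.3, 0.7);   % alpha + beta = 1
gap_linear = gap(f, x, y, 1, 1);       % alpha + beta = 2

fprintf('alpha + beta = 1: gap = %g\n', gap_affine);
fprintf('alpha + beta = 2: gap = %g\n', gap_linear);
```

A nonzero gap is a valid counterexample. A zero gap for one choice of x, y, α, β proves nothing, so a claim that a function is linear or affine still needs the vector a (and b) and an argument that holds for all x.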
2. The temperature T of an electronic device containing three processors is an affine
function of the power dissipated by the three processors, P = (P_1, P_2, P_3). When all
three processors are idling, we have P = (10, 10, 10), which results in a temperature
T = 30. When the first processor operates at full power and the other two are idling,
we have P = (100, 10, 10), and the temperature rises to T = 60. When the second
processor operates at full power and the other two are idling, we have P = (10, 100, 10)
and T = 70. When the third processor operates at full power and the other two are
idling, we have P = (10, 10, 100) and T = 65. Now suppose that all three processors
are operated at the same power, i.e., P_1 = P_2 = P_3. How large can the power be, if we
require that T ≤ 85?
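As a general illustration of how an affine function f(x) = a^T x + b can be recovered from function values, the sketch below uses a hypothetical two-variable example with made-up data (not the data of this problem). It assumes the n + 1 evaluation points are chosen so that the resulting linear system has a unique solution.

```matlab
% Hypothetical data: f(0,0) = 1, f(1,0) = 3, f(0,1) = 4.
X = [0 1 0;
     0 0 1];          % columns are the evaluation points
y = [1; 3; 4];        % observed function values

% Each observation gives one linear equation x_i' * a + b = y_i.
A = [X', ones(3, 1)];
coef = A \ y;

a = coef(1:2);
b = coef(3);
fprintf('a = (%g, %g), b = %g\n', a(1), a(2), b);
```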
3. Cauchy-Schwarz inequality.
(a) Use the Cauchy-Schwarz inequality to prove that
\[ \frac{1}{n} \sum_{i=1}^n |x_i| \le \frac{1}{\sqrt{n}} \|x\| \]
for all vectors x ∈ R^n. In other words, the average absolute value of a vector
is less than or equal to the RMS value. What are the conditions on x to have
equality?
(b) Use the Cauchy-Schwarz inequality to prove that
\[ \frac{1}{n} \sum_{k=1}^n x_k \ge \left( \frac{1}{n} \sum_{k=1}^n \frac{1}{x_k} \right)^{-1} \]
for all vectors x ∈ R^n with positive elements x_k. The left-hand side of the
inequality is the arithmetic mean (average) of the numbers x_k; the right-hand
side is called the harmonic mean.
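Before attempting the proofs, it can be reassuring to check both inequalities numerically on random vectors. The sketch below is only a sanity check, not a proof. The variable names and the choice n = 10 are arbitrary assumptions.

```matlab
n = 10;

% Part (a): average absolute value versus RMS value.
x = randn(n, 1);
avg_abs = sum(abs(x)) / n;
rms_val = norm(x) / sqrt(n);

% Part (b): arithmetic mean versus harmonic mean (positive entries).
p = rand(n, 1) + 0.1;
arith = sum(p) / n;
harm = 1 / (sum(1 ./ p) / n);

fprintf('avg abs = %g, RMS = %g\n', avg_abs, rms_val);
fprintf('arithmetic mean = %g, harmonic mean = %g\n', arith, harm);
```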
4. K-means clustering. The MNIST database of handwritten digits, available from
yann.lecun.com/exdb/mnist,
is a popular dataset for testing and comparing classification algorithms. In this problem
we apply the K-means clustering algorithm to it.
Download the file mnist.mat.zip from the course website, unzip it, and load it in
MATLAB or Octave using the command load mnist. This will create two variables:
a 784 × 60000 array digits and a 1 × 60000 array labels. Each column of digits is
a 28 × 28 grayscale image, stored as a vector of length 28^2 = 784 with entries between
0 and 1 (0 corresponds to white and 1 to black). The first 25 images are shown below.
To display these images we used the commands
X = reshape(digits(:, k), 28, 28);
imshow(1-X);
The first of these two commands converts column k of digits to a 28 × 28 array. The
second command displays the array as an image. We call imshow with argument 1 − X
because imshow interprets 0 as black and 1 as white.
The second array labels has length 60000 and entries 0–9. The kth element of labels
is the digit shown by column k in digits.
To reduce the size of the problem we will only consider digits 0–4. We do this as follows.
I = find(labels < 5);
digits = digits(:, I);
labels = labels(I);
This should reduce the number of columns in digits to N = 30596.
The assignment is to implement the K-means algorithm (page 15 of lecture 2) and run
it on the N vectors x_k stored as columns of digits. The number of clusters is K = 5.
• Run the K-means algorithm with a few randomly generated initial values for the
five representative vectors z_1, ..., z_5. If the representatives are stored as columns
of a 784 × 5 matrix Z, you can generate initial values using the command
Z = rand(784, 5);
This randomly generates a 784 × 5 matrix Z with elements uniformly distributed
between 0 and 1.
You may notice that for some initial values of Z, the first step of the algorithm
produces one or more empty clusters. In that case, we terminate the algorithm
and generate a new set of initial representatives.
• We use the quantity
\[ J = \frac{1}{N} \sum_{i=1}^N \min_{j=1,\ldots,5} \|x_i - z_j\|^2 \]
to evaluate the quality of the clustering. We terminate the algorithm when J is
equal (or nearly equal) in two successive iterations, or when a maximum number
of iterations (for example, 100) is reached.
• After running the K-means algorithm with a few different starting points, choose
the best clustering (with the smallest value of J) and display the representative
vectors of the five clusters.
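The clustering objective J and the empty-cluster check described above can be sketched on a tiny synthetic example. This is a minimal illustration under stated assumptions, not a K-means implementation: the 2 × 4 matrix X stands in for digits, the 2 × 2 matrix Z for the representatives, and the names D, assign, and counts are hypothetical.

```matlab
% Hypothetical toy data: 4 points in R^2 and 2 representatives.
X = [0 0 1 1;
     0 1 0 1];
Z = [0   1;
     0.5 0.5];

N = size(X, 2);
K = size(Z, 2);

% D(j, i) is the squared distance from point i to representative j.
D = zeros(K, N);
for j = 1:K
    R = X - repmat(Z(:, j), 1, N);
    D(j, :) = sum(R.^2, 1);
end

% Nearest representative for each point, and the objective J.
[dmin, assign] = min(D, [], 1);
J = sum(dmin) / N;

% Number of points assigned to each cluster; a zero marks an empty cluster.
counts = accumarray(assign', 1, [K 1]);

fprintf('J = %g\n', J);
```

With the real data, X would be digits and Z the 784 × 5 matrix of representatives. Each column of the final Z can be displayed with the same reshape and imshow commands used for the images.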