L. Vandenberghe
EE133A Fall 2014

Homework 1

Due: Wednesday 10/15/2014.

• Data files for homework problems can be found at www.seas.ucla.edu/~vandenbe/ee133a.
• Homework is due at the start of the lecture (4:00PM) on the due date. Late homework will not be accepted.
• You are allowed to discuss the homework with each other, but you must write up your own answers (and for programming assignments, code your own programs) to hand in.

Reading assignment: Chapters 1 and 2 of the course reader www.seas.ucla.edu/~vandenbe/133A/reader-133A.pdf.

Homework problems

1. Which of the following functions $f : \mathbf{R}^n \to \mathbf{R}$ are linear? Which are affine? If a function is linear, give its inner product representation, i.e., an $n$-vector $a$ such that $f(x) = a^T x$ for all $x$. If it is affine, give $a$ and $b$ such that $f(x) = a^T x + b$ holds for all $x$. If it is neither, give specific $x$, $y$, $\alpha$, $\beta$ for which superposition fails, i.e., $f(\alpha x + \beta y) \neq \alpha f(x) + \beta f(y)$. (If $\alpha + \beta = 1$, this shows the function is neither linear nor affine. If $\alpha + \beta \neq 1$, it shows the function is not linear.)

(a) The spread of values of the vector, defined as $f(x) = \max_k x_k - \min_k x_k$.
(b) The difference of the last element and the first, $f(x) = x_n - x_1$.
(c) The difference of the squared distances to two fixed vectors $c$ and $d$, defined as $f(x) = \|x - c\|^2 - \|x - d\|^2$.
(d) The median of an $n$-vector, defined as the middle value of the sorted vector when $n$ is odd, and the average of the two middle values in the sorted vector when $n$ is even.

2. The temperature $T$ of an electronic device containing three processors is an affine function of the power dissipated by the three processors, $P = (P_1, P_2, P_3)$. When all three processors are idling, we have $P = (10, 10, 10)$, which results in a temperature $T = 30$. When the first processor operates at full power and the other two are idling, we have $P = (100, 10, 10)$, and the temperature rises to $T = 60$.
When the second processor operates at full power and the other two are idling, we have $P = (10, 100, 10)$ and $T = 70$. When the third processor operates at full power and the other two are idling, we have $P = (10, 10, 100)$ and $T = 65$. Now suppose that all three processors are operated at the same power, i.e., $P_1 = P_2 = P_3$. How large can this common power be if we require $T \le 85$?

3. Cauchy-Schwarz inequality.

(a) Use the Cauchy-Schwarz inequality to prove that

$$\frac{1}{n} \sum_{i=1}^{n} |x_i| \le \frac{1}{\sqrt{n}} \|x\|$$

for all vectors $x \in \mathbf{R}^n$. In other words, the average absolute value of a vector is less than or equal to its RMS value. Under what conditions on $x$ does equality hold?

(b) Use the Cauchy-Schwarz inequality to prove that

$$\frac{1}{n} \sum_{k=1}^{n} x_k \ge \left( \frac{1}{n} \sum_{k=1}^{n} \frac{1}{x_k} \right)^{-1}$$

for all vectors $x \in \mathbf{R}^n$ with positive elements $x_k$. The left-hand side of the inequality is the arithmetic mean (average) of the numbers $x_k$; the right-hand side is called the harmonic mean.

4. K-means clustering. The MNIST database of handwritten digits, available from yann.lecun.com/exdb/mnist, is a popular dataset for testing and comparing classification algorithms. In this problem we apply the K-means clustering algorithm to it. Download the file mnist.mat.zip from the course website, unzip it, and load it in MATLAB or Octave using the command

    load mnist

This creates two variables: a 784 × 60000 array digits and a 1 × 60000 array labels. Each column of digits is a 28 × 28 grayscale image, stored as a vector of length $28^2 = 784$ with entries between 0 and 1 (0 corresponds to white and 1 to black). The first 25 images are shown below. To display these images we used the commands

    X = reshape(digits(:, k), 28, 28);
    imshow(1 - X);

The first command converts column k of digits to a 28 × 28 array. The second command displays the array as an image. We call imshow with argument 1 − X because imshow interprets 0 as black and 1 as white. The second array, labels, has length 60000 and entries 0–9.
The kth element of labels is the digit shown by column k in digits.

To reduce the size of the problem, we only consider the digits 0–4. We do this as follows:

    I = find(labels < 5);
    digits = digits(:, I);
    labels = labels(I);

This should reduce the number of columns in digits to $N = 30596$. The assignment is to implement the K-means algorithm (page 15 of lecture 2) and run it on the $N$ vectors $x_k$ stored as columns of digits. The number of clusters is $K = 5$.

• Run the K-means algorithm with a few randomly generated initial values for the five representative vectors $z_1, \ldots, z_5$. If the representatives are stored as columns of a 784 × 5 matrix Z, you can generate initial values using the command

    Z = rand(784, 5);

  This generates a 784 × 5 matrix Z with elements uniformly distributed between 0 and 1. You may notice that for some initial values of Z, the first step of the algorithm produces one or more empty clusters. In that case, we terminate the algorithm and generate a new set of initial representatives.
• We use the quantity

  $$J = \frac{1}{N} \sum_{i=1}^{N} \min_{j=1,\ldots,5} \|x_i - z_j\|^2$$

  to evaluate the quality of the clustering. We terminate the algorithm when J is equal (or nearly equal) in two successive iterations, or when a maximum number of iterations (for example, 100) is reached.
• After running the K-means algorithm with a few different starting points, choose the best clustering (the one with the smallest value of J) and display the representative vectors of the five clusters.
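The reshape command in Problem 4 relies on MATLAB filling arrays column by column. This matters if you inspect the data outside MATLAB/Octave. The small pure-Python sketch below mimics that behavior: entry (r, c) of the image (0-based) is element r + 28c of the column vector. It also mimics the 1 − X inversion passed to imshow. The helper names reshape_column_major and invert_gray are hypothetical, and the example uses a 2 × 3 array instead of 28 × 28 so the output stays readable.

```python
def reshape_column_major(v, rows, cols):
    """Mimic MATLAB's reshape(v, rows, cols) for a vector v (column-major fill)."""
    if len(v) != rows * cols:
        raise ValueError("len(v) must equal rows * cols")
    # Entry (r, c) of the result, with 0-based indices, is v[r + rows * c].
    return [[v[r + rows * c] for c in range(cols)] for r in range(rows)]


def invert_gray(image):
    """Return 1 - X entrywise, as passed to imshow in the text."""
    return [[1 - p for p in row] for row in image]


# Toy 2 x 3 "image" instead of a 28 x 28 MNIST digit.
v = [0.0, 1.0, 0.5, 0.25, 1.0, 0.0]
X = reshape_column_major(v, 2, 3)
print(X)               # [[0.0, 0.5, 1.0], [1.0, 0.25, 0.0]]
print(invert_gray(X))  # [[1.0, 0.5, 0.0], [0.0, 0.75, 1.0]]
```

In NumPy, the same column-major layout is obtained with numpy.reshape(v, (28, 28), order="F").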
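The sketch below is a minimal pure-Python version (not MATLAB/Octave) of the standard K-means iteration for Problem 4. It assumes this matches the version on page 15 of lecture 2. Each iteration assigns every vector to its nearest representative, replaces each representative by the mean of its cluster, and evaluates J for the updated representatives. Following the text, a run that produces an empty cluster is abandoned and restarted from new random representatives, and the run with the smallest J is kept. The function names kmeans and best_of_restarts, the relative tolerance tol, and the restart limits are illustrative assumptions. Because vectors are plain Python lists, this is only practical for toy data, not for the 30596 MNIST columns.

```python
import random


def sq_dist(x, z):
    """Squared Euclidean distance ||x - z||^2 between two equal-length lists."""
    return sum((xi - zi) ** 2 for xi, zi in zip(x, z))


def kmeans(X, Z, max_iter=100, tol=1e-9):
    """Run K-means on data vectors X starting from representatives Z.

    Returns (Z, assign, J), where J = (1/N) * sum_i min_j ||x_i - z_j||^2
    for the final Z, or None if some cluster becomes empty (restart needed).
    """
    K = len(Z)
    J_prev = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest representative for each x_i.
        assign = [min(range(K), key=lambda j: sq_dist(x, Z[j])) for x in X]
        clusters = [[x for x, a in zip(X, assign) if a == k] for k in range(K)]
        if any(len(c) == 0 for c in clusters):
            return None
        # Update step: each representative becomes the mean of its cluster.
        Z = [[sum(col) / len(c) for col in zip(*c)] for c in clusters]
        # Clustering objective J for the updated representatives.
        J = sum(min(sq_dist(x, z) for z in Z) for x in X) / len(X)
        # Stop when J is (nearly) unchanged between successive iterations.
        if J_prev is not None and abs(J_prev - J) <= tol * J_prev:
            break
        J_prev = J
    return Z, assign, J


def best_of_restarts(X, K, n_runs=5, max_attempts=100, seed=0):
    """Keep the run with the smallest J over several random initializations.

    Initial entries are uniform on [0, 1), like Z = rand(784, 5) in MATLAB.
    Runs that hit an empty cluster are discarded and do not count toward n_runs.
    """
    rng = random.Random(seed)
    dim = len(X[0])
    best, runs = None, 0
    for _ in range(max_attempts):
        if runs == n_runs:
            break
        Z0 = [[rng.random() for _ in range(dim)] for _ in range(K)]
        result = kmeans(X, Z0)
        if result is None:
            continue
        runs += 1
        if best is None or result[2] < best[2]:
            best = result
    return best


if __name__ == "__main__":
    # Two well-separated groups in the unit square (toy data, not MNIST).
    X = [[0.1, 0.1], [0.2, 0.1], [0.1, 0.2], [0.9, 0.9], [0.8, 0.9], [0.9, 0.8]]
    Z, assign, J = kmeans(X, [[0.0, 0.0], [1.0, 1.0]])
    print(assign, round(J, 6))  # [0, 0, 0, 1, 1, 1] 0.004444
    best = best_of_restarts(X, K=2)
    if best is not None:
        print("best J over restarts:", best[2])
```

Restarting and keeping the smallest J matters because K-means only finds a local minimum. On the toy data above, some random starts split each group in half and get stuck at a much larger J.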
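For Problem 1, a quick numerical experiment can suggest which answer to look for before you work it out. The hypothetical helper below (not something provided with the course) draws random $x$, $y$, $\alpha$, $\beta$ and tests whether $f(\alpha x + \beta y) = \alpha f(x) + \beta f(y)$. Setting $\beta = 1 - \alpha$ tests the affine case, matching the remark in the problem statement. A failure returns a concrete counterexample. Passing every trial is only evidence of linearity or affineness, not a proof. The examples use sum(x) and sum(x) + 3, which are not among the functions (a)–(d).

```python
import random


def superposition_gap(f, x, y, alpha, beta):
    """Return f(alpha*x + beta*y) - (alpha*f(x) + beta*f(y))."""
    combo = [alpha * xi + beta * yi for xi, yi in zip(x, y)]
    return f(combo) - (alpha * f(x) + beta * f(y))


def find_superposition_failure(f, n, affine_only=False, trials=200, seed=0, tol=1e-9):
    """Randomly search for x, y, alpha, beta that violate superposition.

    With affine_only=True, beta is set to 1 - alpha, so a failure shows that f
    is neither linear nor affine; otherwise a failure only shows f is not linear.
    Returns the first failing (x, y, alpha, beta), or None if every trial passed.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        alpha = rng.gauss(0.0, 1.0)
        beta = 1.0 - alpha if affine_only else rng.gauss(0.0, 1.0)
        # Allow a small tolerance for floating-point rounding.
        if abs(superposition_gap(f, x, y, alpha, beta)) > tol:
            return x, y, alpha, beta
    return None


# Example functions that are not the ones in Problem 1.
print(find_superposition_failure(sum, 4))  # None: sum(x) = 1^T x is linear
print(find_superposition_failure(lambda x: sum(x) + 3.0, 4) is None)  # False
```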
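For Problem 3, both inequalities can be sanity-checked numerically before proving them. The sketch below makes two comparisons on random vectors. It compares the average absolute value with the RMS value $\|x\|/\sqrt{n}$. It also compares the arithmetic mean with the harmonic mean for positive entries. A tiny relative tolerance absorbs rounding. The helper names and the ranges of the random test vectors are arbitrary choices, and a check like this does not replace the Cauchy-Schwarz argument.

```python
import math
import random


def mean_abs(x):
    """Average absolute value (1/n) * sum_i |x_i|."""
    return sum(abs(v) for v in x) / len(x)


def rms(x):
    """RMS value sqrt((1/n) * sum_i x_i^2), equal to ||x|| / sqrt(n)."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def arithmetic_mean(x):
    """Average (1/n) * sum_k x_k."""
    return sum(x) / len(x)


def harmonic_mean(x):
    """Harmonic mean ((1/n) * sum_k 1/x_k)^(-1); requires positive entries."""
    return len(x) / sum(1.0 / v for v in x)


def inequalities_hold(n, trials=1000, seed=0, rel_tol=1e-12):
    """Check both inequalities of Problem 3 on random vectors of length n."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-10.0, 10.0) for _ in range(n)]
        if mean_abs(x) > rms(x) * (1.0 + rel_tol):  # part (a)
            return False
        p = [rng.uniform(0.01, 10.0) for _ in range(n)]  # positive entries
        if arithmetic_mean(p) < harmonic_mean(p) * (1.0 - rel_tol):  # part (b)
            return False
    return True


print(inequalities_hold(5))  # True
print(mean_abs([3.0, -4.0]), rms([3.0, -4.0]))  # 3.5 and about 3.5355
```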