CS331: Machine Learning, FS 2015
Prof. Dr. Volker Roth, [email protected]
Aleksander Wieczorek, [email protected]
Dept. of Mathematics and Computer Science, Spiegelgasse 1, 4051 Basel
Date: Monday, April 13th 2015

Exercise 8: Bias-variance decomposition of mean squared error

Suppose that data $(x_i, y_i)$ are observed, where

$$y_i = f(x_i) + \epsilon_i, \qquad i = 1, \dots, n,$$

and
• $x_i = (x_{i1}, \dots, x_{ip}) \in \mathbb{R}^p$
• $y_i \in \mathbb{R}$
• $f : \mathbb{R}^p \to \mathbb{R}$
• $\epsilon_i$ is an error term with $E[\epsilon_i] = 0$, $\mathrm{Var}[\epsilon_i] = \sigma^2$, $\mathrm{Cov}[\epsilon_i, \epsilon_j] = 0$ for $i \neq j$.

Assume that a function $\hat{f}$ constructed from the data $(x_i, y_i)$ is used to approximate the unknown $f$. The mean squared error (MSE) of $\hat{f}$ measures how well $\hat{f}$ approximates $f$ (i.e. predicts $y$ given a new $x$):

$$\mathrm{MSE}(\hat{f}(x)) = E[(\hat{f}(x) - f(x))^2]$$

For a new data point $x$, the MSE can be shown to depend on the bias and variance of $\hat{f}(x)$:

$$\mathrm{MSE}(\hat{f}(x)) = \mathrm{Bias}(\hat{f}(x))^2 + \mathrm{Var}(\hat{f}(x))$$

where

$$\mathrm{Bias}(\hat{f}(x)) = E[\hat{f}(x)] - f(x)$$
$$\mathrm{Var}(\hat{f}(x)) = E[(\hat{f}(x) - E[\hat{f}(x)])^2]$$

Prove the above result. What does it mean? How should it be interpreted in the context of regression and regularized (ridge) regression?

Exercise 9: Hoeffding's inequality

Consider independent random variables $X_1, \dots, X_n$ which are bounded, i.e. $X_i$ takes values in $[a_i, b_i]$ with probability 1, $i = 1, \dots, n$. Then, for any $t > 0$, the sum $S_n = \sum_{i=1}^n X_i$ fulfils the following inequality:

$$P(|S_n - E S_n| \ge t) \le 2 \exp\left( \frac{-2t^2}{\sum_{i=1}^n (b_i - a_i)^2} \right). \tag{1}$$

Exercise: Give a proof of Hoeffding's inequality. This can be done in several steps:

1. Show that for independent rvs $X_1, \dots, X_n$ and any $s > 0$ we have:

$$P(S_n - E S_n \ge t) \le e^{-st} \prod_{i=1}^n E e^{s(X_i - E X_i)}. \tag{2}$$

   (a) Multiply both sides of the inequality $S_n - E S_n \ge t$ by $s$ and take the exp.
   (b) Use Markov's inequality: for a nonnegative rv $X$ and any $t > 0$ we have that $P(X \ge t) \le \frac{E X}{t}$.
   (c) Use the independence of $X_1, \dots, X_n$.

2. Show that for a rv $X$ with $E X = 0$, if $X$ takes values in $[a, b]$ with probability 1, then for any $s > 0$:

$$E e^{sX} \le e^{s^2 (b - a)^2 / 8} \tag{3}$$

   (a) Since exp is a convex function, $e^{sX} \le C e^{sb} + D e^{sa}$, for some $C$ and $D$ to be determined.
   (b) Take the expectation $E e^{sX}$.
   (c) Take the Taylor series expansion of $\log(E e^{sX})$.

3. Combining (2) and (3) we obtain:

$$P(S_n - E S_n \ge t) \le \inf_{s > 0} \left( e^{-st} \prod_{i=1}^n e^{s^2 (b_i - a_i)^2 / 8} \right)$$
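
Inequality (1) can be sanity-checked numerically before attempting the proof. The sketch below is not part of the exercise and proves nothing; it only compares a Monte Carlo estimate of the two-sided tail probability with the right-hand side of (1). The uniform distribution of the $X_i$, the particular intervals in `ranges`, and the function names `hoeffding_bound` and `empirical_tail` are all assumptions made for illustration.

```python
import math
import random


def hoeffding_bound(t, ranges):
    """Right-hand side of (1): 2 * exp(-2 t^2 / sum_i (b_i - a_i)^2)."""
    denom = sum((b - a) ** 2 for a, b in ranges)
    return 2.0 * math.exp(-2.0 * t ** 2 / denom)


def empirical_tail(t, ranges, trials=20000, seed=0):
    """Monte Carlo estimate of P(|S_n - E S_n| >= t).

    Assumption: the X_i are independent and X_i ~ Uniform[a_i, b_i].
    Any other distribution supported on [a_i, b_i] would also be covered by (1).
    """
    rng = random.Random(seed)
    mean_sum = sum((a + b) / 2.0 for a, b in ranges)  # E S_n for uniform X_i
    hits = 0
    for _ in range(trials):
        s_n = sum(rng.uniform(a, b) for a, b in ranges)
        if abs(s_n - mean_sum) >= t:
            hits += 1
    return hits / trials


# Hypothetical intervals [a_i, b_i]: twenty of length 1 and five of length 3.
ranges = [(0.0, 1.0)] * 20 + [(-1.0, 2.0)] * 5

for t in (4.0, 6.0, 8.0):
    print(f"t = {t}: empirical = {empirical_tail(t, ranges):.4f}, "
          f"bound = {hoeffding_bound(t, ranges):.4f}")
```

For small $t$ the right-hand side of (1) can exceed 1, in which case the bound holds trivially; the comparison only becomes informative for larger $t$.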