Fast approximate leave-one-out cross-validation for large sample sizes

Rosa Meijer, Jelle Goeman
Department of Medical Statistics, Leiden University Medical Center

Validation in Statistics and Machine Learning, 6 October 2010
Outline
1. Introduction
2. The approximation method
3. Results
4. Summary
Penalized regression

Ridge regression
\[ \hat\beta_{\mathrm{ridge}} = \arg\max_\beta \Big\{ \ell(\beta) - \lambda \sum_i \beta_i^2 \Big\} \]
Shrinks.

Lasso regression
\[ \hat\beta_{\mathrm{lasso}} = \arg\max_\beta \Big\{ \ell(\beta) - \lambda \sum_i |\beta_i| \Big\} \]
Shrinks and selects.
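To make the penalized objective concrete, here is a minimal base-R sketch (not from the talk; the data and variable names are illustrative) that maximizes the ridge-penalized log-likelihood of a Gaussian linear model with optim().

```r
## Ridge objective: penalized log-likelihood l(beta) - lambda * sum(beta^2),
## maximized numerically (illustrative sketch only).
set.seed(1)
n <- 50; p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% c(2, -1, 0, 0, 1) + rnorm(n))

ridge_objective <- function(beta, lambda) {
  eta <- drop(X %*% beta)
  loglik <- -0.5 * sum((y - eta)^2)   # Gaussian log-likelihood (sigma^2 = 1, up to a constant)
  loglik - lambda * sum(beta^2)       # subtract the ridge penalty
}

beta_ridge <- optim(rep(0, p), ridge_objective, lambda = 1,
                    method = "BFGS", control = list(fnscale = -1))$par
```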
The penalized package

On CRAN: R package penalized
- Ridge
- Lasso
- Elastic net

Regression models
- Linear regression
- Logistic regression (GLM)
- Cox proportional hazards model
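A hedged usage sketch of the penalized package follows: penalized() for fitting and cvl() for the cross-validated likelihood, with fold = n giving leave-one-out. The call pattern is believed to be correct, but argument names should be checked against the package manual; the data are simulated.

```r
## Sketch: ridge-penalized Cox regression with the 'penalized' package.
library(penalized)
library(survival)

set.seed(1)
n <- 100; p <- 200
X <- matrix(rnorm(n * p), n, p)
time   <- rexp(n)
status <- rbinom(n, 1, 0.7)

fit <- penalized(Surv(time, status), penalized = X, lambda2 = 10)

## Cross-validated likelihood for this lambda; fold = n gives leave-one-out.
cv <- cvl(Surv(time, status), penalized = X, lambda2 = 10, fold = n)
cv$cvl
```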
Choosing the value of λ

A balance between two extremes:
- λ too large: over-shrinkage
- λ too small: overfitting

How to optimize λ?
- Leave-one-out cross-validation
- K-fold cross-validation
- Akaike's information criterion
- Generalized cross-validation
- (.632+) bootstrap cross-validation
- ...
Leave-one-out

Ingredients
- Response \(y_1, \ldots, y_n\)
- Predictor variables \(x_1, \ldots, x_n\)
- Fitted models \(\hat\beta_\lambda^{(-i)}\), not using \(x_i\) and \(y_i\)
- A loss function L; assume continuity.

Leave-one-out loss
\[ \sum_{i=1}^{n} L\big(y_i, x_i, \hat\beta_\lambda^{(-i)}\big) \]
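As a baseline for the approximation introduced next, here is a brute-force base-R sketch (not from the slides) of the leave-one-out loss for ridge linear regression, which refits the model n times.

```r
## Exact leave-one-out loss for ridge linear regression: n separate fits,
## one matrix inverse per left-out observation (illustrative sketch).
loo_loss_ridge <- function(X, y, lambda) {
  n <- nrow(X); p <- ncol(X)
  loss <- numeric(n)
  for (i in seq_len(n)) {
    Xi <- X[-i, , drop = FALSE]
    yi <- y[-i]
    beta_i  <- solve(crossprod(Xi) + lambda * diag(p), crossprod(Xi, yi))
    loss[i] <- (y[i] - drop(X[i, , drop = FALSE] %*% beta_i))^2   # squared-error loss
  }
  sum(loss)
}
```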
Approximate leave-one-out

The leave-one-out loss requires calculation of \(\hat\beta_\lambda^{(-1)}, \ldots, \hat\beta_\lambda^{(-n)}\).

Time consuming
- when n is large
- when each \(\hat\beta_\lambda^{(-i)}\) takes much time
- double cross-validation

Solution
- approximate \(\hat\beta_\lambda^{(-i)}\) based on \(\hat\beta_\lambda\)
Models

Assumption
\[ -\frac{\partial^2 \ell}{\partial\eta\,\partial\eta'} = D \quad \text{(diagonal)}, \]
with \(\eta = X\beta\) the linear predictor.

- Generalized linear models
  - Linear regression
  - Logistic regression
- Cox proportional hazards (full likelihood)
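For logistic regression, for instance, this D equals diag(μ_i(1 − μ_i)) with μ_i the fitted probability. A small base-R sketch (not from the slides; variable names illustrative):

```r
## The diagonal matrix D for a logistic regression fit:
## -d^2 l / (d eta d eta') = diag(mu_i * (1 - mu_i)), with mu_i = P(y_i = 1 | x_i).
set.seed(5)
n <- 30
x <- rnorm(n)
y <- rbinom(n, 1, plogis(x))

fit <- glm(y ~ x, family = binomial)
mu  <- fitted(fit)            # fitted probabilities mu_i
D   <- diag(mu * (1 - mu))    # the matrix D evaluated at the fit
```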
General idea

Taylor approximation of \(\ell'_{(-i)}(\beta)\) at \(\beta = \hat\beta_\lambda\):
\[ \ell'_{(-i)}(\beta) = \ell'_{(-i)}(\hat\beta_\lambda) + (\beta - \hat\beta_\lambda)\,\ell''_{(-i)}(\hat\beta_\lambda) + O\big((\beta - \hat\beta_\lambda)^2\big). \]

Solving \(\ell'_{(-i)}(\beta) = 0\) at \(\beta = \hat\beta_\lambda^{(-i)}\) gives:
\[ \hat\beta_\lambda^{(-i)} = \hat\beta_\lambda - \big[\ell''_{(-i)}(\hat\beta_\lambda)\big]^{-1} \ell'_{(-i)}(\hat\beta_\lambda) + O\big((\hat\beta_\lambda^{(-i)} - \hat\beta_\lambda)^2\big). \]

Still n inverses to be calculated.
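This is a single Newton step from the full-data fit. The base-R sketch below (not from the slides; illustrative only) shows it for the ridge-penalized linear model, where the penalized log-likelihood is quadratic and the step recovers the leave-one-out estimate exactly.

```r
## One Newton step from the full-data ridge fit towards the leave-one-out fit.
set.seed(2)
n <- 40; p <- 6
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% rnorm(p) + rnorm(n))
lambda <- 2

beta_full <- solve(crossprod(X) + lambda * diag(p), crossprod(X, y))

i  <- 7                                       # observation left out
Xi <- X[-i, , drop = FALSE]; yi <- y[-i]

grad_i <- crossprod(Xi, yi - drop(Xi %*% beta_full)) - lambda * beta_full  # l'_{(-i)}(beta_full)
hess_i <- -(crossprod(Xi) + lambda * diag(p))                              # l''_{(-i)}
beta_approx <- beta_full - solve(hess_i, grad_i)

beta_exact <- solve(crossprod(Xi) + lambda * diag(p), crossprod(Xi, yi))
max(abs(beta_approx - beta_exact))            # ~ 0: exact for the linear model
```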
Sherman-Morrison-Woodbury theorem

\[ \big(B + uv^T\big)^{-1} = B^{-1} - \frac{B^{-1} u v^T B^{-1}}{1 + v^T B^{-1} u}, \]
with B a nonsingular p × p matrix and u, v p-dimensional column vectors.

Apply to \(\big(\ell''_{(-i)}(\hat\beta_\lambda)\big)^{-1}\) (in the ridge model):
\[ \Big( X_{(-i)}^T D_{(-i)} X_{(-i)} + \lambda I_p \Big)^{-1} = \Big( X^T D X + \lambda I_p - d_{ii}\, x_i x_i^T \Big)^{-1} \]
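A quick numerical check of the Sherman-Morrison identity in base R (sketch, not from the slides):

```r
## Verify (B + u v^T)^{-1} = B^{-1} - B^{-1} u v^T B^{-1} / (1 + v^T B^{-1} u).
set.seed(3)
p <- 5
B <- crossprod(matrix(rnorm(p * p), p, p)) + diag(p)   # nonsingular p x p matrix
u <- rnorm(p); v <- rnorm(p)

Binv <- solve(B)
lhs  <- solve(B + u %*% t(v))
rhs  <- Binv - (Binv %*% u %*% t(v) %*% Binv) / drop(1 + t(v) %*% Binv %*% u)
max(abs(lhs - rhs))   # ~ 1e-15
```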
Final formula ridge

\[ \hat\beta_\lambda^{(-i)} = \hat\beta_\lambda - \frac{\big(X^T D X + \lambda I_p\big)^{-1} x_i \Delta_i}{1 - v_{ii}}, \]
with
\[ V = D^{1/2} X \big(X^T D X + \lambda I_p\big)^{-1} X^T D^{1/2}, \]
and D and Δ (residuals) based on the value \(\hat\beta_\lambda\).

All approximate \(\hat\beta_\lambda^{(-i)}\)'s with just one inverse calculation and some matrix multiplications!

Reparameterization
- The dimension of the covariate space can be reduced from p to n.
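The base-R sketch below (not from the slides) implements this formula for the linear model, where D = I and Δ_i is the ordinary residual, and compares one column with an exact leave-one-out refit.

```r
## All approximate leave-one-out ridge estimates from a single inverse.
set.seed(4)
n <- 60; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% rnorm(p) + rnorm(n))
lambda <- 3

A_inv  <- solve(crossprod(X) + lambda * diag(p))   # the only inverse needed
beta   <- A_inv %*% crossprod(X, y)                # full-data ridge fit
resid  <- y - drop(X %*% beta)                     # Delta_i (D = I for the linear model)
v_diag <- rowSums((X %*% A_inv) * X)               # diagonal of V = X A^{-1} X^T

## p x n matrix whose i-th column is the approximate beta_lambda^(-i)
beta_loo <- beta[, rep(1, n)] - (A_inv %*% t(X)) %*% diag(resid / (1 - v_diag))

## Compare one column with the exact leave-one-out refit
i <- 5
beta_exact <- solve(crossprod(X[-i, ]) + lambda * diag(p), crossprod(X[-i, ], y[-i]))
max(abs(beta_loo[, i] - beta_exact))   # ~ 0: exact for the linear model
```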
Models

Linear model
- Approximation = exact

Cox proportional hazards
- Use the full likelihood, not the partial likelihood
- Baseline hazard not cross-validated
- Trick possible: add an intercept term
Final formula lasso

\[ \hat\beta_\lambda^{(-i)} = \hat\beta_\lambda - \frac{\big(X^T D X\big)^{-1} x_i \Delta_i}{1 - v_{ii}}, \]
with
\[ V = D^{1/2} X \big(X^T D X\big)^{-1} X^T D^{1/2}. \]

Locally, if \(\hat\beta_{\lambda k} \approx \hat\beta_{\lambda k}^{(-i)}\), we know:
\[ \hat\beta_{\lambda k} = 0 \;\Rightarrow\; \hat\beta_{\lambda k}^{(-i)} = 0. \]

Refinements possible.
To what extent is this approximation useful?

Are the approximated values comparable to the real values?
- cvl(real \(\hat\beta_\lambda^{(-i)}\)) ≈ cvl(approximated \(\hat\beta_\lambda^{(-i)}\))?

Would we find approximately the same values of λ?
- Do we find approximately the same maximum of the cvl when using the approximated \(\hat\beta_\lambda^{(-i)}\)'s?

How much worse are the models?
- Do we find approximately the same cvl at the maximum found?
The dataset used

- Breast cancer data of the Netherlands Cancer Institute
  - Paper by Van 't Veer et al. (Nature, 2002)
  - Followed up by Van de Vijver et al. (NEJM, 2002)
- 295 breast cancer patients (effective dimension 79, due to censoring)
- Microarray (Agilent): 4,919 genes preselected (Rosetta technology)
- Response of interest: survival time (up to 18 years of follow-up)
Ridge Regression

[Figure: cvpl versus lambda (0 to 10,000) for the ridge model, comparing the real cvpl, the approximated cvpl, and the training cvpl.]
Ridge Regression: in more detail

[Figure: zoomed cvpl versus lambda curves for the ridge model. Approximated cvpl optimum: lambda = 438.2634, cvpl = -475.8422. Real cvpl optimum: lambda = 458.5212, cvpl = -476.2204.]
Lasso Regression

[Figure: cvpl versus lambda (5 to 20) for the lasso model, comparing the real cvpl, the approximated cvpl, and the training cvpl.]
Lasso Regression: in more detail

[Figure: zoomed cvpl versus lambda curves for the lasso model. Approximated cvpl optimum: lambda = 7.60564, cvpl = -477.3704. Real cvpl optimum: lambda = 7.70299, cvpl = -479.4855.]
Wang breast cancer data: ridge

[Figure: cvpl versus lambda (0 to 50,000), comparing the real cvpl, the approximated cvpl, the approximated cvpl with intercept, and the training cvpl.]
Wang breast cancer data: ridge

[Figure: zoomed cvpl versus lambda curves, comparing the real cvpl, the approximated cvpl, and the approximated cvpl with intercept.]
Wang breast cancer data: lasso

[Figure: cvpl versus lambda (30 to 70), comparing the real cvpl, the approximated cvpl, the approximated cvpl with intercept, and the training cvpl.]
Wang breast cancer data: lasso zoomed

[Figure: zoomed cvpl versus lambda curves, comparing the real cvpl, the approximated cvpl, and the approximated cvpl with intercept.]
Efficiency

Time needed to calculate the cvl for a specific value of λ (lasso, λ = 7.70):
- real cvpl: 49.00 seconds
- approximated cvpl: 6.09 seconds
- approximately 8 times as fast

Time needed to calculate the cvl for a specific value of λ (ridge, λ = 458.5):
- real cvpl: 389.27 seconds
- approximated cvpl: 17.40 seconds
- more than 20 times as fast!
Some additional comments

- Are these results representative of different datasets?
- What aspects of a dataset determine the performance of the approximation method?

Back to the theory: the error term is \(O\big((\hat\beta_\lambda^{(-i)} - \hat\beta_\lambda)^2\big)\).

The error diminishes when:
- n gets larger
- λ gets larger
In short...

Approximate LOOCV
- gives a reasonable approximation of the optimal λ in penalization methods
- gives reasonable values of the approximated cvl, so comparisons between models are possible
- works great for ridge; is less stable for the lasso
- can be used to find the "neighborhood" of the optimal λ
- is best for large values of n:
  - best possible approximations
  - most time saved
  - double LOOCV becomes feasible
Questions?