Smoothing with penalized splines - A brief

Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Smoothing with penalized splines
A brief introduction and an illustrative application
Antonio Gasparrini
Department of Social and Environmental Health Research
London School of Hygiene and Tropical Medicine (LSHTM)
Centre for Statistical Methodology – LSHTM
29 May 2015
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Outline
1
The issue
2
Splines
3
A penalized approach
4
A comparison
5
An extension
6
Software
7
Some comments
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Outline
1
The issue
2
Splines
3
A penalized approach
4
A comparison
5
An extension
6
Software
7
Some comments
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Relationships between x and y
−5
0
y
5
10
15
Scatterplot of x and y
0
2
4
6
8
10
x
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Non-linearity
−5
0
y
5
10
15
Scatterplot of x and y
0
2
4
6
8
10
x
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Regression models
A relationship between a predictor x and an outcome y is usually
estimated through regression models, controlling for potential
confounders
In the simple linear case:
yi = α + f (xi ) +
P
X
γp zip
(1)
p=1
A number of alternative options are available for representing f (x),
describing the relationship as a smooth shape
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Smoothing methods
Parametric
Polynomials, fractional polynomials, regression splines
In between
Penalized splines
Non-parametric
Lowess, kernel, smoothing splines
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Outline
1
The issue
2
Splines
3
A penalized approach
4
A comparison
5
An extension
6
Software
7
Some comments
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Splines: basis representation
A spline is a numeric function composed by piecewise-connected
polynomial functions
The advantage of using regression splines is that f (x) can be
represented in a basis form:
f (xi ; β) =
d
X
βj bj (xi ) = xT β
(2)
j=1
where bj (x) are a series of d (known) basis transformations of x
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Regression splines
This type of splines allow the use of standard estimation
methods, derived by minimizing the usual least square objective:
N
X

y − α − f (x; β) +
i=1
P
X
2
γp zp  = ||y − α − Xβ − Zγ||2 (3)
p=1
Several different transformations for bj (x) (e.g. B-splines, natural
splines), determining the mathematical properties
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Graphical representation - I
0.5
0.0
Splines
1.0
Knots and splines
0
2
4
6
8
10
x
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Graphical representation - II
Splines and coefficients
−0.49
1.09
2
4
4.45
13.29
6
8
4.26
0.5
0.0
Splines
1.0
1.01
0
10
x
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Graphical representation - III
5
−5
0
Terms
10
15
Sum of linear terms
0
2
4
6
8
10
x
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Graphical representation - IV
−5
0
y
5
10
15
Estimated relationship
0
2
4
6
8
10
x
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Graphical representation - V
−5
0
y
5
10
15
Estimated and true
0
2
4
6
8
10
x
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Problems and limitations
In regression splines, the smoothness of the fitted curve is
determined by:
the degree of the spline
the specific parameterization
the number of knots
the location of knots
No general selection method for number and position of knots
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Number of knots
15
Degree of smoothness
−5
0
y
5
10
df = 1
df = 3
df = 5
df = 7
0
2
4
6
8
10
x
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Knots location
15
Best fitting models
−5
0
y
5
10
5k at eq.−spaced val
8k at eq.−spaced per
0
2
4
6
8
10
x
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Outline
1
The issue
2
Splines
3
A penalized approach
4
A comparison
5
An extension
6
Software
7
Some comments
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Generalized additive models
A general framework of smoothing methods is offered by
generalized additive models (GAMs)
GAMs extends traditional GLMs by allowing the linear predictor to
depend linearly on unknown smooth functions. In the linear case:
yi = α + f (xi ) +
P
X
f (zip )
(4)
p=1
where f are traditionally represented by non-parametric terms such
as smoothing splines of lowess
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Penalty
The idea is to define a flexible function and control the smoothness
through a penalty term, usually on the second derivative
The objective in (3) is modified to:
N
X

y − α − f (x; β) +
i=1
P
X
2
Z
f (zip ) + λ
[f 00 (x)]2 dx
(5)
p=1
with λ as smoothing parameter
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Penalized splines
However, traditional GAMs are limited by complex and
computationally-heavy estimation methods
Penalized splines offer an flexible and efficient version of GAM,
based on low-rank basis transformations
The objective in (5) can be re-written in matrix terms as:
||y − α − Xβ − Zγ||2 + λβ T Sβ
(6)
where S is a penalty matrix
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Smoothers
Alternative smoothers available, differing by parameterization and
penalty:
Thin-plate splines
Cubic splines
P-splines
Random-effects
Markov random fields
Soap film smooths
...
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Graphical representation - I
Increasing knots and splines
Splines
1.0
0.5
0.0
0
2
4
6
8
10
x
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Graphical representation - II
0.5
0.39
0.26
1.70
4.52
9.25
7.42
0.79
0.11
1.07
3.25
7.62
9.43 2.19
1.20 0.11
0.54
2.20
5.88
10.07
4.82
0.0
Splines
1.0
Splines and coefficients
0
2
4
6
8
10
x
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Graphical representation - III
Sum of linear terms
15
Terms
10
5
0
−5
0
2
4
6
8
10
x
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Estimation
Estimation concerns coefficients of penalized and unpenalized
terms (α, β, γ) and smoothing parameters (λs)
For the former, a penalized iteratively reweighted least squares
(P-IRLS) scheme is used
Estimation of λs is integrated through either outer iteration or
performance iteration, using GCV, UBRE/AIC or REML
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Advantages
Relatively low-rank basis and simplified penalties
Completely parametric form
Number and location of knots not critical
Automatic smoothing selection
Efficient computational methods
Well-grounded theoretical framework
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Outline
1
The issue
2
Splines
3
A penalized approach
4
A comparison
5
An extension
6
Software
7
Some comments
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Comparison
Comparison of smoothing methods
Fractional polynomials
Regression splines
Penalized splines
y
10
5
0
0
2
4
6
8
10
x
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Simulations
Comparing alternative methods:
Fractional polynomials
Regression splines
Penalized splines
Different shapes:
Linear
Decay
Peak
Complex
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Simulated shapes
Decay
10
8
8
6
6
4
4
y
y
Linear
10
2
2
0
0
−2
−2
0
2
4
6
8
10
0
2
4
x
Peak
8
10
8
10
Complex
10
10
8
8
6
6
4
4
y
y
6
x
2
2
0
0
−2
−2
0
2
4
6
x
Gasparrini A
Smoothing with penalized splines
8
10
0
2
4
6
x
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Simulation results - I
Fractional polynomials
Decay
10
8
8
6
6
4
4
y
y
Linear
10
2
2
0
0
−2
−2
0
2
4
6
8
10
0
2
4
x
Peak
8
10
8
10
Complex
10
10
8
8
6
6
4
4
y
y
6
x
2
2
0
0
−2
−2
0
2
4
6
x
Gasparrini A
Smoothing with penalized splines
8
10
0
2
4
6
x
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Simulation results - II
Regression splines
Decay
10
8
8
6
6
4
4
y
y
Linear
10
2
2
0
0
−2
−2
0
2
4
6
8
10
0
2
4
x
Peak
8
10
8
10
Complex
10
10
8
8
6
6
4
4
y
y
6
x
2
2
0
0
−2
−2
0
2
4
6
x
Gasparrini A
Smoothing with penalized splines
8
10
0
2
4
6
x
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Simulation results - III
Penalized splines
Decay
10
8
8
6
6
4
4
y
y
Linear
10
2
2
0
0
−2
−2
0
2
4
6
8
10
0
2
4
x
Peak
8
10
8
10
Complex
10
10
8
8
6
6
4
4
y
y
6
x
2
2
0
0
−2
−2
0
2
4
6
x
Gasparrini A
Smoothing with penalized splines
8
10
0
2
4
6
x
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Simulation results - IV
Statistics
Linear
Linear
Decay
Decay
Peak
Peak
Complex
Comples
Gasparrini A
Smoothing with penalized splines
Fun
GLM
GAM
GLM
GAM
GLM
GAM
GLM
GAM
(e)df
2.21
1.28
3.45
2.61
4.82
4.06
6.65
5.87
Bias
0.01
0.00
0.05
0.13
0.05
0.19
0.11
0.31
Cov
0.90
0.95
0.87
0.94
0.89
0.94
0.87
0.91
RMSE
0.82
0.66
0.89
0.76
0.93
0.84
1.02
0.94
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Outline
1
The issue
2
Splines
3
A penalized approach
4
A comparison
5
An extension
6
Software
7
Some comments
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Distributed lag non-linear models
Statistical tools to model non-linear and lagged dependencies
Defined by a cross-basis function of lagged exposures:
s(xt−`0 , . . . , xt−` ) =
L
X
f ·w (xt−` , `)
(7)
`=`0
The function is composed of an exposure response function f (x)
and a lag-response function w (`)
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
An example
1.10
1.05
RR
0
1.00
5
0.95
La
g
10
20
15
15
Tem
p
10
erat
Gasparrini A
Smoothing with penalized splines
ure
5
(C)
0
20
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Tensor product basis
Parameterized by a special tensor product:
T
s(xt−`0 , . . . , xt−` ) = (1T
L−`0 +1 At )η = wt η
(8)
T
At = (1T
v` ⊗ Rt ) (C ⊗ 1vx )
(9)
with
where Rt and C are basis matrices for x and `, respectively
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Penalized DLNMs
Question: what about a penalized version of DLNMs?
Modify objective in (6) to:
T
||y−α−Wη −Zγ||2 +η T λx 1T
⊗
S
+λ
S
⊗
1
η (10)
x
`
`
v`
vx
with λx , λ` and Sx , S` as smoothing parameters and penalty
matrices for each dimension
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Simulated surfaces
Scenario 1
Scenario 2
Scenario 3
0.5
0.6
0.4
0.10
Outcome
0.4
0.3
Outcome
Outcome
0.05
0.2
0.2
0.1
0.00
0.0
0.0
10
10
0
8
10
6
x
4
2
30
0 40
Gasparrini A
Smoothing with penalized splines
20
g
La
10
0
8
10
6
x
4
2
30
0 40
20
g
La
0
8
10
6
x
4
2
30
20
g
La
0 40
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Simulation results
Scenario 1
Scenario 2
Scenario 3
20
Lag
30
40
0
10
30
40
GAMps
0
2
4
6
x
Gasparrini A
Smoothing with penalized splines
8
10
0
10
20
Lag
30
40
GAMps
Outcome
0.1
0.3
−0.1
−0.1
−0.10
Outcome
0.1
0.3
Outcome
0.05
0.20
GAMps
20
Lag
0.5
10
−0.1
Outcome
0.1
0.3
Outcome
0.1
0.3
−0.1
−0.10
0
GAMps
0.5
GAMps
Outcome
0.05
0.20
GAMps
0
2
4
6
x
8
10
0
2
4
6
8
10
x
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
An application
Lag−response for 29C
Overall cumulative exposure−response
2.0
RR
1.5
1.2
1.1
RR
1.2
2.5
1.3
3.0
Exposure−lag−response
1.0
RR
1.0
1.1
1.0
g
10
La
20
Tem 10
pera
ture
0.5
0.9
0
5
0.9
0
5
10
15
20
0
5
10
15
20
25
30
15
0
20
Gasparrini A
Smoothing with penalized splines
Lag
Temperature
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Outline
1
The issue
2
Splines
3
A penalized approach
4
A comparison
5
An extension
6
Software
7
Some comments
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
The R package mgcv
Collection of functions implementing GAMs with penalized splines
Written by Simon Wood, extensively documented
Example of code:
library(mgcv)
model <- gam(y ~ s(x,bs="ps") + z, data, family=gaussian,
method="REML")
The function s determines the spline transformations and penalties
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
The R package dlnm
Collection of functions implementing DLNMs
Example of code:
library(dlnm)
cb <- crossbasis(x,lag=c(10),
argvar=list(fun="bs",degree=2,knots=5,cen=0),
arglag=list(fun="ns",knots=c(3,6),int=F))
model <- glm(y ~ s(x,bs="ps") + z, data, family=gaussian)
pred <- crosspred(cb,model)
plot(pred,"3d",xlab="x",ylab="Lag",zlab="Effect")
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Embedding dlnm and mgcv
Example of code:
library(dlnm) ; library(mgcv)
cb <- crossbasis(x,lag=c(10), argvar=list(fun="ps"),
arglag=list(fun="ps"))
pen <- cbPen(cb)
model <- model <- gam(y ~ cb + z, data, family=gaussian,
method="REML",parapen=list(cb=list(pen)))
pred <- crosspred(cb,model)
plot(pred,"3d",xlab="x",ylab="Lag",zlab="Effect")
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Outline
1
The issue
2
Splines
3
A penalized approach
4
A comparison
5
An extension
6
Software
7
Some comments
Gasparrini A
Smoothing with penalized splines
LSHTM
Outline
The issue
Splines
Penalized
Comparison
An extension
Software
Comments
Some comments
Penalized splines combine the flexibility of non-parametric
methods with stability and simplicity of parametric smoothers
Based on theoretically-grounded and computationally-efficient
estimators
Well implemented in the package mgcv in R
Research still ongoing
Gasparrini A
Smoothing with penalized splines
LSHTM