Generalized Estimating Equations (GEE)
Siljander Ilona
GEE provides a regression approach for GLMs when the responses are not independent.

The model is used in:
• Longitudinal / repeated measures studies
• Clustered / multilevel studies
• An estimating equation provides a way to estimate the model parameters used to predict quantities such as the mean.
• It is based on maximum likelihood calculations, which assume mutually independent observations.
• GEE provides a way to find "maximum likelihood"-type estimators for datasets where the measurements/observations are not mutually independent.
Estimating equations for the normal linear model

• The density of the normal distribution:

  f(y \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(y-\mu)^2}{2\sigma^2} \right\}

• The likelihood function:

  L(\mu, \sigma^2 \mid y_1, \dots, y_n)
    = \prod_{i=1}^{n} \exp\left\{ -\frac{1}{2}\ln(2\pi\sigma^2) - \frac{(y_i-\mu)^2}{2\sigma^2} \right\}
    = \exp\left\{ \sum_{i=1}^{n} \left[ -\frac{1}{2}\ln(2\pi\sigma^2) - \frac{(y_i-\mu)^2}{2\sigma^2} \right] \right\}
Estimating equations for the normal linear model (continued)

• Given a panel i, the task is to model the expected value \mu_i of the measured quantity using some individual characteristics, for example biomarkers (vector x_i).
• The linear combination of the covariates is called the linear predictor:

  \eta_i = x_i \beta, \qquad g(\mu_i) = \eta_i = x_i \beta

• Here the function g identifies whether the model is linear, log-linear, etc.
• This is naturally related to the range restrictions of the model. For instance, g(x) = \ln(x) indicates that \mu = \exp(\eta) > 0, which is natural if the expected value is always positive.
Estimating equations for the normal linear model (continued)

• For the aforementioned modelling purposes we now substitute \mu_i = x_i\beta into the likelihood function, which gives the log-likelihood:

  \ell(\beta, \sigma^2 \mid X, y_1, \dots, y_n) = \sum_{i=1}^{n} \left[ -\frac{1}{2}\ln(2\pi\sigma^2) - \frac{(y_i - x_i\beta)^2}{2\sigma^2} \right]
• By differentiation we obtain a sort of "maximum likelihood estimate" for \beta and \sigma^2. In this case, the resulting equations are the estimating equations:

  \begin{bmatrix}
    \left[ \dfrac{\partial \ell}{\partial \beta_j} \right]_{j=1,\dots,p} \\[6pt]
    \dfrac{\partial \ell}{\partial \sigma^2}
  \end{bmatrix}_{(p+1)\times 1}
  =
  \begin{bmatrix}
    \left[ \displaystyle\sum_{i=1}^{n} x_{ij}\,\frac{1}{\sigma^2}\,(y_i - x_i\beta) \right]_{j=1,\dots,p} \\[6pt]
    \displaystyle\sum_{i=1}^{n} \left[ -\frac{1}{2\sigma^2} + \frac{(y_i - x_i\beta)^2}{2\sigma^4} \right]
  \end{bmatrix}
  = [0]_{(p+1)\times 1}
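The estimating equations above can be solved in closed form. The following derivation is not on the slides, but it is a standard consequence worth noting: the equations reproduce the familiar least-squares estimates.

```latex
% The first p equations give the ordinary least-squares estimator:
\sum_{i=1}^{n} x_{ij}\,\frac{1}{\sigma^2}\,(y_i - x_i\beta) = 0,
\quad j = 1,\dots,p
\;\Longleftrightarrow\;
X^{T}(y - X\beta) = 0
\;\Longleftrightarrow\;
\hat{\beta} = (X^{T}X)^{-1}X^{T}y
% and the last equation gives the (uncorrected) variance estimator:
\sum_{i=1}^{n}\left[-\frac{1}{2\sigma^2} + \frac{(y_i - x_i\hat{\beta})^2}{2\sigma^4}\right] = 0
\;\Longleftrightarrow\;
\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\,(y_i - x_i\hat{\beta})^2
```

This confirms that for the normal linear model the estimating-equation approach coincides with maximum likelihood (and with OLS for \beta).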
Estimating equations for GLM

• Let's consider a more general case with n panels, panel i consisting of n_i observations. The mean of each observation is modeled with a regression model g(\mu_{it}) = x_{it}\beta. We can also replace the normal density with the more general exponential family of distributions:

  \exp\left\{ \frac{y_{it}\theta_{it} - b(\theta_{it})}{a(\phi)} - c(y_{it}, \phi) \right\}

• Here E(y_{it}) = b'(\theta_{it}) = \mu_{it}, V(y_{it}) = b''(\theta_{it})\,a(\phi), and c is a normalizing term that guarantees that the density integrates to 1.
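As a concrete check of these definitions, consider the Poisson distribution, a standard member of the exponential family (this worked example is an illustration added here, not taken from the slides):

```latex
% Poisson(mu): f(y) = e^{-\mu}\,\mu^{y}/y!  rewritten in exponential-family form:
f(y_{it}) = \exp\{\, y_{it}\ln\mu_{it} - \mu_{it} - \ln(y_{it}!) \,\}
% Matching terms: \theta_{it} = \ln\mu_{it},\; b(\theta_{it}) = e^{\theta_{it}},\;
% a(\phi) = 1,\; c(y_{it},\phi) = \ln(y_{it}!)
% The moment formulas are then verified directly:
E(y_{it}) = b'(\theta_{it}) = e^{\theta_{it}} = \mu_{it}, \qquad
V(y_{it}) = b''(\theta_{it})\,a(\phi) = \mu_{it}
```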
Estimating equations for GLM (continued)

• With this model the estimating equation becomes

  \psi(\beta) = \left[ \sum_{i=1}^{n} \sum_{t=1}^{n_i} \frac{y_{it} - \mu_{it}}{a(\phi)\, V(\mu_{it})} \left( \frac{\partial \mu}{\partial \eta} \right)_{\!it} x_{jit} \right]_{j=1,\dots,p} = [0]_{p \times 1}

• Observe that previously, in the case of the normal linear model, we had \partial\mu / \partial\eta = 1, but in general it needs to be included in the formula.
• Using matrix notation, the above can be rewritten as

  \psi(\beta) = \left[ \sum_{i=1}^{n} x_{ji}^{T}\, D\!\left( \frac{\partial \mu}{\partial \eta} \right) \left[ V(\mu_i) \right]^{-1} \left( \frac{y_i - \mu_i}{a(\phi)} \right) \right]_{j=1,\dots,p} = [0]_{p \times 1}

• where

  V(\mu_i) = \left[ D(V(\mu_{it}))^{1/2}\; I_{(n_i \times n_i)}\; D(V(\mu_{it}))^{1/2} \right]_{n_i \times n_i}

• Here the identity matrix I corresponds to the assumption that the observations are mutually independent.
Generalized estimating equations

• Our aim is to model the expected value with mutually dependent observations. Formally, we may do this by replacing the identity matrix with a corresponding correlation matrix R(\alpha), where \alpha is a parameter vector used to represent the correlation in the data:

  V(\mu_i) = \left[ D(V(\mu_{it}))^{1/2}\; R(\alpha)_{(n_i \times n_i)}\; D(V(\mu_{it}))^{1/2} \right]_{n_i \times n_i}

• Plugging this into the aforementioned estimating equations, we obtain the generalized estimating equations:

  \psi(\beta) = \left[ \sum_{i=1}^{n} x_{ji}^{T}\, D\!\left( \frac{\partial \mu}{\partial \eta} \right) \left[ V(\mu_i) \right]^{-1} \left( \frac{y_i - \mu_i}{a(\phi)} \right) \right]_{j=1,\dots,p} = [0]_{p \times 1}

• While the likelihood-based derivation of the estimating equations we explained is no longer valid once the observations are not independent, it turns out that the equations can still be justified (Liang and Zeger, 1986). By solving the equations we find the coefficient vector \beta, which models the expected value \mu_{it} via g(\mu_{it}) = x_{it}\beta.
Basic ideas of GEE

• Objective:
  • Fit a model to repeated categorical responses, that is, correlated and clustered responses.
• Variables:
  • A response variable Y_it, which can be either continuous or categorical (i for subjects and t for different time points).
  • X = (x_1, x_2, ..., x_k) as a set of explanatory variables, which can be discrete, continuous, or a combination (x_i as an n_i × k matrix of covariates).
• Model:
  • As said before, its form is like a GLM, but full specification of the joint distribution is not required, and thus there is no likelihood function:

  g(\mu_i) = x_i^{T} \beta
Basic ideas of GEE (continued)

• Random component:
  • Any distribution of the response that we can use for GLM, e.g., binomial, multinomial, normal, etc.
• Systematic component:
  • A linear predictor of any combination of continuous and discrete variables.
• Link function:
  • Can be any g(\mu_i), e.g., identity, log, logit, etc.
• Covariance structure:
  • Correlated data are modeled using the same link function and linear predictor setup (systematic component) as in the independence case, but the covariance structure of the correlated responses must also be specified and modeled.
Basic ideas of GEE (continued)

• Assumptions:
  • The responses Y_1, Y_2, ..., Y_n are correlated or clustered, i.e., cases don't need to be independent.
  • The homogeneity of variances does not need to be satisfied.
  • Errors can be correlated.
  • It uses quasi-likelihood estimation rather than maximum likelihood estimation (MLE) or ordinary least squares (OLS) to estimate the parameters, but at times these will coincide.
  • Covariance specification: typically one of four (or more) correlation structures assumed a priori:
    1. Independence: Measurements on the same unit are uncorrelated.
    2. Exchangeable: All measurements on the same unit are equally correlated.
    3. Autoregressive order 1 (AR(1)): Correlation depends on the time or distance between measurements.
    4. Unstructured: No assumptions about the correlation.
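The four structures can be made concrete as explicit working correlation matrices R(α). The base-R sketch below builds each one; the dimension n_i = 4 and the value α = 0.5 are arbitrary illustrative choices, not values from the slides.

```r
# Illustrative working correlation matrices R(alpha) for one cluster
# with n_i = 4 time points; alpha = 0.5 is an arbitrary example value.
n <- 4
alpha <- 0.5

# 1. Independence: identity matrix, no within-cluster correlation.
R_ind <- diag(n)

# 2. Exchangeable: every pair of measurements has the same correlation alpha.
R_exch <- matrix(alpha, n, n)
diag(R_exch) <- 1

# 3. AR(1): correlation decays as alpha^|t - s| with the time lag.
R_ar1 <- alpha ^ abs(outer(1:n, 1:n, "-"))

# 4. Unstructured: each off-diagonal entry is a free parameter (left as NA
#    here to emphasize that nothing is assumed about them).
R_unstr <- matrix(NA_real_, n, n)
diag(R_unstr) <- 1

R_ar1
```

Note how AR(1) interpolates between independence (α = 0) and strong serial dependence (α near 1), while exchangeable treats all lags identically.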
Basic ideas of GEE (continued)

• Parameter Estimation:
  • GEE estimates of the model parameters are valid even if the covariance is mis-specified, because they depend only on the first moment, i.e. the mean. However, if the correlation structure is mis-specified, the standard errors are not good, and some adjustments based on the data are needed to get more appropriate standard errors.
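The standard data-based adjustment is the robust "sandwich" covariance estimator of Liang and Zeger (1986). The form below is a standard statement added here for reference, not taken from the slides:

```latex
% Robust (sandwich) covariance estimator for \hat{\beta}:
\widehat{\mathrm{Cov}}(\hat\beta) =
  A^{-1} \left[ \sum_{i=1}^{n} D_i^{T} V_i^{-1}
    (y_i - \hat\mu_i)(y_i - \hat\mu_i)^{T} V_i^{-1} D_i \right] A^{-1},
\qquad
A = \sum_{i=1}^{n} D_i^{T} V_i^{-1} D_i
% where D_i = \partial\mu_i/\partial\beta and V_i is the working covariance.
% The middle term estimates the covariance of y_i empirically, which is why
% the standard errors remain consistent even when R(\alpha) is mis-specified.
```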
Basic ideas of GEE (continued)

• Model Fit:
  • We don't test for the fit of a GEE model, because this is really an estimating procedure; there is no likelihood function.
  • We look at the empirical estimates of the standard errors and the covariance.
  • We compare the empirical estimates with the model-based estimates.
  • We check for overdispersion.
Basic ideas of GEE (continued)

• Advantages:
  • Computationally simpler than MLE for categorical data.
  • Does not require a multivariate distribution.
  • Consistent estimation even with a mis-specified correlation structure.
• Limitations:
  • There is no likelihood function, since GEE does not completely specify the joint distribution.
  • Likelihood-based methods are not available for testing fit, comparing models, and conducting inferences about parameters.
  • Empirically based standard errors underestimate the true ones unless the sample size is very large.
Choosing the best model

• GLM: AIC
• GEE: QIC
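QIC (the quasi-likelihood under the independence model information criterion, Pan 2001) plays the role of AIC for GEE fits, since GEE has no likelihood. A commonly used form, added here for reference rather than taken from the slides, is:

```latex
% QIC for a GEE fit with working correlation R:
\mathrm{QIC}(R) = -2\,Q\big(\hat\beta(R);\, I\big)
                  + 2\,\mathrm{trace}\big(\hat\Omega_I\, \hat V_R\big)
% Q(\cdot; I) is the quasi-likelihood evaluated under the independence working
% correlation, \hat\Omega_I is the inverse of the model-based covariance under
% independence, and \hat V_R is the robust (sandwich) covariance estimate.
% As with AIC, the model with the smaller QIC is preferred.
```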
Example with R

• Data "epilepsy" from a clinical trial of 59 epileptics.
• For a baseline, patients were observed for 8 weeks and the number of seizures was recorded.
• The patients were then randomized to treatment with the drug Progabide (31 patients) or to the placebo group (28 patients).
• They were then observed for four 2-week periods and the number of seizures was recorded.
• Primary question: Does Progabide reduce the rate of seizures?
R code for the example

# Load the 'faraway' package (contains the 'epilepsy' dataset):
library(faraway)
# Load the 'gee' package:
library(gee)
# Load and attach the 'epilepsy' dataset:
data(epilepsy)
attach(epilepsy)
# Inspect the first ten rows:
epilepsy[1:10,]
# Mean seizure rate by treatment group and experiment indicator:
with(epilepsy, by(seizures/timeadj, list(treat, expind), mean))
# Plot the square root of the seizure counts against the period
# (treatment group = solid, placebo group = dashed):
y <- matrix(epilepsy$seizures, nrow=5)
matplot(1:4, sqrt(y[-1,]), type='l', lty=epilepsy$treat[5*(1:59)]+1,
        xlab="Period", ylab="Sqrt(Seizures)")
R code for the example

# Weekly mean seizure count (excluding the baseline period) for each patient:
my <- apply(y[-1,], 2, mean)/2
# Plot the square root of the weekly mean seizure count for the experimental
# period against that for the baseline period
# (treatment group = '+', placebo group = triangle):
plot(sqrt(epilepsy$seizures[epilepsy$expind==0]/8), sqrt(my),
     pch=epilepsy$treat[5*(1:59)]+2,
     xlab='Sqrt(Baseline seizures)', ylab='Sqrt(Experimental seizures)')
abline(0,1)
# Now we fit the GEE model: a Poisson-type rate model in which the offset term
# accounts for the number of weeks over which the seizures were counted in each
# observation. The arguments corstr="AR-M", Mv=1 request a first-order
# autoregressive [AR(1)] correlation structure.
seiz.gee <- gee(seizures ~ offset(log(timeadj)) + expind + treat + I(expind*treat), id,
                family=poisson, corstr="AR-M", Mv=1, data=epilepsy, subset=(id!=49))
summary(seiz.gee)
# Patient 49 is excluded because of an extreme number of seizures:
plot(unique(id), tapply(seizures, id, sum), xlab='ID', ylab='Total seizures')
R code for the example

# Two-sided p-values of the estimators, computed from the robust z-statistics
# reported by summary(seiz.gee); 'both' refers to the interaction I(expind*treat):
pvalue_intercept <- 2*min(pnorm(8.2187244), 1-pnorm(8.2187244))
pvalue_intercept
pvalue_expind <- 2*min(pnorm(1.3257816), 1-pnorm(1.3257816))
pvalue_expind
pvalue_treat <- 2*min(pnorm(-0.4027256), 1-pnorm(-0.4027256))
pvalue_treat
pvalue_both <- 2*min(pnorm(-2.2421009), 1-pnorm(-2.2421009))
pvalue_both
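The same two-sided p-values can be written more compactly. This base-R equivalence, using one of the z-statistics above, is a small illustrative check:

```r
# For any z-statistic, 2*min(pnorm(z), 1 - pnorm(z)) equals 2*pnorm(-abs(z)),
# since the smaller tail is always the one away from zero:
z <- -2.2421009                      # z-statistic of the interaction term
p_long  <- 2 * min(pnorm(z), 1 - pnorm(z))
p_short <- 2 * pnorm(-abs(z))
all.equal(p_long, p_short)           # the two forms agree
```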
Example with R (continued)

Seizures per week in epilepsy patients:

              BASELINE   EXPERIMENT
  PLACEBO       3.85        4.30
  TREATMENT     3.96        3.98
References

• Hardin, James W. and Hilbe, Joseph M. Generalized Estimating Equations, second edition. Chapman and Hall/CRC, 2012.
• Liang, Kung-Yee and Zeger, Scott L. Longitudinal data analysis using generalized linear models. Biometrika 73(1), 13-22, 1986.
• Introduction to Generalized Estimating Equations. The Pennsylvania State University, Department of Statistics Online Programs, 2015. Available at: https://onlinecourses.science.psu.edu/stat504/node/180
• Faraway, Julian J. Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models. CRC Press, 2005.
Exercise

• The data is from a 1996 study (Gregoire, Kumar, Everitt, Henderson and Studd) on the efficacy of estrogen patches in treating postnatal depression.
• Women were randomly assigned to either a placebo control group (group=0, n=27) or an estrogen patch group (group=1, n=34).
• Prior to the first treatment all patients took the Edinburgh Postnatal Depression Scale (EPDS). EPDS data was collected monthly for six months once the treatment began. Higher scores on the EPDS indicate higher levels of depression.
• Variables: id = identifying number; subj = subject number; group = treatment group (0=placebo, 1=treatment); visit = time of monthly visit; dep = depression score (higher scores indicate a higher level of depression); depd = dichotomous version of the depression scale (<11=0, >11=1).
• Question: Use GEE to model the data and determine whether estrogen patches reduce the level of depression.
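A starting point for the exercise might look like the sketch below. This is a hedged illustration, not a definitive solution: the data frame 'pnd' is simulated purely as a placeholder with the structure described above, and you would substitute the real study data and inspect summary(fit) for the group effect.

```r
# Hedged sketch of the exercise analysis using the 'gee' package.
# 'pnd' below is simulated placeholder data, NOT the real study data.
library(gee)
set.seed(42)
n_subj <- 61                               # 27 placebo + 34 treatment
pnd <- data.frame(
  subj  = rep(1:n_subj, each = 6),         # six monthly visits per woman
  group = rep(c(rep(0, 27), rep(1, 34)), each = 6),
  visit = rep(1:6, times = n_subj)
)
# Placeholder EPDS scores in a roughly plausible range:
pnd$dep <- round(pmax(0, 15 - 2*pnd$group - 0.5*pnd$visit + rnorm(nrow(pnd), 0, 3)))

# Identity link for the continuous score, exchangeable working correlation
# (all visits of the same woman equally correlated):
fit <- gee(dep ~ group + visit, id = subj, data = pnd,
           family = gaussian, corstr = "exchangeable")
summary(fit)
```

For the dichotomous outcome depd, the analogous call would use family = binomial; comparing the robust z-statistic of group against the standard normal answers the question.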