Meinshausen, N., Meier, L., and Bühlmann, P. (2009)

p-Values for High-Dimensional Regression

Source: Journal of the American Statistical Association, Vol. 104, No. 488 (December 2009), pp. 1671-1681
Published by: Taylor & Francis, Ltd. on behalf of the American Statistical Association
Stable URL: http://www.jstor.org/stable/40592371
Accessed: 30-03-2015 20:49 UTC
p-Values for High-Dimensional Regression

Nicolai Meinshausen, Lukas Meier, and Peter Bühlmann
Assigning significance in high-dimensional regression is challenging. Most computationally efficient selection algorithms cannot guard against inclusion of noise variables. Asymptotically valid p-values are not available. An exception is a recent proposal by Wasserman and Roeder that splits the data into two parts. The number of variables is then reduced to a manageable size using the first split, while classical variable selection techniques can be applied to the remaining variables, using the data from the second split. This yields asymptotic error control under minimal conditions. This involves a one-time random split of the data, however. Results are sensitive to this arbitrary choice, which amounts to a "p-value lottery" and makes it difficult to reproduce results. Here we show that inference across multiple random splits can be aggregated, while maintaining asymptotic control over the inclusion of noise variables. We show that the resulting p-values can be used for control of both family-wise error and false discovery rate. In addition, the proposed aggregation is shown to improve power while reducing the number of falsely selected variables substantially.

KEY WORDS: Data splitting; False discovery rate; Family-wise error rate; High-dimensional variable selection; Multiple comparisons.
1. INTRODUCTION

The problem of high-dimensional variable selection has received tremendous attention in the last decade. Sparse estimators like the Lasso (Tibshirani 1996) and extensions thereof (Zou 2006; Meinshausen 2007) have been shown to be very powerful because they are suitable for high-dimensional data sets and because they lead to sparse, interpretable results.

In the usual workflow for high-dimensional variable selection problems, the user sets potential tuning parameters to their prediction-optimal values and uses the resulting estimator as the final result. In the classical low-dimensional setup, error control based on p-values is a widely used standard in all areas of science. So far, p-values are not available in high-dimensional situations, except for the proposal of Wasserman and Roeder (2009). An ad hoc solution for assigning relevance is to use the bootstrap to analyze the stability of the selected predictors and to focus on those selected most often (or even always). Bach (2008) and Meinshausen and Bühlmann (2008) showed that for the Lasso, this leads to a consistent model selection procedure under fewer restrictions than for the nonbootstrap case.

More recently, some progress has been made in obtaining error control (Meinshausen and Bühlmann 2008; Wasserman and Roeder 2009). Here we build on the approach of Wasserman and Roeder (2009) and show that an extension of their "screen and clean" algorithm leads to a more powerful variable selection procedure. Moreover, family-wise error rate (FWER) and false discovery rate (FDR) can be controlled, whereas Wasserman and Roeder (2009) focused on variable selection rather than assigning significance via p-values. We also extend the methodology to control of the false discovery rate (Benjamini and Hochberg 1995) for high-dimensional data.

Although the main application of our procedure is for high-dimensional data, where the number p of variables can greatly exceed sample size n, we show that the method is also quite competitive with more standard error control for n > p settings, indeed often providing better detection power in the presence of highly correlated variables.

This article is organized as follows. We briefly discuss the single-split method of Wasserman and Roeder (2009) in Section 2, noting that the results can depend strongly on the arbitrary choice of a random sample split. We propose a multisplit method, which eliminates this dependence. In Section 3 we prove FWER and FDR control of the multisplit method, and in Section 4 we show numerically that for simulated and real data sets, the method is more powerful than the single-split version while significantly reducing the number of false discoveries. We outline some possible extensions of the proposed methodology in Section 5.

2. SAMPLE SPLITTING AND HIGH-DIMENSIONAL VARIABLE SELECTION

We consider the usual high-dimensional linear regression setup with a response vector Y = (Y_1, ..., Y_n) and an n × p fixed design matrix X such that

    Y = Xβ + ε,

where ε = (ε_1, ..., ε_n) is a random error vector with ε_i iid N(0, σ²) and β ∈ R^p is the parameter vector. Extensions to other models are given in Section 5.

Denote by

    S = {j; β_j ≠ 0}

the set of active predictors, and similarly by N = S^c = {j; β_j = 0} the set of noise variables. Our goal is to assign p-values for the null hypotheses H_0,j: β_j = 0 versus H_A,j: β_j ≠ 0 and to infer the set S from a set of n observations (X_i, Y_i), i = 1, ..., n. We allow for potentially high-dimensional designs, that is, p ≫ n. This makes statistical inference very challenging. An approach proposed by Wasserman and Roeder (2009) is to split the data into two parts, reducing the dimensionality of predictors on one part to a manageable number of predictors (keeping the important variables with high probability), and then to assign p-values and make a final selection on the second part of the data, using classical least squares estimation.
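To make this setup concrete, here is a minimal sketch that generates data from the sparse linear model above; the dimensions, active set, and noise level are illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 100, 200                     # high-dimensional: p > n
S = [0, 1, 2, 3, 4]                 # active set S = {j; beta_j != 0}

beta = np.zeros(p)
beta[S] = 1.0                       # "uniform" coefficient strength

X = rng.standard_normal((n, p))     # a fixed n x p design matrix
eps = rng.standard_normal(n)        # errors eps_i iid N(0, sigma^2), sigma = 1
Y = X @ beta + eps                  # the model Y = X beta + eps

N_set = [j for j in range(p) if beta[j] == 0.0]   # noise set N = S^c
```

The inference problem is then to recover S from (X, Y) alone, with a multiplicity-adjusted p-value attached to each coordinate.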
Nicolai Meinshausen is University Lecturer, Department of Statistics, University of Oxford, Oxford OX1 3TG, U.K. (E-mail: [email protected]). Lukas Meier is Ph.D. Student and Peter Bühlmann is Professor, Seminar für Statistik, ETH Zurich, 8092 Zurich, Switzerland. Nicolai Meinshausen acknowledges the generous support and hospitality shown during his stay at the Forschungsinstitut für Mathematik at ETH Zürich.
© 2009 American Statistical Association
Journal of the American Statistical Association
December 2009, Vol. 104, No. 488, Theory and Methods
DOI: 10.1198/jasa.2009.tm08647
2.1 Family-Wise Error Rate Control With the Single-Split Method

The procedure of Wasserman and Roeder (2009) attempts to control the family-wise error rate (FWER), defined as the probability of making at least one false rejection. The method relies on sample splitting, performing variable selection and dimensionality reduction on one part of the data and classical significance testing on the other part. The data are split randomly into two disjoint groups, D_in = (X_in, Y_in) and D_out = (X_out, Y_out), of equal size. Let S̃ be a variable selection or screening procedure that estimates the set of active predictors. Abusing notation slightly, we also denote by S̃ the set of selected predictors. Variable selection and dimensionality reduction is based on D_in; that is, we apply S̃ only on D_in. This includes the selection of potential tuning parameters involved in S̃. The idea is to break down the large number, p, of potential predictor variables to a smaller number, k ≪ p, with k at most a fraction of n, while keeping all relevant variables. The regression coefficients and the corresponding p-values, P̃_1, ..., P̃_p, of the selected predictors are determined based on D_out, using ordinary least squares estimation on the set S̃ and setting P̃_j = 1 for all j ∉ S̃. If the selected model S̃ contains the true model S (i.e., S ⊆ S̃), then the p-values based on D_out are unbiased. Finally, each p-value P̃_j is adjusted by a factor |S̃| to correct for the multiplicity of the testing problem. The selected model is given by all variables in S̃ for which the adjusted p-value is below a cutoff α ∈ (0, 1),

    S_single = {j ∈ S̃ : P̃_j |S̃| ≤ α}.

Under suitable assumptions (discussed later), this yields asymptotic control against inclusion of variables in N (false positives), in the sense that

    lim sup_{n→∞} P[N ∩ S_single ≠ ∅] ≤ α,

that is, control of the FWER. The method is easy to implement and yields the asymptotic control under weak assumptions. The method relies on an arbitrary split into D_in and D_out, however, and the results can change drastically if this split is chosen differently. This in itself is unsatisfactory, because then the results are not reproducible.

2.2 Family-Wise Error Rate Control With the New Multisplit Method

An obvious alternative to a single arbitrary split is to divide the sample repeatedly. For each split, we end up with a set of p-values. How to combine and aggregate the results is not obvious, however. Here we describe a possible approach. For each hypothesis, a distribution of p-values is obtained under random sample splitting. We propose that error control can be based on the quantiles of this distribution. We show empirically that, possibly unsurprisingly, the resulting procedure is more powerful than the single-split method. The multisplit method also makes the results reproducible, at least approximately, if the number of random splits is chosen to be very large.

The multisplit method uses the following procedure:

For b = 1, ..., B:
  1. Randomly split the original data into two disjoint groups, D_in^(b) and D_out^(b), of equal size.
  2. Using only D_in^(b), estimate the set of active predictors, S̃^(b).
  3. (a) Using only D_out^(b), fit the selected variables in S̃^(b) with ordinary least squares and calculate the corresponding p-values P̃_j^(b) for j ∈ S̃^(b).
     (b) Set the remaining p-values to 1, that is, P̃_j^(b) = 1 for j ∉ S̃^(b).
  4. Define the adjusted (nonaggregated) p-values as

         P_j^(b) = min(P̃_j^(b) |S̃^(b)|, 1),   j = 1, ..., p.                  (2.1)

Finally, aggregate over the B p-values P_j^(1), ..., P_j^(B), as discussed later.

This procedure leads to a total of B p-values for each predictor j = 1, ..., p. It will turn out that suitable summary statistics are quantiles. For γ ∈ (0, 1), define

    Q_j(γ) = min{1, q_γ({P_j^(b)/γ; b = 1, ..., B})},                          (2.2)

where q_γ(·) is the (empirical) γ-quantile function.

A p-value for each predictor j = 1, ..., p is then given by Q_j(γ), for any fixed 0 < γ < 1. In Section 3 we show that this is an asymptotically correct p-value, adjusted for multiplicity.

Properly selecting γ may be difficult. Error control is not guaranteed if we search for the best value of γ. We propose to use instead an adaptive version that selects a suitable value of γ based on the data. Let γ_min ∈ (0, 1) be a lower bound for γ, typically 0.05, and define

    P_j = min{1, (1 − log γ_min) inf_{γ ∈ (γ_min, 1)} Q_j(γ)}.                 (2.3)

The extra correction factor, 1 − log γ_min, ensures that the FWER remains controlled at level α despite the adaptive search for the best quantile (see Sec. 3). For the recommended choice of γ_min = 0.05, this factor is upper-bounded by 4; in fact, 1 − log(0.05) ≈ 3.996.

We comment briefly on the relation between the proposed adjustment and the FDR (Benjamini and Hochberg 1995; Benjamini and Yekutieli 2001) or FWER (Holm 1979) controlling procedures. While we provide family-wise error control and as such use union-bound corrections as done by Holm (1979), the definition of the adjusted p-value (2.3) and its graphical representation in Figure 1 are vaguely reminiscent of the FDR procedure, rejecting hypotheses if and only if the empirical distribution of the p-values crosses a certain linear bound. The empirical distribution in (2.3) is taken for only one predictor variable, though, which is either in S or N. This corresponds to a multiple-testing situation in which we are testing a single hypothesis with multiple statistics.

Figure 1 shows an example. Panel (a) presents a histogram of the adjusted p-values, P_j^(b), for b = 1, ..., B, of the selected variable in the real data example in Section 4.3. The single-split method is equivalent to picking one of these p-values randomly and selecting the variable if this randomly chosen p-value is sufficiently small. To avoid this "p-value lottery," the multisplit method computes the empirical distribution of all p-values P_j^(b) for b = 1, ..., B, and rejects the null hypothesis H_0,j: β_j = 0 (thus selecting variable j and including it into the model) if the empirical distribution crosses the broken line in Figure 1(b). A short derivation of the latter is as follows. Variable j is selected if and only if P_j ≤ α, which occurs if and only if there exists some γ ∈ (0.05, 1) such that
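The whole procedure, steps 1-4 together with the aggregation rules (2.2) and (2.3), fits in a short function. This is a sketch under two simplifying assumptions: screening is done by largest absolute marginal correlation (a crude stand-in for the Lasso-based screening the paper uses), and p-values come from the normal approximation to the OLS t statistics (as the paper itself does in Section 4):

```python
import numpy as np
from math import erfc, sqrt

def ols_pvalues(X, Y):
    """Two-sided p-values for OLS coefficients via the normal approximation."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ Y)
    resid = Y - X @ beta
    sigma2 = resid @ resid / (n - k)              # error variance estimate
    z = beta / np.sqrt(sigma2 * np.diag(XtX_inv))
    return np.array([erfc(abs(t) / sqrt(2.0)) for t in z])

def multisplit_pvalues(X, Y, B=50, k=None, gamma_min=0.05, seed=0):
    """Adjusted p-values P_j of (2.3).  Screening by marginal correlation
    is only a stand-in for the Lasso screening used in the paper."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    k = k or n // 6                               # screened-set size, cf. S_fixed
    P = np.ones((B, p))                           # P[b, j] = P_j^(b); default 1
    for b in range(B):
        perm = rng.permutation(n)
        D_in, D_out = perm[: n // 2], perm[n // 2:]
        # step 2: screening on D_in only
        score = np.abs(X[D_in].T @ (Y[D_in] - Y[D_in].mean()))
        S_hat = np.argsort(score)[-k:]
        # step 3: OLS p-values on D_out for the screened variables
        p_raw = ols_pvalues(X[np.ix_(D_out, S_hat)], Y[D_out])
        # step 4, eq. (2.1): Bonferroni-adjust by |S_hat|, cap at 1
        P[b, S_hat] = np.minimum(p_raw * len(S_hat), 1.0)
    # eq. (2.2) on a grid of gamma, then the adaptive rule (2.3)
    gammas = np.linspace(gamma_min, 1.0, 20)
    Q = np.array([np.minimum(1.0, np.quantile(P, g, axis=0) / g) for g in gammas])
    return np.minimum(1.0, (1.0 - np.log(gamma_min)) * Q.min(axis=0))
```

The infimum over γ in (2.3) is approximated here by a minimum over a finite grid, which can only make the resulting p-values slightly conservative.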
Figure 1. (a) A histogram of the adjusted p-values, P_j^(b), for the selected variable in the motif regression data example of Section 4.3. The single-split method randomly picks one of these p-values (a "p-value lottery") and rejects if it is below α. For the multisplit method, we reject if and only if the empirical distribution function of the adjusted p-values crosses the broken line [which is f(p) = max{0.05, (3.996/α)p}] for some p ∈ (0, 1). This bound is shown as a broken line for α = 0.05 in (b). For this example, the bound is indeed exceeded, and the variable is thus selected.
Q_j(γ) ≤ α/(1 − log 0.05) ≈ α/3.996. Equivalently, using definition (2.2), the γ-quantile of the adjusted p-values, q_γ({P_j^(b); b = 1, ..., B}), must be smaller than or equal to αγ/3.996. This in turn is equivalent to the situation where the empirical distribution of the adjusted p-values, P_j^(b) for b = 1, ..., B, crosses above the bound f(p) = max{0.05, (3.996/α)p} for some p ∈ (0, 1). This bound is shown as a broken line in Figure 1(b).
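This equivalence is easy to check numerically. The sketch below computes the adaptive p-value (2.3) directly (with the infimum over γ approximated on a grid) and, separately, tests whether the empirical distribution of the per-split p-values exceeds the bound f(p); the constant p-value vectors used for illustration are synthetic, not from the paper:

```python
import numpy as np

def adaptive_pvalue(Pb, gamma_min=0.05):
    """Adjusted p-value (2.3) from the per-split p-values P_j^(b)."""
    gammas = np.linspace(gamma_min, 1.0, 200)
    Q = np.array([min(1.0, np.quantile(Pb, g) / g) for g in gammas])
    return min(1.0, (1.0 - np.log(gamma_min)) * Q.min())

def ecdf_crosses_bound(Pb, alpha, gamma_min=0.05):
    """True if the empirical distribution of P_j^(b) exceeds
    f(p) = max{gamma_min, ((1 - log gamma_min)/alpha) * p} somewhere."""
    Pb = np.asarray(Pb)
    factor = 1.0 - np.log(gamma_min)          # about 3.996 for gamma_min = 0.05
    grid = np.linspace(1e-4, 1.0, 2000)
    ecdf = np.array([np.mean(Pb <= t) for t in grid])
    bound = np.maximum(gamma_min, (factor / alpha) * grid)
    return bool(np.any(ecdf >= bound))
```

Up to grid discretization, `adaptive_pvalue(Pb) <= alpha` and `ecdf_crosses_bound(Pb, alpha)` agree, mirroring the derivation above.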
The resulting adjusted p-values, P_j, j = 1, ..., p, can then be used for both FWER and FDR control. For FWER control at level α ∈ (0, 1), simply all p-values below α are rejected, and the selected subset is

    S_multi = {j : P_j ≤ α}.                                                   (2.4)

In Section 3.2 we show that indeed, asymptotically, P(V > 0) ≤ α, where V = |S_multi ∩ N| is the number of falsely selected variables under the proposed selection (2.4). Besides better reproducibility and asymptotic family-wise error control, the multisplit version is, maybe unsurprisingly, more powerful than the single-split selection method.

2.3 False Discovery Rate Control With the Multisplit Method

Control of the FWER is often considered too conservative. If many rejections are made, Benjamini and Hochberg (1995) proposed instead controlling the expected proportion of false rejections, the FDR. Let V = |Ŝ ∩ N| be the number of false rejections for a selection method Ŝ, and let R = |Ŝ| be the total number of rejections. The FDR is defined as the expected proportion of false rejections,

    E(Q),  with Q = V/max{1, R}.                                               (2.5)

For no rejections, R = 0, the denominator ensures that the false discovery proportion, Q, is 0, conforming with the definition of Benjamini and Hochberg (1995).

The original FDR-controlling procedure of Benjamini and Hochberg (1995) first orders the observed p-values as P_(1) ≤ P_(2) ≤ ... ≤ P_(p) and defines

    k = max{i : P_(i) ≤ (i/p) q}.                                              (2.6)

It then rejects all variables or hypotheses with the smallest k values, with no rejection made if the set in (2.6) is empty. FDR is controlled in this way at level q under the condition that all p-values are independent. Benjamini and Yekutieli (2001) showed that this procedure is conservative under a wider range of dependencies between p-values (see Blanchard and Roquain 2008 for related work). A great leap of faith would be required to assume any such assumption for our setting of high-dimensional regression, however. For general dependencies, Benjamini and Yekutieli (2001) showed that control is guaranteed at level q Σ_{i=1}^p i^{-1} ≈ q(1/2 + log(p)).

The standard FDR procedure works with the raw p-values, which are assumed to be uniformly distributed in [0, 1] for true null hypotheses. The division by p in (2.6) is an effective correction for multiplicity. But the proposed multisplit method produces already adjusted p-values, as in (2.3). Because we are already working with multiplicity-corrected p-values, the division by p in (2.6) turns out to be superfluous. Instead, we can order the corrected p-values, P_j, j = 1, ..., p, in increasing order, P_(1) ≤ P_(2) ≤ ... ≤ P_(p), and select the h variables with the smallest p-values, where

    h = max{i : P_(i) ≤ i q}.                                                  (2.7)

The set of variables selected is denoted, with the value of h given in (2.7), by

    S_multi;FDR = {j : P_j ≤ P_(h)},                                           (2.8)

with no rejections, S_multi;FDR = ∅, if P_(i) > i q for all i = 1, ..., p.
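The step-up rule (2.7)-(2.8) operates directly on the adjusted p-values; note the absence of the division by p that appears in (2.6). A sketch, including as a helper the rescaling of q by Σ_{i≤p} 1/i used to guarantee control at level q:

```python
import numpy as np

def multisplit_fdr_select(P_adj, q):
    """Step-up selection (2.7)-(2.8) on multiplicity-adjusted p-values:
    h = max{i : P_(i) <= i*q}; select the h variables with smallest P_j.
    Returns an index array (empty if P_(i) > i*q for all i)."""
    P_adj = np.asarray(P_adj, dtype=float)
    order = np.argsort(P_adj)
    ok = P_adj[order] <= q * np.arange(1, len(P_adj) + 1)
    if not ok.any():
        return np.array([], dtype=int)            # no rejections
    h = int(np.nonzero(ok)[0].max()) + 1
    return np.sort(order[:h])

def corrected_level(q, p):
    """q~ = q / sum_{i=1}^p 1/i; using this cutoff in (2.7) gives FDR
    control at level q (the sum is roughly 1/2 + log p)."""
    return q / float(np.sum(1.0 / np.arange(1, p + 1)))
```

For example, `multisplit_fdr_select(P, corrected_level(0.05, len(P)))` selects at target FDR level 0.05 under general dependence.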
The procedure (2.8) will achieve FDR control at level q Σ_{i=1}^p i^{-1} ≈ q(1/2 + log p). To get FDR control at level q, we replace q in (2.7) by q/(Σ_{i=1}^p i^{-1}), completely analogous to the standard FDR procedure under arbitrary dependence of Benjamini and Yekutieli (2001). In the next section, we prove error control. Later, we demonstrate empirically the advantages of the proposed multisplit version over both the single-split and standard FDR-controlling procedures, providing numerical results.

3. ERROR CONTROL AND CONSISTENCY

3.1 Assumptions

To achieve asymptotic error control, Wasserman and Roeder (2009) made a few assumptions about the crucial requirements for the variable selection procedure S̃:

(A1) Screening property: lim_{n→∞} P[S̃ ⊇ S] = 1.
(A2) Sparsity property: |S̃| < n/2.

The screening property (A1) ensures that all relevant variables are retained. Irrelevant noise variables are allowed to be selected as well, as long as there are not too many, as required by the sparsity property (A2). A violation of the sparsity property would make it impossible to apply classical tests on the retained variables.

The Lasso (Tibshirani 1996) is an important example that satisfies (A1) and (A2) under appropriate conditions discussed by Meinshausen and Bühlmann (2006), Zhao and Yu (2006), van de Geer (2008), Meinshausen and Yu (2009), and Bickel, Ritov, and Tsybakov (2009). The adaptive Lasso (Zou 2006; Zhang and Huang 2008) also satisfies (A1) and (A2) under suitable conditions. Other examples include, assuming appropriate conditions, L2 boosting (Friedman 2001; Bühlmann 2006), orthogonal matching pursuit (Tropp and Gilbert 2007), and sure independence screening (Fan and Lv 2008).

We typically use the Lasso (and extensions thereof) as a screening method. Other algorithms are possible as well. Wasserman and Roeder (2009) studied various scenarios under which these two properties are satisfied for the Lasso, depending on the choice of the regularization parameter. We refrain from repeating these and similar arguments, and operate on the assumption that we have a selection procedure, S̃, that satisfies both the screening property and the sparsity property.

3.2 Family-Wise Error Rate Control

We propose two versions of multiplicity-adjusted p-values: Q_j(γ), as defined in (2.2), which relies on a choice of γ ∈ (0, 1), and the adaptive version P_j defined in (2.3), which makes an adaptive choice of γ. We show that both quantities are multiplicity-adjusted p-values providing asymptotic FWER control.

Theorem 3.1. Assume that (A1) and (A2) apply. Let α, γ ∈ (0, 1). If the null hypothesis H_0,j: β_j = 0 gets rejected whenever Q_j(γ) ≤ α, then the FWER is asymptotically controlled at level α, that is,

    lim sup_{n→∞} P[min_{j∈N} Q_j(γ) ≤ α] ≤ α,

where P is with respect to the data sample, and the statement holds for any of the B random sample splits.

The proof is given in the Appendix.

Theorem 3.1 is valid for any predefined value of the quantile γ. However, the adjusted p-values, Q_j(γ), involve the somewhat arbitrary choice of γ, which could possibly pose a problem for practical applications. Thus we propose the adjusted p-values, P_j, that search for the optimal value of γ adaptively.

Theorem 3.2. Assume that (A1) and (A2) apply. Let α ∈ (0, 1). If the null hypothesis H_0,j: β_j = 0 is rejected whenever P_j ≤ α, then the FWER is asymptotically controlled at level α, that is,

    lim sup_{n→∞} P[min_{j∈N} P_j ≤ α] ≤ α,

where the probability P is as in Theorem 3.1.

The proof is given in the Appendix.

A brief remark regarding the asymptotic nature of the results seems to be in order. The proposed error control relies on all truly important variables being selected in the screening stage with very high probability. This is our screening property (A1). Let A be the event S̃ ⊇ S. The result of Theorem 3.2, for example, can be formulated in a nonasymptotic way as

    P[A ∩ {min_{j∈N} P_j ≤ α}] ≤ α,

and P(A) → 1, typically exponentially fast, for n → ∞. Analogous remarks apply to Theorems 3.1 and 3.3.

3.3 False Discovery Rate Control

The adjusted p-values can be used for FDR control, as laid out in Section 2.3. The set of selected variables, S_multi;FDR, was defined in (2.8). Here we show that FDR is indeed controlled at the desired rate with this procedure.

Theorem 3.3. Assume that (A1) and (A2) apply. Let q > 0 and let S_multi;FDR be the set of selected variables, as defined in (2.8), with a cutoff value of q̃ = q/Σ_{i=1}^p i^{-1} in (2.7). The FDR (2.5) with V = |S_multi;FDR ∩ N|, R = |S_multi;FDR|, and Q = V/max{1, R} is then asymptotically controlled at level q, that is,

    lim sup_{n→∞} E(Q) ≤ q.

The proof is given in the Appendix.

As with FWER control, we could use, for any fixed value of γ, the values Q_j(γ), j = 1, ..., p, instead of P_j, j = 1, ..., p. We refrain from giving the full details here, because in our experience, the foregoing adaptive version works reliably and does not require the a priori choice of the quantile γ that is necessary otherwise.

3.4 Model Selection Consistency

If we let the level α = α_n → 0 for n → ∞, then the probability of falsely including a noise variable vanishes because of the preceding results. To get the property of consistent model selection, we must analyze the asymptotic behavior of the power. It turns out that this property is inherited from the single-split method.

Corollary 3.1. Let S_single be the selected model of the single-split method. Assume that α_n → 0 can be chosen for n → ∞ at a rate such that lim_{n→∞} P[S_single = S] = 1. Then, for any
γ_min [see (2.3)], the multisplit method is also model selection-consistent for a suitable sequence α_n; that is, for S_multi = {j ∈ S̃ : P_j ≤ α_n}, it holds that

    lim_{n→∞} P[S_multi = S] = 1.

Wasserman and Roeder (2009) discussed conditions that ensure that lim_{n→∞} P[S_single = S] = 1 for various variable selection methods, such as the Lasso or some forward variable selection scheme.

The reverse of Corollary 3.1 is not necessarily true. The multisplit method can be consistent if the single-split method is not. A necessary condition for consistency of the single-split method is lim sup_{n→∞} P[P_j^(b) ≤ α] = 1 for all j ∈ S, where the probability is with respect to both the data and the random split-point, because otherwise there is a positive probability that variable j will not be selected with the single-split approach. For the multisplit method, on the other hand, we need only a bound on quantiles of P_j^(b) over b = 1, ..., B. We refrain from going into more detail here and instead show, with numerical results, that the multisplit method is indeed more powerful than the single-split analog. We also remark that the Bonferroni correction in (2.1), multiplying the raw p-values by the number, |S̃^(b)|, of selected variables, possibly could be improved using ideas of Hothorn, Bretz, and Westfall (2008), further increasing the power of the procedure.

4. NUMERICAL RESULTS

In this section we compare the empirical performance of the different estimators on simulated and real data sets. Simulated data allow a thorough evaluation of the model selection properties. The real data set demonstrates that we can find signals in data with our proposed method that would not be picked up by the single-split method. We use a default value of α = 0.05 everywhere.

4.1 Simulations

We use the following simulation settings:

(A) Simulated data set with n = 100, p = 100, and a Toeplitz design matrix coming from a centered multivariate normal distribution with covariance ρ^|j−k| between variables j and k, with ρ = 0.5.
(B) As in (A), but with n = 100 and p = 1000.
(C) Real data set with n = 71 and p = 4088 for the design matrix X and artificial response Y.

The data set in (C) is from gene expression measurements in Bacillus subtilis. The p = 4088 predictor variables are log-transformed gene expressions, and there is a response measuring the logarithm of the production rate of riboflavin in B. subtilis. The data were kindly provided by DSM Nutritional Products, Switzerland. Because the true variables are not known, we consider a linear model with design matrix from real data and simulate a sparse parameter vector β as follows. In each simulation run, a new parameter vector β is created by either "uniform" or "varying-strength" sampling. Under uniform sampling, |S| randomly chosen components of β are set to 1, and the remaining p − |S| components are set to 0. Under varying-strength sampling, |S| randomly chosen components of β are set to the values 1, ..., |S|. The error variance σ² is adjusted such that the signal-to-noise ratio (SNR) is maintained at a desired level at each simulation run. We perform 50 simulations for each setting.

The sample splitting is done such that the model is trained on a data set of size ⌊(n − 1)/2⌋, and the p-values are calculated on the remaining data set. This slightly unbalanced scheme precludes situations where the full model might be selected on the first data set; calculation of p-values would not be possible on the remaining data in such a situation. We use a total of B = 50 sample splits for each simulation run. Following Wasserman and Roeder (2009), we compute p-values for all procedures using a normal approximation. The results are qualitatively similar when using a t distribution instead.

We compare the average number of true positives and the FWER for the single-split and multisplit methods for the three simulation settings (A)-(C), using SNRs of 0.25, 1, 4, and 16 (corresponding to population R² values of 0.2, 0.5, 0.8, and 0.94, respectively). The number of relevant variables, |S|, is either 5 or 10. As the initial variable selection or screening method, S̃, we use three approaches, all based on the Lasso (Tibshirani 1996). The first approach, denoted by S̃_fixed, uses the Lasso and selects those ⌊n/6⌋ variables that appear most often in the regularization path when varying the penalty parameter. The constant number of ⌊n/6⌋ variables is chosen, somewhat arbitrarily, to ensure a reasonably large set of selected coefficients on the one hand and, on the other hand, to ensure that least squares estimation will work reasonably well on the second half of the data with sample size ⌊n/2⌋. While this choice seems to work well in practice and can be implemented very easily and efficiently, it is still slightly arbitrary. Avoiding any such choices of non-data-adaptive tuning parameters, the second method, S̃_cv, uses the Lasso with penalty parameter chosen by 10-fold cross-validation, selecting the variables whose corresponding estimated regression coefficients are different from 0. The third method, S̃_adap, is the adaptive Lasso of Zou (2006), in which regularization parameters are chosen based on 10-fold cross-validation, with the Lasso solution used as the initial estimator for the adaptive Lasso. The selected variables are again those whose corresponding estimated regression parameters are different from 0.

Figures 2 and 3 show results for both the single-split and multisplit methods with the default setting γ_min = 0.05. Using the multisplit method, the average number of true positives (i.e., the variables in S which are selected) typically is slightly increased, while the FWER (i.e., the probability of including variables in N) is reduced sharply. The single-split method often has a FWER above the level α = 0.05 at which it is asymptotically controlled, while for the multisplit method, the FWER is above the nominal level in only a few scenarios. The asymptotic control seems to give good finite-sample control with the multisplit method, possibly apart from the method S̃_fixed on the very high-dimensional data set (C). The single-split method, in contrast, selects too many noise variables, exceeding the desired FWER sometimes substantially, in nearly all settings. This suggests that the asymptotic error control works better for finite sample sizes with the multisplit method. Even though the multisplit method is more conservative than the single-split method (having a substantially lower FWER), the number of
Figure 2. Simulation results for setting (A) in the top and (B) in the bottom row. Average number of true positives vs. the family-wise error rate (FWER) for the single-split method ("S") against the multisplit version ("M"). FWER is controlled (asymptotically) at α = 0.05 for both methods, and this value is indicated by a broken vertical line. From left to right are results for S̃_fixed, S̃_cv, and S̃_adap. Results of a unique setting are joined by a solid line if the coefficients follow the "uniform" sampling and a broken line otherwise. Increasing SNR is indicated by increasing symbol size.
true discoveries often is increased. We note that for data set (C), with p = 4088, and in general for low SNRs, the number of true positives is low, because we control the very stringent family-wise error criterion at a significance level of α = 0.05. As an alternative, controlling less conservative error measures is possible, as discussed in Section 5.
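The data generation for settings (A) and (B) can be sketched as follows: a Toeplitz-covariance normal design, "uniform" sampling of β, and an error variance calibrated to a target SNR. Taking SNR = Var(x^T β)/σ² is our assumption here; the paper does not spell out the exact formula:

```python
import numpy as np

def simulate_setting(n, p, s, snr, rho=0.5, seed=None):
    """Toeplitz design with Cov(X_j, X_k) = rho^|j-k|, beta with s entries
    set to 1 ("uniform" sampling), noise scaled for the target SNR."""
    rng = np.random.default_rng(seed)
    Sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    beta = np.zeros(p)
    beta[rng.choice(p, size=s, replace=False)] = 1.0
    sigma = np.sqrt(beta @ Sigma @ beta / snr)    # Var(x^T beta) / sigma^2 = snr
    Y = X @ beta + sigma * rng.standard_normal(n)
    return X, Y, beta

# setting (A): n = 100, p = 100; setting (B) would use p = 1000
X, Y, beta = simulate_setting(100, 100, s=10, snr=4, seed=0)
```

Redrawing β and the noise in each of the 50 simulation runs, as the paper does, amounts to calling this function with a fresh seed per run.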
Figure 3. Results of simulation setup (C).

4.2 Comparisons With the Adaptive Lasso

Here we compare the multisplit selector with the adaptive Lasso (Zou 2006). We have used the adaptive Lasso as a variable selection method in our proposed multisplit method. Usually, the adaptive Lasso is used by itself. A few choices must
Table 1. Comparison of the multisplit method with CV-Lasso selection, S̃_adap, and the selection made using the adaptive Lasso and a CV choice of the involved penalty parameters, for a setting with n = 100 and p = 200

Uniform                  E(true positives)       E(false positives)      P(false positives > 0)
sampling  |S|   SNR      Multisplit  Ad. Lasso   Multisplit  Ad. Lasso   Multisplit  Ad. Lasso
NO        10    0.25     0.00        2.30        0           9.78        0           0.76
NO        10    1        0.58        6.32        0           20.00       0           1
NO        10    4        4.14        8.30        0           25.58       0           1
NO        10    16       7.20        9.42        0.02        30.10       0.02        1
YES       10    0.25     0.02        2.52        0           10.30       0           0.72
YES       10    1        0.10        7.46        0.02        21.70       0.02        1
YES       10    4        2.14        9.96        0           28.46       0           1
YES       10    16       9.92        10.00       0.04        30.66       0.04        1
NO        5     0.25     0.06        1.94        0           11.58       0           0.84
NO        5     1        1.50        3.86        0.02        19.86       0.02        1
NO        5     4        3.52        4.58        0.02        23.56       0.02        1
NO        5     16       4.40        4.98        0           27.26       0           1
YES       5     0.25     0.02        2.22        0           12.16       0           0.80
YES       5     1        0.82        4.64        0.02        22.18       0.02        1
YES       5     4        4.90        5.00        0           24.48       0           1
YES       5     16       5.00        5.00        0           28.06       0           1
A few choices must be made when using the adaptive Lasso; we make the same choices as previously. The initial estimator is obtained as the Lasso solution with a 10-fold cross-validation (CV) choice of the penalty parameter. The adaptive Lasso penalty is also obtained by 10-fold CV.

Despite desirable asymptotic consistency properties (Huang, Ma, and Zhang 2008), the adaptive Lasso does not offer error control in the same way as Theorem 3.1 does for the multisplit method. In fact, the FWER (i.e., the probability of selecting at least one noise variable) is very close to 1 with the adaptive Lasso in all of the simulations that we have seen. In contrast, our multisplit method offers asymptotic control, which was very well matched by the empirical FWER in the vicinity of α = 0.05. Table 1 compares the simulation results for the multisplit method using S_adap and the adaptive Lasso by itself for a simulation setting with n = 100, p = 200, and the same settings as in (A) and (B) otherwise. The adaptive Lasso selects roughly 20 noise variables (out of p = 200 variables), even though the number of truly relevant variables is just 5 or 10. The average number of false positives is at most 0.04, and often simply 0, with the proposed multisplit method.

There is clearly a price to pay for controlling the FWER: our proposed multisplit method detects fewer truly relevant variables than the adaptive Lasso on average. The difference is most pronounced for very low SNRs. The multisplit method generally selects neither correct nor incorrect variables for SNR = 0.25, while the adaptive Lasso averages between 2 and 3 correct selections, among 9-12 wrong selections. Depending on the objectives of the study, either outcome may be preferred. For larger SNRs, the multisplit method detects almost as many truly important variables as the adaptive Lasso, while still reducing the number of falsely selected variables from 20 or more to roughly 0.

The multisplit method thus seems to be beneficial in settings where the cost of making an erroneous selection is rather high. For example, expensive follow-up experiments are usually required to validate results in biomedical applications, and stricter error control will channel more of the available resources into experiments more likely to be successful.

4.3 Motif Regression

We apply the multisplit method to a real data set related to motif regression (Conlon et al. 2003). For a total of n = 287 DNA segments, we have the binding intensity of a protein to each of the segments. These are our response values, Y_1, ..., Y_n. Moreover, for p = 195 candidate words ("motifs"), we have scores, x_ij, that measure how well the jth motif is represented in the ith DNA sequence. The motifs are typically 5- to 15-bp-long candidates for the true binding site of the protein. The hope is that the true binding site is included in the list of significant variables with the strongest relationship between motif score and binding intensity.

Using a linear model with S_adap, the multisplit method identifies one predictor variable at the 5% significance level. In contrast, the single-split method cannot identify a single significant predictor. In view of the asymptotic error control and the empirical results in Section 4, there is substantial evidence indicating that the selected variable corresponds to a true binding site. For this specific application, it seems desirable to pursue a conservative approach with low FWER. As mentioned earlier, we could control other, less conservative error measures, as discussed in Section 5.

4.4 Comparison With Standard Low-Dimensional False Discovery Rate Control

We mentioned that control of FDR can be an attractive alternative to FWER if a sizeable number of rejections is expected. Using the corrected p-values P_1, ..., P_p, a simple FDR-controlling procedure was derived in Section 2.3, and its asymptotic control of FDR was shown in Theorem 3.3. We now empirically evaluate the behavior of the resulting method, and its power to detect truly interesting variables, using the standard Lasso with CV in the initial screening step.
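An FDR-controlling rule of this kind can be sketched as a Benjamini-Yekutieli-style step-up procedure (our own illustration, not the paper's code; the function name and the toy p-values are made up, and the correction factor shown is the standard sum of reciprocals mentioned in the text):

```python
import numpy as np

def fdr_select(p_values, q=0.05):
    """Benjamini-Yekutieli-style selection under dependence: reject the j
    smallest p-values, where j is the largest index with
    p_(j) <= j * q / (p * sum_{i=1}^p 1/i)."""
    p_values = np.asarray(p_values, dtype=float)
    p = len(p_values)
    correction = np.sum(1.0 / np.arange(1, p + 1))  # sum_{i<=p} 1/i
    order = np.argsort(p_values)
    thresholds = np.arange(1, p + 1) * q / (p * correction)
    passing = np.nonzero(p_values[order] <= thresholds)[0]
    if passing.size == 0:
        return np.array([], dtype=int)
    return np.sort(order[: passing.max() + 1])  # step-up: all up to last pass

pvals = [0.0001, 0.0005, 0.2, 0.8, 0.9, 0.95, 0.4, 0.6]
print(fdr_select(pvals, q=0.05))
```

The paper's own procedure operates on its aggregated, already-corrected p-values; this sketch only shows the standard step-up logic such a rule mirrors.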
Turning again to the simulation setting (A), we vary the sample size n, the number of variables p, the SNR, the correlation ρ between neighboring variables, and the number s of truly interesting variables.

[Figure 4. Results of simulations of FDR control for the multisplit method (dark bar) and the standard FDR-controlling method (light bar). The settings of n, p, ρ, |S|, and the SNR are given below each simulation. The height of the bars corresponds to the average number of selected important variables. For p > n, the standard method breaks down, and the corresponding bars are set to height 0.]

We previously demonstrated that the multisplit method is preferable to the single-split method. Here we are more interested in a comparison with well-understood traditional FDR-controlling procedures. For p < n, the standard variable selection procedure is to compute the least squares estimator once for the full data set. For each variable, a p-value is obtained, and the FDR-controlling procedure as in (2.6) can be applied. This approach obviously breaks down for p > n. Our proposed approach can be applied to both low-dimensional (p < n) and high-dimensional (p > n) settings.

In all settings, the empirical FDR of our method (not shown) is often close to 0 and always below the controlled value of q = 0.05 (where the correction factor, Σ_{i=1}^{p} i^{-1}, has already been taken into account). Results for power are shown in Figure 4 for control at q = 0.05.

Possibly unexpectedly, the multisplit method tracks the power of the standard FDR-controlling procedure quite closely for low-dimensional data with p < n. In fact, the multisplit method is doing considerably better if n/p is below, say, 1.5 or if the correlation among the variables is large. An intuitive explanation for this behavior is that, as p approaches n, the variance in each estimated coefficient vector under the ordinary least squares (OLS) estimate is increasing substantially. This in turn increases the variance of all OLS components β̂_j, j = 1, ..., p, and diminishes the ability to select the truly important variables. The multisplit method, in contrast, trims the total number of variables to a substantially smaller number in one half of the samples, and then suffers less from increased variance in the coefficients estimated in the second half of the samples. Repeating this over multiple splits thus leads to a surprisingly high power even for low-dimensional data. Nevertheless, we believe that the main application will be in high-dimensional data, for which the standard approach breaks down completely.

5. EXTENSIONS

Because of the generic nature of our proposed methodology, extensions to any situation where (asymptotically valid) p-values, P_j, for hypotheses H_{0,j} (j = 1, ..., p) are available are straightforward. An important class of examples comprises generalized linear models (GLMs) or Gaussian graphical models. The dimension-reduction step typically involves some form of shrinkage estimation. An example for Gaussian graphical models is the recently proposed "graphical Lasso" (Friedman, Hastie, and Tibshirani 2008). The second step relies on classical tests (e.g., likelihood ratio) applied to the selected submodel, analogous to the tests for linear regression in the proposed methodology.

In some settings, control of FWER at, say, α = 0.05 is too conservative. One can either resort to controlling FDR, as alluded to earlier, or adjust FWER control to control the expected number of false rejections. As an example, consider the adjusted p-value P_j defined in (2.3).
Variable j is rejected if and only if P_j ≤ α. [In what follows, assume that adjusted p-values, as defined in (2.1), are not capped at 1. This is a technical detail only; it does not modify the proposed FWER-controlling procedure.] Rejecting variable j if and only if P_j ≤ α controls the FWER at level α. Alternatively, one can reject variables if and only if P_j/K ≤ α, where K ≥ 1 is a correction factor. Call the number of falsely rejected variables V, and calculate it as

    V = #{ j ∈ N : P_j/K ≤ α }.

Then the expected number of false positives is controlled at level lim sup_{n→∞} E[V] ≤ αK. A proof of this result follows directly from the proof of Theorem 3.2. Of course, we can equivalently set k = αK and obtain a control lim sup_{n→∞} E[V] ≤ k. For example, setting k = 1 offers a much less conservative error control compared with controlling the FWER, if this is desired.

6. DISCUSSION

We have proposed a multisplit method for assigning statistical significance and constructing conservative p-values for hypothesis testing in high-dimensional problems where the number of predictor variables may be much larger than sample size. Our method is an extension of the single-split approach of Wasserman and Roeder (2009) and is extended to FDR control. Combining the results of multiple data splits, based on quantiles as summary statistics, improves reproducibility compared with the single-split method. The multisplit and single-split methods share the properties of asymptotic error control and model selection consistency. We argue empirically that the multisplit method usually selects much fewer false positives than the single-split method, with a slightly higher number of true positives. The main area of application will be high-dimensional data, where the number p of predictor variables exceeds sample size n, because standard approaches rely on least squares estimation and thus fail in this setting. We have shown that the multisplit method is also an interesting alternative to standard FDR and FWER control in lower-dimensional settings, because the proposed FDR control can be more powerful if p is reasonably large but smaller than sample size n. The method is very generic and can be used in a broad spectrum of error-controlling procedures in multiple testing, including linear models and GLMs.

APPENDIX: PROOFS

Proof of Theorem 3.1

For technical reasons, we define

    K_j^{(b)} = P_j^{(b)} 1{S ⊆ Ŝ^{(b)}} + 1{S ⊄ Ŝ^{(b)}},    (A.1)

where the K_j^{(b)} are the adjusted p-values if the estimated active set contains the true active set; otherwise, all p-values are set to 1. Because of assumption (A1), for fixed B, P[K_j^{(b)} = P_j^{(b)} for all b = 1, ..., B] → 1 on a set A_n with P[A_n] → 1. Thus we can define all of the quantities involving P_j^{(b)} also with K_j^{(b)}, and under this slightly altered procedure, it is sufficient to show that

    P[ min_{j∈N} Q_j(γ) ≤ α ] ≤ α.

In particular, here we can omit the limes superior. For the proofs, we also omit the function min{1, ·} from the definitions of Q_j(γ) and P_j in (2.2) and (2.3). The selected sets of variables are clearly unaffected, and the notation is simplified considerably.

Define for u ∈ (0, 1) the quantity π_j(u) as the fraction of bootstrap samples that yield a K_j^{(b)} less than or equal to u,

    π_j(u) = B^{-1} Σ_{b=1}^{B} 1{K_j^{(b)} ≤ u}.

Note that the events {Q_j(γ) ≤ α} and {π_j(αγ) ≥ γ} are equivalent. Thus

    P[ min_{j∈N} Q_j(γ) ≤ α ] ≤ Σ_{j∈N} E[ 1{Q_j(γ) ≤ α} ] = Σ_{j∈N} E[ 1{π_j(αγ) ≥ γ} ].

Using a Markov inequality,

    Σ_{j∈N} E[ 1{π_j(αγ) ≥ γ} ] ≤ γ^{-1} Σ_{j∈N} E[ π_j(αγ) ].    (A.2)

By the definition of π_j(·),

    Σ_{j∈N} E[ π_j(αγ) ] = B^{-1} Σ_{j∈N} Σ_{b=1}^{B} E[ 1{K_j^{(b)} ≤ αγ} ].

Moreover, using the definition (A.1) of K_j^{(b)},

    E[ 1{K_j^{(b)} ≤ αγ} ] ≤ P[ P_j^{(b)} ≤ αγ | S ⊆ Ŝ^{(b)} ] = αγ / |Ŝ^{(b)}|.

This is a consequence of the uniform distribution of P̃_j^{(b)} given S ⊆ Ŝ^{(b)}. Summarizing these results, we get

    P[ min_{j∈N} Q_j(γ) ≤ α ] ≤ γ^{-1} B^{-1} Σ_{b=1}^{B} E[ Σ_{j∈N∩Ŝ^{(b)}} αγ / |Ŝ^{(b)}| ] ≤ α,

which completes the proof.

Proof of Theorem 3.2

As in the proof of Theorem 3.1, here we work with K_j^{(b)} instead of P_j^{(b)}; analogously, we work with K̃_j^{(b)} instead of P̃_j^{(b)}. For any j ∈ N, γ ∈ (γ_min, 1), and α ∈ (0, 1),

    E[ 1{K_j^{(b)} ≤ αγ}/γ ] ≤ α.    (A.3)

Furthermore,

    E[ sup_{γ∈(γ_min,1)} π_j(αγ)/γ ] ≤ B^{-1} Σ_{b=1}^{B} E[ sup_{γ∈(γ_min,1)} 1{K_j^{(b)} ≤ αγ}/γ ],

and thus, with (A.3) and using the definition (A.1) of K_j^{(b)}, it is sufficient to bound the expected supremum on the right-hand side. For a random variable U taking values in [0, 1],

    sup_{γ∈(γ_min,1)} 1{U ≤ αγ}/γ = { 1/γ_min,  U ≤ αγ_min;  α/U,  αγ_min < U ≤ α;  0,  U > α }.    (A.4)
Moreover, if U has a uniform distribution on [0, 1], then

    E[ sup_{γ∈(γ_min,1)} 1{U ≤ αγ}/γ ] = ∫_0^{αγ_min} γ_min^{-1} dx + ∫_{αγ_min}^{α} α x^{-1} dx = α(1 − log γ_min).

Thus, using the fact that K̃_j^{(b)} has a uniform distribution on [0, 1] for all j ∈ N, conditional on S ⊆ Ŝ^{(b)},

    E[ sup_{γ∈(γ_min,1)} 1{K_j^{(b)} ≤ αγ}/γ ] ≤ E[ sup_{γ∈(γ_min,1)} 1{K̃_j^{(b)} ≤ αγ}/γ ] = α(1 − log γ_min).

Averaging over all bootstrap samples yields

    E[ sup_{γ∈(γ_min,1)} π_j(αγ)/γ ] ≤ α(1 − log γ_min).

Again using a Markov inequality, and because the events {Q_j(γ) ≤ α} and {π_j(αγ) ≥ γ} are equivalent,

    P[ inf_{γ∈(γ_min,1)} Q_j(γ) ≤ α ] = E[ sup_{γ∈(γ_min,1)} 1{π_j(αγ) ≥ γ} ] ≤ E[ sup_{γ∈(γ_min,1)} π_j(αγ)/γ ] ≤ α(1 − log γ_min),    j = 1, ..., p.

Summing over j ∈ N as in the proof of Theorem 3.1 and using the definition of P_j in (2.3), we get

    Σ_{j∈N} P[ P_j ≤ α ] ≤ α,    (A.5)

and thus, by the union bound,

    P[ min_{j∈N} P_j ≤ α ] ≤ α,

which completes the proof.

Proof of Theorem 3.3

As in the proofs of Theorems 3.1 and 3.2, we implicitly use a correction as in (A.1) for all p-values. Otherwise, our notation is identical to that in the proof of theorem 1.3 of Benjamini and Yekutieli (2001). An exception is our use of the value q instead of q/m in the FDR-controlling procedure, because we are working with adjusted p-values. Let

    p_{ijk} = P( {P_i ∈ ((j−1)q, jq]} ∩ C_i^{(k)} ),

where C_i^{(k)} is the event that if variable i were rejected, then k − 1 other variables would be rejected as well. Now, as shown in eq. (10) as well as in eq. (28) of Benjamini and Yekutieli (2001),

    E(Q) = Σ_{i∈N} Σ_{k=1}^{p} (1/k) Σ_{j=1}^{k} p_{ijk} = Σ_{i∈N} Σ_{j=1}^{p} Σ_{k=j}^{p} (1/k) p_{ijk}.    (A.6)

We denote f(j) = Σ_{i∈N} Σ_{k=1}^{p} p_{ijk} and F(j) = Σ_{j'=1}^{j} f(j'). Because 1/k ≤ 1/j for all k ≥ j, Equation (A.6) can then be rewritten as

    E(Q) ≤ Σ_{j=1}^{p} (1/j) f(j) = f(1) + Σ_{j=2}^{p} (1/j)( F(j) − F(j−1) )    (A.7)

         = Σ_{j=1}^{p−1} (1/j − 1/(j+1)) F(j) + (1/p) F(p).    (A.8)

Note that, analogously to eq. (27) of Benjamini and Yekutieli (2001),

    Σ_{k=1}^{p} p_{ijk} = P( {P_i ∈ ((j−1)q, jq]} ∩ ∪_{k=1}^{p} C_i^{(k)} ) = P( P_i ∈ ((j−1)q, jq] ),

and thus

    f(j) = Σ_{i∈N} P( P_i ∈ ((j−1)q, jq] ).

Because the events {Q_i(γ) ≤ α} and {π_i(αγ) ≥ γ} are equivalent, where π_i(·) is defined as in the proof of Theorem 3.1, it follows by (A.5) in the proof of Theorem 3.2 that

    F(j) = Σ_{i∈N} P( P_i ≤ jq ) ≤ jq.

Using this in (A.8), we obtain

    E(Q) ≤ q Σ_{j=1}^{p−1} j (1/j − 1/(j+1)) + q = q Σ_{j=1}^{p} (1/j),

so that the correction factor Σ_{j=1}^{p} j^{−1}, which is built into the adjusted p-values, yields the claimed control of the FDR at level q, which completes the proof.

Proof of Corollary 3.1

Because the single-split method is model selection-consistent, it must hold that P[ max_{j∈S} P_j ≤ α_n ] → 1 for n → ∞. Using multiple data splits, this property holds for each of the B splits, and thus P[ max_{j∈S} max_b P_j^{(b)} ≤ α_n ] → 1, implying that, with probability converging to 1 for n → ∞, the quantile max_{j∈S} Q_j(1) is bounded from above by α_n. The maximum over all j ∈ S of the adjusted p-values, inf_{γ∈(γ_min,1)} Q_j(γ)(1 − log γ_min), is thus bounded from above by (1 − log γ_min) α_n, again with probability converging to 1 for n → ∞.

[Received November 2008. Revised July 2009.]
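The closed-form expectation used in the proof of Theorem 3.2, E[sup_{γ∈(γ_min,1)} 1{U ≤ αγ}/γ] = α(1 − log γ_min) for uniform U, can be checked by Monte Carlo (a sketch of our own; the sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, gamma_min = 0.05, 0.05
u = rng.uniform(size=2_000_000)

# Closed form of the supremum from (A.4):
# 1/gamma_min if u <= alpha*gamma_min, alpha/u if u <= alpha, else 0.
sup_val = np.where(
    u <= alpha * gamma_min, 1.0 / gamma_min,
    np.where(u <= alpha, alpha / np.maximum(u, 1e-300), 0.0),
)
mc = sup_val.mean()
exact = alpha * (1.0 - np.log(gamma_min))
print(mc, exact)
```

With two million draws the Monte Carlo average agrees with α(1 − log γ_min) to roughly three decimal places.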
REFERENCES

Bach, F. (2008), "Bolasso: Model Consistent Lasso Estimation Through the Bootstrap," in ICML '08: Proceedings of the 25th International Conference on Machine Learning, New York: ACM, pp. 33-40.
Benjamini, Y., and Hochberg, Y. (1995), "Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing," Journal of the Royal Statistical Society, Ser. B, 57, 289-300.
Benjamini, Y., and Yekutieli, D. (2001), "The Control of the False Discovery Rate in Multiple Testing Under Dependency," The Annals of Statistics, 29, 1165-1188.
Bickel, P., Ritov, Y., and Tsybakov, A. (2009), "Simultaneous Analysis of Lasso and Dantzig Selector," The Annals of Statistics, 37, 1705-1732.
Blanchard, G., and Roquain, E. (2008), "Two Simple Sufficient Conditions for FDR Control," Electronic Journal of Statistics, 2, 963-992.
Bühlmann, P. (2006), "Boosting for High-Dimensional Linear Models," The Annals of Statistics, 34, 559-583.
Conlon, E., Liu, X., Lieb, J., and Liu, J. (2003), "Integrating Regulatory Motif Discovery and Genome-Wide Expression Analysis," Proceedings of the National Academy of Sciences, 100, 3339-3344.
Fan, J., and Lv, J. (2008), "Sure Independence Screening for Ultra-High Dimensional Feature Space," Journal of the Royal Statistical Society, Ser. B, 70, 849-911.
Friedman, J. (2001), "Greedy Function Approximation: A Gradient Boosting Machine," The Annals of Statistics, 29, 1189-1232.
Friedman, J., Hastie, T., and Tibshirani, R. (2008), "Sparse Inverse Covariance Estimation With the Graphical Lasso," Biostatistics, 9, 432-441.
Holm, S. (1979), "A Simple Sequentially Rejective Multiple Test Procedure," Scandinavian Journal of Statistics, 6, 65-70.
Hothorn, T., Bretz, F., and Westfall, P. (2008), "Simultaneous Inference in General Parametric Models," Biometrical Journal, 50, 346-363.
Huang, J., Ma, S., and Zhang, C.-H. (2008), "Adaptive Lasso for Sparse High-Dimensional Regression Models," Statistica Sinica, 18, 1603-1618.
Meinshausen, N. (2007), "Relaxed Lasso," Computational Statistics and Data Analysis, 52, 374-393.
Meinshausen, N., and Bühlmann, P. (2006), "High-Dimensional Graphs and Variable Selection With the Lasso," The Annals of Statistics, 34, 1436-1462.
Meinshausen, N., and Bühlmann, P. (2008), "Stability Selection," preprint, University of Oxford.
Meinshausen, N., and Yu, B. (2009), "Lasso-Type Recovery of Sparse Representations for High-Dimensional Data," The Annals of Statistics, 37, 246-270.
Tibshirani, R. (1996), "Regression Shrinkage and Selection via the Lasso," Journal of the Royal Statistical Society, Ser. B, 58, 267-288.
Tropp, J., and Gilbert, A. (2007), "Signal Recovery From Random Measurements via Orthogonal Matching Pursuit," IEEE Transactions on Information Theory, 53 (12), 4655-4666.
van de Geer, S. (2008), "High-Dimensional Generalized Linear Models and the Lasso," The Annals of Statistics, 36, 614-645.
Wasserman, L., and Roeder, K. (2009), "High Dimensional Variable Selection," The Annals of Statistics, 37, 2178-2201.
Zhang, C.-H., and Huang, J. (2008), "The Sparsity and Bias of the Lasso Selection in High-Dimensional Linear Regression," The Annals of Statistics, 36, 1567-1594.
Zhao, P., and Yu, B. (2006), "On Model Selection Consistency of Lasso," Journal of Machine Learning Research, 7, 2541-2563.
Zou, H. (2006), "The Adaptive Lasso and Its Oracle Properties," Journal of the American Statistical Association, 101, 1418-1429.