p -Values for High-Dimensional Regression Author(s): Nicolai Meinshausen, Lukas Meier and Peter Bühlmann Source: Journal of the American Statistical Association, Vol. 104, No. 488 (December 2009), pp. 1671-1681 Published by: Taylor & Francis, Ltd. on behalf of the American Statistical Association Stable URL: http://www.jstor.org/stable/40592371 Accessed: 30-03-2015 20:49 UTC Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at http://www.jstor.org/page/info/about/policies/terms.jsp JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact [email protected]. Taylor & Francis, Ltd. and American Statistical Association are collaborating with JSTOR to digitize, preserve and extend access to Journal of the American Statistical Association. http://www.jstor.org This content downloaded from 152.14.136.96 on Mon, 30 Mar 2015 20:49:01 UTC All use subject to JSTOR Terms and Conditions p-Values forHigh-DimensionalRegression NicolaiMeinshausen,LukasMeier, and PeterBühlmann cannotguard efficient selectionalgorithms Most computationally is challenging. in high-dimensional regression Assigningsignificance and arenotavailable.An exceptionis a recentproposalbyWasserman valid/7-values againstinclusionofnoisevariables.Asymptotically ofvariablesis thenreducedto a manageablesize usingthefirst Roederthatsplitsthedataintotwoparts.The number split,whileclassical error can be appliedto theremaining variableselectiontechniques variables,usingthedatafromthesecondsplit.This yieldsasymptotic Resultsaresensitive to thisarbitrary randomsplitofthedata,however. Thisinvolvesa one-time controlunderminimalconditions. choice, acrossmultiple randomsplits results.Herewe showthatinference toreproduce andmakesitdifficult whichamountstoa "p-valuelottery" controlovertheinclusionof noisevariables.We showthattheresulting whilemaintaining can be aggregated p-valuescan be asymptotic is shownto improvepowerwhile rate.In addition, theproposedaggregation wiseerrorandfalsediscovery usedforcontrolofbothfamilyoffalselyselectedvariablessubstantially. thenumber reducing enorrate;High-dimensional variableselection;Multiplecomparisons. False discovery KEY WORDS: Data splitting; rate;Family-wise discussthe This articleis organizedas follows.We briefly method of Wasserman and Roeder (2009) in Secsingle-split on theartion2, notingthattheresultscan dependstrongly of a random We choice bitrary samplesplit. proposea multisplitmethod,whicheliminatesthisdependence.In Section3 we proveFWER and FDR controlof the multisplit method, thatforsimulatedand and in Section4 we shownumerically thanthesingle-split realdatasets,themethodis morepowerful versionwhilesignificantly reducingthenumberof falsedisoftheproposed coveries.We outlinesomepossibleextensions in Section 5. methodology 1. INTRODUCTION variableselectionhas reThe problemof high-dimensional in thelastdecade.Sparseestimaceivedtremendous attention torslike theLasso (Tibshirani1996) and extensionsthereof (Zou 2006; Meinshausen2007) have been shownto be very data becausetheyare suitableforhigh-dimensional powerful results. setsandbecausetheylead to sparse,interpretable variableselecforhigh-dimensional In theusual workflow totheir tionproblems, theusersetspotential tuning parameters estimator as prediction optimalvaluesand uses theresulting thefinalresult.In theclassicallow-dimensional setup,someeris a widelyused standardin all rorcontrolbased on /7-values 2. SAMPLE SPLITTING AND HIGH-DIMENSIONAL are notavailablein highareas of sciences.So far,/7-values VARIABLESELECTION dimensional situations, exceptfortheproposalof Wasserman andRoeder(2009). An ad hoc solutionforassigningrelevance linearregression We considerthe usual high-dimensional oftheselectedpre- setupwitha responsevectorY = (Y' , . . . , Yn) and an n x p toanalyzethestability is tousethebootstrap dictors andfocusonthoseselectedmostoften(orevenalways). fixeddesignmatrix X suchthat Bach (2008) and Meinshausenand Bühlmann(2008) showed modelselectionprothatfortheLasso,thisleadstoa consistent case. thanforthenonbootstrap cedureunderfewerrestrictions wheree = (e',...,en) is a randomerrorvectorwithei iid erhasbeenmadeinobtaining someprogress Morerecently, vector.Extensionsto and ß e W is theparameter and Af(Q,a2) andBühlmann rorcontrol 2008; Wasserman (Meinshausen othermodelsaregivenin Section5. Roeder2009). Here we buildon theapproachof Wasserman Denoteby oftheir"screen andRoeder(2009) andshowthatan extension leads to a morepowerfulvariableseand clean" algorithm s = {/;##()} errorrate(FWER) lectionprocedure.Moreover,family-wise and similarly whereas thesetof activepredictors, and false discoveryrate (FDR) can be controlled, byN = Sc = {/;ßj = for Wasserman and Roeder(2009) focusedon variableselection 0} thesetofnoisevariables.Ourgoal is to assign/?-values = We also ex- thenullhypotheses via /7-values. ratherthanassigningsignificance infer versus and to O Hoj '-ßj Haj '-ßj^O to controlof the false discoveryrate thesetS froma setofn observations tendthe methodology (X,-,Y¿),i - 1,. . . , n. We data. allow forpotentially (Benjaminiand Hochberg1995) forhigh-dimensional designs,thatis,p^ n. high-dimensional inference An approach Althoughthemainapplicationof our procedureis forhigh- Thismakesstatistical verychallenging. dimensional data,wherethenumber and Roeder(2009) is to splitthedata p of variablescan greatly proposedbyWasserman exceedsamplesize n, we showthatthemethodalso is quite intotwoparts,reducing thedimensionality ofpredictors on one errorcontrolforn > p settings, partto a manageablenumberofpredictors withmorestandard the competitive (keeping imporindeedoftenproviding betterdetection powerin thepresence tantvariableswithhighprobability), and thenassign/7-values ofhighlycorrelated variables. andmakea finalselectionon thesecondpartofthedata,using classicalleastsquaresestimation. NicolaiMeinshausen is University of Statistics, UniLecturer, Department stats.ox. versityof Oxford,OxfordOX1 3TG, U.K. (E-mail:meinshausen® Semiac.uk). Lukas Meier is Ph.D. Student,PeterBühlmannis Professor, narfürStatistik, ETH Zurich,8092 Zurich,Switzerland. NicolaiMeinshausen shownduringhis stayat thegeneroussupportand hospitality acknowledges fürMathematik atETH Zürich. Forschungsinstitut © 2009 American Statistical Association Journal of the American Statistical Association December 2009, Vol. 104, No. 488, Theory and Methods DOI: 10.1 198/jasa.2009.tm08647 1671 This content downloaded from 152.14.136.96 on Mon, 30 Mar 2015 20:49:01 UTC All use subject to JSTOR Terms and Conditions Journalofthe AmericanStatisticalAssociation,December 2009 1672 2.1 Family-Wise ErrorRate Control With the Single-Split Method ¥b) . thesetofactivepredictors, 2. Usingonlyü'b) , estimate 3. (a) UsingonlyD(b¿,fittheselectedvariablesin S(b) with leastsquaresand calculatethecorrespondordinary ing/7-values, Pf' forj e S(b). to 1,thatis, (b) Set theremaining /7-values to andRoeder(2009) attempts ofWasserman Theprocedure as theprobrate(FWER), defined error thefamily-wise control The methodrelies abilityofmakingatleastonefalserejection. variableselectionand dimenon samplesplitting, performing P{b)= h jt&b' on onepartofthedataandclassicalsignifireduction sionality into are on theotherpart.The data splitrandomly cancetesting as 4. Definetheadjusted(nonaggregated) /7-values twodisjointgroups,Din= (X,-n, Y;„) and Dout= (XoutiYout), (2.1) ¿=l,...,p. ofequal size. Let S be a variableselectionor screening procePJ*)=min(^)|S<»|>l)> notation active thesetof durethatestimates Abusing predictors. overtheB /7-values Then Finally,aggregate Pjb' as discussedlater. we also denotebyS thesetofselectedpredictors. slightly, foreachpredicB of to a total leads This /7-values procedure is basedonD/n; reduction variableselectionanddimensionality = statistics suitable out that It will turn ... tor summary j 1, ,/7. thatis, we applyS onlyon D[n.This includestheselectionof define For € are quantiles. y (0, 1) involvedin S. The idea is to break tuningparameters potential to a variables of the down largenumber, p, potential predictor (2.2) Qj(Y) = nan{l,qy({P¡b)/Y'b=l,...,B})}, of n, while k <&p, withk at mosta fraction smallernumber, function. and whereqY{-) is the(empirical)y-quantile coefficients variables.The regression keepingall relevant = . . . ,/7is thengivenby for each A 1, j the selected of predictor . . . /7-value thecorresponding Pi , ,Pp, predic/7-values, leastsquares Q/(y),foranyfixed0 < y < 1. In Section3 we showthatthis basedon Doutusingordinary torsaredetermined correct /7-value, adjustedformultiplicity. on thesetS and settingPj = 1 forall j £ S. If the is an asymptotically estimation Errorcontrolis not difficult. be y may selecting ^ Properly selectedmodelS containsthetruemodelS (i.e., S S), then of y. We propose value for the best if search we guaranteed basedon Doutareunbiased.Finally,each/7-value, the/7-values a suitablevalue selects that version an instead use to adaptive of Pj, is adjustedby a factor|S| to correctforthemultiplicity Let G the data. based on of the (0, 1) be a lower ymin quantile thetesting problem. and define bound for 0.05, y, typically The selectedmodelis givenby all variablesin S forwhich a e (0, 1), is belowa cutoff, theadjusted/7-value (2.3) Py= min(l,(l-log)/min) inf Qj(y)'. = {jeS:Pj'S'<a}. Ssingie ensuresthatthe The extracorrection factor,1 - log/min, Undersuitableassumptions (discussedlater),thisyieldsasymp- FWER remainscontrolledat level a of the adaptive despite toticcontrol againstinclusionofvariablesinN (falsepositives) searchforthebest For the recommended Sec. 3). (see quantile inthesensethat = 0.05,thisfactor is upper-bounded choiceof/min by4; infact, > 1] <a, 1 - log(0.05) « 3.996. limsup¥['ND Ssingie' betweentheproposedadontherelation Wecomment briefly thatis, controloftheFWER. The methodis easyto implement justment totheFDR (BenjaminiandHochberg1995;Benjamim The andYekutieli2001) or FWER (Holm 1979) controlling underweakassumptions. control andyieldstheasymptotic proceand into on an relies method Dout, dures.Whilewe providea familyandas such arbitrary split D[n wiseerrorcontrol single-split if thissplitis use union-bound and theresultscan changedrastically however, as donebyHolm(1979),thedeficorrections becausethen nitionofthe is This in itself chosendifferently. unsatisfactory, (2.3) anditsgraphical representaadjusted/7-value theresultsarenotreproducible. oftheFDR procedure, tioninFigure1 arevaguelyreminiscent of ifandonlyiftheempiricaldistribution hypotheses rejecting 2.2 Family-Wise ErrorRate Control With distribThe linear bound. a certain crosses empirical /7-values the New MultisplitMethod utionin (2.3) is takenforonlyone predictor variable,though, to a singlearbitrary An obviousalternative toa multiple-testing splitis to divide whichis eitherin S orN. Thiscorresponds Foreach split,we end up witha setof situation thesamplerepeatedly. withmula singlehypothesis in whichwe aretesting theresultsis notobvi- tiplestatistics. Howtocombineandaggregate /7-values. Figure1 showsan example.Panel (a) presents Herewe describea possibleapproach.Foreach a ous,however. = oftheadjusted/7-values, Pjb), forb 1, . . . ,B, of is obtainedforrandom histogram of /7-values distribution a hypothesis, in in real data theselectedvariable the example Section4.3. We proposethaterrorcontrolcan be basedon The samplesplitting. is method equivalentto pickingone of these single-split We showempirically thequantilesofthisdistribution. that,posand selectingthevariableif thisrandomly randomly is morepowerful /7-values theresulting procedure siblyunsurprisingly, lotsmall.To avoidthis"/7-value is chosen/7-value sufficiently methodalso makes method.The multisplit thanthesingle-split distribution the method the computes empirical of tery," multisplit ifthenumber atleastapproximately theresultsreproducible, = 1, . . . ,B, and rejectsthenullhyb for of all /7-values, Pjb' randomsplitsis chosentobe verylarge. pothesisHo :ßj = 0 (thusselectingvariablej and includingit methodusesthefollowing The multisplit procedure: crossesthebrointothemodel) if theempiricaldistribution Forfc=l,...,£: of thelatteris as ken line in Figurel(b). A shortderivation 1. Randomly splittheoriginaldataintotwodisjointgroups, follows.Variablej is selectedifand onlyifPj <a, whichoccursif and onlyif thereexistssome y e (0.05, 1) suchthat D{£ andD{b¿,ofequal size. This content downloaded from 152.14.136.96 on Mon, 30 Mar 2015 20:49:01 UTC All use subject to JSTOR Terms and Conditions 1673 Meinshausen,Meier,and Bühlmann:p-Values forHigh-DimensionalRegression (b) (a) ä i °- ' * a s - // / . f/ o J fctlh ■ !■- I I I I I I I 0.0 0.2 0.4 0.6 0.8 1.0 J I 1 I M 1 .- r-T~l l~~1l~T~r~l q o I I I I 0.000 0.005 0.010 0.015 ADJUSTED P-VALUE ADJUSTED P-VALUES in themotif variable dataexampleofSection4.3. The ofadjusted regression p-values,P- ' fortheselected Figure1. (a) A histogram andrejects ifitis belowa. Forthemultisplit wereject method method, randomly picksoneofthesep-values(a "p-valuelottery") single-split crossesthebroken line[which function oftheadjusted ifandonlyiftheempirical distribution (3.996/a)p}]for p-values is/(p) = max{0.05, is theboundis indeedexceeded, andthevariable linefora = 0.05in(b). Forthisexample, as a broken somep G(0,1). Thisboundis shown thusselected. usingdefQj(y) < oi/(I log0.05) « a/3.996. Equivalently, oftheadjustedp-values,qy(Pj ), inition(2.2), they-quantile mustbe smallerthanor equal to ay/3.996. This in turnis of wheretheempiricaldistribution to thesituation equivalent = above . . is for b . theadjustedp-values,Pjb' 1, , B, crossing thebound/(p) = max{0.05,(3.996/a)p} forsomep e (0, 1). Thisboundis shownas a brokenlinein Figurel(b). Theresulting adjustedp-values,Pj,j = 1, . . . ,p, can thenbe used forbothFWER and FDR control.For FWER controlat levela e (0, 1), simplyall p-valuesbelowa are rejected,and theselectedsubsetis The originalFDR controlling procedureof Benjaminiand first orders the observedp-valuesas P(') < Hochberg(1995) P(2) <-"< P(p) and defines k = max]i : P(/)< -q ' . (2.6) P J I withthesmallestkvalItthenrejectsall variablesorhypotheses ues, withno rejectionmade if theset in (2.6) is empty.FDR is controlledin thisway at level q underthe conditionthat all p-valuesare independent. Benjaminiand Yekutieli(2001) is conservative undera widerrange showedthatthisprocedure ofdependencies betweenp-values(see BlanchardandRoquain = {j:Pj<<x}. (2.4) 2008 forrelatedwork).A greatleap of faithwould be reSmuiti foroursetting of highquiredto assumeanysuchassumption In Section3.2 we showthatindeed,asymptotically, P(V > 0) < dimensionalregression, however.For generaldependencies, HN' is thenumber offalselyselectedvaria, whereV = 'Smuiti BenjaminiandYekutieli(2001) showedthatcontrolis guaranablesundertheproposedselection(2.4). Besidesbetterrepro- teedat level q £?=1 r1 «^(1/2 + log(p)). themulti- The standard wise errorcontrol, andasymptotic familyducibility FDR procedureis to work with the raw morepowerful thanthe splitversionis, maybeunsurprisingly, on which are assumedto be uniformly distributed p-values, selectionmethod. single-split in is for true null The division [0, 1] hypotheses. byp (2.6) an for But the effective correction multiplicity. proposedmultisplit 2.3 False Discovery Rate Control With methodproducesalreadyadjustedp-values,as in(2.3). Because the MultisplitMethod we are alreadyworkingwithmultiplicity-corrected p-values, Controlof theFWER oftenis consideredtoo conservative. thedivisionbyp in (2.6) turnsoutto be superfluous. Instead, If manyrejections are made,Benjaminiand Hochberg(1995) we canorderthecorrected p-values,Pj,j = 1,. . . ,p, inincreasof false ing order,P(') < P@) £ • • • £ P(p)->an(*selecttheh variables theexpectedproportion proposedinsteadcontrolling = fi Let be the number of false withthesmallestp-values,where the FDR. V 'S N' rejections = and let R be thetofor a selection method S 'S' rejections h = max{/: P(0 < iq). (2.7) tal numberof rejections. The FDR is definedas theexpected The set of variablesselectedis denoted,withthevalue of h offalserejections, proportion givenin (2.7), by (2.5) E(ß), withß = V/max{l,/?}. = {j'Pj<P(h)}, (2.8) Smulti;FDR R = 0, thedenominator Forno rejections, ensuresthatthefalse = 0> if P(i) > iq for all / = withthedefinition of withno rejections,SmuitiFDR Q, is 0, conforming discovery proportion, 1, ...,/?. BenjaminiandHochberg(1995). This content downloaded from 152.14.136.96 on Mon, 30 Mar 2015 20:49:01 UTC All use subject to JSTOR Terms and Conditions Journalofthe AmericanStatisticalAssociation,December 2009 1674 The proofis givenin theAppendix. The procedure(2.8) will achieve FDR controlat level ** FDR at level To control value of thequani~l Theorem3.1 is validforanypredefined 4q, tf(!/2 log/?). get qY?i=' tile the we replaceq in (2.7) by q/(J^=l i~l), completely However, y. adjustedp-values,Qj(y), involvethesomeanalogous of y9whichcouldpossiblypose a probhow choice FDR procedure of underarbitrary to thestandard arbitrary dependence lem for In of and Yekutieli the next sec(2001). practicalapplications.Thus we proposetheadjusted the/7-valuesBenjamini we demonstrate we error control. tion, prove Later, empirically p-values,Pj, thatsearchfortheoptimalvalueof y adaptively. theadvantages oftheproposedmultisplit versionoverboththe Theorem3.2. Assumethat(Al) and (A2) apply.Let a e FDR controlling and standard procedures, providsingle-split 1). If thenullhypothesis (0, Hoj :ßj = 0 is rejectedwhenever results. ingnumerical controlled at levela, Pj < a, thentheFWER is asymptotically thatis, ANDCONSISTENCY 3. ERRORCONTROL limsupP min/*, J-< a J' -< a, „_*o/ ijeN 3.1 Assumptions andRoeder To achieveasymptotic errorcontrol, Wasserman P is as in Theorem3.1. aboutthecrucialrequirementswheretheprobability (2009) madea fewassumptions forthevariableselectionprocedure S: The proofis givenin theAppendix. A briefremarkregarding the asymptotic natureof there= 1. lim^-^ooP[S I2 S] (Al) Screening property: in order. The controlrelies sults seems to be error proposed (A2) Sparsity 'S' <n/2. property: on all trulyimportant variablesbeingselectedin thescreening variables The screening (Al) ensuresthatall relevant property This is our screening propstagewithveryhighprobability. are retained.Irrelevant noise variablesare allowedto be seerty(Al). LetA be theeventS ç S. Theresultsfortheexample lectedas well,as longas thereare nottoo many,as required in Theorem3.2 can be formulated in a nonasymptotic way as (A2). A violationofthesparsity property bythesparsity prop- ¥[A fi < < -> and a, 1, P(A) a}] typically expo{minjeN Pj ertywouldmakeit impossibleto applyclassicaltestson the to Theofast,forn -» oo. Analogousremarks nentially apply retained variables. rems3.1 and3.3. The Lasso (Tibshirani1996) is an important examplethat satisfies conditions discussed 3.3 False Discovery Rate Control (Al) and (A2) underappropriate and Bühlmann Zhao Yu Meinshausen and (2006), (2006), by The adjusted/7-values can be used forFDR control,as laid vande Geer(2008), Meinshausen and Yu (2009), and Bickel, out in The Section 2.3. set ofselectedvariables, was Smuiti;FDR, Lasso and The Ritov, (Zou 2006; Tsybakov(2009). adaptive in show FDR defined Here we that is indeed controlled at (2.8). and also satisfies and under suit(Al) (A2) Zhang Huang2008) able conditions. Otherexamplesinclude,assumingappropriate thedesiredratewiththisprocedure. conditions, 2001; Bühlmann2006), or¿2 boosting(Friedman Theorem3.3. Assumethat(Al) and (A2) apply.Let q > 0 and Gilbert2007), and sure and thogonalmatching pursuit(Tropp be the set of selectedvariables,as definedin Smuiti;FDR (Fan andLv 2008). independence screening with a cutoff value of q = q/Y%=ii~l in (2.7). Let (2.8), We typicallyuse the Lasso (and extensionsthereof)as H N' and R = 'Smulti;FDR'. The FDR (2.5) with V = 'Smuiti,FDR are possible as well. a screeningmethod.Otheralgorithms = y/max{l,R] is thenasymptotically controlled at levelq, Q Wasserman andRoeder(2009) studiedvariousscenariosunder thatis, aresatisfied fortheLasso, dependwhichthesetwoproperties on the choice of the We refrain regularization ing parameter. limsupE(ß) <q. n-+oo fromrepeating theseandsimilararguments, andoperateon the that that we have a selection satisfies S, in The is procedure, assumption proof given theAppendix. andthesparsity boththescreening As withFWER control,we could use, foranyfixedvalue property property. of y, thevaluesQj(y),j = 1,...,/? insteadofPj, j = 1, . . . , n. 3.2 Family-Wise ErrorRate Control We refrainfromgivingthe fulldetailshere,because in our We proposetwoversionsof multiplicity-adjusted theforegoing p-values: experience, adaptiveversionworksreliablyand in of as defined which relies on a choice e an a does not (0, 1), (2.2), y Qj(y), require priorichoiceofthequantiley thatis necand the adaptiveversionPj definedin (2.3), whichmakes essaryotherwise. an adaptivechoice of y. We show thatbothquantitiesare FWER er- 3.4 Model Selection Consistency /7-values providing asymptotic multiplicity-adjusted rorcontrol. If we let level a = an ->-0 forn ->►oo, thentheprobabila noisevariablevanishesbecauseofthe Theorem 3.1. Assumethat(Al) and (A2) apply.Let a, y e ityoffalselyincluding = modelseTo of consistent results. whenIf the null 0 preceding gettheproperty (0, 1). getsrejected hypothesis Hojißj behaviorofthepower. controlled at lection,we mustanalyzetheasymptotic everQj(y) < a, thentheFWER is asymptotically fromthesingle-split is inherited It turnsoutthatthisproperty level«, thatis, method. limsupP minQj(y) <a' <ct, be theselectedmodelofthesingle3.1. LetSsingle Corollary whereP is withrespectto thedata sampleand thestatement splitmethod.Assumethatan ->-0 can be chosenforn -> oo holdsforanyoftheB randomsamplesplits. at a ratesuch thatlim^^ooP^^/^ = S] = 1. Then,forany This content downloaded from 152.14.136.96 on Mon, 30 Mar 2015 20:49:01 UTC All use subject to JSTOR Terms and Conditions Meinshausen,Meier,and Bühlmann:p-Values forHigh-DimensionalRegression 1675 methodis also modelselection- setto values 1, . . . , |5|. The errorvariancea2 is adjustedsuch [see (2.3)], themultisplit Xmin = {/'€ thatthesignal-to-noise at a desired ratio(SNR) is maintained fora suitablesequencean; thatis, forSmuiti consistent for 50 simulations level at each simulationrun.We perform S; Pj < an}, itholdsthat each setting. lim F[Smuiti= S] = l. on is donesuchthatthemodelis trained The sample-splitting - 1)/2J,and the/7-values are calculated a data set of size l(n thatenandRoeder(2009) discussedconditions Wasserman unbalancedschemepredataset.Thisslightly = S] = 1 forvariousvariableselec- on theremaining surethatlim^oo F[Ssingle wherethefullmodelmightbe selectedon the cludes situations variablesesuchas theLasso or someforward tionmethods, wouldnotbe possibleon dataset.Calculationof/7-values first lectionscheme. We use a totalofB = 50 in a situation. data such the remaining true.The mulofCorollary 3.1 is notnecessarily Thereverse run. for each simulation FollowingWasserman methodis samplesplits if the single-split tisplitmethodcan be consistent usforall procedures Roeder(2009),we compute/7-values and of thesingle-split not.A necessaryconditionforconsistency simiare The results a normal qualitatively approximation. methodis limsup,^^ P[PJ*}< a] = 1 forally e S, wherethe ing instead. larwhenusinga t distribution is withrespectto boththe data and the random probability We comparetheaveragenumberof truepositivesand the that thereis a positiveprobability becauseotherwise split-point, methodsforthethree and multisplit FWER forthesingle-split variablej will notbe selectedwiththesingle-split approach. simulation of 0.25, 1, 4, and 16 SNRs (A)-(C), using settings Forthemultisplit method,on theotherhand,we need onlya R2 values of to 0.2, 0.5, 0.8, and (corresponding population from boundon quantilesof P. overb = l,...,B. We refrain The numberof relevantvariables,'S', is 0.94, respectively). goingintomoredetailhereand insteadshow,withnumerical either5 or 10. As the initialvariableselectionor screening than method,S, we use three methodis indeedmorepowerful thatthemultisplit results, approaches,all based on theLasso Bonferroni cor- (Tibshirani that the We also remark thesingle-split analog. usesthe The first denotedbySfixed, 1996). approach, theraw/7-values rectionin (2.1), multiplying by thenumber, Lasso and selectsthose'n/6' variablesthatappearmostoften using in theregularization 'S^', of selectedvariables,possiblycouldbe improved pathwhenvaryingthepenaltyparameter. ideasofHothorn, Bretz,andWestfall (2008),further increasing The constantnumberof 'n/6' variablesis chosen,somewhat thepoweroftheprocedure. to ensurea reasonablylargeset of selectedcoeffiarbitrarily, cientson theone handand on theotherhand,to ensurethat 4. NUMERICAL RESULTS leastsquaresestimation will workreasonablywell on thesecofthe ond halfof thedatawithsamplesize 'n/2' . Whilethechoice In thissectionwe comparetheempiricalperformance andrealdatasets.Simulated seemsto workwell in practiceand can be implemented on simulated different estimators very evaluationof themodelselectionprop- easilyand efficiently, dataallowa thorough it is stillslightly arbitrary. Avoidingany thatwe can findsignals suchchoicesof non-data-adaptive erties.The realdatasetdemonstrates thesectuningparameters, in datawithourproposedmethodthatwouldnotbe pickedup ondmethod, chosen uses the Lasso with Scv, penaltyparameter = value of a 0.05 We use a default method. the the variables whose corre10-fold by single-split cross-validation, by selecting estimated coefficients are different than 0. everywhere. sponding regression The thirdmethod,Sadap,is theadaptiveLasso of Zou (2006), 4.1 Simulations inwhichregularization arechosenbasedon 10-fold parameters with the Lasso solution used as theinitialesWe use thefollowing simulation cross-validation, settings: timator fortheadaptiveLasso. The selectedvariablesareagain datasetwithn = 100,p = 100,anda Toeplitz those (A) Simulated whosecorresponding estimated are regression parameters multivariate normatrix design comingfroma centered different than0. mal distribution withcovariancepV~k' betweenvariandmulFigures2 and3 showresultsforboththesingle-split ablesj andk,withp = 0.5. = methods with the default 0.05. tisplit settingym{n Usingthe (B) As in (A), butwithn = 100 and/7= 1000. the of true number method, average multisplit positives(i.e.,the (C) Real datasetwithn = 71 andp = 4088 forthedesign variablesinS whichare is increased, selected)typically slightly matrix X andartificial responseY. whilethe FWER (i.e., theprobability of includingvariables methodoftenhas a in in N) is reducedsharply.The single-split The data setin (C) is fromgeneexpressionmeasurements = variablesare log- FWER abovethelevela = 0.05 at whichit is asymptotically Bacillus subtilis.The p 4088 predictor whileforthemultisplit theFWER is above transformed andthereis a responsemeasur- controlled, method, geneexpressions, in the nominal level a few scenarios. The conrateofriboflavin in B. suboftheproduction only asymptotic ingthelogarithm trol seems to a in control with Prodtilis.The datawerekindlyprovidedbyDSM Nutritional give good finite-sample settings Because thetruevariablesare notknown, themultisplit on ucts,Switzerland. method, possiblyapartfromthemethodSfixed we considera linearmodelwithdesignmatrixfromreal data theveryhigh-dimensional dataset(C). The single-split method, and simulatea sparseparameter vectorß as follows.In each in contrast, selectstoo manynoisevariables,exceedingthedevectorß is createdby either siredFWER sometimes simulation innearlyall settings. This run,a newparameter substantially, "uniform" or"varying-strength" sam- suggeststhattheasymptotic Underuniform errorcontrolseemsto workbetter sampling. chosencomponents of ß are setto 1, and forfinitesamplesizes forthemultisplit method.Even though pling,|5| randomly theremaining are setto 0. Undervarying- themultisplit methodis moreconservative thanthesingle-split p - 'S' components of ß are method(havinga substantially chosencomponents lowerFWER), thenumberof strength sampling,'S' randomly This content downloaded from 152.14.136.96 on Mon, 30 Mar 2015 20:49:01 UTC All use subject to JSTOR Terms and Conditions Journalofthe AmericanStatisticalAssociation,December 2009 1676 : I ;Q -i s- s| 0.00 i - i-Hll^s___ I W"M- s ' S g 0.00 0.02 0.04 0.06 0.08 P( FALSE POSITIVES 0.10 0.12 0.14 > 0) | I x:--s^ ¡ m- i'"""^Vg-... f s "--s J-- 0.10 0.05 0.00 ! -M > 0) "¡'"^ ';■■■"---.. 0.00 0.20 0.15 P( FALSE POSITIVES > 0) P( FALSE POSITIVES Vk. ' 0.15 M T*""'^vs. i"~ ^"^^^ q ; o-|m- s- g.-;-----s | 0.10 0.05 »- 2 ''...Q 0.15 0.10 0.05 Q * P( FALSE POSITIVES m-...;i s 0.00 w ---. ; > 0) "s ¡Is; ^' o-JM-M^s-t^ 0.15 0.10 0.05 P( FALSE POSITIVES s ._ °HVl-~^: a "s ""M^El^T-s -M^sr- 0.00 : I * -M^d.3";-"s °Hm ;^'^ M-.. o : 2 °°- a _q : S1VL 0.05 > 0) 0.10 0.20 0.15 P( FALSE POSITIVES > 0) wiseerror resultsforsetting (A) in thetopand(B) in thebottomrow.Averagenumberoftruepositivesvs. thefamilyFigure2. Simulation at a = 0.05 forboth version('M'). FWER is controlled rate(FWER) forthesinglesplitmethod('S') againstthemulti-split (asymptotically) of Resultsofa uniquesetting andthisvalueis indicated methods bya brokenverticalline.Fromlefttorightareresultsfor5/^, Scvand5fl£/fl/7. and broken otherwise. 'uniform' follow the if the coefficients which is solid a are and Increasing sampling SNR, sparsity design joinedby line, SNR is indicated symbolsize. byincreasing truediscoveriesoftenis increased.We notethatfordata (C), oftrue withp = 4088,andingeneralforlow SNRs,thenumber familypositivesis low,becausewe controltheverystringent levelof a = 0.05. As an at a significance wise errorcriterion errormeasuresis posless conservative alternative, controlling sible,as discussedin Section5. :„ ■a ^ a • s | gJM-^^§f-s-s 0.00 0.05 0.10 0.15 0.20 P( FALSE POSITIVES 0.25 > 0) 0.30 0.35 selectorwiththeadaptive Here we comparethemultisplit Lasso (Zou 2006). We haveused theadaptiveLasso as a varimethod.Usuable selectionmethodin ourproposedmultisplit A few choicesmust itself. is used Lasso the by ally, adaptive s^ ! «• M. M- * . 4.2 Comparisons Withthe AdaptiveLasso ^^ : ka ' 0.05 0.10 0.15 AA --_ o q - 1 °jiMm^s^=s*- gjMHiá^i^Pi^ 0.00 ! I o 0.20 P( FALSE POSITIVES 0.25 0.30 0.0 > 0) setup(C). Figure3. Resultsofsimulation This content downloaded from 152.14.136.96 on Mon, 30 Mar 2015 20:49:01 UTC All use subject to JSTOR Terms and Conditions 0.1 0.2 P( FALSE POSITIVES 0.3 > 0) s 1677 Meinshausen,Meier,and Bühlmann:p-Values forHigh-DimensionalRegression ofthemultisplit madeusingtheadaptive Lassoanda CV Table1. Comparison method withCV-Lassoselection, Sadapiandtheselection = = for a with n 100 and 200 choiceoftheinvolved p penalty parameters setting E(truepositives) Uniform sampling NO NO NO NO YES YES YES YES NO NO NO NO YES YES YES YES 'S' 10 10 10 10 10 10 10 10 5 5 5 5 5 5 5 5 SNR 0.25 1 4 16 0.25 1 4 16 0.25 1 4 16 0.25 1 4 16 Lasso Multisplit Adaptive 0.00 2.30 0.58 6.32 4.14 8.30 7.20 9.42 0.02 2.52 0.10 7.46 2.14 9.96 9.92 10.00 0.06 1.94 1.50 3.86 3.52 4.58 4.40 4.98 2.22 0.02 0.82 4.64 5.00 4.90 5.00 5.00 E(falsepositives) Lasso Multisplit Adaptive 0 9.78 0 20.00 0 25.58 0.02 30.10 0 10.30 0.02 21.70 0 28.46 0.04 30.66 0 11.58 0.02 19.86 0.02 23.56 0 27.26 0 12.16 0.02 22.18 0 24.48 0 28.06 > 0) P(falsepositives Lasso Multisplit Adaptive 0 0.76 1 0 0 1 0.02 1 0 0.72 0.02 1 1 0 1 0.04 0 0.84 1 0.02 1 0.02 0 1 0 0.8 0.02 1 0 1 0 1 be madewhenusingtheadaptiveLasso; we makethe same errorcontrolwillchannelmoreof theavailableresourcesinto choicesas previously. The initialestimator is obtainedas the experiments morelikelytobe successful. Lasso solutionwitha 10-foldcross-validation (CV) choiceof thepenaltyparameter. The adaptiveLasso penaltyis also ob- 4.3 MotifRegression tainedby 10-foldCV. We applythemultisplit methodto a real data setrelatedto consistency Despitedesirableasymptotic properties (Huang, motifregression (Conlonet al. 2003). For a totalof n = 287 Ma, andZhang2008), theadaptiveLasso does notoffererror DNA segments, we have thebindingintensity of a proteinto in thesamewayas Theorem3.1 does forthemultisplit eachofthe control Theseareourresponsevalues,Y',...,Yn. segments. method.In fact,theFWER (i.e., theprobability of selecting Moreover,forp = 195 candidatewords("motifs"),we have at leastone noise variable)is veryclose to 1 withtheadapthatmeasurehow well theythmotifis represented scores,jc//, tiveLasso in all of thesimulations thatwe haveseen.In con- in the/th DNA sequence.The motifsaretypically 5- to 15-bpmethodoffersasymptotic trast,our multisplit control,which forthetruebindingsiteoftheprotein. Thehope longcandidates was verywell matchedby theempiricalFWER in thevicin- is thatthetrue siteis includedin thelistof significant binding resultsforthe ityofa = 0.05. Table 1 comparesthesimulation variableswiththestrongest betweenmotifscore relationship method and the Lasso multisplit usingSadap adaptive by itself and a linearmodelwithSadap,themulbinding intensity. Using fora simulation settingwithn = 100,p - 200, and thesame methodidentifies onepredictor variableatthe5% signifas in The adaptiveLasso se- tisplit (A) and (B) otherwise. settings icancelevel.In contrast, the methodcannotidentify single-split lectsroughly 20 noisevariables(outofp = 200 variables), even a singlesignificant In view of theasymptotic error predictor. thenumber oftruly relevant variablesisjust5 or 10.The though controlandtheempiricalresultsin Section4, thereis substanoffalsepositivesis atmost0.04 andoftensimaveragenumber tial evidenceindicating thattheselectedvariablecorresponds 0 with the method. ply proposedmultisplit to a truebindingsite.Forthisspecificapplication, itseemsdeThereis clearlya pricetopayforcontrolling theFWER. Our sirableto pursuea conservative with low FWER. As approach methoddetectsfewertrulyrelevantvariproposedmultisplit mentioned we could control less conservative erearlier, other, ablesthantheadaptiveLasso onaverage.Thedifference is most ror as discussed in Section 5. measures, forverylow SNRs. The multisplit methodgenerpronounced selects neither correct nor incorrect variables forSNR = 4.4 ally Comparison With Standard Low-Dimensional 0.25, whiletheadaptiveLasso averagesbetween2 and 3 corFalse Discovery Rate Control rectselections,among9-12 wrongselections.Dependingon We mentioned thatcontrolof FDR can be an attractive altheobjectivesof the study,eitheroutcomeis preferred. For to FWER if a sizeable numberof rejectionsis exmethod detectsalmostas manytruly ternative largerSNRs,themultisplit variablesas theadaptiveLasso, whilestillreducing pected.Usingthecorrected p-valuesPi , . . . , Pp,slsimpleFDRimportant thenumberof falselyselectedvariablesfrom20 or moreto controlling procedurewas derivedin Section2.3, and its ascontrolof FDR was shownin Theorem3.3. We now 0. ymptotic roughly The multisplit evaluatethebehaviorof theresulting methodseems to be beneficialin settings empirically methodand wherethecostofmakingan erroneous selectionis rather variables, interesting high. itspowertodetecttruly usingthestandard Forexample,expensivefollow-up are usuallyre- Lasso withCV intheinitialscreening experiments step.Turning againtothe andstricter simulation quiredtovalidateresultsinbiomedicaiapplications, setting(A), we varythesamplesize n, thenumber This content downloaded from 152.14.136.96 on Mon, 30 Mar 2015 20:49:01 UTC All use subject to JSTOR Terms and Conditions 1678 Journalofthe AmericanStatisticalAssociation,December 2009 FDR control of simulations forthemultisplit method 4. Results ofFDRcontrolling bar).Thesettings (darkbar)andstandard (light Figure of variables. of the bars to the number selected and are below each simulation. The SNR important n,p,p, |5|, height corresponds average given > n,thestandard 0. breaks andthecorresponding barsaresettoheight method down, For/? trimsthetotalnumbetweenneighboring ables. The multisplit of variablesp, theSNR, thecorrelation method,in contrast, in onehalfof smallernumber berofvariablesto a substantially variabless. oftrulyinteresting variablesp, andthenumber less fromincreasedvarianceinthe thatthe multisplit methodis thesamplesandthensuffers We previously demonstrated inthesecondhalfofthesamples.Repeatcoefficients to thesingle-split method.Herewe are moreinter- estimated preferable FDR- ingthisovermultiplesplitsthusleadsto a surprisingly traditional withwell-understood estedin a comparison powerevenforlow-dimensional data. For p < n, the standardapproachis fulvariableselectionprocedure controlling procedures. we believethatthemainapplicationwill be in once forthefulldata Nevertheless, to computetheleast squaresestimator set. For each variable,a p-value is obtained,and the FDR- high-dimensional data,forwhichthestandard approachbreaks down This in can be completely. (2.6) applied. apcontrolling procedureas proachobviouslybreaksdownforp > n. Our proposedap5. EXTENSIONS (p < n) and proachcan be appliedto bothlow-dimensional (p>n) settings. Because of the genericnatureof our proposedmethodolhigh-dimensional FDR ofourmethod(notshown) ogy,extensionsto any situationwhere(asymptotically In all settings, theempirical valid) valueofq = p-values,Pj, forhypotheses is oftencloseto0 andalwaysbelowthecontrolled = . . . are available 1, ,p) Hoj (j 0.05 (wherethecorrection factor, class ofexamplescomprises An important Y?i=i *~l> ^as alreadybeen arestraightforward. takenintoaccount).Resultsforpowerare shownin Figure4 generalized modlinearmodels(GLMs), orGaussiangraphical forcontrolat q = 0.05. form involves some els. Thedimension-reduction steptypically the multisplitmethodtracksthe of shrinkage An exampleforGaussiangraphical estimation. Possiblyunexpectedly, FDR controlling quiteclosely modelsis therecently procedure powerofthestandard proposed"graphicalLasso" (Friedman, data withp < n. In fact,the multi-splitHastie,andTibshirani forlow-dimensional 2008). The secondsteprelieson classiifn/pis below,say,1.5 or cal tests(e.g.,likelihood better methodis doingconsiderably ratio)appliedtotheselectedsubmodel, is An intuitive the tests thecorrelation forlinearregression. to the explana- analogous among large. proposedmethodology tionforthisbehavioris that,as p approachesn, thevariance In some settings, controlof FWER at, say,a = 0.05 is too least conservative. vectorunderthe ordinary in each estimatedcoefficient One can eitherresortto controlling FDR, as alin This turn is estimate to control the or FWER control luded to increasing substantially. squares(OLS) earlier, adjust expected increasesthevarianceof all OLS components ßy,j = 1,...,/?, numberof falserejections.As an example,considerthe advari- justed/7-value theabilityto selectthetrulyimportant and diminishes Pj definedin (2.3). Variablej is rejectedif and This content downloaded from 152.14.136.96 on Mon, 30 Mar 2015 20:49:01 UTC All use subject to JSTOR Terms and Conditions Meinshausen,Meier,and Bühlmann:p-Values forHigh-DimensionalRegression 1679 Defineform€ (0, 1) thequantity ofbootstrap onlyifPj < a. [Inwhatfollows,assumethatadjustedp-values, ttj(u)as thefraction as defined in (2.1), arenotcappedat 1. Thisis a technicalde- samplesthatyieldKj^ less thanorequal to w, tailonly;it does notmodifytheproposedFWER-controlling Variablej is rejectedifandonlyifPj < a controls procedure.] FWER at levela. Alternatively, one can rejectvariablesifand b=' < > if K 1 is where a correction factor.Call the a, only Pj/K Note thattheevents{Qj(y) < a} and {7^(ay) > y} are equivalent. number offalselyrejectedvariablesV, andcalculateitas Thus < p[minß;(y)<ex]^E[l{ß;(y) <«}] jeN jeN Thentheexpectednumberof falsepositivesis controlledat = ^]E[l{^(ay)>y}]. levellimsup^^ E[V] < aK. A proofofthisresultfollowsdijeN fromtheproofofTheorem3.2. Ofcourse,we can equivrectly alentlysetk = aK andobtaina control, limsup^^ E[V] < k. Usinga Markovinequality, Forexample,setting k = 1 offers a muchlessconservative error control theFWER,ifthisis desired. >y}]<comparedwithcontrolling J2^[nxj(<xy) £E[jij(ay)]. y jeN 6. DISCUSSION (A.2) jeN (•), By the definitionof 7Ty We haveproposeda multisplit methodforassigningstatistical significance andconstructing conservative p-valuesforhyI^E[^(«y)] = II¿ £ E[l{jf <«y)]. for wherethenumpothesis testing high-dimensional problems ber of predictor variablesmay be muchlargerthansample size.Ourmethodis an extension ofthesingle-split approachof Moreover, ofK- ' in (A.I), usingthedefinition Wasserman andRoeder(2009) andis extended toFDR control. theresultsofmultiple datasplits,basedonquantiles Combining E[!{K¡» <ay}]<F[pM <ay'S£sM] = ^. as summary statistics, improves reproducibility comparedwith thesingle-split method.The multisplit and single-split methThis is a consequenceof theuniform distribution of PJ ^ givenS ç ods sharetheproperties ofasymptotic errorcontrolandmodel theseresults, we get selectionconsistency. We argueempirically thatthemultisplit S^ . Summarizing D methodusuallyselectsmuchfewerfalse positivesthanthe with a number of true method, single-split slightly higher posp[minß;(y)<cxl<~yEr Y -^-]<a, ljeN^jyrJ- 1-yBf^b=' I ^ LStfOlJ" ' ' itives.The mainarea of applicationwill be high-dimensional LjeNnsW variablesexceedssample data,wherethenumber p ofpredictor whichcompletestheproof. size n,becausestandard approachesrelyon leastsquaresestimationandthusfailinthissetting. Wehaveshownthatthemul- ProofofTheorem3.2 method is also an interesting alternative to standard FDR tisplit As in theproofofTheorem3.1, herewe workwithAT-^ insteadof andFWER controlin lower-dimensional becausethe settings, insteadof we workwith Forany Pf' kjb' kjb) proposedFDR controlcan be morepowerful ifp is reasonably Pf' Analogously, N e and a e (0, 1), with; largebutsmallerthansamplesizen. Themethodis verygeneric andcanbe usedin a broadspectrum oferror-controlling proceJ< a. duresin multiple linearmodelsandGLMs. E rl{K¡b)<ay}i testing, including (A3) APPENDIX: PROOFS Furthermore, ProofofTheorem3.1 r Fortechnical reasons,we define K¡b)=P¡b)l{Sc~S^}+ l{S^S^}, (A.1) r JE max HAf^ayh <eV Y UdV J L^ <ay]-i l[KJb) JY ' where activeset conKJ are theadjustedp-valuesif theestimated tainsthetrueactiveset.Otherwise, all p-valuesare setto 1. Because ofassumption (Al), forfixedB, F[KJb)= P^b) forall b = 1, . . . , B] on a setAn withP[An]-> 1. Thuswe can defineall of thequantities in- andthus,with(A.3) andusingthedefinition (A.I) ofK^b' also with and under this altered , volvingPj slightly procedure, KJ r ^-. F l{K¡b)<ay]l al itis sufficient to showthat E max 1 <E V ^77- < a. <a. p[minß/(y)<al J J ijeN Fora randomvariableU takingvaluesin [0, 1], In particular, herewe can omitthelimessuperior. Fortheproofs,we also omitthefunction from the definimin{l,•} U^a l{U<ay) = i° tionsof Qj(y) andPj in (2.2) and(2.3). The selectedsetsofvariables < U< a sup a/U aymin { areclearlyunaffected, andthenotation is simplified Y considerably. reOwD I 1/Kmin U <ctymin. This content downloaded from 152.14.136.96 on Mon, 30 Mar 2015 20:49:01 UTC All use subject to JSTOR Terms and Conditions (A.4) Journalofthe AmericanStatisticalAssociation,December 2009 1680 on [0,1], then distribution if U has a uniform Moreover, E r ray™« _! '{u<ay}i sup Lye(Kmin,l) = / J JO Y fa _i J axldx / JaYmin ym>+ similartothatoftoBenjaminiand Withthisresult,we use a argument Yekutieli(2001), = a(l-logymin). ' on [0, 1] for distribution Thus,usingthefactthatkj has a uniform on S conditional € ç 5(¿), all; N, E r i{k¡b)<ay}l l{^}<ay} < E sup - Jsup - ^ ^ -I Ly€(ymin,l) V L)/e(ymin,l) = o¿(l-'ogynún). r fW-YsYsPifr ieNk=' <«(1 -logymin). sup l{7T/(ay)>y}l<a(l-logymin), /€(3/min.l) inf Qj(y) <a]< a{' - logymin), 7= 1.--..P. p j j~l 7=2 V=l 7=1 /=1 yj ^eG-^E/o-^^E/o-')^/=1 7=1 •/^1//=1 ^ (A-7) ca.« Notethat,analogouslytoeq. (27) ofBenjaminiandYekutieli(2001), ¿>0* = - l)î,/g]) n Cf g[(/' p({P; (y ^ = P(P/G[(/-D^^]) andthus P f®=E E^ =Hp(Piel(j- l)qJq]ï> ieN ieNk=l whichimpliesthat Y] P[ <a-6> 1 ' / ' E(ß)< £ -f(j)=/(d+ j^ ;- E^> - E/i^> ; as in theproofofTheorem3.1. where7^(0 is defined Because theevents{Qj(y) < a] and {nj(ay) > y] are equivalent, itfollowsthat VP!" J k=l as Equation(A.6) can thenbe rewritten Againusinga Markovinequality, £e[ ieNj=l =EtEE^J 7=1 ieNk=' p J J P overall bootstrap samplesyields Averaging jeN p ' p We denote W6b)<ay}l -_ r J^«(l-log/min). sup 2] E Y J *-ye(yminA) jeN K ieNj=l k=j P P j ieNj=l k=j 1 S ç ,S(¿7) J to (A.4), we thencan deducethat Analogously SUP X,E ;eAT Lyetymin,!) ieNk=l j=' inf Qj{y){'- logymin) <«]<«. fromwhichitfollowsby(A.5) in theproofofTheorem3.2 that ofPj in (2.3), Usingthedefinition j J]P[P/<a]<a, jeN (A.5) E-W^E^^^faieN /=1 Usingthisin (A.8), we obtain andthus,bytheunionbound, P[minP/<ûf]<ûf, jeN theproof. whichcompletes ProofofTheorem3.3 use a corAs in theproofsofTheorems3.1 and 3.2, we implicitly is identical ournotation as in (A.I) forallp-values.Otherwise, rection tothatin theproofoftheorem1.3 ofBenjamimandYekutieli(2001). whichcompletestheproof. An exceptionis ouruse of thevalue q insteadof q/min theFDR- Proofof Corollary3.1 values. withadjusted/?becausewe areworking controlling procedure, it methodis modelselection-consistent, Because the single-split Let musthold thatF[maxjesPj'S' < an] -► 1 forn -> oo. Usingmultipijk= ¥({Pi e [(/ Dqjq]) andcf), holdsforeach of theB splits,and thus ple data splits,thisproperty < -► 1, implying that,withprobability <xn] thenk - 1 other PtmaxjesmaxfcPJ^IS^I whereQ is theeventthatifvariablei wererejected, -> the 1 n to for oo, quantilemax^s g/(l) is boundedfrom variableswererejectedas well.Now,as shownin eq. (10) as well as converging over all j e S of theadjusted/7-values, maximum The above by an. ineq. (28) ofBenjaminiandYekutieli(2001), = i) ßy(y), is thusboundedfromaboveby Pj (1 logymin) infK€()/min, to 1 forn -+ oo. with converging (1 log/min)««,again probability ieNk=ì j=ì 2008. RevisedJuly2009.] [ReceivedNovember This content downloaded from 152.14.136.96 on Mon, 30 Mar 2015 20:49:01 UTC All use subject to JSTOR Terms and Conditions Meinshausen,Meier,and Bühlmann:p-Values forHigh-DimensionalRegression REFERENCES Lasso Estimation Bach, F. (2008), "Bolasso: Model Consistent Throughthe inICML '08: Proceedings ofthe25thInternational Conference Bootstrap," onMachineLearning, NewYork:ACM, pp. 33-40. Y. (1995),"Controlling theFalseDiscoveryRate: Y.,andHochberg, Benjamini, A Practicaland PowerfulApproachto MultipleTesting," Journalof the Ser.B, 57, 289-300. RoyalStatistical Society, D. (2001), "The Controlof theFalse Discovery Y, and Yekutieli, Benjamini, TheAnnalsofStatistics, Ratein MultipleTestingUnderDependency," 29, 1165-1188. A. (2009),"Simultaneous Bickel,P.,Ritov,Y, andTsybakov, AnalysisofLasso TheAnnalsofStatistics, andDantzigSelector," 37, 1705-1732. for Conditions Blanchard, G., andRoquain,E. (2008), "TwoSimpleSufficient FDR Control," Electronic JournalofStatistics, 2, 963-992. P. (2006), "BoostingforHigh-Dimensional LinearModels,"The Bühlmann, AnnalsofStatistics, 34, 559-583. MoConlon,E., Liu, X., Lieb,J.,and Liu, J.(2003), "Integrating Regulatory tifDiscoveryandGenome-Wide Proceedingsofthe ExpressionAnalysis," NationalAcademyofScience,100,3339-3344. DiFan,J.,and Lv, J.(2008), "SureIndependence ScreeningforUltra-High mensionalFeatureSpace,"JournaloftheRoyalStatisticalSociety,Ser.B, 70,849-911. A GradientBoosting J.(2001), "GreedyFunctionApproximation: Friedman, Machine,"TheAnnalsofStatistics, 29, 1189-1232. R. (2008), "SparseInverseCovariance Friedman, J.,Hastie,T.,andTibshirani, Estimation WiththeGraphicalLasso,"Biostatistics, 9, 432. Holm,S. (1979), "A SimpleSequentially RejectiveMultipleTestProcedure," Scandinavian JournalofStatistics, 6, 65-70. 1681 P. (2008), "Simultaneous Inference inGenHothorn, T.,Bretz,F.,andWestfall, eralParametric Models,"Biometrical Journal,50, 346-363. Huang,J.,Ma, S., andZhang,C.-H. (2008), "AdaptiveLasso forSparseHighDimensionalRegressionModels,"StatisticaSinica,18, 1603-1618. N. (2007), "RelaxedLasso," Computational Statistics and Data Meinshausen, Analysis,52, 374-393. Meinshausen,N., and Bühlmann,P. (2006), "High-Dimensional Graphsand VariableSelectionWiththe Lasso," The Annalsof Statistics,34, 14361462. ofOxford. Selection," (2008), "Stability University preprint, Meinshausen, N., and Yu, B. (2009), "Lasso-TypeRecoveryof SparseRepresentations forHigh-Dimensional Data," TheAnnalsofStatistics, 37, 246270. R. (1996), "RegressionShrinkageand Selectionvia the Lasso," Tibshirani, JournaloftheRoyalStatistical Society,Ser.B, 58, 267-288. Tropp,J.,and Gilbert,A. (2007), "SignalRecoveryFromRandomMeasurementsvia Orthogonal IEEE Transactions on InformaMatchingPursuit," tionTheory, 53 (12), 4655^666. vande Geer,S. (2008), "High-Dimensional GeneralizedLinearModelsandthe Lasso,"TheAnnalsofStatistics, 36, 614-645. Wasserman, L., andRoeder,K. (2009), "HighDimensionalVariableSelection," TheAnnalsofStatistics, 37, 2178-2201. andBias oftheLasso SelecZhang,C.-H.,andHuang,J.(2008), "TheSparsity tionin High-Dimensional LinearRegression," TheAnnalsofStatistics, 36, 1567-1594. ofLasso,"JourZhao,P.,andYu,B. (2006), "On Model SelectionConsistency nal ofMachineLearningResearch,7, 2541-2563. Journalofthe Zou, H. (2006), "TheAdaptiveLasso andItsOracleProperties," AmericanStatistical Association,101, 1418-1429. This content downloaded from 152.14.136.96 on Mon, 30 Mar 2015 20:49:01 UTC All use subject to JSTOR Terms and Conditions
© Copyright 2024