Effective Geographic Sample Size in the Presence of Spatial Autocorrelation

Daniel A. Griffith
Ashbel Smith Professor, School of Social Sciences, University of Texas at Dallas

As spatial autocorrelation latent in georeferenced data increases, the amount of duplicate information contained in these data also increases. This property suggests the research question asking what the number of independent observations, say n*, is that is equivalent to the sample size, n, of a data set. This is the notion of effective sample size. Intuitively speaking, when zero spatial autocorrelation prevails, n* = n; when perfect positive spatial autocorrelation prevails in a univariate regional mean problem, n* = 1. Equations are presented for estimating n* based on the sampling distribution of a sample mean or sample correlation coefficient with the goal of obtaining some predetermined level of precision, using the following spatial statistical model specifications: (1) simultaneous autoregressive, (2) geostatistical semivariogram, and (3) spatial filter. These equations are evaluated with simulation experiments and are illustrated with selected empirical examples found in the literature. Key Words: geographic sample, geostatistics, redundant information, spatial autocorrelation, spatial autoregression.

Sampling for data-gathering purposes must address questions asking how and what to sample (Levy and Lemeshow 1991), and it is the foundation of much empirical social science research, whether quantitative or qualitative methodologies are employed. One distinction between these two methodological approaches is that quantitative research frequently requires relatively large sample sizes to collect somewhat superficial, albeit important, attribute information that is generalizable to a population, while qualitative research often is restricted to relatively small sample sizes in order to collect large quantities of in-depth, detailed information from subjects or case studies.
Quantitative analysis generalization is achieved through a sound random-sampling design (i.e., how to sample); qualitative analysis generalization, if desired, may be achieved through such techniques as triangulation. Considerable effort has been devoted to geographic sampling designs for quantitative investigations (e.g., Stehman and Overton 1996), translating the what into where to sample; these are designs that exploit random sampling error. Impacts of spatial autocorrelation in this context are partially understood and are the topic of this article. One of an array of purposive sampling strategies (i.e., how to sample) can be employed in qualitative research (see Marshall and Rossman 1999, 78). The goal often is representativeness and adherence to selected theoretical considerations, as well as convenience. Impacts of spatial autocorrelation in this latter context are almost completely unknown, although a spatial researcher should realize that it still will come into play. For example, a snowball sampling strategy will be impacted by spatial autocorrelation if subjects are from nearby locations and because of the way the social network autocorrelation is generated. And an extreme-cases sample strategy could be impacted by the existence of geographically nonrandom "hot spots" or "cold spots," which arise because of spatial autocorrelation. Findings reported in this article for quantitative methodologies offer at least some speculative insights into qualitative sample sizes, too.

Important Sample Properties

Sample size determination, often in terms of statistical power calculations, frequently is a valuable step in planning a sample-based, quantitative study. Most introductory statistics textbooks discuss hypothesis testing in the context of appropriate sample size determination, with or without statistical power specification. The popularity and cumbersomeness of these calculations have resulted in web-based interactive calculators to execute the necessary computations for researchers (e.g., http://calculators.stat.ucla.edu/powercalc/). For the case
of independent observations, Flores, Martinez, and Ferrer (2003) furnish some insights into sample-size determination for arithmetic means of georeferenced data, but for systematic sampling designs rather than the tessellated random sampling design promoted in this article. As this literature illustrates, calculating an appropriate sample size unavoidably involves mathematical notation, which accordingly appears in the ensuing discussion.

Annals of the Association of American Geographers, 95(4), 2005, pp. 740-760. © 2005 by Association of American Geographers. Initial submission, April 2004; revised submission, December 2004; final acceptance, February 2005. Published by Blackwell Publishing, 350 Main Street, Malden, MA 02148, and 9600 Garsington Road, Oxford OX4 2DQ, U.K.

Statistical power (Tietjen 1986, 38) is the probability that a test will reject a false null hypothesis (i.e., the complement of a Type II error). It frequently is denoted by 1 − β, where β is the probability of failing to reject the null hypothesis when the alternative hypothesis is true (i.e., a Type II error). The higher the power, the greater the chance of obtaining a statistically significant result when a null hypothesis is false. The power of all statistical tests is dependent on the following design parameters: significance level selected for a statistical test; sample size; the tolerable magnitude of difference between a sample statistic and its corresponding population parameter; and natural variability for the phenomenon under study.

Spatial autocorrelation, which may arise from common variables associated with locations or from direct interaction between locations (see Griffith 1992), has an impact on significance levels, detectable differences in attribute measures for a population, and measures of attribute variability (see, e.g., Arbia, Griffith, and Haining 1998, 1999).
These impacts motivated Clifford, Richardson, and Hémon (1989) to apply the phrase "effective degrees of freedom"¹ to analyses in which these spatial autocorrelation effects are adjusted for in the case of correlation coefficients. The phrase refers to the equivalent number of degrees of freedom for spatially unautocorrelated (i.e., independent) observations, exploiting redundant or duplicated information contained in georeferenced data due to the relative locations of observations (i.e., spatial autocorrelation). The duplicate information in question may arise from geographic trends induced by common variables or from information sharing resulting from spatial interaction (e.g., geographic diffusion).

This article highlights the nearly equivalent notion of effective sample size²: the number of independent observations, say n*, that is equivalent to a spatially autocorrelated data set's sample size, n. Intuitively speaking, when zero spatial autocorrelation prevails and a regional mean is being estimated, n* = n; when perfect positive spatial autocorrelation prevails, n* = 1. The importance of correcting n to n* may be illustrated by analysis of remotely sensed data for the High Peak district of England, for which n = 900 pixels containing markedly high positive spatial autocorrelation is equivalent to n* ≈ 5 independent pixels (see the ensuing discussion for details). As an aside, Getis and Ord (2000) furnish a similar type of analysis for the multiple testing of local indices of spatial autocorrelation, which themselves are highly spatially autocorrelated by construction.

Of note is that establishing effective sample size unavoidably requires mathematical derivations; basic ones are outlined in the body of this article in order to establish the soundness of results. The validity of reported findings is further bolstered with simulation experiment results.

Important Considerations When Designing a Sampling Network

Random sampling in a geographic landscape requires considerations much like those used when designing a conventional stratified random sample.
Geographic representativeness needs to be cast in terms of spatial coverage rather than in terms of, say, stratification to achieve good socioeconomic/demographic coverage. Designing a geographic sampling network also needs to protect against sample locations being correlated with the geographic distribution to be studied; this specific concern is why purely systematic sampling often is not used. Geographic sampling networks enable regional means to be estimated, either for predefined sets of aggregate areal units (choropleth maps) or as interpolations of continuous surfaces (contour maps). Geographic sampling networks, designed for efficient estimation of parameters describing the geographic distribution of interest, need to guard against grossly inefficient spatial prediction (Martin 2001; Miller 2001; Diggle and Lophaven 2004) and vice versa. This trade-off results in a compromise between a systematic sample, comprising regularly spaced sampling locations in order to achieve good geographic coverage and hence good interpolation accuracy, and irregularly spaced sampling locations in order to achieve better estimation of parameters for the geographic distribution of interest.

A sampling network can be devised in various ways to satisfy the condition of containing both regularly and irregularly spaced sampling locations. One way is to position n/2 of the locations systematically (e.g., on a regular square tessellation) and the remaining n/2 locations in a random fashion (i.e., randomly select east-west and, independently, north-south coordinates). This is the type of design associated with the GEOEAS data example (Englund and Sparks 1991; see http://www.websl1.uidaho.edu/geoe428/data_files.htm). A second method proposed by Diggle and Lophaven (2004) involves positioning some sampling locations on a regular square tessellation grid and the remaining locations on more finely spaced regular square tessellation grids within a randomly selected subset of cells demarcating the coarser grid: the lattice plus in-fill design.
Diggle and Lophaven also propose a third design, which they prefer, that involves positioning some sampling locations on a regular square tessellation grid, with the remaining locations being randomly selected from constant radius circular buffer zones circumscribing a random subset of the systematically positioned locations: the lattice plus close pairs design. Unfortunately, the software currently available to support their designs "would encounter serious difficulties ... with numbers of locations larger than a few hundred" (Diggle and Lophaven 2004, 8). Yet a fourth design is the one employed for this article, which is based on hexagonal-tessellation, stratified, random sampling (Stehman and Overton 1996). A regular hexagonal tessellation containing n cells is superimposed on a region (the systematic component). Then a single location is randomly selected from within each hexagon (the random component). This design shares many similarities with the lattice plus close pairs design. Of note is that these network layout issues are central to debates about geostatistical sampling designs. Cressie (1991, §5.6) furnishes a useful overview of numerous spatial sampling designs.

Mixing regularly and irregularly spaced sampling locations highlights another important feature of spatial analysis, namely design-based and model-based inference. The preceding sampling designs support design-based inference, which assumes that a given location has a unique fixed but unknown value for the geographic distribution of interest. The reference sampling distribution is constructed, conceptually, by repeatedly sampling from a geographic landscape using the same design and calculating parameter estimates with each sample. Initially, spatial scientists believed that this strategy could not be legitimately used when data contain non-zero spatial autocorrelation (Brus and de Gruijter 1993). An alternative strategy is to let the value for some geographic distribution at a given location vary.
In other words, the joint distribution of data values forming a map is one of an infinite number of possible realizations of some stochastic process; the total set of possible maps is called a superpopulation. Hence, the essential tool for describing a map is a model, resulting in this inferential basis being labeled model-based. A severe shortcoming of this latter approach is the difficulty in knowing whether or not model assumptions are valid, necessitating diagnostic analyses. But it furnishes an indispensable analytical tool for understanding nonrandomly sampled data such as remotely sensed data and for enabling spatial autocorrelation to be accounted for when devising a sampling design: the model-informed, design-based perspective outlined in this article.

A Conceptual Framework: The Effective Size of a Geographic Sample

A basis for establishing effective sample size for normally distributed georeferenced data is presented in terms of the sampling distribution of a single sample mean; extensions exploiting multiple sample means or the sample correlation coefficient are presented in Appendices A, B, and C. This approach, for which the assumption of a bell-shaped curve is critical, is directly analogous to that reported for time series (e.g., see the R Documentation) and is indirectly analogous to what is reported for survey weights with superpopulation models (Potthoff, Woodbury, and Manton 1992), whereby applying weights to sample results alters the value of n.

Measuring natural variability for some georeferenced phenomenon results in an inflated variance estimate when spatial autocorrelation is overlooked (see Haining 2003, §8.1).
Suppose the n × n matrix V contains the covariation structure among n georeferenced observations (more precisely, matrix $\sigma^2 V^{-1}$ is the covariance matrix), such that $Y = \mu 1 + \varepsilon = \mu 1 + V^{-1/2}\varepsilon^*$, where Y denotes an n × 1 vector of georeferenced attribute values, μ denotes the population mean of variable Y, 1 denotes an n × 1 vector of ones, and $\varepsilon$ and $\varepsilon^*$, respectively, denote n × 1 vectors of spatially autocorrelated and unautocorrelated errors. Suppose $\varepsilon^*$ is independent and identically distributed $N(0, \sigma^2_{\varepsilon^*})$, where N denotes the normal distribution, and $\sigma^2_{\varepsilon^*}$ denotes the population variance for variate $\varepsilon^*$. If V = I, the n × n identity matrix, then the n observations are uncorrelated. Using matrix notation, the population variance estimate based upon a sample, and ignoring spatial autocorrelation, is given by

\[ E(\hat{\sigma}^2_Y) = E\left[ (Y - \mu 1)^T (Y - \mu 1)/n \right] = \frac{TR(V^{-1})}{n}\, \sigma^2_{\varepsilon^*}, \tag{1A} \]

where $\hat{\sigma}^2_Y$ denotes the estimate of $\sigma^2_Y$, the variance of attribute variable Y, E denotes the calculus of expectations operator, and T and TR respectively denote the matrix transpose and trace operators. The quantity $TR(V^{-1})/n$ is a variance inflation factor (VIF), similar to the VIF generated by multicollinearity in conventional multiple linear regression analysis; it expresses the degree to which collinearity among georeferenced observations degrades the precision of Y relative to similarly dispersed spatially uncorrelated values. Popular versions of matrix V include, for spatial autoregressive parameter ρ and binary geographic weights matrix C: $(I - \rho C)$ for the conditional autoregressive (CAR) model; and $[(I - \rho W)^T (I - \rho W)]$ for the autoregressive response (AR) and simultaneous autoregressive (SAR) models, where matrix W is the row-standardized version of matrix C.³ Cliff and Ord (1981, Ch. 7), Anselin (1988), Griffith (1988), Haining (1990), and Cressie (1991, Ch. 1), among others, furnish additional details about these models.
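As a hedged illustration of the variance inflation factor just defined, the following Python sketch builds the SAR version of matrix V on a small regular lattice and evaluates TR(V⁻¹)/n for a few values of ρ. The lattice size, the rook-adjacency rule, and the function names `rook_weights` and `sar_vif` are illustrative assumptions, not part of the article.

```python
import numpy as np


def rook_weights(rows, cols):
    """Row-standardized rook-adjacency matrix W for a rows x cols lattice.

    The regular lattice and the rook rule are assumptions made for illustration.
    """
    def path(k):
        # Adjacency matrix of a path graph with k nodes.
        return np.eye(k, k=1) + np.eye(k, k=-1)

    # Binary matrix C: neighbors within a row plus neighbors within a column.
    C = np.kron(np.eye(rows), path(cols)) + np.kron(path(rows), np.eye(cols))
    return C / C.sum(axis=1, keepdims=True)


def sar_vif(W, rho):
    """TR(V^-1)/n for the SAR specification V = (I - rho W)^T (I - rho W)."""
    n = W.shape[0]
    A = np.eye(n) - rho * W
    V = A.T @ A
    return np.trace(np.linalg.inv(V)) / n


W = rook_weights(10, 10)
for rho in (0.0, 0.5, 0.9):
    print(f"rho = {rho:.1f}  VIF = {sar_vif(W, rho):.3f}")
```

Under these assumptions the VIF equals 1 when ρ = 0 and grows as positive spatial autocorrelation strengthens.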
Again using matrix notation, the variance of the sample mean of variable Y, $\bar{y}$, ignoring spatial autocorrelation, is given by

\[ E(\hat{\sigma}^2_{\bar{y}}) = \frac{1^T V^{-1} 1}{n^2}\, \sigma^2_{\varepsilon^*}. \tag{1B} \]

Rearranging the terms on the right-hand side of this equation and making the necessary algebraic manipulations yields

\[ E(\hat{\sigma}^2_{\bar{y}}) = \frac{\dfrac{TR(V^{-1})}{n}\, \sigma^2_{\varepsilon^*}}{\dfrac{TR(V^{-1})}{1^T V^{-1} 1}\, n}. \tag{1C} \]

The denominator on the right-hand side of this equation furnishes the formula for effective sample size, namely,

\[ n^* = \frac{TR(V^{-1})}{1^T V^{-1} 1}\, n. \tag{2} \]

If the n observations are independent, and hence V = I, then n* = n, and the VIF becomes $TR(V^{-1})/n = 1$. If perfect positive spatial autocorrelation prevails, then, conceptually, $V^{-1} = k 1 1^T$, with $k \to \infty$ as positive spatial autocorrelation increases, and n* = 1.

In addition to the mathematical statistical theory derivation of Equation (2), its validity can be assessed through simulation experimentation. A simple exploratory experiment (100 replications) was conducted for selected cases in which variable Y was distributed N(0, 1), spatial autocorrelation was embedded with an SAR model, and n, the level of positive spatial autocorrelation ρ, and geographic connectivity were varied. Figure 1A portrays a scatterplot of the simulated standard error versus a standard error computed with the VIF and Equation (2). The goodness-of-fit of the regression line appearing in this graph highlights the soundness of Equation (2), with a noticeable deviation being attributable only to simulation variability.

Figure 1. (A): a scatterplot of the simulated standard error (100 replications) versus a standard error computed with the variance inflation factor (VIF) and Equation (2), denoted by solid circles; the superimposed solid straight gray line denotes predicted values produced by the estimated regression equation. (B): a scatterplot of n* computed with Equation (2) versus $\hat{n}^*$ computed with approximation Equation (3), denoted by asterisks (*), and with Cressie's (1991, 15) equation, denoted by open circles (o); the superimposed solid straight gray line denotes predicted values produced by the estimated regression equation based upon Equation (3) results, and the broken straight gray line denotes predicted values produced by the estimated regression equation based upon Cressie's equation's results. [Horizontal axes: se-formula in panel A; n* from simulation in panel B.]

Mean-Based Results for a Spatial Autoregression Model Specification: The SAR Model

Findings based on a SAR model are illuminating here because a SAR model can be rewritten as a CAR model (see Cliff and Ord 1981), whereas semivariogram models can be directly related to it (see Griffith and Layne 1997).

The Case of a Single Geographic Mean

Griffith (2003) reports findings for Equation (2) and its extension to expression (A1) in Appendix A, including the following conjecture, which is a slight improvement on the result reported by Griffith and Zhang (1999): If georeferenced attribute variable Y is normally distributed, or approximately so, and $\hat{\rho}$ is the spatial autocorrelation parameter estimate for a SAR model specification, then the effective sample size is given by

\[ \hat{n}^* = n \left[ 1 - \frac{1}{1 - e^{-1.92369}}\, \frac{n-1}{n} \left( 1 - e^{-2.12373\hat{\rho} + 0.20024\sqrt{\hat{\rho}}} \right) \right]. \tag{3} \]

Of note, again, is that the normality assumption is critical here. In addition to a nonlinear regression analysis of empirical cases used to calibrate Equation (3), its validity can be assessed through simulation experimentation. The experiment used to validate Equation (2) also was used to validate Equation (3). Figure 1B portrays a scatterplot of n* computed with Equation (2) versus $\hat{n}^*$ computed with Equation (3). The goodness-of-fit of the regression line appearing in this graph highlights the soundness of Equation (3).
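The relationship between the exact trace-based formula, Equation (2), and the SAR approximation, Equation (3), can be sketched numerically. The Python example below is a minimal illustration under stated assumptions: a 20 × 20 rook lattice stands in for a real surface partitioning, the function names are hypothetical, and the constants in `n_star_sar_approx` are a reading of Equation (3) that should be checked against the published article.

```python
import numpy as np


def rook_weights(rows, cols):
    """Row-standardized rook-adjacency matrix W (assumed lattice geometry)."""
    def path(k):
        return np.eye(k, k=1) + np.eye(k, k=-1)

    C = np.kron(np.eye(rows), path(cols)) + np.kron(path(rows), np.eye(cols))
    return C / C.sum(axis=1, keepdims=True)


def sar_v_inv(W, rho):
    """Inverse of V = (I - rho W)^T (I - rho W) for the SAR specification."""
    A = np.eye(W.shape[0]) - rho * W
    return np.linalg.inv(A.T @ A)


def n_star_exact(V_inv):
    """Equation (2): n* = n TR(V^-1) / (1^T V^-1 1)."""
    n = V_inv.shape[0]
    # V_inv.sum() adds every element, which equals 1^T V^-1 1.
    return n * np.trace(V_inv) / V_inv.sum()


def n_star_sar_approx(n, rho):
    """Equation (3), intended for 0 <= rho <= 1; constants as read from the article."""
    scale = 1.0 / (1.0 - np.exp(-1.92369))
    decay = 1.0 - np.exp(-2.12373 * rho + 0.20024 * np.sqrt(rho))
    return n * (1.0 - scale * (n - 1) / n * decay)


W = rook_weights(20, 20)
n = W.shape[0]
for rho in (0.0, 0.25, 0.5, 0.75, 0.9):
    exact = n_star_exact(sar_v_inv(W, rho))
    approx = n_star_sar_approx(n, rho)
    print(f"rho = {rho:.2f}  Eq.(2) = {exact:8.2f}  Eq.(3) = {approx:8.2f}")

# Compare with the n* of 76.95 listed for lead (n = 253) in Table 1A.
print(round(n_star_sar_approx(253, 0.49363), 2))
```

Both functions return n when ρ = 0, and Equation (3) returns a value very close to 1 when ρ = 1, in line with the limiting cases described for Equation (2).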
Of note is that altering the geographic connectivity definition results in slight but perceptible variation about values calculated with Equation (3).

Cressie (1991, 15) reports a comparable effective sample size formula, which also was used to predict n* for the simulation experiment (see Figure 1B). Equation (3) outperforms Cressie's equation, yielding a mean squared error (MSE) of 55 that is substantially less than the MSE of 334 for his equation.

The Case of Two Geographic Means

Following the same logic that establishes Equation (2), Griffith also reports, for the bivariate weighted mean case (see additional details in Appendix A) specified in terms of a SAR model, the conventional variance term in expression (A1), namely

\[ \frac{w^2 \sigma_X^2 + (1-w)^2 \sigma_Y^2 + 2w(1-w)\rho_{XY}\sigma_X\sigma_Y}{n}, \tag{4A} \]

where $\sigma_X^2$ and $\sigma_Y^2$ respectively denote the variance of variables X and Y, $\rho_{XY}$ denotes the correlation between attribute variables X and Y, w (0 < w < 1) is the weight applied to the mean of variable X, and the term $2w(1-w)\rho_{XY}\sigma_X\sigma_Y$ adjusts for the presence of redundant attribute information in a bivariate georeferenced dataset. This expression is multiplied by the VIF appearing in expression (A1), namely,

\[ \left\{ w^2 \sigma_X^2\, TR\{[(I - \hat{\rho}_X W)^T (I - \hat{\rho}_X W)]^{-1}\} + (1-w)^2 \sigma_Y^2\, TR\{[(I - \hat{\rho}_Y W)^T (I - \hat{\rho}_Y W)]^{-1}\} \right\} / n, \tag{4B} \]

where $\hat{\rho}_X$ and $\hat{\rho}_Y$, respectively, are the spatial autocorrelation parameter estimates for variables X and Y. The resulting product then is divided by the term denoting sampling variability of means in the presence of non-zero spatial autocorrelation, also appearing in expression (A1), namely,

\[ \left\{ w^2 \sigma_X^2\, 1^T [(I - \hat{\rho}_X W)^T (I - \hat{\rho}_X W)]^{-1} 1 + (1-w)^2 \sigma_Y^2\, 1^T [(I - \hat{\rho}_Y W)^T (I - \hat{\rho}_Y W)]^{-1} 1 + 2w(1-w)\rho_{XY}\sigma_X\sigma_Y\, 1^T [(I - \hat{\rho}_X W)^T (I - \hat{\rho}_Y W)]^{-1} 1 \right\} / n^2. \tag{4C} \]

A closer inspection of this conventional variance expression reveals that it contains the individual, weighted, standard error terms $w^2\sigma_X^2/n$ and $(1-w)^2\sigma_Y^2/n$. A closer inspection of this VIF term reveals that it contains the weighted, individual VIF terms $w^2\sigma_X^2\, TR\{[(I - \hat{\rho}_X W)^T (I - \hat{\rho}_X W)]^{-1}\}/n$ and $(1-w)^2\sigma_Y^2\, TR\{[(I - \hat{\rho}_Y W)^T (I - \hat{\rho}_Y W)]^{-1}\}/n$, which can be seen in Equation (2). One surprise here is that this VIF expression does not include the cross-product term involving $[(I - \hat{\rho}_X W)^T (I - \hat{\rho}_Y W)]^{-1}$. And a closer inspection of this means variability term reveals that it contains the factor for each individual variable n*, as well as prorating a cross-products term.

A simulation experiment based on n = 625, $\sigma_X = \sigma_Y = 1$, and 100 replications was conducted to establish the validity of expression (A1) across the range of w and $\rho_{XY}$ values. Figure 2 portrays a scatterplot of the simulated standard error versus a standard error computed with expression (A1). The goodness-of-fit of the regression line appearing in this graph highlights the soundness of expression (A1), with noticeable deviations being attributable only to simulation variability.

Figure 2. A scatterplot of the simulated standard error versus a standard error computed with expression (A1), denoted by solid circles; the superimposed solid straight gray line denotes predicted values produced by the estimated regression equation. [Horizontal axis: se-formula.]

Graphs for two illustrative empirical case studies that portray the curve described by expression (A1) when 0 ≤ w ≤ 1 appear in Figure 3. Relevant statistics for these two examples used to construct these graphs appear in Table 1A. The curves portrayed in Figure 3 may be approximated with the following equations, which are equivalent to but simpler in form than the one reported in Griffith (2003, 85) and demonstrate that the joint n* value essentially is a weighted function of the two effective sample sizes that can be computed separately for the individual means:
For Puerto Rico,

\[ \hat{n}^*_{e,s_e} = \frac{w^{1.5899}}{w^{1.5899} + (1-w)^{1.5968}}\, n^*_{e} + \frac{(1-w)^{1.5968}}{w^{1.5899} + (1-w)^{1.5968}}\, n^*_{s_e}, \quad \text{pseudo-}R^2 = 0.9998, \tag{5A} \]

and for the Murray smelter site,

\[ \hat{n}^*_{As,Pb} = \frac{w^{2.0171}}{w^{2.0171} + (1-w)^{1.3475}}\, n^*_{As} + \frac{(1-w)^{1.3475}}{w^{2.0171} + (1-w)^{1.3475}}\, n^*_{Pb}, \quad \text{pseudo-}R^2 = 0.9994, \tag{5B} \]

where pseudo-R² is the squared multiple correlation coefficient (R²) between n* and $\hat{n}^*$. Because of the role played by variance terms, which are specific to cases, only the general form of the equation can be established at this time. This empirical analysis further corroborates the validity of expression (A1). As an aside, the extension of this two-means result to a multiple-means result is summarized in Appendix B.

Figure 3. Plots of the bivariate curve for the illustrative examples; the approximation curve is denoted by the dotted line (...), and the scatterplot of the exact n* values is denoted by solid circles. (A): the Puerto Rico digital elevation model (DEM). (B): the Murray smelter site. [Horizontal axis: w.]

Table 1A. Selected descriptive statistics for the Puerto Rico digital elevation model (DEM) and Murray smelter site soil contaminants

Landscape                       Variable                                  Standard deviation   ρ̂         Bivariate correlation   n*
Puerto Rico (n = 73)            DEM mean elevation (e)                    0.80270              0.83139   0.68102                 6.24
                                DEM elevation standard deviation (s_e)    3.54528              0.67638                           12.69
Murray smelter site (n = 253)   Arsenic (As)                              3.46316              0.53180   0.74775                 68.24
                                Lead (Pb)                                 7.69417              0.49363                           76.95

NOTE: Griffith (2003) provides descriptions of these two data sets. The Murray, Utah, landscape is a superfund site. DEM denotes digital elevation model.

The Case of a Pearson Product-Moment Correlation Coefficient

Spatial scientists are often interested in measuring relationships between, rather than means of, geographically distributed variables. Details for computing effective sample size in this context, again assuming normally distributed variables, appear in Appendix C. An approximate relationship, Equation (6), holds between the individual mean-based univariate effective sample sizes and their corresponding bivariate correlation-based effective sample size.

[Equation (6) is not legible in the source; its surviving fragments involve the terms $(\hat{\rho}_X + \hat{\rho}_Y)^{1.161}$ and $0.04\sqrt{n^*_X n^*_Y}$.]

This approximation results in: $\hat{n}^*_{XY} = n$ when spatial autocorrelation is absent; $\hat{n}^*_{XY}$ asymptotically converging on 2 from below when $\hat{\rho}_X = \hat{\rho}_Y \to 1$ and $\rho_{XY} = 1$; and $\hat{n}^*_{XY}$ asymptotically converging on roughly 5 when $\hat{\rho}_X = \hat{\rho}_Y \to 1$ and $\rho_{XY} = 0$, highlighting that it is an approximation; theoretically, desired values here are 1 and 2. Results for five Thiessen polygon-based empirical examples appear in Table 1B; these results demonstrate that Equation (6) furnishes a good approximation for Equation (C1).

These findings highlight the implication that impacts of spatial autocorrelation can be mitigated to some extent by incorporating redundant georeferenced attribute information into an analysis, a natural form of which arises in space-time data series. Lahiri (1996, 2003) notes that this is one way of regaining estimator consistency when employing infill asymptotics (i.e., the sample size increases by keeping the study area size constant and increasing the sampling intensity).

Mean-Based Results for Selected Geostatistical/Semivariogram Model Specifications

Geostatistical models involve defining matrix $V^{-1}$ rather than matrix V. The scalar form of expression (2) for semivariogram models is given by

\[ n^* = \frac{n (C_0 + C_1)}{(C_0 + n C_1) - C_1 \displaystyle\sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \ne i}}^{n} f(d_{ij}, r)/n}, \tag{7} \]

where $f(d_{ij}, r)$ denotes a particular semivariogram model with range parameter r, nugget $C_0$, and slope $C_1$, with $d_{ij}$ denoting distance separating locations i and j. The sill in a semivariogram model "represents a value that the semivariogram tends to when distance gets very large," and hence at such extremely large distances equals the variance of the variable under study (K. Johnson et al. 2001, 283), $C_0 + C_1$. If no spatial autocorrelation is present, then the double-sum term $\sum\sum f(d_{ij}, r)/n$ in the denominator of Equation (7) equals (n − 1), and n* = n; as spatial autocorrelation increases, the denominator increases [n* goes to $n(C_0 + C_1)/(C_0 + nC_1)$, since $f(d_{ij}, r) \to 0$ as r increases], resulting in n* decreasing from n to 1 when $C_0 = 0$.

The following are three model-specific special instances of Equation (7), when $C_0$ is zero, where the approximation formula is given by $1 + (n-1)/(1 + b\, r/d_{max})^c$, in which $d_{max}$ denotes the maximum interpoint distance, which is the counterpart to Equation (3) for semivariogram models:

\[ n^*_{spherical} = \frac{n}{1 + \displaystyle\sum_{i=1}^{n} \sum_{\substack{j \ne i \\ d_{ij} \le r}} \left( 1 - \frac{3 d_{ij}}{2r} + \frac{d_{ij}^3}{2r^3} \right)/n} \approx 1 + \frac{n-1}{(1 + b\, r/d_{max})^{c}}, \tag{8A} \]

\[ n^*_{exponential} = \frac{n}{1 + \displaystyle\sum_{i=1}^{n} \sum_{j \ne i} e^{-d_{ij}/r}/n} \approx 1 + \frac{n-1}{(1 + 51.4879\, r/d_{max})^{1.7576}}, \text{ and} \tag{8B} \]

\[ n^*_{K\text{-}Bessel} = \frac{n}{1 + \displaystyle\sum_{i=1}^{n} \sum_{j \ne i} \frac{d_{ij}}{r} K_1\!\left(\frac{d_{ij}}{r}\right)/n} \approx 1 + \frac{n-1}{(1 + 69.6698\, r/d_{max})^{1.8601}}, \tag{8C} \]

where $K_1$ is a modified Bessel function of the first order and second kind and the respective relative error sums of squares (RESSs) for the approximations are 5.4 × 10⁻⁷, 8.9 × 10⁻⁶, and 7.5 × 10⁻⁷. [The fitted constants b and c for the spherical model in Equation (8A) are not legible in the source.]

The spherical model finding helps highlight why average nearest-neighbor distance could serve as an informative index about spatial autocorrelation here. For the spherical model, if n = 1 then n* = 1/1 = 1. If n = 2, $n^* = 2/[2 - \tfrac{3}{2}(d_{12}/r) + \tfrac{1}{2}(d_{12}/r)^3]$. As $d_{12} \to 0$, $n^* \to 1$; in the limit, when an observation is replicated, only a single observation effectively exists. As $d_{12} \to r$, $n^* \to 2$; in the limit, when two observations are far enough apart, they contain no duplicated information. For the exponential model, again, if n = 1 then n* = 1/1 = 1. If n = 2, $n^* = 2/(1 + e^{-d_{12}/r})$. As $d_{12} \to 0$, $n^* \to 1$; as $d_{12} \to \infty$, $n^* \to 2$. For the K-Bessel function model, once more, if n = 1 then n* = 1/1 = 1. If n = 2, $n^* = 2/[1 + (d_{12}/r) K_1(d_{12}/r)]$.
As $d_{12} \to 0$, $n^* \to 1$; as $d_{12} \to \infty$, $n^* \to 2$; these particular limits can be confirmed with Maple or Mathematica. Graphs of these three functions, based on nonlinear regression generalizations of a simulation experiment using 250 randomly selected points from a unit square geographic region and with 500 replications, appear in Figure 4. These graphs suggest that the range parameter of a semivariogram model behaves similar to ρ for an autoregressive model; the (practical) range increases as the degree of spatial autocorrelation increases. These graphs also suggest that spatial autocorrelation increases as a model description goes from the spherical specification, to the exponential specification, to the K-Bessel function specification. This implication is consistent with Griffith and Csillag's (1993) and Griffith and Layne's (1997, 1999) findings that conceptually and numerically link the exponential and CAR models and the K-Bessel function and SAR models. This correspondence is closest for georeferenced data distributed across a regular square tessellation (e.g., remotely sensed data). Whereas autoregressive models are specified in terms of an inverse covariance matrix, semivariogram models are specified in terms of the associated covariance matrix itself. Therefore, Equation (2), expression (A1), and Equation (C3) also can be employed with semivariogram modeling results.

Figure 4. Effective sample size (n*) for selected semivariogram models across their respective range parameters based upon simulation experiments (250 replications); solid circles denote simulation results, and dots (...) denote generalized function results. (A): spherical semivariogram. (B): exponential semivariogram. (C): Bessel function semivariogram. [Horizontal axis: r/dmax.]

The Case of Negative Spatial Autocorrelation

A large majority of georeferenced data display moderate positive spatial autocorrelation. One exception is remotely sensed data, which tend to display very strong positive spatial autocorrelation. Negative spatial autocorrelation is not encountered in practice nearly as often as is positive spatial autocorrelation. Rare examples of it are furnished by Anselin (1988), and by Griffith, Wong, and Whitfield (2003). Richardson (1990) notes that when negative spatial autocorrelation is present, n* > n, which at first glance seems counterintuitive. But negative spatial autocorrelation is nothing more than an antithetic variate (see Hall 1989) in spatial guise, which allows more to be gleaned from less, rather than less from more when spatial autocorrelation is positive. This type of result can be obtained with Equation (2) by letting the autoregressive parameter be negative (i.e., ρ < 0). For example, for the Murray site, n* nearly reaches 1,200 when ρ = −0.9. The wave-hole specification is the most popular semivariogram model for describing negative spatial autocorrelation (other possibilities include the cubic model). But for it, as the range increases, n* decreases from n to 1. This fundamental difference between these two classes of models is attributable to factors other than continuity (e.g., autoregressive models depend upon a rigid neighborhood structure for discretized data aggregations that is defined a priori, whereas geostatistical models tend to over-smooth when spatial intensities are scattered in small clusters) and matrix inversion impacts (e.g., edge effects due to the shape and size of both a region and the areal units into which it is partitioned).
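The claim that Equation (2) yields n* > n once the autoregressive parameter turns negative can be checked with a short numerical sketch. The Python example below evaluates Equation (2) for a SAR specification across negative and positive values of ρ; the 15 × 15 rook lattice and the function names are assumptions made for illustration, so the numbers will not match those reported for the Murray site.

```python
import numpy as np


def rook_weights(rows, cols):
    """Row-standardized rook-adjacency matrix W (assumed lattice geometry)."""
    def path(k):
        return np.eye(k, k=1) + np.eye(k, k=-1)

    C = np.kron(np.eye(rows), path(cols)) + np.kron(path(rows), np.eye(cols))
    return C / C.sum(axis=1, keepdims=True)


def n_star_sar(W, rho):
    """Equation (2) with V = (I - rho W)^T (I - rho W); rho may be negative."""
    n = W.shape[0]
    A = np.eye(n) - rho * W
    V_inv = np.linalg.inv(A.T @ A)
    # V_inv.sum() is the quadratic form 1^T V^-1 1.
    return n * np.trace(V_inv) / V_inv.sum()


W = rook_weights(15, 15)
n = W.shape[0]
for rho in (-0.9, -0.5, 0.0, 0.5, 0.9):
    print(f"rho = {rho:+.1f}  n = {n}  n* = {n_star_sar(W, rho):.1f}")
```

On this hypothetical lattice the effective sample size exceeds n for negative ρ, equals n at ρ = 0, and falls below n for positive ρ.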
The wave-hole semivariogram model actually behaves more like the AR(2) time series model $Y_t = \mu + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \varepsilon_t$, for which $\phi_1 > 0$ and $\phi_2 < 0$ (e.g., $\phi_1 = 0.66$ and $\phi_2 = -0.22$). As such, positive spatial autocorrelation still exists locally at relatively short distances. This feature is necessarily so because in a continuous situation, changing nearby distances by a small amount cannot be accompanied by continual dissimilarity.

A Remotely Sensed Image Example

Bailey and Gatrell (1995, 254) analyze LANDSAT 5 "Thematic Mapper" (TM) sensor data from a 1-km², 30 × 30 pixels (n = 900) portion of the High Peak District in England. This image includes a mix of land uses, from a reservoir located in the southwest part, through mixed deciduous and coniferous woodland inhabiting the south-central area, to rough grazing and moorland covering the northern part of this region. In other words, these data are very heterogeneous. The illustration discussed here is in terms of the ratio of spectral Band #4 (B4) to spectral Band #3 (B3), bands that, respectively, represent the near-infrared and red wavelengths of the electromagnetic spectrum. The geographic visualization of this ratio provides a good picture of spatial variation in biomass, with healthy green vegetation reflecting strongly in B4, which measures the vigor of vegetation, whereas its energy absorption is strongly sensed in B3, which aids in the identification of plant species.

The following Box-Cox power transformation was applied to the biomass index to better stabilize the variance across the heterogeneous High Peak geographic landscape, better aligning the transformed index with a normal distribution:

[Equation (9), a natural-logarithm (LN) transformation involving B4/B3 and the rank of B4/B3, is not legible in the source.]

where LN denotes the natural logarithm. Spatial SAR model estimation results for these transformed data include: $\hat{\rho} = 0.98834$ and n* = 5.12. In other words, on average, the 900 spatially autocorrelated pixels forming the High Peak image attain approximately the same level of statistical precision as only five independent pixels.

Eleven semivariogram models were fitted to these High Peak data.
In all butone case (cubic),theCo inestimate wasvery (i.e.,nuggeteffect) tercept parameter close to zero,and thus,forsimplicity, Co was set to zero. Estimation and (practical)rangeresultsappearin Table 2. Mostofthesemivariogram modelsfurnish a verygood of spatialautocorrelation latentin these description data. MostRESSs are roughly 2-3 percent.Semivariogramplots(Figure5) also revealthatthe modelpredictionsclosely track the data; the two poorest are providedby the cubic model,which descriptions tendstoyieldvaluesthataretoohighforshortdistances, andtheGaussianmodel,whichtendstoyieldvaluesthat are too lowforshortdistances. Effectivesample sizes here rangefrom6.75 to 17.42. Althoughthesevaluesare ofthe sameorderofmagnitude as the one producedwiththe SAR model,they tendto be noticeably theK-Bessel Furthermore, higher. whichis thetheoretical model function, semivariogram companionforthe SAR model,does not producethe closestof the set ofn* valuesto 5.12. Of noteis that the SAR value is consistent withwhatwouldbe obtainedwitha timeseriesforwhichj = 0.98834: n* = 900(1 - 0.98834)/(1 + 0.98834) = 5.28. (9) whereLN denotesthe naturallogarithm. SpatialSAR data inresultsforthesetransformed modelestimation == And it is consistent with the value of 5.83 ren- deredby Cressie's(1991, 15) formulae, forwhichthis Variable * gamma 0.09- * spherical 0.08) * 0.07 - & * 4 0.06E 0 E 0.05 ' eassel circular 0.03 0.01 penta-spherical quadratic rational + Gaussian X cubic 0.04 0.02 exponential A stable - 0.00-" 0 0.0 0.1 0.2 0.3 standardized distance 0.4 5. Semivariogram Figure plotsandpredicted valuesforelevengeostatistical modelsdeintheHigh latent scribing spatial dependency Peakbiomass index. EffectiveGeographicSample Size in the Presenceof Spatial Autocorrelation 16 terval (56.3, 74.5) and 68.2 fallingabove the interval (46.0, 67.6). 15 and a SingleMean: The MurraySmelter InfillSampling SiteExample 14 b 12 10 80 6 749 0.25 0.30 0.35 0.40 0.45 range 0.50 0.55 0.60 0.65 6. 
Modelestimated versus effective Figure (practical) range sample size(n*)for11geostatistical theHighPeakbiomass index. models, precedingtime series resultis the asymptoticlimitin termsof n. the negative relationshipbetween the Interestingly, (practical) range and n* can be detected fromthese modelingresults (see Figure 6). In other words, as a spatial dependencyfield increases in size, more information becomes redundant,and effectivesample size decreases. TheMurray Smelter SiteRevisited To illustrateresultsobtainableusingirregularspaced pointdata and semivariogram depictionsof spatialautocorrelation,the Murraysmeltersite data were analyzed in detail. SAR model descriptionsof arsenic (As) and lead (Pb) are reportedin Table 1B. As previouslyseen with the High Peak data, these SAR-based, effective fromthe rangeof similarsemisample size values differ variogrammodel resultsreportedin Table 3, although theyall are of the same orderof magnitude.Here the values are lower than theirautoregressive counterparts, withthe SAR-based values of 77.0 fallingabove the in- Semivariogrammodels, because of theircontinuous nature,are especiallyusefulforassessingthe situationof infillsampling(i.e., the size of a regionis held constant while samplingincreasingly is moreintensive).In other words,fora given region,as samplingbecomes increasinglymoreintense,whathappensto n*?In thiscontext, n* shouldbecome a functionofthe averagefirstnearestbetween sample locations. neighbordistance,say dmN1, locations decreases, the As distance between nearby overall amount of redundantinformationin a sample containingspatialautocorrelationwill tend to increase. A second exploratory simulationexperiment was conductedwiththe Murraysmeltersitedata, whichis based stratifiedrandom sampling upon hexagonal-tessellation (Stehman and Overton 1996). Sample size was sequentiallyincreasedfromroughly100 to roughly2,000 100. The designforthis by incrementsof approximately typeof samplingis outlinedin AppendixD. 
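Returning briefly to the High Peak time-series comparison reported earlier in this section, the arithmetic can be checked numerically. The sketch below evaluates n(1 − ρ)/(1 + ρ) with the values given in the text (n = 900, ρ = 0.98834); the function name `ar1_effective_n` is an illustrative choice and does not come from the article.

```python
def ar1_effective_n(n: int, rho: float) -> float:
    """Effective sample size n(1 - rho)/(1 + rho) used in the time-series check."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return n * (1.0 - rho) / (1.0 + rho)


high_peak = ar1_effective_n(900, 0.98834)
print(round(high_peak, 2))  # 5.28, the value quoted in the text
```

With ρ = 0 the function returns n, and a negative ρ returns a value above n, which is in line with the earlier remark that negative spatial autocorrelation yields n* > n.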
The hexagonal tessellations for samples of size n = 104 and n = 2,008 appear in Figure 7. Because these sampling schemes are being used for illustrative purposes, the noncovered sections of the landscape are ignored; in reality, these parts of the landscape would be covered with partial hexagons that then could be grouped into a set of artificial piecemeal hexagons whose individual areas would equal that of each complete hexagon for sampling purposes.

Table 1B. Selected descriptive statistics for five particular geographic landscape examples

| Geographic landscape | Variables (X & Y) | ρ̂_x | ρ̂_y | r_xy | n | n*_x | n*_y | n*_xy | n̂*_xy |
|---|---|---|---|---|---|---|---|---|---|
| Puerto Rico DEM | elevation mean & variance | 0.8314 | 0.6764 | 0.6810 | 73 | 12.69 | 6.24 | 30.97 | 29.12 |
| Kansas oil wells | thickness & % shale | 0.3743 | 0.1093 | −0.5463 | 124 | 52.95 | 98.97 | 119.38 | 117.25 |
| Texas | pH & Se | 0.1398 | 0.3605 | −0.0031 | 127 | 95.03 | 56.82 | 120.57 | 120.59 |
| Texas | U & SO4 | 0.3452 | 0.6834 | −0.2521 | 127 | 59.07 | 20.93 | 90.12 | 91.23 |
| Texas | Mo & B | 0.3218 | 0.3022 | −0.4942 | 127 | 62.59 | 65.66 | 113.98 | 114.26 |
| Texas | V & As | 0.3881 | 0.6878 | 0.7122 | 127 | 52.92 | 20.56 | 84.55 | 85.65 |
| Texas | Bu & SO4 | 0.4530 | 0.6834 | −0.7686 | 127 | 44.44 | 20.93 | 77.31 | 80.52 |
| Murray site | As & Pb | 0.5318 | 0.4936 | 0.7478 | 253 | 68.24 | 76.95 | 179.08 | 175.41 |
| Minnesota forest stand | basal area & suitability index | 0.3648 | 0.3569 | 0.4200 | 513 | 219.49 | 224.27 | 436.52 | 437.48 |

Table 2. Selected semivariogram modeling results for the High Peak remotely sensed image biomass index

| Model | C1 | r | 100 × RESS | (Practical) range | n* |
|---|---|---|---|---|---|
| Spherical | 0.0858 | 0.3527 | 2.7 | 0.3527 | 15.55 |
| Exponential | 0.1066 | 0.2133 | 2.3 | 0.6399 | 6.75 |
| Stable | 0.0986 | 0.1857 | 2.2 | 0.4956 | 9.15 |
| Penta-spherical | 0.0868 | 0.4298 | 2.4 | 0.4298 | 14.86 |
| Rational quadratic | 0.0962 | 0.1301 | 4.1 | 0.5671 | 9.03 |
| Bessel | 0.0923 | 0.1001 | 2.8 | 0.4004 | 11.95 |
| Gaussian | 0.0837 | 0.1505 | 8.8 | 0.2607 | 17.42 |
| Cubic | 0.0538 | 0.5002 | 45.8 | 0.5002 | fit unacceptable |
| Circular | 0.0852 | 0.3107 | 3.5 | 0.3107 | 15.89 |
| Cauchy | 0.4772 | 0.0431 | 2.5 | no practical range | NA |
| Power | 0.1605 | 0.5837 | 4.5 | essentially no range | NA |

Practical ranges are calculated following Griffith and Layne (1999, 468). C1 denotes the variance estimate (i.e., C0 = 0); r denotes the range parameter.

Relationships between n* and n, by semivariogram model specification (see Table 3) for As and Pb separately, are portrayed in Figure 8A. These graphs suggest asymptotic diminishing returns for n* in each case. Meanwhile, as n increases, the average first nearest-neighbor distance, $d_{NN1}$, for a regular hexagonal tessellation will tend to decrease; this value, which can be standardized by the observed maximum interpoint distance, yielding $d_{NN1}/d_{max}$, will be a function of the regular hexagonal tessellation centroids. Infill sampling results are for a fixed range parameter, r, and may be expressed as the following function of $d_{NN1}/d_{max}$ and of the product $n\,d_{NN1}/d_{max}$, which is equivalent to the sum of the individual point nearest-neighbor distances divided by the maximum interpoint distance:

$$n^*_{infill} = \frac{n}{1 + b\,e^{-0.13867c - 3.46687d}} \times \left(1 + b\,e^{-c \times d_{NN1}/d_{max} - d \times n\,d_{NN1}/d_{max}}\right), \qquad (10)$$

which equals n, as it should, when the uniform spacing of all neighboring points is at a distance exceeding the (practical) range. The quantity $1/(1 + b\,e^{-0.13867c - 3.46687d})$ is the infill asymptotic effective sample size discount factor by which n needs to be multiplied to calculate n*. Estimates of b, c, d, and the discount factor for the Murray data appear in Table 4. A scatterplot for the observed and predicted values associated with Equation (10) appears in Figure 8B, corroborating the high pseudo-R² values reported in Table 4 that imply a close correspondence between Equation (10) and the infill sampling results.
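As a rough check on Equation (10), the sketch below codes the discount factor $1/(1 + b\,e^{-0.13867c - 3.46687d})$ together with the infill expression, and evaluates the discount factor with the spherical-model coefficients listed in Table 4. The function and argument names are illustrative assumptions; only the constants 0.13867 and 3.46687 and the coefficient values are taken from the article.

```python
import math


def discount_factor(b: float, c: float, d: float) -> float:
    """Asymptotic infill discount factor 1 / (1 + b*exp(-0.13867*c - 3.46687*d))."""
    return 1.0 / (1.0 + b * math.exp(-0.13867 * c - 3.46687 * d))


def infill_effective_n(n: float, nn_ratio: float, b: float, c: float, d: float) -> float:
    """Equation (10); nn_ratio stands for d_NN1/d_max at sample size n."""
    growth = 1.0 + b * math.exp(-c * nn_ratio - d * n * nn_ratio)
    return n * discount_factor(b, c, d) * growth


# Spherical-model coefficients (b, c, d) from Table 4, Murray smelter site.
as_spherical = discount_factor(68.908, 6.233, 0.223)
pb_spherical = discount_factor(67.902, 5.004, 0.223)
print(f"{as_spherical:.4f} {pb_spherical:.4f}")
```

The two printed values should land close to the tabulated discount factors of 0.06942 (As) and 0.06010 (Pb). By construction, the expression returns n whenever the two exponents coincide, that is, when $d_{NN1}/d_{max} = 0.13867$ and $n\,d_{NN1}/d_{max} = 3.46687$.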
Whereas Equation (7) describes how n* changes as spatial autocorrelation in an attribute variable changes, Equation (10) describes particular model instances of how n* changes as spatial autocorrelation in a sample changes, while the spatial autocorrelation of the attribute variable remains constant. Of note is that the practical range of As for the Gaussian semivariogram model is less than $d_{NN1}$ for that case, resulting in n* = n for both the univariate and bivariate cases, as it should be.

[Figure 7. The two extreme hexagonal tessellations employed in the Murray smelter site sampling design. (A): n = 104. (B): n = 2,008.]

[Figure 8. Infill sampling results for the Murray smelter site data. As is denoted by a solid circle; Pb is denoted by an asterisk (*). (A): n* versus n. (B): n*/n versus predicted values from Equation (10).]

Table 3. Selected semivariogram modeling results for soil samples from the Murray smelter site using a maximum standardized distance of 0.20; the maximum standardized distance is 1.05010

| Model | Pb: C0/(C0+C1) | Pb: C1/(C0+C1) | Pb: range/practical range | Pb: 100 × RESS | Pb: n* | As: C0/(C0+C1) | As: C1/(C0+C1) | As: range/practical range | As: 100 × RESS | As: n* |
|---|---|---|---|---|---|---|---|---|---|---|
| Spherical | 0.13235 | 0.86765 | 0.08139 | 44.9 | 73.5 | 0.09189 | 0.90811 | 0.08511 | 33.4 | 67.0 |
| Exponential | 0.03661 | 0.96339 | 0.09179 | 43.4 | 64.6 | 0.02487 | 0.97513 | 0.10630 | 31.0 | 52.0 |
| Stable | 0.00222 | 0.99778 | 0.10922 | 43.2 | 56.3 | 0.00000 | 1.00000 | 0.12260 | 30.9 | 46.0 |
| Penta-spherical | 0.11345 | 0.88655 | 0.09566 | 44.9 | 73.4 | 0.08819 | 0.91181 | 0.10371 | 33.2 | 64.4 |
| Rational quadratic | 0.07561 | 0.92439 | 0.10600 | 44.3 | 59.6 | 0.07642 | 0.92358 | 0.12476 | 31.7 | 47.2 |
| Bessel function | 0.09468 | 0.90532 | 0.08021 | 44.4 | 70.2 | 0.08561 | 0.91439 | 0.09130 | 32.0 | 58.2 |
| Gaussian | 0.21566 | 0.78434 | 0.06706 | 47.2 | 74.5 | 0.18588 | 0.81412 | 0.07175 | 35.0 | 66.1 |
| Circular | 0.16236 | 0.83764 | 0.07522 | 45.1 | 71.9 | 0.10731 | 0.89269 | 0.07607 | 33.6 | 67.6 |

Infill Sampling and a Single Mean: A Skeet Shooting Superfund Site Example

A third exploratory simulation experiment was conducted with data collected for a skeet shooting superfund site. For evaluation purposes, 236 surface soil samples were collected in the superfund site. A generalization of the geographic distribution of these measures reveals a single spot within the site that was intensively sampled. The sampling network used throughout most of the site reflects a square grid pattern. The frequency distribution of Pb concentration in this geographic landscape is approximately log-normally distributed. The 236 soil sample locations were used to generate a Thiessen polygon surface partitioning in order to measure spatial autocorrelation. The Moran Coefficient (MC) based on the log-transformed measures is 0.48873 (test statistic: z = 12.7), which is significant and indicates that a moderate tendency exists for similar values of log-Pb concentration measures to be in nearby sample locations. The K-Bessel semivariogram model was found to furnish the best description of spatial autocorrelation latent in these data. (See Thayer et al. 2003.)

Again for infill sampling assessment, sample size was sequentially increased from 20 to roughly 2,000 by increments of approximately 100. The hexagonal tessellations for samples of size n = 20 and n = 1,997 appear in Figure 9. Estimates of the K-Bessel function semivariogram model parameters and the discount factor prorating coefficients for the skeet shoot data appear in Table 5. Spatial autocorrelation latent in these Pb data is much stronger than that latent in the Murray site data (see Table 4), a feature reflected in the smaller discount factor. Furthermore, the goodness-of-fit for the discount factor Equation (10) implies just as close a tracking of the data as is found for the Murray site.

[Figure 9. Two extreme hexagonal tessellations employed in the skeet shooting superfund site sampling design. (A): n = 101. (B): n = 1,997.]

Table 4. Coefficient estimates for Equation (10), and the resulting spatial autocorrelation discount factor, by semivariogram model specification, for the Murray smelter site

| Model | As: b | As: c | As: d | As: discount factor | Pb: b | Pb: c | Pb: d | Pb: discount factor |
|---|---|---|---|---|---|---|---|---|
| Spherical | 68.908 | 6.233 | 0.223 | 0.06942 | 67.902 | 5.004 | 0.223 | 0.06010 |
| Exponential | 40.567 | −0.799 | 0.163 | 0.03742 | 40.722 | −1.941 | 0.194 | 0.03549 |
| Stable | 39.583 | −1.467 | 0.190 | 0.03830 | 32.623 | −6.090 | 0.161 | 0.02251 |
| Penta-spherical | 59.645 | 4.916 | 0.207 | 0.06355 | 56.928 | 3.030 | 0.207 | 0.05195 |
| Rational quadratic | 43.906 | 0.010 | 0.191 | 0.04242 | 38.716 | −4.464 | 0.169 | 0.02442 |
| Bessel function | 46.542 | 1.440 | 0.171 | 0.04536 | 44.739 | −0.593 | 0.180 | 0.03696 |
| Gaussian | 120.588 | 10.568 | 0.268 | 0.08332 | 66.168 | 4.581 | 0.218 | 0.05733 |
| Circular | 81.921 | 7.365 | 0.243 | 0.07296 | 87.816 | 7.172 | 0.250 | 0.06822 |
| Pseudo-R² | 0.9969 | | | | 0.9980 | | | |

Table 5. Coefficient estimates for Equation (10), and the resulting spatial autocorrelation discount factor, by semivariogram model specification for lead (Pb), for the skeet shooting superfund site

Semivariogram coefficients (Bessel function model): C0 = 0†; C1 and practical range: 0.1916 and 1.0706 (column assignment unclear); goodness-of-fit: 100 × RESS = 17.5.
Infill sampling prorating coefficients: b = 131.122; c = −0.101; d = 0.375; pseudo-R² = 0.9989; discount factor = 0.02686.
†The estimated value of C0 is 0.1284, which was not significantly different from 0. For simplicity, then, C0 was set to 0.

Results for a Spatial Filter Model Specification

Spatial filtering techniques (Getis 1990, 1995; Griffith 2000, 2003; Borcard and Legendre 2002; Getis and Griffith 2002) allow spatial analysts to employ traditional regression techniques while ensuring that regression residuals behave according to the traditional model assumption of no spatial autocorrelation in these residuals. One spatial filtering method exploits an eigenfunction decomposition associated with the MC. A spatial filter is constructed from the eigenfunctions of a modified geographic weights matrix that depicts the configuration of areal units in the MC and is used to capture the covariation among attribute values of one or more georeferenced random variables. The simplest version of this weights matrix is denoted by the binary 0-1 matrix C. Spatial filtering uses such geographic configuration information to partition georeferenced data into a synthetic spatial variate containing the spatial autocorrelation and a synthetic aspatial attribute that is free of spatial autocorrelation.

The preceding spatial autoregressive and geostatistical models are nonlinear in form; a spatial filter model is linear in form. In addition, the eigenvectors used to construct the aforementioned spatial filter come from the following modified version of matrix C found in the numerator of a MC:

$$(I - 11^T/n)\,C\,(I - 11^T/n),$$

where $(I - 11^T/n)$ is the projection matrix commonly found in conventional multivariate and regression analysis that centers the n × 1 vector of attribute values. The eigenvectors of this modified form of matrix C are both orthogonal and uncorrelated. Consequently, the sampling variability of the mean of some attribute variable Y is given by the standard result, $\sigma^2/n$. But when spatial autocorrelation is overlooked, then $\hat{\sigma}^2$ is inflated, as has been shown for autoregressive and geostatistical model specifications. This VIF is given by the standard regression result of $1/(1 - R^2)$, where R is the multiple correlation coefficient for attribute variable Y regressed on H eigenvectors contained in a spatial filter, yielding $\hat{\sigma}_Y^2 = \hat{\sigma}_{e^*}^2/(1 - R^2)$. In other words, the effective sample size for this linear model is given by

$$n^* = (1 - R^2)\,n, \qquad (11)$$

which produces a linear plot for n* versus R². Of note is that the degrees of freedom adjustment, n − H − 1, occurs in the numerator of $\hat{\sigma}^2$ only in the common case of the regression parameters being unknown, highlighting that n* truly is not a function of this adjustment. If zero spatial autocorrelation is contained in variable Y, then R = 0, and n* = n; as R → 1, n* goes to one, with an upper limit of R² = (n − 1)/n; only n − 1 eigenvectors are available for constructing a spatial filter, since one eigenvector is proportional to vector 1, which is the vector for the linear regression intercept term.

Spatial filter-based values of n* are reported in Table 6. Comparing the n* values for As and Pb in Tables 1 and 6 indicates that spatial filtering produces a more modest adjustment when converting sample size to effective sample size in the presence of non-zero spatial autocorrelation. Results tend to be very similar when either an R² maximization or a residual MC minimization criterion is used. In cases where the residual spatial autocorrelation fails to become trivial (e.g., High Peak biomass and Puerto Rico mean elevation, in Table 6), the threshold value of MC could be reduced, prominent negative spatial autocorrelation eigenvectors could become candidates, the stepwise regression inclusion probability could be altered, and/or geographic contiguity could be redefined (e.g., a "queen's" connection criterion could be substituted for a "rook's" connection criterion). Of note is that the average univariate semivariogram values are closer to those values reported in Table 1A.
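Equation (11) is simple enough to verify directly against the spatial filter results in Table 6. The sketch below applies n* = (1 − R²)n to the Murray smelter site entries (n = 253, maximum-R² criterion); the helper name is an illustrative assumption.

```python
def spatial_filter_effective_n(n: int, r_squared: float) -> float:
    """Equation (11): n* = (1 - R^2) * n for a spatial filter regression."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("R^2 must lie in [0, 1]")
    return (1.0 - r_squared) * n


# R^2 values for the Murray smelter site as listed in Table 6.
murray = {"As": 0.4520, "Pb": 0.3684}
for label, r2 in murray.items():
    print(label, round(spatial_filter_effective_n(253, r2), 2))
# As 138.64
# Pb 159.79
```

Both values agree with the n* column of Table 6, which suggests that the tabulated effective sample sizes follow from Equation (11) without any further adjustment.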
arecloserto thosevaluesreported Model-Informed Implications: Sampling Geographic Results reviewedand new findingsreportedin this interested to geographers articleare ofgreatimportance dataandprovide in collecting necessary inputformodelinformed designs.Duringtheplansampling geographic a a of may spatialresearcher ningstage study,quantitative on a sample whendeciding thefollowing naively compute sizefora givenpowerin orderto estimatetheregional variableY (Ott 1988,147): mean,pL,ofattribute _ (Z1-r/2 A2Y +1-Z-) 2 used to computen* forthreeparticular Table6. Selectedspatialfilter(cx= 0.10 forinclusion)features geographic landscapeexamples Geographic landscape Rico Puerto variable DEM meanelevation DEM elevation deviation standard smelter site arsenic (As) Murray lead(Pb) HighPeak biomass Eigenvector (MC>0.25) criterionK selection MaxR2 MinMC MaxR2 MinMC MaxR2 MinMC MaxR2 MinMC MaxR2 MinMC MC n R2 Residual ZMcSpatialfilter 2.7356 11 0.7487 2.2617 15 0.7849 0.7685 9 0.5876 12 0.6230 0.4924 19 0.4520 - 0.5589 0.0001 19 0.4317 13 0.3684 -0.1687 0.0021 15 0.3642 8.4710 191 0.9698 8.0830 214 0.9717 0.6155 0.6753 0.6316 0.6856 0.7601 0.8306 0.7331 0.7989 0.8882 0.9062 73 73 73 73 253 253 253 253 900 900 n* 18.34 15.70 30.11 27.52 138.64 143.78 159.79 160.86 27.18 25.47 Griffith 754 representsthe Type I error(i.e., rejecting whereZ1- 0/2 thenullhypothesis whenitis true)probability fora twotailedtest,Zi-_p represents theTypeII errorprobability, and A = I9 - jol, with t and respectively, denoting to, thenullandthealternate means.The valueof hypothesis n rendered seeksto allowa predebythiscomputation desiredlevelof statistical to be obtermined, precision tainedforan analysis. 
All three of the preceding n* results are helpful with increasing domain sampling (i.e., the size of a geographic region is expanded in order to increase the number of areal units). Accordingly, from Equations (3), (7) and (11), the conventional result is rescaled as follows:

spatial autoregression: [Equation (12A); expression not recoverable]

geostatistics: [Equation (12B); expression not recoverable]

and

spatial filtering: [Equation (12C); expression not recoverable]

For the High Peak biomass example data, $\hat{\sigma}_Y^2 = 0.26701^2$, from the SAR model analysis $\hat{\sigma}^2 = 0.07424^2$, from the K-Bessel function semivariogram model analysis $\hat{\sigma}^2 = 0.09234^2$, and from the spatial filtering analysis $\hat{\sigma}^2 = 0.02432^2$. If the mean of the normally distributed transformed biomass index is hypothesized to be 2, the maximum discrepancy to be detected is 0.1 (Δ), a two-tailed hypothesis test with a 5 percent level of significance is to be employed (i.e., $z_{1-\alpha/2} = 1.95996$), and statistical power is set to 0.9 (i.e., $z_{1-\beta} = 1.28115$), then rather than n = 75 (i.e., approximately a 9 × 9 image) when spatial autocorrelation is overlooked, the SAR model results suggest that n = 1,236, the K-Bessel semivariogram model results suggest that n = 2,382, and the spatial filter model results suggest that n = 963. In other words, rather than the 30 × 30 pixels image being more than adequate in size, it needs to be expanded to at least a 35 × 35 image for a SAR analysis, a 49 × 49 image for a geostatistical analysis, and a 31 × 31 image for a spatial filter analysis.

If a spatial researcher wants to estimate an attribute mean for a particular geographic region, with a specific degree of precision (i.e., a specific confidence interval), then power becomes irrelevant (Ott 1988, 131) and infill sampling becomes relevant. Consider a current sampling project whose goal is to determine soil Pb pollution across the City of Syracuse (also see Griffith 2002). Then, based on a pilot study involving 167 sample points (see Figure 10A), transformed attribute values LN(Pb + 3), a K-Bessel semivariogram model whose practical range is 0.09392, in standardized distance units, and an equation of the form given by Equation (10),

$$n = \frac{z_{1-\alpha/2}^2}{\Delta^2}\,(C_0 + C_1) \times \left[1 + 47.9041\,e^{-(0.0958)(-7.4247) - 70(0.0958)(0.1317)}\right], \qquad (13)$$

which contains the following substitutions [see Equation (10)]: $n_r = 70$ [the (effective) value of n at range r for which spatial autocorrelation is negligible], $d_{NN1}/d_{max} = 0.0958$, b = 47.9041, c = −7.4247, and d = 0.1317 (see Figure 10).

[Figure 10. (A): soil sample locations, denoted by crosses (×), and one of the hexagonal tessellations together with its centroids, denoted by solid circles, for the Syracuse, NY, study. (B): scatterplots of relative n* versus n/n_max, denoted by solid circles, and relative standardized nearest neighbor distance versus n/n_max, denoted by solid squares, for the Syracuse, NY, study.]

As was done for the Murray data, estimates of coefficients b, c, and d were obtained here by estimating n* for increasingly denser hexagonal tessellations. Because power is not of interest here, $z_{1-\beta}$ disappears from the formula. Because spatial autocorrelation increases with increasing infill sampling, and because diminishing returns for obtaining new, nonredundant information are encountered as n increases, with the limiting case (n goes to ∞) approaching no new information with each additional sample selection, more and more samples have to be taken in order to acquire less and less new information. A reasonable value of Δ for the Syracuse study would be 0.25, indicating a confidence interval of $\hat{\mu} \pm 0.25$.
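Equation (13) can be evaluated directly once Δ, α, C0, and C1 are fixed. The sketch below uses the Syracuse values quoted in this section (Δ = 0.25, α = 0.05, C0 = 0, C1 = 0.98432, b = 47.9041, c = −7.4247, d = 0.1317, n_r = 70, and a standardized nearest-neighbor distance of 0.0958). The function and argument names are illustrative assumptions, and the code reads Equation (13) as the conventional confidence-interval sample size multiplied by the bracketed inflation term.

```python
import math
from statistics import NormalDist


def precision_sample_size(delta, alpha, c0, c1, b, c, d, n_r, nn_ratio_r):
    """Equation (13): z^2 (C0 + C1) / delta^2 times 1 + b*exp(-c*q - d*n_r*q),
    where q = nn_ratio_r is d_NN1/d_max at the reference sample size n_r."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    inflation = 1.0 + b * math.exp(-c * nn_ratio_r - d * n_r * nn_ratio_r)
    return (z / delta) ** 2 * (c0 + c1) * inflation


n_syracuse = precision_sample_size(
    delta=0.25, alpha=0.05, c0=0.0, c1=0.98432,
    b=47.9041, c=-7.4247, d=0.1317, n_r=70, nn_ratio_r=0.0958,
)
print(round(n_syracuse))  # 2501
```

Without the inflation term the same inputs give a sample size of roughly 60, so nearly all of the required sampling effort in this example is attributable to redundant information.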
For α = 0.05 and K-Bessel function semivariogram coefficient estimates of C1 = 0.98432 and C0 = 0, Equation (13) indicates that achieving this level of precision would require a sample size of 2,501. Given that the mean of the pilot study sample is roughly 4.5, this level of precision would allow a researcher to estimate the log-population mean to within roughly ±6% of its actual value. Of note is that the RESS for the semivariogram model is 0.416, and the description of the experiment furnishing Equation (10) coefficient estimates is accompanied by a pseudo-R² of approximately 0.996.

Therefore, findings reported in this article furnish a methodology and formulae that enable the computation of appropriate sample sizes for quantitative studies when non-zero spatial autocorrelation is present in georeferenced data. The first step (Step #1) of this methodology involves a pilot study to obtain initial estimates of spatial autocorrelation and variable variance. If a researcher chooses to obtain a variance estimate from the literature, then assuming moderate positive spatial autocorrelation for most variables and extremely strong positive spatial autocorrelation for remotely sensed images would be reasonable, too. The second step (Step #2) involves selection of a spatial model specification to be used in subsequent data analyses. Although this model-informed sampling design approach is somewhat sensitive to the way in which spatial autocorrelation is modeled, all three alternative model specifications indicate that geographic studies require substantially larger sample sizes than are suggested by conventional statistical theory. This particular result is relevant to qualitative sampling, too.
The third step (Step #3) is to compute n, superimpose the corresponding hexagonal tessellation over the study area, and then randomly select a single point from within each hexagon. This is the sample to be drawn. A useful post-data-collection diagnostic exercise would be to compare parameter estimates based on the sample data with those used to formulate the sampling design. Of note is that the principal contribution of this article is the development of Equations (12A)-(12C) for calculating n.

Future research needs to address extensions of findings reported here to other than means of normally distributed georeferenced variables. The previously listed UCLA web page furnishes calculators for means of Poisson (which would require the Winsorized auto-Poisson model) and exponential variables, and correlation coefficients (which are touched upon in this article). Meanwhile, the Web page http://www.stat.uiowa.edu/%7Erlenth/Power/index.html furnishes Russ Lenth's calculators for proportions (which would require the autobinomial model), and analysis of variance (multiple means, which are touched upon in this article; also see Cliff and Ord 1975; Griffith 1978). Future research also needs to outline impacts of spatial autocorrelation explicitly for purposeful samples used in qualitative studies.

Acknowledgements

This material is based on work supported by the National Science Foundation under Grant #BCS0400559. Execution of computational and GIS work by Matthew Vincent and Marco Millones is gratefully acknowledged. This research was completed while the author was in the Department of Geography and Regional Studies, University of Miami.

Appendix A

The Case of a Weighted Average of Two Correlated Sample Means

Sometimes a spatial scientist may be simultaneously interested in two variables.
One consequence of exofthe tendingEquation (2) resultsto thejointtreatment pairofmeans,x and y,fortwo attributevariables,X and Y, is the presenceof two sourcesof redundantinformation:correlationbetweenthe twoattributevariablesand spatial autocorrelationwithin each attributevariable. Dutilleul (1993) updates the Clifford, Richardson,and H~mon (1989) discussionabout how spatial autocorrelation impacts upon the correlationcoefficient.Extendinghis discussionrevealsthatcovariationalso has a VIF similarto that appearingin Equation (2), withthis factorbeing compensatedforby the individualvariable is computed;thatis, VIFs when a correlationcoefficient mustbe containedin the intera correlationcoefficient val [-1, 1], regardlessof the nature and degree of present.Constructinga weighted spatialautocorrelation Griffith 756 averageofX and Y,say [wX+(1- w)Y] for0 < w < 1, resultsin the sampling distribution varianceofinterest nentsor factorscores(see R. Johnsonand Wichern oflargesampleinference), yields 2002,fora discussion 1Ad d dAd 1 (I~cl d 0 I) V/2 TR(Ad d T)Vd/2"r? (Ad .dAd)TR[ ((Adl @ I)Vd/2)T( ? 1T I)AI) beingforwex+ (1 - w)y. IfvariablesX and Y are independent,thenthisstandarderrorreducesto the theoreticalresultof a weightedsum of the variables'two variances. In thisbivariatemeanscase, effective samplesize becomesa weighted variables' averageoftheindividual effective samplesizesthatis adjustedforthe attribute betweenX andY.The generalexpression correlation for n*becomes + 2w(1 -w)pxyaxcy w2au + (1 - +2(2 1 R(-i w2?T(V (12 ++ (A2 w2T (X wu2 X TV-11 + (11 [w22WTR(Vx1)+ n, (B1) I)V/2) 1 (Al) -w)2yTR(Vy)] l) n,' positivespatialautocorreapproachthe case ofperfect one. If all butone oftheweights lation,n* approaches are zero,thenexpression(B1) reducesto the right-hand valueproducedby sideofEquation(2). The numerical interval definedby in the is contained expression (B1) resultsobtainedwith fortheP individual theextremes Equation(2). 
0) 0)(ox Sdd(w _(wox) Considerthe 2-meanscase (see AppendixA). Then Ad d 0 1 - Wery 0 0 Ad d Appendix B od The Case of a LinearCombinationofP > 2 CorrelatedSampleMeans (Al) to a Generalizing Equation(2) and expression multivariate situation P variable which means, involving is particularly relevantto the use of principalcompo- I)Vd'/2 ( where0 denotesKronecker Ad is a P x P diproduct, coeffithelinearcombination agonalmatrixcontaining cient ap in diagonalcell Od is aP x P diagonal P, deviation standard matrix containing cYpin diagonalcell matrixcontaining Vd is an nP x nP block-diagonal P,x n inversecovariancestructure matrixVp'-1 in diagn correlation is a P x P attribute onal block matrix, matrix.If Vd = I, thenexand I is anP,n x n identity pression(B1) reducesto n. As all of the Vp matrices -1/2 (1 ---2w)2o21TV--1 + 2wu(1- w)pxyJxoy1(V)-1/2V whereVx and Vy respectively are the nx n inverse covariancestructure matricescontainingthe spatial autocorrelation among n observationsfor attribute variablesX and Y. If Vx = Vy = I, then thisexpression reducesto n. If w = 0, w = 1, or Vx = Vy, then reducesto theright-hand thisexpression sideofEquation (2). In otherwords,the bivariatemeans effectivesamplesizeis a weighted averageoftheindividual effective univariate samplesizes (i.e., it mustbe containedin the intervaldefinedby them).And as Vx and Vy approachthe case of perfectpositivespatial for both attributevariables,n* apautocorrelation of the twolimiting effecproachesone. The weighting tive samplesizes is determined by both the relative variancesof X and Y and the weightsused in cona linearcombination of X and Y and is imstructing little the attribute correlation, by pacted Pxy,computed X and Y. forvariables d I) 0 x(w W(1 (1 - w)oy) 0 (wx )(1 pxY) ) (1 w- )oxy xpxy 1 ( (10- w)oy xyx Iao -w2o 0 -- W)oxIypxy (1 -- W)23. 
EffectiveGeographicSample Size in the Presenceof Spatial Autocorrelation = 1TAdiAdiid1 TR(Ad~d correlationcoefficient, r, to its correctsamplingdistribution forgeoreferenceddata (also see Haining 2003, degreesof ?8.2). They develop the notion of effective = the Based on the standardresultof or freedom. standarderrorof the correlationcoefficient r,underthe = 0, the numberof observations, null hypothesisof pxY n, maybe proratedto n* usingthe formula (1 +22 2w)2X2 + 2w(1 - w)pxyxovy dAd) w I)V1/2)[(T = 2 x ( + (1 - 0 w20 ) (Add n*= 1 + 0"- 0 ( ((V1/2 (1)T- w)oyI 0 x 0 = \ 0 ) T (V y1/2 0 (WYx(Vx1/2)T (1 0 I i X= 1 } \PYpxY) ((Ad`d d ?I- ( xl II I)V'/2) T( I x -- W)(Y(V pxyI)e PxyI)I /(d wx(V pxyI I ( d w(1 -w),x 0 1/2)T(Vyl/2 1 = = w2 TR(V1) +- oy(Vy1/2)T (1 - W)Oy (Vy1/2 I)V2) (AdId 1 + (1 - w)221 TVyil w2oX1TVx + 2w(1 W)pxyCxoy1T(VT)-I/2VXl/2 TR [((Ad d 0 I)Vd 9/2)T( 0 II)((Ad I)V/2)] 1T(A(d 0 (wox(Vxl/2T (1-- W) 0 0 ) 0 1))V-/2) 1/2) w(1 - w)axGYpxy(V wi2Vx 0 I) V 0 (/2) (i I) x1/2)T I I)((Ad (C1) denotes the variance of the samplingdistriwhere of the correlationcoefficientr. Equation (C1) butionor is associatedwith indicatesthatlow samplingvariability largervalues of n and sizeable samplingvariabilityis associatedwithsmallervalues ofn. Considertwo variables containingmaximalpositivespatialautocorrelation, which manifestsitselfapproximatelyas a linear data gradientacross a geographiclandscape. This situation relates to three qualitativelydifferentvalues of rxy, namely,rxyM 1 (both gradientsalign), rxy -1 (the gradientsare in exactly opposite directions),and two cases of rxy m0 (the gradientsare orthogonalin two (wxI 0 0(1 - w) lyI (Vy1/2)T x 0 (VxI/2)T 757 (1 - w)2G2TR(Vy1) Therefore,expression(Bi) reduces to expression(Al) when onlytwo means are beingconsidered. 
Appendix C

An Illustration of Other Possible Extensions: The Case of Bivariate Attribute Correlation Coupled with Positive Spatial Autocorrelation

Clifford, Richardson, and Hémon (1989) and Richardson (1990) use semivariogram modeling to link the [...]

[Equation (C1): not recoverable from the source.]

[...] different ways). Hence, positive spatial autocorrelation increases $\sigma_r$, resulting in $n^*$ decreasing. Meanwhile, if zero spatial autocorrelation is present, then $n^* = n$; if perfect positive spatial autocorrelation is present, then $n^*$ approaches one. Dutilleul (1993) rewrites Equation (C1) using matrix notation and incorporates impacts of estimating means and variances for a correlation coefficient, an adjustment that is set aside here for simplicity (these types of adjustments also are outlined by Griffith and Zhang 1999). These results are for the case of independent attribute variables (i.e., $\rho_{xy} = 0$).

The new development presented here departs from this earlier work in order to incorporate the entire range of correlation values by beginning with the following standard error of a correlation coefficient:

$$\sigma_r = \frac{1 - \rho_{xy}^2}{\sqrt{n - 1}} \qquad \text{(C2)}$$

Following the developments for sample means and for $r$ under the null hypothesis of $\rho_{xy} = 0$, the logic for deriving Equation (2) suggests

[Equation not recoverable from the source; its legible terms include $TR(V_x^{-1}V_y^{-1})$, the product $TR(V_x^{-1})\,TR(V_y^{-1})$, and the constants 16.40 and 0.47.]

where $V_x^{-1}$ and $V_y^{-1}$, respectively, are the $n \times n$ spatial autocorrelation covariance structure matrices for attribute variables X and Y. This equation was established with a simulation experiment conducted using regular square tessellations forming rectangular regions ranging in size from 3 x 3 to 40 x 40, combinations of SAR autocorrelation parameter values, $\rho_x$ and $\rho_y$, in the set {0.00, 0.25, 0.50, 0.75, 0.99}, $\rho_{xy}$ values in the set {0.0, 0.2, 0.4, 0.6, 0.8, 0.999}, and 10,000 replications. Pseudo-$R^2$ values for the three special theoretical cases that can be checked are as follows: 0.9984 for $\rho_x = \rho_y = 0$ (and, as a check, 0.9957 for the analytical results); 0.9874 for $\rho_{xy} = 0$; and 0.9814 for $\rho_x = \rho_y \approx 1$. The overall pseudo-$R^2$ value is 0.9954, with the performance of this equation improving asymptotically. In addition, it has been validated using the irregular Thiessen polygon surface partitionings for the 127 Texas groundwater sample locations and the 124 Kansas oil well locations, the 73 municipios of Puerto Rico, and the 513 Minnesota forest stands reported by Griffith and Layne (1999). These four irregular surface partitioning data sets were supplemented with the Murray smelter Thiessen polygons data. The pseudo-$R^2$ for results obtained by reexecuting the simulation experiment using the geographic configuration matrices for these five empirical geographic landscapes and the coefficients obtained with the preceding simulation experiment is 0.9995.

Therefore, following the same logic used to establish Equation (2), as well as employing the common statistical practice of using $r$ to estimate its own standard deviation,

[Equation (C3): not recoverable from the source; it expresses $n^*$ in terms of $n$, $r_{xy}$, and the trace terms named above.]

If the n observations contain zero spatial autocorrelation, then $V_x^{-1} = V_y^{-1} = I$ and Equation (C3) becomes $1 + (n - 1) = n$; if the n observations contain perfect positive spatial autocorrelation, then, conceptually, $V_x^{-1} = V_y^{-1} = \mathbf{1}\mathbf{1}^T$ and hence Equation (C3) asymptotically converges on 2 for $\rho_{xy} = 1$ (calculation of a correlation coefficient requires at least 2 observations) and equals roughly 5 for $\rho_{xy} = 0$. This latter result coupled with the aforementioned pseudo-$R^2$ values suggests the need for further refinement of Equations (C2) and (C3).

If $\rho_{xy} = 0$, Equation (C3) differs from the one reported by Clifford, Richardson, and Hémon (1989) and Dutilleul (1993) by a multiplicative factor.

[Multiplicative factor: not recoverable from the source; its legible terms include the ratio $n/(n - 1)$ and the same trace terms and constants as above.]

This adjustment factor may relate to the use of sample statistics, rather than population parameter values, in calculations, a modification Dutilleul introduces.

Equation (C3) links to the spatial autoregressive model specifications through the inverse covariance matrix, rather than the covariance matrix itself (see, e.g., Haining 1991). Because it is specified in terms of SAR spatial autocorrelation parameters, it will need to have a semivariogram counterpart formulated, which presumably needs to be specified in terms of the range parameter.

Appendix D

The Hexagonal Tessellation Stratified Sampling Design

Because a regular hexagonal tessellation is employed, the radius, $r_h$, of a desired hexagon can be approximated by an expression (not legible in the source) in which "area" denotes the area of the landscape to be sampled. The value produced by this computation actually is an upper limit for the desired radius. Next, starting with an arbitrary (0, 0) point positioned outside of the study landscape, a grid of hexagon centroid coordinates can be generated with the formula $(3ur_h, \sqrt{3}vr_h/2)$, $u = 0, 1, \ldots, u_{max}$, and $v = 0, 1, \ldots, v_{max}$, where the rectangular grid defined by $(u_{max}, v_{max})$ extends beyond the study landscape in all directions.
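The grid construction above, together with the within-hexagon draw and reflection steps set out in the remainder of this appendix, can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the ArcView/Avenue implementation. The function names are hypothetical. The radius formula assumes that n regular hexagons of area $(3\sqrt{3}/2)r_h^2$ exactly cover the landscape. Odd rows of the centroid grid are shifted by $1.5r_h$, an assumption made so that Thiessen polygons around the points are regular hexagons. The acceptance bound $u \le 1 - v/2$ is assumed because it removes one quarter of the unit square. The whole (u, v) pair is redrawn on rejection, and v is rescaled by $\sqrt{3}r_h/2$ rather than $r_h$ so that the accepted points fill a regular hexagon.

```python
import math
import random

SQRT3 = math.sqrt(3.0)


def hexagon_radius(area, n):
    """Circumradius r_h such that n regular hexagons have total area `area`.

    Assumption: hexagon area = (3 * sqrt(3) / 2) * r_h**2.
    """
    return math.sqrt(2.0 * area / (3.0 * SQRT3 * n))


def hexagon_centroids(r_h, u_max, v_max):
    """Centroids (3*u*r_h, sqrt(3)*v*r_h/2) for u = 0..u_max, v = 0..v_max.

    Assumption: odd rows are shifted by 1.5 * r_h to form a hexagonal lattice.
    """
    points = []
    for v in range(v_max + 1):
        shift = 1.5 * r_h if v % 2 else 0.0
        for u in range(u_max + 1):
            points.append((3.0 * u * r_h + shift, SQRT3 * v * r_h / 2.0))
    return points


def draw_in_hexagon(r_h, rng):
    """One random offset inside a regular hexagon of circumradius r_h."""
    while True:
        v = rng.random()
        u = rng.random()
        if u <= 1.0 - v / 2.0:  # assumed bound; rejects about 25% of draws
            break
    quadrant = rng.randint(1, 4)  # 1: keep, 2: (-u, v), 3: (-u, -v), 4: (u, -v)
    if quadrant in (2, 3):
        u = -u
    if quadrant in (3, 4):
        v = -v
    # Rescale the unit-square coordinates to the hexagon's dimensions.
    return (u * r_h, v * SQRT3 * r_h / 2.0)


def stratified_sample(centroids, r_h, seed=0):
    """Add one random within-hexagon offset to every hexagon centroid."""
    rng = random.Random(seed)
    sample = []
    for cx, cy in centroids:
        dx, dy = draw_in_hexagon(r_h, rng)
        sample.append((cx + dx, cy + dy))
    return sample
```

Calling `stratified_sample(hexagon_centroids(r_h, u_max, v_max), r_h)` yields one sample location per hexagon; centroids falling outside the study landscape would still need to be discarded, a step this sketch omits.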
Next, the hexagons can be generated with a standard Thiessen polygon program. Both of these last two steps have been implemented in ArcView 3.3 with Avenue scripts.

Sample locations (u, v) within a hexagon can be generated by drawing pairs of independent samples from the uniform distribution U(0, 1); the coordinate pair is being drawn from a unit square, and its joint probability distribution will be a Poisson. First, the v-coordinate is selected; then, the u-coordinate is selected such that the point (u, v) lies within the partial hexagon, a condition checked by calculating an upper bound for u from v. Because the partial hexagon eliminates one quarter of the unit square, roughly 25 percent of the selected values of u will be rejected. Next, a random selection is made from the set of integers {1, 2, 3, 4}. If the integer 1 is selected, then the coordinate pair (u, v) is retained. If 2 is selected, then the coordinate pair becomes (-u, v), a reflection of the originally selected (u, v) coordinate along the vertical axis. If 3 is selected, then the coordinate pair becomes (-u, -v), a reflection of the originally selected (u, v) coordinate across the origin. And if 4 is selected, then the coordinate pair becomes (u, -v), a reflection of the originally selected (u, v) coordinate along the horizontal axis. A simulation experiment, involving 10,000 replications and executed using this sampling design, produced Figure D-1A.

For a single sample of size n, n sample coordinate pairs (u, v) are drawn according to the preceding protocol. Each coordinate value has to be rescaled by the radius of the hexagon, $r_h$, contained in the tessellation. Then, in turn, one of the sample coordinate pairs is added to each hexagon centroid. The resulting sample furnishes good geographic coverage and allows all points in a landscape that are covered by the hexagonal tessellation to have equal probability of selection. One sample of this type is portrayed in Figure D-1B.

Figure D-1. Example hexagonal tessellation random sampling outcomes. (A): 10,000 selections from the unit square that have been converted to the base sampling hexagon. (B): a single tessellation stratified random sample, denoted by crosses (×), drawn for the case of n = 104 hexagons, whose centroids are denoted by solid circles (•), covering the Murray smelter site.

Notes

1. The notion of effective degrees of freedom dates to Satterthwaite (1946), whose adjustment for the two-sample difference of means test when population variances are unequal, for example, is popularized in many introductory statistics books.
2. This concept also appears in the time series literature (e.g., see Dawdy and Matalas 1964). Additional insight may be found in papers by Box (1954a, b).
3. Frequently, C is an n x n binary matrix whose entries are $c_{ij} = 1$ if areal units i and j are neighbors, and $c_{ij} = 0$ otherwise.
4. The estimation equation is [not recoverable from the source; its legible terms include exponential terms with coefficients -0.13867 and -3.46687 and the distances $d_{NN}$ and $d_{max}$].

References

Anselin, L. 1988. Spatial econometrics: Methods and models. Dordrecht, the Netherlands: Martinus Nijhoff.
Arbia, G., D. Griffith, and R. Haining. 1998. Error propagation modelling in raster GIS: Overlay operations. International Journal of Geographical Information Systems 12:145-67.
Arbia, G., D. Griffith, and R. Haining. 1999. Error propagation modelling in raster GIS: Addition and ratioing operations. Cartography & Geographic Information Systems 26:297-315.
Bailey, T., and A. Gatrell. 1995. Interactive spatial data analysis. London: Longman.
Borcard, D., and P. Legendre. 2002. All-scale spatial analysis of ecological data by means of principal coordinates of neighbour matrices. Ecological Modelling 153:51-68.
Box, G. 1954a. Some theorems on quadratic forms applied in the study of analysis of variance problems. I. Effect of inequality of variance in the one-way classification. Annals of Mathematical Statistics 25:290-302.
Box, G. 1954b. Some theorems on quadratic forms applied in the study of analysis of variance problems. II. Effects of inequality of variance and of correlation between errors in the two-way classification. Annals of Mathematical Statistics 25:484-98.
Brus, D., and J. de Gruijter. 1993. Design-based versus model-based estimates of spatial means: Theory and application in environmental science. Environmetrics 4:123-52.
Cliff, A., and J. Ord. 1975. The comparison of means when samples consist of spatially autocorrelated observations. Environment and Planning A 7:725-34.
Cliff, A., and J. Ord. 1981. Spatial processes. London: Pion.
Clifford, P., S. Richardson, and D. Hémon. 1989. Assessing the significance of the correlation between two spatial processes. Biometrics 45:123-34.
Cressie, N. 1991. Statistics for spatial data. New York: Wiley.
Dawdy, D., and N. Matalas. 1964. Statistical and probability analysis of hydrologic data, Part III: Analysis of variance, covariance and time series. In Handbook of hydrology, A compendium of water-resources technology, ed. V. Chow, 8.68-8.90. New York: McGraw-Hill.
Diggle, P., and S. Lophaven. 2004. Bayesian geostatistical design, Working Paper #42. Baltimore: Department of Biostatistics, Johns Hopkins University.
Dutilleul, P. 1993. Modifying the t test for assessing the correlation between two spatial processes. Biometrics 49:305-14.
Englund, E., and A. Sparks. 1991. Geostatistical Environmental Assessment Software: User's Guide. Las Vegas, NV: Environmental Monitoring Systems Laboratory, U.S. EPA.
Flores, L., L. Martinez, and C. Ferrer. 2003. Systematic sample design for estimation of spatial means. Environmetrics 14:45-61.
Getis, A. 1990. Screening for spatial dependence in regression analysis. Papers of the Regional Science Association 69:69-81.
Getis, A. 1995. Spatial filtering in a regression framework: Experiments on regional inequality, government expenditures, and urban crime. In New directions in spatial econometrics, ed. L. Anselin and R. Florax, 172-88. Berlin: Springer.
Getis, A., and D. Griffith. 2002. Comparative spatial filtering in regression analysis. Geographical Analysis 34:130-40.
Getis, A., and J. Ord. 2000. Seemingly independent tests: Addressing the problem of multiple simultaneous and dependent tests. Paper presented at the 39th Annual Meeting of the Western Regional Science Association, Kauai, HI, 28 February.
Griffith, D. 1978. A spatially adjusted ANOVA model. Geographical Analysis 10:296-301.
Griffith, D. 1988. Advanced spatial statistics. Dordrecht, the Netherlands: Martinus Nijhoff.
Griffith, D. 1992. What is spatial autocorrelation? Reflections on the past 25 years of spatial statistics. L'Espace Géographique 21:265-80.
Griffith, D. 2000. A linear regression solution to the spatial autocorrelation problem. Journal of Geographical Systems 2:141-56.
Griffith, D. 2002. The geographic distribution of soil-lead concentration: Description and concerns. URISA Journal 14:5-15.
Griffith, D. 2003. Spatial autocorrelation and spatial filtering: Gaining understanding through theory and scientific visualization. Berlin: Springer-Verlag.
Griffith, D., and F. Csillag. 1993. Exploring relationships between semi-variogram and spatial autoregressive models. Papers in Regional Science 72:283-96.
Griffith, D., and L. Layne. 1997. Uncovering relationships between geo-statistical and spatial autoregressive models. In the 1996 Proceedings of the Section on Statistics and the Environment, 91-96. Washington, DC: American Statistical Association.
Griffith, D., and L. Layne. 1999. A casebook for spatial statistical data analysis: A compilation of analyses of different thematic datasets. New York: Oxford University Press.
Griffith, D., D. Wong, and T. Whitfield. 2003. Exploring relationships between the global and regional measures of spatial autocorrelation. Journal of Regional Science 43:683-710.
Griffith, D., and Z. Zhang. 1999. Computational simplifications needed for efficient implementation of spatial statistical techniques in a GIS. Journal of Geographic Information Science 5:97-105.
Haining, R. 1990. Spatial data analysis in the social and environmental sciences. Cambridge, U.K.: Cambridge University Press.
Haining, R. 1991. Bivariate correlation and spatial data. Geographical Analysis 23:210-27.
Haining, R. 2003. Spatial data analysis: Theory and practice. Cambridge, U.K.: Cambridge University Press.
Hall, P. 1989. Antithetic resampling for the bootstrap. Biometrika 76:713-24.
Johnson, K., J. ver Hoef, K. Krivoruchko, and N. Lucas. 2001. Using ArcGIS Geostatistical Analyst. Redlands, CA: ESRI.
Johnson, R., and D. Wichern. 2002. Applied multivariate statistical analysis, 5th ed. Upper Saddle River, NJ: Prentice Hall.
Lahiri, S. 1996. On inconsistency of estimators under infill asymptotics for spatial data. Sankhya, Series A 58:403-17.
Lahiri, S. 2003. Central Limit Theorems for weighted sums under some stochastic and fixed spatial sampling designs. Sankhya, Series A 65:356-88.
Levy, P., and S. Lemeshow. 1991. Sampling of populations: Methods and applications. New York: Wiley.
Marshall, C., and G. Rossman. 1999. Designing qualitative research, 3rd ed. Thousand Oaks, CA: Sage.
Martin, R. 2001. Comparing and contrasting some environmental and experimental design problems. Environmetrics 12:303-17.
Müller, W. 2001. Collecting spatial data. Heidelberg: Springer-Verlag.
Ott, L. 1988. An introduction to statistical methods and data analysis, 3rd ed. Boston: Duxbury Press.
Potthoff, R., M. Woodbury, and K. Manton. 1992. "Equivalent sample size" and "equivalent degrees of freedom" refinements for inference using survey weights under superpopulation models. Journal of the American Statistical Association 87:383-96.
R Documentation. http://www.maths.Ith.se/help/R/.R/library/html/effectiveSize.html (last accessed 10 October 2003).
Richardson, S. 1990. Some remarks on the testing of association between spatial processes. In Spatial statistics: Past, present, and future, ed. D. Griffith, 277-309. Ann Arbor, MI: Institute of Mathematical Geography.
Satterthwaite, F. 1946. An approximate distribution of estimates of variance components. Biometrics Bulletin 2:110-14.
Stehman, S., and W. Overton. 1996. Spatial sampling. In Practical handbook of spatial statistics, ed. S. Arlinghaus, 31-63. Boca Raton, FL: CRC Press.
Thayer, W., D. Griffith, P. Goodrum, G. Diamond, and J. Hassett. 2003. Application of geostatistics to risk assessment. Risk Analysis: An International Journal 23:945-60.
Tietjen, G. 1986. A topical dictionary of statistics. New York: Chapman and Hall.

Correspondence: School of Social Sciences, University of Texas at Dallas, P.O. Box 830688, GR31, Richardson, TX 75083-0688, e-mail: [email protected].