Impact of Wayne Fuller’s Contributions to Sample Survey Theory and Practice

J.K. Kim
Working Group Seminar
September 19, 2011
Joint work with J.N.K. Rao at Carleton University, Ottawa, Canada
J.K. Kim (ISU)
Wayne Fuller’s contribution to sample survey
9/19/11
1 / 29
Contents
1. Brief bio sketch
2. Some early work
3. Regression estimation
4. Regression analysis
5. Quantiles
6. Two-phase sampling
7. Small area estimation
8. Measurement errors
9. Nonresponse and imputation
10. Rejective sampling
11. Other contributions
12. List of Ph.D. theses directed in the survey sampling area
Brief Bio Sketch
Ph. D. in Agricultural Economics, Iowa State University, 1959
Thesis title: A non-static model of the beef and pork economy
Thesis advisor: Geoffrey Shepherd
Supervised 29 M.S. and 69 Ph. D. theses in sampling, time series
analysis and measurement errors.
Among former students at least 10 ASA Fellows and one ASA
President. Four of them supervised 15 or more Ph. D. theses.
Three Wiley books on time series analysis, measurement errors and
more recently sample survey theory.
Citations
Unit root tests: JASA paper 8903 citations, Econometrica paper 5352
citations. Measurement Errors book: 2352 citations
Some early work on sampling theory
Estimation employing post strata, JASA 1966.
Proposed method permits construction of unbiased estimates for
populations divided into a large number of small post strata: superior
to the customary practice of combining two post strata when one
contains few sample elements.
Sampling with random strata boundaries, JRSSB 1970.
Sampling designs are given that permit unbiased variance estimation
and efficiency approximately equal to that of the one-per-stratum design.
A procedure for selecting non replacement unequal probability
samples, 1971 (unpublished)
Unconditional selection probability at each draw equal to
$p_i = x_i / X \Rightarrow \pi_i = n p_i$: permits rotation of sample.
Regression Estimation
Fuller (1968), unpublished report
Reasons for the popularity of the ratio estimator: computational
simplicity, and the regression line often passes close to the origin, implying
little loss of efficiency relative to the regression estimator. Both estimators use
a single weight for all variables and ensure calibration to a known total.
But the regression estimator is location and scale invariant, unlike the ratio
estimator.
For domain totals or means, ratio estimation may be very inefficient
relative to regression estimation: for example, total acres of corn
grown on farms of size less than A.
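The invariance point above can be checked numerically. The following is a minimal sketch under SRS with a single auxiliary variable; the function names, the toy data and the assumed known population mean of x are illustrative and not taken from Fuller (1968).

```python
import numpy as np

def ratio_estimator(y, x, x_pop_mean):
    """Ratio estimator of the population mean of y under SRS."""
    return np.mean(y) / np.mean(x) * x_pop_mean

def regression_estimator(y, x, x_pop_mean):
    """Linear regression estimator of the population mean of y under SRS."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() + slope * (x_pop_mean - x.mean())

# Toy sample (hypothetical values) and an assumed known population mean of x.
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([3.1, 5.9, 7.2, 9.8, 13.0])
x_pop_mean = 6.0

# Shift every y value by a constant and see how each estimate moves.
shift = 100.0
reg_change = (regression_estimator(y + shift, x, x_pop_mean)
              - regression_estimator(y, x, x_pop_mean))
ratio_change = (ratio_estimator(y + shift, x, x_pop_mean)
                - ratio_estimator(y, x, x_pop_mean))
```

The regression estimate moves by exactly the shift, while the ratio estimate moves by the shift multiplied by the ratio of the population mean of x to the sample mean of x, so the two agree only when those means coincide.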
Regression Estimation (Cont’d)
Calibration estimation
Fuller was aware of calibration and range restricted weights as early as
1968: M. Husain’s 1968 Master’s thesis: Construction of regression
weights for estimation in sample surveys. Doane Agricultural Services
Inc. used regression weights since 1972 for their syndicated market
research studies.
Regression Estimation (Cont’d)
Husain’s thesis makes two significant contributions to regression
estimation of a mean under SRS.
1. Find weights $\{w_i, i \in s\}$ that minimize $\phi = \sum_{i \in s} w_i^2$ subject to
$\sum_{i \in s} w_i = 1$ and $\sum_{i \in s} w_i x_i = \bar{X}$ (calibration constraints: CC).
Note: Under SRS, the objective function $\phi$ is equivalent to the
well-known chi-squared distance measure of Deville and Särndal (1992).
Husain went further by imposing range restrictions (RR)
$a \le w_i \le b$, $b > a > 0$,
and proposing to solve the problem using quadratic programming.
2. Relax the calibration constraint and instead minimize the sum of $\phi$ and
a distance measure between $\sum_{i \in s} w_i x_i$ and the population mean $\bar{X}$.
This proposal is a forerunner to the more recent work based on ridge
regression (Chambers 1986, Rao and Singh 1997, 2009).
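Both contributions can be sketched for a scalar x. The code below is an illustration under stated assumptions, not Husain's program: the function names are hypothetical, the "distance measure" in the relaxed problem is assumed to be a squared difference scaled by a penalty constant, and the range restrictions (RR) are left out because they need a quadratic programming solver.

```python
import numpy as np

def calibration_weights(x, x_pop_mean):
    """Minimize sum(w**2) subject to sum(w) = 1 and sum(w * x) = x_pop_mean."""
    x = np.asarray(x, dtype=float)
    A = np.vstack([np.ones_like(x), x])      # rows hold the two constraints
    c = np.array([1.0, x_pop_mean])
    # Minimum-norm solution of A w = c, namely w = A' (A A')^{-1} c.
    return A.T @ np.linalg.solve(A @ A.T, c)

def relaxed_weights(x, x_pop_mean, penalty):
    """Minimize sum(w**2) + penalty * (sum(w * x) - x_pop_mean)**2 with sum(w) = 1."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Stationarity and the sum-to-one constraint written as one linear system.
    kkt = np.zeros((n + 1, n + 1))
    kkt[:n, :n] = np.eye(n) + penalty * np.outer(x, x)
    kkt[:n, n] = -1.0                        # multiplier for sum(w) = 1
    kkt[n, :n] = 1.0
    rhs = np.append(penalty * x_pop_mean * x, 1.0)
    return np.linalg.solve(kkt, rhs)[:n]

# Toy sample (hypothetical values) and an assumed known population mean of x.
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([3.1, 5.9, 7.2, 9.8, 13.0])
w_cc = calibration_weights(x, 6.0)
estimate = w_cc @ y                          # weighted estimate of the mean of y
w_none = relaxed_weights(x, 6.0, 0.0)        # no penalty: equal weights 1/n
w_strong = relaxed_weights(x, 6.0, 1e4)      # heavy penalty: close to w_cc
```

With a zero penalty the relaxed problem returns the equal weights 1/n, and as the penalty grows the weights approach the exactly calibrated ones, which is the ridge-type trade-off the slide refers to.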
Regression Estimation (Cont’d)
E. Huang’s (1978) thesis and Huang and Fuller (ASA Proceedings, 1978)
Iterative procedure: weights satisfy CC at each iteration but not
necessarily RR. A good description of the algorithm is given in Fuller
et al. (SM, 1994). Authors note that “It will not always be possible to
construct weights satisfying the specified restrictions in the specified
number of iterations. If the sample is such that the restrictions
cannot be met, the program outputs the weights of the last iteration”.
Proposed estimator has the same asymptotic variance as the
regression estimator.
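The behaviour described above (CC holds at every iteration, RR may fail, and the last iterate is returned when the iteration limit is reached) can be illustrated with a simplified stand-in. This is not the algorithm of the 1978 thesis or of Fuller et al. (1994): the shrinking rule, the function name and all parameters below are assumptions chosen only to show the structure.

```python
import numpy as np

def iterate_range_restricted(x, x_pop_mean, lower, upper, max_iter=20, shrink=0.5):
    """Iterate calibrated weights, shrinking units that violate [lower, upper]."""
    x = np.asarray(x, dtype=float)
    n = x.size
    A = np.vstack([np.ones(n), x])            # calibration constraints A w = c
    c = np.array([1.0, x_pop_mean])
    d = np.full(n, 1.0 / n)                   # equal starting weights
    q = np.ones(n)                            # per-unit scale factors
    for _ in range(max_iter):
        # Minimize sum((w - d)**2 / q) subject to A w = c; CC holds for any q > 0.
        lam = np.linalg.solve((A * q) @ A.T, c - A @ d)
        w = d + q * (A.T @ lam)
        outside = (w < lower) | (w > upper)
        if not outside.any():
            return w, True
        q[outside] *= shrink                  # pull offending units toward 1/n
    return w, False                           # CC met, RR not met: last iterate

x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])       # hypothetical sample values
w, restrictions_met = iterate_range_restricted(x, 8.0, lower=0.02, upper=0.6)
```

The returned flag plays the role of the warning quoted above: when it is false, the weights are still calibrated but at least one lies outside the requested range.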
Regression Analysis
Regression analysis for sample survey, Sankhya 1975
Finite population is treated as a sample from an infinite population.
Both finite population and infinite population regression coefficient
vectors, B and β are defined.
By defining a sequence of populations and samples, central limit
theorems for the sample regression coefficient vector b are obtained
for SRS and stratified two-stage sampling designs. Consistent
estimators of the asymptotic variance of b are also given.
If the vector of auxiliary variables $x = (1, x_2, \ldots, x_p)'$ is replaced by
the vector $z = (1, x_2 - \bar{X}_2, \ldots, x_p - \bar{X}_p)'$, then the well-known
regression projection estimator $\bar{y}_r = \sum_{i \in s} w_i y_i$ of the mean $\bar{Y}$ is
identical to the intercept in the regression of y on z. Thus the theory
of Fuller (1975) for regression coefficients is applicable to the
regression estimator of the mean.
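The identity between the regression estimator and the intercept can be verified directly. The sketch below assumes SRS, one auxiliary variable and ordinary least squares; the data and the known population mean of x are made-up values.

```python
import numpy as np

# Toy sample (hypothetical values) and an assumed known population mean of x.
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([3.1, 5.9, 7.2, 9.8, 13.0])
x_pop_mean = 6.0

# Regression estimator of the mean under SRS.
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
y_reg = y.mean() + slope * (x_pop_mean - x.mean())

# Intercept of the least squares regression of y on z = (1, x - x_pop_mean).
Z = np.column_stack([np.ones_like(x), x - x_pop_mean])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
intercept = coef[0]
```

Centering x at the population mean rather than the sample mean is what moves the adjustment term into the intercept, so `intercept` and `y_reg` agree up to rounding.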
Regression Analysis (Cont’d)
Hidiroglou, Fuller and Hickman (1976). SUPER CARP
Authors give a linearization variance estimator which uses the
calibration weights instead of the design weights in defining the
residuals. This follows from Fuller (1975). The same variance
estimator was proposed later in a well-known paper by Särndal et al.
(1989).
Regression Analysis (Cont’d)
Informative sampling
Fuller (2009). Sampling Statistics, Wiley, sec. 6.3.2
Population model
$y_i = x_i'\beta + e_i$, $E_m(e_i) = 0$, $V_m(e_i) = \sigma^2$, $i \in U$
Survey weighted estimator of $\beta$:
$\hat{\beta}_w = \left( \sum_{i \in s} w_i x_i x_i' \right)^{-1} \sum_{i \in s} w_i x_i y_i$
Estimator $\hat{\beta}_w$ is consistent for $\beta$ under informative sampling but it
can be inefficient if the weights $w_i$ vary considerably. Fuller (2009)
suggested replacing $w_i$ by $w_i \Psi_i$ in the expression for $\hat{\beta}_w$, where
$E_m(e_i \mid x_i, \Psi_i) = 0$, and then searching for the optimal $\Psi_i$. Pfeffermann and
Sverchkov (1999) suggested using $\Psi_i = 1/\tilde{w}_i$, where $\tilde{w}_i$ is an
estimator of $E_m(w_i \mid x_i, i \in s)$.
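The weighted estimator and its modified version differ only in the weights passed to the normal equations. The following is a minimal sketch; the function name and the optional `psi` argument are illustrative, and how the multipliers are chosen or estimated is outside its scope.

```python
import numpy as np

def weighted_beta(X, y, w, psi=None):
    """Solve (sum_i w_i psi_i x_i x_i') beta = sum_i w_i psi_i x_i y_i."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    if psi is not None:
        w = w * np.asarray(psi, dtype=float)   # replace w_i by w_i * psi_i
    XtW = X.T * w                              # scales column i of X' by w_i
    return np.linalg.solve(XtW @ X, XtW @ y)

# Hypothetical data: an exactly linear response and unequal survey weights.
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x
w = np.array([10.0, 40.0, 5.0, 25.0, 20.0])

beta_w = weighted_beta(X, y, w)                # survey weighted estimator
beta_psi = weighted_beta(X, y, w, psi=1.0 / w) # psi = 1/w gives ordinary least squares
```

Because y is exactly linear in x here, both calls recover the same coefficients; with noisy data the two estimators generally differ, and the choice of multiplier affects efficiency.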
Quantiles
Francisco (1987). Ph.D. thesis and Francisco and Fuller (1991).
Estimation of quantiles with survey data, Annals of Statistics
Well-known Bahadur representation of quantiles for SRS is extended to a
more general class of sample designs and the representation is used to
show that weighted sample quantiles for complex samples are normally
distributed in the limit. Confidence intervals based on test inversion are also
studied.
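A weighted sample quantile can be obtained by inverting the weighted empirical distribution function. The sketch below uses one common convention (the smallest observed value at which the weighted distribution function reaches p), which is an assumption here and not necessarily the definition used in the paper; confidence intervals are not covered.

```python
import numpy as np

def weighted_quantile(y, w, p):
    """Smallest y value at which the weighted empirical CDF reaches p."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(y)
    y_sorted = y[order]
    cdf = np.cumsum(w[order]) / np.sum(w)      # weighted empirical CDF at sorted y
    idx = np.searchsorted(cdf, p, side="left") # first index with cdf >= p
    return y_sorted[min(idx, y.size - 1)]

y = np.array([10.0, 20.0, 30.0, 40.0])         # hypothetical observations
median_equal = weighted_quantile(y, np.array([1.0, 1.0, 1.0, 1.0]), 0.5)
median_unequal = weighted_quantile(y, np.array([1.0, 1.0, 1.0, 5.0]), 0.5)
```

With equal weights the median is 20, while giving the largest observation a weight of 5 moves it to 40, which shows how the design weights shift the estimated quantile.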
Two-phase sampling
Fuller (2003): Estimation for multiphase samples, Wiley book edited by
Chambers and Skinner.
Domain projection estimator
x observed in a large first-phase sample $A_1$ and both x and y
observed in a smaller subsample $A_2$. Fuller (2003) proposed a domain
projection estimator for $A_1$, based on predicted y-values $\{\hat{y}_i, i \in A_1\}$
obtained from $A_2$. This estimator can be considerably more efficient
than the customary domain two-phase regression estimator based on
the phase 2 sample if the regression of y on x is linear.
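The projection idea can be sketched as follows, assuming a simple linear regression with intercept fitted by weighted least squares on the phase 2 sample. The function name, the argument names and the use of phase 2 weights in the fit are assumptions for illustration, not details taken from Fuller (2003).

```python
import numpy as np

def domain_projection_total(x1, w1, in_domain, x2, y2, w2):
    """Estimate a domain total of y from phase 1 predictions."""
    x1 = np.asarray(x1, dtype=float)
    w1 = np.asarray(w1, dtype=float)
    # Fit y on (1, x) using the phase 2 sample.
    X2 = np.column_stack([np.ones(len(x2)), np.asarray(x2, dtype=float)])
    XtW = X2.T * np.asarray(w2, dtype=float)
    beta = np.linalg.solve(XtW @ X2, XtW @ np.asarray(y2, dtype=float))
    y_hat = beta[0] + beta[1] * x1             # predictions for every phase 1 unit
    # Sum the weighted predictions over phase 1 units that fall in the domain.
    return np.sum(w1 * np.asarray(in_domain, dtype=float) * y_hat)

# Hypothetical phase 1 sample of six units; units 0, 2 and 5 form phase 2.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
w1 = np.full(6, 10.0)
in_domain = np.array([1, 1, 0, 0, 1, 0])
x2 = x1[[0, 2, 5]]
y2 = 2.0 + 3.0 * x2                            # y observed only in phase 2
w2 = np.full(3, 20.0)

total = domain_projection_total(x1, w1, in_domain, x2, y2, w2)
```

The estimator uses every phase 1 unit in the domain, including those without an observed y, which is where the efficiency gain over a phase 2 only estimator comes from when the linear model holds.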
Two-phase sampling (Cont’d)
Re-stratified two-phase sampling: Kim, Navarro and Fuller, JASA
2006
This paper gives a consistent replication variance estimator that is
applicable to both the double expansion estimator and the reweighted
expansion estimator of a total. It is based on a consistent first phase
replication variance estimator.
Earlier work: Kott and Stukel (SM, 1997) studied jackknife variance
estimator along the lines of Rao and Shao (Biometrika, 1992).
Small area estimation
Battese, Harter and Fuller, JASA, 1988
Introduced unit level models for small area estimation based on
nested error linear regression. Best linear unbiased predictors of small
area means and their estimators of MSE are studied as well as model
diagnostics. Application to estimation of county crop areas using
survey and satellite data. Reported actual data in the paper and
several subsequent papers used this data set.
Other work includes area level models when the sampling variances
are estimated, automatic benchmarking using augmented models,
estimation of three digit counts in Canadian provinces.
Measurement errors
Fuller (1995): Estimation in the presence of measurement errors, Intl.
Statist. Rev. (based on Hansen Lecture)
For estimating a population mean, usual estimators remain unbiased
under additive measurement errors with zero means. Variance
estimation also can be handled through interpenetrating samples
method of Mahalanobis (1944).
Fuller (1995) demonstrated that the above nice features do not hold
in the case of distribution function, quantiles and some other complex
parameters. Usual estimators are biased and inconsistent and can lead
to erroneous inferences.
Bias-adjusted estimators can be obtained if at the design stage
resources are allocated to estimate measurement error variance
through replicate observations for a subsample.
Non-response and imputation
Unit non-response: Fuller et al., 1994, SM
Regression estimator based only on the respondent values is shown to be
asymptotically unbiased if the inverse of response probability $p_i$ is
linearly related to $x_i$, the vector of regression variables. This condition
is satisfied if the response probability is equal within groups defined
by dummy x variables. This result was independently discovered by
Särndal and Lundström (2006).
Identify xi correlated with pi and (or) yi
Fractional hot deck imputation
J. Kim and Fuller (2004): Fractional hot deck imputation, Biometrika.
For item non-response, “fractional hot deck imputation replaces each
missing observation with a set of imputed values and assigns a weight
to each imputed value”. It reduces or eliminates imputation variance
unlike usual hot deck imputation.
Previous work: Kalton & Kish, Comm. Statist. (1984), Fay, JASA
(1996).
A consistent replication variance estimator is also proposed.
Fractional imputation and the proposed variance estimator “are
superior to multiple imputation in general, and much superior to
multiple imputation for estimating the variance of a domain mean”.
Kim, Brick, Fuller and Kalton (JRSS B, 2006) showed that the bias
of the multiple imputation variance estimator “may be sizeable for certain
estimators, such as domain means, when a large fraction of the values
are imputed”. Authors propose a bias-adjusted variance estimator.
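The definition of fractional hot deck imputation quoted above can be illustrated with a small sketch. It assumes the simplest variant, in which every respondent in the same imputation cell serves as a donor with an equal fraction of the recipient's weight; the function names, the use of `np.nan` for missing values and the cell structure are assumptions, and the replication variance estimator is not shown.

```python
import numpy as np

def fractional_hot_deck(y, w, cell):
    """Return (unit, value, weight) rows after fractional imputation within cells."""
    y = np.asarray(y, dtype=float)             # missing values coded as np.nan
    w = np.asarray(w, dtype=float)
    cell = np.asarray(cell)
    rows = []
    for i in range(y.size):
        if not np.isnan(y[i]):
            rows.append((i, y[i], w[i]))       # respondents keep their full weight
            continue
        donors = y[(cell == cell[i]) & ~np.isnan(y)]
        for value in donors:
            # Each donor value carries an equal fraction of the recipient's weight.
            rows.append((i, value, w[i] / donors.size))
    return rows

def weighted_mean(rows):
    """Weighted mean over the fractionally imputed rows."""
    total = sum(value * weight for _, value, weight in rows)
    return total / sum(weight for _, _, weight in rows)

# Hypothetical data: two cells of three units, one missing value in each.
y = [1.0, 3.0, np.nan, 10.0, np.nan, 14.0]
w = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
cell = [0, 0, 0, 1, 1, 1]
rows = fractional_hot_deck(y, w, cell)
mean_fi = weighted_mean(rows)
```

Because every donor is used, no donor is selected at random and the estimate of the mean carries no imputation variance. The sketch assumes each cell contains at least one respondent.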
Rejective sampling
Fuller (2009): Some design properties of a rejective sampling procedure,
Biometrika.
A probability sample is rejected unless the estimated mean of auxiliary
variables vector is within a specified distance from the corresponding
known population mean vector. Asymptotic properties of regression
estimator under rejective sampling remain the same as those of the
regression estimator for the original probability sampling procedure.
This method is somewhat similar to balanced sampling. Yves Tillé’s
presentation will provide more details of the two methods.
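The rejection rule can be sketched for the simplest case. The code assumes simple random sampling without replacement as the original design, a single auxiliary variable, and an absolute difference as the distance; the function name, the tolerance and the cap on the number of draws are illustrative choices, not part of Fuller (2009).

```python
import numpy as np

def rejective_srs(x_pop, n, tol, rng, max_draws=10_000):
    """Draw SRS samples until the sample mean of x is within tol of the population mean."""
    x_pop = np.asarray(x_pop, dtype=float)
    target = x_pop.mean()
    for _ in range(max_draws):
        idx = rng.choice(x_pop.size, size=n, replace=False)
        if abs(x_pop[idx].mean() - target) <= tol:
            return idx                         # accept the first sample that is close enough
    raise RuntimeError("no sample accepted within max_draws")

# Hypothetical population of 100 units with known auxiliary values.
x_pop = np.arange(1, 101)
rng = np.random.default_rng(0)
sample = rejective_srs(x_pop, n=10, tol=2.0, rng=rng)
```

A tighter tolerance rejects more samples and pushes the accepted sample closer to balance on x, at the cost of more draws.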
Other contributions
Extensive advisory work for various agencies: twenty-five years on
the Statistics Canada advisory committee on methodology, including
20 years as chair. Fuller plans to retire from the committee at the end
of 2011. Several methods used in Statistics Canada are based on
Fuller’s suggestions at the advisory committee meetings.
Fuller is heavily involved in the work of the Survey Section (now
Center for Survey Statistics and Methodology) at Iowa State
University. Many of the methods used at the Center are due to Fuller.
Extensive consulting work for USDA and Census Bureau. Methods
used by the National Resources Inventory of USDA are largely due to
Fuller.
Ph.D. Theses directed (by Fuller)
(1) Shi, Chang Sheng (1966), Interval estimation for the exponential
model and the analysis of rotation experiment.
(2) Wey, Ing-Tzer (1966), Estimation of the mean using the rank
statistics of an auxiliary variable.
(3) Yusuf-Mia, Mohammed (1967), Sampling designs employing restricted
randomization.
(4) Lund, Richard E. (1967), Factors affecting consumer demand for
meat, Webster County, Iowa.
(5) DeGracie, James Sullivan (1968), Analysis of covariance when the
concomitant variable is measured with error.
(6) Rosenzweig, Martin Stephen (1968), Ordered estimators for skewed
populations.
(7) McElhone, Donald Hughes (1970), Estimation of the mean of skewed
distributions using systematic statistics.
(8) Martinez-Garza, Angel (1970), Estimators for the errors in variables
model.
(9) O’Brien, Peter Charles (1970), Procedures for selecting the best of
several populations.
(10) Isaki, Cary Tsuguo (1970), Survey designs utilizing prior information.
(11) Gallant, A. Ronald (1971), Statistical inference for nonlinear
regression models.
(12) Burmeister, Leon Forrest (1972), Estimators for samples selected from
multiple overlapping frames.
(13) Huang, Her Tzai (1972), Combining multiple responses in sample
surveys.
(14) Jobson, John David (1972), Estimation for linear models with
unknown diagonal covariance matrix.
(15) Tejeda-Sanhueza, Herman R. (1973), Statistical analysis and model
building for a wheat production system in Chile.
(16) Booth, Gordon D. (1973), The errors-in-variables model when the
covariance matrix is not constant.
(17) Battese, George E. (1973), Parametric models for response errors in
survey sampling.
(18) Goebel, John Jeffery (1974), Nonlinear regression in the presence of
autocorrelated errors.
(19) Wolter, Kirk M. (1974), Estimates for a nonlinear functional
relationship.
(20) Hidiroglou, Michael A. (1974), Estimation of regression parameters for
finite populations.
(21) Dickey, David A. (1976), Estimation and hypothesis testing in
nonstationary time series.
(22) Carter, Randy Lee (1976), Instrumental variable estimation of the
simple errors in variables model.
(23) Wang, George H. K. (1976), Estimators for the simultaneous equation
model with lagged endogenous variables and autocorrelated errors:
with application to the U.S. farm labor market.
(24) Hasza, David P. (1977), Estimation in nonstationary time series.
(25) Huang, Elizabeth T. H. (1978), Nonnegative regression estimation for
sample survey data.
(26) Bhattacharyay, Biswanath (1979), Estimation for varying parameter
stochastic difference equations.
(27) Dahm, Paul F. (1979), Estimation of the parameters of the
multivariate linear errors in variables model.
(28) Macpherson, Brian D. (1981), Properties of estimation for the
parameter of the first order moving average process.
(29) Drew, James H. (1981), Nonresponse in surveys with callbacks.
(30) Mowers, Ronald P. (1981), Effects of rotations and nitrogen
fertilization on corn yields at the Northwest Iowa (Galva-Primghar)
Research Center.
(31) Lee, Edward H. (1981), Estimation of seasonal autoregressive time
series.
(32) Pantula, Sastry G. (1982), Properties of estimators of the parameters
of autoregressive time series.
(33) Amemiya, Yasuo (1982), Estimators for the errors-in-variables model.
(34) Tin Chiu Chua (1983), Response errors in repeated surveys with
duplicated observations.
(35) Harter, Rachel (1983), Small area estimation using nested-error
models and auxiliary data.
(36) Hung, Hsien-Ming (1983), Use of transformed LANDSAT data in
regression estimation of crop acreages.
(37) Miazaki, Edina Shisue (1984), Estimation for time series subject to
the error of rotation sampling.
(38) Miller, Stephen M. (1986), The limiting behavior of residuals from
measurement error regressions.
(39) Nagaraj, Neerchal K. (1986), Estimation of stochastic difference
equations with nonlinear restrictions.
(40) Schnell, Daniel J. (1987), Estimators for the nonlinear
errors-in-variables model.
(41) Francisco, Carol A. (1987), Estimation of quantiles and the
interquartile range in complex surveys.
(42) Morel, Jorge Guillermo (1987), Multivariate nonlinear models for
vectors of proportions: A generalized least squares approach. (Joint
major professor with Ken Koehler.)
(43) Eltinge, John Lamont (1987), Measurement error models for time
series.
(44) Hasabelnaby, Nancy Ann Eyink (1987), The use of a weighting
function in measurement error regression.
(45) Sullivan, Gary R. (1989), The use of added error to avoid disclosure in
microdata releases.
(46) Shin, Dongwan (1990), Estimation for the autoregressive moving
average model with a unit root.
(47) Sarkar, Sahadeb (1990), Nonlinear least squares estimators with
differential rates of convergence.
(48) Park, Heon Jin (1990), Alternative estimators of the parameters of the
autoregressive process.
(49) Croos, Joseph H. R. (1992), Robust estimation in measurement error
models.
(50) Tollefson, Margot H. (1992), Variance estimation under random
imputation.
(51) Adam, Abdoulaye (1992), Covariance estimation for characteristics of
the Current Population Survey.
(52) Yansaneh, Ibrahim S. (1992), Least squares estimation for repeated
surveys.
(53) Sanger, Todd M. (1992), Estimated generalized least squares
estimation for the heterogeneous measurement error model.
(54) Sriplung, Kai-one (1993), (Joint direction with Stanley Johnson,
Economics) Mispricing in the Black-Scholes model: An exploratory
analysis.
(55) Deo, Rohit (1995), Tests for unit roots in multivariate autoregressive
processes.
(56) An, Anthony B. (1996), Regression estimation for finite population
means in the presence of nonresponse.
(57) Sarkar, Pradipta (1997), Estimation and prediction for non-Gaussian
autoregressive processes.
(58) Chen, Cong (1999), Spline estimators of the distribution function of a
variable measured with error. (Joint with F. Jay Breidt.)
(59) Roy, Anindya (1999), Estimation for autoregressive processes.
(60) Dodd, Kevin W. (1999), Estimation of a distribution function from
survey data. (Joint with Alicia Carriquiry.)
(61) Goyeneche, Juan José (1999), Estimation of the distribution function
using auxiliary information.
(62) Kim, Jae-kwang (2000), Variance estimation after imputation.
(63) Wang, Junyuan (2001), Small area estimation in the National
Resources Inventory. (Joint with F. Jay Breidt.)
(64) Qu, Yongming (2002), Estimation for the nonlinear errors-in-variables
model.
(65) Park, Mingue (2002), Regression estimation of the mean in survey
sampling.
(66) Legg, Jason (2006), (Joint with Sarah Nusser), Estimation for
two-phase longitudinal surveys with application to the National
Resources Inventory.
(67) Wu, Yu (2006), Estimation of regression coefficients with unequal
probability samples.
(68) Beyler, Nicholas (2010), (Joint with Sarah Nusser), Statistical
methods for analyzing physical activity data.
(69) Berg, Emily (2010), (Joint with Sarah Nusser), A small area procedure
for estimating population counts.