
African Journal of Mathematics and Computer Science Research Vol. 6(1), pp. 5-15, January 2013
Available online at http://www.academicjournals.org/AJMCSR
DOI:10.5897/AJMCSR11.136
ISSN 2006-9731 ©2013 Academic Journals
Full Length Research Paper
Some new aspects on imputation in sampling
Diwakar Shukla1, Narendra Singh Thakur2 and Sharad Pathak1
1Department of Mathematics and Statistics, Dr. H.S. Gour University of Sagar, Sagar (M. P.), 470003, India.
2Centre for Mathematical Sciences (CMS), Banasthali University, Rajasthan, 304022, India.
Accepted 15 January, 2013
Many causes affect the quality of a survey, and missing data is one of them, leaving the sample incomplete. Many imputation methods are available in the literature and are used to replace missing observations, such as the Mean method, Ratio method, Compromised method, Ahmed's methods and the Factor-type (F-T) method. This paper suggests some new estimation aspects in imputation theory using both the F-T estimator and the Compromised estimator. The proposed procedures are derived from the existing Compromised method and are found to be efficient, with controlled bias and several useful properties. In support of the derived results, an empirical study over an artificial data set is incorporated, showing the efficiency in favour of the suggested methods.
Key words: Estimation, missing data, imputation, bias, mean squared error (m.s.e.), compromised estimator,
factor-type (F-T) estimator, factor-type compromised imputation (FTCI).
INTRODUCTION
Missing data is one of the most common problems in sample surveys, and various imputation methodologies are frequently used to substitute for the missing values. Statisticians have recognized that failure to account for the stochastic nature of incompleteness, in the form of absence of data, can spoil inference. There are three major incompleteness concepts: missing at random (MAR), observed at random (OAR), and parametric distribution (PD). In what follows in this paper, missing completely at random (MCAR) is used. Some well known imputation methods in the literature are: deductive imputation, mean imputation overall (MO), random imputation overall (RO), mean imputation within classes (MC), random imputation within classes (RC), hot-deck imputation, flexible matching imputation, predicted regression imputation (PR), random regression imputation (RR), distance function matching, etc.
In a contribution, Shukla (2002) suggested the Factor-type (F-T) estimator in two-phase sampling, which incorporates a parameter k combining the known auxiliary information with the survey components. The parameter k enters through the factors (k - 1)(k - 2), (k - 1)(k - 4) and (k - 2)(k - 3)(k - 4), which ultimately produces a notable property: multiple choices of k-values maintain a given standard level of mean squared error (m.s.e.). As suggested, the best k among these is the one that also optimizes other characteristics of the estimator.
Singh and Horn (2000) proposed a compromised imputation procedure having a constant α in a linear combination of the "available main information" and the "imputed auxiliary information". The constant α has an optimum choice at the minimum level of m.s.e., but this choice provides no control over the bias. This paper derives motivation from this source and, using the properties of the F-T estimator, presents a new F-T compromised imputation procedure for survey sampling. A special feature of the suggestion is that the same parameter k is used twice: in the imputation part for the missing observations as well as in the linear combination with the known auxiliary part. Some other useful contributions on imputation techniques are due to Rueda and Gonzalez (2008), Singh et al. (2009), etc.
Let $\bar{Y}$ be the mean of a finite population, the parameter to be estimated, $\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i$. A random sample $S$ of size $n$ is drawn without replacement (SRSWOR) from the population $\Omega = \{1, 2, 3, \ldots, N\}$ to estimate $\bar{Y}$. The sample $S$ of $n$ units contains $r$ responding units ($r < n$) forming a set $R$, and $(n - r)$ non-responding units forming the complementary set $R^{C}$. The variable $Y$ is of main interest and $X$ is an auxiliary variable correlated with $Y$. For every unit $i \in R$ the value $y_i$ is observed; for $i \in R^{C}$ the $y_i$ values are missing and imputed values need to be derived. The $i$th value $x_i$ of the auxiliary variate $X$ is used as a source of imputation for the missing data when $i \in R^{C}$. Assume that for the sample $S$ the data $x_S = \{x_i : i \in S\}$ are known and $S = R \cup R^{C}$. Under this setup, some well-known imputation methods in the survey sampling literature are the following.

Mean method of imputation

For $y_i$, define $y_{\cdot i}$ as

$y_{\cdot i} = y_i$ if $i \in R$; $\quad y_{\cdot i} = \bar{y}_r$ if $i \in R^{C}$  (1)

The imputation-based estimator of the population mean $\bar{Y}$ is:

$\bar{y}_m = \frac{1}{r}\sum_{i \in R} y_i = \bar{y}_r$  (2)

Ratio method of imputation

For $y_i$ and $x_i$, define $y_{\cdot i}$ as

$y_{\cdot i} = y_i$ if $i \in R$; $\quad y_{\cdot i} = \hat{b}\,x_i$ if $i \in R^{C}$, where $\hat{b} = \sum_{i \in R} y_i \big/ \sum_{i \in R} x_i$  (3)

The imputation-based estimator of $\bar{Y}$ is:

$\bar{y}_S = \frac{1}{n}\sum_{i \in S} y_{\cdot i} = \bar{y}_r\left(\frac{\bar{x}_n}{\bar{x}_r}\right) = \bar{y}_{RAT}$  (4)

where $\bar{y}_r = \frac{1}{r}\sum_{i \in R} y_i$, $\bar{x}_r = \frac{1}{r}\sum_{i \in R} x_i$ and $\bar{x}_n = \frac{1}{n}\sum_{i \in S} x_i$.

Compromised method

Singh and Horn (2000) proposed the compromised imputation procedure

$y_{\cdot i} = \frac{\alpha n}{r}\,y_i + (1-\alpha)\,\hat{b}\,x_i$ if $i \in R$; $\quad y_{\cdot i} = (1-\alpha)\,\hat{b}\,x_i$ if $i \in R^{C}$  (5)

where $\alpha$ is a suitably chosen constant such that the resultant variance is minimum. The imputation-based estimator, in this case, is

$\bar{y}_{COMP} = \alpha\,\bar{y}_r + (1-\alpha)\,\bar{y}_r\left(\frac{\bar{x}_n}{\bar{x}_r}\right)$  (6)

Lemma 1

The bias, m.s.e. and minimum m.s.e. of $\bar{y}_{COMP}$ are (Singh and Horn, 2000):

(i) $B\left(\bar{y}_{COMP}\right) = (1-\alpha)\left(\frac{1}{r}-\frac{1}{n}\right)\bar{Y}\left(C_X^2 - \rho\,C_Y C_X\right)$  (7)

(ii) $M\left(\bar{y}_{COMP}\right) = \left(\frac{1}{r}-\frac{1}{N}\right)\bar{Y}^2 C_Y^2 + \left(\frac{1}{r}-\frac{1}{n}\right)\bar{Y}^2\left[(1-\alpha)^2 C_X^2 - 2(1-\alpha)\,\rho\,C_Y C_X\right]$  (8)

(iii) For the optimum $\alpha = \left(1 - \rho\,\frac{C_Y}{C_X}\right)$, the minimum m.s.e. of $\bar{y}_{COMP}$ is given by the expression

$M\left(\bar{y}_{COMP}\right)_{min} = \left[\left(\frac{1}{r}-\frac{1}{N}\right) - \left(\frac{1}{r}-\frac{1}{n}\right)\rho^2\right] S_Y^2$  (9)
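As a quick illustration of Equations 1 to 6, the following minimal Python sketch computes the mean, ratio and compromised point estimators for a small responding/non-responding split; the arrays and the choice α = 0.5 are purely illustrative and are not taken from the data set used later in this paper.

```python
import numpy as np

# Responding units (set R) and the auxiliary values for the whole sample S.
y_R = np.array([45.0, 40.0, 36.0, 65.0, 50.0])        # observed y_i, i in R
x_R = np.array([15.0, 29.0, 15.0, 27.0, 15.0])        # auxiliary x_i, i in R
x_S = np.array([15.0, 29.0, 15.0, 27.0, 15.0, 13.0, 25.0, 28.0])  # x_i, i in S

y_bar_r, x_bar_r, x_bar_n = y_R.mean(), x_R.mean(), x_S.mean()

# Mean method (Equations 1-2): every missing y is replaced by the responding mean.
y_bar_m = y_bar_r

# Ratio method (Equations 3-4): missing y_i are replaced by b_hat * x_i.
b_hat = y_R.sum() / x_R.sum()
y_bar_rat = y_bar_r * x_bar_n / x_bar_r

# Compromised method of Singh and Horn (2000) (Equations 5-6).
alpha = 0.5                                            # illustrative constant
y_bar_comp = alpha * y_bar_r + (1 - alpha) * y_bar_r * x_bar_n / x_bar_r

print(y_bar_m, y_bar_rat, y_bar_comp)
```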
Ahmed methods

For the case where $y_{ji}$ denotes the $i$th available observation under the $j$th imputation method, Ahmed et al. (2006) suggested:

(A) $y_{1i} = y_i$ if $i \in R$; $\quad y_{1i} = \frac{1}{(n-r)}\left[n\,\bar{y}_r\left(\frac{\bar{X}}{\bar{x}_n}\right)^{\beta_1} - r\,\bar{y}_r\right]$ if $i \in R^{C}$  (10)

(B) $y_{2i} = y_i$ if $i \in R$; $\quad y_{2i} = \frac{1}{(n-r)}\left[n\,\bar{y}_r\left(\frac{\bar{x}_n}{\bar{x}_r}\right)^{\beta_2} - r\,\bar{y}_r\right]$ if $i \in R^{C}$  (11)

(C) $y_{3i} = y_i$ if $i \in R$; $\quad y_{3i} = \frac{1}{(n-r)}\left[n\,\bar{y}_r\left(\frac{\bar{X}}{\bar{x}_r}\right)^{\beta_3} - r\,\bar{y}_r\right]$ if $i \in R^{C}$  (12)

Under these, the point estimators of $\bar{Y}$ are

$t_1 = \bar{y}_r\left(\frac{\bar{X}}{\bar{x}_n}\right)^{\beta_1}$, $\quad t_2 = \bar{y}_r\left(\frac{\bar{x}_n}{\bar{x}_r}\right)^{\beta_2}$, $\quad t_3 = \bar{y}_r\left(\frac{\bar{X}}{\bar{x}_r}\right)^{\beta_3}$  (13)

The terms $\beta_1$, $\beta_2$ and $\beta_3$ are suitably chosen constants, so as to keep the variance minimum. As special cases, when $\beta_3 = 1$,

$t_{Ratio} = \bar{y}_r\left(\frac{\bar{X}}{\bar{x}_r}\right)$  (14)

and when $\beta_3 = -1$,

$t_{Product} = \bar{y}_r\left(\frac{\bar{x}_r}{\bar{X}}\right)$

which is the natural analogue of the ratio estimator, called the product estimator, used when the auxiliary variate $X$ has negative correlation with $Y$.

Factor-type methods of imputation

Shukla and Thakur (2008) have suggested the F-T imputation procedure. For this case:

(D) $(y_{FT1})_i = y_i$ if $i \in R$; $\quad (y_{FT1})_i = \frac{\bar{y}_r\left[n\,\beta_1(k) - r\right]}{(n-r)}$ if $i \in R^{C}$  (15)

(E) $(y_{FT2})_i = y_i$ if $i \in R$; $\quad (y_{FT2})_i = \frac{\bar{y}_r\left[n\,\beta_2(k) - r\right]}{(n-r)}$ if $i \in R^{C}$  (16)

(F) $(y_{FT3})_i = y_i$ if $i \in R$; $\quad (y_{FT3})_i = \frac{\bar{y}_r\left[n\,\beta_3(k) - r\right]}{(n-r)}$ if $i \in R^{C}$  (17)

where

$\beta_1(k) = \frac{(A + C)\bar{X} + fB\,\bar{x}_n}{(A + fB)\bar{X} + C\,\bar{x}_n}$; $\quad \beta_2(k) = \frac{(A + C)\bar{x}_n + fB\,\bar{x}_r}{(A + fB)\bar{x}_n + C\,\bar{x}_r}$; $\quad \beta_3(k) = \frac{(A + C)\bar{X} + fB\,\bar{x}_r}{(A + fB)\bar{X} + C\,\bar{x}_r}$;

$A = (k-1)(k-2)$; $B = (k-1)(k-4)$; $C = (k-2)(k-3)(k-4)$; $f = \frac{n}{N}$; $0 < k < \infty$.

Under Equations 15, 16 and 17, the point estimators of $\bar{Y}$ are

$T_{FT1} = \bar{y}_r\,\beta_1(k)$, $\quad T_{FT2} = \bar{y}_r\,\beta_2(k)$, $\quad T_{FT3} = \bar{y}_r\,\beta_3(k)$  (18)

As special cases, when $k = 1$ then $T_{FTl} = t_l$ with $\beta_l = 1$; when $k = 2$ then $T_{FTl} = t_l$ with $\beta_l = -1$; and when $k = 4$ then $T_{FTl} = t_l = \bar{y}_r$ with $\beta_l = 0$ ($l = 1, 2, 3$).
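To make the role of k concrete, here is a short Python sketch (assumed for illustration, not part of the paper) that evaluates the factor-type weights β_j(k) and the point estimators T_FTj = ȳ_r β_j(k) of Equation 18; the supplied sample means are illustrative. At k = 1, 2 and 4 the printed values reproduce the special cases noted above.

```python
def ft_weight(j, k, f, X_bar, x_bar_n, x_bar_r):
    """Factor-type weight beta_j(k) built from A, B, C of Shukla (2002)."""
    A = (k - 1) * (k - 2)
    B = (k - 1) * (k - 4)
    C = (k - 2) * (k - 3) * (k - 4)
    num, den = {
        1: ((A + C) * X_bar + f * B * x_bar_n, (A + f * B) * X_bar + C * x_bar_n),
        2: ((A + C) * x_bar_n + f * B * x_bar_r, (A + f * B) * x_bar_n + C * x_bar_r),
        3: ((A + C) * X_bar + f * B * x_bar_r, (A + f * B) * X_bar + C * x_bar_r),
    }[j]
    return num / den

# Illustrative summary statistics (not the paper's data set).
X_bar, x_bar_n, x_bar_r, y_bar_r, f = 18.5, 17.9, 17.2, 41.0, 0.15

for k in (1.0, 2.0, 3.0, 4.0):
    T_FT = [y_bar_r * ft_weight(j, k, f, X_bar, x_bar_n, x_bar_r) for j in (1, 2, 3)]
    print(k, [round(t, 3) for t in T_FT])   # Equation 18: T_FTj = y_bar_r * beta_j(k)
```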
PROPOSED ESTIMATOR

Using Singh and Horn (2000), the F-T compromised imputation (FTCI) estimator is, for $j = 1, 2, 3$:

$(y_{\cdot i})_j = \frac{kn}{r}\,y_i + (1-k)\,\phi_j(k)$ if $i \in R$; $\quad (y_{\cdot i})_j = (1-k)\,\phi_j(k)$ if $i \in R^{C}$  (19)

where

$\phi_1(k) = \bar{y}_r\,\frac{(A + C)\bar{X} + fB\,\bar{x}_n}{(A + fB)\bar{X} + C\,\bar{x}_n}$; $\quad \phi_2(k) = \bar{y}_r\,\frac{(A + C)\bar{x}_n + fB\,\bar{x}_r}{(A + fB)\bar{x}_n + C\,\bar{x}_r}$; $\quad \phi_3(k) = \bar{y}_r\,\frac{(A + C)\bar{X} + fB\,\bar{x}_r}{(A + fB)\bar{X} + C\,\bar{x}_r}$;

$A = (k-1)(k-2)$; $B = (k-1)(k-4)$; $C = (k-2)(k-3)(k-4)$; $0 < k < \infty$.

Remark 1

Singh et al. (2001) have suggested a hybrid of calibration and imputation in the presence of random non-response in survey sampling; estimation of the population mean with their hybrid method remains unbiased under a design-based approach. Liu et al. (2005, 2006) have presented new imputation methods for the treatment of missing data which remove the bias of the usual ratio imputation. Singh (2009) has made a useful contribution to imputation techniques for missing data by proposing a new method of imputation in survey sampling.

Remark 2

In the contribution of Singh and Horn (2000), a point raised by Singh and Deo (2003) is that adjustment of the responding units $y_i$ is not allowed. The present paper also bears this point. Some arguments in favour of the proposed techniques are given in Remark 3.

Remark 3

The available information $y_i$ in the sample of size $n$ may be affected by some serious non-sampling errors such as:

(i) Bias in recording by the investigator,
(ii) Under-estimation or over-estimation in the facts reported by respondents,
(iii) Partial unwillingness or lack of interest of the respondent in answering a question,
(iv) Deliberate mis-reporting by respondents,
(v) False response or answer recording, etc.

If the available data are affected by these non-sampling errors, then there is a need to adjust the available $y_i$ values as well. Sometimes the auxiliary information correlated with the main variable $Y$ is almost error free; for example, in an economic survey of the expenditure ($Y$) of military soldiers, the income ($X$) record is correctly available through the salary pay roll, whereas the expenditure of households may be affected by serious non-sampling errors even when a response is obtained. So one can partially adjust the available expenditure data by the auxiliary income data to get a better picture of the available sample. Many smoothing statistical techniques useful for available sample data exist in the literature. Moreover, it is a well proved fact that the sample mean of the available data is highly affected by extreme observations and outliers, which need to be tackled with the help of correlated information. In the light of these points, one can also think of adjusting the responding units. The estimator of Singh and Horn (2000) has the dual property that it smooths the outliers in the available data as well as imputing the missing observations.

Remark 4

The proposed estimator is in the setup of SRSWOR. One can think of extending the same to a general sampling design with a multivariate structure.
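A minimal Python sketch of the FTCI imputation of Equation 19 (the arrays are illustrative and $\bar{X}$ is assumed known). The mean of the imputed values coincides with $k\,\bar{y}_r + (1-k)\,\phi_j(k)$, which is the point estimator derived in the next section.

```python
import numpy as np

def phi(j, k, f, y_bar_r, X_bar, x_bar_n, x_bar_r):
    """phi_j(k) of Equation 19: y_bar_r times the factor-type ratio."""
    A = (k - 1) * (k - 2)
    B = (k - 1) * (k - 4)
    C = (k - 2) * (k - 3) * (k - 4)
    num, den = {
        1: ((A + C) * X_bar + f * B * x_bar_n, (A + f * B) * X_bar + C * x_bar_n),
        2: ((A + C) * x_bar_n + f * B * x_bar_r, (A + f * B) * x_bar_n + C * x_bar_r),
        3: ((A + C) * X_bar + f * B * x_bar_r, (A + f * B) * X_bar + C * x_bar_r),
    }[j]
    return y_bar_r * num / den

# Illustrative responding y-values and auxiliary x-values for a sample of size n.
y_R = np.array([45.0, 40.0, 36.0, 65.0, 50.0, 61.0])
x_R = np.array([15.0, 29.0, 15.0, 27.0, 15.0, 25.0])
x_S = np.array([15.0, 29.0, 15.0, 27.0, 15.0, 25.0, 13.0, 28.0])
r, n, N = len(y_R), len(x_S), 200
f, k, j = n / N, 3.0, 1

ph = phi(j, k, f, y_R.mean(), 18.515, x_S.mean(), x_R.mean())

# FTCI imputation (Equation 19): responding and non-responding parts.
y_dot = np.concatenate([
    (k * n / r) * y_R + (1 - k) * ph,     # i in R
    np.full(n - r, (1 - k) * ph),         # i in R^C
])
print(y_dot.mean(), k * y_R.mean() + (1 - k) * ph)   # the two values agree
```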
Properties of $\phi_j(k)$

(i) At $k = 1$: $A = 0$, $B = 0$, $C = -6$, so that

$\phi_1(1) = \bar{y}_r\,\frac{\bar{X}}{\bar{x}_n}$; $\quad \phi_2(1) = \bar{y}_r\,\frac{\bar{x}_n}{\bar{x}_r}$; $\quad \phi_3(1) = \bar{y}_r\,\frac{\bar{X}}{\bar{x}_r}$

(ii) At $k = 2$: $A = 0$, $B = -2$, $C = 0$, so that

$\phi_1(2) = \bar{y}_r\,\frac{\bar{x}_n}{\bar{X}}$; $\quad \phi_2(2) = \bar{y}_r\,\frac{\bar{x}_r}{\bar{x}_n}$; $\quad \phi_3(2) = \bar{y}_r\,\frac{\bar{x}_r}{\bar{X}}$

(iii) At $k = 3$: $A = 2$, $B = -2$, $C = 0$, so that

$\phi_1(3) = \bar{y}_r\,\frac{(\bar{X} - f\bar{x}_n)}{(1-f)\bar{X}}$; $\quad \phi_2(3) = \bar{y}_r\,\frac{(\bar{x}_n - f\bar{x}_r)}{(1-f)\bar{x}_n}$; $\quad \phi_3(3) = \bar{y}_r\,\frac{(\bar{X} - f\bar{x}_r)}{(1-f)\bar{X}}$

(iv) At $k = 4$: $A = 6$, $B = 0$, $C = 0$, so that $\phi_1(4) = \phi_2(4) = \phi_3(4) = \bar{y}_r$ (Table 1).

Table 1. Some special cases of FTCI.

k       $(\bar{y}_{FTCI})_1$                                                       $(\bar{y}_{FTCI})_2$                                                            $(\bar{y}_{FTCI})_3$
k = 1   $\bar{y}_r$                                                                $\bar{y}_r$                                                                     $\bar{y}_r$
k = 2   $\bar{y}_r\left(2 - \frac{\bar{x}_n}{\bar{X}}\right)$                       $\bar{y}_r\left(2 - \frac{\bar{x}_r}{\bar{x}_n}\right)$                          $\bar{y}_r\left(2 - \frac{\bar{x}_r}{\bar{X}}\right)$
k = 3   $\bar{y}_r\left[3 - \frac{2(\bar{X} - f\bar{x}_n)}{(1-f)\bar{X}}\right]$    $\bar{y}_r\left[3 - \frac{2(\bar{x}_n - f\bar{x}_r)}{(1-f)\bar{x}_n}\right]$     $\bar{y}_r\left[3 - \frac{2(\bar{X} - f\bar{x}_r)}{(1-f)\bar{X}}\right]$
k = 4   $\bar{y}_r$                                                                $\bar{y}_r$                                                                     $\bar{y}_r$

Theorem 1

The point estimator of $\bar{Y}$ under FTCI is:

$(\bar{y}_{FTCI})_j = k\,\bar{y}_r + (1-k)\,\phi_j(k)$, $\quad j = 1, 2, 3$  (20)

Proof

$(\bar{y}_{FTCI})_j = \frac{1}{n}\sum_{i \in S}(y_{\cdot i})_j = \frac{1}{n}\left[\sum_{i \in R}(y_{\cdot i})_j + \sum_{i \in R^{C}}(y_{\cdot i})_j\right]$
$= \frac{1}{n}\left[\sum_{i \in R}\left\{\frac{kn}{r}\,y_i + (1-k)\,\phi_j(k)\right\} + \sum_{i \in R^{C}}(1-k)\,\phi_j(k)\right]$
$= \frac{1}{n}\left[kn\,\bar{y}_r + n\,(1-k)\,\phi_j(k)\right] = k\,\bar{y}_r + (1-k)\,\phi_j(k)$

BIAS AND MEAN SQUARED ERROR

Let $B(.)$ and $M(.)$ denote, respectively, the bias and m.s.e. of an estimator under a given sampling design. Under the large sample approximations, let

$\bar{y}_r = \bar{Y}(1 + e_1)$, $\quad \bar{x}_r = \bar{X}(1 + e_2)$, $\quad \bar{x}_n = \bar{X}(1 + e_3)$

Using the concept of two-phase sampling, following Cochran (2005), Heitjan and Basu (1996) and Rao and Sitter (1995), and with the mechanism of MCAR, for given $r$ and $n$ we have

$E(e_1) = E(e_2) = E(e_3) = 0$;
$E(e_1^2) = M_1 C_Y^2$, $\quad E(e_2^2) = M_1 C_X^2$, $\quad E(e_3^2) = M_2 C_X^2$;
$E(e_1 e_2) = M_1\,\rho\,C_Y C_X$, $\quad E(e_2 e_3) = M_2 C_X^2$, $\quad E(e_1 e_3) = M_2\,\rho\,C_Y C_X$

where

$M_1 = \left(\frac{1}{r}-\frac{1}{N}\right)$, $\quad M_2 = \left(\frac{1}{n}-\frac{1}{N}\right)$, $\quad M = (M_1 - M_2)$
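The first of these approximations can be checked by simulation. The sketch below uses a toy population (not the Appendix A data), draws an SRSWOR sample of size n, marks r units as responding completely at random, and compares the Monte Carlo value of $E(e_1^2)$ with $M_1 C_Y^2$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy population (illustrative only).
N, n, r = 200, 30, 22
X = rng.uniform(5, 45, size=N)
Y = 2.0 * X + rng.normal(0, 5, size=N)
Y_bar, C_Y = Y.mean(), Y.std(ddof=1) / Y.mean()

M1 = 1 / r - 1 / N
e1_sq = []
for _ in range(20000):
    sample = rng.choice(N, size=n, replace=False)      # SRSWOR sample S
    resp = rng.choice(sample, size=r, replace=False)   # responding set R under MCAR
    e1 = Y[resp].mean() / Y_bar - 1                    # y_bar_r = Y_bar * (1 + e1)
    e1_sq.append(e1 ** 2)

print(np.mean(e1_sq), M1 * C_Y ** 2)                   # the two should be close
```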

Some specific symbols

$\theta_1 = \frac{fB}{A + fB + C}$, $\quad \theta_2 = \frac{C}{A + fB + C}$, $\quad \theta_3 = \frac{A + C}{A + fB + C}$, $\quad \theta_4 = \frac{A + fB}{A + fB + C}$,

$P = (\theta_1 + \theta_3)$, $Q = (\theta_2 + \theta_4)$, with $(\theta_1 + \theta_3) = (\theta_2 + \theta_4) = 1$,

$\delta = (\theta_1 - \theta_2) = (\theta_4 - \theta_3)$, $\quad \lambda = (1 - k)\,\delta$, $\quad V = \rho\,\frac{C_Y}{C_X}$.

Theorem 2

$(a_1)$: The estimator $(\bar{y}_{FTCI})_1$ in terms of $e_1$, $e_2$ and $e_3$, up to the first order of approximation, is:

$(\bar{y}_{FTCI})_1 = \bar{Y}\left[1 + e_1 + \lambda e_3 + \lambda e_1 e_3 - \theta_2\lambda e_3^2\right]$  (21)

Proof

$(\bar{y}_{FTCI})_1 = k\,\bar{y}_r + (1-k)\,\phi_1(k) = k\,\bar{y}_r + (1-k)\,\bar{y}_r\,\frac{(A + C)\bar{X} + fB\,\bar{x}_n}{(A + fB)\bar{X} + C\,\bar{x}_n}$
$= \bar{Y}(1+e_1)\left[k + (1-k)\,\frac{(A + C) + fB(1+e_3)}{(A + fB) + C(1+e_3)}\right]$
$= \bar{Y}(1+e_1)\left[k + (1-k)(1 + \theta_1 e_3)(1 + \theta_2 e_3)^{-1}\right]$
$= \bar{Y}(1+e_1)\left[k + (1-k)(1 + \theta_1 e_3)\left(1 - \theta_2 e_3 + \theta_2^2 e_3^2 - \theta_2^3 e_3^3 + \ldots\right)\right]$
$= \bar{Y}(1+e_1)\left[k + (1-k)\left\{1 + (\theta_1 - \theta_2)e_3 - \theta_2(\theta_1 - \theta_2)e_3^2 + \ldots\right\}\right]$
$= \bar{Y}(1+e_1)\left[1 + \lambda e_3 - \theta_2\lambda e_3^2 + \ldots\right]$
$= \bar{Y}\left[1 + e_1 + \lambda e_3 + \lambda e_1 e_3 - \theta_2\lambda e_3^2\right]$ (by first order only).

$(a_2)$: The bias of $(\bar{y}_{FTCI})_1$, up to the first order of approximation, is:

$B\left[(\bar{y}_{FTCI})_1\right] = -\bar{Y}M_2\,\lambda\left(\theta_2 C_X^2 - \rho\,C_Y C_X\right)$  (22)

Proof

$B\left[(\bar{y}_{FTCI})_1\right] = E\left[(\bar{y}_{FTCI})_1\right] - \bar{Y} = \bar{Y}\left[\lambda E(e_1 e_3) - \theta_2\lambda E(e_3^2)\right] = \bar{Y}\lambda\left(M_2\,\rho\,C_Y C_X - \theta_2 M_2 C_X^2\right) = -\bar{Y}M_2\,\lambda\left(\theta_2 C_X^2 - \rho\,C_Y C_X\right)$

$(a_3)$: The m.s.e. of $(\bar{y}_{FTCI})_1$, up to the first order of approximation, is:

$M\left[(\bar{y}_{FTCI})_1\right] = \bar{Y}^2\left[M_1 C_Y^2 + M_2\left(\lambda^2 C_X^2 + 2\lambda\,\rho\,C_Y C_X\right)\right]$  (23)

Proof

$M\left[(\bar{y}_{FTCI})_1\right] = E\left[(\bar{y}_{FTCI})_1 - \bar{Y}\right]^2 = \bar{Y}^2 E\left[e_1 + \lambda e_3\right]^2 = \bar{Y}^2\left[E(e_1^2) + \lambda^2 E(e_3^2) + 2\lambda E(e_1 e_3)\right] = \bar{Y}^2\left[M_1 C_Y^2 + M_2\left(\lambda^2 C_X^2 + 2\lambda\,\rho\,C_Y C_X\right)\right]$

$(a_4)$: The minimum m.s.e. of the estimator $(\bar{y}_{FTCI})_1$ holds at $\lambda = -V$, with expression:

$M\left[(\bar{y}_{FTCI})_1\right]_{min} = M_1 S_Y^2 - M_2\,\rho^2 S_Y^2$  (24)

Proof

For the minimum m.s.e., $\frac{d\,M\left[(\bar{y}_{FTCI})_1\right]}{d\lambda} = 0$ gives $\lambda = -\rho\,\frac{C_Y}{C_X} = -V$. Substituting $\lambda = -\rho\,\frac{C_Y}{C_X}$ in Equation 23, we get Equation 24.
Theorem 3

$(a_5)$: The estimator $(\bar{y}_{FTCI})_2$ in terms of $e_1$, $e_2$ and $e_3$, up to the first order of approximation, is:

$(\bar{y}_{FTCI})_2 = \bar{Y}\left[1 + e_1 + \lambda(e_2 - e_3) + \lambda e_1(e_2 - e_3) - \theta_2\lambda e_2^2 + \theta_4\lambda e_3^2 + (\theta_2 - \theta_4)\lambda e_2 e_3\right]$  (25)

$(a_6)$: The bias of $(\bar{y}_{FTCI})_2$, up to the first order of approximation, is:

$B\left[(\bar{y}_{FTCI})_2\right] = -\bar{Y}M\,\lambda\left(\theta_2 C_X^2 - \rho\,C_Y C_X\right)$  (26)

$(a_7)$: The m.s.e. of $(\bar{y}_{FTCI})_2$, up to the first order of approximation, is:

$M\left[(\bar{y}_{FTCI})_2\right] = \bar{Y}^2\left[M_1 C_Y^2 + M\left(\lambda^2 C_X^2 + 2\lambda\,\rho\,C_Y C_X\right)\right]$  (27)

$(a_8)$: The minimum m.s.e. of the estimator $(\bar{y}_{FTCI})_2$ holds at $\lambda = -V$ and its expression is:

$M\left[(\bar{y}_{FTCI})_2\right]_{min} = M_1 S_Y^2 - M\,\rho^2 S_Y^2$  (28)

Proof

The results follow exactly as in Theorem 2, on expanding $\phi_2(k)$ with $\bar{x}_n = \bar{X}(1+e_3)$ and $\bar{x}_r = \bar{X}(1+e_2)$, noting that $E(e_2 - e_3) = 0$, $E\left[(e_2 - e_3)^2\right] = M C_X^2$ and $E\left[e_1(e_2 - e_3)\right] = M\,\rho\,C_Y C_X$.

Theorem 4

$(a_9)$: The estimator $(\bar{y}_{FTCI})_3$ in terms of $e_1$, $e_2$ and $e_3$, up to the first order of approximation, is:

$(\bar{y}_{FTCI})_3 = \bar{Y}\left[1 + e_1 + \lambda e_2 + \lambda e_1 e_2 - \theta_2\lambda e_2^2\right]$  (29)

$(a_{10})$: The bias of $(\bar{y}_{FTCI})_3$ is:

$B\left[(\bar{y}_{FTCI})_3\right] = -\bar{Y}M_1\,\lambda\left(\theta_2 C_X^2 - \rho\,C_Y C_X\right)$  (30)

$(a_{11})$: The m.s.e. of $(\bar{y}_{FTCI})_3$ is:

$M\left[(\bar{y}_{FTCI})_3\right] = \bar{Y}^2 M_1\left[C_Y^2 + \lambda^2 C_X^2 + 2\lambda\,\rho\,C_Y C_X\right]$  (31)

$(a_{12})$: The minimum m.s.e. of $(\bar{y}_{FTCI})_3$ holds while $\lambda = -V$, and its expression is given by

$M\left[(\bar{y}_{FTCI})_3\right]_{min} = M_1 S_Y^2\left(1 - \rho^2\right)$  (32)

Proof

The results follow as in Theorem 2, on expanding $\phi_3(k)$ with $\bar{x}_r = \bar{X}(1+e_2)$, since only $E(e_2^2) = M_1 C_X^2$ and $E(e_1 e_2) = M_1\,\rho\,C_Y C_X$ are involved.

The optimum condition $\lambda = -V$ is

$(1 - k)(\theta_1 - \theta_2) = -V \;\Leftrightarrow\; (1 - k)\,\frac{(fB - C)}{(A + fB + C)} = -V$

On substituting A, B and C, this reduces to a polynomial equation of degree four in k:

$(1 - k)(fB - C) + V(A + fB + C) = 0$  (33)

Remark 7

Reddy (1978) has shown that the quantity $V = \rho\,\frac{C_Y}{C_X}$ is stable over a moderate length of time and could be initially known, or guessed from past data. Therefore the pair $(f, V)$ may be treated as known, and Equation 33 generates a maximum of four roots (some of which may be imaginary) at which the optimum level of m.s.e. is attained.

Remark 8

Equation 33 contains only the unknown k, appearing in powers up to order four; the other quantities V and f are, following Reddy (1978), known in advance. We can solve Equation 33 to obtain the optimal estimator in the suggested class.
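The roots of Equation 33 can be obtained numerically. The sketch below builds the quartic directly from the condition $(1 - k)(fB - C) + V(A + fB + C) = 0$ rather than from expanded coefficients, and extracts its real roots; f and V are set to the values used in the empirical study below.

```python
import numpy as np
from numpy.polynomial import polynomial as P

f, V = 0.15, 0.7635            # V = rho * C_Y / C_X, known or guessed (Remark 7)

# Polynomials in k with ascending coefficients: (k - a) -> [-a, 1].
k1, k2, k3, k4 = [np.array([-a, 1.0]) for a in (1, 2, 3, 4)]
A = P.polymul(k1, k2)                   # (k - 1)(k - 2)
B = P.polymul(k1, k4)                   # (k - 1)(k - 4)
C = P.polymul(P.polymul(k2, k3), k4)    # (k - 2)(k - 3)(k - 4)

# Optimum condition lambda = -V, i.e. (1 - k)(fB - C) + V(A + fB + C) = 0.
one_minus_k = np.array([1.0, -1.0])
quartic = P.polyadd(P.polymul(one_minus_k, P.polysub(f * B, C)),
                    V * P.polyadd(P.polyadd(A, f * B), C))

roots = P.polyroots(quartic)
print(roots[np.isreal(roots)].real)     # real k values at which the m.s.e. is minimised
```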
COMPARISON OF THE ESTIMATORS

(1) Let

$D_1 = M\left[(\bar{y}_{FTCI})_1\right]_{min} - M\left[(\bar{y}_{FTCI})_2\right]_{min} = (M_1 - 2M_2)\,\rho^2 S_Y^2$  (34)

$D_1 > 0 \;\Leftrightarrow\; r < \frac{n}{(2 - f)}$  (35)

(2) Let $D_2 = M\left[(\bar{y}_{FTCI})_1\right]_{min} - M\left[(\bar{y}_{FTCI})_3\right]_{min} = M\,\rho^2 S_Y^2 \ge 0$, with $D_2 = 0$ only when $n = r$. This is always positive, so $(\bar{y}_{FTCI})_3$ is always better than $(\bar{y}_{FTCI})_1$ until $n = r$.

(3) Let $D_3 = M\left[(\bar{y}_{FTCI})_2\right]_{min} - M\left[(\bar{y}_{FTCI})_3\right]_{min} = M_2\,\rho^2 S_Y^2 \ge 0$, with $D_3 = 0$ only when $n = N$. So $(\bar{y}_{FTCI})_3$ is always better than $(\bar{y}_{FTCI})_2$ until $n = N$. Since $(\bar{y}_{FTCI})_3$ is also better than $(\bar{y}_{FTCI})_1$, the estimation strategy $(\bar{y}_{FTCI})_3$ is most preferable over the other two.

(4) Let

$D_4 = M\left(\bar{y}_{COMP}\right)_{min} - M\left[(\bar{y}_{FTCI})_1\right]_{min} = (2M_2 - M_1)\,\rho^2 S_Y^2$  (36)

$D_4 > 0 \;\Leftrightarrow\; r > \frac{n}{(2 - f)}$ for $0 < f < 1$  (37)

The reverse inequality generally does not hold unless huge non-response occurs.

(5) Let

$D_5 = M\left(\bar{y}_{COMP}\right)_{min} - M\left[(\bar{y}_{FTCI})_2\right]_{min} = \left(M_1 - M\rho^2\right)S_Y^2 - \left(M_1 - M\rho^2\right)S_Y^2 = 0$  (38)

Therefore, both are equally efficient.

(6) Let

$D_6 = M\left(\bar{y}_{COMP}\right)_{min} - M\left[(\bar{y}_{FTCI})_3\right]_{min} = M_2\,\rho^2 S_Y^2 \ge 0$  (39)

which is always true. So $(\bar{y}_{FTCI})_1$ and $(\bar{y}_{FTCI})_3$ are a better offer over the compromised imputation procedure, and $(\bar{y}_{FTCI})_3$ is the best.

Remark 9

If the population size N is large, so that f is small, the working rule is: $D_1 > 0$ if $r < \frac{n}{2}$, which usually happens, and there is a high chance of $D_1 > 0$. It seems that $(\bar{y}_{FTCI})_2$ is better than $(\bar{y}_{FTCI})_1$ if $r < \frac{n}{2}$.

Remark 10

One can think of comparing the proposed estimators with other existing methods like hot-deck, cold-deck, nearest neighbour and regression imputation. But all of these are based on replacing a single missing unit of the sample by a single unit, whereas the proposed procedure replaces a single unit by an average over a random sub-group of units of the auxiliary variate; the comparison environment in the proposed case is therefore different. It may also be interesting to derive the Horvitz-Thompson estimator based on the imputed data and compare it with the proposed estimators. It is an open problem to extend the content further.
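A short numerical sketch of these comparisons, using the minimum m.s.e. expressions of Equations 9, 24, 28 and 32 with the population figures of the empirical study that follows; the printed differences show $D_5 = 0$ and $D_2, D_3, D_6 > 0$, so that $(\bar{y}_{FTCI})_3$ comes out best.

```python
def min_mses(r, n, N, rho, S_Y2):
    """Minimum m.s.e. of y_COMP and the three FTCI estimators (Eqs. 9, 24, 28, 32)."""
    M1, M2 = 1 / r - 1 / N, 1 / n - 1 / N
    M = M1 - M2
    return {
        "COMP":   (M1 - M * rho ** 2) * S_Y2,
        "FTCI_1": (M1 - M2 * rho ** 2) * S_Y2,
        "FTCI_2": (M1 - M * rho ** 2) * S_Y2,
        "FTCI_3": M1 * (1 - rho ** 2) * S_Y2,
    }

m = min_mses(r=22, n=30, N=200, rho=0.8652, S_Y2=199.0598)
D = {"D1": m["FTCI_1"] - m["FTCI_2"], "D2": m["FTCI_1"] - m["FTCI_3"],
     "D3": m["FTCI_2"] - m["FTCI_3"], "D4": m["COMP"] - m["FTCI_1"],
     "D5": m["COMP"] - m["FTCI_2"], "D6": m["COMP"] - m["FTCI_3"]}
print(m)
print(D)
```

Up to rounding, the four minimum m.s.e. values obtained in this way are close to the M(.) column reported in Table 2.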
EMPIRICAL STUDY
The attached Appendix A gives a generated artificial population of size N = 200 containing values of the main variable Y and the auxiliary variable X. Its parameters are as follows:

$\bar{Y}$ = 42.485; $\bar{X}$ = 18.515; $S_Y^2$ = 199.0598; $S_X^2$ = 48.5375; $\rho$ = 0.8652; $C_X$ = 0.3763; $C_Y$ = 0.3321.

A random sample (SRSWOR) of size n = 30 with r = 22 responding units is used, so that f = 0.15 and $\alpha_{opt}$ = 0.2365 (hence V = 1 - $\alpha_{opt}$ = 0.7635). Solving the optimum condition $\lambda = -V$ (Equation 33), the equation of power four in k provides only two real values, $k_1$ = 0.8350 and $k_2$ = 4.1043; the other two roots turn out to be imaginary (Table 2).

Table 2. Bias and optimum m.s.e. at k = k_i (i = 1, 2).

Estimator                        Bias(.)                               M(.)
$\bar{y}_{COMP}$                 0.0150 (at $\alpha_{opt}$ = 0.2365)   6.2504 (at $\alpha_{opt}$ = 0.2365)
$(\bar{y}_{FTCI})_1$ at $k_1$    0.0079                                3.8254
$(\bar{y}_{FTCI})_1$ at $k_2$    0.0109                                3.8254
$(\bar{y}_{FTCI})_2$ at $k_1$    0.0034                                6.2504
$(\bar{y}_{FTCI})_2$ at $k_2$    0.0046                                6.2504
$(\bar{y}_{FTCI})_3$ at $k_1$    0.0113                                2.0225
$(\bar{y}_{FTCI})_3$ at $k_2$    0.0156                                2.0225
ALMOST UNBIASED FTCI ESTIMATOR:

 0
If
B y FTCI

22
 YM 2  2 C X2  CY C X = 0 Using Equation
When
1


and
k  k 4' 

Case (iii)
When
 C

[ AV + fBV + (V – 1)C ]= 0
2
2
X
(40)
 2  
CY
V
CX
(41)
Equation 41 is a cubic equation in k, solvable for fix f and
guess value V, and one can obtain three values k
 k1'' ;
imputed unbiased to the first order of approximation.
At seven values
Case (i)


 CY C X = 0
k  k 2'' and k  k 3'' to make the proposed estimator
  0  1  k   0
1  k   0

1
5  f   f 2  6 f  1
2
k  1  k1'
k1' , k 2' , k 3' , k 4' , k1" , k 2" and k 3" , bias is
almost zero or completely zero. Using data set in
Appendix A, the k-values are compared in Table 3. The
best choice is that having the lowest m.s.e.
Case (ii)
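The seven zero-bias points can be computed directly: $k_1^{'}$ to $k_4^{'}$ from the closed forms of cases (i) and (ii), and $k_1^{''}$ to $k_3^{''}$ by solving the cubic of Equation 41 numerically. A sketch with f and V as in the empirical study; the printed values should essentially reproduce the first column of Table 3.

```python
import numpy as np

f, V = 0.15, 0.7635                    # V = rho * C_Y / C_X

# Cases (i) and (ii): closed-form zero-bias values of k.
disc = np.sqrt(f ** 2 + 6 * f + 1)
k_prime = [1.0, 4.0, 0.5 * ((5 + f) + disc), 0.5 * ((5 + f) - disc)]

# Case (iii): cubic AV + fBV + (V - 1)C = 0 in k, coefficients in descending powers.
coeffs = (V * np.array([0.0, 1, -3, 2])              # V * A,     A = k^2 - 3k + 2
          + f * V * np.array([0.0, 1, -5, 4])        # f * V * B, B = k^2 - 5k + 4
          + (V - 1) * np.array([1.0, -9, 26, -24]))  # (V - 1) * C
k_double_prime = np.roots(coeffs)

print([round(k, 4) for k in k_prime])
print(np.round(k_double_prime.real, 4))
```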
DISCUSSION

The estimator $\bar{y}_{COMP}$ of Singh and Horn (2000) bears bias 0.0150 and m.s.e. 6.2504 at $\alpha_{opt}$ = 0.2365, but the bias is not at a minimal level. The proposed estimators have shown, for this data set and known (f, V), two choices $k_1$ and $k_2$ at which the m.s.e. is minimum, leaving the option open to choose the optimum k-value having the lowest bias; on some data sets these choices may extend to a maximum of four. So the FTCI estimators have an upper hand over the Compromised estimator of Singh and Horn (2000) in terms of efficiency comparison. The estimator $(\bar{y}_{FTCI})_2$ is equally efficient to $\bar{y}_{COMP}$ in terms of m.s.e. but has the unique option of lower bias. Moreover, the estimator $(\bar{y}_{FTCI})_3$ is more efficient than all the others used herein. Another observation is that FTCI can also be made almost unbiased, over seven different choices of k-values, the best being the one having the lowest m.s.e. Thus FTCI has wider prospects of superiority over Singh and Horn (2000). One more interesting feature is that FTCI contains only one parameter k, just as a single parameter is used by Singh and Horn (2000), but it enters the mathematical structure in such a way as to generate more efficient properties.
ACKNOWLEDGEMENT
The authors are thankful to the referee for the critical comments and useful suggestions, which have improved the quality of the manuscript.
REFERENCES
Ahmed MS, Al-Titi O, Al-Rawi Z, Abu-Dayyeh W (2006). Estimation of a
population mean using different imputation methods. Stat. Transit.
7(6):1247-1264.
Cochran WG (2005). Sampling Techniques, John Wiley and Sons, New
York.
Heitjan DF, Basu S (1996). Distinguishing 'missing at random' and 'missing completely at random'. Am. Stat. 50:207-213.
Liu L, Tu Y, Li Y, Zou G (2005, 2006). Imputation for missing data and
variance estimation when auxiliary information is incomplete. Model
Assist. Stat. Appl. pp. 83-94.
Rao JNK, Sitter RR (1995). Variance estimation under two-phase sampling with application to imputation for missing data. Biometrika 82:453-460.
Reddy VN (1978). A study on the use of prior knowledge on certain
population parameters in estimation. Sankhya C40:29-37.
Rueda M, González S (2008). A new ratio-type imputation with random
disturbance. Appl. Math. Lett. 21(9):978-982.
Shukla D (2002). F-T estimator under two-phase sampling. Metron
59(1-2):253-263.
Shukla D, Thakur NS (2008). Estimation of mean with imputation of
missing data using factor type estimator. Stat. Transit. 9(1):33-48.
Singh GN, Kumari P, Jong-Min K, Singh S (2010). Estimation of
population mean using imputation techniques in sample survey. J.
Korean Stat. Soc. 39(1):67-74.
Singh S (2009). A new method of imputation in survey sampling.
Statistics 43(5):499-511.
Singh S, Deo B (2003). Imputing with power transformation. Stat. Pap.
44:555-579.
Singh S, Horn S (2000). Compromised imputation in survey sampling.
Metrika 51:266-276.
Singh S, Horn S, Tracy DS (2001). Hybrid of calibration and imputation:
Estimation of mean in survey sampling. Statistics 61:27-41.
Appendix A. Artificial population (N = 200).
Each entry is a (Yi, Xi) pair; the 200 pairs are listed row-wise.

(45, 15) (40, 29) (36, 15) (35, 13) (65, 27) (67, 25) (50, 15) (61, 25) (73, 28) (23, 07)
(57, 31) (33, 13) (30, 11) (37, 15) (23, 09) (37, 17) (45, 21) (32, 15) (35, 19) (52, 25)
(50, 20) (55, 35) (43, 20) (41, 15) (62, 25) (70, 30) (69, 23) (72, 31) (63, 23) (25, 09)
(54, 23) (33, 11) (35, 15) (37, 16) (20, 08) (27, 13) (40, 23) (27, 13) (35, 19) (43, 19)
(39, 23) (45, 20) (68, 38) (45, 18) (68, 30) (60, 27) (53, 29) (65, 30) (67, 23) (35, 15)
(60, 25) (20, 07) (20, 08) (37, 17) (26, 11) (21, 11) (31, 15) (30, 14) (46, 23) (39, 18)
(60, 35) (36, 14) (70, 42) (65, 25) (85, 45) (40, 21) (55, 30) (39, 19) (47, 17) (30, 11)
(51, 17) (25, 09) (18, 07) (34, 13) (26, 12) (23, 11) (20, 11) (33, 17) (39, 15) (37, 17)
(42, 18) (40, 18) (50, 23) (30, 09) (40, 15) (35, 15) (71, 33) (43, 21) (53, 19) (38, 13)
(26, 09) (28, 13) (20, 09) (41, 20) (40, 15) (24, 09) (40, 20) (31, 15) (35, 17) (20, 11)
(38, 12) (58, 25) (56, 25) (28, 08) (32, 12) (30, 17) (74, 31) (57, 23) (51, 17) (60, 25)
(32, 11) (40, 15) (27, 13) (35, 15) (56, 25) (21, 08) (50, 25) (47, 25) (30, 13) (23, 09)
(28, 8) (56, 28) (45, 18) (32, 11) (60, 22) (25, 09) (55, 17) (37, 15) (54, 18) (60, 27)
(30, 13) (33, 13) (23, 12) (39, 21) (41, 15) (39, 15) (45, 23) (43, 23) (31, 19) (35, 15)
(42, 15) (62, 21) (32, 11) (38, 13) (57, 19) (38, 15) (39, 14) (71, 30) (57, 21) (40, 15)
(45, 19) (38, 17) (42, 25) (45, 25) (47, 25) (33, 17) (35, 17) (35, 17) (53, 25) (39, 17)
(38, 17) (58, 19) (30, 09) (61, 23) (47, 17) (23, 11) (43, 17) (71, 32) (59, 23) (47, 17)
(55, 25) (41, 15) (37, 21) (24, 11) (43, 21) (25, 11) (30, 16) (30, 16) (63, 35) (45, 19)
(35, 13) (46, 18) (38, 17) (58, 21) (55, 21) (55, 21) (45, 19) (70, 29) (39, 20) (30, 11)
(54, 27) (33, 13) (45, 22) (27, 13) (33, 15) (35, 19) (35, 18) (40, 19) (41, 21) (37, 19)