Approaches to Weighting Data from
Dual-Frame Surveys
Darren Pennay
Social Research Centre Pty Ltd
Michele Haynes, Mark Western, Bernard Baffour
Institute for Social Science Research, UQ
Social Research Centre Workshop (17 July 2012)
PHONE COVERAGE IN AUSTRALIA
• Currently 22.5 million people reside in Australia
– 16 million adults
– 8.5 million households.
• Approximately 19% of adults (3 million) live in mobile phone only
households in Australia [1].
• An increase of almost 5% in 12 months.
• Landline sampling frames result in non-coverage of 21% of
households.
[1] Australian Communications and Media Authority (2011)
CRICOS Provider No 00025B
DEMONSTRATION SURVEY 2010
• First Australian dual-frame survey conducted in September 2010
– reported at 2010 ACSPRI Social Science Methodology Conference
• This was a Demonstration Survey implemented by SRC & ISSR.
• Questions on demographics and social issues sourced from reputable
questionnaires and scales.
• Sampling design used 2 telephone sampling frames
– A landline RDD sample (400 interviews)
– A mobile phone sample (300 interviews).
• Participants from each frame were asked whether they used
– Landline only, Landline mostly, Mixed, Mobile mostly, Mobile only.
                     Landline Sample                     Mobile Sample
Telephone usage      Total (LL)  Landline    LLO         Total     Mobile & LL  MPO
                                 & Mobile
                     (n=400)     (n=342)     (n=58)      (n=300)   (n=215)      (n=83)
                     %           %           %           %         %            %
Landline only        28.1        -           100.0       -         -            -
Landline mostly      31.0        43.1        -           18.2      23.6         -
Mixed                29.1        40.5        -           24.4      31.7         -
Mobile mostly        11.8        16.4        -           32.9      42.8         -
Mobile only          -           -           -           22.7      -            100.0
Not determined       -           -           -           1.9       1.9          -
Of those interviewed via the landline sample, only 11.8% use their mobile mostly.
Of the landline sample dual users, only 16.4% use their mobile mostly.
• Sample profile showed considerable differences in characteristics of
those who primarily used mobile and landline phones.
• Compared to landline users, mobile phone users are more likely to be
– Younger, reside in a capital city, born outside Australia, renting, living in a
group household, students, employed.
• Conclusions from the demonstration dual-frame survey
– 80% of the adult population have mobile phones, BUT
– Dual-user respondents from landline and mobile frames are very different
– Low chance of ‘mobile mostly’ user from a landline sampling frame
– Response mechanism differs for landline & mobile sampling frames.
• Has implications for combining estimates.
THE OMNIBUS SURVEY 2011
• The larger omnibus survey was administered by SRC in Dec 2011.
• The sampling design again used 2 telephone sampling frames:
– A landline RDD sample proportionally stratified by geographical location
across Australia (1,012 interviews)
– A mobile phone RDD sample (1,002 interviews).
• Survey questions were provided by subscriber organisations.
• Response rates were 22.2% for the landline frame and 12.7% for the
mobile phone frame, indicating that people with mobile phones are less accessible.
• Once again, the sample profile was very different for the landline and
mobile frame dual-users.
SAMPLE PROFILE
                                  Landline Frame              Mobile Frame
Selected             ABS Pop’n    Landline only  Dual-user    Mobile only  Dual-user
Characteristics      2011         (n=174)        (n=838)      (n=295)      (n=707)
                     %            %              %            %            %
Gender
  Male               49.3         35.1           36.9         56.9         50.4
  Female             50.7         64.9           63.1         43.1         49.6
Age group (years)
  18-24              12.8         2.3            3.7          23.1         19.0
  25-39              28.3         6.9            17.1         48.1         26.9
  40-49              18.0         6.3            20.9         8.8          18.8
  50-64              24.0         24.7           32.7         15.6         27.2
  65+                16.9         59.8           25.7         4.4          8.2
Time in neigh’hood
  5 years or less    -            17.2           23.4         73.2         41.9
APPROACHES TO WEIGHTING
• Purpose: to combine the samples from each frame to produce
unbiased estimates for all adults that can be reached by telephone.
• A dual frame estimator needs to combine data from the
– landline only sample
– mobile phone only sample, and
– the overlapping sample from both frames.
3-Stage Weighting Strategy
1. Design weights
To adjust for unequal probabilities of selection (inverse of probability of selection)
2. Poststratification weights
To adjust for non-response bias in the population
3. Composite weights
To produce a weighted average of the two dual-user sample estimates.
THE WEIGHTS
1. Design weights
The probability that an individual is selected into the sample depends on their
probability of being in the landline sample or mobile phone sample, less the
probability of being in both (Best, 2010):
\[
P_{ind} = \left(\frac{S_{LL}}{U_{LL}} \times \frac{LL}{AD}\right)
        + \frac{S_{MP}}{U_{MP}}
        - \left(\frac{S_{LL}}{U_{LL}} \times \frac{LL}{AD}\right)\left(\frac{S_{MP}}{U_{MP}}\right)
\]
where
U_LL = 7,228,117    estimated universe of residential phones
U_MP = 15,334,107   estimated universe of mobile phones
S_LL = 1,012        number of interviews with landline phone
S_MP = 1,002        number of interviews with mobile phones
LL = number of landlines in household
AD = number of in-scope adults in household
• In Australia, we can only estimate ‘universe’ from ABS and ACMA reports
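As a rough illustration of the design-weight calculation, the following Python sketch evaluates the selection probability and its inverse for a single respondent. The function names and the `has_mobile` flag are hypothetical and not part of the original analysis; setting a frame's term to zero when the respondent cannot be reached through that frame, and allowing one mobile phone per person, are assumptions made here for illustration.

```python
# Universe and sample sizes as quoted for the 2011 omnibus survey.
U_LL = 7_228_117    # estimated universe of residential (landline) phones
U_MP = 15_334_107   # estimated universe of mobile phones
S_LL = 1_012        # number of landline interviews
S_MP = 1_002        # number of mobile interviews


def selection_probability(landlines: int, adults: int, has_mobile: bool) -> float:
    """Probability of selection via either frame, less the probability of both.

    Assumption (not from the slides): a frame's term is set to zero when the
    respondent has no phone of that type.
    """
    if adults < 1:
        raise ValueError("a household needs at least one in-scope adult")
    p_ll = (S_LL / U_LL) * (landlines / adults) if landlines > 0 else 0.0
    p_mp = (S_MP / U_MP) if has_mobile else 0.0
    return p_ll + p_mp - p_ll * p_mp


def design_weight(landlines: int, adults: int, has_mobile: bool) -> float:
    """Design weight: inverse of the probability of selection."""
    return 1.0 / selection_probability(landlines, adults, has_mobile)


# Hypothetical respondents: a dual user in a two-adult household with one
# landline, and a mobile-only respondent.
w_dual = design_weight(landlines=1, adults=2, has_mobile=True)
w_mobile_only = design_weight(landlines=0, adults=1, has_mobile=True)
print(w_dual < w_mobile_only)  # the dual user has two routes into the sample
```

Because a dual user can be reached through either frame, the selection probability is higher and the design weight is lower than for an otherwise similar mobile-only respondent.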
STAGE 2 WEIGHTS
2. Poststratification weights (by raking)
– We have shown that the non-response mechanism differs by sampling frame
– BUT ... the data needed for poststratification by telephone usage domain are
not available in Australia
– We have poststratified to population characteristics only:
    Gender: male, female
    Location: capital city, rest of state or territory
    Age x education: 5 age categories by tertiary degree (or not)
    Birthplace: English speaking background or not
    Telephone status: mobile only (19%), dual user (72%), landline only (9%)
• These weights do not account for non-response due to inaccessibility.
• Is there an alternative approach when domain characteristics are
unknown?
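Raking (iterative proportional fitting) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used for the omnibus survey: the `rake` function, the toy respondents and the use of only two raking variables are hypothetical. The telephone status targets are the 19% / 72% / 9% split quoted above, and the sex targets are the ABS 2011 population shares (49.3% male, 50.7% female).

```python
def rake(rows, weights, margins, max_iter=200, tol=1e-10):
    """Adjust weights so weighted shares match each population margin in turn.

    rows    : list of dicts holding each respondent's categories
    weights : starting weights (for example, design weights)
    margins : {variable: {category: population proportion}}
    """
    w = list(weights)
    for _ in range(max_iter):
        max_shift = 0.0
        for var, targets in margins.items():
            total = sum(w)
            # Current weighted share of each category of this variable.
            share = {cat: 0.0 for cat in targets}
            for row, wi in zip(rows, w):
                share[row[var]] += wi / total
            # Scale every respondent by target share / current share.
            for i, row in enumerate(rows):
                factor = targets[row[var]] / share[row[var]]
                w[i] *= factor
                max_shift = max(max_shift, abs(factor - 1.0))
        if max_shift < tol:  # all margins already matched within tolerance
            break
    return w


# Toy respondents (hypothetical), each starting with a weight of 1.
rows = [
    {"sex": "male", "phone": "dual"},
    {"sex": "female", "phone": "dual"},
    {"sex": "female", "phone": "dual"},
    {"sex": "male", "phone": "landline_only"},
    {"sex": "female", "phone": "landline_only"},
    {"sex": "male", "phone": "mobile_only"},
    {"sex": "male", "phone": "mobile_only"},
    {"sex": "female", "phone": "mobile_only"},
]
margins = {
    "sex": {"male": 0.493, "female": 0.507},
    "phone": {"mobile_only": 0.19, "dual": 0.72, "landline_only": 0.09},
}
raked = rake(rows, [1.0] * len(rows), margins)
```

The sketch only matches the margins supplied to it, so it cannot correct for differences within a telephone usage domain unless population figures for that domain are available.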
STAGE 3 WEIGHTS
3. Composite Weights
– A weighted average of the dual-user estimates from each frame
– We have poststratified and then averaged (rather than the other way round)
– Both approaches are unbiased and consistent in the absence of nonsampling errors (Brick et al., 2011).
A = landline frame
B = mobile phone frame
• The composite estimator is
\[ \hat{y} = \hat{y}_a + \hat{y}_b + \hat{y}_\lambda \]
where
\[ \hat{y}_\lambda = \lambda \hat{y}_{ab}^{A} + (1 - \lambda)\, \hat{y}_{ab}^{B} \]
and \( \hat{y}_{ab}^{A} \) and \( \hat{y}_{ab}^{B} \) are non-response adjusted estimators for dual-users
from frame A and frame B, respectively.
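The composite estimator can be sketched in Python as follows. The function names and the numbers in the example are hypothetical. Reading the a and b terms as the landline-only and mobile-only estimates follows the usual dual-frame notation and is an assumption here. The second function shows one common way to get the same result at the respondent level, by scaling the weights of dual users by λ or (1 − λ) according to the frame they were sampled from.

```python
def composite_estimate(y_a, y_b, y_ab_A, y_ab_B, lam):
    """Combine the single-frame domain estimates with a weighted average
    of the two dual-user estimates."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie between 0 and 1")
    y_lam = lam * y_ab_A + (1.0 - lam) * y_ab_B
    return y_a + y_b + y_lam


def composite_weight(weight, frame, is_dual_user, lam):
    """Scale a respondent's weight so dual users are not counted twice.

    frame is "landline" (frame A) or "mobile" (frame B); respondents in the
    landline-only or mobile-only domains keep their weight unchanged.
    """
    if not is_dual_user:
        return weight
    return weight * (lam if frame == "landline" else 1.0 - lam)


# Hypothetical estimated totals (in thousands of adults) for one outcome.
estimate = composite_estimate(y_a=300.0, y_b=900.0,
                              y_ab_A=2500.0, y_ab_B=2900.0, lam=0.5)
print(estimate)  # 3900.0
```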
Choice of λ, the compositing factor
• Lambda can be fixed or vary with quantity being estimated.
• Most researchers use a fixed λ = 0.5 as probability of selecting a
person in sample A is similar to probability of selection in sample B.
• We also use λ = 0.68, where probability of selecting a person from
landline frame is twice as high, relative to mobile frame.
Remember: ULL= 7,228,117, UMP= 15,334,107,
2.2 adults per household
Is
\[ \lambda = \frac{S_{LL}/U_{LL}}{S_{LL}/U_{LL} + S_{MP}/U_{MP}} = 0.68 \]
OR
\[ \lambda = \frac{S_{LL}/(2.2 \times U_{LL})}{S_{LL}/(2.2 \times U_{LL}) + S_{MP}/U_{MP}} = 0.49 \; ? \]
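The two candidate values can be checked with a few lines of Python using the universe and sample sizes quoted on the slides. The helper name `compositing_factor` is hypothetical. The first call compares sampling rates per phone number, and the second divides the landline rate by 2.2 adults per household to compare rates per person.

```python
# Figures quoted for the 2011 omnibus survey.
S_LL, S_MP = 1_012, 1_002
U_LL, U_MP = 7_228_117, 15_334_107
ADULTS_PER_HOUSEHOLD = 2.2


def compositing_factor(rate_ll: float, rate_mp: float) -> float:
    """Landline sampling rate as a share of the combined sampling rate."""
    return rate_ll / (rate_ll + rate_mp)


# Rates per phone number.
lam_phone = compositing_factor(S_LL / U_LL, S_MP / U_MP)
# Rates per person: the landline rate is shared among the adults in a household.
lam_person = compositing_factor(S_LL / (ADULTS_PER_HOUSEHOLD * U_LL), S_MP / U_MP)

print(round(lam_phone, 2), round(lam_person, 2))  # 0.68 0.49
```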
OPTIMAL LAMBDA
• Or could calculate an optimal value of λ which minimises the variance
of the quantity \( \hat{Y} \) being estimated
\[ \hat{\lambda} = \frac{Var(\hat{Y}_{ab}^{B})}{Var(\hat{Y}_{ab}^{A}) + Var(\hat{Y}_{ab}^{B})} \]
• So choosing λ close to the optimal \( \hat{\lambda} \) will have a small effect on the variance,
but the bias may be more sensitive.
• Brick et al. (2011)
– show that λ influences the bias and the variance of the estimator
– propose that an alternative is to choose the compositing factor to
eliminate bias of the average estimator.
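The variance-minimising compositing factor can be illustrated with a short Python sketch. It assumes the two dual-user estimates come from independent samples, so that the variance of the averaged term is λ² Var(A) + (1 − λ)² Var(B). The function names and the standard errors used in the example are hypothetical.

```python
def optimal_lambda(var_a: float, var_b: float) -> float:
    """Compositing factor that minimises the variance of the averaged
    dual-user estimate, assuming independent frame A and frame B samples."""
    if var_a < 0 or var_b < 0 or var_a + var_b == 0:
        raise ValueError("variances must be non-negative and not both zero")
    return var_b / (var_a + var_b)


def averaged_variance(lam: float, var_a: float, var_b: float) -> float:
    """Variance of lam * Y_ab^A + (1 - lam) * Y_ab^B under independence."""
    return lam ** 2 * var_a + (1.0 - lam) ** 2 * var_b


# Hypothetical standard errors for the two dual-user estimates.
var_a, var_b = 0.018 ** 2, 0.017 ** 2
lam_opt = optimal_lambda(var_a, var_b)

# Compare the variance at the optimum with the two fixed choices.
for lam in (lam_opt, 0.5, 0.68):
    print(f"lambda={lam:.2f}  variance={averaged_variance(lam, var_a, var_b):.6f}")
```

When the two variances are similar the optimum sits near 0.5 and the variance curve is flat around it, which is consistent with the point that the choice of λ matters more for bias than for variance.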
RESULTS FOR OMNIBUS SURVEY
Variable                                    Optimal Choice of λ
Sex                                         0.50
Age                                         0.49
Country of Birth                            0.55
Degree Status                               0.49
Tenure                                      0.57
Living Arrangement                          0.53
Time in neighbourhood                       0.58
Anxiety or Depression                       0.57
Hours of TV watched                         0.47
Smoking Status                              0.55
Belief in Climate Change                    0.50
Part-time work (under 35 hours per week)    0.55
RESULTS
Estimated proportion of adults by sex, age, degree & weighting scheme

                 Unweighted  Weighted and  Landline    Mobile      Composite   Composite
                 total       raked to      frame only  frame only  with        with
                             population    (raked)     (raked)     λ=0.68      λ=0.5
Sex
  Male           0.444       0.493         0.404       0.572       0.488       0.502
  Female         0.566       0.507         0.597       0.428       0.512       0.498
Age
  18-24          0.118       0.128         0.042       0.204       0.119       0.134
  25-39          0.242       0.283         0.187       0.368       0.293       0.301
  40-49          0.171       0.180         0.198       0.164       0.163       0.160
  50-64          0.276       0.240         0.282       0.203       0.232       0.226
  65-74          0.129       0.117         0.184       0.057       0.130       0.122
  75+            0.065       0.052         0.106       0.005       0.063       0.057
Degree status
  Degree +       0.328       0.185         0.162       0.205       0.173       0.178
RESULTS/2
Estimated proportion (SEs) of adults by degree, employment, housing
tenure & weighting scheme

              Unweighted  Weighted by     Landline        Mobile          Composite       Composite
              total       population      frame only      frame only      with λ=0.68     with λ=0.5
                          raking          (raked)         (raked)
Employment
  Employed    0.649       0.663 (0.012)   0.592 (0.017)   0.725 (0.016)   0.630 (0.013)   0.641 (0.013)
Tenure
  Own         0.378       0.338 (0.012)   0.489 (0.018)   0.203 (0.014)   0.340 (0.013)   0.322 (0.013)
  Mortgage    0.329       0.339 (0.012)   0.359 (0.018)   0.321 (0.017)   0.308 (0.013)   0.306 (0.013)
  Rent        0.293       0.323 (0.012)   0.152 (0.013)   0.476 (0.018)   0.352 (0.014)   0.372 (0.014)
RESULTS/3
Estimated proportion (SEs) of adults by employment, housing tenure
& weighting scheme

              ABS Census   Weighted by     Landline        Mobile          Composite           Composite
              2011         population      frame only      frame only      with λ=0.68         with λ=0.5
                           raking          (raked)         (raked)
Employment
  Employed    Unavailable  0.663 (0.012)   0.592 (0.017)   0.725 (0.016)   0.630 [1] (0.013)   0.641 (0.013)
Tenure
  Own         0.321        0.338 (0.012)   0.489 (0.018)   0.203 (0.014)   0.340 (0.013)       0.322 (0.013)
  Mortgage    0.349        0.339 (0.012)   0.359 (0.018)   0.321 (0.017)   0.308 (0.013)       0.306 [2] (0.013)
  Rent        0.296        0.323 (0.012)   0.152 (0.013)   0.476 (0.018)   0.352 (0.014)       0.372 [3] (0.014)

[1], [2] Significantly different to estimate from raked data at p-value=0.06
[3] Significantly different to estimate from raked data at p-value=0.008
RESULTS/4
Estimated proportion of adults by selected characteristics & weighting scheme

                         Unweighted  Weighted by  Composite    Composite   Composite
                         total       population   with λ=0.68  with λ=0.5  with optimal λ
                                     raking
Anxiety or Depression
  Yes                    0.192       0.198        0.199        0.204       0.207
Hours of TV watched
  5 hours or more        0.108       0.108        0.113        0.114       0.114
Smoking Status
  Yes, daily             0.154       0.173        0.180        0.182       0.183
Believe in climate change
  Yes                    0.786       0.774        0.769        0.778       0.778
Living arrangement
  Group household        0.088       0.010        0.116        0.122*      0.124*
Time in neigh’hood
  Less than 5 years      0.366       0.338        0.397        0.413*      0.421*
WHERE TO FROM HERE?
• There is little to no population information on characteristics of mobile
phone usage in Australia.
• How can we improve on poststratification weights to account for nonresponse
– Due to inaccessibility to mobile phone users?
– In different telephone usage domains?
• Should the average estimator be applied before or after
poststratification?
• What is the best choice of compositing factor for dual-frame telephone
surveys in Australia?
REFERENCES
• Best, J. (2010). First-stage weights for overlapping dual frame telephone surveys.
Presented at AAPOR’s 65th Annual Conference, Chicago, IL.
• Brick, J.M., Cervantes, I.F., Lee, S. and Norman, G. (2011). Nonsampling errors in dual
frame telephone surveys. Survey Methodology, 37(1), pp. 1-12.
• Brick, J., Dipko, S., Presser, S., Tucker, C. and Yuan, Y. (2006). Nonresponse bias in a dual
frame sample of cell & landline numbers. Public Opinion Quarterly, 70(5), pp. 780-793.
• Lohr, S.L. (2010). Dual frame surveys: Recent developments and challenges.
Proceedings of the 45th Meeting of the Italian Statistical Society.
• Lohr, S.L. and Rao, J.N.K. (2000). Inference from dual frame surveys. Journal of the
American Statistical Association, 95(449), pp. 271-280.
• Pennay, D.W. and Vickers, N. (2012). Dual-frame Omnibus Survey. Technical and
Methodological Summary Report, Social Research Centre Pty Ltd, Melbourne,
Australia.