Document 274513

Economic and Social Review, Vol. 10, No. 2, January, 1979, pp.
169-174.
Notes a n d C o m m e n t s
R A N S A M : A R a n d o m Sample Design
for I r e l a n d
BRENDAN J. WHELAN
The Economic and Social Research Institute,
Dublin
I INTRODUCTION
T
he purpose o f this paper is t o describe R A N S A M , the computer-based
system for drawing r a n d o m samples f r o m the Electoral Register, w h i c h
has been developed over the past few years at E S R I . This system, w h i c h is
updated each year b y reference to the most recent Electoral Register, permits
us t o offer a sample selection service t o those contemplating carrying o u t a
survey o n a local (e.g., regional) or national level. The computer costs i n
selecting samples are quite l o w , and the names and addresses can b y t y p e d at
reasonable cost. Such samples have the advantage that we can ensure that the
selected respondents have n o t cropped up i n any o f the surveys conducted
by the E S R I i n the current year.
This paper first discusses the use o f the Electoral Register as a sampling
frame. I t then goes on t o describe the main features o f R A N S A M and the
reasons w h y these features have been included. I t should be n o t e d that
t h r o u g h o u t the paper we are concerned w i t h r a n d o m or p r o b a b i l i t y samples
(i.e., samples i n w h i c h the p r o b a b i l i t y o f inclusion o f each element is calcul1
1. In designing the system, I have benefited from the ideas and experiences of several research workers
who have drawn samples at the E S R I in the past ten years. The contributions by J . O'Mcara, C . A.
O Muircheartaigh and R. Wiggins have been especially helpful.
able). Non-random samples (e.g., quota samples) can also be based o n
present selection system. For instance, we sometimes use R A N S A M
select a random sample o f names and addresses as starting points for
interviewers and instruct t h e m t o call o n the selected respondent and
say, every t w e n t i e t h house thereafter.
the
to
the
on,
I I T H E E L E C T O R A L R E G I S T E R AS A S A M P L I N G F R A M E
The Electoral Register, w h i c h comes i n t o force on 15 A p r i l o f each year,
lists the names and addresses o f all those eligible t o vote i n Dail elections
(resident citizens) and those eligible t o vote i n local government elections (all
residents). Persons eligible to vote i n local government elections only are
indicated b y the letter L . The Register is compiled b y the franchise depart­
ments o f the c o u n t y councils and c o u n t y borough councils.
The Register is published i n the f o r m o f " b o o k s " , each relating t o " p o l l ­
ing districts", i n w h i c h the electors are numbered sequentially from 1 t o n ,
where n is the t o t a l number o f people i n the b o o k . The c o u n t y , constituency,
and District Electoral Divisions (DEDs) i n w h i c h the polling district is
located are shown. I n general, the polling districts are n o t sub-divisions o f
the DEDs — one p o l l i n g district may contain parts o f several DEDs. This is
n o t the case i n the urban areas o f Cork and D u b l i n where there is a one-toone correspondence between DEDs (or wards) and the p o l l i n g districts.
Kish ( 1 9 6 5 , p . 53) describes a perfect sampling frame as one i n w h i c h
"every element appears o n the list separately once, only once, and n o t h i n g
else appears on the l i s t " . Measured against this yardstick, the Irish Electoral
Register has certain deficiencies.
Age Restriction: Under current law, the Register excludes those under 18
years o f age. Thus, a sample w h i c h includes persons under 18 cannot be
obtained directly f r o m the Register.
Missing Elements: Individuals over 18 w h o are resident i n the countrymay fail t o appear o n the Register. Most o f these are people w h o have only
just reached their 1 8 t h b i r t h d a y or w h o have o n l y recently moved i n t o the
district.
Superfluous Elements: The Register includes the names o f some deceased
electors and o f some w h o have left the district w i t h o u t cancelling their
previous registration.
Despite these difficulties, the Electoral Register is still the most appro­
priate list for use as a sampling frame t o select representative samples at
reasonable cost.
Each year i n M a y , the Survey U n i t o f E S R I acquires the complete Elec­
toral Register. A computer card is punched for each polling district (or b o o k ) ,
giving the f o l l o w i n g data:
(a)
(b)
(c)
(d)
(e)
(f)
(g)
county,
constituency,
polling district name and identification code,
District Electoral Division(s) i n t o w h i c h the polling district falls,
map reference o f one o f the DEDs i n t o w h i c h polling district falls,
number o f electors i n polling district,
(for D u b l i n ) : a ranking o f the ward ( D E D ) b y percentage o f males
in unskilled occupations (1971) Census),
(h) (for D u b l i n ) : a ranking o f the ward ( D E D ) b y percentage o f house­
holds o w n i n g cars ( 1 9 7 1 Census),
(i) (for D u b l i n ) : a ranking o f the w a r d ( D E D ) b y an overall socioeco­
nomic status index (based o n 1971 Census).
These cards are sorted b y alphabetical order o f constituency. W i t h i n each
constituency, the order o f the cards is determined i n the manner described
below under the heading " S t r a t i f i c a t i o n " . B y using these cards as an i n p u t
i n t o the R A N S A M program a national r a n d o m sample is obtained i n the
f o r m o f a set o f p o l l i n g districts (books) and the registration numbers o f
the selected electors w i t h i n these books.
I l l M A I N FEATURES OF RANSAM
I t w o u l d , o f course, be possible to select a simple r a n d o m sample (srs)
from the t w o m i l l i o n names on the Register. However, this has many dis­
advantages such as high cost, impractical scattering o f respondents and higher
sampling variances than c o u l d be achieved b y other methods o f sampling.
Samples selected b y R A N S A M incorporate three m a i n modifications w h i c h
distinguish t h e m f r o m simple r a n d o m samples: (i) multi-stage sampling, ( i i )
selection w i t h p r o b a b i l i t y p r o p o r t i o n a l to size and (iii) stratification. We n o w
consider each o f these i n t u r n .
Multi-stage Sampling
Instead o f t a k i n g a sample o f individuals f r o m the Register, R A N S A M
first selects a set o f p r i m a r y sampling units (PSUs) and then selects a random
sample o f individuals w i t h i n each o f the PSUs. The PSUs used are the polling
districts i n t o w h i c h the Electoral Register is divided. I n general, this type o f
sample w i l l be expected t o be less efficient than a srs, b u t b y concentrating
the selected individuals i n t o certain geographical areas, one can reduce costs
t o such an extent as to more than j u s t i f y the decrease i n efficiency.
T o illustrate the procedure, consider the sample for the October 1978
EEC Consumer Survey. I t was desired t o obtain 5,000 completed question­
naires. I n order t o allow for non-response o f various sorts, i t was decided t o
select 6,500 names f r o m the Electoral Register. Thus, 162 polling districts
t h r o u g h o u t the c o u n t r y were selected and 40 electors selected from each
district. Each interviewer i n the survey was allocated one or t w o o f these
districts.
2
The p o l l i n g districts vary enormously i n size — f r o m a few dozen electors
t o several thousand. A p r o b l e m m a y , therefore, arise i n small polling districts
i n that t w o or more people f r o m the same household may crop up i n the
sample. This is undesirable, b o t h f r o m the interviewer's p o i n t o f view and
because the intra-household correlation o f response is likely t o be high. (See
Cochran, 1963, p . 210, o n this problem.) R A N S A M avoids this d i f f i c u l t y b y
allowing the user t o specify a m i n i m u m size for the PSUs. A n y p o l l i n g dis­
t r i c t b e l o w this size is amalgamated w i t h a contiguous one to f o r m a com­
bined PSU greater than the required size. A t y p i c a l value o f the m i n i m u m
size w o u l d be about 5 0 0 .
Probability Proportional to Size
A n i m p o r t a n t feature o f the R A N S A M design is that i t is "epsem" (i.e.,
each element (elector) i n the p o p u l a t i o n has an equal p r o b a b i l i t y o f selec­
t i o n ) . Epsem procedures have some i m p o r t a n t advantages, n o t a b l y the fact
t h a t the estimates derived f r o m such a sample are self-weighting (see K i s h ,
1965, p p . 20-22).
I n order t o show w h y the sample selected is epsem, let us examine h o w
the sampling procedure operates. T o use the procedure one specifies
n = the (constant) number o f electors t o be chosen from each PSU,
k = the number o f PSUs t o be selected,
m = the m i n i m u m desired PSU size.
The program w i l l t h e n decide the number o f PSUs t o be selected f r o m each
constituency b y f o r m i n g a cumulative list o f the populations o f the con­
stituencies and sampling systematically f r o m a random start. W i t h i n each
constituency, i t then amalgamates polling districts b e l o w the desired m i n i ­
m u m size w i t h contiguous polling districts. N e x t , i t selects the polling dis­
tricts t o be included i n the sample b y f o r m i n g a cumulative list o f the
populations o f the polling districts and sampling systematically from a
r a n d o m start. Thus, the p r o b a b i l i t y o f a given PSU being selected is propor­
tional t o its size. For each p o l l i n g district selected, a random start and an
interval are then calculated and the registration numbers o f the n selected
electors listed. F i n a l l y , the names and addresses o f the selected electors are
f o u n d and t y p e d .
2. This means that interviewer effects are confounded with the effects attributable to the polling
districts, but cost considerations usually rule out other strategies such as randomisation of interviewers
across polling districts or the allocation of several interviewers to one district.
Thus, the p r o b a b i l i t y of selecting the i t h element o f the j t h PSU is
Pr(Ej.) = 'Pr(PSUj)- P r ( E | PSUj selected)
;j
n
n
= constant
where Nj is the size o f the j t h PSU, and n the constant number o f elements
selected from each PSU. Hence, the p r o b a b i l i t y o f selection o f each element
i n the p o p u l a t i o n is equal.
Stratification
This means that the p o p u l a t i o n is divided i n t o sub-groups or strata and a
srs is selected f r o m each stratum. This procedure w i l l almost always result i n
an improvement i n precision (Cochran, 1963, p . 9 9 ) . Stratification is incor­
porated i n R A N S A M o n l y at the first stage, all PSUs being stratified b y geo­
graphical l o c a t i o n .
I n order t o stratify the PSUs geographically, one w o u l d ideally like t o
have maps o f the polling districts. U n f o r t u n a t e l y , n o such maps are available,
so one must utilise the maps o f the DEDs available from the Ordnance
Survey. Given the lack o f one-to-one correspondence between polling dis­
tricts and DEDs, these maps o n l y give the approximate location o f the polling
districts. F u r t h e r m o r e , when one p o l l i n g district contains parts o f several
D E D s , i t is necessary t o identify each polling district uniquely w i t h one o f
the DEDs i n some w a y . The rather arbitrary rule w h i c h we adopted t o do
this was t o identify the polling district w i t h the D E D w h i c h comes alpha­
betically first. T h o u g h far from perfect, this rule allows one t o locate the
polling districts a p p r o x i m a t e l y , and this is really all that is required for the
purposes o f stratification.
The order o f the DEDs w i t h i n each constituency is determined as follows.
Each D E D is numbered f r o m 1 t o t , where t is the t o t a l number o f DEDs i n
the constituency. The numbers are allocated b y starting i n the extreme
n o r t h o f the constituency, m o v i n g west t o east, then east t o west and so o n
(see Figure 1). Ordering the polling districts b y this numbering system and
selecting a systematic random sample means that those selected are stratified
geographically across the constituency.
IV FUTURE DEVELOPMENT OF RANSAM
T o date, R A N S A M has been used t o select samples for some t w e n t y
surveys and seems to produce sample estimates w h i c h correspond well to
k n o w n data on such variables as age, sex, occupational status, etc. O n the
whole, the researchers w h o have used these samples seem satisfied w i t h t h e m .
Figure 1
Constituency boundary
D E D boundary
I t is hoped t o develop the system i n several ways over the next few years.
Work is already at an advanced stage t o l i n k the Electoral Register w i t h the
Small Area data f r o m the Census o f P o p u l a t i o n . This w o u l d p e r m i t one t o
stratify the PSUs b y a number o f variables (such as age structure, socio­
economic status, etc.) i n a d d i t i o n to the simple geographical stratification
n o w incorporated i n the system. A slightly more long-term objective is"the
calculation o f t y p i c a l standard errors and design effects for samples d r a w n
by R A N S A M . Several surveys already exist at E S R I o n w h i c h such calcula­
tions c o u l d be based and more are planned for the future.
REFERENCES
C O C H R A N , W . G . , 1 9 6 3 . Sampling
K I S H , L . , 1 9 6 5 . Survey
Sampling,
Techniques,
New Y o r k : Wiley.
New Y o r k : Wiley.