Economic and Social Review, Vol. 10, No. 2, January, 1979, pp. 169-174.

Notes and Comments

RANSAM: A Random Sample Design for Ireland

BRENDAN J. WHELAN
The Economic and Social Research Institute, Dublin

1. In designing the system, I have benefited from the ideas and experiences of several research workers who have drawn samples at the ESRI in the past ten years. The contributions by J. O'Meara, C. A. O Muircheartaigh and R. Wiggins have been especially helpful.

I INTRODUCTION

The purpose of this paper is to describe RANSAM, the computer-based system for drawing random samples from the Electoral Register, which has been developed over the past few years at ESRI. This system, which is updated each year by reference to the most recent Electoral Register, permits us to offer a sample selection service to those contemplating carrying out a survey on a local (e.g., regional) or national level. The computer costs in selecting samples are quite low, and the names and addresses can be typed at reasonable cost. Such samples have the advantage that we can ensure that the selected respondents have not cropped up in any of the surveys conducted by the ESRI in the current year.

This paper first discusses the use of the Electoral Register as a sampling frame. It then goes on to describe the main features of RANSAM and the reasons why these features have been included. It should be noted that throughout the paper we are concerned with random or probability samples (i.e., samples in which the probability of inclusion of each element is calculable). Non-random samples (e.g., quota samples) can also be based on the present selection system.
For instance, we sometimes use RANSAM to select a random sample of names and addresses as starting points for interviewers and instruct them to call on the selected respondent and on, say, every twentieth house thereafter.

II THE ELECTORAL REGISTER AS A SAMPLING FRAME

The Electoral Register, which comes into force on 15 April of each year, lists the names and addresses of all those eligible to vote in Dáil elections (resident citizens) and those eligible to vote in local government elections (all residents). Persons eligible to vote in local government elections only are indicated by the letter L. The Register is compiled by the franchise departments of the county councils and county borough councils.

The Register is published in the form of "books", each relating to "polling districts", in which the electors are numbered sequentially from 1 to n, where n is the total number of people in the book. The county, constituency, and District Electoral Divisions (DEDs) in which the polling district is located are shown. In general, the polling districts are not sub-divisions of the DEDs: one polling district may contain parts of several DEDs. This is not the case in the urban areas of Cork and Dublin, where there is a one-to-one correspondence between DEDs (or wards) and the polling districts.

Kish (1965, p. 53) describes a perfect sampling frame as one in which "every element appears on the list separately once, only once, and nothing else appears on the list". Measured against this yardstick, the Irish Electoral Register has certain deficiencies.

Age Restriction: Under current law, the Register excludes those under 18 years of age. Thus, a sample which includes persons under 18 cannot be obtained directly from the Register.
Missing Elements: Individuals over 18 who are resident in the country may fail to appear on the Register. Most of these are people who have only just reached their 18th birthday or who have only recently moved into the district.

Superfluous Elements: The Register includes the names of some deceased electors and of some who have left the district without cancelling their previous registration.

Despite these difficulties, the Electoral Register is still the most appropriate list for use as a sampling frame to select representative samples at reasonable cost.

Each year in May, the Survey Unit of ESRI acquires the complete Electoral Register. A computer card is punched for each polling district (or book), giving the following data:

(a) county,
(b) constituency,
(c) polling district name and identification code,
(d) District Electoral Division(s) into which the polling district falls,
(e) map reference of one of the DEDs into which the polling district falls,
(f) number of electors in the polling district,
(g) (for Dublin): a ranking of the ward (DED) by percentage of males in unskilled occupations (1971 Census),
(h) (for Dublin): a ranking of the ward (DED) by percentage of households owning cars (1971 Census),
(i) (for Dublin): a ranking of the ward (DED) by an overall socioeconomic status index (based on 1971 Census).

These cards are sorted by alphabetical order of constituency. Within each constituency, the order of the cards is determined in the manner described below under the heading "Stratification". By using these cards as an input into the RANSAM program, a national random sample is obtained in the form of a set of polling districts (books) and the registration numbers of the selected electors within these books.
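As an illustration only, the card layout (a) to (i) listed in Section II could be represented as a small record type. The sketch below is a modern Python rendering and not the original punched-card format: the class name `PollingDistrictCard`, every field name, the `order_in_constituency` field standing in for the within-constituency ordering, and all placeholder values are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class PollingDistrictCard:
    """One record per polling district (book). All field names are hypothetical."""
    county: str                                # item (a)
    constituency: str                          # item (b)
    district_name: str                         # item (c)
    district_code: str                         # item (c)
    deds: Tuple[str, ...]                      # item (d): one or more DEDs
    map_reference: str                         # item (e)
    electors: int                              # item (f)
    rank_unskilled: Optional[int] = None       # item (g), Dublin only
    rank_car_ownership: Optional[int] = None   # item (h), Dublin only
    rank_ses_index: Optional[int] = None       # item (i), Dublin only
    order_in_constituency: int = 0             # hypothetical: position used for stratification


def sort_cards(cards: List[PollingDistrictCard]) -> List[PollingDistrictCard]:
    """Sort alphabetically by constituency, then by the within-constituency order."""
    return sorted(cards, key=lambda c: (c.constituency, c.order_in_constituency))


# Placeholder records, not real Register data.
cards = [
    PollingDistrictCard("County X", "Constituency B", "District 1", "B01",
                        ("DED p",), "ref-1", 900, order_in_constituency=1),
    PollingDistrictCard("County Y", "Constituency A", "District 2", "A02",
                        ("DED q", "DED r"), "ref-2", 450, order_in_constituency=2),
    PollingDistrictCard("County Y", "Constituency A", "District 3", "A01",
                        ("DED s",), "ref-3", 1300, order_in_constituency=1),
]

for card in sort_cards(cards):
    print(card.constituency, card.district_code, card.electors)
```

Keeping the ordering as an explicit field means the sorted list can be passed directly to a systematic selection routine, which is what gives the geographical stratification described later in the paper.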
III MAIN FEATURES OF RANSAM

It would, of course, be possible to select a simple random sample (srs) from the two million names on the Register. However, this has many disadvantages such as high cost, impractical scattering of respondents and higher sampling variances than could be achieved by other methods of sampling. Samples selected by RANSAM incorporate three main modifications which distinguish them from simple random samples: (i) multi-stage sampling, (ii) selection with probability proportional to size and (iii) stratification. We now consider each of these in turn.

Multi-stage Sampling

Instead of taking a sample of individuals from the Register, RANSAM first selects a set of primary sampling units (PSUs) and then selects a random sample of individuals within each of the PSUs. The PSUs used are the polling districts into which the Electoral Register is divided. In general, this type of sample will be expected to be less efficient than a srs, but by concentrating the selected individuals into certain geographical areas, one can reduce costs to such an extent as to more than justify the decrease in efficiency.

To illustrate the procedure, consider the sample for the October 1978 EEC Consumer Survey. It was desired to obtain 5,000 completed questionnaires. In order to allow for non-response of various sorts, it was decided to select 6,500 names from the Electoral Register. Thus, 162 polling districts throughout the country were selected and 40 electors selected from each district (162 x 40 = 6,480 names, close to the 6,500 sought). Each interviewer in the survey was allocated one or two of these districts.²

The polling districts vary enormously in size, from a few dozen electors to several thousand.
A problem may, therefore, arise in small polling districts in that two or more people from the same household may crop up in the sample. This is undesirable, both from the interviewer's point of view and because the intra-household correlation of response is likely to be high. (See Cochran, 1963, p. 210, on this problem.) RANSAM avoids this difficulty by allowing the user to specify a minimum size for the PSUs. Any polling district below this size is amalgamated with a contiguous one to form a combined PSU greater than the required size. A typical value of the minimum size would be about 500.

Probability Proportional to Size

An important feature of the RANSAM design is that it is "epsem" (i.e., each element (elector) in the population has an equal probability of selection). Epsem procedures have some important advantages, notably the fact that the estimates derived from such a sample are self-weighting (see Kish, 1965, pp. 20-22).

In order to show why the sample selected is epsem, let us examine how the sampling procedure operates. To use the procedure one specifies

n = the (constant) number of electors to be chosen from each PSU,
k = the number of PSUs to be selected,
m = the minimum desired PSU size.

The program will then decide the number of PSUs to be selected from each constituency by forming a cumulative list of the populations of the constituencies and sampling systematically from a random start. Within each constituency, it then amalgamates polling districts below the desired minimum size with contiguous polling districts. Next, it selects the polling districts to be included in the sample by forming a cumulative list of the populations of the polling districts and sampling systematically from a random start.
Thus, the probability of a given PSU being selected is proportional to its size. For each polling district selected, a random start and an interval are then calculated and the registration numbers of the n selected electors listed. Finally, the names and addresses of the selected electors are found and typed.

2. This means that interviewer effects are confounded with the effects attributable to the polling districts, but cost considerations usually rule out other strategies such as randomisation of interviewers across polling districts or the allocation of several interviewers to one district.

Thus, the probability of selecting the ith element of the jth PSU is

$$\Pr(E_{ij}) = \Pr(\mathrm{PSU}_j) \cdot \Pr(E_{ij} \mid \mathrm{PSU}_j \text{ selected}) \propto N_j \cdot \frac{n}{N_j} = n = \text{constant}$$

where $N_j$ is the size of the jth PSU, and n the constant number of elements selected from each PSU. Hence, the probability of selection of each element in the population is equal.

Stratification

This means that the population is divided into sub-groups or strata and a srs is selected from each stratum. This procedure will almost always result in an improvement in precision (Cochran, 1963, p. 99). Stratification is incorporated in RANSAM only at the first stage, all PSUs being stratified by geographical location.

In order to stratify the PSUs geographically, one would ideally like to have maps of the polling districts. Unfortunately, no such maps are available, so one must utilise the maps of the DEDs available from the Ordnance Survey. Given the lack of one-to-one correspondence between polling districts and DEDs, these maps only give the approximate location of the polling districts. Furthermore, when one polling district contains parts of several DEDs, it is necessary to identify each polling district uniquely with one of the DEDs in some way.
The rather arbitrary rule which we adopted to do this was to identify the polling district with the DED which comes alphabetically first. Though far from perfect, this rule allows one to locate the polling districts approximately, and this is really all that is required for the purposes of stratification.

The order of the DEDs within each constituency is determined as follows. Each DED is numbered from 1 to t, where t is the total number of DEDs in the constituency. The numbers are allocated by starting in the extreme north of the constituency, moving west to east, then east to west and so on (see Figure 1). Ordering the polling districts by this numbering system and selecting a systematic random sample means that those selected are stratified geographically across the constituency.

[Figure 1. Legend: constituency boundary; DED boundary.]

IV FUTURE DEVELOPMENT OF RANSAM

To date, RANSAM has been used to select samples for some twenty surveys and seems to produce sample estimates which correspond well to known data on such variables as age, sex, occupational status, etc. On the whole, the researchers who have used these samples seem satisfied with them.

It is hoped to develop the system in several ways over the next few years. Work is already at an advanced stage to link the Electoral Register with the Small Area data from the Census of Population. This would permit one to stratify the PSUs by a number of variables (such as age structure, socioeconomic status, etc.) in addition to the simple geographical stratification now incorporated in the system. A slightly more long-term objective is the calculation of typical standard errors and design effects for samples drawn by RANSAM. Several surveys already exist at ESRI on which such calculations could be based and more are planned for the future.
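The selection steps described in Section III (north-to-south serpentine ordering, systematic selection of PSUs from a cumulative list of sizes, and systematic selection of n electors within each chosen PSU) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the RANSAM program itself: it treats a single constituency, omits the amalgamation of small polling districts, and replaces map positions with hypothetical grid coordinates `row` and `col`. All names (`Unit`, `serpentine_order`, `systematic_pps`, `systematic_within`, `selection_probability`) and all sizes are invented for the example. The probability check assumes a total population N and no PSU larger than the sampling interval N/k, in which case Pr(PSU_j) = k N_j / N and every elector has probability kn/N.

```python
import random
from bisect import bisect_left
from fractions import Fraction
from itertools import accumulate
from typing import List, NamedTuple


class Unit(NamedTuple):
    name: str
    row: int   # hypothetical grid coordinate: 0 = northernmost band
    col: int   # hypothetical grid coordinate: increases from west to east
    size: int  # number of electors


def serpentine_order(units: List[Unit]) -> List[Unit]:
    """Order north to south: west-to-east on even rows, east-to-west on odd rows."""
    return sorted(units, key=lambda u: (u.row, u.col if u.row % 2 == 0 else -u.col))


def systematic_pps(sizes: List[int], k: int, rng: random.Random) -> List[int]:
    """Pick k evenly spaced points on the cumulative list of sizes from a random start.

    Returns indices into `sizes`. A unit larger than the sampling interval
    could be returned more than once.
    """
    cumulative = list(accumulate(sizes))
    interval = cumulative[-1] / k
    start = rng.random() * interval
    last = len(sizes) - 1  # guard against floating-point overshoot
    return [min(bisect_left(cumulative, start + i * interval), last) for i in range(k)]


def systematic_within(size: int, n: int, rng: random.Random) -> List[int]:
    """Return n registration numbers in 1..size, evenly spaced from a random start."""
    interval = size / n
    start = rng.random() * interval
    return [min(int(start + i * interval) + 1, size) for i in range(n)]


def selection_probability(sizes: List[int], j: int, k: int, n: int) -> Fraction:
    """Pr(PSU j selected) * Pr(elector selected | PSU j selected), as exact fractions."""
    total = sum(sizes)
    return Fraction(k * sizes[j], total) * Fraction(n, sizes[j])


rng = random.Random(1979)  # arbitrary seed, for reproducibility only
units = serpentine_order([
    Unit("A", 0, 0, 500), Unit("B", 0, 1, 1200), Unit("C", 0, 2, 800),
    Unit("D", 1, 0, 650), Unit("E", 1, 1, 2000), Unit("F", 1, 2, 850),
])
sizes = [u.size for u in units]
for j in systematic_pps(sizes, k=3, rng=rng):
    print(units[j].name, systematic_within(units[j].size, n=40, rng=rng))
```

Because the units are placed in serpentine order before the cumulative list is formed, the evenly spaced selection points fall in different parts of the ordered list, which is how the systematic selection yields an implicit geographical stratification.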
REFERENCES

COCHRAN, W. G., 1963. Sampling Techniques, New York: Wiley.
KISH, L., 1965. Survey Sampling, New York: Wiley.