APPENDIX A SAMPLE DESIGN

Thanh Lê
A.1 Introduction
The Kyrgyz Republic Demographic and Health Survey (KRDHS) covers the population residing
in private households in the country. The design for the KRDHS calls for a representative probability
sample of approximately 4,000 completed individual interviews with women between the ages of 15 and
49. It was designed principally to produce reliable estimates of demographic rates (particularly fertility
and childhood mortality rates), of maternal and child health indicators, and of contraceptive knowledge
and use for the country as a whole, the urban and the rural areas separately, and for four survey regions
as follows:
Survey Region 1: Bishkek City
Survey Region 2: Issyk-Kulskaya, Chuiskaya and Talasskaya oblasts
Survey Region 3: Narynskaya oblast
Survey Region 4: Oshskaya and Dzhelal-Abadskaya oblasts
A.2 Sampling Frames
In the urban areas, the sampling frame was the list of therapeutic uchastoks¹ collected by the
Institute of Obstetrics and Pediatrics. However, the list of uchastoks existed only for the main cities and
not for small towns. Each small town was therefore divided into segments of roughly equal size, about
2,000 population each, and these segments were treated as if they were uchastoks. The actual
segmentation of a town was done in the field when the town fell into the sample. In the rural areas, the
sampling frame was the list of villages in the whole country.
A.3 Characteristics of the KRDHS Sample
The sample for the KRDHS was selected in two stages. In the urban areas, the primary sampling
units, selected in the first sampling stage, corresponded to the uchastok. Large uchastoks that were
selected into the sample were divided in the field into smaller segments, only one of which was selected
for the survey. A complete listing of the households residing in each selected segment was carried out.
The list of households obtained was used as the frame for second-stage sampling, which was the
selection of the households to be visited by the KRDHS interviewing teams during the main survey
fieldwork. Women between the ages of 15 and 49 were identified in these households and interviewed.
In the rural areas, the first-stage sampling units were the villages. Very large villages (with 400
households or more) that had been selected into the sample were divided in the field into smaller
segments, and one segment was selected prior to the household listing operation, which provided the
household lists for the second-stage selection of households.
¹ Each city is divided into therapeutic uchastoks, each of which is the responsibility of one
physician. People living in the uchastok go to a designated health center for service. This is
where the physician in charge is located and maintains a map of the uchastok.
A.4 Sample Allocation
Tables A.1 and A.2 show the distribution of the population in the Kyrgyz Republic in the
different survey regions, as of January 1997, according to the National Statistical Committee.
Table A.1 Population of the Kyrgyz Republic, by urban-rural residence, 1997

Survey region          Urban        Rural        Total
Bishkek City         596,200        3,100      599,300
Survey Region 2      335,200    1,053,000    1,388,200
Survey Region 3       55,800      207,300      263,100
Survey Region 4      581,100    1,742,400    2,323,500
Kyrgyz Republic    1,568,300    3,005,800    4,574,100
Table A.2 Percent distribution of the population, by urban-rural residence, 1997

Survey region      Urban    Rural    Total
Bishkek City        99.5      0.5     13.1
Survey Region 2     24.1     75.9     30.3
Survey Region 3     21.2     78.8      5.8
Survey Region 4     25.0     75.0     50.8
Kyrgyz Republic     34.3     65.7    100.0
Table A.3 Proportional sample allocation by urban-rural residence

Survey region      Urban    Rural    Total
Bishkek City         521        3      524
Survey Region 2      293      921    1,214
Survey Region 3       49      181      230
Survey Region 4      508    1,524    2,032
Kyrgyz Republic    1,371    2,629    4,000
The survey regions, stratified by urban and rural areas, were the sampling strata. There were
thus 7 strata, with Bishkek City constituting a single urban stratum because it had been decided that its
minuscule rural population would be included in the city. A proportional allocation of the target number
of 4,000 women to the 7 strata would yield the sample distribution shown in Table A.3.
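The proportional allocation can be checked with a short calculation. The sketch below is illustrative only: it takes the populations from Table A.1, allocates the 4,000 women in proportion to each cell's share of the total, and rounds each cell independently. The function and variable names are assumptions made for this example, and Bishkek's rural cell is kept separate here, as it is in Table A.3.

```python
# Population by survey region and residence (Table A.1, January 1997).
population = {
    ("Bishkek City", "urban"): 596_200,
    ("Bishkek City", "rural"): 3_100,
    ("Survey Region 2", "urban"): 335_200,
    ("Survey Region 2", "rural"): 1_053_000,
    ("Survey Region 3", "urban"): 55_800,
    ("Survey Region 3", "rural"): 207_300,
    ("Survey Region 4", "urban"): 581_100,
    ("Survey Region 4", "rural"): 1_742_400,
}
TARGET_WOMEN = 4_000  # target number of completed interviews


def proportional_allocation(pop, target):
    """Allocate `target` interviews in proportion to population size."""
    total = sum(pop.values())
    return {cell: round(target * size / total) for cell, size in pop.items()}


allocation = proportional_allocation(population, TARGET_WOMEN)
for (region, residence), n in allocation.items():
    print(f"{region:16s} {residence:5s} {n:5d}")
```

Independent rounding does not guarantee in general that the cells add up to the target; with these figures they happen to sum to exactly 4,000.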
The proportional allocation above would result in a completely self-weighting sample but would
not allow for reliable estimates for two of the four survey regions: Bishkek (Survey Region 1) and Naryn
(Survey Region 3). Results of other demographic and health surveys show that a minimum sample of
800-1,000 women is required in order to obtain estimates of fertility and childhood mortality rates with
an acceptable level of sampling error. Given that the total sample size for the KRDHS could not be
increased to achieve the required level of sampling error, it was decided to allocate the sample among
the four regions as shown in Table A.4. Within each region, the sample was distributed approximately
proportionally between the urban and the rural areas.
Table A.4 Proposed sample allocation by urban-rural residence

Survey region      Urban    Rural    Total
Bishkek City       1,000        -    1,000
Survey Region 2      241      759    1,000
Survey Region 3      170      630      800
Survey Region 4      300      900    1,200
Kyrgyz Republic    1,711    2,289    4,000
The number of sample points (or clusters) to be selected for each stratum was calculated by
dividing the number of women in the stratum by the average take in the cluster. Each cluster corresponds
to a segment of an uchastok, a village or a segment of a village. Analytical studies of surveys of the same
nature suggest that the optimum number of women to be interviewed is around 20-25 in each urban
cluster and 30-35 in each rural cluster. If on average 20 women were to be interviewed in each urban
cluster and 30 women in each rural cluster, the distribution of sample points would be as shown in Table
A.5.
Table A.5 Number of sample points by urban-rural residence

Survey region      Urban    Rural    Total
Bishkek City          50        -       50
Survey Region 2       12       25       37
Survey Region 3        9       21       30
Survey Region 4       15       30       45
Kyrgyz Republic       86       76      162
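The cluster counts follow from dividing the proposed number of women in each stratum (Table A.4) by the average take per cluster. The sketch below assumes takes of 20 (urban) and 30 (rural), as stated in the text, and assumes conventional half-up rounding, which is what reproduces the published figure of 9 urban clusters for Survey Region 3 (170 / 20 = 8.5). The names used are illustrative.

```python
import math

# Proposed number of women per stratum (Table A.4).
proposed_women = {
    ("Bishkek City", "urban"): 1_000,
    ("Survey Region 2", "urban"): 241,
    ("Survey Region 3", "urban"): 170,
    ("Survey Region 4", "urban"): 300,
    ("Survey Region 2", "rural"): 759,
    ("Survey Region 3", "rural"): 630,
    ("Survey Region 4", "rural"): 900,
}
# Average number of women interviewed per cluster, as stated in the text.
TAKE = {"urban": 20, "rural": 30}


def round_half_up(x):
    """Round to the nearest integer, with .5 rounded up (assumed convention)."""
    return math.floor(x + 0.5)


clusters = {
    stratum: round_half_up(women / TAKE[stratum[1]])
    for stratum, women in proposed_women.items()
}
urban_total = sum(n for (_, res), n in clusters.items() if res == "urban")
rural_total = sum(n for (_, res), n in clusters.items() if res == "rural")
print(clusters)
print(urban_total, rural_total, urban_total + rural_total)
```

Python's built-in `round` uses round-half-to-even and would give 8 rather than 9 for 8.5, which is why the helper is used instead.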
While examining these figures, it was noticed that, because of rounding, the number of
clusters in Survey Region 2 would yield slightly fewer women than expected. The numbers of clusters
were therefore adjusted so that (1) each stratum had an even number of clusters and (2) the regional
sample size would not fall short of the proposed size in Table A.4. An even number of clusters is
recommended for the purpose of calculating sampling errors, in which the first step is to form
pairs of homogeneous clusters.
Table A.6 Proposed number of sample points by urban-rural residence

Survey region      Urban    Rural    Total
Bishkek City          50        -       50
Survey Region 2       12       26       38
Survey Region 3       10       20       30
Survey Region 4       14       30       44
Kyrgyz Republic       86       76      162
Table A.7 shows the estimated number of women with completed interviews in the selected
clusters.
Table A.7 Expected number of women by urban-rural residence

Survey region      Urban    Rural    Total
Bishkek City       1,000        -    1,000
Survey Region 2      240      780    1,020
Survey Region 3      200      600      800
Survey Region 4      280      900    1,180
Kyrgyz Republic    1,720    2,280    4,000
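The expected numbers of women are the cluster counts multiplied by the average take per cluster. The sketch below, with illustrative names, applies the assumed takes of 20 (urban) and 30 (rural) to the cluster counts of Tables A.5 and A.6, which shows the shortfall in Survey Region 2 before the adjustment and the totals after it.

```python
TAKE = {"urban": 20, "rural": 30}  # women per cluster, as stated in the text

# Cluster counts before (Table A.5) and after (Table A.6) the adjustment.
clusters_a5 = {
    ("Bishkek City", "urban"): 50,
    ("Survey Region 2", "urban"): 12, ("Survey Region 2", "rural"): 25,
    ("Survey Region 3", "urban"): 9, ("Survey Region 3", "rural"): 21,
    ("Survey Region 4", "urban"): 15, ("Survey Region 4", "rural"): 30,
}
clusters_a6 = {
    ("Bishkek City", "urban"): 50,
    ("Survey Region 2", "urban"): 12, ("Survey Region 2", "rural"): 26,
    ("Survey Region 3", "urban"): 10, ("Survey Region 3", "rural"): 20,
    ("Survey Region 4", "urban"): 14, ("Survey Region 4", "rural"): 30,
}


def expected_women(clusters):
    """Expected completed interviews per stratum: clusters x average take."""
    return {s: n * TAKE[s[1]] for s, n in clusters.items()}


def regional_totals(women):
    """Sum the stratum figures within each survey region."""
    totals = {}
    for (region, _), n in women.items():
        totals[region] = totals.get(region, 0) + n
    return totals


before = regional_totals(expected_women(clusters_a5))
after = regional_totals(expected_women(clusters_a6))
print("Table A.5 clusters:", before)
print("Table A.6 clusters:", after)
```

With the Table A.5 counts, Survey Region 2 would yield 990 women; with the Table A.6 counts it yields 1,020, and every stratum has an even number of clusters.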
The number of households to be selected for each stratum was calculated as follows:

Number of households = Number of women / (Number of women per household × Overall response rate)

The estimated number of women 15-49 per household according to the 1989 census is shown in
Table A.8.
Table A.8 Estimated average number of women age 15-49 per household by urban-rural
residence

Survey region      Urban    Rural    Total
Bishkek City         1.0      1.0      1.0
Survey Region 2      1.1      1.1      1.1
Survey Region 3      1.4      1.3      1.3
Survey Region 4      1.3      1.3      1.3
Kyrgyz Republic      1.1      1.2      1.2
The overall response rate was assumed to be 90 percent (95 percent for households and 95
percent for women), which is the average overall response rate found in other surveys implemented in the
Central Asian Republics. Using these two parameters in the above equation, we would expect to select
approximately 3,800 households in order to yield the target sample of women. The average number of
households to be selected in each cluster is shown in Table A.9 for the different strata.
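As a rough check of these figures, the sketch below divides the expected number of women in each stratum (Table A.7) by the product of the number of women per household (Table A.8) and the assumed 90 percent overall response rate, then spreads the result over the clusters of Table A.6. The formula is inferred from the two parameters named in the text and the variable names are illustrative; the original calculation may have differed in detail.

```python
RESPONSE_RATE = 0.90  # assumed overall response rate (households x women)

# stratum: (expected women, women 15-49 per household, number of clusters)
# taken from Tables A.7, A.8 and A.6 respectively.
strata = {
    ("Bishkek City", "urban"): (1_000, 1.0, 50),
    ("Survey Region 2", "urban"): (240, 1.1, 12),
    ("Survey Region 3", "urban"): (200, 1.4, 10),
    ("Survey Region 4", "urban"): (280, 1.3, 14),
    ("Survey Region 2", "rural"): (780, 1.1, 26),
    ("Survey Region 3", "rural"): (600, 1.3, 20),
    ("Survey Region 4", "rural"): (900, 1.3, 30),
}


def households_needed(women, women_per_household, response_rate):
    """Households to select so that the expected number of interviews is met."""
    return women / (women_per_household * response_rate)


households = {
    s: households_needed(w, wph, RESPONSE_RATE) for s, (w, wph, _) in strata.items()
}
per_cluster = {s: round(households[s] / strata[s][2]) for s in strata}
total_households = sum(households.values())

print(per_cluster)
print(round(total_households))
```

Under these assumptions the total comes to roughly 3,800 households, and the per-cluster averages agree with Table A.9.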
A.5 Stratification and Systematic Selection of Uchastoks and Villages
In the urban areas, stratification of the uchastoks was geographic. Within each sampling stratum,
the oblasts, then cities and towns were ordered geographically, and the uchastoks were selected with
probabilities proportional to size, the size being the estimated population in the uchastoks (the uchastok
population reported in the list of uchastoks is the population of adults, 15 years and older, which
represents about 69.5 percent of the total population residing in the urban areas).
Within each stratum, the selection procedure was as follows:
1. Calculate the selection interval for the uchastoks as follows:

   $$I = \frac{\sum_i M_i}{a}$$

   where $\sum_i M_i$ is the size of the stratum (total population in the stratum according to the
   sampling frame) and $a$ is the number of uchastoks to be selected in the stratum.
2. Calculate the cumulated size of each uchastok.
3. Calculate the series of sampling numbers R, R+I, R+2I, ..., R+(a-1)I, where R is a
   random number between 1 and I.
4. Compare each sampling number with the cumulated sizes.
The first uchastok to be selected was the first uchastok on the list whose cumulated size was
greater than or equal to the first sampling number. The second uchastok to be selected was the next
uchastok on the list (after the first selected one) whose cumulated size was greater than or equal to the
second sampling number, and so on.
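The selection steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure actually run for the KRDHS: the function name, the list of sizes, and the choice to draw R as a real number are all assumptions, and the sketch assumes that no unit is larger than the selection interval.

```python
import random
from itertools import accumulate


def pps_systematic(sizes, a, r=None, seed=None):
    """Systematic selection with probability proportional to size.

    sizes: sizes of the units in list (geographic) order
    a:     number of units to select
    r:     random start between 1 and the interval (drawn if not given)
    Returns the 0-based positions of the selected units.
    """
    interval = sum(sizes) / a                  # step 1: I = sum(M_i) / a
    if r is None:
        r = random.Random(seed).uniform(1, interval)
    cumulated = list(accumulate(sizes))        # step 2: cumulated sizes
    selected = []
    idx = 0
    for k in range(a):
        target = r + k * interval              # step 3: R, R+I, R+2I, ...
        # step 4: next unit whose cumulated size reaches the sampling number
        while idx < len(cumulated) and cumulated[idx] < target:
            idx += 1
        if idx == len(cumulated):
            raise ValueError("a unit is larger than the selection interval")
        selected.append(idx)
        idx += 1                               # continue after the selected unit
    return selected


# Hypothetical stratum of eight uchastoks, four to be selected.
sizes = [120, 300, 80, 450, 300, 150, 400, 200]
print(pps_systematic(sizes, a=4, r=137))  # -> [1, 3, 4, 6]
```

With a total size of 2,000 and four selections, the interval is 500, so the sampling numbers for R = 137 are 137, 637, 1,137 and 1,637.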
In the rural areas, stratification of the oblasts and raions was geographic, but stratification of the
villages within the raions was by village population size. This was to ensure that the sample did not
consist only of large villages, since under selection with probabilities proportional to size, the larger the
village, the greater its probability of being selected.
A.6 Segmentation of Large Uchastoks and Villages
Uchastoks and villages could be very large in size. If a large uchastok/village was selected, it
would require enormous time and effort to list the households it contained. An upper limit of 400
households was therefore imposed on the size of the uchastok/village. Any selected uchastok/village that
exceeded this upper limit was divided into several segments, only one of which was retained for the
survey. Segmentation was done in the field during the mapping and household listing.
A.7 Sampling Probabilities
The sampling probabilities were calculated separately for each sampling stage, and
independently for each stratum. The following notations were used:
$P_{1h}$: First-stage sampling probability (uchastoks or villages).
$P_{2h}$: Second-stage sampling probability (households).
Let $a_h$ be the number of uchastoks selected in stratum $h$, $M_{hi}$ the size (population according to
the sampling frame) of the $i$th uchastok in the stratum, and $\sum_i M_{hi}$ the total size of the stratum
(population according to the sampling frame). The probability of inclusion of the $i$th uchastok in the
sample was calculated as follows:

$$P_{1hi} = \frac{a_h M_{hi}}{\sum_i M_{hi}}$$

An intermediate sampling stage was introduced between the first and second sampling stages.
This selection stage is not considered an effective stage but only a pseudo-stage, introduced in order to
reduce the size of the uchastok. Let $t_{hij}$ be the estimated size (in proportion) of the $j$th segment
selected for the $i$th uchastok. Note that $\sum_j t_{hij} = 1$. The sampling probabilities are:

$$P'_{1hij} = t_{hij}, \qquad P_{1hi} \, P'_{1hij} = \frac{a_h M_{hi}}{\sum_i M_{hi}} \, t_{hij}$$

In the second stage, a number $b_{hij}$ of households were selected from the number $M'_{hij}$
of households newly listed in the $j$th segment of the $i$th uchastok by the KRDHS teams. We then have

$$P_{2hij} = \frac{b_{hij}}{M'_{hij}}$$

For the sample to be self-weighting within the stratum, the overall probability
$f_h = P_{1hi} \, P'_{1hij} \, P_{2hij}$ must be the same for each household within the stratum, where
$f_h$ is the sampling fraction calculated separately for stratum $h$:

$$f_h = \frac{n_h}{N_h}$$

where $n_h$ is the number of households selected in stratum $h$, and $N_h$ is the number of households that
existed in stratum $h$ in 1997.
The selection of the households was systematic with equal probability, and the selection interval
was calculated as follows:

$$I_{hij} = \frac{1}{P_{2hij}} = \frac{P_{1hi} \, P'_{1hij}}{f_h}$$
In the rural areas, the calculations of the selection probabilities for the different stages of
sampling were the same as for the uchastoks, with villages equivalent to uchastoks.
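A small numerical sketch may help make the chain of probabilities concrete. All input values below are hypothetical (they are not taken from the survey), and the function names are illustrative; the point is only that the household selection interval is chosen so that the product of the stage probabilities equals the stratum sampling fraction.

```python
def first_stage_probability(a_h, m_hi, stratum_size):
    """PPS inclusion probability of a uchastok or village."""
    return a_h * m_hi / stratum_size


def household_interval(p1, t_seg, f_h):
    """Selection interval that keeps the overall probability equal to f_h."""
    return p1 * t_seg / f_h


# Hypothetical inputs for one selected uchastok.
a_h = 12                 # uchastoks selected in the stratum
stratum_size = 335_200   # total size of the stratum in the frame
m_hi = 3_000             # size of the selected uchastok
t_seg = 0.5              # share of the uchastok in the retained segment
f_h = 1 / 400            # stratum sampling fraction n_h / N_h

p1 = first_stage_probability(a_h, m_hi, stratum_size)
interval = household_interval(p1, t_seg, f_h)
p2 = 1 / interval        # second-stage probability implied by the interval

print(f"P1 = {p1:.4f}, interval = {interval:.2f}, P2 = {p2:.4f}")
print(f"overall = {p1 * t_seg * p2:.6f}, f_h = {f_h:.6f}")
```

For an unsegmented uchastok or village, the segment share is simply 1.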
Because of the non-proportional distribution of the sample to the different strata, sampling
weights were required to ensure the actual representativity of the sample at the national level.
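The effect of the non-proportional allocation can be illustrated with approximate relative weights. The sketch below is not the weighting actually used for the KRDHS: it assumes that each stratum's weight is proportional to its population share (Table A.1) divided by its share of the expected sample (Table A.7), and it ignores differential nonresponse and the difference between total population and eligible women.

```python
# stratum: (population from Table A.1, expected women from Table A.7)
# Bishkek's small rural population is included with the city.
strata = {
    ("Bishkek City", "urban"): (599_300, 1_000),
    ("Survey Region 2", "urban"): (335_200, 240),
    ("Survey Region 3", "urban"): (55_800, 200),
    ("Survey Region 4", "urban"): (581_100, 280),
    ("Survey Region 2", "rural"): (1_053_000, 780),
    ("Survey Region 3", "rural"): (207_300, 600),
    ("Survey Region 4", "rural"): (1_742_400, 900),
}

total_pop = sum(pop for pop, _ in strata.values())
total_n = sum(n for _, n in strata.values())

# Relative weight: population share divided by sample share.
weights = {
    s: (pop / total_pop) / (n / total_n) for s, (pop, n) in strata.items()
}

for (region, residence), w in weights.items():
    print(f"{region:16s} {residence:5s} {w:5.2f}")
```

Strata that were oversampled relative to their population, such as Bishkek City, receive weights below 1, while undersampled strata receive weights above 1; the weighted sample size still equals the unweighted total.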
Table A.9 Average number of households to be selected in each cluster
by urban-rural residence

Survey region      Urban    Rural
Bishkek City          22        -
Survey Region 2       20       30
Survey Region 3       16       26
Survey Region 4       17       26