Sampling Design, Sample Size, and Their Importance

Prof. Bhisma Murti, dr, MPH, MSc, PhD
Institute of Health Economic and Policy Studies (IHEPS),
Department of Public Health, Faculty of Medicine,
Universitas Sebelas Maret
Types of Population
• Target population is the population about which a researcher wants to make inferences.
• Source population (accessible population) is the subset of the target population that is accessible to the researcher and from which the sample is drawn.
• Study sample is the group of subjects chosen from the source population for study, to represent the target population.
• External population is a population larger than the target population to which the researcher may still want to generalize the results.
[Figure: diagram linking the external population, target population, source population, and sample, with arrows labeled Sampling, Statistical inference, Internal Validity, and External Validity.]
Internal Validity and External Validity
• Internal validity refers to the extent to which the sample estimate reflects the true value of the association/effect under study in the target population.
• External validity refers to the extent to which the sample estimate can be generalized to the (larger) external population. Internal validity is a prerequisite for external validity.
What Is Sampling and Why Is It Used?
• Sampling is the selection of a subset of individuals from within a population to estimate characteristics of the whole population, e.g.:
– the prevalence of tuberculosis
– the relationship between smoking and the risk of stroke
• Researchers rarely study the entire population because a census is usually too costly.
Properties of Good Research
• Good research makes a valid, precise, and consistent estimate of the characteristics, or of the difference/association/effect of the variables under study, in the population.
• The validity of a study is inversely related to the degree of systematic error.
• The precision and consistency of an estimate are inversely related to the degree of random error.
Systematic Error
• A systematic error, or bias, is a consistent (non-random) deviation between the true value (in the target population) and the observed value (in the study sample).
• A systematic error results from errors in the selection of the sample (selection bias), faulty measurement of variables (information bias), and/or the mixing of effects by a third variable (confounding).
Random Error
• Random error occurs because of random variation in sampling and/or in the measurement of variables.
• Random error is always present in a measurement. It is caused by inherently unpredictable fluctuations in measuring the variables under study.
• The distribution of random errors typically follows a Gaussian ("bell-shaped") curve. Random errors are scattered about the true value and tend to cancel out, averaging close to zero, when a measurement is repeated many times with the same instrument.
• Therefore, increasing the sample size can reduce random error.
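To illustrate the last point, here is a minimal simulation sketch in Python. It assumes a hypothetical characteristic (true mean 10 mm, SD 3 mm) measured with purely Gaussian random error; the function name `spread_of_sample_means` and all numbers are illustrative and do not come from the slides. The spread of the sample mean across repeated samples should shrink roughly in proportion to 1/√n.

```python
import random
import statistics

def spread_of_sample_means(n, reps=200, true_mean=10.0, sd=3.0, seed=1):
    """Standard deviation of the sample mean over `reps` repeated samples of size n.

    Each observation is the (hypothetical) true value plus purely random
    Gaussian error, so the spread of the sample means reflects random error only.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)

for n in (10, 100, 1000):
    # Expect roughly sd / sqrt(n): about 0.95, 0.3, and 0.095 mm
    print(n, round(spread_of_sample_means(n), 3))
```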
[Figure: Systematic error. Distributions of the true values of the characteristic in the target population and of the observed values in the sample; x-axis: size of induration (mm), y-axis: per cent.]
[Figure: Random error. Distributions of the true values of the characteristic in the target population and of the observed values in the sample; x-axis: size of induration (mm), y-axis: per cent.]
Why Is Sampling Design Important?
• Incorrect selection of a sample leads to a biased estimate.
• Analyzing data from a sample that is biased or unrepresentative of the population will lead to wrong conclusions about the characteristics of the population.
Why Is Sample Size Important?
• A sample size that is too small may lack the statistical power to show a true difference/relationship/effect of the variables under study as statistically significant, and gives an imprecise estimate of it.
• A sample size that is too large is wasteful and sometimes impossible to complete.
[Figure: four panels labeled "Valid", "Not valid", "Valid", and "Not valid".]
Sample Size, Systematic Error, and Random Error
• The larger the sample size, the smaller the random error.
• However, sample size does not affect systematic error: a larger sample does not reduce it.
• Systematic error is more serious than random error because it cannot be corrected by increasing the sample size.
[Figure: error plotted against sample size, with curves labeled "Random error", "Systematic error", and "Systematic error, random error".]
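A minimal Python sketch of this contrast, assuming a hypothetical faulty instrument that over-reads every induration by a constant 2 mm (true mean 10 mm, SD 3 mm; all values and the name `estimate_error` are illustrative). As n grows, the error of the sample mean settles near the 2 mm bias instead of shrinking toward zero, because only the random part averages out.

```python
import random
import statistics

TRUE_MEAN = 10.0  # hypothetical true mean induration in the target population (mm)
SD = 3.0          # hypothetical between-person standard deviation (mm)
BIAS = 2.0        # hypothetical constant over-reading by a faulty instrument (mm)

def estimate_error(n, seed=7):
    """Return (sample mean - true mean) for one sample of size n measured with bias."""
    rng = random.Random(seed)
    observed = [rng.gauss(TRUE_MEAN, SD) + BIAS for _ in range(n)]
    return statistics.fmean(observed) - TRUE_MEAN

for n in (10, 100, 10_000):
    # Random scatter shrinks with n, but the error converges to BIAS, not to 0
    print(n, round(estimate_error(n), 2))
```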
Sample Size and Random Error (Sampling Error, Margin of Error)
• A larger sample size reduces random variation and therefore increases precision.
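One way to see this is the margin of error of an estimated proportion, here taken as the normal-approximation (Wald) interval half-width z·√(p(1 − p)/n). This is a minimal Python sketch; p = 0.5 and the sample sizes are illustrative. Because n enters under a square root, quadrupling the sample size only halves the margin of error.

```python
from statistics import NormalDist

def margin_of_error(p, n, confidence=0.95):
    """Half-width of the normal-approximation (Wald) confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return z * (p * (1 - p) / n) ** 0.5

for n in (100, 400, 1600):
    print(n, round(margin_of_error(0.5, n), 3))
```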
Sampling Design
• Random sampling:
– Simple random sampling
– Stratified random sampling
– Cluster random sampling
• Non-random sampling:
A. Convenience sampling
B. Purposive (judgmental) sampling:
  • Fixed disease sampling
  • Fixed exposure sampling, etc.
Types of Random Sampling
• Random sampling is a sampling method in which every member of a population (universe) has a known, nonzero chance of being selected.
• Simple random sampling is a sampling method in which every member of a population has an equal chance of being selected.
• Stratified random sampling selects independent random samples from subpopulations, groups, or strata within the population.
• Cluster (random) sampling selects sample units at random in groups (called clusters, e.g., neighborhoods): the clusters are chosen at random, and all members of the selected clusters are studied.
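The three random designs above can be sketched in Python with the standard `random` module. The sampling frame (1,000 people in 20 neighborhoods and 2 districts), the field names, and the function names are all hypothetical; the stratified version draws a fixed number per stratum, which is only one of several possible allocations (proportional allocation is another).

```python
import random
from collections import defaultdict

# Hypothetical sampling frame: 1,000 people, 50 in each of 20 neighborhoods;
# neighborhoods 0-7 form district "A", the rest district "B".
frame = [
    {"id": i, "neighborhood": i % 20, "district": "A" if i % 20 < 8 else "B"}
    for i in range(1000)
]

def simple_random_sample(frame, n, rng):
    """Every member has the same chance of selection."""
    return rng.sample(frame, n)

def stratified_random_sample(frame, stratum_key, n_per_stratum, rng):
    """Draw an independent simple random sample within each stratum."""
    strata = defaultdict(list)
    for person in frame:
        strata[person[stratum_key]].append(person)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_random_sample(frame, cluster_key, n_clusters, rng):
    """Choose clusters at random, then study all members of the chosen clusters."""
    clusters = sorted({person[cluster_key] for person in frame})
    chosen = set(rng.sample(clusters, n_clusters))
    return [person for person in frame if person[cluster_key] in chosen]

rng = random.Random(42)
srs = simple_random_sample(frame, 100, rng)
strat = stratified_random_sample(frame, "district", 50, rng)
clus = cluster_random_sample(frame, "neighborhood", 2, rng)
print(len(srs), len(strat), len(clus))  # 100 100 100
```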
Types of Non-Random Sampling
• Purposive sampling uses expert judgment to select a sample that adequately represents the target population on factors that might influence the characteristics under study, e.g., socio-economic status, intelligence, access to education, and environmental factors.
• Convenience sampling is a nonprobability sampling technique in which subjects are selected because they are conveniently accessible and close to the researcher. This sampling design is poor because it is very unlikely to yield a representative sample.
Fixed Exposure Sampling and Fixed Disease Sampling
• Fixed exposure sampling selects a fixed number of subjects from each exposure category (exposed and non-exposed groups). This design is primarily used in cohort studies but can also be used in cross-sectional studies.
• Fixed disease sampling selects a fixed number of subjects from each disease category (case and control groups). This design is primarily used in case-control studies but can also be used in cross-sectional studies. Because cases are usually rare, it is efficient to include all available cases in the study, while subjects in the control group can be selected at random from the available non-diseased population.
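Fixed disease sampling as described above (all available cases plus randomly selected controls) can be sketched in Python as follows. The record format with a boolean `diseased` field, the 2:1 control-to-case ratio, and the function name are hypothetical; real studies may also match controls to cases, which this sketch does not do.

```python
import random

def fixed_disease_sample(population, controls_per_case, rng):
    """Take all available cases; draw controls at random from the non-diseased.

    `population` is a list of dicts with a boolean "diseased" field (hypothetical format).
    """
    cases = [p for p in population if p["diseased"]]
    non_diseased = [p for p in population if not p["diseased"]]
    # Cannot draw more controls than there are non-diseased people
    n_controls = min(len(non_diseased), controls_per_case * len(cases))
    controls = rng.sample(non_diseased, n_controls)
    return cases, controls

# Hypothetical population: 30 cases among 5,000 people
population = [{"id": i, "diseased": i < 30} for i in range(5000)]
cases, controls = fixed_disease_sample(population, controls_per_case=2, rng=random.Random(3))
print(len(cases), len(controls))  # 30 60
```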
Minimum Sample Size Formulas
• Formulas for testing/estimating one population:
1. Mean
2. Proportion
3. Correlation coefficient
• Formulas for testing/estimating two populations:
1. Difference in two (or more) population means
2. Difference in two (or more) population proportions
Examples of Sample Size Formulas
• Sample size per group for a study that tests the difference in proportions between two populations:

$$n = \frac{\left[ Z_{1-\alpha/2}\sqrt{2\bar{P}(1-\bar{P})} + Z_{1-\beta}\sqrt{P_1(1-P_1) + P_2(1-P_2)} \right]^2}{(P_1 - P_2)^2}, \qquad \bar{P} = \frac{P_1 + P_2}{2}$$

• Sample size per group for a study that tests the difference in means between two populations:

$$n = \frac{2\sigma^2 \left( Z_{1-\alpha/2} + Z_{1-\beta} \right)^2}{(\mu_1 - \mu_2)^2}$$
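The two-group formulas above can be evaluated directly. This is a minimal Python sketch assuming a two-sided test, equal group sizes, and power = 1 − β; the example inputs (P1 = 0.30 vs P2 = 0.15; σ = 10 with means 50 vs 45) are illustrative. Programs such as OpenEpi may report slightly different numbers, e.g. if they apply a continuity correction.

```python
from math import ceil, sqrt
from statistics import NormalDist

def z(q):
    """Standard normal quantile, e.g. z(0.975) is about 1.96."""
    return NormalDist().inv_cdf(q)

def n_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group n for comparing two proportions (two-sided test, equal group sizes)."""
    p_bar = (p1 + p2) / 2
    numerator = (z(1 - alpha / 2) * sqrt(2 * p_bar * (1 - p_bar))
                 + z(power) * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)  # round up to a whole subject

def n_two_means(sigma, mu1, mu2, alpha=0.05, power=0.80):
    """Per-group n for comparing two means with common SD sigma (two-sided test)."""
    return ceil(2 * sigma**2 * (z(1 - alpha / 2) + z(power)) ** 2 / (mu1 - mu2) ** 2)

print(n_two_proportions(0.30, 0.15))          # 121
print(n_two_means(sigma=10, mu1=50, mu2=45))  # 63
```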
Determinants of Sample Size Estimation
• The minimum sample size calculated by any formula is only a statistical estimate. It depends on the researcher's choice of acceptable random error and on findings from previous studies. Time, cost, and ethics should also be considered.
• The researcher's choice of acceptable random error:
1. Type I error (α). Arbitrary, but the conventional choice is α = 0.05.
2. Type II error (β) or statistical power (1 - β). Arbitrary, but the conventional choice is β = 0.20.
3. Degree of precision or margin of error (e.g., ±5%).
• Findings from previous or preliminary studies:
1. Difference in population means and their variances
2. Difference in population proportions
3. Correlation coefficient from one population
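For the precision criterion (margin of error), a commonly used formula for estimating a single population proportion p within ±d is n = Z²(1−α/2) · p(1 − p) / d². The Python sketch below evaluates it; the inputs are illustrative, and p = 0.5 is the conservative choice when no previous estimate of p is available.

```python
from math import ceil
from statistics import NormalDist

def n_one_proportion(p, d, confidence=0.95):
    """Minimum n to estimate a single proportion p within +/- d (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z**2 * p * (1 - p) / d**2)

print(n_one_proportion(p=0.5, d=0.05))  # 385
```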
Using a Statistical Program to Calculate Minimum Sample Size
[Figure: use of OpenEpi to calculate sample size.]
Final Words: Important Reminder
• The sample should be selected using a correct (unbiased) sampling design so that it accurately represents the population. An incorrect sampling design causes systematic error, which leads to an invalid estimate of the characteristics, or of the association/effect of variables, in the population.
• The sample size should be large enough to detect a true effect as statistically significant (i.e., consistency) and to give a precise estimate. A small sample size increases random error and can therefore produce non-significant and imprecise results.