Sample Size Determination Statistics for Oral Health Professionals L11: Sample Size

Statistics for Oral Health Professionals
L11: Sample Size
Sample Size Determination
by
Lin Naing
School of Dental Sciences
Universiti Sains Malaysia
Sample Size Calculation
for Estimation
Estimating a mean
• A study is planned to estimate the knowledge
level among adults in Kampong X.
• The result should be reported as "mean
knowledge score and its 95% CI".
e.g. mean Kscore 10.14 units (95% CI: 9.29, 10.99)
The value of a study can be judged by the width of its Confidence Interval (CI).
A wide CI means a poor (imprecise) study.
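CI of µ: $\bar{x} \pm Z \cdot \frac{\sigma}{\sqrt{n}}$
The "±" part determines the width of the CI, e.g. 5 ± 1 = (4, 6) and 5 ± 2 = (3, 7).
We call this part the "precision", ∆. The width of the CI is 2∆.
$\Delta = Z \cdot \frac{\sigma}{\sqrt{n}}$
$\sqrt{n} = Z \cdot \frac{\sigma}{\Delta}$
$n = \left(\frac{Z\sigma}{\Delta}\right)^2$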
To estimate knowledge score
$n = \left(\frac{Z\sigma}{\Delta}\right)^2$
If we plan for 95% confidence (5% error), Z = 1.96, and the SD (σ) of the K score is estimated as 4.3 (from either a previous study or a pilot study; if from a previous study, state the reference).
Impossible to check for
normality assumption
Now, it is the researcher's decision to select which sample size will be appropriate for the study.
How to report? (in Methodology)
• Sample size was determined as follows.
• The following formula (Daniel, 1999) is used to calculate the sample size for objective 1 (to determine K level).
$n = \left(\frac{Z\sigma}{\Delta}\right)^2$
Z = 1.96 for 95% confidence
σ = SD of K score = 4.3 (Brian, 2002??)
∆ = Precision = 1 unit
$n = \left(\frac{1.96 \times 4.3}{1}\right)^2 = 71.03$, rounded up to 72
• We need 72 people in order to estimate the mean K
score with the precision of 1 unit.
• We decided to take 87 people (additional 20%) for
anticipated non-response cases.
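The calculation above can be sketched in a few lines of Python. This is only an illustration of the formula n = (Zσ/∆)², not part of the original lecture: the function names `n_for_mean` and `inflate` are made up for this sketch, and Z is taken from the standard normal distribution rather than typed in as 1.96.

```python
import math
from statistics import NormalDist


def n_for_mean(sd, precision, confidence=0.95):
    """Sample size to estimate a mean: n = (Z * sd / precision)^2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return math.ceil((z * sd / precision) ** 2)


def inflate(n, extra=0.20):
    """Add a fraction (e.g. 20%) for anticipated non-response, rounded up."""
    # round() guards against tiny floating-point overshoots before ceil()
    return math.ceil(round(n * (1 + extra), 9))


n = n_for_mean(sd=4.3, precision=1.0)
print(n, inflate(n))  # 72 87

# How the required n changes with the chosen precision (same SD of 4.3)
for delta in (0.5, 1.0, 1.5, 2.0):
    print(f"precision = {delta}: n = {n_for_mean(4.3, delta)}")
```

The loop at the end shows why the choice of precision matters: halving ∆ roughly quadruples the required sample size.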
Estimating a Proportion
• A study is planned to estimate the prevalence of POds. in Kampong X.
• The result should be reported as "Prevalence (Proportion) of POds. and its 95% CI".
In our example data,
we get 37% (95% CI: 27%, 47%).
$n = \left(\frac{Z}{\Delta}\right)^2 \times P(1 - P)$
If we plan for 95% confidence (5% error), Z = 1.96,
and P is estimated as 40% (Prevalence of POds.)
(Literature or Pilot study).
[Figure: Relationship between P and Sample Size. x-axis: P (0.00 to 1.00); y-axis: Sample Size (0 to 800).]
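The relationship between P and the sample size can be tabulated with a short sketch. The precision used for the original plot is not stated, so ∆ = 0.05 below is an assumption chosen only for illustration, and `n_for_proportion` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def n_for_proportion(p, precision, confidence=0.95):
    """Sample size to estimate a proportion: n = (Z / precision)^2 * P * (1 - P)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return math.ceil((z / precision) ** 2 * p * (1 - p))


precision = 0.05  # assumed for illustration; not taken from the slides
for p in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9):
    print(f"P = {p:.1f}  n = {n_for_proportion(p, precision)}")
```

The required n is largest at P = 0.5 and symmetric around it, which is why P = 0.5 is the most conservative guess when no estimate of P is available.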
$n = \left(\frac{Z}{\Delta}\right)^2 \times P(1 - P)$
If we plan for 95% confidence (5% error), Z = 1.96, and P is estimated as 40% (Prevalence of POds.) (Literature or Pilot study).
With a precision (∆) of 2.5%, you can "expect" a result such as:
e.g. Prevalence 40% (95% CI: 37.5%, 42.5%)
This is considered very good precision, but we need a sample size of 1,476 people to achieve it. Maybe IMPOSSIBLE!
Now, it is the researcher's decision to select which sample size will be appropriate for the study.
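To support that decision, the sample size can be listed for several candidate precisions. The sketch below assumes P = 40% as in the example; the list of ∆ values and the helper name `n_for_proportion` are illustrative choices, not from the lecture.

```python
import math
from statistics import NormalDist


def n_for_proportion(p, precision, confidence=0.95):
    """Sample size to estimate a proportion: n = (Z / precision)^2 * P * (1 - P)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return math.ceil((z / precision) ** 2 * p * (1 - p))


p = 0.40  # estimated prevalence from literature or a pilot study
for delta in (0.025, 0.05, 0.075, 0.10):
    n = n_for_proportion(p, delta)
    print(f"precision = {delta:.3f}  expected 95% CI = "
          f"({p - delta:.3f}, {p + delta:.3f})  n = {n}")
```

With ∆ = 0.025 this reproduces the 1,476 people mentioned above; wider precisions need far fewer people, and the researcher weighs that against the resources available.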
For a proportion: $n = \left(\frac{Z}{\Delta}\right)^2 \times P(1 - P)$
For a mean: $n = \left(\frac{Z\sigma}{\Delta}\right)^2$
Setting the level of confidence at 95% (Z = 1.96) is conventional.
P or SD is estimated from the literature or a pilot study.
The remaining question is "HOW TO DECIDE THE PRECISION?".
(1) Generally, a smaller ∆ (a narrower CI) is better.
(2) However, researchers are commonly limited by the availability of resources.
(3) It may depend on previous studies:
- If this is the first study on the topic, a relatively wide CI is still considered valuable.
- If previous studies have reported CIs of a certain width and we want to repeat the study, we should come out with a narrower CI (added value).
Sample Size Calculation
for Hypothesis Testing
Important Concepts
1. Type I (α error)
2. Type II (β error) / Power of the Study (1-β)
3. Detectable Difference (Detectable
Alternative)
Two types of Error
False positive: Alpha error (Type I). Allow 0.05 (5%).
• Analogy: the woman is NOT pregnant, but the test is 'positive'.
• In research: in reality, there is no association, but your result gives 'sig. association'.
False negative: Beta error (Type II). Allow 0.2 (20%).
• Analogy: the woman is pregnant, but the test is 'negative'.
• In research: in reality, there is an association, but your result gives 'no sig. association'.
In reality, say there is no difference between male and
female.
But your result is ..
The mean K score is significantly different between
two groups (P=0.009).
This is Type I or alpha error.
In reality, say there is a difference between 2 groups.
But your result is ..
The mean K score is not significantly different
between two groups (P=0.234).
This is Type II or beta error.
It means that even though there is a difference, you do not have enough 'power' to detect it.
Power of the study
False negative: Beta error (Type II). Allow 0.2 (20%).
• Analogy: the woman is pregnant, but the test is 'negative'.
• In research: in reality, there is an association, but your result gives 'no sig. association'.
Let's say our study has a β error of 20%.
It means that there is a 20% chance that we will get 'no sig. association' even though there is an association in reality.
In other words, there is an 80% chance that we will get 'sig. association' if there is an association in reality.
It means that our study has a power of 80%.
Power: the power to achieve a 'sig.' result if there is truly an association.
Important Concepts
Detectable Difference
1. What is Detectable Difference?
2. How to decide the Detectable Difference?
What is Detectable Difference (Detectable Alternative)?
• The “minimum size of the difference between groups”
that the study could detect !!!
• "The study could detect" means ...
Let's say, you are comparing means of 2 groups,
and in reality, the 2 means are truly different.
And also at the end of the study, you get the result
as "two group means are significantly different"
(one is more than the other).
It means that "you detect the difference".
Let's say, you get the result "the difference is not
significant" ... meaning that "you fail to detect it".
60.0 kg ≈ 60.1 kg
60.0 kg < 60.5 kg
Comparing means of two (2) POPULATIONS
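Population A: µ = 15.00
Population B: µ = 15.01
$n = \frac{2\sigma^2 (Z_{\alpha/2} + Z_{\beta})^2}{\Delta^2}$
α = Type I error. For Alpha 0.05, $Z_{\alpha/2}$ = 1.96
β = Type II error. For Beta 0.2, $Z_{\beta}$ = 0.84
σ (SD) from previous study or pilot study.
∆ = DD (Detectable Difference)
How should researchers decide DD (∆)?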
How to decide Detectable Difference?
It should reflect the “Clinically
Significant Difference” (CSD).
We should be able to detect the
“CSD”.
In other words, we should design a
study to detect CSD.
How to decide "Detectable Difference" or CSD?
The 2 means in this example (15.00 versus 15.01 units) are different numerically.
However, this difference will not be considered an important difference by any reasonable person.
In other words, 15.00 versus 15.01 units is NOT a meaningful difference or NOT a practically/clinically important difference.
Then, at what point would you call it a "meaningful difference" or a "practically/clinically important difference"?
15.0 versus 15.1?
15.0 versus 15.2?
15.0 versus 15.5?
15.0 versus 16.0?
15.0 versus 16.5?
15.0 versus 17.0?
Let's say, researchers
consider that 15.0 versus 18.0
is an important difference (3
units). The difference of less
than 3 is considered 'not
important'.
Then, we should set the study in order to detect the
difference of 3 units ... meaning that DD should be set at 3
units.
WHO can make this decision? Experts who know well
about the importance of the level of Kscore. These
experts are researchers who plan this study.
Using PS software ….
• Comparing 2 means
• Comparing 2 proportions
http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/PowerSampleSize
Using PS software ….
• Comparing 2 means
Researchers want to compare K score
between male and female college students.
Objective: To compare mean K score
between male and female students
$n = \frac{2\sigma^2 (Z_{\alpha/2} + Z_{\beta})^2}{\Delta^2}$
(Don't use the formula. This is for explanation.)
Alpha (0.05)
Power (start with 80%)
σ = SD (within-group SD of K score)
∆ = Detectable Difference (Clinically important difference)
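For explanation only, the formula can be evaluated with the values used elsewhere in this lecture (SD of K score 4.3, Detectable Difference 3 units). This sketch uses the normal approximation exactly as written in the formula; the function name `n_per_group_two_means` is hypothetical, and dedicated software such as PS may use a different method and report slightly different numbers.

```python
import math
from statistics import NormalDist


def n_per_group_two_means(sd, delta, alpha=0.05, power=0.80):
    """n per group = 2 * sd^2 * (Z_alpha/2 + Z_beta)^2 / delta^2 (not rounded)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power 80%
    return 2 * sd ** 2 * (z_alpha + z_beta) ** 2 / delta ** 2


n_raw = n_per_group_two_means(sd=4.3, delta=3.0)
print(round(n_raw, 2), math.ceil(n_raw))  # 32.25 33
```

Rounded up, this gives 33 per group under these assumptions.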
[Screenshot: PS software, steps 1 to 5. Step 4: fill all 5 inputs, including the Detectable Difference, the SD (from other study or pilot study), and the ratio between the 2 groups (m=1 means 1:1).]
With the sample size 33 in each group,
we will achieve 80% power
to detect the difference of 3 units of K score
with the Alpha at 0.05.
Example:
Say, in reality, the difference is 5 units between
male and female.
With this sample size, you have at least 80%
chance to get the ‘significant’ or ‘positive’ result.
(You have at least 80% power to reject the Null).
Say the difference is only 1 unit. This sample size will probably fail to detect this difference. But it's OK: we don't want to detect such a small difference, as it is not clinically/practically important.
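The example can be made concrete by approximating the power for different true differences with 33 per group. This is a rough sketch using the normal approximation for a two-sided test, with SD 4.3 assumed as before; `power_two_means` is a hypothetical helper, and software based on the t distribution will give slightly different values.

```python
from statistics import NormalDist


def power_two_means(true_diff, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided comparison of two means (normal approx.)."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    se = sd * (2 / n_per_group) ** 0.5   # standard error of the difference
    shift = abs(true_diff) / se          # true difference in SE units
    # probability of a significant result in either tail
    return nd.cdf(shift - z_alpha) + nd.cdf(-shift - z_alpha)


for diff in (5, 3, 1):
    p = power_two_means(diff, sd=4.3, n_per_group=33)
    print(f"true difference = {diff} units: power is about {p:.2f}")
```

Under these assumptions the power is well above 80% for a true difference of 5 units, close to 80% for 3 units, and below 20% for 1 unit.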
IN SUMMARY, for comparing 2 means
We need to decide ….
Alpha (0.05, by convention)
Power (80% = 0.8)
SD (of the variable of interest, from a previous study or pilot study)
Detectable Difference (should reflect clinical/practical importance)
Ratio of sample size between the 2 groups (m = 1 for 1:1; m = 2 for 2:1)
--------------------------------------------------------------------------
How to report?
We use PS software (Dupont & Plummer, 1997) to calculate the
sample size based on comparing two means.
To detect the difference of 3 units (of K score) with 80% power and
alpha 0.05, we need 33 students in each study group (SD was
estimated as 4.3, reference??).
We have decided to take 40 male and 40 female students
(additional 20%) with the anticipation of some non-responses.
Reference:
Dupont WD and Plummer WD (1997). PS power and sample size program available for free on the Internet. Controlled Clin Trials, 18:274.
Available at http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/PowerSampleSize
Exercise
• Researchers want to compare the SBP between treated and untreated hypertensive patients.
• A recent study revealed that the SD of SBP among hypertensive patients was 10 mmHg (state ref.??).
• The researchers feel that it is important to detect a difference of 5 mmHg between the 2 study groups.
• They plan to take equal sample sizes (1:1) for the 2 study groups (m=1).
• Set alpha at 0.05 as usual.
• Calculate the sample size to achieve the power of 80%.
• Researchers want to compare the SBP between treated and untreated hypertensive patients.
• SD was 10 mmHg (state reference??).
• DD is set at 5 mmHg.
• They plan to take a 1:2 ratio for untreated:treated (m=2) (because it is difficult to find 'untreated' patients).
• Set alpha at 0.05 as usual.
• Calculate the sample size to achieve the power of 80%.
Note: the sample size that the software gives is for the '1' in the 1:2 ratio.
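As a cross-check on the software, the two exercises can be approximated by hand. The sketch below uses the usual normal-approximation extension of the two-means formula to unequal groups, n_small = (1 + 1/m) σ² (Zα/2 + Zβ)² / ∆², which is an assumption of this sketch and not necessarily what PS computes; `n_two_means_ratio` is a made-up name, and the software may report slightly larger numbers.

```python
import math
from statistics import NormalDist


def n_two_means_ratio(sd, delta, m=1, alpha=0.05, power=0.80):
    """Return (n_small, n_large) for a 1:m allocation, normal approximation."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    n_small = math.ceil((1 + 1 / m) * sd ** 2 * z ** 2 / delta ** 2)
    return n_small, math.ceil(m * n_small)  # the larger group is m times the smaller


print(n_two_means_ratio(sd=10, delta=5, m=1))  # (63, 63)
print(n_two_means_ratio(sd=10, delta=5, m=2))  # (48, 96)
```

With m=2 the smaller group needs fewer people, but the total (48 + 96) is larger than with equal groups (63 + 63), which is the price of unequal allocation.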
Using PS software ….
• Comparing 2 proportions
Researchers want to compare prevalence of
POds. between male and female students.
Objective: To compare the prevalence of
POds. between male and female students.
Remember … We have to set Alpha, Power
and Detectable Difference.
Alpha = 0.05
Power = 80% (0.8)
Detectable Difference = ???
Alpha (0.05)
Power (80% = 0.8)
∆ = Detectable Difference (Clinically important difference) (P1-P0)
P0 = Prevalence of POds. among male (Get from literature)
P1 = Prevalence of POds. among female (Set based on
desired DD)
m = 1 (equal ratio between male and female)
Alpha (0.05)
Power (80% = 0.8)
∆ = Detectable Difference (Clinically important difference) (P1-P0)
P0 = 0.27 (Say, we get from literature)
P1 = 0.37 (This is our decision based on DD. Here, we put 0.37. It means that we are setting the DD in this study as 0.10 or 10%, considering that a difference of <10% is not important.)
m = 1 (equal ratio between male and female)
[Screenshot: PS software, steps 1 to 5. Step 4: fill all 5 inputs. P0 is from a previous or pilot study; (P1-P0) is the Detectable Difference; ratio between 2 groups (m=1 means 1:1).]
With the sample size 340 in each group, we will achieve 80% power to detect the difference of 10% (PO prev.) with the Alpha at 0.05.
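For explanation, the result can be compared with a hand calculation. The slides do not state which method the software uses, so the sketch below assumes the common uncorrected normal approximation for comparing two proportions (pooled variance under the null, separate variances under the alternative); `n_per_group_two_proportions` is a hypothetical name.

```python
import math
from statistics import NormalDist


def n_per_group_two_proportions(p0, p1, alpha=0.05, power=0.80):
    """n per group for two proportions, uncorrected normal approx. (not rounded)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
    z_beta = NormalDist().inv_cdf(power)           # about 0.84
    p_bar = (p0 + p1) / 2                          # pooled proportion, equal groups
    a = z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
    b = z_beta * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))
    return (a + b) ** 2 / (p1 - p0) ** 2


n_raw = n_per_group_two_proportions(p0=0.27, p1=0.37)
print(round(n_raw, 1))  # 340.4
```

Under this assumption the value is about 340 per group, in line with the figure above; rounding up rather than to the nearest whole number would give 341.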
IN SUMMARY, for comparing 2 proportions
We need to decide ….
Alpha (0.05, by convention)
Power (80% = 0.8)
P0 (Prevalence of POds. among males, 27%; reference?)
Detectable difference (should reflect clinical/practical importance; in this example, a 10% difference is decided. Therefore, P1 = 37%)
Ratio between the 2 groups (m = 1 for 1:1; m = 2 for 2:1)
--------------------------------------------------------------------------
How to report?
We use PS software (Dupont & Plummer, 1997) to calculate the
sample size based on comparing two proportions.
To detect the difference of 10% in prevalence of POds. (P0 27% versus P1 37%) between the two study groups with 80% power and alpha 0.05, we need 340 male and 340 female students. (P0, the prevalence of POds. among males, was estimated as 27%, ref?) (You may add some extra, e.g. 10%, for anticipated non-response.)
Reference:
Dupont WD and Plummer WD (1997). PS power and sample size program available for free on the Internet. Controlled Clin Trials, 18:274.
Available at http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/PowerSampleSize
Exercise!
• Calculate the sample size for the detectable
difference of prevalence 20%. It means P0 = 27% and
P1 = 47%.
• Calculate the sample size for the detectable difference of prevalence 20% (as above), with the "male:female" ratio as 2:1 (m=2).
Final COMMENTS
• For each objective, we should calculate the sample
size.
• Sometimes, one objective involves more than one variable of interest (e.g. multiple linear regression). In this case, we need to calculate the sample size for each variable of interest.
• Then, the biggest sample size will be "the sample size of the study".
• We need to add 10-20% because we may get non-response, loss to follow-up, or any other loss.
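These final steps amount to taking the maximum over the objectives and then inflating it. A minimal sketch follows; the objective labels and the pairing of the numbers 72 and 66 (2 groups of 33) into one study are hypothetical, used only to show the arithmetic, and `study_sample_size` is a made-up name.

```python
import math


def study_sample_size(sizes_by_objective, extra=0.20):
    """Take the biggest calculated sample size and add a fraction for losses."""
    largest = max(sizes_by_objective.values())
    # round() guards against tiny floating-point overshoots before ceil()
    return math.ceil(round(largest * (1 + extra), 9))


# Hypothetical study with two objectives (total sample sizes)
sizes = {
    "objective 1: estimate mean K score": 72,
    "objective 2: compare mean K score (2 groups of 33)": 66,
}
print(study_sample_size(sizes, extra=0.20))  # 87
print(study_sample_size(sizes, extra=0.10))  # 80
```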
Summary
• Estimating a mean
• Estimating a proportion
• Comparing two means
• Comparing two proportions
Using formulae
Using PS software
Not only "How to Calculate" but also "How to Report"