An analytical study for empirical ride to determine the
appropriate sample size for multiple regression models
Dr. Marie IL Mahnwud
Dept. of Statistics & Mathematics
Faculty of Commerce, Tanta University
ABSTRACT
Sample size is a critical part of any research design. In very basic terms,
the larger your sample size, the more likely you will be able to find
statistically significant results. If we understand these concepts, we will
understand why sample size is so important to the effectiveness of our
studies. In this paper simulation studies are used to assess the effect of
varying sample size on the accuracy of the estimates of the parameters
and variance components of multiple regression models. The objective of
this study is to identify the appropriate sample size for multiple
regression models. A numerical example is carried out to illustrate
the results using SPSS software (v. 14). The results show that the sample
size increases when the required power increases or when the level of
significance decreases. This paper reviews criteria for specifying a
sample size and presents a strategy for determining the sample size. The
present study makes comparisons between different results with different
rules for estimating sample size and confirms that the adequate sample
size differs according to the main objective of the multiple regression model
(the structural analysis of the relationship among the variables, or using
the model for forecasting). Results suggest interval estimation of the sample
size in these two cases.
• Keywords:
STRATEGIES FOR DETERMINING SAMPLE SIZE, SAMPLE SIZE
CRITERIA, EFFECT SIZE, POWER OF A TEST, MULTIPLE
REGRESSION, Richard Sawyer rule, Kleinbaum, Kupper &
Muller rule, Krejcie & Morgan rule, Tabachnick & Fidell rule
(I) - INTRODUCTION
In most situations, researchers do not have access to an entire statistical
population of interest, partly because it is too expensive and time
consuming to cover a large population, or due to the difficulty of getting the
cooperation of the entire population to participate in the study. As a
result, researchers normally resort to making important decisions about a
population based on a representative sample. Hence, estimating an
appropriate sample size is a very important aspect of a research design, as it
allows the researcher to make inferences from the sample statistics to the
statistical population. The power of a sample survey lies in the ability to
estimate an appropriate sample size to obtain the necessary data to
describe the characteristics of the population. In other words, the most
frequently asked question concerning sampling is, "What sample
size do I need?" The answer to this question is influenced by a number of
factors, including the purpose of the study, population size, the risk of
selecting a "bad" sample, and the allowable sampling error (too small a
sample will yield scant information, but ethics, economics, time, and
other constraints require that a sample size not be too large). This paper
reviews criteria for specifying a sample size and presents a strategy for
determining the sample size. Some decision keys in planning any
experiment are, "How precise will my parameter estimates tend to be if I
select a particular sample size?" and "How big a sample do I need to
attain a desirable level of precision?" The formal definition of accuracy
is given by the square root of the mean square error (RMSE) and can be
expressed by the following formulation:
RMSE = √( E[(θ̂ − θ)²] ) = √( E[(θ̂ − E[θ̂])²] + (E[θ̂] − θ)² )  - - - - - (1)

where E is the expectation operator and θ̂ is an estimate of θ, the value
of the parameter of interest. The first component represents precision
(the variance of the estimator), whereas the second component represents bias.
An "accurate" estimate has both small bias and small variance, whereas a
"precise" estimate has small variance. When the estimator is unbiased,
accuracy and precision are equivalent concepts and the terms can be used
interchangeably.
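Decomposition (1) can be illustrated with a short Monte Carlo sketch (not from the paper; the population N(5, 2²), the sample size 25, and the deliberately biased estimator 0.9·x̄ are all illustrative assumptions):

```python
import random
import statistics

random.seed(42)

MU, SIGMA, N, REPS = 5.0, 2.0, 25, 50_000

# Collect the shrunken estimator 0.9 * (sample mean) over many replications.
estimates = []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    estimates.append(0.9 * statistics.fmean(sample))

mse = statistics.fmean((est - MU) ** 2 for est in estimates)
bias = statistics.fmean(estimates) - MU      # approx. 0.9*MU - MU = -0.5
variance = statistics.pvariance(estimates)   # approx. 0.81 * SIGMA**2 / N = 0.1296

# Equation (1): MSE = variance (precision) + bias^2, so RMSE = sqrt of this sum.
print(mse, variance + bias ** 2)
```

The two printed values agree, since the mean squared error of the empirical distribution equals its variance plus its squared bias.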
Since sample size is so important in making statistical inferences, your
committee naturally wants to be sure that your research uses an adequate
sample size to effectively address your research questions.
1 - SAMPLE SIZE CRITERIA: For any research, the sample size of the
study must be determined during the design stage of the study.
However, before determining the size of the sample that needs to be
drawn from the population, some factors must be taken into
consideration. In addition to the purpose of the study and the population size,
several criteria usually need to be specified to determine the
appropriate sample size: the level of precision, the level of confidence or
risk, and the degree of variability in the attributes being measured
(Miaoulis and Michener, 1976), as well as the power of the test and the effect size.
(1-1): The Level of Precision (the margin of error the researcher will
tolerate): The level of precision, sometimes called sampling error, is the
range in which the true value of the population is estimated to be. This
range is often expressed in percentage points (e.g., ±5 percent).
(1-2): The Confidence Level (α): The confidence or risk level is based
on ideas encompassed under the Central Limit Theorem. The key idea
encompassed in the Central Limit Theorem is that when a population is
repeatedly sampled, the average value of the attribute obtained by those
samples is equal to the true population value. Furthermore, the values
obtained by these samples are distributed normally about the true value.
In a normal distribution, approximately 95% of the sample values are
within two standard deviations of the true population value. In other
words, this means that, if a 95% confidence level is selected, 95 out of
100 samples will contain the true population value. There is always a chance
that the sample you obtain does not represent the true population value.
This risk is reduced for 99% confidence levels and increased for 90% (or
lower) confidence levels.
(1-3): Degree of Variability: The degree of variability in the attributes being
measured refers to the distribution of attributes in the population. The more
heterogeneous a population, the larger the sample size required to obtain a
given level of precision. The more homogeneous (less variable) a population,
the smaller the sample size.
(1-4): Power of a Test: The power of a test is the probability of correctly
rejecting a false null hypothesis. This probability is one minus the
probability of making a Type II error (β). Recall also that we choose the
probability of making a Type I error when we set (α), and that if we
decrease the probability of making a Type I error we increase the
probability of making a Type II error. Thus, the probability of correctly
retaining a true null hypothesis has the same relationship to Type I errors
as the probability of correctly rejecting an untrue null hypothesis does to
Type II errors. Yet, as mentioned, if we decrease the odds of making one type
of error we increase the odds of making the other type of error. What is the
relationship between Type I and Type II errors? Convention chooses a
power of 80%. Note that this assumes that the risk of a Type II error can
be four times as great as the risk of a Type I error. Sample size has an
indirect effect on power: sample size is of interest because it
modifies our estimate of the standard deviation. When n is large, we will
have a smaller β.
(1-5) — Effect size: The effect size (ES) is a ratio of a mean difference to a
standard deviation. Suppose an experimental treatment group has a mean score of
Xe and a control group has a mean score of Xc and a standard deviation
of Sc. Then ES = (Xe − Xc) / Sc by Glass's method, while by the
Hunter-Schmidt method, ES = (Xe − Xc) / pooled SD. Effect size permits
the comparative effect of different treatments to be compared, even when
based on different samples and different measuring instruments. Effect
size generally means the degree to which the null hypothesis is false
(Cohen, 1988). It measures the distance between the null hypothesis and a
specified value of the alternative hypothesis. For any statistical test, the
null hypothesis has an effect size of zero. Effect size can be measured
using raw values or standardized values. Cohen has standardized effect
sizes into small, medium and large values depending on the type of statistical
analysis employed. Each statistical test has its own effect size index.
For example, the effect size index for multiple regression is f², and H0 posits
that f² = 0. The f² values for small, medium and large effect sizes are .02, .15,
and .35 respectively. Cohen (1992) proposed that a medium effect size is
desirable as it would be able to approximate the average size of observed
effects in various fields.
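For multiple regression, the f² index relates to the squared multiple correlation by f² = R² / (1 − R²); a brief sketch (illustrative, not from the paper) converts Cohen's benchmark f² values into the R² they imply:

```python
def r2_from_f2(f2: float) -> float:
    """Invert f2 = R2 / (1 - R2) to get the implied R2."""
    return f2 / (1.0 + f2)

# Cohen's (1992) benchmarks for multiple regression.
for label, f2 in [("small", 0.02), ("medium", 0.15), ("large", 0.35)]:
    print(f"{label}: f2 = {f2:.2f} -> R2 = {r2_from_f2(f2):.4f}")
```

So a "large" effect of f² = .35 corresponds to the regression explaining roughly 26% of the variance.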
(1-6) — OTHER CONSIDERATIONS: In completing this discussion of
determining sample size, there are three additional issues.
First, the above approaches to determining sample size have assumed
that a simple random sample is the sampling design. More complex
designs, e.g., stratified random samples, must take into account the
variances of subpopulations, strata, or clusters before an estimate of the
variability in the population as a whole can be made.
Second, the sample size must be adequate for the data analysis that is planned.
If descriptive statistics are to be used (means, frequencies), then nearly any
sample size will suffice. On the other hand, a good-sized sample, e.g.,
200-500 (Israel, Glenn D., 1992), is needed for multiple regression,
analysis of covariance, or log-linear analysis. The sample size should be
appropriate for the analysis that is planned.
Finally, the sample size formulas provide the number of respi .:ses that
need to be obtained. Many researchers commonly add 10% to the sample
size to compensate for persons that the researcher is unable to contact.
Also n is often increased by 30% to compensate for nonresponse.
2 - STRATEGIES FOR DETERMINING SAMPLE SIZE: There are
several approaches to determining the sample size. These include using a
census for small populations, imitating the sample size of similar studies,
using published tables, and applying formulas to calculate a sample size.
(2-1): Using a Census for Small Populations: One approach is to use
the entire population as the sample (Tabachnick & Fidell, 2001).
Although cost considerations make this impossible for large populations,
a census is attractive for small populations (N ≤ 200). A census eliminates
sampling error and provides data on all the individuals in the population.
In addition, some costs, such as questionnaire design and developing the
sampling frame, are fixed.
(2-2): Using a Sample Size of a Similar Study: Another approach is to
use the same sample size as those of studies similar to the one you plan.
Without reviewing the procedures employed in these studies you may run
the risk of repeating errors that were made in determining the sample size
for another study.
(2-3): Using Published Tables: A third way is to rely on published tables
which provide the sample size for a given set of criteria.
(2-4): Using formulas to calculate a Sample Size: Although tables can
provide a useful guide for determining the sample size, you may need to
calculate the necessary sample size for a different combination of levels
of precision, confidence, and variability. The fourth approach to
determining sample size is the application of one of several formulas.
- Formula for Sample Size for the Mean: n = f(z, e, s).
As mentioned above, sample size calculation depends on a number of
complex factors:
n = (s z / e)²  - - - - - (2)
where (s) is the standard deviation of the variable (perhaps estimated in a
pretest sample), (z) is the value of standard units corresponding to the
desired proportion of cases (z = 1.96 for two-tailed tests at the .05
significance level), and (e) is the tolerated variation in the sample. The
disadvantage of the sample size based on the mean is that a "good" estimate of the population variance is necessary.
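Formula (2) can be sketched in code as follows (the values s = 15 and e = 2 are illustrative assumptions, with z = 1.96 for a two-tailed test at the .05 level):

```python
import math

def sample_size_for_mean(s: float, e: float, z: float = 1.96) -> int:
    """Equation (2): n = (s * z / e)^2, rounded up to the next whole unit."""
    return math.ceil((s * z / e) ** 2)

# Example: SD of 15 estimated from a pretest, tolerating an error of +/- 2.
print(sample_size_for_mean(s=15, e=2))  # -> 217
```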
- Formula for Calculating a Sample for Proportions: n = f(z, e, P).
Cochran (1963) provides a formula to calculate sample sizes. Yamane
(1967) provides a simplified formula to calculate sample sizes:
n = P(1 − P)(z / e)²  - - - - - (3)
where (P) is the population proportion.
- Another rule of thumb, based on (χ²), may be followed:
1 - Determine the desired significance and difference levels. The researcher
must first select the desired level of significance (typically .05) and the
smallest difference he or she wishes to be detected as significant. For
instance, in a study of gender and presidential support, one might want a
10% gender difference to be found significant at the .05 level.
2 - Specify the expected and least-difference tables. Researchers then
must create two tables. This requires estimating the marginal frequencies
(the number of men and women and of presidential supporters and
non-supporters, for example). Expected cell frequencies are then calculated
for (χ²). Then the researcher creates a least-difference table by, for
example, placing 10% more cases than expected on the diagonal (e.g., on
the male-non-supporters, female-supporters diagonal).
3 - Solve for n using the (χ²) formula, which is:
χ² = SUM ((Observed − Expected)² / Expected)  - - - - - (4)
For instance, in a 2-by-2 table, let the upper-left cell be .20n, the
upper-right .30n, the lower-left .20n, and the lower-right .30n, and let the
least-difference cells be .25n, .25n, .15n, and .35n respectively. With 1 degree
of freedom, at (α) = 0.05, the critical value of chi-square is 3.841.
Then n = f(i, j, α), where i, j refer to the number of rows and columns.
Therefore:
χ² = 3.841 = [(.25n − .20n)² / .20n] + [(.25n − .30n)² / .30n]
+ [(.15n − .20n)² / .20n] + [(.35n − .30n)² / .30n]
Solving for n, then: n = 3.841 / .04167 ≈ 92.2, rounded up to 93.
Therefore, a sample size of 93 is the minimum sample size needed to
detect a 10% difference at the .05 significance level, by chi-square.
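The worked example above can be checked with a short script (a sketch; the cell proportions and the critical value 3.841 are taken from the text):

```python
import math

expected = [0.20, 0.30, 0.20, 0.30]    # expected cell proportions
least_diff = [0.25, 0.25, 0.15, 0.35]  # least-difference cell proportions
CRITICAL = 3.841                       # chi-square critical value, 1 df, alpha = .05

# Each term ((o*n - e*n)^2 / (e*n)) equals n * (o - e)^2 / e, so chi-square
# grows linearly in n and we can solve chi2(n) = CRITICAL for n.
per_unit = sum((o - e) ** 2 / e for o, e in zip(least_diff, expected))
n = math.ceil(CRITICAL / per_unit)
print(n)  # -> 93
```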
What happens when the population has fewer members than the calculated
sample size requires? Calculate the sample size as before (n0), and then
calculate n:
n = n0 / [1 + (n0 / N)]  - - - - - (5)
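Formulas (3) and (5) can be combined: first compute the proportion-based size for an effectively infinite population, then apply the finite-population correction (P = 0.5, z = 1.96, e = 0.05, and N = 1000 are illustrative assumptions):

```python
import math

def n_for_proportion(p: float = 0.5, z: float = 1.96, e: float = 0.05) -> int:
    """Equation (3): n0 = P(1 - P)(z / e)^2, rounded up."""
    return math.ceil(p * (1 - p) * (z / e) ** 2)

def finite_population_correction(n0: int, N: int) -> int:
    """Equation (5): n = n0 / [1 + (n0 / N)], rounded up."""
    return math.ceil(n0 / (1 + n0 / N))

n0 = n_for_proportion()  # maximum variability, P = 0.5
print(n0, finite_population_correction(n0, N=1000))  # -> 385 278
```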
- Appropriate sample size estimation for multiple regression models
using four approaches: a comparison study.
Multiple regression is a method used to examine the relationship
between one dependent variable Y and one or more independent variables
Xi. The unstandardized regression coefficients bi in the regression
equation
Y = b0 + b1 X1 + b2 X2 + ... + bk Xk + e  - - - - - (6)
are estimated using the method of least squares. In this method, the sum
of squared residuals between the regression plane and the observed values
of the dependent variable is minimized. The regression equation
represents a (hyper)plane in a (k + 1)-dimensional space, in which k is the
number of independent variables X1, X2, X3, ..., Xk, plus one
dimension for the dependent variable Y.
Now, the main question is: How big a sample size do I need to do
multiple regression?
There are many techniques to answer this question, such as:
(a) - Richard Sawyer rule (1984):
Suppose the regression coefficients in a prediction equation are
estimated from a random sample (yi, xi), (i = 1, 2, ..., n), where (yi) is
the dependent variable and (xi) is a vector of (K) predictor variables
for the i-th case. Suppose x has a multivariate normal distribution with
mean (μ) and covariance matrix (Σ). Therefore, the predictors (x) are
assumed to be random rather than fixed.
The conditional distribution of yi given xi is assumed to be normal
with mean (1, xi')β and variance σ². The regression coefficients are
estimated by the usual least squares estimates:
β̂ = (x'x)⁻¹ x'y,
where y = (y1, y2, ..., yn)' and x is the matrix whose i-th row is (1, xi').
An additional independent observation (y, x') is to be taken, and y is
to be predicted by ŷ = (1, x')β̂.
Sawyer (1982) studied the moments of the distribution of the
prediction error (ŷ − y). The mean of (ŷ − y) is, of course, zero, and
its standard deviation (root mean square error, RMSE) is:
RMSE = σ A(n, K)  - - - - - (7)
where:
A(n, K) = √[ (n + 1)(n − 2) / (n (n − 2 − K)) ]  - - - - - (8)
Sawyer found that when A < 1.10, the distribution of (ŷ − y) is
approximately normal. In this case, the mean absolute error (MAE) of
prediction, MAE = E(|ŷ − y|), is approximately:
MAE = √(2/π) RMSE = σ A(n, K) √(2/π)  - - - - - (9)
A(n, K) is an inflation factor due to estimating the regression
coefficients; as n → ∞, A(n, K) → 1. For fixed values of (A) and (K),
one can approximate the corresponding required base sample size (n)
by:
n = (2A² − 1) / (A² − 1) + [A² / (A² − 1)] K  - - - - - (10)
The coefficients in (10) are displayed in table (1) for several values of
(A) and (K). Then: n = f(A, K).
This formula is used to calculate the sample sizes in Table (1).
Table (1): Sample size according to the Richard Sawyer rule: n = f(A, K).
Approximate relationship between the number of predictors (K) and the sample
size (n) required for varying degrees of prediction accuracy (A); approximate
base sample size needed to achieve MAE = σ A(n, K) √(2/π), with 1 ≤ K ≤ 20.

| A →    | 1.01            | 1.05            | 1.10           | 1.25           | 1.50           |
| K      | n = 50.8K + 51.8 | n = 10.8K + 11.8 | n = 5.8K + 6.8 | n = 2.8K + 3.8 | n = 1.8K + 2.8 |
| 1      | 103  | 23  | 13  | 7  | 5  |
| 2      | 154  | 34  | 19  | 10 | 7  |
| 3      | 205  | 45  | 25  | 13 | 9  |
| 4      | 255  | 55  | 30  | 15 | 10 |
| 5      | 306  | 66  | 36  | 18 | 12 |
| 6      | 357  | 77  | 42  | 21 | 14 |
| 7      | 408  | 88  | 48  | 24 | 16 |
| 8      | 459  | 99  | 54  | 27 | 18 |
| 9      | 509  | 109 | 59  | 29 | 19 |
| 10     | 560  | 120 | 65  | 32 | 21 |
| 11     | 611  | 131 | 71  | 35 | 23 |
| 12     | 662  | 142 | 77  | 38 | 25 |
| 13     | 713  | 153 | 83  | 41 | 27 |
| 14     | 763  | 163 | 88  | 43 | 28 |
| 15     | 814  | 174 | 94  | 46 | 30 |
| 16     | 865  | 185 | 100 | 49 | 32 |
| 17     | 916  | 196 | 106 | 52 | 34 |
| 18     | 967  | 207 | 112 | 55 | 36 |
| 19     | 1017 | 217 | 117 | 57 | 37 |
| 20     | 1068 | 228 | 123 | 60 | 39 |
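Equations (8) and (10) are easy to verify numerically; the sketch below uses the exact coefficients A²/(A² − 1) rather than the one-decimal approximations shown above, so results can differ from the table by a unit:

```python
import math

def A(n: int, k: int) -> float:
    """Equation (8): inflation factor of the prediction RMSE."""
    return math.sqrt((n + 1) * (n - 2) / (n * (n - 2 - k)))

def min_n(a: float, k: int) -> int:
    """Equation (10): smallest n achieving accuracy A(n, k) <= a."""
    a2 = a * a
    return math.ceil((2 * a2 - 1) / (a2 - 1) + a2 * k / (a2 - 1))

n = min_n(1.10, 8)
print(n, A(n, 8))  # the achieved A stays below the 1.10 target
```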
The previous applied study results show that the sample size is adequate
when the degree of prediction accuracy satisfies 1.05 ≤ A < 1.10: if A > 1.10,
the estimated parameters are less precise, but if A < 1.05, the sample size
is too large, and perhaps not appropriate for the required precision.
(b) - Krejcie and Morgan (1970) used the following formula to
determine the sample size: n = f(χ², N, P, d).
n = χ² N P(1 − P) / [d²(N − 1) + χ² P(1 − P)]  - - - - - (11)
N: the population size.
d: the degree of accuracy expressed as a proportion (0.05).
P: the population proportion (assumed to be 0.50, since this would
provide the maximum sample size).
χ²: the critical (table) value of chi-square for one degree of freedom
at the desired confidence level (α).
Let α = 0.05; we can compute (n) when N = 1000, 500, and 100 units:
(a) N = 1000 (large population size):
n = 3.841*1000*0.5*0.5 / [(0.05)²*999 + 3.841*0.5*0.5] ≈ 278
(b) N = 500 (medium population size):
n = 3.841*500*0.5*0.5 / [(0.05)²*499 + 3.841*0.5*0.5] ≈ 206
(c) N = 100 (small population size):
n = 3.841*100*0.5*0.5 / [(0.05)²*99 + 3.841*0.5*0.5] ≈ 80
From equation (8), we can compute the degree of prediction accuracy (A)
according to the sample size in the previous three cases, table (2).
Table (2): Degree of prediction accuracy A(n, K) for the Krejcie & Morgan
sample sizes.

| K  | N = 1000: A(278, K) | N = 500: A(206, K) | N = 100: A(80, K) |
| 1  | 1.003 | 1.005 | 1.012 |
| 2  | 1.005 | 1.007 | 1.019 |
| 3  | 1.007 | 1.010 | 1.026 |
| 4  | 1.009 | 1.012 | 1.033 |
| 5  | 1.011 | 1.015 | 1.040 |
| 6  | 1.013 | 1.018 | 1.047 |
| 7  | 1.015 | 1.020 | 1.055 |
| 8  | 1.017 | 1.023 | 1.062 |
| 9  | 1.019 | 1.025 | 1.070 |
| 10 | 1.021 | 1.028 | 1.078 |
| 11 | 1.023 | 1.031 | 1.086 |
| 12 | 1.025 | 1.033 | 1.094 |
| 13 | 1.027 | 1.036 | 1.102 |
| 14 | 1.029 | 1.039 | 1.111 |
| 15 | 1.031 | 1.041 | 1.120 |
| 16 | 1.032 | 1.044 | 1.129 |
| 17 | 1.034 | 1.047 | 1.138 |
| 18 | 1.036 | 1.050 | 1.147 |
| 19 | 1.038 | 1.053 | 1.157 |
| 20 | 1.040 | 1.056 | 1.167 |
From table (2):
1 - When the population size is small (N ≤ 100), the sample size is close
to the population size (the proportion n / N ≥ 80%).
2 - When the population size N = 100, we cannot recommend using
this rule if K > 12, because A(80, K) > 1.10.
3 - We cannot recommend using this rule when N = 500 and K < 18, or
when N = 1000 and K < 25, because the sample sizes (206 and 278
respectively) are considered too large in these cases.
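Formula (11) can be sketched in code (χ² = 3.841 for 1 df at α = .05, P = 0.5, and d = 0.05, as in the text; shown for N = 1000 and N = 100):

```python
import math

def krejcie_morgan(N: int, chi2: float = 3.841,
                   p: float = 0.5, d: float = 0.05) -> int:
    """Equation (11): n = chi2*N*P(1-P) / [d^2(N-1) + chi2*P(1-P)], rounded up."""
    num = chi2 * N * p * (1 - p)
    den = d ** 2 * (N - 1) + chi2 * p * (1 - p)
    return math.ceil(num / den)

print(krejcie_morgan(1000), krejcie_morgan(100))  # -> 278 80
```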
(c) - Kleinbaum, Kupper and Muller rule (1988): n = f(K).
The number of data points (i.e., observations or cases) should be considerably
more than 5 to 10 times the number of variables.
This formula is used to calculate the sample sizes in Table (3).
Table (3): Sample size according to the Kleinbaum, Kupper and Muller rule:
n = f(K).

| K  | 5K  | 6K  | 7K  | 8K  | 9K  | 10K | A(10K, K) |
| 1  | 5   | 6   | 7   | 8   | 9   | 10  | 1.121 |
| 2  | 10  | 12  | 14  | 16  | 18  | 20  | 1.087 |
| 3  | 15  | 18  | 21  | 24  | 27  | 30  | 1.076 |
| 4  | 20  | 24  | 28  | 32  | 36  | 40  | 1.070 |
| 5  | 25  | 30  | 35  | 40  | 45  | 50  | 1.067 |
| 6  | 30  | 36  | 42  | 48  | 54  | 60  | 1.065 |
| 7  | 35  | 42  | 49  | 56  | 63  | 70  | 1.063 |
| 8  | 40  | 48  | 56  | 64  | 72  | 80  | 1.062 |
| 9  | 45  | 54  | 63  | 72  | 81  | 90  | 1.061 |
| 10 | 50  | 60  | 70  | 80  | 90  | 100 | 1.061 |
| 11 | 55  | 66  | 77  | 88  | 99  | 110 | 1.060 |
| 12 | 60  | 72  | 84  | 96  | 108 | 120 | 1.060 |
| 13 | 65  | 78  | 91  | 104 | 117 | 130 | 1.059 |
| 14 | 70  | 84  | 98  | 112 | 126 | 140 | 1.059 |
| 15 | 75  | 90  | 105 | 120 | 135 | 150 | 1.058 |
| 16 | 80  | 96  | 112 | 128 | 144 | 160 | 1.058 |
| 17 | 85  | 102 | 119 | 136 | 153 | 170 | 1.058 |
| 18 | 90  | 108 | 126 | 144 | 162 | 180 | 1.058 |
| 19 | 95  | 114 | 133 | 152 | 171 | 190 | 1.058 |
| 20 | 100 | 120 | 140 | 160 | 180 | 200 | 1.057 |

n is appropriate if: 6K with K > 30; 7K with K > 9; 8K with K > 4;
9K with K > 2; 10K with K ≥ 1.
From table (3), we can modify the previous rule as follows:
The sample size should be considerably more (more than 6 to 10 times K)
according to the following restrictions:
n = 6K is an appropriate sample size if K > 30,
n = 7K is an appropriate sample size if K > 9,
n = 8K is an appropriate sample size if K > 4,
n = 9K is an appropriate sample size if K > 2,
n = 10K is an appropriate sample size if K ≥ 1.
This indicates that the sample size can range from a minimum of 6K for
performing multiple regression analysis to a maximum of 10K.
(d) - Tabachnick and Fidell rules (2001):
1 - A rule of thumb for (structural analysis) testing the (b) coefficients
is given by:
n ≥ 104 + k  - - - - - (12)
2 - A rule of thumb for (forecasting) testing R-square is:
n ≥ 50 + 8k  - - - - - (13)
3 - If you are using stepwise regression:
n ≥ 40k  - - - - - (14)
is a rule of thumb, since stepwise methods can fit noise too easily
and fail to generalize in a smaller dataset. From 1, 2, 3, then:
n = f(K),
where k = number of independent variables.
4 - In general, you need a larger n when the dependent variable is
skewed or you are seeking to test small effect sizes (rule of thumb):
n = f(K, f²),  n ≥ (8 / f²) + (k − 1)  - - - - - (15)
where f² = .02, .15, and .35 (for small, medium and large effect sizes).
This formula was used to calculate the sample sizes in Table (4).
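The four rules of thumb above can be collected into one helper (a sketch assuming formulas (12)–(15); f² defaults to the medium value .15):

```python
import math

def tf_sample_sizes(k: int, f2: float = 0.15) -> dict:
    """Tabachnick & Fidell (2001) rules of thumb, equations (12)-(15)."""
    return {
        "testing_b": 104 + k,                        # (12) structural analysis
        "testing_R2": 50 + 8 * k,                    # (13) forecasting
        "stepwise": 40 * k,                          # (14) stepwise regression
        "effect_size": math.ceil(8 / f2 + (k - 1)),  # (15) skewed DV / effect size
    }

print(tf_sample_sizes(4))
# -> {'testing_b': 108, 'testing_R2': 82, 'stepwise': 160, 'effect_size': 57}
```

With K = 4 this reproduces the values used later in the applied study (108, 82, 160, and 57).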
This formula was used to calculate the sample sizes in Table (4).
Notes that:
(1)= Where k> rt, regression gives a meaningless solution with R 2 =I
-
(2)- Coefficient of determination R 2 : this is the proportion of the
variation in the dependent yaziable explained by the regression model.
and is a measure of the goodness of fit of the model.
R 2 -adjusted: this is the coefficie.nt of detennination adjusted for the
number of independent variables in the regression model, Unlike the
coefficient of determination, R 2-adjusted may decrease if variables are
entered in the model that doesnot add significantly to the model fit.
- T,„) 2
E(y - y) 2
(n
-I)
(n k 1)
-
(16)
-
11
Where (Y) are the observed values for the dependent variable. cis the
average of the observed values and (Y ,$) are predicted values for the
dependent variable. In general, if the main objective of the study is the
structural analysis of the relationship among the variables, we focus on
reducing the values of the standard errors of estimated parameters
(increasing its signifkanke aw, if the main objective of the stud.y.,is
2
2
using the model for forecasting, we focus on increasing R or R .4; .
(3) - f for small, medium and large effect sizes (Cohen, 1992) are:
f = .02, .15, and .35 respectively. As we mentioned before, he proposed
that a medium effect size is desirable as it would be able to approximate
the average size of observed effects in various fields.
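Equation (16) in code, using the equivalent form in terms of R² (the R², n, and k values below are illustrative assumptions):

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Equation (16): R2-adjusted = 1 - (1 - R2)(n - 1)/(n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Adding predictors that explain nothing extra lowers the adjusted value:
print(adjusted_r2(0.90, n=30, k=4))   # approx. 0.884
print(adjusted_r2(0.90, n=30, k=10))  # smaller, despite the same raw R2
```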
Table (4): Sample size according to the Tabachnick and Fidell rule:
n = f(K) and n = f(K, f²).

|    | Testing (b)  | Testing R²  | Stepwise   | Skewed dist.: n ≥ 8/f² + (k − 1)          |
| K  | n ≥ 104 + k  | n ≥ 50 + 8k | n ≥ 40k    | small (f²=.01 / .02) | medium (.15) | large (.35) |
| 1  | 105 | 58  | 40  | 800 / 400 | 54 | 23 |
| 2  | 106 | 66  | 80  | 801 / 401 | 55 | 24 |
| 3  | 107 | 74  | 120 | 802 / 402 | 56 | 25 |
| 4  | 108 | 82  | 160 | 803 / 403 | 57 | 26 |
| 5  | 109 | 90  | 200 | 804 / 404 | 58 | 27 |
| 6  | 110 | 98  | 240 | 805 / 405 | 59 | 28 |
| 7  | 111 | 106 | 280 | 806 / 406 | 60 | 29 |
| 8  | 112 | 114 | 320 | 807 / 407 | 61 | 30 |
| 9  | 113 | 122 | 360 | 808 / 408 | 62 | 31 |
| 10 | 114 | 130 | 400 | 809 / 409 | 63 | 32 |
| 11 | 115 | 138 | 440 | 810 / 410 | 64 | 33 |
| 12 | 116 | 146 | 480 | 811 / 411 | 65 | 34 |
| 13 | 117 | 154 | 520 | 812 / 412 | 66 | 35 |
| 14 | 118 | 162 | 560 | 813 / 413 | 67 | 36 |
| 15 | 119 | 170 | 600 | 814 / 414 | 68 | 37 |
| 16 | 120 | 178 | 640 | 815 / 415 | 69 | 38 |
| 17 | 121 | 186 | 680 | 816 / 416 | 70 | 39 |
| 18 | 122 | 194 | 720 | 817 / 417 | 71 | 40 |
| 19 | 123 | 202 | 760 | 818 / 418 | 72 | 41 |
| 20 | 124 | 210 | 800 | 819 / 419 | 73 | 42 |
From table (4), the estimated sample size calculated using the
Tabachnick and Fidell rule differs according to the purpose of the study.
However, in applied studies we can exclude two cases:
1 - Small effect size, because the sample size is too large.
2 - Large effect size, because the sample size is too small.
(III) — APPLIED STUDY
Example and application of the procedures:
Using the above methods as a guideline, the following section aims to
compare four approaches to determining the sample size for a population
of 500 units (where K = 4, 8 variables and α = 0.05), using:
(a) - Richard Sawyer (R) rule, (b) - Kleinbaum, Kupper & Muller (KK) rule,
(c) - Krejcie & Morgan (KM) rule, and (d) - Tabachnick & Fidell (TF) rule.
(I) - (R) rule:

| K | A = 1.05: n ≥ 10.8K + 11.8 | A = 1.10: n ≥ 5.8K + 6.8 |
| 4 | 54 | 30 |
| 8 | 98 | 54 |

(II) - (KK) rule:

| K | n = 8K | n = 9K | n = 10K |
| 4 | 32 | 36 | 40 |
| 8 | 64 | 72 | 80 |

(III) - (KM) rule:

| K | n (N = 500) | A(206, K) |
| 4 | 206 | 1.012 |
| 8 | 206 | 1.023 |

(IV) - (TF) rule:

|   | Normal or normal-approximation dist. |           |          | Skewed dist., |
| K | for testing (b) | for testing R² | stepwise regression | medium effect size |
| 4 | 108 | 82  | 160 | 57 |
| 8 | 112 | 114 | 320 | 61 |
In table (5), we show comparisons between the four rules from the
point of view of R²-adjusted, RMSE, degree of prediction accuracy, and the
standard errors of the estimated parameters, so that determining the appropriate
sample sizes in the different cases would be more meaningful and acceptable.
Table (5): Comparison of the four rules ((R), (KK), (KM), and (TF)) for
K = 4 and K = 8 with N = 500, reporting R²-adjusted, RMSE, and the standard
errors SE(b1), ..., SE(b8) of the estimated parameters under each rule's
candidate sample size; the numbers between brackets beneath each sample size
are the degrees of prediction accuracy A(n, K), and the bold italic entries
mark the best value in each row.
From table (5), the numbers between the brackets represent the degrees of
prediction accuracy, and the bold italic numbers in the rows represent the
largest R²-adjusted, the smallest RMSE, and the smallest SE(bi), i = 1, 2, ..., 8.
So we suggest that:
1 - When K = 4:
(a) - If the main objective of the study is using the model for forecasting,
then the adequate sample size is 30 as a point estimate of n, but as an
interval estimate, 30 ≤ n ≤ 40 (the largest R²-adjusted and the smallest RMSE).
(b) - If the main objective of the study is the structural analysis of the
relationship among the variables, then the adequate sample size is 206 as
a point estimate of n, but as an interval estimate, 160 ≤ n ≤ 206 (the
estimated parameters have the smallest standard errors and more significance).
2 - When K = 8:
(a) - If the main objective of the study is using the model for forecasting,
then the adequate sample size is 54 as a point estimate of n, but as an
interval estimate, 54 ≤ n ≤ 72.
(b) - If the main objective of the study is the structural analysis of the
relationship among the variables, then the adequate sample size is 206 as
a point estimate of n, but as an interval estimate, 206 ≤ n ≤ 320.
(IV) — CONCLUSIONS
In this paper simulation studies are used to assess the effect of varying
sample size on the accuracy of the estimates of the parameters and
variance components of multiple regression models.
One of the main goals of this article is to construct some statistical tables
that provide an estimate of the adequate sample size required for multiple
regression models (Tables 1, 2, 3, 4 and 5) when some factors are known.
One of the most important advantages of these tables, besides saving time
and effort, is the ability to make comparisons between different
sample sizes with different factors and different rules.
Limits of the study:
- We assume that the costs of choosing the sample units are equal.
- All the methods (rules) of determining the sample size (n) give the
lower limit of (n), since the sample size always has a negative
relationship with the standard errors of the estimated parameters (a direct
relationship with the accuracy of the results).
- The number of data points (observations) for each variable is a
determining factor of the sample size, because we must choose the
minimum number of observations (listwise).
- Finally, we recommend that future studies consider other factors,
for example the kurtosis of the dependent variable, and so on.
REFERENCES:
Ary, D., Jacobs, L. C., and Razavieh, A. (1996). Introduction to Research
in Education. Orlando, Florida: Harcourt Brace College Publishers.
Barbara G. Tabachnick and Linda S. Fidell (2001). Using Multivariate
Statistics, Fourth Edition. Needham Heights, MA: Allyn & Bacon.
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences,
Second Edition. Hillsdale, NJ: Lawrence Erlbaum.
Cohen, J. (1992). "A power primer". Psychological Bulletin, 112(1).
Cooper, H., and Hedges, L. (1994). The Handbook of Research
Synthesis. NY: Russell Sage.
Cox, D. and Oakes, D. (1984). Analysis of Survival Data. New York:
Chapman and Hall.
Glass, G., McGaw, B., and Smith, M. (1981). Meta-Analysis in Social
Research. Newbury Park: Sage.
Israel, Glenn D. (1992). "Sampling the Evidence of Extension Program
Impact". Program Evaluation and Organizational Development, IFAS,
University of Florida. PEOD-5, October.
Krejcie, R. V., and Morgan, D. W. (1970). "Determining sample size for
research activities". Educational and Psychological Measurement, 30.
Lipsey, Mark (1990). Design Sensitivity: Statistical Power for
Experimental Research. Sage Publications.
Maxwell, S. E. (2000). "Sample size and multiple regression analysis".
Psychological Methods, 5.
Salant, P., and Dillman, D. A. (1994). How to Conduct Your Own Survey.
New York: John Wiley & Sons, Inc.
Sawyer, R. (1982). "Sample size and the accuracy of predictions made
from multiple regression equations". Journal of Educational Statistics,
7(2), 91-104.
Sawyer, R. (1984). "Determining minimum sample sizes for multiple
regression grade prediction equations for colleges". ACT
Research Report No. 83.
Smith, M. F. (1983). "Sampling Considerations in Evaluating
Cooperative Extension Programs". Florida Cooperative Extension
Service Bulletin PE-1. Institute of Food and Agricultural Sciences,
University of Florida.
http://www.pphmj.com/journals/ada.htm
http://www.surveysystem.com/sscalc.htm
faculty.chass.ncsu.edu/garson/PA765/tabachnick.htm