An analytical study of empirical rules to determine the appropriate sample size for multiple regression models

Dr. Marie I. Mahmoud
Dept. of Statistics & Mathematics, Faculty of Commerce, Tanta University

ABSTRACT
Sample size is a critical part of any research design. In very basic terms, the larger your sample size, the more likely you will be able to find statistically significant results. If we understand these concepts, we will understand why sample size is so important to the effectiveness of our studies. In this paper simulation studies are used to assess the effect of varying sample size on the accuracy of the estimates of the parameters and variance components of multiple regression models. The objective of this study is to identify the appropriate sample size for multiple regression models. A numerical example is carried out to illustrate the results using SPSS software v.14. The results show that the sample size increases when the required power increases or when the level of significance decreases. This paper reviews criteria for specifying a sample size and presents a strategy for determining the sample size. The present study makes comparisons between different results with different rules for estimating sample size and confirms that the adequate sample size differs according to the main objective of the multiple regression model (the structural analysis of the relationship among the variables, or using the model for forecasting). Results suggest interval estimation for sample size in the previous two cases.

Keywords: strategies for determining sample size, sample size criteria, effect size, power of a test, multiple regression, Richard Sawyer rule, Kleinbaum, Kupper & Muller rule, Krejcie & Morgan rule, Tabachnick & Fidell rule.

(I) - INTRODUCTION
In most situations, researchers do not have access to an entire statistical population of interest, partly because it is too expensive and time consuming to cover a large population, or due to the difficulty of getting the cooperation of the entire population to participate in the study. As a result, researchers normally resort to making important decisions about a population based on a representative sample. Hence, estimating an appropriate sample size is a very important aspect of a research design, allowing the researcher to make inferences from the sample statistics to the statistical population. The power of a sample survey lies in the ability to estimate an appropriate sample size to obtain the necessary data to describe the characteristics of the population. In other words, the most frequently asked question concerning sampling is, "What sample size do I need?" The answer to this question is influenced by a number of factors, including the purpose of the study, the population size, the risk of selecting a "bad" sample, and the allowable sampling error (too small a sample will yield scant information, but ethics, economics, time, and other constraints require that a sample size not be too large). This paper reviews criteria for specifying a sample size and presents a strategy for determining the sample size. Some decision keys in planning any experiment are, "How precise will my parameter estimates tend to be if I select a particular sample size?" and "How big a sample do I need to attain a desirable level of precision?" The formal definition of accuracy is given by the square root of the mean square error (RMSE) and can be expressed by the following formulation:

RMSE = √(E[(θ̂ − θ)²]) = √(E[(θ̂ − E[θ̂])²] + (E[θ̂] − θ)²)          (1)

where E is the expectation operator and θ̂ is an estimate of θ, the value of the parameter of interest. The first component represents precision, whereas the second component represents bias.
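As a rough numerical illustration of equation (1) — our own sketch, not part of the original study — the following Python snippet estimates the RMSE of the sample mean of a normal population by simulation and decomposes it into a precision (variance) component and a bias component:

```python
import random

random.seed(1)

def rmse_decomposition(estimator, theta, n, trials=20000):
    """Simulate RMSE = sqrt(variance + bias^2) for an estimator of the
    mean theta of a N(theta, 1) population, using samples of size n."""
    estimates = [estimator([random.gauss(theta, 1.0) for _ in range(n)])
                 for _ in range(trials)]
    mean_est = sum(estimates) / trials
    bias = mean_est - theta                       # E[estimate] - theta
    variance = sum((e - mean_est) ** 2 for e in estimates) / trials
    return (variance + bias ** 2) ** 0.5, variance ** 0.5, bias

sample_mean = lambda s: sum(s) / len(s)           # an unbiased estimator
rmse, precision, bias = rmse_decomposition(sample_mean, theta=5.0, n=25)
# Both rmse and precision come out close to the theoretical standard
# error 1/sqrt(25) = 0.20, and the bias term is negligible.
```

Because the sample mean is unbiased, the bias term vanishes and RMSE and precision coincide, which is exactly the equivalence of accuracy and precision for unbiased estimators discussed in the text.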
A "precise" estimate has a small variance, whereas an "accurate" estimate has both a small bias and a small variance. When the estimator is unbiased, accuracy and precision are equivalent concepts and the terms can be used interchangeably. Since sample size is so important in making statistical inferences, your committee naturally wants to be sure that your research uses an adequate sample size to effectively address your research questions.

1 - SAMPLE SIZE CRITERIA: For any research, the sample size of the study must be determined during the design stage. However, before determining the size of the sample that needs to be drawn from the population, some factors must be taken into consideration. In addition to the purpose of the study and the population size, several criteria usually need to be specified to determine the appropriate sample size: the level of precision, the level of confidence or risk, and the degree of variability in the attributes being measured (Miaoulis and Michener, 1976), as well as the power of a test and the effect size.

(1-1): The Level of Precision (the margin of error the researcher will tolerate): The level of precision, sometimes called sampling error, is the range in which the true value of the population is estimated to be. This range is often expressed in percentage points (e.g., ±5 percent).

(1-2): The Confidence Level (α): The confidence or risk level is based on ideas encompassed under the Central Limit Theorem. The key idea encompassed in the Central Limit Theorem is that when a population is repeatedly sampled, the average value of the attribute obtained by those samples is equal to the true population value. Furthermore, the values obtained by these samples are distributed normally about the true value. In a normal distribution, approximately 95% of the sample values are within two standard deviations of the true population value.
In other words, if a 95% confidence level is selected, 95 out of 100 samples will contain the true population value. There is always a chance that the sample you obtain does not represent the true population value. This risk is reduced for 99% confidence levels and increased for 90% (or lower) confidence levels.

(1-3): Degree of Variability: The degree of variability in the attributes being measured refers to the distribution of attributes in the population. The more heterogeneous a population, the larger the sample size required to obtain a given level of precision. The more homogeneous (less variable) a population, the smaller the sample size.

(1-4): Power of a Test: The power of a test is the probability of rejecting a false null hypothesis. This probability is one minus the probability of making a Type II error (β). Recall also that we choose the probability of making a Type I error when we set α, and that if we decrease the probability of making a Type I error we increase the probability of making a Type II error. Thus, the probability of correctly retaining a true null hypothesis has the same relationship to Type I errors as the probability of correctly rejecting an untrue null hypothesis does to Type II errors. Yet, as mentioned, if we decrease the odds of making one type of error we increase the odds of making the other type. What is the relationship between Type I and Type II errors? Convention chooses a power of 80%. Note that this assumes that the risk of a Type II error can be four times as great as the risk of a Type I error. Sample size has an indirect effect on power: sample size is of interest because it modifies our estimate of the standard error. When n is large, we will have a smaller standard error and hence greater power.

(1-5) - Effect Size: The effect size (ES) is a ratio of a mean difference to a standard deviation. Suppose an experimental treatment group has a mean score of Xe and a control group has a mean score of Xc and a standard deviation of Sc.
Then ES = (Xe − Xc) / Sc by Glass's method, while by Hunter-Schmidt's method ES = (Xe − Xc) / pooled SD. Effect size permits the comparative effect of different treatments to be compared, even when based on different samples and different measuring instruments. Effect size generally means the degree to which the null hypothesis is false (Cohen, 1988). It measures the distance between the null hypothesis and a specified value of the alternative hypothesis. For any statistical test, the null hypothesis has an effect size of zero. Effect size can be measured using raw values or standardized values. Cohen has standardized effect sizes into small, medium and large values depending on the type of statistical analysis employed. Each statistical test has its own effect size index. For example, the effect size index for multiple regression is f², and H₀ posits that f² = zero; f² = .02, .15, and .35 for small, medium and large effect sizes respectively. Cohen (1992) proposed that a medium effect size is desirable as it would be able to approximate the average size of observed effects in various fields.

(1-6) - OTHER CONSIDERATIONS: In completing this discussion of determining sample size, there are three additional issues. First, the above approaches to determining sample size have assumed that a simple random sample is the sampling design. More complex designs, e.g., stratified random samples, must take into account the variances of subpopulations, strata, or clusters before an estimate of the variability in the population as a whole can be made. Second, another consideration with sample size is the number needed for the data analysis. If descriptive statistics are to be used (means, frequencies), then nearly any sample size will suffice. On the other hand, a good-sized sample, e.g., 200-500 (Israel, Glenn D., 1992), is needed for multiple regression, analysis of covariance, or log-linear analysis. The sample size should be appropriate for the analysis that is planned.
Finally, the sample size formulas provide the number of responses that need to be obtained. Many researchers commonly add 10% to the sample size to compensate for persons that the researcher is unable to contact. Also, n is often increased by 30% to compensate for nonresponse.

2 - STRATEGIES FOR DETERMINING SAMPLE SIZE: There are several approaches to determining the sample size. These include using a census for small populations, using the sample size of similar studies, using published tables, and applying formulas to calculate a sample size.

(2-1): Using a Census for Small Populations: One approach is to use the entire population as the sample (Tabachnick & Fidell, 2001). Although cost considerations make this impossible for large populations, a census is attractive for small populations (N ≤ 200). A census eliminates sampling error and provides data on all the individuals in the population. In addition, some costs, such as questionnaire design and developing the sampling frame, are fixed.

(2-2): Using a Sample Size of a Similar Study: Another approach is to use the same sample size as those of studies similar to the one you plan. Without reviewing the procedures employed in those studies, you may run the risk of repeating errors that were made in determining the sample size for another study.

(2-3): Using Published Tables: A third way is to rely on published tables which provide the sample size for a given set of criteria.

(2-4): Using Formulas to Calculate a Sample Size: Although tables can provide a useful guide for determining the sample size, you may need to calculate the necessary sample size for a different combination of levels of precision, confidence, and variability. The fourth approach to determining sample size is the application of one of several formulas.

- Formula for sample size for the mean: n = f(z, e, s). As mentioned above, sample size calculation depends on a number of complex factors.
n = (z s / e)²          (2)

where (s) is the standard deviation of the variable (perhaps estimated in a pretest sample), (z) is the value of standard units corresponding to the desired proportion of cases (z = 1.96 for two-tailed tests at the 0.05 significance level), and (e) is the tolerated error in the sample. The disadvantage of the sample size based on the mean is that a good estimate of the population variance is necessary.

- Formula for calculating a sample size for proportions: n = f(z, e, P). Cochran (1963) provides a formula to calculate sample sizes, and Yamane (1967) provides a simplified formula:

n = P(1 − P)(z / e)²          (3)

where (P) is the population proportion.

- Another rule of thumb, based on χ², may be followed:
1- Determine desired significance and difference levels. The researcher must first select the desired level of significance (typically .05) and the smallest difference he or she wishes to be detected as significant. For instance, in a study of gender and presidential support, one might want a 10% gender difference to be found significant at the .05 level.
2- Specify expected and least-difference tables. Researchers then must create two tables. This requires estimating the marginal frequencies (the number of men and women and of presidential supporters and non-supporters, for example). Expected cell frequencies are then calculated for χ². Then the researcher creates a least-difference table by, for example, placing 10% more cases than expected on the diagonal (e.g., on the male-non-supporters, female-supporters diagonal).
3- Solve for n using the χ² formula:

χ² = SUM ((Observed − Expected)² / Expected)          (4)

For instance, in a 2-by-2 table, let the upper-left cell be .20n, the upper-right .30n, the lower-left .20n, and the lower-right .30n; let the least-difference cells be .25n, .25n, .15n, and .35n respectively. For 1 degree of freedom at α = 0.05,
the critical value of chi-square is 3.841. Therefore:

χ² = 3.841 = [(.25n − .20n)² / .20n] + [(.25n − .30n)² / .30n] + [(.15n − .20n)² / .20n] + [(.35n − .30n)² / .30n]

Solving for n: n = 3.841 / .0417 = 92.2, rounded up to 93. Therefore, a sample size of 93 is the minimum sample size needed to detect a 10% difference at the .05 significance level by chi-square.

What happens when the population has fewer members than the calculated sample size requires? Calculate the sample size as before (n₀), and then calculate n as:

n = n₀ / [1 + (n₀ / N)]          (5)

(II) - Appropriate sample size estimation for multiple regression models using four approaches: a comparison study

Multiple regression is a method used to examine the relationship between one dependent variable Y and one or more independent variables Xᵢ. The unstandardized regression coefficients bᵢ in the regression equation

Y = b₀ + b₁X₁ + b₂X₂ + … + b_k X_k + e          (6)

are estimated using the method of least squares. In this method, the sum of squared residuals between the regression plane and the observed values of the dependent variable is minimized. The regression equation represents a (hyper)plane in a (k+1)-dimensional space, in which k is the number of independent variables X₁, X₂, X₃, …, X_k, plus one dimension for the dependent variable Y.

Now, the main question is: How big a sample do I need for multiple regression? There are many techniques to answer this question, such as:

(a) - Richard Sawyer rule (1984): Suppose the regression coefficients in a prediction equation are estimated from a random sample (yᵢ, xᵢ), i = 1, 2, …, n, where yᵢ is the dependent variable and xᵢ is a vector of K predictor variables for the i-th case. Suppose x has a multivariate normal distribution with mean μ and covariance matrix Σ. Therefore, the predictors x are assumed to be random rather than fixed.
The conditional distribution of yᵢ given xᵢ is assumed to be normal with mean (1, xᵢ′)β and variance σ². The regression coefficients are estimated by the usual least squares estimates β̂ = (x′x)⁻¹x′y, where y = (y₁, y₂, …, y_n)′ and x = (x₁, x₂, …, x_n)′. An additional independent observation (y, x′) is to be taken, and y is to be predicted by ŷ = (1, x′)β̂. Sawyer (1982) studied the moments of the distribution of the prediction error (ŷ − y). The mean of (ŷ − y) is, of course, zero, and its standard deviation (root mean square error, RMSE) is:

RMSE = σ A(n, K)          (7)

where:

A(n, K) = √[(n + 1)(n − 2) / (n (n − 2 − K))]          (8)

Sawyer found that when A < 1.10, the distribution of (ŷ − y) is approximately normal. In this case, the mean absolute error of prediction, MAE = E(|ŷ − y|), is approximately:

MAE = √(2/π) · RMSE = σ A(n, K) √(2/π)          (9)

A(n, K) is an inflation factor due to estimating the regression coefficients; as n → ∞, A(n, K) → 1. For fixed values of A and K, one can approximate the corresponding required base sample size n by:

n = (2A² − 1 + A² K) / (A² − 1)          (10)

Then n = f(A, K). This formula is used to calculate the sample sizes displayed in Table (1) for several values of A and K.
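Equations (8) and (10) translate directly into code. The sketch below is our own illustration (the equation numbers refer to the formulas above); it computes the accuracy factor A(n, K) for a given sample size and the approximate base sample size needed for a target A:

```python
def accuracy_factor(n, k):
    """Degree of prediction accuracy A(n, K) from equation (8):
    prediction RMSE = sigma * A(n, K)."""
    return (((n + 1.0) * (n - 2.0)) / (n * (n - 2.0 - k))) ** 0.5

def sawyer_base_n(A, k):
    """Approximate base sample size from equation (10):
    n = (2A^2 - 1 + A^2 K) / (A^2 - 1)."""
    a2 = A * A
    return (2.0 * a2 - 1.0 + a2 * k) / (a2 - 1.0)

# For A = 1.05 the slope A^2/(A^2 - 1) is about 10.8, so n grows as
# roughly 10.8K + 11.8 -- the linear form tabulated in Table (1).
n_needed = sawyer_base_n(1.05, k=4)   # ~ 54.8, i.e. about 55 cases
check = accuracy_factor(55, 4)        # close to the target A = 1.05
```

Note the round trip: plugging the rounded n back into equation (8) returns a value of A very close to the target, which is a quick way to sanity-check any sample size taken from the table.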
Table (1): Sample size according to the Richard Sawyer rule, n = f(A, K). Approximate relationship between the number of predictors (K) and the base sample size (n) required for varying degrees of prediction accuracy (A), 1 ≤ K ≤ 20:

  A = 1.50:  n ≈ 1.8K + 2.8    (K = 1 → n = 5)
  A = 1.25:  n ≈ 2.8K + 3.8    (K = 1 → n = 7)
  A = 1.10:  n ≈ 5.8K + 6.8    (K = 1 → n = 13)
  A = 1.05:  n ≈ 10.8K + 11.8  (K = 1 → n = 22)
  A = 1.01:  n ≈ 50.8K + 51.8  (K = 1 → n = 103)

The previous applied-study results show that the sample size is adequate when the degree of prediction accuracy satisfies 1.05 ≤ A < 1.10: if A > 1.10 the estimated parameters are less precise, while if A < 1.05 the sample size is too large and is perhaps not appropriate for the required precision.

(b) - Krejcie and Morgan rule (1970): They used the following formula to determine the sample size, n = f(χ², N, P, d):

n = χ² N P (1 − P) / [d²(N − 1) + χ² P (1 − P)]          (11)

where N is the population size; d is the degree of accuracy expressed as a proportion (0.05); P is the population proportion (assumed to be 0.50, since this provides the maximum sample size); and χ² is the critical (table) value of chi-square for one degree of freedom at the desired confidence level (α). Let α = 0.05; we can compute n when N = 1000, 500, and 100 units:

(a) N = 1000 (large population size): n = 3.841·1000·0.5·0.5 / [(0.05)²·999 + 3.841·0.5·0.5] ≈ 278
(b) N = 500 (medium population size): n = 3.841·500·0.5·0.5 / [(0.05)²·499 + 3.841·0.5·0.5] ≈ 206
(c) N = 100 (small population size):
n = 3.841·100·0.5·0.5 / [(0.05)²·99 + 3.841·0.5·0.5] ≈ 80

From equation (8) we can compute the degree of prediction accuracy A, according to the sample size, in the previous three cases (Table 2).

Table (2): Degree of prediction accuracy A(n, K) for the Krejcie & Morgan sample sizes:

K    N = 1000: A(278, K)   N = 500: A(206, K)   N = 100: A(80, K)
1    1.003                 1.005                1.012
2    1.005                 1.007                1.019
3    1.007                 1.010                1.026
4    1.009                 1.012                1.033
5    1.011                 1.015                1.040
6    1.013                 1.018                1.047
7    1.015                 1.020                1.055
8    1.017                 1.023                1.062
9    1.019                 1.025                1.070
10   1.021                 1.028                1.078
11   1.023                 1.031                1.086
12   1.025                 1.033                1.094
13   1.027                 1.036                1.102
14   1.029                 1.039                1.111
15   1.031                 1.041                1.120
16   1.032                 1.044                1.129
17   1.034                 1.047                1.138
18   1.036                 1.050                1.147
19   1.038                 1.053                1.157
20   1.040                 1.056                1.167

From Table (2):
1 - When the population size is small (N ≤ 100), the sample size is close to the population size (the proportion n/N ≥ 80%).
2 - When the population size N = 100, we cannot recommend using this rule if K > 12, because A(n, K) > 1.10.
3 - We cannot recommend using this rule when N = 500 and K < 18, or when N = 1000 and K < 25, because the sample sizes of 206 and 278 cases respectively are considered too large.

(c) - Kleinbaum, Kupper and Muller rule (1988): n = f(K). The number of data points (i.e., observations or cases) should be considerably more than 5 to 10 times the number of variables. This formula is used to calculate the sample sizes in Table (3).
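The Krejcie & Morgan computation in equation (11) above can be sketched as follows (a minimal illustration; rounding up to the next whole unit is our own convention, not stated in the original):

```python
import math

def krejcie_morgan(N, chi2=3.841, P=0.5, d=0.05):
    """Sample size from equation (11):
    n = chi2*N*P*(1-P) / (d^2*(N-1) + chi2*P*(1-P))."""
    return (chi2 * N * P * (1 - P)) / (d * d * (N - 1) + chi2 * P * (1 - P))

n_large = math.ceil(krejcie_morgan(1000))   # large population -> 278
n_small = math.ceil(krejcie_morgan(100))    # small population -> 80, n/N = 80%
```

As the small-population case shows, when N is small the formula returns a sample nearly as large as the population itself, which is the behaviour noted under Table (2).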
Table (3): Sample size according to the Kleinbaum, Kupper and Muller rule, n = f(K):

K    5K    6K    7K    8K    9K    10K   A(10K, K)
1    5     6     7     8     9     10    1.121
2    10    12    14    16    18    20    1.087
3    15    18    21    24    27    30    1.076
4    20    24    28    32    36    40    1.070
5    25    30    35    40    45    50    1.067
6    30    36    42    48    54    60    1.065
7    35    42    49    56    63    70    1.063
8    40    48    56    64    72    80    1.062
9    45    54    63    72    81    90    1.061
10   50    60    70    80    90    100   1.061
11   55    66    77    88    99    110   1.060
12   60    72    84    96    108   120   1.060
13   65    78    91    104   117   130   1.059
14   70    84    98    112   126   140   1.059
15   75    90    105   120   135   150   1.058
16   80    96    112   128   144   160   1.058
17   85    102   119   136   153   170   1.058
18   90    108   126   144   162   180   1.058
19   95    114   133   152   171   190   1.058
20   100   120   140   160   180   200   1.057

From Table (3), we can modify the previous rule as follows. The sample size should be considerably more than 6 to 10 times K, according to the following restrictions: n = 6K is an appropriate sample size if K > 30; n = 7K is appropriate if K > 9; n = 8K is appropriate if K > 4; n = 9K is appropriate if K > 2; and n = 10K is appropriate if K > 1. This indicates that the sample size can range from a minimum of 6K for performing multiple regression analysis to a maximum of 10K.

(d) - Tabachnick and Fidell rules (2001):
1 - A rule of thumb for testing the coefficients b (structural analysis) is:

n ≥ 104 + k          (12)

2 - A rule of thumb for testing R-square (forecasting) is:

n ≥ 50 + 8k          (13)

3 - If you are using stepwise regression:

n ≥ 40k          (14)

is a rule of thumb, since stepwise methods can train to noise too easily and do not generalize in a smaller dataset. From 1, 2 and 3, n = f(K), where k is the number of independent variables.
4 - In general, you need a larger n when the dependent variable is skewed or you are seeking to test small effect sizes (rule of thumb), n = f(K, f²):

n ≥ (8 / f²) + (k − 1)          (15)

where f² = .02, .15, and .35 (for small, medium and large effect sizes). This formula was used to calculate the sample sizes in Table (4).

Notes:
(1) - Where k > n, regression gives a meaningless solution with R² = 1.
(2) - Coefficient of determination R²: this is the proportion of the variation in the dependent variable explained by the regression model, and is a measure of the goodness of fit of the model. R²-adjusted is the coefficient of determination adjusted for the number of independent variables in the regression model; unlike the coefficient of determination, R²-adjusted may decrease if variables are entered into the model that do not add significantly to the model fit:

R²_adj = 1 − [Σ(y − ŷ)² / Σ(y − ȳ)²] · [(n − 1) / (n − k − 1)]          (16)

where y are the observed values of the dependent variable, ȳ is the average of the observed values, and ŷ are the predicted values of the dependent variable. In general, if the main objective of the study is the structural analysis of the relationship among the variables, we focus on reducing the values of the standard errors of the estimated parameters (increasing their significance); but if the main objective of the study is using the model for forecasting, we focus on increasing R² or R²_adj.
(3) - f² for small, medium and large effect sizes (Cohen, 1992) is .02, .15, and .35 respectively. As mentioned before, Cohen proposed that a medium effect size is desirable as it would be able to approximate the average size of observed effects in various fields.

Table (4): Sample size according to the Tabachnick and Fidell rules, n = f(K) and n = f(K, f²):

K    Testing b:   Testing R²:   Stepwise:   Skewed dist. / effect size, n ≥ (8/f²) + (k−1):
     n ≥ 104+k    n ≥ 50+8k     n ≥ 40k     small (f² = .01/.02)   medium (f² = .15)   large (f² = .35)
1    105          58            40          800 / 400              54                  23
2    106          66            80          801 / 401              55                  24
3    107          74            120         802 / 402              56                  25
4    108          82            160         803 / 403              57                  26
5    109          90            200         804 / 404              58                  27
6    110          98            240         805 / 405              59                  28
7    111          106           280         806 / 406              60                  29
8    112          114           320         807 / 407              61                  30
9    113          122           360         808 / 408              62                  31
10   114          130           400         809 / 409              63                  32
11   115          138           440         810 / 410              64                  33
12   116          146           480         811 / 411              65                  34
13   117          154           520         812 / 412              66                  35
14   118          162           560         813 / 413              67                  36
15   119          170           600         814 / 414              68                  37
16   120          178           640         815 / 415              69                  38
17   121          186           680         816 / 416              70                  39
18   122          194           720         817 / 417              71                  40
19   123          202           760         818 / 418              72                  41
20   124          210           800         819 / 419              73                  42

From Table (4), although the estimated sample size calculated using the Tabachnick and Fidell rules differs according to the purpose of the study, in applied studies we can exclude two cases: 1 - the small effect size, because the sample size is too large; and 2 - the large effect size, because the sample size is too small.

(III) - APPLIED STUDY
Example and application of the procedures: Using the above methods as a guideline, the following section aims to compare four approaches in determining the sample size for a population of 500 units (where K = 4 and 8 variables and α = 0.05), using: (a) the Richard Sawyer (R) rule, (b) the Kleinbaum, Kupper & Muller (KK) rule, (c) the Krejcie & Morgan (KM) rule, and (d) the Tabachnick & Fidell (TF) rule.

(a) - (R) rule:
  A = 1.05 (n = 10.8K + 11.8): K = 4 → n = 54;  K = 8 → n = 98
  A = 1.10 (n = 5.8K + 6.8):   K = 4 → n = 30;  K = 8 → n = 54

(b) - (KK) rule:
  n = 8K:  K = 4 → 32;  K = 8 → 64
  n = 9K:  K = 4 → 36;  K = 8 → 72
  n = 10K: K = 4 → 40;  K = 8 → 80

(c) - (KM) rule (N = 500, n = 206):
  A(206, 4) = 1.012;  A(206, 8) = 1.023

(d) - (TF) rule (normal or approximately normal distribution):
  Sample size for testing b:                          K = 4 → 108;  K = 8 → 112
  Sample size for testing R²:                         K = 4 → 82;   K = 8 → 114
  Sample size for stepwise regression:                K = 4 → 160;  K = 8 → 320
  Sample size for a skewed dist., medium effect size: K = 4 → 57;   K = 8 → 61

In Table (5) we show comparisons between the four rules from the point of view of R²_adj, RMSE, the degree of prediction accuracy, and the standard errors of the estimated parameters, so that determining the appropriate sample sizes in the different cases will be more meaningful and acceptable.
Table (5): Comparison of the four rules (N = 500, α = 0.05). For each rule and sample size, the table reports R²_adj, RMSE, and the standard errors SE(bᵢ), i = 1, 2, …, 8, for K = 4 and K = 8; the number in brackets under each sample size is its degree of prediction accuracy A(n, K):

  (R) rule:   K = 4: n = 54 (1.05), n = 30 (1.10);  K = 8: n = 98 (1.05), n = 54 (1.10)
  (KK) rule:  K = 4: n = 32, 36, 40 (n = 8K, 9K, 10K);  K = 8: n = 64, 72, 80
  (KM) rule:  n = 206 for both K = 4 and K = 8
  (TF) rule:  K = 4: n = 108, 82, 160, 57;  K = 8: n = 112, 114, 320, 61

In Table (5), the numbers between brackets represent the degrees of prediction accuracy, and the bold italic numbers in the rows represent the largest R²_adj, the smallest RMSE, and the smallest SE(bᵢ), i = 1, 2, …, 8. So we suggest that:

1 - When K = 4:
(a) - If the main objective of the study is using the model for forecasting, then the adequate sample size is 30 as a point estimate of n, but as an interval estimate 30 ≤ n ≤ 40 (the largest R²_adj and the smallest RMSE).
(b) - If the main objective of the study is the structural analysis of the relationship among the variables, then the adequate sample size is 206 as a point estimate of n, but as an interval estimate 160 ≤ n ≤ 206 (the estimated parameters have the smallest standard errors and the most significance).

2 - When K = 8:
(a) - If the main objective of the study is using the model for forecasting, then the adequate sample size is 54 as a point estimate of n, but as an interval estimate 54 ≤ n ≤ 72.
(b) - If the main objective of the study is the structural analysis of the relationship among the variables, then the adequate sample size is 206 as a point estimate of n, but as an interval estimate 206 ≤ n ≤ 320.

(IV) - CONCLUSIONS
In this paper simulation studies are used to assess the effect of varying sample size on the accuracy of the estimates of the parameters and variance components of multiple regression models. One of the main goals of this article is to construct statistical tables that provide an estimate of the adequate sample size required for multiple regression models (Tables 1, 2, 3, 4 and 5) given some known factors. One of the most important advantages of these tables, besides saving time and effort, is the ability to make comparisons between different sample sizes with different factors and different rules.

Limits of the study:
- We assume that the costs of choosing the sample units are equal.
- All the methods (rules) for determining the sample size give the lower limit of n, since the sample size always has a negative relationship with the standard errors of the estimated parameters (a direct relationship with the accuracy of the results).
- The number of data points (observations) for each variable represents a determining factor of the sample size, because we must choose the minimum number of observations (listwise).
- Finally, we recommend that future studies consider other factors,
for example the kurtosis of the dependent variable, and so on.

REFERENCES:
Ary, D., Jacobs, L. C., and Razavieh, A. (1996). Introduction to Research in Education. Orlando, Florida: Harcourt Brace College Publishers.
Cochran, W. G. (1963). Sampling Techniques (2nd ed.). New York: John Wiley and Sons.
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
Cohen, J. (1992). "A power primer". Psychological Bulletin, 112(1).
Cooper, H., and Hedges, L. (1994). The Handbook of Research Synthesis. NY: Russell Sage.
Cox, D., and Oakes, D. (1984). Analysis of Survival Data. New York: Chapman and Hall.
Glass, G., McGaw, B., and Smith, M. (1981). Meta-Analysis in Social Research. Newbury Park: Sage.
Israel, Glenn D. (1992). "Sampling the Evidence of Extension Program Impact". Program Evaluation and Organizational Development, IFAS, University of Florida. PEOD-5. October.
Kleinbaum, D. G., Kupper, L. L., and Muller, K. E. (1988). Applied Regression Analysis and Other Multivariable Methods. Boston: PWS-Kent.
Krejcie, R. V., and Morgan, D. W. (1970). "Determining sample size for research activities". Educational and Psychological Measurement, 30.
Lipsey, Mark (1990). Design Sensitivity: Statistical Power for Experimental Research. Sage Publications.
Maxwell, S. E. (2000). "Sample size and multiple regression analysis". Psychological Methods, 5.
Miaoulis, G., and Michener, R. D. (1976). An Introduction to Sampling. Dubuque, Iowa: Kendall/Hunt.
Salant, P., and Dillman, D. A. (1994). How to Conduct Your Own Survey. New York: John Wiley & Sons, Inc.
Sawyer, R. (1982). "Sample size and the accuracy of predictions made from multiple regression equations". Journal of Educational Statistics, 7(2), 91-104.
Sawyer, R. (1984). "Determining minimum sample sizes for multiple regression grade prediction equations for colleges". ACT Research Report No. 83.
Smith, M. F. (1983). "Sampling Considerations in Evaluating Cooperative Extension Programs". Florida Cooperative Extension Service Bulletin PE-1, Institute of Food and Agricultural Sciences, University of Florida.
Tabachnick, Barbara G., and Fidell, Linda S. (2001). Using Multivariate Statistics (4th ed.). Needham Heights, MA: Allyn & Bacon.
Yamane, T. (1967). Statistics: An Introductory Analysis (2nd ed.). New York: Harper and Row.
http://www.pphmj.com/journals/ada.htm
http://www.surveysystem.com/sscalc.htm
faculty.chass.ncsu.edu/garson/PA765/tabachnick.htm