5 Reliability of the TOMAGS

Coefficient alphas were computed for the two norming groups (normal and gifted) at each age interval from 6-0 to 9-11 for the TOMAGS Primary and from 9-0 to 12-11 for the TOMAGS Intermediate using the entire normative sample as subjects. In addition, the sample alphas for the entire normative samples (all ages) were computed. Finally, the alphas were averaged across age intervals using the z-transformation technique. The average alphas are listed in the right-hand column of each table. The average alphas for the quotient are highly acceptable. All of the coefficients exceed .80; in 6 of 20 instances (not counting average alphas), they reach or exceed .90. The magnitude of these coefficients indicates that both levels of the TOMAGS are highly reliable and the results can be used with confidence.

Standard errors of measurement for the two groups are reported in Tables 5.3 and 5.4. As noted previously, the SEM is a useful statistic that is closely related to reliability and can be used to estimate a confidence interval that surrounds a particular score. It estimates the amount of error that may be reflected in an individual's score due to the less than perfect reliability of the instrument. The SEM is based on the formula $SEM = SD\sqrt{1 - r}$ (SD = standard deviation; r = reliability), and establishes a zone within which an individual's true score probably lies. The r in the formula is represented by the coefficient alphas reported in Tables 5.1 and 5.2.

According to Anastasi and Urbina (1997), the concept of test reliability refers to the "extent to which individual differences in test scores are attributable to 'true' differences in the characteristics under consideration and the extent to which they are attributable to chance errors" (p. 84).
For instruments such as the TOMAGS, reliability coefficients must approximate or exceed .80 in magnitude for the instrument to be considered minimally reliable; coefficients must be .90 or above to be considered most desirable (Aiken, 1994; Helmstadter, 1964; Nunnally, 1978; Salvia & Ysseldyke, 1995). This section examines three types of errors that can affect the reliability of the TOMAGS: (a) content sampling, (b) time sampling, and (c) scorer differences.

Content Sampling

Content sampling reflects the degree of homogeneity among items within an instrument. To determine this homogeneity, internal consistency reliability of the items must be studied. This type of reliability demonstrates the extent to which the items correlate with one another, and it is computed by using Cronbach's (1951) coefficient alpha method. Coefficient alphas for the TOMAGS Primary and TOMAGS Intermediate are reported in Tables 5.1 and 5.2.

TABLE 5.1
Coefficient Alphas for the TOMAGS Primary at Four Ages and All Ages for Both Standardization Groups (Decimals Omitted)

                       Age
Group        6      7      8      9     All Ages   Average
Normal      92     81     89     81        92         86
Gifted      90     84     88     81        90         87

TABLE 5.2
Coefficient Alphas for the TOMAGS Intermediate at Four Ages and All Ages for Both Standardization Groups (Decimals Omitted)

                       Age
Group        9     10     11     12     All Ages   Average
Normal      81     86     90     90        88         87
Gifted      85     83     82     82        86         84

TABLE 5.3
Standard Errors of Measurement for the TOMAGS Primary at Four Ages and All Ages for Both Standardization Groups

                       Age
Group        6      7      8      9     All Ages   Average
Normal     4.24   6.53   4.97   6.53      4.24       5.61
Gifted     4.74   6.00   5.19   6.53      4.74       5.40

TABLE 5.4
Standard Errors of Measurement for the TOMAGS Intermediate at Four Ages and All Ages for Both Standardization Groups

                       Age
Group        9     10     11     12     All Ages   Average
Normal     6.53   5.61   4.74   4.74      5.19       5.40
Gifted     5.80   6.18   6.36   6.36      5.61       6.00

The clinical value of these figures is exemplified by the case of 11-year-old Susan.
She has a TOMAGS Intermediate Quotient of 98, and the standard score SEM (using the normal group) for her age is found to be 4.74. Thus, the examiner knows with 68% probability that her true score lies between 93.26 and 102.74 (1 SEM); with 95% probability that her true score lies between 88.52 and 107.48 (2 SEM); and with 99% probability that her true score lies between 83.78 and 112.22 (3 SEM). Obviously, the smaller the SEM, the more confidence one can have in an instrument's results.

Anastasi and Urbina (1997) stated that this form of reliability "shows the extent to which scores on a test can be generalized over different occasions; the higher the reliability, the less susceptible the scores are to the random daily changes in the conditions of the test takers or of the testing environment" (p. 92). Two studies were undertaken to determine whether the results of the TOMAGS were stable over time. In both studies, 30 students identified as gifted in mathematics in a public elementary school in Texas were rated twice, with a 2-week interval between ratings. The students in Study 1 took the TOMAGS Primary and were 6 through 9 years old. The students in Study 2 took the TOMAGS Intermediate and were 9 through 11 years old. Standard scores were computed (M = 100, SD = 15), and the results from the two ratings for both studies were correlated. The resulting test-retest coefficients are reported in Table 5.7.

One cannot always assume that, because an assessment instrument is reliable for a general population, it will be equally reliable for every subgroup within that population. Persons who develop instruments thus should demonstrate that they are indeed reliable for subgroups, especially those subgroups that are likely to be assessed or that because of ethnic, cultural, racial, or linguistic differences may not be appropriate candidates for assessment.
The alphas for four selected subgroups within the two normative samples for both levels of the TOMAGS are reported in Tables 5.5 and 5.6. The groups studied are males, females, African Americans (blacks), and Hispanic Americans. The large composite alphas found in the right column of the tables demonstrate that the TOMAGS is equally reliable for all the subgroups investigated. These alphas are essentially the same as the average alphas reported in Tables 5.1 and 5.2 for the entire normative sample. These findings support the idea that the TOMAGS contains little or no bias relative to the subgroups studied.

TABLE 5.5
Coefficient Alphas for the TOMAGS Primary for Four Subgroups for Both Standardization Groups (Decimals Omitted)

                                Subgroup
                                African      Hispanic
Group      Males    Females    Americans    Americans    Average
Normal      92        92          94           89           92
Gifted      92        92          88           92           91

TABLE 5.6
Coefficient Alphas for the TOMAGS Intermediate for Four Subgroups for Both Standardization Groups (Decimals Omitted)

                                Subgroup
                                African      Hispanic
Group      Males    Females    Americans    Americans    Average
Normal      90        89          89           88           89
Gifted      86        85          90           82           86

Time Sampling

Time sampling examines the extent to which a child's test performance is constant over time, and it is generally measured using the test-retest technique. This approach involves administering an instrument and then readministering it a week or 2 later.

Scorer Differences

The type of reliability discussed in this section refers to the amount of test error due to examiner variability because of scoring differences. Unreliable scoring is usually the result of clerical errors or improper application of standard scoring criteria on the part of the examiner. Scorer error can be reduced considerably by the availability of clear administration procedures and detailed guidelines governing scoring, and by opportunities to practice scoring.
Nevertheless, developers should demonstrate statistically the amount of error in their instruments due to different scorers. To do this, Anastasi and Urbina (1997) recommended that two trained individuals score a set of assessments independently. The correlation between scores is a relational index of agreement. In the case of the TOMAGS, two staff persons in PRO-ED's research department independently scored 38 completed protocols for the TOMAGS Primary and 46 completed protocols for the TOMAGS Intermediate. The protocols were randomly selected from the normative sample. The sample for the TOMAGS Primary ranged in age from 8-2 through 9-10 and the sample for the TOMAGS Intermediate ranged in age from 9-6 through 12-1. The results of the scoring were correlated. The resulting coefficients for both TOMAGS Quotients were .99. The size of these coefficients provides convincing evidence in support of the scale's scorer reliability.

Summary

The TOMAGS's overall reliability is summarized in Table 5.8, which shows the scale's status relative to three sources of test error: content, time, and scorer. The coefficients displayed are drawn from those reported in previous sections of this chapter. They were averaged using the z-transformation technique described by Guilford and Fruchter (1978). As can be seen from viewing the figures listed in the average column at the right of the table, the TOMAGS evidences a high degree of reliability. This reliability is consistently high across all three sources of test error. The magnitude of these coefficients strongly suggests that (a) the TOMAGS possesses little test error and (b) users can have confidence in its results.
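The z-transformation averaging mentioned above can be sketched in a few lines. This is a minimal illustration, assuming an unweighted mean of Fisher z values; the function name `average_r` is hypothetical, and the manual does not say whether its averages were weighted by sample size or how they were rounded, so the result need not match the tabled averages exactly.

```python
import math


def average_r(coefficients):
    """Average correlation-type coefficients with Fisher's z-transformation.

    Each coefficient is converted to z = atanh(r), the z values are averaged
    (unweighted, which is an assumption), and the mean is converted back
    with tanh.
    """
    z_values = [math.atanh(r) for r in coefficients]
    mean_z = sum(z_values) / len(z_values)
    return math.tanh(mean_z)


# Coefficient alphas for the normal group at ages 6 through 9 on the
# TOMAGS Primary, taken from Table 5.1 with the decimals restored.
primary_normal = [0.92, 0.81, 0.89, 0.81]
print(f"average alpha: {average_r(primary_normal):.3f}")
```

Averaging in the z metric rather than averaging the raw coefficients keeps high values such as .99 from being underweighted, because the z scale stretches out as coefficients approach 1.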
TABLE 5.7
Test-Retest Reliability for the TOMAGS

                              First Testing       Second Testing
Level                  N        M       SD          M       SD        r
Primary               30     112.28   16.69      115.10   15.74     .84
Intermediate          29     109.59   12.60      111.29   10.94     .94

TABLE 5.8
Summary of TOMAGS: Reliability Related to Three Sources of Test Error (Decimals Omitted)

                                    Source of Test Error
                          Content Sampling       Time
Level                     Normal    Gifted     Sampling    Scorer    Average
TOMAGS Primary              92        90          84         99         93
TOMAGS Intermediate         88        86          94         99         93
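The relationship between the coefficient alphas and the standard errors of measurement reported in this chapter can be checked with a short sketch. It assumes the SEM formula given in the text (SD multiplied by the square root of 1 minus r) and a standard score SD of 15; the function names `sem` and `confidence_band` are hypothetical and are used only for illustration.

```python
import math


def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - r)."""
    return sd * math.sqrt(1.0 - reliability)


def confidence_band(score, sem_value, n_sems=1):
    """Return the (low, high) band of n_sems standard errors around a score."""
    half_width = n_sems * sem_value
    return (score - half_width, score + half_width)


# Alpha of .90 for the normal group at age 11 (Table 5.2) gives an SEM
# close to the 4.74 listed in Table 5.4.
print(f"computed SEM: {sem(15, 0.90):.4f}")

# Susan's quotient of 98 with the tabled SEM of 4.74, at 1, 2, and 3 SEMs.
for n_sems in (1, 2, 3):
    low, high = confidence_band(98, 4.74, n_sems)
    print(f"{n_sems} SEM: {low:.2f} to {high:.2f}")
```

The bands are computed from the tabled SEM of 4.74 rather than the unrounded value so that they line up with the figures quoted for Susan in the text.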
© Copyright 2025