DIBELS Next Development Team®
What Is Next for Benchmark Goals?
National Association of School Psychologists Annual Conference, San Francisco, CA, February 25, 2011

Presenters:
• Kelly Powell-Smith, Dynamic Measurement Group, Inc.
• Lisa Habedank Stewart, Minnesota State University Moorhead
• Roland H. Good III, Dynamic Measurement Group, Inc., and University of Oregon
• Elizabeth N. Dewey, Dynamic Measurement Group, Inc.

Dynamic Measurement Group Team
• Executive Directors: Ruth Kaminski, Roland Good
• Research Scientists: Kelli Cummings, Kelly A. Powell-Smith, Stephanie Stollar
• Professional Development Specialists: Kathleen Petersen, Alisa Dorman
• Project Manager: Josh Wallin
• Graphic Designer: Karla Wysocki
• Data Team: Beth Dewey, Rachel Latimer, Maya O'Neil
• Support Staff: Dan Cohn, Laura Collins, Jeff Heriot, Sarah Laszlo
• Research Assistants: Annie Hommel, Doug Rice, Katherine Schwinler, Karla Wysocki
• Also: Kristen MacConnell, Amy Murdoch, Alecia Rahn-Blakeslee, Kristin Orton, Mary Giboney

Thank you also to all of the schools, teachers, and students who worked to contribute to our knowledge of DIBELS Next.

Special Thank You
A special thank you to all of the site coordinators who made our benchmark goal study possible. They ensured fidelity to the research design, made sure all research activities were timely, and did a wonderful job of submitting data in time for analyses.

Significant Advances in Reading Assessment to Inform Instruction
1. Research-validated benchmark goals and cut points for risk. Benchmark goals and cut points for risk are empirically validated based on the odds of achieving future reading goals.
2. Research-based DIBELS Composite Score. The DIBELS Composite Score combines multiple DIBELS Next scores into a single composite that best predicts and measures outcomes.
3. Extraordinary control of text readability. Passages at each grade level are developed, researched, and arranged to provide maximum control of text difficulty.
4. DIBELS Instructional Grouping Worksheets and DIBELS Survey.
5. First Sound Fluency replaces Initial Sound Fluency; NWF-Whole Words Read and Daze are added. And more!

Benchmark Goal Study Research Questions
The Benchmark Goal Study was designed to address three research questions:
1. What levels of performance on DIBELS Next assessments predict that a student is likely to score at or above the 40th percentile on selected outcome measures?
2. What levels of performance on DIBELS Next assessments predict that a student is unlikely to score at or above the 40th percentile on selected outcome measures?
3. What are the correlations between DIBELS Next assessments and the Group Reading Assessment and Diagnostic Evaluation (GRADE), a criterion measure of reading proficiency that includes comprehension?

Participants
• Students recruited for the study were from 13 schools in five school districts representing five US regions.
• Participating school districts had a median of 10 years of experience using DIBELS.
• Kindergarten through 6th grade students participated in DIBELS Next assessments (n = 3,816 total; 433 to 569 per grade). The percentage at benchmark ranged from 65% to 79% across grades and times of year.
• Subsamples of students participated in testing with an external criterion measure, the Group Reading Assessment and Diagnostic Evaluation (GRADE) (n = 1,257 total; 103 to 219 per grade). The GRADE subsample was 50% female on average across grades.

Participant Demographics
[Figure 1. Racial/Ethnic Background of study participants.]

Measures: DIBELS Next
The measures included all DIBELS Next assessments:
• Letter Naming Fluency
• First Sound Fluency
• Phoneme Segmentation Fluency
• Nonsense Word Fluency (Correct Letter Sounds and Whole Words Read)
• Oral Reading Fluency (Words Correct, Accuracy, and Retell)
• Daze Adjusted Score (DIBELS maze)
• DIBELS Composite Score
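For readers who want to see the idea of a composite concretely, here is a minimal sketch of a weighted-sum combiner. The actual DIBELS Composite Score uses specific, grade-dependent component weights published by Dynamic Measurement Group; the measure names, weights, and scores below are hypothetical placeholders, not the published formula.

```python
# Minimal sketch of a composite combiner. The real DIBELS Next composite
# uses specific, grade-dependent weights; the weights and scores below
# are hypothetical placeholders for illustration only.

def composite_score(scores, weights):
    """Combine component scores into a single composite via a weighted sum.

    scores  -- dict mapping measure name to a student's raw score
    weights -- dict mapping measure name to its (hypothetical) weight
    """
    return sum(weights[m] * scores[m] for m in weights)

# Hypothetical third-grade components and weights (not the published values).
weights = {"dorf_words_correct": 1, "dorf_retell": 1, "daze_adjusted": 2}
scores = {"dorf_words_correct": 98, "dorf_retell": 24, "daze_adjusted": 12}
print(composite_score(scores, weights))  # -> 146
```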
Measures: Group Reading Assessment and Diagnostic Evaluation (GRADE)
• Un-timed and group administered. Appropriate for students in preschool through grade 12.
• Five components and 16 subtests. Subtests combine to form the following composites: Phonemic Awareness, Early Literacy Skills, Comprehension, Vocabulary, and Total Test.
• We used the Total Test Raw Score for analyses.
• The GRADE has excellent reliability and validity for its intended purposes. Reliability ranges from .77 to .98, and correlation coefficients range from .69 to .86 with other group- and individually-administered achievement tests.

Procedures: Data Collection
• All data were collected during the 2009-2010 school year.
• DIBELS Next assessments were administered at regular benchmark intervals by trained school personnel using standardized procedures.
• GRADE testing occurred in the spring, at the end of the year, and was conducted across two to three sessions. Total testing time ranged from 60 to 90 minutes. The GRADE was administered by trained school personnel and onsite coordinators.

What Is the Purpose of Benchmark Goals and Screening for Risk in Education?
Different standards, procedures, and requirements are necessary if our purpose is:
1. To quickly identify students who are likely to need additional support to prevent later academic difficulty.
   – That is, to specify important and meaningful future goals: a level of skills at a point in time where we can change the odds to being in favor of an individual's meeting subsequent goals.
2. To accurately identify, early, the students who are the true Tier 3 students or who have a true learning disability.
We are troubled by the purpose of identifying true Tier 3 students. We think the future is not set. Tier 3 is not a characteristic of the student; there are no true Tier 3 students. Tier 3 is a level of support necessary for the student to make adequate progress. No fate but what we make. Our purpose is to prevent reading difficulty and enhance reading outcomes by providing targeted, differentiated instruction early.

Fourth Grade Reading Outcomes on the 2007 National Assessment of Educational Progress

| Skill level | Skill level definition | National (public school) percent of fourth-grade students scoring below (pp. 16, 52-53) | Nation (public) percent of fourth-grade students from diverse backgrounds scoring below (pp. 54 & 57) |
|---|---|---|---|
| Basic | Basic denotes partial mastery of prerequisite knowledge and skills that are fundamental for proficient work at a given grade. | 34% | 54%, 51%, 49%, 50% |
| Proficient | Proficient represents solid academic performance. Students reaching this level have demonstrated competency over challenging subject matter. | 68% | 86%, 83%, 80%, 83% |

Note: Students from diverse backgrounds includes students identified as Black, Hispanic, or American Indian/Alaska Native, and students eligible for free/reduced-price school lunch. From data reported in Lee, Grigg, & Donahue (2007).

Goal: Adequate Reading Skills
• Adequate reading skills should generalize across different state, national, and published reading tests.
• Adequate reading skills are not a normative decision, but a socio-political judgment.
• The 40th percentile or above on a high-quality, nationally norm-referenced test can serve as an approximation for adequate reading performance.
• Students at or above the 40th percentile on a high-quality, nationally norm-referenced test are on track to be rated Basic or above on NAEP.
• We used the Group Reading Assessment and Diagnostic Evaluation (GRADE) in our initial research to provide an initial approximation of adequate reading skills.

Building Futures by Changing the Odds
[Figure: timeline from September to May, with the beginning-of-year benchmark goal and cut point for risk on the left and the end-of-year benchmark goal and cut point for risk on the right, divided into three zones:]
• At or Above Benchmark: odds are 80% to 90% of achieving subsequent early literacy goals.
• Below Benchmark: odds are 40% to 60% of achieving subsequent early literacy goals.
• Well Below Benchmark: odds are 10% to 20% of achieving subsequent early literacy goals.

DIBELS Composite Scores Provided the Framework
Outcome: looking back. Screening: looking forward.
• DIBELS Composite Scores formed the backbone of the system of benchmark goals and cut points for risk.
• First, we linked the end-of-year DIBELS Composite Score to the end-of-year GRADE.
• Second, we linked the middle-of-year DIBELS Composite Score to the end-of-year DIBELS Composite Score.
• Third, we linked the beginning-of-year DIBELS Composite Score to the middle-of-year DIBELS Composite Score.
• Fourth, we linked individual measures to the next DIBELS Composite Score.
   – For example, individual beginning-of-year measures were linked to the middle-of-year DIBELS Composite Score.

Evidence Base, Score Level, Likely Need for Support

| Likely need for support | Odds of achieving subsequent early literacy goals | Score level |
|---|---|---|
| Likely to Need Core Support | 80% to 90% | At or Above Benchmark: scores at or above the benchmark goal |
| Likely to Need Strategic Support | 40% to 60% | Below Benchmark: scores below the benchmark goal and at or above the cut point for risk |
| Likely to Need Intensive Support | 10% to 20% | Well Below Benchmark: scores below the cut point for risk |

The fundamental rationale for benchmark goals and screening decisions is based on the odds of achieving subsequent early literacy goals.
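A minimal sketch of how a score maps onto the benchmark statuses and odds bands in the table above. The third-grade end-of-year values used in the example (goal = 330, cut point = 280) appear later in this presentation; the function itself is illustrative, not part of DIBELS Next.

```python
# Minimal sketch mapping a DIBELS Composite Score to its benchmark status,
# the likely need for support, and the associated odds band from the table
# above. Goal and cut-point values are the third-grade end-of-year values
# reported in this study.

def benchmark_status(score, goal, cut_point):
    if score >= goal:
        return ("At or Above Benchmark", "Likely to Need Core Support", "80%-90%")
    elif score >= cut_point:
        return ("Below Benchmark", "Likely to Need Strategic Support", "40%-60%")
    else:
        return ("Well Below Benchmark", "Likely to Need Intensive Support", "10%-20%")

print(benchmark_status(345, goal=330, cut_point=280))
# ('At or Above Benchmark', 'Likely to Need Core Support', '80%-90%')
```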
DIBELS Composite Score: For Example, Third Grade
[Figure: example third-grade score report showing individual DIBELS Next scores combining into a DIBELS Composite Score; beginning-of-year benchmark goal = 220.]

DIBELS Composite Score Research Rationale
• The DIBELS Composite Score explains more variance in reading outcomes than DORF Words Correct alone: a median of 9% more, with a range of 3% to 17%.
• DORF Words Correct alone is good; the DIBELS Composite Score is better.

| Grade and Time of Year | DORF Words Correct Predicting GRADE Total | DIBELS Composite Score Predicting GRADE Total | Additional Variance Explained by DIBELS Composite Score |
|---|---|---|---|
| Grade 1 Middle of Year | 0.64 | 0.70 | 8% |
| Grade 1 End of Year | 0.75 | 0.77 | 4% |
| Grade 2 Beginning of Year | 0.69 | 0.75 | 8% |
| Grade 2 Middle of Year | 0.76 | 0.80 | 5% |
| Grade 2 End of Year | 0.73 | 0.75 | 3% |
| Grade 3 Beginning of Year | 0.66 | 0.73 | 10% |
| Grade 3 Middle of Year | 0.67 | 0.78 | 15% |
| Grade 3 End of Year | 0.66 | 0.75 | 13% |
| Grade 4 Beginning of Year | 0.76 | 0.80 | 5% |
| Grade 4 Middle of Year | 0.76 | 0.80 | 6% |
| Grade 4 End of Year | 0.75 | 0.80 | 8% |
| Grade 5 Beginning of Year | 0.69 | 0.76 | 11% |
| Grade 5 Middle of Year | 0.64 | 0.76 | 17% |
| Grade 5 End of Year | 0.66 | 0.77 | 17% |
| Grade 6 Beginning of Year | 0.64 | 0.71 | 9% |
| Grade 6 Middle of Year | 0.59 | 0.68 | 12% |
| Grade 6 End of Year | 0.61 | 0.73 | 16% |
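Since variance explained is the squared correlation, the right-hand column can be approximated directly from the two correlations in each row. A minimal sketch; values computed from the rounded correlations printed above can differ from the table by about a percentage point.

```python
# Sketch of how "additional variance explained" relates to the two
# correlations in the table: variance explained is r squared, so the gain
# is r_composite**2 - r_dorf**2. Because the printed correlations are
# rounded, recomputed gains may differ slightly from the table.

rows = [
    ("Grade 1 Middle of Year", 0.64, 0.70),
    ("Grade 3 Middle of Year", 0.67, 0.78),
    ("Grade 5 End of Year", 0.66, 0.77),
]

for label, r_dorf, r_composite in rows:
    gain = r_composite**2 - r_dorf**2
    print(f"{label}: {gain:.0%} additional variance")
# Grade 1 Middle of Year: 8% additional variance
# Grade 3 Middle of Year: 16% additional variance  (table shows 15%)
# Grade 5 End of Year: 16% additional variance     (table shows 17%)
```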
DIBELS Composite Score Educational Rationale
[Figure: example third-grade score report.]
• Reading at an adequate rate.
• Reading orally for meaning; reading silently for meaning.
• With a high degree of accuracy.
• Students who are at or above benchmark on the DIBELS Composite Score are reading for meaning at an adequate rate and with a high degree of accuracy.

Primary Design Specifications for DIBELS Goals and Cut Points for Risk
• At or Above Benchmark: a decision on the initial (screening) DIBELS assessment should provide favorable odds (80% to 90%) of achieving subsequent reading outcomes. The benchmark goal should provide a level where we are reasonably confident the student is making adequate progress.
• Below Benchmark: a decision on the initial DIBELS assessment should provide roughly 50-50 odds (40% to 60%) of achieving subsequent reading outcomes. Below the benchmark goal but above the cut point should provide a zone of uncertainty where we don't know whether the student is making adequate progress or not.
• Well Below Benchmark: a decision on the initial DIBELS assessment should provide low odds (10% to 20%) of achieving subsequent reading outcomes, unless intensive intervention is implemented. Below the cut point should provide a zone where we are reasonably confident the student will not make adequate progress unless we provide additional support.

Secondary Specifications for Benchmark Goals and Cut Points
• Marginal percents for the predictor close to the marginal percents for the outcome.
   – The sample for the Benchmark Goal Study was a relatively high-performing sample.
   – We tried to have the sample appear equally high performing on DIBELS Next and the GRADE.
• Logistic regression analysis.
   – Logistic regression predicted odds of about 60% or better at the exact goal score.
   – Logistic regression predicted odds of about 40% or below at the exact cut point for risk score.

Other Considerations
• Receiver Operating Characteristic (ROC) curve analysis with a large area under the curve.
• Other metrics for decision utility: sensitivity, specificity, percent correct classification, and kappa.
• A coherent pattern of goals across measures and grades.

Setting Benchmark Goals and Cut Points for Risk: Example Analysis Detail
1. Examine the scatterplot illustrating the relation between the screening assessment (the earlier assessment, or predictor) and the outcome assessment (the later assessment). DIBELS is a step-by-step model, so the outcome of one step is the predictor of the next step.
2. Examine the table of counts for each zone of the scatterplot.
3. Primary: consider the odds of students with each screening decision achieving the goal.
4. Secondary: consider marginal percents.
5. Secondary: consider logistic regression analysis.
6. Other: consider the ROC curve and decision utility metrics.
7. Other: consider the overall pattern of goals and cut points.

Third Grade DIBELS Composite Score for Beginning of Year (DCS3b) to Middle of Year (DCS3m)
[Figure: scatterplot of DCS3b against DCS3m, divided into zones for likely to need core, strategic, and intensive support.]

Counts for each zone (columns are the DCS3b screening decision):

| DCS3m Outcome | Likely to need intensive support | Likely to need strategic support | Likely to need core support | Marginal total |
|---|---|---|---|---|
| At or Above Benchmark | 4 | 22 | 324 | 350 |
| Below Benchmark | 20 | 16 | 21 | 57 |
| Well Below Benchmark | 70 | 9 | 4 | 83 |
| Marginal total | 94 | 47 | 349 | 490 |

Primary consideration: odds of achieving the outcome goal.
• Core support beginning-of-year screening decision: 324 of 349 students achieved the middle-of-year goal, or 93% odds.
• Strategic support: 22 of 47 students achieved the goal, or 47% odds.
• Intensive support: 4 of 94 students achieved the goal, or 4% odds.

Secondary consideration: marginal percents.
• DCS3b screening decision marginal percents: 19% intensive support, 10% strategic support, 71% core support.
• DCS3m outcome marginal percents: 71% At or Above Benchmark, 12% Below Benchmark, 17% Well Below Benchmark.
• The percent At or Above Benchmark at the beginning of the year (71%) is very close to the percent At or Above Benchmark in the middle of the year (71%).
• It is desirable for the screening decision to identify about the same percent of students as is expected on the outcome.
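A minimal sketch of both considerations computed from the counts above; it reproduces the 4%/47%/93% odds and the 19%/10%/71% and 71%/12%/17% marginal percents.

```python
# Sketch of the primary and secondary considerations computed from the
# third-grade counts above (beginning-of-year decision vs. middle-of-year
# outcome). Rows are outcomes; columns are screening decisions in the
# order intensive, strategic, core.

counts = [
    [4, 22, 324],   # At or Above Benchmark outcome
    [20, 16, 21],   # Below Benchmark outcome
    [70, 9, 4],     # Well Below Benchmark outcome
]
decisions = ["intensive", "strategic", "core"]

column_totals = [sum(row[j] for row in counts) for j in range(3)]
n = sum(column_totals)  # 490 students

# Primary consideration: odds of reaching the middle-of-year goal
# (the At or Above Benchmark outcome row) for each screening decision.
for j, decision in enumerate(decisions):
    odds = counts[0][j] / column_totals[j]
    print(f"{decision}: {counts[0][j]}/{column_totals[j]} = {odds:.0%}")
# intensive: 4/94 = 4%; strategic: 22/47 = 47%; core: 324/349 = 93%

# Secondary consideration: marginal percents for predictor and outcome.
print([f"{t / n:.0%}" for t in column_totals])    # ['19%', '10%', '71%']
print([f"{sum(row) / n:.0%}" for row in counts])  # ['71%', '12%', '17%']
```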
Logistic Regression Estimates Odds of Adequate Outcomes for Each Score
[Figure: moving proportion with an adequate outcome (blue diamonds) and logistic-regression-estimated odds of adequate outcomes (red line) plotted against the screening score.]
• Blue diamonds are the moving proportion with an adequate outcome.
• The red line is the logistic-regression-estimated odds of an adequate outcome.
• A 60% estimated odds of adequate outcomes for the score exactly at the benchmark goal: higher scores, higher odds.
• A 30% estimated odds of adequate outcomes for the score exactly at the cut point for risk: lower scores, lower odds.

DIBELS Is a Step-by-Step Model: Beginning to Middle; Middle to End
• Mastering each step puts the odds in favor of mastering the next step.
   – At or Above Benchmark: odds are generally 80% to 90% of achieving subsequent benchmark goals and important reading outcomes. The student is likely to make adequate progress with effective core instruction.
   – Below Benchmark: odds are generally 40% to 60% of achieving subsequent benchmark goals and important reading outcomes. The student is likely to need strategic support to make adequate progress.
   – Well Below Benchmark: odds are generally 10% to 20% of achieving subsequent benchmark goals and important reading outcomes. The student is likely to need intensive support to make adequate progress.
• Contiguous continuity: each step is a continuous process with a strong linkage, and each step is contiguous with the next step.

End of Year Benchmark Goals

| Measure* | Kindergarten | First Grade | Second Grade | Third Grade |
|---|---|---|---|---|
| Phoneme Segmentation Fluency | 40 | — | — | — |
| Nonsense Word Fluency (CLS/WWR) | 28/— | 58/13 | — | — |
| DIBELS Oral Reading Fluency (WC/Accuracy/Retell) | — | 47/90%/15 | 87/97%/27 | 100/97%/30 |
| Daze | — | — | — | 19 |

*Word Use Fluency—Revised (WUF-R) is available as an experimental measure from http://dibels.org/. For Nonsense Word Fluency, the first number is the Correct Letter Sounds goal and the second number is the Whole Words Read goal. For DIBELS Oral Reading Fluency, the first number is the Words Correct goal, the second is the Accuracy goal, and the third is the Retell goal. For example, at the end of second grade a student should be able to read 87 words correct with 97% accuracy and a retell of 27 words relevant to the passage. Third-grade benchmark goals are illustrated; benchmark goals for grades 4 through 6 are available.

Third Grade DIBELS Composite Score for Beginning of Year (DCS3b) and Middle of Year (DCS3m)
[Figure: scatterplot of DCS3b against DCS3m, with the beginning-of-year goal of 220 and the middle-of-year goal of 285 marked.]
• .91 correlation with the DIBELS Composite Score at the middle of the year.
• Odds of achieving the middle-of-year goal: 93% at or above benchmark, 47% below benchmark, 4% well below benchmark.

Third Grade DIBELS Composite Score for Middle of Year (DCS3m) and End of Year (DCS3e)
[Figure: scatterplot of DCS3m against DCS3e, with the middle-of-year goal of 285 and the end-of-year goal of 330 marked.]
• .90 correlation with the DIBELS Composite Score at the end of the year.
• Odds of achieving the end-of-year goal: 91% at or above benchmark, 43% below benchmark, 8% well below benchmark.

Third Grade DIBELS Composite Score for End of Year (DCS3e) and GRADE Total Raw Score (gtotr3e)
[Figure: scatterplot of DCS3e against gtotr3e, with the end-of-year DIBELS Composite Score goal of 330 and the 40th percentile on the GRADE marked.]
• .75 correlation with the GRADE Total Raw Score at the end of the year.
• Odds of scoring at or above the 40th percentile on the GRADE: 90% at or above benchmark, 48% below benchmark, 7% well below benchmark.

We are troubled by the terminology. We think a "True Positive" is actually a student for whom we were not effective in ruining the prediction. A "False Positive" is a student for whom we have changed the future.
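Returning to the logistic-regression check described in the example analysis: here is a minimal sketch of fitting the model and reading off the estimated odds at the exact goal and cut-point scores. The study's student-level data are not reproduced in this deck, so the (score, outcome) pairs below are synthetic stand-ins and the printed odds are illustrative only.

```python
# Sketch of the logistic-regression check: fit P(adequate outcome | score)
# and evaluate it at the exact goal and cut-point scores. The data here are
# synthetic stand-ins, not the study's student-level data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 490
scores = rng.uniform(150, 550, size=(n, 1))  # hypothetical composite scores
# Hypothetical outcomes: probability of an adequate outcome rises with score.
p_true = 1 / (1 + np.exp(-(scores[:, 0] - 310) / 45))
outcomes = (rng.uniform(size=n) < p_true).astype(int)

model = LogisticRegression().fit(scores, outcomes)

# Estimated odds at the exact goal and cut-point scores (third-grade
# end-of-year values from this study: goal = 330, cut point = 280).
for label, x in [("benchmark goal (330)", 330.0), ("cut point for risk (280)", 280.0)]:
    p = model.predict_proba(np.array([[x]]))[0, 1]
    print(f"Estimated odds of an adequate outcome at the {label}: {p:.0%}")
```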
Receiver Operating Characteristic (ROC) Curves
[Figure: ROC curves for the end-of-third-grade DIBELS Composite Score. Benchmark goal ROC, AUC = .90; cut point for risk ROC, AUC = .87.]
• A larger area under the curve indicates a favorable trade-off of sensitivity and specificity.
• Decision points in the upper-left bend of the curve indicate a favorable balance of sensitivity and specificity.

Other Decision Utility Metrics: End of Third Grade

| Role | Variable | Goal | Cut Point | Description |
|---|---|---|---|---|
| Predictor | DCS3e | 330 | 280 | DIBELS Composite Score, Grade 3, End of Year |
| Criterion | gtotr3e | 83 | 71 | GRADE Total Test, Grade 3, End of Year |

| Screening decision | Outcome | True Negative | False Negative | True Positive | False Positive | Sensitivity | Specificity | Negative Predictive Power | Positive Predictive Power | Accurate Classification | Kappa |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Core support decision | At or Above Benchmark | 123 | 14 | 37 | 13 | .73 | .90 | .90 | .74 | .86 | .63 |
| Intensive support decision | At or Above Benchmark | 134 | 26 | 25 | 2 | .49 | .99 | .84 | .93 | .85 | .56 |
| Core support decision | Well Below Benchmark | 132 | 5 | 18 | 32 | .78 | .80 | .96 | .36 | .80 | .39 |
| Intensive support decision | Well Below Benchmark | 154 | 6 | 17 | 10 | .74 | .94 | .96 | .63 | .91 | .63 |
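A minimal sketch of the decision utility metrics above, computed from the 2x2 counts for the core support decision against the At or Above Benchmark outcome; it reproduces the first row of the table (.73, .90, .90, .74, .86, .63).

```python
# Sketch of the decision utility metrics, computed from the 2x2 counts in
# the first row of the table above (TN=123, FN=14, TP=37, FP=13).

def decision_utility(tn, fn, tp, fp):
    n = tn + fn + tp + fp
    po = (tp + tn) / n  # observed agreement (accurate classification)
    # Chance agreement for kappa: P(test+)P(outcome+) + P(test-)P(outcome-).
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "negative predictive power": tn / (tn + fn),
        "positive predictive power": tp / (tp + fp),
        "accurate classification": po,
        "kappa": (po - pe) / (1 - pe),
    }

for name, value in decision_utility(tn=123, fn=14, tp=37, fp=13).items():
    print(f"{name}: {value:.2f}")
# sensitivity 0.73, specificity 0.90, negative predictive power 0.90,
# positive predictive power 0.74, accurate classification 0.86, kappa 0.63
```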
Caveats for Use
• DIBELS Next benchmark goals and cut points for risk are specific to the DIBELS Next assessments, passage difficulty, and readability.
   – Alternative passages of a lower level of difficulty will require higher benchmark goals.
   – Alternative passages of a higher level of difficulty will require lower benchmark goals.
• You cannot use DIBELS Next benchmark goals with other progress monitoring passages. Each set of passages requires its own research on benchmark goals.

Caveats for Use: Early Intervention and Prevention Are Active Ingredients
• The effectiveness of the school-wide system of instruction can change the odds.
   – Differences in the effectiveness of Tier 1 instruction and Tier 2 and 3 intervention change the underlying relation between screener and outcome.
   – A less effective school-wide system of Tier 1 instruction can decrease the odds of achieving subsequent early literacy goals for students who are at or above benchmark.
   – Increasing the effectiveness of Tier 2 and 3 intervention can increase the odds of achieving subsequent early literacy goals for students who are at risk.

Building Futures
• Key point: the student's outcome is unknown and not fixed at the time of the screening. Instead, the outcome is the result of the targeted, differentiated instruction and intervention we provide as a direct result of the screening information.
• Our instructional goal is to ruin screening predictions.
• For example: if a child screens as at high risk on a measure of early literacy skills in kindergarten, we know they are likely to need additional instructional support to be successful. Their later outcomes, such as their reading skills in first grade, are a direct result of the targeted, differentiated instruction and early intervention that we provide.

Implications and Discussion
1. Using a composite is a new way of using DIBELS Next data to improve our decision making.
   • Understand where the cut points and goals come from and what they mean.
   • Composites represent a more complete sample of behavior and are better predictors at almost every grade and time of year.
   • "The beauty of the DIBELS Composite Score is that it allows for easy and meaningful integration of information."
   • The composite represents accuracy, rate, and meaning.

Implications and Discussion, Cont'd
2. Some caveats.
   • It is still a screening measure.
   • Use the "yep, yep, huh?" test and validate as needed.
   • For instructional planning, look at individual test scores (e.g., K measures, DORF).

3. How should we evaluate and think about screening measures, given that our job in the schools is to "ruin" the predictions for at-risk students?
   • How would we know the "true" sensitivity and specificity of a measure?
   • At a practical level, this highlights the importance of a Tier Effectiveness Report or similar reports that track students over time.

Implications and Discussion, Cont'd
4. Study bonus: additional evidence of the strong correlation between CBM/DIBELS-type measures and lengthier paper-and-pencil measures of reading skills and comprehension, in this case the GRADE.
   • More info: Technical Manual, NASP poster.
   • Personal note: giving the GRADE in K was quite an experience!

What Questions Are There?