Informational Significance of A-F Grades 1

RUNNING HEAD: Informational Significance of A-F Grades

The Informational Significance of A-F School Accountability Grades

Curt M. Adams
Patrick B. Forsyth
Jordan K. Ware
University of Oklahoma, Oklahoma Center for Education Policy

Mwarumba Mwavita
Oklahoma State University, Center for Educational Evaluation and Research

Corresponding Author: Curt M. Adams, 4502 E. 41st Street, Tulsa, OK 74135; 918.671.9637; [email protected].

Curt M. Adams is an associate professor in the department of Educational Leadership and Policy Studies at the University of Oklahoma, and co-director of the Oklahoma Center for Education Policy. Recent publications include the journal article, Self-Regulatory Climate: A social resource for student regulation and achievement (Teachers College Record, 2015), and the book Quantitative Research in Education: A Primer, with Wayne Hoy (SAGE, 2015).

Patrick B. Forsyth is professor of education at the University of Oklahoma, co-director of the Center for Education Policy, and former executive director of UCEA and The National Policy Board for Educational Administration. Recent books include Trust and School Life (2014, Springer) with D. Van Maele and M. Van Houtte, and Collective Trust (2011, Teachers College Press) with C.M. Adams and W.K. Hoy.

Jordan Ware is a post-doctoral social scientist at the University of Oklahoma and the Oklahoma Center for Education Policy, where he directs the project to measure and improve school capacity. He studies neighborhood poverty and its effects on learning and development.

Mwarumba Mwavita is an assistant professor of Research, Evaluation, Measurement, and Statistics at Oklahoma State University. He is also Director of the Center for Educational Research and Evaluation.

Description: This study evaluates the informational significance of Oklahoma A-F school accountability grades relevant to the policy objective of achievement equity. In press with Teachers College Record.
Do not cite without permission of the author.

The Informational Significance of A-F School Accountability Grades

Abstract

Background/Context. Despite problems with accountability systems under No Child Left Behind, the policy has been widely commended for exposing the depth and breadth of educational inequality in the United States. As states implement new accountability systems, there is growing concern that attention to achievement gaps and the performance of marginalized children has faded. Many approved accountability plans no longer report achievement by student subgroups or include subgroup performance in the calculation of accountability indicators.

Research Purpose. This study examined the informational significance of Oklahoma’s A-F accountability grades relative to the policy objective of achievement equity. Informational significance as explained in self-determination theory provided a framework to explore the usefulness of an A-F grade for understanding achievement differences within and between schools.

Research Design. We evaluated the informational significance of Oklahoma A-F school grades by analyzing reading and math test scores from over 25,000 students in 81 elementary and middle schools. The study was designed to address two questions: Do students in “A” and “B” schools have high average achievement and small achievement gaps compared to students in “D” and “F” schools? What is the difference in average achievement and achievement gaps between school grades when holding constant contextual school conditions?

Results. We found test score gaps attributed to Free and Reduced Lunch qualification and minority status. Free and Reduced Lunch and minority students average about one standard deviation lower in math and reading than their peers. Test score gaps varied across A-F school grades with the largest gaps existing in “A” and “B” schools.
HLM results showed that A-F grades do not differentiate schools by effectiveness levels. For reading, we did not find statistically significant main effects attributed to letter grades. For math, the only statistically significant difference was between students in “A” and “B” schools and students in “F” schools. This difference had a small effect size. School grades did moderate achievement gaps, but gaps moved in a direction opposite from what would be desired of an accountability system that measured achievement equity.

Conclusions. Progress made under NCLB in exposing achievement inequity in the U.S. has taken a step back with Oklahoma’s A-F school grades. Our evidence suggests that a composite letter grade provides very little meaningful information about achievement differences.

Executive Summary

As states implement new accountability systems, there is growing concern that attention to achievement gaps and the performance of marginalized children has faded (Ayers, 2011; Hall, 2013). Initial evidence shows that concerns raised from document analysis of waiver applications are legitimate. Ushomirsky, Williams, and Hall (2014) found that schools in Florida, Kentucky, and Minnesota earned high effectiveness rankings despite low test performance from minority and low-income students. In some cases, minority and low-income students in lower ranked schools outperformed peers in higher ranked schools.

Oklahoma uses an A-F ranking system as the basis of State recognition and intervention. High letter grades, A’s and B’s, lead to rewards and public acclaim whereas low grades, D’s and F’s, impose mandated turnaround plans on schools. Although the A-F grading system was not designed specifically to measure achievement gaps, it is assumed that ranking schools by letter grades can support efforts to equalize achievement distributions.
In this study we examined the informational significance of letter grades relative to the policy objective of achievement equity. The purpose was to determine if students in “A” and “B” schools had high average achievement and smaller achievement gaps compared to students in “D” and “F” schools, and to explore differences in average achievement when holding constant conditions that schools do not control.

Improvement Assumptions and Accountability

Test-based accountability assumes that external contingencies (e.g., threats or rewards) have instrumental value in reinforcing desired actions and outcomes (Ryan & Deci, 2002, 2012; Polikoff, McEachin, Wrabel, & Duque, 2014). The belief is that threats and sanctions can provoke internal change by eliciting the collective will to make instructional systems efficient and effective (Mintrop & Sunderman, 2009). Accordingly, schools falling short of measurable objectives encounter public scorn, and face mandated improvement plans, loss of students through choice options, prescribed reform models, reconstitution, or in some cases closure (Sahlberg, 2008).

In the design of test-based accountability, a body of divergent research findings on external control was either dismissed by policy makers or not known. The weight of the psychological evidence indicates that contingent reinforcement withers under the strain of complex, conceptual tasks like teaching and learning (Ryan & Deci, 2002; Ryan & Weinstein, 2009). Performance information used as an external control is inimical to work that involves professional discretion, adaptation, and cooperation among interdependent groups (Forsyth, Adams, & Hoy, 2011).

An alternative theoretical lens used to explain optimal individual and group performance may well explain why achievement equity eluded NCLB and how performance information can play a supportive role in closing achievement gaps. The crucial adjustment is with the locus of causality, switching from external to internal motivators.
Self-determination theory informs this adjustment.

Effective accountability systems informed by self-determination theory depend on the functional significance of performance indicators. Functional significance is defined as the meaning and worth individuals place in an object, experience, or event (Ryan & Deci, 2012). The functional significance of accountability indicators can be informational or controlling (Ryan & Weinstein, 2009). Accountability indicators that have informational significance stand a better chance of generating the energy and capacity to narrow persistent achievement gaps (Ryan & Deci, 2012; Ryan & Weinstein, 2009).

Informational significance comes from the diagnostic value associated with a performance indicator (Ryan & Weinstein, 2009). Clear and accurate information about learning processes and outcomes is needed to generate knowledge about student performance; this knowledge in turn can drive improvement decisions and actions. It is hard to see how appropriate action can be taken to close achievement gaps without first knowing how achievement varies within schools. If consequential decisions and actions are based on accountability indicators, the indicators should provide enough information to understand differences in student achievement.

Findings

We found higher average reading and math scores in “A” and “B” schools compared to “C”, “D”, and “F” schools, but test scores were not equally distributed within letter grades. The largest achievement gaps were in schools ranked as the most effective. FRL and minority students in “A” and “B” schools had average reading and math achievement below the overall sample mean and in some cases not different from FRL and minority students in schools with lower letter grades. In “A” and “B” schools, FRL students had an average reading score about .32 standard deviations below the mean.
This average score was similar to the average reading score of -.35 for FRL students in “D” schools and the average reading score of -.40 for FRL students in “F” schools. Average performance of FRL students was nearly equivalent across letter grades.

HLM results raise additional concerns about the informational significance of A-F grades. After removing achievement variance attributed to factors unrelated to teaching or school effectiveness, letter grades were unable to differentiate schools by average student achievement. In reading, average test scores in “A”, “B”, “C”, and “D” schools were similar. Math results were not much different from reading results. Perhaps the most troubling finding was that “A” and “B” schools were least effective for poor, minority children, while “D” and “F” schools were most effective.

Conclusion

Progress made under NCLB in exposing achievement inequity in the US has taken a step back with Oklahoma’s A-F school grades. Our evidence suggests that a composite letter grade does not provide a clear signal or simple interpretation of achievement differences within and between schools. No meaningful information about achievement gaps can be obtained from a letter grade. We cannot conclude, for instance, that “A” and “B” schools have high average and equitable achievement. We also cannot conclude that FRL and minority students in “D” and “F” schools perform worse on average than peers in higher ranked schools. Herein lies a fundamental problem with the informational significance of A-F accountability grades: grades do not provide the right information to understand achievement patterns in schools. Without knowledge of test score differences, it is hard to see how appropriate action can be taken to improve learning outcomes.
The Informational Significance of A-F School Accountability Grades

Despite problems with accountability systems under No Child Left Behind (NCLB), the policy has been widely commended for exposing the depth and breadth of educational inequality in the United States. Achievement equity remains a target of accountability systems approved for NCLB waivers. As of February 2014, 42 states, Washington, DC, and eight school districts in California were operating under approved NCLB flexibility waivers. At a minimum, approved accountability systems need to set ambitious Annual Measurable Objectives (AMO) in reading/language arts and math, to recognize “reward” schools and identify low performing schools as “priority” and “focus” schools, to measure student growth, to support efforts to close achievement gaps, and to support “priority” and “reward” schools in building capacity and improving performance (US Department of Education, 2012).

As states implement new accountability systems, there is growing concern that attention to achievement gaps and the performance of marginalized children has faded (Ayers, 2011; Hall, 2013). Many approved accountability plans no longer report achievement by student subgroups or include subgroup performance in the calculation of accountability indicators (CEP, 2012; Domaleski & Perie, 2013; Polikoff, McEachin, Wrabel, & Duque, 2014). Some states have opted to combine historically marginalized students into a “super subgroup,” to use growth of the bottom 25% of students in a school to satisfy the achievement gap reporting requirement, and to evaluate school performance with a composite indicator (Ayers, 2011; McNeill, 2012). These changes have the potential to produce a performance indicator that effectively hides achievement disparities within and between schools (Hall, 2013).

Initial evidence shows that concerns raised from document analysis of waiver applications are legitimate.
Ushomirsky, Williams, and Hall (2014) found that schools in Florida, Kentucky, and Minnesota earned high effectiveness rankings despite low test performance from minority and low-income students. In some cases, minority and low-income students in lower ranked schools outperformed peers in higher ranked schools.

We intensify the scrutiny of new accountability systems by examining the informational significance of Oklahoma’s A-F school grades. Oklahoma uses an A-F ranking system as the basis of State recognition and intervention. High letter grades, A’s and B’s, lead to rewards and public acclaim whereas low grades, D’s and F’s, impose mandated turnaround plans on schools. Although the A-F grading system was not designed specifically to measure achievement gaps, it is assumed that ranking schools by letter grades can support efforts to equalize achievement distributions.

This study examined the informational significance of letter grades relative to the policy objective of achievement equity. Informational significance as explained in self-determination theory provided a framework to explore the usefulness of an accountability grade for understanding achievement differences within and between schools. This study does not measure the validity of inferences based on a letter grade. Rather, the purpose was to determine if students in “A” and “B” schools had high average achievement and smaller achievement gaps compared to students in “D” and “F” schools, and to explore differences in average achievement when holding constant conditions that schools do not control.

Test-Based Accountability and Achievement Equity

We use a narrow definition of achievement equity in this study, focusing on racial and economic test score gaps that many reform policies target (Lee, 2006, 2008; Fusarelli, 2004). Under NCLB, test-based accountability became the primary policy instrument to redress achievement disparities (Lee, 2008).
As it turns out, the complexity of achievement equity exceeded the structure and function of test-based accountability (Lee, 2006; Harris, 2011; Mintrop & Sunderman, 2009). Trend data from the National Assessment of Educational Progress (NAEP) indicate that progress made in the 1970’s and 1980’s in narrowing achievement gaps stalled from 1990-2004 (Lee, 2006; Mintrop & Sunderman, 2009; Rothstein, Jacobson, & Wilder, 2008) and from 2004-2012 (National Center for Education Statistics, 2013). Additionally, poverty gaps actually increased in two-thirds of the states over the last decade (Quality Counts, 2014).

Several explanations exist for persistent test score gaps. Differences in learning opportunities, resource disparities, school capacity, and teaching quality partly explain lower average achievement of poor and minority students (Barton & Coley, 2010). More specific to this study is the quality and use of accountability indicators. Test-based accountability under NCLB used simplistic performance indicators in a targeted way: to reward schools meeting yearly objectives and to sanction schools falling short of annual achievement targets (Harris, 2011; Mintrop & Sunderman, 2009). Persistent achievement gaps during the NCLB era raise questions about the improvement assumptions of test-based accountability (Ryan & Weinstein, 2009).

Improvement Assumptions and Accountability

Test-based accountability assumes that external contingencies (e.g., threats or rewards) have instrumental value in reinforcing desired actions and outcomes (Ryan & Deci, 2002, 2012; Polikoff, McEachin, Wrabel, & Duque, 2014). The belief is that threats and sanctions can provoke internal change by eliciting the collective will to make instructional systems efficient and effective (Mintrop & Sunderman, 2009).
It is assumed that school actors, when confronted with punitive consequences for low achievement, will take action that leads to better outcomes (O’Day, 2002). Accordingly, schools falling short of measurable objectives encounter public scorn, and face mandated improvement plans, loss of students through choice options, prescribed reform models, reconstitution, or in some cases closure (Sahlberg, 2008).

Test-based accountability relies on accountability indicators to identify underperforming schools so that pressure or sanctions can induce school actors to improve learning opportunities. Agency and expectancy theories have been used to explain how accountability systems work from outside of schools to bring about change within them (Polikoff, McEachin, Wrabel, & Duque, 2014; Ryan & Weinstein, 2009). As suggested by these frameworks, clear achievement standards and accurate performance indicators function as an external motivator for goal attainment. Agency theory assumes that accountability information is a mechanism used by principals (i.e., school administrators, community members, legislators, taxpayers, etc.) to ensure school agents (i.e., teachers) deliver student achievement (Polikoff, McEachin, Wrabel, & Duque, 2014). Expectancy theory assumes that rewards and threats motivate teachers to improve achievement as long as standards are clear and performance information is accurate (Finnigan & Gross, 2007). Through an agency and expectancy lens, the legitimacy and trustworthiness of accountability indicators affect the behavioral response of school members.

In the design of test-based accountability, a body of divergent research findings on external control was either dismissed by policy makers or not known. The weight of the psychological evidence indicates that contingent reinforcement withers under the strain of complex, conceptual tasks like teaching and learning (Ryan & Deci, 2002; Ryan & Weinstein, 2009).
Performance information used as an external control is inimical to work that involves professional discretion, adaptation, and cooperation among interdependent groups (Forsyth, Adams, & Hoy, 2011).

An alternative theoretical lens used to explain optimal individual and group performance may well explain why achievement equity eluded NCLB and how performance information can play a supportive role in closing achievement gaps. The crucial adjustment is with the locus of causality, switching from external to internal motivators. Self-determination theory informs this adjustment.

Self-Determination Theory Applied to Accountability

The fundamental assumption of self-determination theory is that individuals are inherently oriented toward growth and goal fulfillment (Ryan & Deci, 2002). Accordingly, the drive and determination to excel are internal, primal states that require nurturing, not coercive control by external mechanisms. External mechanisms, like the use of performance information, fuel motivation and effective behavior by satisfying the innate psychological needs of autonomy, competence, and relatedness (Adams, Forsyth, Dollarhide, Miskell, & Ware, 2015; Ryan & Deci, 2012). Here autonomy does not mean independence; rather, it is a cognitive belief embodied in the volitional and purposeful action of individuals (Williams, 2002). Competence is feeling effective in one’s task and having confidence in one’s ability to execute actions required to achieve a challenging outcome. Relatedness comes from supportive social connections that foster feelings of belonging and psychological security within a group or organization (Reeve, 2002).

Schools bring to life teaching and learning when structures and processes promote interactions that support autonomy, competence, and relatedness (Reeve & Jang, 2006).
On the other hand, teaching and learning become uninspired and stale when psychological needs are thwarted by controlling structures (Reeve & Halusic, 2009). Performance indicators can be used to build professional capacity, but to do so the information needs to be used in ways that enhance the desire, creativity, and energy of teachers and students to press for academic excellence (Ryan & Weinstein, 2009).

NCLB was not designed to build professional capacity; its intent was to hold schools accountable for past results. Paradoxically, NCLB added considerable noise, confusion, and dysfunction to many of the low performing, high need schools it promised to reform (Moller, 2008; O’Day, 2002; Sirotnik, 2005). When considered through self-determination theory, the anemic performance of accountability is not surprising (Ryan & Weinstein, 2009). Accountability indices derived from aggregated test scores cover a very thin slice of the performance pie. The complexity of school work, more directly the complexity of teaching and learning, far exceeds the narrow parameters of a composite achievement index compiled from curricular tests administered at one occasion during the school year (Moller, 2008). Low quality accountability information used to trigger sanctions constricts meaningful learning opportunities, hinders innovation and risk-taking, and undermines motivation (Sahlberg, 2008).

Effective accountability systems informed by self-determination theory depend on the functional significance of performance indicators. Functional significance is defined as the meaning and worth individuals place in an object, experience, or event (Ryan & Deci, 2012). The functional significance of accountability indicators can be informational or controlling (Ryan & Weinstein, 2009).
Using indicators as an external control has arguably diminished innovation, creativity, and joy of teaching and learning in the very schools that NCLB intended to reform (Darling-Hammond, 2006; Feuer, 2008; Sunderman & Kim, 2005). From a controlling perspective, NCLB accountability systems failed to generate the legitimacy and trustworthiness needed for performance indicators to be used as a tool to reform schools.

Accountability indicators that have informational significance stand a better chance of generating the energy and capacity to narrow persistent achievement gaps (Ryan & Deci, 2012; Ryan & Weinstein, 2009). Informational significance comes from the diagnostic value associated with a performance indicator (Ryan & Weinstein, 2009). Clear and accurate information about learning processes and outcomes is needed to generate knowledge about student performance; this knowledge in turn can drive improvement decisions and actions. It is hard to see how appropriate action can be taken to close achievement gaps without first knowing how achievement varies within schools. If consequential decisions and actions are based on accountability indicators, the indicators should provide enough information to understand differences in student achievement.

Informational significance, as understood in self-determination theory, is the standard by which we evaluate the performance of Oklahoma’s A-F grades. We do not test the construct validity or reliability of letter grades; instead, our concern is with the ability of the grade to yield meaningful and useful information about achievement differences between student groups. Our assessment of informational significance is based on the degree to which school grades reflect achievement gaps within and between schools.
This purpose stands in contrast to validity studies that use theory and empirical evidence to evaluate the ability of a measure to yield truthful judgments about the object it purports to measure (Messick, 1995; Miller, 2008).

The State may not have intended for A-F letter grades to specifically measure achievement gaps, but the policy objective is to improve achievement equity. For achievement equity, we evaluate informational significance by what a letter grade reveals about achievement gaps for FRL and minority students. To keep attention focused on high achievement for all students, a composite letter grade must reflect test score differences within and between schools (Linn, 2005, 2008). High grades, “A’s” and “B’s”, logically suggest strong achievement for all students in all subject areas. Low grades largely suggest lower average achievement and large achievement gaps. If A-F grades reflect subgroup differences, they may have value for equalizing achievement outcomes. If they do not, they fail the test of informational significance.

Method

Our purpose was to assess the informational significance of Oklahoma’s school grades in relation to the policy objective of closing achievement gaps within and between schools. As such, we do not evaluate the validity of letter grades for judgments about achievement equity. The objective was to evaluate the usefulness of Oklahoma’s A-F grades for understanding achievement differences. Grades should be meaningful for determining the average achievement of all students and student subgroups. We asked two questions: First, do students in “A” and “B” schools have high average achievement and small achievement gaps compared to students in “D” and “F” schools? And second, what is the difference in average achievement and achievement gaps between school grades when holding school context constant?
Composition of Oklahoma’s A-F School Grades

Oklahoma uses a single letter grade as an indicator of Annual Measurable Objectives (AMO), to classify “reward,” “priority,” or “focus” schools, and to rank schools by effectiveness (Oklahoma State Department of Education, 2012). School grades for the 2012-2013 school year were calculated using a formula that converts test scores into categorical data, categorical data back into a continuous index, and a continuous index into a summative letter grade. The final composite grade is derived from two components: (1) student achievement and (2) student growth (Ayers, 2011; OCEP & CERE, 2012; Oklahoma State Department of Education, 2014).

The student achievement component makes up 50% of the school grade. Student test scores from state math, reading, science, and social studies exams are used to calculate a school’s Performance Index (PI). The PI is calculated from a simple binary scale. Students scoring below proficiency for each tested subject are assigned a zero and students who score proficient or above are assigned a one. The total score for all tested subjects is divided by the total number of tests taken to calculate the PI score for a school. This produces a PI score ranging from 0-100. The PI score is then multiplied by .50 for the calculation of the final composite grade (Oklahoma State Department of Education, 2014).

Student growth makes up the other 50 percent of the composite letter grade. Only math and reading/English exams are used for the growth index. Growth is composed of overall student growth (25%) and growth of the bottom quartile of students in a school (25%).
For both components, the growth score is calculated by first counting the students in the school who either scored proficient/advanced for both testing periods, who increased a proficiency level in the current testing period, or who showed growth in the test score that was above the state average for growth. This number is then divided by the total number of eligible students to arrive at an overall growth index that ranges from 0-100. The overall growth index is then multiplied by .25 for the calculation of the composite school grade. Growth of the bottom quartile is similarly calculated (Oklahoma State Department of Education, 2014).

Up to 10 bonus points are awarded to schools based on attendance rates, advanced course participation, dropout rates, and return rate of parent “climate” surveys. The PI score, student growth, and bonus points are summed to arrive at an overall Index score that ranges from 0-100. Index scores between 90-100 receive an A, 80-89 a B, 70-79 a C, 60-69 a D, and below 60 an F (Oklahoma State Department of Education, 2014).

Data Source

Analyses were based on 2012-2013 reading and math test scores of over 25,000 students from 81 urban, urban fringe, and suburban elementary and middle schools. Schools were sampled from three contiguous districts in a single metropolitan area. Achievement data were used from students in 3rd, 4th, 5th, 6th, 7th, and 8th grades. Table 1 contains descriptive data for the sample of students and schools. Valid math scores were obtained from 25,663 students and valid reading scores from 25,469 students. Approximately 45% of the students qualified for Free or Reduced Lunch (FRL), 42% identified as a minority ethnic group, and 52% identified as nonminority Caucasian.

Scale scores from the state curricular exams in reading and math were used to operationalize achievement. Scale scores range from a low of 400 to a high of 990.
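As a rough illustration of the grade composition described under Composition of Oklahoma’s A-F School Grades, the calculation can be sketched in Python. This is a minimal sketch under stated assumptions, not the State’s implementation. The names `SchoolInputs`, `performance_index`, `index_score`, and `letter_grade` are hypothetical. The sketch assumes the 0/1 proficiency ratio is scaled by 100, that the index is capped at 100 after bonus points, that an index below 60 earns an F, and it does not model any rounding rules.

```python
from dataclasses import dataclass


@dataclass
class SchoolInputs:
    """Hypothetical container; field names are illustrative only."""
    proficient_tests: int          # tests scored proficient or above (coded 1)
    total_tests: int               # all tests taken across tested subjects
    overall_growth: float          # overall growth index, 0-100
    bottom_quartile_growth: float  # bottom-quartile growth index, 0-100
    bonus_points: float = 0.0      # up to 10 bonus points


def performance_index(proficient_tests: int, total_tests: int) -> float:
    """Share of tests at or above proficiency, scaled to 0-100 (assumed scaling)."""
    if total_tests <= 0:
        raise ValueError("total_tests must be positive")
    return 100.0 * proficient_tests / total_tests


def index_score(school: SchoolInputs) -> float:
    """Weighted sum: 50% PI, 25% overall growth, 25% bottom-quartile growth, plus bonus."""
    pi = performance_index(school.proficient_tests, school.total_tests)
    bonus = min(max(school.bonus_points, 0.0), 10.0)
    raw = 0.50 * pi + 0.25 * school.overall_growth + 0.25 * school.bottom_quartile_growth + bonus
    return min(raw, 100.0)  # assumption: the index is capped at 100


def letter_grade(index: float) -> str:
    """Map an index score to a letter using the cut points stated in the text."""
    if index >= 90:
        return "A"
    if index >= 80:
        return "B"
    if index >= 70:
        return "C"
    if index >= 60:
        return "D"
    return "F"  # assumption: anything below 60 is an F


# Hypothetical school: 140 of 200 tests proficient, growth indices of 60 and 40, 5 bonus points.
example = SchoolInputs(140, 200, overall_growth=60.0, bottom_quartile_growth=40.0, bonus_points=5.0)
print(index_score(example), letter_grade(index_score(example)))  # 65.0 D
```

Note that nothing in this calculation takes subgroup membership as an input, which is why two schools with very different within-school gaps can receive the same index score.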
The average reading scale score for the sample was 747 with a standard deviation of 90. The average math scale score was 759 with a standard deviation of 92. The school sample shows that the average FRL rate was 70% and the average minority composition was 60%. Of the schools, 14% earned school grades of “A”; 19% earned grades of “B”; 4% earned grades of “C”; 20% earned grades of “D”; and 43% earned grades of “F”. Of the sampled schools, 62 were elementary schools and 19 were middle schools.

[Insert Table 1 About Here]

Analytical Approach

Two techniques were used in the analysis. First, consistent with conventional practices to report test score gaps (Jencks & Phillips, 1998; Reardon, 2011), we standardized scale scores to a mean of 0 and a standard deviation of 1. We then report mean differences between FRL and non-FRL and minority and non-minority students in “A”, “B”, “C”, “D”, and “F” schools. This approach is useful for examining differences in the achievement status of students.

School grades, however, rank schools by effectiveness, and as such they must measure what schools and teachers control by accounting for achievement variance attributed to different school context. Harris (2011) refers to this as the cardinal rule of accountability: schools should be held accountable for what they do. When an indicator is used to rank schools, simple descriptive data lack the power to control for alternative explanations of test score differences (Carlson, 2006; Forte, 2010; Harris, 2011). For this reason, we used a multi-level modeling approach to estimate mean differences and achievement gaps after controlling for factors that are unrelated to teaching effectiveness or school practices.

We followed a conventional multilevel model building process in HLM 7.0. The first step was to decompose achievement variance to within school and between school components with an unconditional random effects ANOVA.
Results were used to calculate the IntraClass Correlation Coefficient (ICC), the percent of achievement variance attributed to school and nonschool factors. We tested the effects of student characteristics on achievement with a Random Coefficients regression. Student variables were grand-mean centered in this model. Grand-mean centering has a computational advantage over group-mean centered or un-centered models in that it controls for any shared variance between individual and group level predictors. Significant student variables were retained and set to vary randomly across schools. Non-random student effects were fixed to the school level.

Unconditional Model

Level 1: $$Ach_{ij} = \beta_{0j} + r_{ij}$$

Level 2: $$\beta_{0j} = \gamma_{00} + u_{0j}$$

$$\rho = \frac{\sigma^2_{u0}}{\sigma^2_{u0} + \sigma^2_{e0}}$$

Random Coefficient Regression

Level 1: $$Ach_{ij} = \beta_{0j} + \beta_{1j}(\text{Minority Status}_{ij}) + \beta_{2j}(\text{FRL Status}_{ij}) + r_{ij}$$

Level 2:

$$\beta_{0j} = \gamma_{00} + u_{0j}$$

$$\beta_{1j} = \gamma_{10} + u_{1j}$$

$$\beta_{2j} = \gamma_{20} + u_{2j}$$

The final step was to test a random coefficient slopes and intercepts as outcomes model with all significant student and school variables. We changed the centering to group-mean in this model to allow for a more accurate estimation of differences in level one slopes across schools (Enders & Tofighi, 2007). To further increase the reliability of the slope estimation, we used the state calculated school index score as a single predictor variable. The index score is a continuous variable that is used to determine the categorical letter grade. Using a single continuous variable as opposed to multiple categorical variables improves the degrees of freedom and yields a more reliable estimate of variation in level one slopes (Hox, 2010). Estimates represent the actual difference in scale scores after controlling for factors not related to teaching practices and school performance.
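The two centering choices described above can be illustrated with a minimal sketch. It is not the HLM 7.0 procedure itself; the function names and the six-student toy data are hypothetical and serve only to show what grand-mean and group-mean centering do to a student-level predictor such as FRL status.

```python
from collections import defaultdict
from statistics import mean


def grand_mean_center(values):
    """Subtract the overall sample mean from every value."""
    overall = mean(values)
    return [v - overall for v in values]


def group_mean_center(values, groups):
    """Subtract each school's own mean from the values of its students."""
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    group_means = {g: mean(vs) for g, vs in by_group.items()}
    return [v - group_means[g] for v, g in zip(values, groups)]


# Hypothetical toy data: FRL status (1 = FRL) for six students in two schools.
frl = [1, 1, 0, 1, 0, 0]
school = ["s1", "s1", "s1", "s2", "s2", "s2"]

frl_grand = grand_mean_center(frl)          # deviations from the sample mean
frl_group = group_mean_center(frl, school)  # deviations from each school's mean
```

After group-mean centering, the predictor sums to zero within every school, so a level one slope reflects only within-school differences. Grand-mean centering removes the sample mean but leaves between-school differences in the predictor.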
Random Coefficient Slopes and Intercepts as Outcomes Model

Level 1: $$Ach_{ij} = \beta_{0j} + \beta_{1j}(\text{Minority Status}_{ij}) + \beta_{2j}(\text{FRL Status}_{ij}) + r_{ij}$$

Level 2:

$$\beta_{0j} = \gamma_{00} + \gamma_{01}(\text{C}) + \gamma_{02}(\text{D}) + \gamma_{03}(\text{F}) + \gamma_{04}(\%\ \text{Minority}) + \gamma_{05}(\text{FRL Rate}) + u_{0j}$$

$$\beta_{1j} = \gamma_{10} + \gamma_{11}(\text{IndexScore}) + u_{1j}$$

$$\beta_{2j} = \gamma_{20} + \gamma_{21}(\text{IndexScore}) + u_{2j}$$

where

Ach_ij = the achievement of student i in school j
β0j = the school achievement mean
β1j = Minority achievement gap
β2j = FRL achievement gap
γ00 = grand mean for achievement
γ01 = the difference in average achievement between A/B schools and C schools
γ02 = the difference in average achievement between A/B schools and D schools
γ03 = the difference in average achievement between A/B schools and F schools
γ04 = the effect of school % Minority on achievement
γ05 = the effect of FRL rate on student achievement
γ10 = the minority achievement gap when the Index Score predictor equals zero
γ11 = cross-level interaction of minority achievement and Index Score
γ20 = the FRL achievement gap when the Index Score predictor equals zero
γ21 = cross-level interaction of FRL achievement and Index Score

Results

We organized results by the two research questions: Do students in “A” and “B” schools have high average achievement and small achievement gaps compared to students in “D” and “F” schools? What is the difference in average achievement and achievement gaps between school grades when holding context constant?

Average Achievement and Achievement Gaps

As reported in Table 2, students in “A” and “B” schools had higher average reading and math scores than students in “C”, “D”, and “F” schools. Students in “A” schools had an average reading score about .34 standard deviations above the sample mean and an average math score about .39 standard deviations above the sample mean. Students in “B” schools had an average reading score about .12 standard deviations above the sample mean and an average math score about .11 standard deviations above.
Average reading and math scores in “C”, “D”, and “F” schools were below the sample mean and around one standard deviation less than the average reading and math scores in “A” schools.

[Insert Table 2 About Here]

We did find test score gaps for FRL students (Table 2 and Figures 1 and 2). In the overall sample, FRL students averaged reading and math scores nearly one standard deviation lower than non FRL students. Both reading and math gaps varied across school-assigned letter grades. In “A” schools the reading gap was .83 standard deviations, with the average FRL student scoring .31 standard deviations below the mean and the average non FRL student scoring nearly .52 standard deviations above the mean. The math gap in “A” schools was about .75 standard deviations, with the average math score of FRL students falling .19 standard deviations below the mean and the average math score for non FRL students at about .56 standard deviations above the mean.

[Insert Figure 1 About Here]

For “B” schools, the FRL reading and math gaps were about 1 standard deviation. FRL students in “B” schools had an average reading score .33 standard deviations below the mean and an average math score .31 standard deviations below the mean. Non FRL students had average reading and math scores of .38 and .36 standard deviations above the mean. Smaller FRL gaps were found in “C”, “D”, and “F” schools. The average reading difference in “C” schools was about .56 standard deviations and the average math difference was about .68. For “D” schools, differences were about .34 standard deviations for both reading and math, and in “F” schools the average reading difference was less than .02 standard deviations (with FRL students having a slightly higher average) and nearly .26 standard deviations for math.

[Insert Figure 2 About Here]

The minority test score gap followed a similar pattern as the FRL gap (Table 3).
The overall minority difference in reading and math scores was about 1 standard deviation. In reading, the average minority student scored .28 standard deviations below the sample mean whereas the average non-minority student scored .27 standard deviations above. In math, the average minority student scored .31 standard deviations below the mean and the average non-minority student scored .27 standard deviations above.

[Insert Table 3 About Here]

Test gaps for minority students varied by letter grade. The largest minority gaps in reading and math (over one standard deviation) were found in “B” rated schools (Figures 3 and 4). The minority reading gap in “A” schools was .49 standard deviations while the minority math gap was .59 standard deviations. For “C” schools the average reading gap was .64 standard deviations and the average math gap was .37. Smaller differences between minority and nonminority students were found in “D” and “F” schools. For “D” schools the minority reading gap was about .24 standard deviations and the math gap about .35. For “F” schools, minority gaps in reading and math were .30 and .25 standard deviations, respectively.

[Insert Figures 3 and 4 about Here]

HLM Results

We first report the variance decomposition from the unconditional random effects ANOVA models. Results show achievement variance that is attributed to student and school differences. Student differences accounted for 72 percent of variance in reading and 70 percent in math. Schools, on the other hand, accounted for 28 percent of the reading variance and 30 percent of the math variance (Table 4). To address the research question, we examined the main effects of letter grades and the moderating effect of letter grades on achievement gaps.
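As a check on the variance decomposition, the school share of variance can be recomputed from the variance components reported in Table 4. This is a minimal sketch: the function name `icc` is ours, and the inputs are the student-level (σ²) and school-level (τ) variances as printed in that table.

```python
def icc(school_variance, student_variance):
    """Intraclass correlation: share of total variance lying between schools."""
    return school_variance / (school_variance + student_variance)


# Variance components as reported in Table 4 (tau = school, sigma^2 = student).
reading_icc = icc(school_variance=2519.28, student_variance=6465.72)
math_icc = icc(school_variance=2947.83, student_variance=6727.21)

print(round(reading_icc, 2), round(math_icc, 2))  # → 0.28 0.3
```

The results agree with the 28 percent and 30 percent of variance attributed to schools for reading and math.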
[Insert Table 4 About Here]

Small Main Effects

Table 3 displays average differences in the math and reading scale scores after controlling for student (FRL and minority status) and school characteristics (FRL rate and percent of Caucasian students). For reading, we did not find statistically significant achievement differences attributed to school letter grade. Further, the estimated differences were small and considerably less than the standard deviation for the sample and the average standard error for the reading assessment (SEM = 33) (CTP McGraw Hill, 2013). Students in schools receiving a “C” grade averaged 3 scale points lower than the average reading scores for students in “A” and “B” schools. The average reading score for students in “D” schools was 1 scale point less than the average student scores in “A” and “B” schools. The largest difference, 31 scale points, was between students in “F” ranked schools and students in “A” and “B” ranked schools. The average difference, however, was not statistically significant and fell within the range of the standard error for the reading assessment (SEM = 33).

Letter grades performed only slightly better in explaining differences in average math scores. We did not find statistically significant differences in average math achievement between students in “C” schools and students in “A” and “B” ranked schools. The estimated difference of 11 scale points was small (Cohen’s d = .11) and fell within the average measurement error of the math test (SEM = 22). The average math difference of 25 scale points between students in “D” schools and students in “A” and “B” schools was also not statistically significant at p < .05. This estimated difference was small (Cohen’s d = .25) and around the measurement error of the test. The difference of 42 scale points in average math achievement between “F” and “A” and “B” ranked schools was statistically significant with a small effect size (Cohen’s d = .44).
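The comparisons above follow a simple pattern: each estimated difference is set against the test's standard error of measurement and expressed in standard deviation units. The sketch below illustrates that pattern under a stated assumption: it divides by the sample standard deviations given in the Data Source section (90 for reading, 92 for math). The text does not state which standard deviation was used for the reported Cohen's d values, so the standardized differences computed here need not match them exactly. The function names are hypothetical.

```python
def standardized_difference(diff, sd):
    """Absolute mean difference expressed in standard deviation units."""
    return abs(diff) / sd


def within_measurement_error(diff, sem):
    """True when the absolute difference does not exceed the test's SEM."""
    return abs(diff) <= sem


# Assumption: sample SDs from the Data Source section; SEMs as cited in the text.
READING_SD, READING_SEM = 90.0, 33.0
MATH_SD, MATH_SEM = 92.0, 22.0

# Estimated differences from "A" and "B" schools, taken from the main effects table.
reading_diffs = {"C": -2.95, "D": -1.13, "F": -31.18}
math_diffs = {"C": -11.45, "D": -25.07, "F": -42.28}

for subject, diffs, sd, sem in (
    ("reading", reading_diffs, READING_SD, READING_SEM),
    ("math", math_diffs, MATH_SD, MATH_SEM),
):
    for grade, diff in diffs.items():
        d = standardized_difference(diff, sd)
        inside = within_measurement_error(diff, sem)
        print(f"{subject} {grade}: d = {d:.2f}, within SEM = {inside}")
```

Under this assumption, all three reading differences fall inside the reading SEM, while only the “C” difference falls inside the math SEM.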
[Insert Table 3 About Here]

Moderating Effects of Letter Grades

Consistent with the test score gaps we reported in the previous section, FRL and minority achievement gaps were smaller in schools with the lowest school index scores. For FRL students, within-school achievement gaps increased proportionally to increases in the school index score for reading and math. Negative parameter estimates for reading (γ11 = -0.44, p < .01) and math (γ21 = -0.53, p < .01) indicate a decline in the average achievement of FRL students as index scores increase. Figures 5 and 6 illustrate the negative relationship between FRL gaps and the school index score. As index scores increased, reading and math gaps widened. Additionally, average reading and math achievement of FRL students was considerably lower in schools with the highest index scores compared to schools with lower index scores.

[Insert Figures 5 and 6 About Here]

The relationship between index score and minority test score gaps was similar to FRL, but not as strong. Average reading achievement of minority students decreased (γ11 = -0.35, p < .05) as school index scores increased. Similarly, average math scores of minority students decreased (γ21 = -0.31, p < .05) as index scores increased. Figures 7 and 8 illustrate the changes in the minority test score gap by school index score. Notice that compared to the FRL gap, the slope of the line for minority students is not as steep and the average gap in schools with the best index scores is not as large.

[Insert Figures 7 and 8 About Here]

Discussion

Informational significance provides a different framework to evaluate accountability indicators. Unlike validity studies that evaluate measurement quality, informational significance targets the usefulness of an accountability indicator.
An indicator may achieve a degree of validity but not have value or utility for decisions affecting policy and practice. To support achievement equity, letter grades should be capable of explaining high and equitable achievement within and between schools. Oklahoma’s grades did not meet this standard.

We found that A-F letter grades end up hiding achievement gaps rather than revealing them. When analyzing test score gaps, we found higher average reading and math scores in “A” and “B” schools compared to “C”, “D”, and “F” schools, but test scores were not equally distributed within letter grades. The largest achievement gaps were in schools ranked as the most effective. FRL and minority students in “A” and “B” schools had average reading and math achievement below the overall sample mean and in some cases not different from FRL and minority students in schools with lower letter grades. In “A” and “B” schools, FRL students had an average reading score about .32 standard deviations below the mean. This average score was similar to the average reading score of -.35 for FRL students in “D” schools and the average reading score of -.40 for FRL students in “F” schools. Average performance of FRL students was nearly equivalent across letter grades.

Informational significance partly depends on a clear and accurate indication of achievement patterns within and across schools. Our evaluation of test score gaps suggests that Oklahoma’s A-F grades do not provide a clear signal of achievement for poor and minority students. Some “A” and “B” schools likely had high and equitable student achievement, but it is also true that schools with large test score gaps for FRL and minority students were rated as effective. Herein lies the crux of the issue: grades do not sort out schools with high and equitable achievement from schools with high average achievement and large achievement gaps.
Not knowing the relative achievement of FRL and minority students leads to inaccurate judgments about school quality and diminishes the usefulness of letter grades. To be meaningful, grades need to reflect the performance of all students and student subgroups. Oklahoma’s A-F letter grades fail this test by making it possible for schools to receive “A’s” and “B’s” while failing to serve their FRL and minority students. HLM results raise additional concerns about the informational significance of A-F grades. After removing achievement variance attributed to factors unrelated to teaching or school effectiveness, letter grades were unable to differentiate schools by average student achievement. In reading, average test scores in “A”, “B”, “C”, and “D” schools were similar. The lower average reading achievement we found in “F” schools does not correspond to the performance difference one would reasonably expect between an A and an F. Math results were not much different than reading. Perhaps the most troubling finding was that “A” and “B” schools were least effective for poor, minority children, while “D” and “F” schools were most effective. Rather than supporting schools in closing achievement gaps, the intent of NCLB waivers, the Oklahoma system rewards schools with high grades even when large achievement gaps exist. Informational significance is lost on grades that hide achievement variance within and between schools, making any diagnostic and improvement use of A-F grades ineffectual. Evidence that FRL and minority students had higher achievement in “D” and “F” schools than their counterparts attending “A” and “B” schools challenges the formula used to calculate school grades. The distribution of letter grades would change quite drastically if the state assigned achievement gaps the same weight it assigns to achievement status. 
In our sample of schools, several “D” and “F” schools would become “C” or “B” schools, and many “A” and “B” schools would become “C” or “D” schools. Poor, minority students end up being left behind when grades obscure achievement differences within schools. In some instances, letter grades do not reflect the achievement of all students and student sub-groups, and in other cases schools showing some progress may be misidentified as needing urgent improvement.

The absence of informational significance means that school grades cannot be used to nurture the human and social capacity under which effective schools adapt to their external environments and to the needs of their students. School grades deliver little informational value to teachers and administrators. They hide achievement differences, they cannot be disaggregated by content standards, and they do not measure student growth toward college, citizenship, and career ready expectations. Furthermore, school grades cannot be used to measure the effectiveness of improvement strategies or interventions; any change from one year to the next is just as likely attributable to factors outside school control as to what happens within schools and classrooms.

School grades have limited use as an external control as well. Grades that obscure achievement differences encourage misguided judgments about school effectiveness and misplaced reform pressure. For instance, “D” and “F” schools could use additional support and resources, but instead they face mandated interventions that do not address the sources of diminished capacity. In contrast, “A” and “B” schools encounter no external pressure or incentives to track achievement of FRL and minority students. In fact, “A” and “B” schools can be rewarded even if low-income and minority student achievement lags behind that of students with more social advantages.
Problems identified with accountability indicators under NCLB compound in Oklahoma’s A-F grading system. First, the system uses proficiency scores for its calculation of student achievement and student growth. Proficiency scores are a simple metric to describe achievement status in the aggregate, but their accuracy erodes when used as the basis for ranking schools for the purpose of policy decisions or passing judgments of school effectiveness (Carlson, 2006; Forte, 2010; Ho, 2008; Linn, 2005).

Second, the system hides achievement of poor and minority students by using the growth of the bottom 25 percent to satisfy the achievement equity expectation. To keep the spotlight on achievement equity, Oklahoma’s policy would, at the minimum, need to report proficiency scores by student subgroups and account for subgroup performance in calculations of the student achievement and student growth components. The State does neither, effectively ignoring poor, minority students in its calculations and reporting.

Finally, assumptions of letter grades do not correspond with the dynamic nature of schools and student learning. School performance is multifaceted and varies across subjects, classrooms, and students. Instead of measuring and reporting variability, grades treat teaching and learning in schools as fixed processes. As a result, lower achieving students receive the same performance status as higher achieving students, essentially ignoring variance that can help schools recognize and respond to unmet student needs.

NCLB waivers were designed to provide states flexibility in developing a fair and focused accountability system to support continuous improvement (US Department of Education, 2012). A policy that rewards schools for large FRL and minority achievement gaps and penalizes schools whose poor, minority students outperform peers in more affluent schools is neither fair nor supportive of continuous improvement.
Rather than advancing achievement equity, the intent of the federal NCLB waiver, letter grades seem to exploit achievement levels that derive from wealth and social advantage, while obscuring a school’s failure to serve all children.

To advance achievement equity, educators need to understand common sources of achievement variance within schools. Letter grades, however, collapse achievement variance into a single composite indicator. No measure of school performance can yield accurate results if the majority of variance in student achievement is concealed by the indicator (Forte, 2010). As a practical consequence, grades end up classifying some schools as “A” and “B” schools when they are failing to meet the learning needs of all students and other schools as “D” and “F” schools when they are making progress with poor, minority students.

Conclusion

Progress made under NCLB in exposing achievement inequity in the US has taken a step back with Oklahoma’s A-F school grades. Our evidence suggests that a composite letter grade does not provide a clear signal or simple interpretation of achievement differences within and between schools. No meaningful information about achievement gaps can be obtained from a letter grade. We cannot conclude, for instance, that “A” and “B” schools have high average and equitable achievement. We also cannot conclude that FRL and minority students in “D” and “F” schools perform worse on average than peers in higher ranked schools. Herein lies a fundamental problem with the informational significance of A-F accountability grades: grades do not provide the right information to understand achievement patterns in schools. Without knowledge of test score differences, it is hard to see how appropriate action can be taken to improve learning outcomes.
Although our evidence is limited to one state, many components of Oklahoma’s system are similar to those used in other states (Howe & Murray, 2015). Other states use proficiency bands without reporting results by subgroups, they use achievement of the bottom 25% to fulfill the achievement gap requirement, and they use a composite indicator to judge school effectiveness (Domaleski & Perie, 2013; Polikoff, McEachin, Wrabel, & Duque, 2014). These three components are likely to behave the same way in other state accountability systems. It is not variability in schools that presents a problem, but rather weaknesses of the components to measure achievement variance within schools.

We cannot conclude with certainty that effects found in our sample will appear in a larger, more representative sample of states, districts and schools. What is clear is that additional research on new state accountability policies is needed. With states using different accountability designs, it is important for researchers to identify system components capable of yielding valid inferences of school performance. As long as accountability carries with it high stakes consequences, state governments have a legal and ethical responsibility to ensure that accountability systems accurately distinguish among different levels of school effectiveness.

Table 1.
Descriptive Student and School Data

Variable                      Mean     SD       Min      Max
Reading Sample by Student Composition and Mean Test Score
  Minority                    .42      .50      0        1.0
  Non-Minority                .58      .50      0        1.0
  Free/Reduced Lunch          .45      .42      0        1.0
  Reading Scale Score         746      89.86    400      990.00
Math Sample by Student Composition and Test Score
  Minority                    .54      .50      0        1.0
  Non-Minority                .46      .50      0        1.0
  Free/Reduced Lunch          .78      .42      0        1.0
  Math Scale Score            759      92       400      990
School Sample
  Free/Reduced Lunch Rate     70%      28.5     5.2%     100%
  Minority Composition        60%      21.0     15%      99%
  “A” Schools                 .14      ------   ------   ------
  “B” Schools                 .19
  “C” Schools                 .04
  “D” Schools                 .20
  “F” Schools                 .43

Note. N=81 elementary and middle schools from three contiguous districts in one metropolitan area. We had valid reading scores for 25,469 students and valid math scores for 25,663 students.

Table 2. Differences in reading and math test scores by FRL status and school grade.

Grade    FRL Status   Reading   Math
F        NonFRL       -.39      -.15
F        FRL          -.41      -.41
F        Total        -.41      -.36
D        NonFRL       -.01      -.28
D        FRL          -.35      -.53
D        Total        -.27      -.47
C        NonFRL        .16       .21
C        FRL          -.40      -.47
C        Total        -.22      -.25
B        NonFRL        .38       .36
B        FRL          -.33      -.31
B        Total         .12       .11
A        NonFRL        .52       .56
A        FRL          -.31      -.19
A        Total         .34       .39
Total    NonFRL        .37       .38
Total    FRL          -.35      -.37
Total    Total         .05       .04

Note. Test scores were standardized to a mean of 0 and a standard deviation of 1. Values represent the average deviation from the sample mean.

Figure 1. Mean differences in reading test scores by FRL status and school letter grades. Test scores were standardized to a mean of 0 and a standard deviation of 1. Values are reported in standard deviation units. FRL students are coded as 1 and non FRL students are coded as 0.

Figure 2. Mean differences in math test scores by FRL status and school letter grades. Test scores were standardized to a mean of 0 and a standard deviation of 1. Values are reported in standard deviation units.
FRL students are coded as 1 and non FRL students are coded as 0.

Table 3. Differences in reading and math test scores by minority status and school grade.

Grade    Minority Status   Reading   Math
F        Non-Minority      -.17      -.18
F        Minority          -.47      -.43
F        Total             -.41      -.36
D        Non-Minority      -.10      -.25
D        Minority          -.34      -.60
D        Total             -.27      -.47
C        Non-Minority       .19       .12
C        Minority          -.43      -.50
C        Total             -.22      -.25
B        Non-Minority       .26       .27
B        Minority          -.18      -.25
B        Total              .12       .11
A        Non-Minority       .42       .46
A        Minority          -.07       .13
A        Total              .34       .39
Total    Non-Minority       .27       .27
Total    Minority          -.28      -.31
Total    Total              .04       .04

Note. Test scores were standardized to a mean of 0 and a standard deviation of 1. Values represent the average deviation from the sample mean.

Figure 3. Mean differences in reading test scores by Minority status and school letter grades. Test scores were standardized to a mean of 0 and a standard deviation of 1. Values are reported in standard deviation units. Minority students are coded as 1 and non-minority students are coded as 0.

Figure 4. Mean differences in math test scores by Minority status and school letter grades. Test scores were standardized to a mean of 0 and a standard deviation of 1. Values are reported in standard deviation units. Minority students are coded as 1 and non-minority students are coded as 0.

Table 4. Variance Components and IntraClass Correlation Coefficients

Variable               σ2        % Student Variance   τ         ICC(1)   Chi Square
Reading Achievement    6465.72   72%                  2519.28   .28      3104.53**
Math Achievement       6727.21   70%                  2947.83   .30      3211.79**

Note. ** p < .01. σ2 is the achievement variance attributed to student differences. τ is the achievement variance attributed to school differences.

Table 3.
Main effects of A-F letter grades and moderating effect of index score on FRL and Minority slopes

Fixed Effects                          Mean Reading Differences   Mean Math Differences
Intercept                              712.41 (2.20)**            724 (2.79)**
FRL Rate                               -0.86 (.19)**              -0.96 (.23)**
Percent White                          0.46 (.22)*                0.87 (.37)*
C Schools                              -2.95 (13.3)               -11.45 (13.2)
D Schools                              -1.13 (10.11)              -25.07 (14.87)
F Schools                              -31.18 (17.23)             -42.28 (12.65)**
FRL Slope                              -28.10 (3.71)**            -30.43 (3.83)**
  Index Score                          -0.45 (.18)**              -0.52 (0.18)**
Minority Slope                         -31.76 (3.53)**            -31.65 (2.92)**
  Index Score                          -0.35 (.15)*               -0.31 (.15)*
Deviance (-2 Log likelihood)           82012                      81937
Δ Deviance                             990**                      998**
Explained Between School Variability   91%                        85%

Note. * p<.05, **p<.01. We had valid reading data for 25,469 students and valid math data for 25,663 students from 81 elementary and middle schools. Estimates come from random intercept and slopes as outcomes models. Standard errors are reported in parentheses. Student variables include FRL status and minority status. Contextual controls include percent minority and FRL rate. Student variables were group-mean centered in the full model and full maximum likelihood estimation was used. The change in deviance represents the change from the unconditional model to the final slopes and intercepts as outcomes model. Scale scores range from 400-990.

[Figure 5 plot: READING scale score (vertical axis) by INDEXSCORE (horizontal axis), with separate lines for Non-FRL and FRL students]

Figure 5. Graph from intercepts and slopes as outcomes model of reading achievement. Results show a larger FRL gap in reading achievement as index score increases.

[Figure 6 plot: MATH scale score (vertical axis) by INDEXSCORE (horizontal axis), with separate lines for Non-FRL and FRL students]

Figure 6. Graph from intercepts and slopes as outcomes model of math achievement. Results show a larger FRL gap in math achievement as index score increases.
[Figure 7 plot: READING scale score (vertical axis) by INDEXSCORE (horizontal axis), with separate lines for Non-Minority and Minority students]

Figure 7. Graph from intercepts and slopes as outcomes model of reading achievement. Results show a larger Minority gap in reading achievement as index score increases.

[Figure 8 plot: MATH scale score (vertical axis) by INDEXSCORE (horizontal axis), with separate lines for Non-Minority and Minority students]

Figure 8. Graph from intercepts and slopes as outcomes model of math achievement. Results show a larger Minority gap in math achievement as index score increases.

References

Adams, C. M., Forsyth, P. B., Dollarhide, E., Miskell, R. C., & Ware, J. K. (2015). Self-regulatory climate: A social resource for student regulation and achievement. Teachers College Record, 117, 1-28.

Ayers, J. (2011). No child left behind waiver applications: Are they ambitious and achievable? Center for American Progress, Washington, DC. Retrieved from http://files.eric.ed.gov/fulltext/ED535638.pdf

Baard, P. P. (2002). Intrinsic need satisfaction in organizations: A motivational basis of success in for-profit and not-for-profit settings. In E. Deci and R. Ryan (Eds.), Handbook of Self-Determination Research (pp. 255-276). Rochester, NY: University of Rochester Press.

Baker, E. L., & Linn, R. L. (2002). Validity issues for accountability systems. CSE Technical Report 585. National Center for Research on Evaluation, Standards, and Student Testing. University of California, Los Angeles.

Baker, E. L., Barton, P. E., Darling-Hammond, L., Haertel, E., Ladd, H. F., Linn, R. L., Ravitch, D., Rothstein, R., Shavelson, R. J., & Shepard, L. A. (2010). Problems with the use of student test scores to evaluate teachers. The Economic Policy Institute. Retrieved from http://epi.3cdn.net/724cd9a1eb91c40ff0_hwm6iij90.pdf

Barton, P. E., & Coley, R. J. (2010). The black-white achievement gap: When progress stopped.
Princeton, NJ: Educational Testing Services. Retrieved from http://files.eric.ed.gov/fulltext/ED511548.pdf

Booher-Jennings, J. (2005). Educational triage and the Texas accountability system. American Educational Research Journal, 42(2), 231-268.

Bryk, A. S. (2009). Support a science of performance improvement. Phi Delta Kappan, 90(8), 597-600.

Carlson, D. (2006). Focusing state educational accountability systems: Four methods of judging school quality and progress. Dover, NH: The Center for Assessment. Retrieved from http://www.nciea.org/publications/Dale020402.pdf

CEP (2012). Accountability issues to watch under NCLB waivers. The George Washington University, Center on Education Policy. Retrieved from http://files.eric.ed.gov/fulltext/ED535955.pdf

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum.

Darling-Hammond, L. (2006). Securing the right to learn: Policy and practice for powerful teaching and learning. Educational Researcher, 35(7), 13-24.

Domaleski, C., & Perie, M. (2013). Promoting equity in state education accountability systems. National Center for the Improvement of Educational Assessment, Center for Educational Testing and Evaluation, University of Kansas.

Enders, C. K., & Tofighi, D. (2007). Centering predictor variables in cross-sectional multilevel models: A new look at an old issue. Psychological Methods, 12(2), 121-138.

Feuer, M. J. (2008). Future directions for educational accountability: Notes for a political economy of measurement. In K. Ryan & L. Shepard (Eds.), The future of test-based educational accountability (pp. 293-306). New York: Routledge.

Figlio, D. N., & Getzler, L. S. (2002). Accountability, ability and disability: Gaming the system. Working paper 9307. Cambridge, MA: National Bureau of Economic Research.

Finnigan, K. S., & Gross, B. (2007). Do accountability policy sanctions influence teacher motivation?
Lessons from Chicago’s low-performing schools. American Educational Research Journal, 41(3), 594-630.

Forsyth, P. B., Adams, C. M., & Hoy, W. K. (2011). Collective trust: Why schools can’t improve without it. New York, NY: Teachers College Press.

Forte, E. (2010). Examining the assumptions underlying the NCLB federal accountability policy on school improvement. Educational Psychologist, 45(2), 76-88.

Fusarelli, L. D. (2004). The potential impact of the No Child Left Behind Act on equity and diversity in American education. Educational Policy, 18(1), 71-94.

Haladyna, T. M., Nolen, S. R., & Haas, N. S. (1991). Raising standardized achievement test scores and the origins of test score pollution. Educational Researcher, 42(17), 2-7.

Hall, D. (2013). A step forward or a step back? State accountability in the waiver era. The Education Trust. Retrieved from http://files.eric.ed.gov/fulltext/ED543222.pdf

Hamilton, L. S., Schwartz, H. L., Stecher, B. M., & Steele, J. L. (2013). Improving accountability through expanded measures of performance. Journal of Educational Administration, 51(4), 453-475.

Harris, D. N. (2011). Value-added measures in education: What every educator needs to know. Cambridge, MA: Harvard Press.

Heck, R. H. (2009). Teacher effectiveness and student achievement: Investigating a multilevel cross-classified model. Journal of Educational Administration, 47, 227-249.

Heilig, V. J., & Darling-Hammond, L. (2008). Accountability Texas-style: The progress and learning of urban students in a high-stakes testing context. Educational Evaluation and Policy Analysis, 30(2), 75-110.

Ho, A. D. (2008). The problem with “proficiency”: Limitations of statistics and policy under No Child Left Behind. Educational Researcher, 37(6), 351-360.

Howe, K. R., & Murray, K. (2015). Why School Report Cards Merit a Failing Grade. Boulder, CO: National Education Policy Center.
Retrieved from http://nepc.colorado.edu/publication/why-school-report-cards-fail.
Hox, J. J. (2010). Multilevel analysis: Techniques and applications (2nd ed.). New York, NY: Routledge.
Jencks, C., & Phillips, M. (1998). America’s next achievement test: Closing the black-white test score gaps. The American Prospect, 9(40), 44-53.
Kane, T. J., & Staiger, D. O. (2002). The promise and pitfalls of using imprecise school accountability measures. Journal of Economic Perspectives, 16(1), 91-114.
King, B., & Minium, E. (2003). Statistical Reasoning in Psychology and Education (4th ed.). Hoboken: Wiley.
Lee, J. (2008). Is test-driven external accountability effective? Synthesizing the evidence from cross-state causal-comparative and correlational studies. Review of Educational Research, 78(3), 608-644.
Lee, J. (2006). Tracking achievement gaps and assessing the impact of NCLB on the gaps: An in-depth look into national and state reading and math outcome trends. The Civil Rights Project at Harvard University. President and Fellows of Harvard College.
Lee, J. (2002). Racial and ethnic achievement gap trends: Reversing the progress toward equity? Educational Researcher, 31(1), 3-12.
Linn, R. L. (2008). Educational accountability systems. In K. Ryan & L. Shepard (Eds.), The future of test-based educational accountability (pp. 3-24). New York, NY: Routledge.
Linn, R. L. (2005). Conflicting demands of No Child Left Behind and state systems: Mixed messages about school performance. Education Policy Analysis Archives, 13(33), 1-20.
Linn, R. L., & Haug, C. (2002). Stability of school building accountability scores and gains. CSE Technical Report 561. National Center for Research on Evaluation, Standards, and Student Testing. University of California, Los Angeles.
McNeil, M. (2012). States punch reset button with NCLB waivers. Education Week.
Retrieved from http://www.edweek.org/ew/articles/2012/10/17/08waiver_ep.h32.html?tkn=NSLFJ%2BWQnkqPlIMGUAUBakJda6JiHNTaJZDt&intc=es.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50(9), 741-749.
Miller, D. M. (2008). Data for school improvement and educational accountability: Reliability and validity in practice. In K. Ryan & L. Shepard (Eds.), The future of test-based educational accountability (pp. 249-262). New York: Routledge.
Mintrop, H., & Sunderman, G. L. (2009). Predictable failure of federal sanctions-driven accountability for school improvement and why we may retain it anyway. Educational Researcher, 38(5), 353-364.
Moller, J. (2008). School leadership in an age of accountability: Tensions between managerial and professional accountability. Journal of Educational Change. Available on-line: 10.1007/s10833-008-9078-6.
National Center for Education Statistics. (2013). NAEP 2012: Trends in Academic Progress, Reading 1971-2012, Math 1973-2012. US Department of Education. Retrieved from http://nces.ed.gov/nationsreportcard/subject/publications/main2012/pdf/2013456.pdf
Neal, D., & Schanzenbach, D. W. (2010). Left behind by design: Proficiency counts and test-based accountability. The Review of Economics and Statistics, 92(2), 263-283.
Nye, B., Konstantopoulos, S., & Hedges, L. V. (2004). How large are teacher effects? Educational Evaluation and Policy Analysis, 26, 237-257.
OCEP & CERE. (2012). An examination of the Oklahoma State Department of Education’s A-F report card. The Oklahoma Center for Education Policy, University of Oklahoma, and The Center for Educational Research and Evaluation, Oklahoma State University.
O’Day, J. A. (2002). Complexity, accountability, and school improvement. Harvard Educational Review, 72(3), 293-329.
Oklahoma State Department of Education. (2012).
Oklahoma school testing program, Oklahoma core curriculum tests, Grades 3 to 8 assessments. Pearson.
Oklahoma State Department of Education. (2014). 2014 A-F Report Card Technical Guide. Retrieved from http://ok.gov/sde/sites/ok.gov.sde/files/documents/files/AtoF_Report_Card_Technical_Guide_828-2014.pdf.
Quality Counts. (2014). District disruption and revival: School systems reshape to compete and improve. Education Week. Retrieved from http://www.edweek.org/ew/toc/2014/01/09/index.html
Pearson, Inc. (2012). Technical Report of the Oklahoma School Testing Program, Oklahoma Core Curriculum Tests, Grades 3 to 8 Assessments. Pearson, Inc.
Polikoff, M., McEachin, A., Wrabel, S., & Duque, M. (2014). The waive of the future? School accountability in the waiver era. Educational Researcher. Retrieved from http://wwwbcf.usc.edu/~polikoff/Waivers.pdf
Popham, J. (2007). The no-win accountability game. In C. Glickman (Ed.), Letters to the next president: What we can do about the real crisis in public education (pp. 166-173). New York: Teachers College Press.
Raudenbush, S. W. (2004). Schooling, statistics, and poverty: Can we measure school improvement? Princeton, NJ: Educational Testing Service.
Reardon, S. F. (2011). The widening academic achievement gap between the rich and poor: New evidence and possible explanations. In G. J. Duncan & R. J. Murnane (Eds.), Whither opportunity? Rising inequality, schools, and children’s life chances (pp. 91-116). New York, NY: Russell Sage Foundation.
Reeve, J. (2002). Self-determination theory applied to educational settings. In E. Deci & R. Ryan (Eds.), Handbook of Self-Determination Research (pp. 183-204). Rochester, NY: University of Rochester Press.
Reeve, J., & Halusic, M. (2009). How K-12 teachers can put self-determination theory principles into practice. Theory and Research in Education, 7(2), 145-154.
Reeve, J., & Jang, H. (2006).
What teachers say and do to support students’ autonomy during a learning activity. Journal of Educational Psychology, 98(1), 209-218.
Rothstein, R. (2009). Getting accountability right. Education Week. Retrieved from http://www.csun.edu/~krowlands/Content/SED610/reform/Getting%20Accountability%20Right.pdf.
Rothstein, R., Jacobson, R., & Wilder, T. (2008). Grading education: Getting accountability right. New York, NY: Teachers College.
Ryan, R. M., & Deci, E. L. (2012). Overview of self-determination theory: An organismic dialectical perspective. In R. Ryan (Ed.), The Oxford Handbook of Human Motivation (pp. 3-33). Oxford: Oxford University Press.
Ryan, R. M., & Deci, E. L. (2002). Overview of self-determination theory: An organismic dialectical perspective. In E. Deci & R. Ryan (Eds.), Handbook of Self-Determination Theory Research (pp. 3-36). Rochester, NY: University of Rochester Press.
Ryan, R. M., & Weinstein, N. (2009). Undermining quality teaching and learning: A self-determination theory perspective on high-stakes testing. Theory and Research in Education, 7(2), 224-233.
Sahlberg, P. (2008). Rethinking accountability in a knowledge society. Journal of Educational Change. Published on-line: 10.1007/s10833-008-9098-2.
Schlechty, P. C. (2010). Leading for Learning: How to transform schools into learning organizations. San Francisco, CA: Wiley.
Schwartz, H. L., Hamilton, L. S., Stecher, B. M., & Steele, J. L. (2011). Expanded Measures of School Performance. Technical Report: Rand Corporation.
Sirotnik, K. A. (2005). Holding accountability accountable: What ought to matter in public education. New York: Teachers College Press.
Sirotnik, K. A. (2002). Promoting Responsible Accountability in Schools and Education. The Phi Delta Kappan, 83(9), 662-673.
Sunderman, G. L., & Kim, J. S. (2005). Measuring academic proficiency under the No Child Left Behind Act: Implications for educational equity.
Educational Researcher, 34(8), 3-13.
US Department of Education. (2012). ESEA Flexibility – Request. Retrieved from http://www2.ed.gov/policy/elsec/guid/esea-flexibility/index.html
Ushomirsky, N., Williams, D., & Hall, D. (2014). Making sure all children matter: Getting school accountability signals right. Washington, DC: The Education Trust.
Whitford, B. L., & Jones, J. (2000). Accountability, Assessment, and Teacher Commitment: Lessons from Kentucky’s Reform Efforts. New York: State University of New York.
Williams, G. C. (2002). Improving patients’ health through supporting the autonomy of patients and providers. In E. Deci & R. Ryan (Eds.), Handbook of Self-Determination Theory Research (pp. 233-254). Rochester, NY: University of Rochester Press.
National Center for Education Statistics (2013). The Nation's Report Card: Trends in Academic Progress 2012 (NCES 2013–456). National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education, Washington, D.C.

i We use quotation marks when referring to letter grades associated with a school’s ranking. No quotation marks are used for general references to letter grades.