
Informational Significance of A-F Grades 1
RUNNING HEAD: Informational Significance of A-F Grades
The Informational Significance of A-F School Accountability Grades
Curt M. Adams
Patrick B. Forsyth
Jordan K. Ware
University of Oklahoma, Oklahoma Center for Education Policy
Mwarumba Mwavita
Oklahoma State University, Center for Educational Research and Evaluation
Corresponding Author: Curt M. Adams, 4502 E. 41st Street, Tulsa, OK 74135; 918.671.9637;
[email protected].
Curt M. Adams is an associate professor in the department of Educational Leadership and Policy
Studies at the University of Oklahoma, and co-director of the Oklahoma Center for Education
Policy. Recent publications include the journal article, Self-Regulatory Climate: A social
resource for student regulation and achievement (Teachers College Record, 2015), and the book
Quantitative Research in Education: A Primer, with Wayne Hoy (SAGE, 2015).
Patrick B. Forsyth is professor of education at the University of Oklahoma, co-director of the
Center for Education Policy, and former executive director of UCEA and The National Policy
Board for Educational Administration. Recent books include Trust and School Life (2014,
Springer) with D. Van Maele and M. Van Houtte, and Collective Trust (2011, Teachers College
Press) with C.M. Adams and W.K. Hoy.
Jordan Ware is a post-doctoral social scientist at the University of Oklahoma and the Oklahoma
Center for Education Policy where he directs the project to measure and improve school
capacity. He studies neighborhood poverty and its effects on learning and development.
Mwarumba Mwavita is an assistant professor of Research, Evaluation, Measurement, and
Statistics at Oklahoma State University. He is also Director of the Center for Educational
Research and Evaluation.
Description: This study evaluates the informational significance of Oklahoma A-F school
accountability grades relevant to the policy objective of achievement equity.
In press with Teachers College Record. Do not cite without permission of the author.
The Informational Significance of A-F School Accountability Grades
Abstract
Background/Context. Despite problems with accountability systems under No Child Left
Behind, the policy has been widely commended for exposing the depth and breadth of
educational inequality in the United States. As states implement new accountability systems,
there is growing concern that attention to achievement gaps and the performance of marginalized
children has faded. Many approved accountability plans no longer report achievement by
student subgroups or include subgroup performance in the calculation of accountability
indicators.
Research Purpose. This study examined the informational significance of Oklahoma’s A-F
accountability grades relative to the policy objective of achievement equity. Informational
significance as explained in self-determination theory provided a framework to explore the
usefulness of an A-F grade for understanding achievement differences within and between
schools.
Research Design. We evaluated the informational significance of Oklahoma A-F school grades
by analyzing reading and math test scores from over 25,000 students in 81 elementary and
middle schools. The study was designed to address two questions: Do students in “A” and “B”
schools have high average achievement and small achievement gaps compared to students in “D”
and “F” schools? What is the difference in average achievement and achievement gaps between
school grades when holding constant contextual school conditions?
Results. We found test score gaps attributed to Free and Reduced Lunch qualification and
minority status. Free and Reduced Lunch and minority students average about one standard
deviation lower in math and reading than their peers. Test score gaps varied across A-F school
grades with the largest gaps existing in “A” and “B” schools. HLM results showed that A-F
grades do not differentiate schools by effectiveness levels. For reading, we did not find
statistically significant main effects attributed to letter grades. For math, the only statistically
significant difference was between students in “A” and “B” schools and students in “F”
schools. This difference had a small effect size. School grades did moderate achievement gaps,
but gaps moved in a direction opposite from what would be desired of an accountability system
that measured achievement equity.
Conclusions. Progress made under NCLB in exposing achievement inequity in the U.S. has
taken a step back with Oklahoma’s A-F school grades. Our evidence suggests that a composite
letter grade provides very little meaningful information about achievement differences.
Executive Summary
As states implement new accountability systems, there is growing concern that attention
to achievement gaps and the performance of marginalized children has faded (Ayers, 2011; Hall,
2013). Initial evidence shows that concerns raised from document analysis of waiver
applications are legitimate. Ushomirsky, Williams, and Hall (2014) found that schools in Florida,
Kentucky, and Minnesota earned high effectiveness rankings despite low test performance from
minority and low-income students. In some cases, minority and low-income students in lower
ranked schools outperformed peers in higher ranked schools.
Oklahoma uses an A-F ranking system as the basis of State recognition and intervention.
High letter grades, A’s and B’s, lead to rewards and public acclaim whereas low grades, D’s and
F’s, impose mandated turnaround plans on schools. Although the A-F grading system was not
designed specifically to measure achievement gaps, it is assumed that ranking schools by letter
grades can support efforts to equalize achievement distributions. In this study we examined the
informational significance of letter grades relative to the policy objective of achievement equity.
The purpose was to determine if students in “A” and “B” schools had high average achievement
and smaller achievement gaps compared to students in “D” and “F” schools, and to explore
differences in average achievement when holding constant conditions that schools do not control.
Improvement Assumptions and Accountability
Test-based accountability assumes that external contingencies (e.g., threats or rewards)
have instrumental value in reinforcing desired actions and outcomes (Ryan & Deci, 2002, 2012;
Polikoff, McEachin, Wrabel, & Duque, 2014). The belief is that threats and sanctions can provoke
internal change by eliciting the collective will to make instructional systems efficient and
effective (Mintrop & Sunderman, 2009). Accordingly, schools falling short of measurable
objectives encounter public scorn, and face mandated improvement plans, loss of students
through choice options, prescribed reform models, reconstitution, or in some cases closure
(Sahlberg, 2008).
In the design of test-based accountability, a body of divergent research findings on
external control was either dismissed by policy makers or not known. The weight of the
psychological evidence indicates that contingent reinforcement withers under the strain of
complex, conceptual tasks like teaching and learning (Ryan & Deci, 2002; Ryan & Weinstein,
2009). Performance information used as an external control is inimical to work that involves
professional discretion, adaptation, and cooperation among interdependent groups (Forsyth,
Adams, & Hoy, 2011).
An alternative theoretical lens used to explain optimal individual and group performance
may well explain why achievement equity eluded NCLB and how performance information can
play a supportive role in closing achievement gaps. The crucial adjustment is with the locus of
causality, switching from external to internal motivators. Self-determination theory informs this
adjustment.
Effective accountability systems informed by self-determination theory depend on the
functional significance of performance indicators. Functional significance is defined as the
meaning and worth individuals place in an object, experience, or event (Ryan & Deci, 2012).
The functional significance of accountability indicators can be informational or controlling
(Ryan & Weinstein, 2009). Accountability indicators that have informational significance stand
a better chance of generating the energy and capacity to narrow persistent achievement gaps
(Ryan & Deci, 2012; Ryan & Weinstein, 2009). Informational significance comes from the
diagnostic value associated with a performance indicator (Ryan & Weinstein, 2009). Clear and
accurate information about learning processes and outcomes is needed to generate knowledge
about student performance; this knowledge in turn can drive improvement decisions and actions.
It is hard to see how appropriate action can be taken to close achievement gaps without first
knowing how achievement varies within schools. If consequential decisions and actions are
based on accountability indicators, the indicators should provide enough information to
understand differences in student achievement.
Findings
We found higher average reading and math scores in “A” and “B” schools compared to
“C”, “D”, and “F” schools, but test scores were not equally distributed within letter grades. The
largest achievement gaps were in schools ranked as the most effective. FRL and minority
students in “A” and “B” schools had average reading and math achievement below the overall
sample mean and in some cases not different from FRL and minority students in schools with
lower letter grades. In “A” and “B” schools, FRL students had an average reading score about
.32 standard deviations below the mean. This average score was similar to the average reading score
of -.35 for FRL students in “D” schools and an average reading score of -.40 for FRL students in
“F” schools. Average performance of FRL students was nearly equivalent across letter grades.
HLM results raise additional concerns about the informational significance of A-F
grades. After removing achievement variance attributed to factors unrelated to teaching or
school effectiveness, letter grades were unable to differentiate schools by average student
achievement. In reading, average test scores in “A”, “B”, “C”, and “D” schools were similar.
Math results were not much different than reading. Perhaps the most troubling finding was that
“A” and “B” schools were least effective for poor, minority children, while “D” and “F” schools
were most effective.
Conclusion
Progress made under NCLB in exposing achievement inequity in the US has taken a step
back with Oklahoma’s A-F school grades. Our evidence suggests that a composite letter grade
does not provide a clear signal or simple interpretation of achievement differences within and
between schools. No meaningful information about achievement gaps can be obtained from a
letter grade. We cannot conclude, for instance, that “A” and “B” schools have high average and
equitable achievement. We also cannot conclude that FRL and minority students in “D” and “F”
schools perform worse on average than peers in higher ranked schools. Herein lies a
fundamental problem with the informational significance of A-F accountability grades: grades
do not provide the right information to understand achievement patterns in schools. Without
knowledge of test score differences, it is hard to see how appropriate action can be taken to
improve learning outcomes.
The Informational Significance of A-F School Accountability Grades
Despite problems with accountability systems under No Child Left Behind (NCLB), the
policy has been widely commended for exposing the depth and breadth of educational inequality
in the United States. Achievement equity remains a target of accountability systems approved
for NCLB waivers. As of February 2014, 42 states, Washington, DC, and eight school districts
in California were operating under approved NCLB flexibility waivers. At a minimum,
approved accountability systems need to set ambitious Annual Measurable Objectives (AMO) in
reading/language arts and math, to recognize “reward” schools and identify low performing
schools as “priority” and “focus” schools, to measure student growth, to support efforts to close
achievement gaps, and to support “priority” and “reward” schools in building capacity and
improving performance (US Department of Education, 2012).
As states implement new accountability systems, there is growing concern that attention
to achievement gaps and the performance of marginalized children has faded (Ayers, 2011; Hall,
2013). Many approved accountability plans no longer report achievement by student subgroups
or include subgroup performance in the calculation of accountability indicators (CEP, 2012;
Domaleski & Perie, 2013; Polikoff, McEachin, Wrabel, & Duque, 2014). Some states have
opted to combine historically marginalized students into a “super subgroup,” to use growth of the
bottom 25% of students in a school to satisfy the achievement gap reporting requirement, and to
evaluate school performance with a composite indicator (Ayers, 2011; McNeill, 2012). These
changes have the potential to produce a performance indicator that effectively hides achievement
disparities within and between schools (Hall, 2013).
Initial evidence shows that concerns raised from document analysis of waiver
applications are legitimate. Ushomirsky, Williams, and Hall (2014) found that schools in Florida,
Kentucky, and Minnesota earned high effectiveness rankings despite low test performance from
minority and low-income students. In some cases, minority and low-income students in lower
ranked schools outperformed peers in higher ranked schools. We intensify the scrutiny of new
accountability systems by examining the informational significance of Oklahoma’s A-F school
grades. Oklahoma uses an A-F ranking system as the basis of State recognition and intervention.
High letter grades, A’s and B’s, lead to rewards and public acclaim whereas low grades, D’s and
F’s, impose mandated turnaround plans on schools. Although the A-F grading system was not
designed specifically to measure achievement gaps, it is assumed that ranking schools by letter
grades can support efforts to equalize achievement distributions.
This study examined the informational significance of letter grades relative to the policy
objective of achievement equity. Informational significance as explained in self-determination
theory provided a framework to explore the usefulness of an accountability grade for
understanding achievement differences within and between schools. This study does not
measure the validity of inferences based on a letter grade. Rather, the purpose was to determine
if students in “A” and “B” schools had high average achievement and smaller achievement gaps
compared to students in “D” and “F” schools, and to explore differences in average achievement
when holding constant conditions that schools do not control.
Test-Based Accountability and Achievement Equity
We use a narrow definition of achievement equity in this study, focusing on racial and
economic test score gaps that many reform policies target (Lee, 2006, 2008; Fusarelli, 2004).
Under NCLB, test-based accountability became the primary policy instrument to redress
achievement disparities (Lee, 2008). As it turns out, the complexity of achievement equity
exceeded the structure and function of test-based accountability (Lee, 2006; Harris, 2011;
Mintrop & Sunderman, 2009). Trend data from the National Assessment of Educational
Progress (NAEP) indicate that progress made in the 1970s and 1980s in narrowing
achievement gaps stalled from 1990-2004 (Lee, 2006; Mintrop & Sunderman, 2009; Rothstein,
Jacobson, & Wilder, 2008) and from 2004-2012 (National Center for Education Statistics, 2013).
Additionally, poverty gaps actually increased in two thirds of the states over the last decade
(Quality Counts, 2014).
Several explanations exist for persistent test score gaps. Differences in learning
opportunities, resource disparities, school capacity, and teaching quality partly explain lower
average achievement of poor and minority students (Barton & Coley, 2010). More specific to
this study is the quality and use of accountability indicators. Test-based accountability under
NCLB used simplistic performance indicators in a targeted way: to reward schools meeting
yearly objectives and to sanction schools falling short of annual achievement targets (Harris,
2011; Mintrop & Sunderman, 2009). Persistent achievement gaps during the NCLB era raise
questions about the improvement assumptions of test-based accountability (Ryan & Weinstein,
2009).
Improvement Assumptions and Accountability
Test-based accountability assumes that external contingencies (e.g., threats or rewards)
have instrumental value in reinforcing desired actions and outcomes (Ryan & Deci, 2002, 2012;
Polikoff, McEachin, Wrabel, & Duque, 2014). The belief is that threats and sanctions can provoke
internal change by eliciting the collective will to make instructional systems efficient and
effective (Mintrop & Sunderman, 2009). It is assumed that school actors, when confronted with
punitive consequences for low achievement, will take action that leads to better outcomes
(O’Day, 2002). Accordingly, schools falling short of measurable objectives encounter public
scorn, and face mandated improvement plans, loss of students through choice options, prescribed
reform models, reconstitution, or in some cases closure (Sahlberg, 2008). Test-based
accountability relies on accountability indicators to identify underperforming schools so that
pressure or sanctions can induce school actors to improve learning opportunities.
Agency and expectancy theories have been used to explain how accountability systems
work from outside of schools to bring about change within them (Polikoff, McEachin, Wrabel,
& Duque, 2014; Ryan & Weinstein, 2009). As suggested by these frameworks, clear achievement
standards and accurate performance indicators function as an external motivator for goal
attainment. Agency theory assumes that accountability information is a mechanism used by
principals (i.e., school administrators, community members, legislators, taxpayers, etc.) to ensure
school agents (i.e., teachers) deliver student achievement (Polikoff, McEachin, Wrabel, & Duque,
2014). Expectancy theory assumes that rewards and threats motivate teachers to improve
achievement as long as standards are clear and performance information is accurate (Finnigan &
Gross, 2007). Through an agency and expectancy lens, the legitimacy and trustworthiness of
accountability indicators affect the behavioral response of school members.
In the design of test-based accountability, a body of divergent research findings on
external control was either dismissed by policy makers or not known. The weight of the
psychological evidence indicates that contingent reinforcement withers under the strain of
complex, conceptual tasks like teaching and learning (Ryan & Deci, 2002; Ryan & Weinstein,
2009). Performance information used as an external control is inimical to work that involves
professional discretion, adaptation, and cooperation among interdependent groups (Forsyth,
Adams, & Hoy, 2011). An alternative theoretical lens used to explain optimal individual and
group performance may well explain why achievement equity eluded NCLB and how
performance information can play a supportive role in closing achievement gaps. The crucial
adjustment is with the locus of causality, switching from external to internal motivators.
Self-determination theory informs this adjustment.
Self-Determination Theory Applied to Accountability
The fundamental assumption of self-determination theory is that individuals are
inherently oriented toward growth and goal fulfillment (Ryan & Deci, 2002). Accordingly, the
drive and determination to excel are internal, primal states that require nurturing, not coercive
control by external mechanisms. External mechanisms, like the use of performance information,
fuel motivation and effective behavior by satisfying the innate psychological needs of autonomy,
competence, and relatedness (Adams, Forsyth, Dollarhide, Miskell, & Ware, 2015; Ryan & Deci,
2012).
Here autonomy does not mean independence; rather, it is a cognitive belief embodied in
the volitional and purposeful action of individuals (Williams, 2002). Competence is feeling
effective in one’s task and having confidence in one’s ability to execute actions required to
achieve a challenging outcome. Relatedness comes from supportive social connections that
foster feelings of belonging and psychological security within a group or organization (Reeve,
2002). Schools bring to life teaching and learning when structures and processes promote
interactions that support autonomy, competence, and relatedness (Reeve & Jang, 2006). On the
other hand, teaching and learning become uninspired and stale when psychological needs are
thwarted by controlling structures (Reeve & Halusic, 2009). Performance indicators can be used
to build professional capacity, but to do so the information needs to be used in ways that enhance
the desire, creativity, and energy of teachers and students to press for academic excellence (Ryan
& Weinstein, 2009).
NCLB was not designed to build professional capacity; its intent was to hold schools
accountable for past results. Paradoxically, NCLB added considerable noise, confusion, and
dysfunction to many of the low performing, high need schools it promised to reform (Moller,
2008; O’Day, 2002; Sirotnik, 2005). When considered through self-determination theory, the
anemic performance of accountability is not surprising (Ryan & Weinstein, 2009).
Accountability indices derived from aggregated test scores cover a very thin slice of the
performance pie. The complexity of school work, more directly the complexity of teaching and
learning, far exceeds the narrow parameters of a composite achievement index compiled from
curricular tests administered on one occasion during the school year (Moller, 2008). Low-quality
accountability information used to trigger sanctions constricts meaningful learning opportunities,
hinders innovation and risk-taking, and undermines motivation (Sahlberg, 2008).
Effective accountability systems informed by self-determination theory depend on the
functional significance of performance indicators. Functional significance is defined as the
meaning and worth individuals place in an object, experience, or event (Ryan & Deci, 2012).
The functional significance of accountability indicators can be informational or controlling
(Ryan & Weinstein, 2009). Using indicators as an external control has arguably diminished
innovation, creativity, and joy of teaching and learning in the very schools that NCLB intended
to reform (Darling-Hammond, 2006; Feuer, 2008; Sunderman & Kim, 2005). From a controlling
perspective, NCLB accountability systems failed to generate the legitimacy and trustworthiness
needed for performance indicators to be used as a tool to reform schools.
Accountability indicators that have informational significance stand a better chance of
generating the energy and capacity to narrow persistent achievement gaps (Ryan & Deci, 2012;
Ryan & Weinstein, 2009). Informational significance comes from the diagnostic value
associated with a performance indicator (Ryan & Weinstein, 2009). Clear and accurate
information about learning processes and outcomes is needed to generate knowledge about
student performance; this knowledge in turn can drive improvement decisions and actions. It is
hard to see how appropriate action can be taken to close achievement gaps without first knowing
how achievement varies within schools. If consequential decisions and actions are based on
accountability indicators, the indicators should provide enough information to understand
differences in student achievement.
Informational significance, as understood in self-determination theory, is the standard by
which we evaluate the performance of Oklahoma’s A-F grades. We do not test the construct
validity or reliability of letter grades; instead, our concern is with the ability of the grade to yield
meaningful and useful information about achievement differences between student groups. Our
assessment of informational significance is based on the degree to which school grades reflect
achievement gaps within and between schools. This purpose stands in contrast to validity studies
that use theory and empirical evidence to evaluate the ability of a measure to yield truthful
judgments about the object it purports to measure (Messick, 1995; Miller, 2008). The State may
not have intended for A-F letter grades to specifically measure achievement gaps, but the
policy objective is to improve achievement equity.
For achievement equity, we evaluate informational significance by what a letter grade
reveals about achievement gaps for FRL and minority students. To keep attention focused on
high achievement for all students, a composite letter grade must reflect test score differences
within and between schools (Linn, 2005, 2008). High grades, “A’s” and “B’s”, logically suggest
strong achievement for all students in all subject areas. Low grades largely suggest lower
average achievement and large achievement gaps. If A-F grades reflect subgroup differences,
they may have value for equalizing achievement outcomes. If they do not, they fail the test of
informational significance.
Method
Our purpose was to assess the informational significance of Oklahoma’s school grades in
relation to the policy objective of closing achievement gaps within and between schools. As
such, we do not evaluate the validity of letter grades for judgments about achievement equity.
The objective was to evaluate the usefulness of Oklahoma’s A-F grades for understanding
achievement differences. Grades should be meaningful for determining the average achievement
of all students and student subgroups. We asked two questions: First, do students in “A” and
“B” schools have high average achievement and small achievement gaps compared to students in
“D” and “F” schools? And second, what is the difference in average achievement and
achievement gaps between school grades when holding school context constant?
Composition of Oklahoma’s A-F School Grades
Oklahoma uses a single letter grade as an indicator of Annual Measurable Objectives
(AMO), to classify “reward,” “priority,” or “focus” schools, and to rank schools by effectiveness
(Oklahoma State Department of Education, 2012). School grades for the 2012-2013 school year
were calculated using a formula that converts test scores into categorical data, categorical data
back into a continuous index, and a continuous index into a summative letter grade. The final
composite grade is derived from two components: (1) student achievement and (2) student
growth (Ayers, 2011; OCEP & CERE, 2012; Oklahoma State Department of Education, 2014).
The student achievement component makes up 50% of the school grade. Student test
scores from state math, reading, science, and social studies exams are used to calculate a school’s
Performance Index (PI). The PI is calculated from a simple binary scale. Students scoring
below proficiency for each tested subject are assigned a zero and students who score proficient
or above are assigned a one. The total score for all tested subjects is divided by the total number
of tests taken to calculate the PI score for a school. This produces a PI score ranging from 0-100.
The PI score is then multiplied by .50 for the calculation of the final composite grade (Oklahoma
State Department of Education, 2014).
Student growth makes up the other 50 percent of the composite letter grade. Only math
and reading/English exams are used for the growth index. Growth is composed of overall
student growth (25%) and growth of the bottom quartile of students in a school (25%). For both
components, the growth score is calculated by first calculating the total number of students in the
school who either scored proficient/advanced for both testing periods, who increased a
proficiency level in the current testing period, or who showed a growth in the test score that was
above the state average for growth. This number is then divided by the total number of eligible
students to arrive at an overall growth index that ranges from 0-100. The overall growth index is
then multiplied by .25 for the calculation of the composite school grade. Growth of the bottom
quartile is similarly calculated (Oklahoma State Department of Education, 2014).
Up to 10 bonus points are awarded to schools based on attendance rates, advanced course
participation, dropout rates, and return rate of parent “climate” surveys. The PI score, student
growth, and bonus points are summed to arrive at an overall Index score that ranges from 0-100.
Index scores between 90-100 receive an A, 80-89 a B, 70-79 a C, 60-69 a D, and 59 or below an F
(Oklahoma State Department of Education, 2014).
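The grade composition described above can be illustrated with a minimal Python sketch. It assumes that proficiency is recorded as 0 or 1 per test, that each growth component is the percent of eligible students meeting any of the growth criteria, and that the weighted components and bonus points are simply summed. The function names, the cap at 100, and the treatment of scores below 60 as an F are illustrative assumptions, not the State's implementation.

```python
def performance_index(proficient_flags):
    """Percent of tests scored proficient or above (0-100)."""
    return 100.0 * sum(proficient_flags) / len(proficient_flags)


def growth_index(met_growth_flags):
    """Percent of eligible students meeting any growth criterion (0-100)."""
    return 100.0 * sum(met_growth_flags) / len(met_growth_flags)


def composite_index(pi, overall_growth, bottom_quartile_growth, bonus=0.0):
    """Weighted sum: 50% achievement, 25% + 25% growth, plus up to 10 bonus points."""
    bonus = min(max(bonus, 0.0), 10.0)
    score = 0.50 * pi + 0.25 * overall_growth + 0.25 * bottom_quartile_growth + bonus
    return min(score, 100.0)  # assumption: the index is capped at 100


def letter_grade(index):
    """Map an index score to a letter; assumes anything below 60 is an F."""
    for cutoff, grade in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if index >= cutoff:
            return grade
    return "F"


# Hypothetical school: 3 of 4 tests proficient, 75% growth on both components.
pi = performance_index([1, 1, 0, 1])
index = composite_index(pi, 75.0, 75.0, bonus=5.0)
grade = letter_grade(index)
```

Every input in this sketch is a school-wide aggregate, so two schools with very different subgroup results can arrive at the same index score.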
Data Source
Analyses were based on 2012-2013 reading and math test scores of over 25,000 students
from 81 urban, urban fringe, and suburban elementary and middle schools. Schools were
sampled from three contiguous districts in a single metropolitan area. Achievement data were
used from students in 3rd, 4th, 5th, 6th, 7th, and 8th grades. Table 1 contains descriptive data for the
sample of students and schools. Valid math scores were obtained from 25,663 students and valid
reading scores from 25,469 students. Approximately 45% of the students qualified for Free or
Reduced Lunch (FRL), 42% identified as a minority ethnic group, and 52% identified as
non-minority Caucasian.
Scale scores from the state curricular exams in reading and math were used to
operationalize achievement. Scale scores range from a low of 400 to a high of 990. The average
reading scale score for the sample was 747 with a standard deviation of 90. The average math
scale score was 759 with a standard deviation of 92. In the school sample, the average
FRL rate was 70% and the average minority composition was 60%; 14% of the schools earned
school grades of “A”, 19% earned grades of “B”, 4% earned grades of “C”, 20% earned grades
of “D”, and 43% earned grades of “F”. Of the sampled schools, 62 were elementary schools and
19 were middle schools.
[Insert Table 1 About Here]
Analytical Approach
Two techniques were used in the analysis. First, consistent with conventional practices to
report test score gaps (Jencks & Phillips, 1998; Reardon, 2011), we standardized scale scores to a
mean of 0 and a standard deviation of 1. We then report mean differences between FRL and
non-FRL and minority and non-minority students in “A”, “B”, “C”, “D”, and “F” schools. This
approach is useful for examining differences in the achievement status of students.
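The standardization and gap calculation can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the scale scores and FRL flags are hypothetical, the population standard deviation is used for standardizing, and the function names are not drawn from the study.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Convert scale scores to z-scores (mean 0, SD 1) over the full sample."""
    m, sd = mean(scores), pstdev(scores)
    return [(s - m) / sd for s in scores]


def subgroup_gap(z_scores, in_group):
    """Mean z-score of the subgroup minus the mean z-score of all other students."""
    group = [z for z, g in zip(z_scores, in_group) if g]
    others = [z for z, g in zip(z_scores, in_group) if not g]
    return mean(group) - mean(others)


# Hypothetical scale scores and FRL flags, for illustration only.
scores = [650, 700, 740, 780, 820, 860]
frl = [True, True, True, False, False, False]

z = standardize(scores)
gap = subgroup_gap(z, frl)  # negative when FRL students score lower on average
```

To compare gaps across letter grades, the same gap function would be applied separately to the students within each grade category, using z-scores computed on the full sample.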
School grades, however, rank schools by effectiveness, and as such they must measure
what schools and teachers control by accounting for achievement variance attributed to different
school context. Harris (2011) refers to this as the cardinal rule of accountability: schools should
be held accountable for what they do. When an indicator is used to rank schools, simple
descriptive data lack the power to control for alternative explanations of test score differences
(Carlson, 2006; Forte, 2010; Harris, 2011). For this reason, we used a multi-level modeling
approach to estimate mean differences and achievement gaps after controlling for factors that are
unrelated to teaching effectiveness or school practices.
We followed a conventional multilevel model building process in HLM 7.0. The first
step was to decompose achievement variance to within school and between school components
with an unconditional random effects ANOVA. Results were used to calculate the IntraClass
Correlation Coefficient (ICC), the percent of achievement variance attributed to school and
non-school factors. Next, we tested the effects of student characteristics on achievement with a Random
Coefficients regression. Student variables were grand-mean centered in this model. Grand-mean
centering has a computational advantage over group-mean centered or un-centered models in that
it controls for any shared variance between individual and group level predictors. Significant
student variables were retained and set to vary randomly across schools. Non-random student
effects were fixed to the school level.
Unconditional Model
Level 1:
\[ \mathrm{Ach}_{ij} = \beta_{0j} + r_{ij} \]
Level 2:
\[ \beta_{0j} = \gamma_{00} + u_{0j} \]
\[ \rho = \frac{\sigma^2_{u_0}}{\sigma^2_{u_0} + \sigma^2_{e_0}} \]
Random Coefficient Regression
Level 1:
\[ \mathrm{Ach}_{ij} = \beta_{0j} + \beta_{1j}(\text{Minority Status}_{ij}) + \beta_{2j}(\text{FRL Status}_{ij}) + r_{ij} \]
Level 2:
\[ \beta_{0j} = \gamma_{00} + u_{0j} \]
\[ \beta_{1j} = \gamma_{10} + u_{1j} \]
\[ \beta_{2j} = \gamma_{20} + u_{2j} \]
The final step was to test a random coefficient slopes and intercepts as outcomes model
with all significant student and school variables. We changed the centering to group-mean in
this model to allow for a more accurate estimation of differences in level one slopes across
schools (Enders & Tofighi, 2007). To further increase the reliability of the slope estimation, we
used the state calculated school index score as a single predictor variable. The index score is a
continuous variable that is used to determine the categorical letter grade. Using a single
continuous variable as opposed to multiple categorical variables improves the degrees of
freedom and yields a more reliable estimate of variation in level one slopes (Hox, 2012).
Estimates represent the actual difference in scale scores after controlling for factors not related to
teaching practices and school performance.
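The difference between the two centering choices can be illustrated with a short sketch. The data are hypothetical and the function names are illustrative; the example only shows what each form of centering does to a student-level predictor, not how HLM 7.0 implements it.

```python
from statistics import mean


def grand_mean_center(values):
    """Subtract the overall sample mean from every value."""
    m = mean(values)
    return [v - m for v in values]


def group_mean_center(values, school_ids):
    """Subtract each student's own school mean from that student's value."""
    by_school = {}
    for v, s in zip(values, school_ids):
        by_school.setdefault(s, []).append(v)
    school_means = {s: mean(vs) for s, vs in by_school.items()}
    return [v - school_means[s] for v, s in zip(values, school_ids)]


# Hypothetical FRL indicator (1 = FRL) for six students in two schools.
frl = [1, 1, 0, 1, 0, 0]
school = ["s1", "s1", "s1", "s2", "s2", "s2"]

grand = grand_mean_center(frl)          # deviations from the overall FRL rate
group = group_mean_center(frl, school)  # deviations from each school's FRL rate
```

After group-mean centering, the predictor sums to zero within every school, so the level-one slope reflects only within-school differences, which is the property that makes it suitable for estimating how slopes vary across schools.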
Random Coefficient Slopes and Intercepts as Outcomes Model
Level 1:
\[ \mathrm{Ach}_{ij} = \beta_{0j} + \beta_{1j}(\text{Minority Status}_{ij}) + \beta_{2j}(\text{FRL Status}_{ij}) + r_{ij} \]
Level 2:
\[ \beta_{0j} = \gamma_{00} + \gamma_{01}(\text{C}) + \gamma_{02}(\text{D}) + \gamma_{03}(\text{F}) + \gamma_{04}(\%\,\text{Minority}) + \gamma_{05}(\text{FRL Rate}) + u_{0j} \]
\[ \beta_{1j} = \gamma_{10} + \gamma_{11}(\text{IndexScore}) + u_{1j} \]
\[ \beta_{2j} = \gamma_{20} + \gamma_{21}(\text{IndexScore}) + u_{2j} \]
where
$\mathrm{Ach}_{ij}$ = achievement of student i in school j
$\beta_{0j}$ = mean achievement in school j
$\beta_{1j}$ = minority achievement gap in school j
$\beta_{2j}$ = FRL achievement gap in school j
$\gamma_{00}$ = grand mean for achievement
$\gamma_{01}$ = difference in average achievement between A/B schools and C schools
$\gamma_{02}$ = difference in average achievement between A/B schools and D schools
$\gamma_{03}$ = difference in average achievement between A/B schools and F schools
$\gamma_{04}$ = effect of school % Minority on achievement
$\gamma_{05}$ = effect of FRL rate on student achievement
$\gamma_{10}$ = average minority achievement gap across schools
$\gamma_{11}$ = cross-level interaction of the minority achievement gap and Index Score
$\gamma_{20}$ = average FRL achievement gap across schools
$\gamma_{21}$ = cross-level interaction of the FRL achievement gap and Index Score
Results
We organized results by the two research questions: Do students in “A” and “B” schools
have high average achievement and small achievement gaps compared to students in “D” and
“F” schools? What is the difference in average achievement and achievement gaps between
school grades when holding context constant?
Average Achievement and Achievement Gaps
As reported in Table 2, students in “A” and “B” schools had higher average reading and
math scores than students in “C”, “D”, and “F” schools. Students in “A” schools had an average
reading score about .34 standard deviations above the sample mean and an average math score
about .39 standard deviations above the sample mean. Students in “B” schools had an average
reading score about .12 standard deviations above the sample mean and average math score
about .11 standard deviations above. Average reading and math scores in “C”, “D”, and “F”
schools were below the sample mean and around one standard deviation less than the average
reading and math scores in “A” schools.
[Insert Table 2 About Here]
We did find test score gaps for FRL students (Table 2 and Figures 1 and 2). In the
overall sample, FRL students averaged reading and math scores nearly one standard deviation
lower than non-FRL students. Both reading and math gaps varied across school-assigned letter
grades. In “A” schools the reading gap was .83 standard deviations, with the average FRL student
scoring .31 standard deviations below the mean and the average non-FRL student scoring nearly .52
standard deviations above the mean. The math gap in “A” schools was about .75 standard
deviations, with the average math score of FRL students falling .19 standard deviations below
the mean and the average math score of non-FRL students at about .56 standard deviations
above the mean.
[Insert Figure 1 About Here]
For “B” schools, the FRL reading and math gaps were about 1 standard deviation. FRL
students in “B” schools had average reading and math scores .33 and .31 standard
deviations below the mean. Non-FRL students had average reading and math scores .38 and
.36 standard deviations above the mean. Smaller FRL gaps were found in “C”, “D”, and “F”
schools. The average reading difference in “C” schools was about .56 standard deviations and
the average math difference was about .68. For “D” schools, differences were about .34 standard
deviations for both reading and math, and in “F” schools the average reading difference was less
than .02 standard deviations (with FRL students having a slightly higher average) and nearly .26
standard deviations for math.
[Insert Figure 2 About Here]
The minority test score gap followed a pattern similar to the FRL gap (Table 3). The overall
minority difference in reading and math scores was about 1 standard deviation. In reading, the
average minority student scored .28 standard deviations below the sample mean whereas the
average non-minority student scored .27 standard deviations above. In math, the average minority
student scored .31 standard deviations below the mean and the average non-minority student
scored .27 standard deviations above.
[Insert Table 3 About Here]
Test gaps for minority students varied by letter grade. The largest minority gaps in
reading and math (over one standard deviation) were found in “B” rated schools (Figures 3 and
4). The minority reading gap in “A” schools was .49 standard deviations while the minority
math gap was .59 standard deviations. For “C” schools the average reading gap was .64 standard
deviations and the average math gap was .37. Smaller differences between minority and
non-minority students were found in “D” and “F” schools. For “D” schools the minority reading gap
was about .24 standard deviations and the math about .35. For “F” schools, minority gaps were
.30 and .25 standard deviations respectively.
[Insert Figures 3 and 4 About Here]
HLM Results
We first report the variance decomposition from the unconditional random effects
ANOVA models. Results show achievement variance that is attributed to student and school
differences. Student differences accounted for 72 percent of variance in reading and 70 percent
in math. Schools, on the other hand, accounted for 28 percent of the reading variance and 30
percent of the math variance (Table 4). To address the research question, we examined the main
effects of letter grades and the moderating effect of letter grades on achievement gaps.
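As a check on the variance decomposition, the ICC can be recomputed from the reported variance components (reading: σ² = 6465.72, τ = 2519.28; math: σ² = 6727.21, τ = 2947.83). The function name below is illustrative; the formula is the standard ratio of between-school variance to total variance.

```python
def icc(tau, sigma2):
    """Share of total achievement variance that lies between schools."""
    return tau / (tau + sigma2)


# Variance components as reported for the unconditional models.
reading_icc = icc(tau=2519.28, sigma2=6465.72)
math_icc = icc(tau=2947.83, sigma2=6727.21)

print(round(reading_icc, 2), round(math_icc, 2))
```

The rounded results, .28 for reading and .30 for math, match the school-level shares reported above, with the remaining 72 and 70 percent attributable to differences among students.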
[Insert Table 4 About Here]
Small Main Effects
Table 5 displays average differences in the math and reading scale scores after
controlling for student (FRL and minority status) and school characteristics (FRL rate and
percent of Caucasian students). For reading, we did not find statistically significant achievement
differences attributed to school letter grade. Further, the estimated differences were small and
considerably less than the standard deviation for the sample and the average standard error for
the reading assessment (SEM = 33) (CTP McGraw Hill, 2013). Students in schools receiving a
“C” grade averaged 3 scale points lower than the average reading scores for students in “A” and
“B” schools. The average reading score for students in “D” schools was 1 scale point less than
the average student scores in “A” and “B” schools. The largest difference, 31 scale points, was
between students in “F” ranked schools and students in “A” and “B” ranked schools. This
difference, however, was not statistically significant and fell within the range of the standard
error for the reading assessment (SEM = 33).
Letter grades performed only slightly better in explaining differences in average math
scores. We did not find statistically significant differences in average math achievement
between students in “C” schools and students in “A” and “B” ranked schools. The estimated
difference of 11 scale points was small (Cohen’s d = .11) and fell within the average
measurement error of the math test (SEM = 22). The average math difference of 25 scale points
for students in “D” schools and students in “A” and “B” schools was also not statistically
significant at p < .05. This estimated difference was small (Cohen’s d = .25) and around the
measurement error of the test. The difference of 42 scale points in average math achievement
between “F” and “A” and “B” ranked schools was statistically significant with a small effect size
(Cohen’s d = .44).
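The comparison of each estimated difference with the measurement error of the test can be laid out explicitly. The sketch below uses the estimates from the fixed-effects table and the SEM values quoted above; the dictionary layout and function name are illustrative only.

```python
# Standard error of measurement for each assessment, in scale points.
SEM = {"reading": 33, "math": 22}

# Estimated differences from "A" and "B" schools, in scale points.
estimates = {
    "reading": {"C": -2.95, "D": -1.13, "F": -31.18},
    "math": {"C": -11.45, "D": -25.07, "F": -42.28},
}


def within_sem(difference, sem):
    """True when the absolute difference does not exceed the test's SEM."""
    return abs(difference) <= sem


flags = {
    subject: {grade: within_sem(diff, SEM[subject]) for grade, diff in by_grade.items()}
    for subject, by_grade in estimates.items()
}
```

All three reading differences fall inside the reading SEM. In math, the “C” difference falls inside the SEM, the “D” difference sits slightly above it, and only the “F” difference is both clearly larger than the SEM and statistically significant.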
[Insert Table 5 About Here]
Moderating Effects of Letter Grades
Consistent with the test score gaps we reported in the previous section, FRL and minority
achievement gaps were smaller in schools with the lowest school index scores. For FRL students,
within-school achievement gaps increased proportionally to increases in the school index score
for reading and math. Negative parameter estimates for reading (γ21 = -0.44, p < .01) and math
(γ21 = -0.53, p < .01) indicate a decline in the average achievement of FRL students as index
scores increase. Figures 5 and 6 illustrate the negative relationship between FRL gaps and the
school index score. As index scores increased, reading and math gaps widened. Additionally,
average reading and math achievement of FRL students was considerably lower in schools with
the highest index scores compared to schools with lower index scores.
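To make the moderating effect concrete, the sketch below evaluates the model-implied FRL gap at the lowest and highest index scores shown on the horizontal axis of Figures 5 and 6. It uses the FRL slope and index score coefficients from the fixed-effects table and assumes the index score is centered, as the negative axis values suggest; names are illustrative.

```python
# (average FRL slope, change in FRL slope per index score point), in scale points.
FRL_COEFFICIENTS = {
    "reading": (-28.10, -0.45),
    "math": (-30.43, -0.52),
}


def frl_slope(subject, index_score):
    """Model-implied FRL gap at a given (assumed centered) school index score."""
    average_gap, interaction = FRL_COEFFICIENTS[subject]
    return average_gap + interaction * index_score


LOW_INDEX, HIGH_INDEX = -35.06, 37.94  # axis end points from Figures 5 and 6

for subject in FRL_COEFFICIENTS:
    low = frl_slope(subject, LOW_INDEX)
    high = frl_slope(subject, HIGH_INDEX)
    print(f"{subject}: gap at low index {low:.1f}, at high index {high:.1f}")
```

Under these assumptions the implied reading gap grows from roughly 12 scale points in the lowest-index schools to roughly 45 in the highest-index schools, and the math gap follows the same direction.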
[Insert Figures 5 and 6 About Here]
The relationship between index score and minority test score gaps was similar to that for FRL,
but not as strong. Average reading achievement of minority students decreased (γ11 = -0.35,
p < .05) as school index scores increased. Similarly, average math scores of minority students
decreased (γ11 = -0.31, p < .05) as index scores increased. Figures 7 and 8 illustrate the changes
in the minority test score gap by school index score. Notice that compared to the FRL gap, the
slope of the line for minority students is not as steep and the average gap in schools with the best
index scores is not as large.
[Insert Figures 7 and 8 About Here]
Discussion
Informational significance provides a different framework to evaluate accountability
indicators. Unlike validity studies that evaluate measurement quality, informational significance
targets the usefulness of an accountability indicator. An indicator may achieve a degree of
validity but not have value or utility for decisions affecting policy and practice. To support
achievement equity, letter grades should be capable of explaining high and equitable
achievement within and between schools. Oklahoma’s grades did not meet this standard. We
found that A-F letter grades end up hiding achievement gaps rather than revealing them.
When analyzing test score gaps, we found higher average reading and math scores in “A”
and “B” schools compared to “C”, “D”, and “F” schools, but test scores were not equally
distributed within letter grades. The largest achievement gaps were in schools ranked as the
most effective. FRL and minority students in “A” and “B” schools had average reading and
math achievement below the overall sample mean and in some cases not different from FRL and
minority students in schools with lower letter grades. In “A” and “B” schools, FRL students had
an average reading score about .32 standard deviations below the mean. This average score was
similar to the average reading score of -.35 for FRL students in “D” schools and the average reading
score of -.40 for FRL students in “F” schools. Average performance of FRL students was nearly
equivalent across letter grades.
Informational significance partly depends on a clear and accurate indication of
achievement patterns within and across schools. Our evaluation of test score gaps suggests that
Oklahoma’s A-F grades do not provide a clear signal of achievement for poor and minority
students. Some “A” and “B” schools likely had high and equitable student achievement, but it is
also true that schools with large test score gaps for FRL and minority students were rated as
effective. Herein lies the crux of the issue: grades do not sort out schools with high and
equitable achievement from schools with high average achievement and large achievement gaps.
Not knowing the relative achievement of FRL and minority students leads to inaccurate
judgments about school quality and diminishes the usefulness of letter grades. To be
meaningful, grades need to reflect the performance of all students and student subgroups.
Oklahoma’s A-F letter grades fail this test by making it possible for schools to receive “A’s” and
“B’s” while failing to serve their FRL and minority students.
HLM results raise additional concerns about the informational significance of A-F
grades. After removing achievement variance attributed to factors unrelated to teaching or
school effectiveness, letter grades were unable to differentiate schools by average student
achievement. In reading, average test scores in “A”, “B”, “C”, and “D” schools were similar.
The lower average reading achievement we found in “F” schools does not correspond to the
performance difference one would reasonably expect between an A and an F. Math results were
not much different from reading. Perhaps the most troubling finding was that “A” and “B”
schools were least effective for poor, minority children, while “D” and “F” schools were most
effective. Rather than supporting schools in closing achievement gaps, the intent of NCLB
waivers, the Oklahoma system rewards schools with high grades even when large achievement
gaps exist. Informational significance is lost on grades that hide achievement variance within
and between schools, making any diagnostic and improvement use of A-F grades ineffectual.
Evidence that FRL and minority students had higher achievement in “D” and “F” schools
than their counterparts attending “A” and “B” schools challenges the formula used to calculate
school grades. The distribution of letter grades would change quite drastically if the state
assigned achievement gaps the same weight it assigns to achievement status. In our sample of
schools, several “D” and “F” schools would become “C” or “B” schools, and many “A” and “B”
schools would become “C” or “D” schools. Poor, minority students end up being left behind
when grades obscure achievement differences within schools. In some instances, letter grades do
not reflect the achievement of all students and student sub-groups, and in other cases schools
showing some progress may be misidentified as needing urgent improvement.
The absence of informational significance means that school grades cannot be used to
nurture the human and social capacity under which effective schools adapt to their external
environments and to the needs of their students. School grades deliver little informational value
to teachers and administrators. They hide achievement differences, they cannot be disaggregated
by content standards, and they do not measure student growth toward college, citizenship, and
career ready expectations. Furthermore, school grades cannot be used to measure the
effectiveness of improvement strategies or interventions; any change from one year to the next is
just as likely attributable to factors outside school control as to what happens within schools
and classrooms.
School grades have limited use as an external control as well. Grades that obscure
achievement differences encourage misguided judgments about school effectiveness and
misplaced reform pressure. For instance, “D” and “F” schools could use additional support and resources,
but instead they face mandated interventions that do not address the sources of diminished
capacity. In contrast, “A” and “B” schools encounter no external pressure or incentives to track
achievement of FRL and minority students. In fact, “A” and “B” schools can be rewarded even
if low-income and minority student achievement lags behind that of students with more social
advantages.
Problems identified with accountability indicators under NCLB compound in
Oklahoma’s A-F grading system. First, the system uses proficiency scores for its calculation of
student achievement and student growth. Proficiency scores are a simple metric to describe
achievement status in the aggregate, but their accuracy erodes when used as the basis for ranking
schools for the purpose of policy decisions or passing judgments of school effectiveness
(Carlson, 2006; Forte, 2010; Ho, 2008; Linn, 2005). Second, the system hides achievement of
poor and minority students by using the growth of the bottom 25 percent to satisfy the
achievement equity expectation. To keep the spotlight on achievement equity, Oklahoma’s
policy would, at the minimum, need to report proficiency scores by student subgroups and
account for subgroup performance in calculations of the student achievement and student growth
components. The State does neither, effectively ignoring poor, minority students in its
calculations and reporting.
Finally, assumptions of letter grades do not correspond with the dynamic nature of
schools and student learning. School performance is multifaceted and varies across subjects,
classrooms, and students. Instead of measuring and reporting variability, grades treat teaching
and learning in schools as fixed processes. As a result, lower achieving students receive the
same performance status as higher achieving students, essentially ignoring variance that can help
schools recognize and respond to unmet student needs. NCLB waivers were designed to provide
states flexibility in developing a fair and focused accountability system to support continuous
improvement (US Department of Education, 2012). A policy that rewards schools for large FRL
and minority achievement gaps and penalizes schools whose poor, minority students outperform
peers in more affluent schools, is neither fair nor supportive of continuous improvement.
Rather than advancing achievement equity, the intent of the federal NCLB waiver, letter
grades seem to exploit achievement levels that derive from wealth and social advantage, while
obscuring a school’s failure to serve all children. To advance achievement equity, educators
need to understand common sources of achievement variance within schools. Letter grades,
however, collapse achievement variance into a single composite indicator. No measure of school
performance can yield accurate results if the majority of variance in student achievement is
concealed by the indicator (Forte, 2010). As a practical consequence, grades end up classifying
some schools as “A” and “B” schools when they are failing to meet the learning needs of all
students and other schools as “D” and “F” schools when they are making progress with poor,
minority students.
Conclusion
Progress made under NCLB in exposing achievement inequity in the US has taken a step
back with Oklahoma’s A-F school grades. Our evidence suggests that a composite letter grade
does not provide a clear signal or simple interpretation of achievement differences within and
between schools. No meaningful information about achievement gaps can be obtained from a
letter grade. We cannot conclude, for instance, that “A” and “B” schools have high average and
equitable achievement. We also cannot conclude that FRL and minority students in “D” and “F”
schools perform worse on average than peers in higher ranked schools. Herein lies a
fundamental problem with the informational significance of A-F accountability grades: grades
do not provide the right information to understand achievement patterns in schools. Without
knowledge of test score differences, it is hard to see how appropriate action can be taken to
improve learning outcomes.
Although our evidence is limited to one state, many components of Oklahoma’s system
are similar to those used in other states (Howe & Murry, 2015). Other states use proficiency
bands without reporting results by subgroups; they use achievement of the bottom 25% to fulfill
the achievement gap requirement; and they use a composite indicator to judge school
effectiveness (Domaleski & Perie, 2013; Polikoff, McEachin, Wrabel, & Duque, 2014). These
three components are likely to behave the same way in other state accountability systems. It is
not variability in schools that presents a problem, but rather weaknesses of the components to
measure achievement variance within schools.
We cannot conclude with certainty that effects found in our sample will appear in a
larger, more representative sample of states, districts and schools. What is clear is that additional
research on new state accountability policies is needed. With states using different
accountability designs, it is important for researchers to identify system components capable of
yielding valid inferences of school performance. As long as accountability carries with it high
stakes consequences, state governments have a legal and ethical responsibility to ensure that
accountability systems accurately distinguish among different levels of school effectiveness.
Table 1.
Descriptive Student and School Data
                                 Mean     SD       Min      Max
Reading Sample by Student Composition and Test Score
  Minority                       .42      .50      0        1.0
  Non-Minority                   .58      .50      0        1.0
  Free/Reduced Lunch             .45      .42      0        1.0
  Reading Scale Score            746      89.86    400      990.00
Math Sample by Student Composition and Test Score
  Minority                       .54      .50      0        1.0
  Non-Minority                   .46      .50      0        1.0
  Free/Reduced Lunch             .78      .42      0        1.0
  Math Scale Score               759      92       400      990
School Sample
  Free/Reduced Lunch Rate        70%      28.5     5.2%     100%
  Minority Composition           60%      21.0     15%      99%
  “A” Schools                    .14      ------   ------   ------
  “B” Schools                    .19      ------   ------   ------
  “C” Schools                    .04      ------   ------   ------
  “D” Schools                    .20      ------   ------   ------
  “F” Schools                    .43      ------   ------   ------
Note. N=81 elementary and middle schools from three contiguous districts in one metropolitan
area. We had valid reading scores for 25,469 students and valid math scores for 25,663 students.
Table 2.
Differences in reading and math test scores by FRL status and school grade.
Grade    FRL Status   Reading   Math
F        NonFRL       -.39      -.15
         FRL          -.41      -.41
         Total        -.41      -.36
D        NonFRL       -.01      -.28
         FRL          -.35      -.53
         Total        -.27      -.47
C        NonFRL        .16       .21
         FRL          -.40      -.47
         Total        -.22      -.25
B        NonFRL        .38       .36
         FRL          -.33      -.31
         Total         .12       .11
A        NonFRL        .52       .56
         FRL          -.31      -.19
         Total         .34       .39
Total    NonFRL        .37       .38
         FRL          -.35      -.37
         Total         .05       .04
Note. Test scores were standardized to a mean of 0 and a standard deviation of 1. Values
represent the average deviation from the sample mean.
Figure 1. Mean differences in reading test scores by FRL status and school letter grades. Test
scores were standardized to a mean of 0 and a standard deviation of 1. Values are reported in
standard deviation units. FRL students are coded as 1 and non FRL students are coded as 0.
Figure 2. Mean differences in math test scores by FRL status and school letter grades. Test
scores were standardized to a mean of 0 and a standard deviation of 1. Values are reported in
standard deviation units. FRL students are coded as 1 and non FRL students are coded as 0.
Table 3.
Differences in reading and math test scores by minority status and school grade.
Grade    Minority Status   Reading   Math
F        Non-Minority      -.17      -.18
         Minority          -.47      -.43
         Total             -.41      -.36
D        Non-Minority      -.10      -.25
         Minority          -.34      -.60
         Total             -.27      -.47
C        Non-Minority       .19       .12
         Minority          -.43      -.50
         Total             -.22      -.25
B        Non-Minority       .26       .27
         Minority          -.18      -.25
         Total              .12       .11
A        Non-Minority       .42       .46
         Minority          -.07       .13
         Total              .34       .39
Total    Non-Minority       .27       .27
         Minority          -.28      -.31
         Total              .04       .04
Note. Test scores were standardized to a mean of 0 and a standard deviation of 1. Values
represent the average deviation from the sample mean.
Figure 3. Mean differences in reading test scores by Minority status and school letter grades.
Test scores were standardized to a mean of 0 and a standard deviation of 1. Values are reported
in standard deviation units. Minority students are coded as 1 and non-minority students are coded as 0.
Figure 4. Mean differences in math test scores by Minority status and school letter grades.
Test scores were standardized to a mean of 0 and a standard deviation of 1. Values are reported
in standard deviation units. Minority students are coded as 1 and non-minority students are coded as 0.
Table 4.
Variance Components and IntraClass Correlation Coefficients
Variable               σ²        % Student Variance   τ         ICC(1)   Chi Square
Reading Achievement    6465.72   72 %                 2519.28   .28      3104.53**
Math Achievement       6727.21   70 %                 2947.83   .30      3211.79**
Note. ** p < .01. σ² is the achievement variance attributed to student differences. τ is the achievement variance
attributed to school differences.
Table 5.
Main effects of A-F letter grades and moderating effect of index score on FRL and Minority
slopes
Fixed Effects                            Mean Reading Differences   Mean Math Differences
Intercept                                712.41 (2.20)**            724 (2.79)**
FRL Rate                                 -0.86 (.19)**              -0.96 (.23)**
Percent White                            0.46 (.22)*                0.87 (.37)*
C Schools                                -2.95 (13.3)               -11.45 (13.2)
D Schools                                -1.13 (10.11)              -25.07 (14.87)
F Schools                                -31.18 (17.23)             -42.28 (12.65)**
FRL Slope                                -28.10 (3.71)**            -30.43 (3.83)**
  Index Score                            -0.45 (.18)**              -0.52 (0.18)**
Minority Slope                           -31.76 (3.53)**            -31.65 (2.92)**
  Index Score                            -0.35 (.15)*               -0.31 (.15)*
Deviance (-2 Log likelihood)             82012                      81937
Δ Deviance                               990**                      998**
Explained Between School Variability     91%                        85 %
Note. * p<.05, **p<.01. We had valid reading data for 25,469 students and valid math data for 25,663 from 81
elementary and middle schools. Estimates come from random intercept and slopes as outcomes models. Standard
errors are reported in parentheses. Student variables include FRL status and minority status. Contextual controls
include percent minority and FRL rate. Student variables were group-mean centered in the full model and full
maximum likelihood estimation was used. The change in deviance represents the change from the unconditional
model to the final slopes and intercepts as outcomes model. Scale scores range from 400-990.
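[Figure 5 plot: READING scale score (vertical axis, ticks from 687 to 732) against INDEXSCORE (horizontal axis, ticks from -35.06 to 37.94), with separate lines for Non-FRL and FRL students.]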
Figure 5. Graph from intercepts and slopes as outcomes model of reading achievement. Results
show a larger FRL gap in reading achievement as index score increases.
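[Figure 6 plot: MATH scale score (vertical axis, ticks from 697 to 747) against INDEXSCORE (horizontal axis, ticks from -35.06 to 37.94), with separate lines for Non-FRL and FRL students.]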
Figure 6. Graph from intercepts and slopes as outcomes model of math achievement. Results
show a larger FRL gap in math achievement as index score increases.
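[Figure 7 plot: READING scale score (vertical axis, ticks from 689 to 729) against INDEXSCORE (horizontal axis, ticks from -35.06 to 37.94), with separate lines for Non-Minority and Minority students.]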
Figure 7. Graph from intercepts and slopes as outcomes model of Reading achievement. Results
show a larger Minority gap in reading achievement as index score increases.
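[Figure 8 plot: MATH scale score (vertical axis, ticks from 697 to 741) against INDEXSCORE (horizontal axis, ticks from -35.06 to 37.94), with separate lines for Non-Minority and Minority students.]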
Figure 8. Graph from intercepts and slopes as outcomes model of math achievement. Results
show a larger Minority gap in math achievement as index score increases.
References
Adams, C. M., Forsyth, P. B., Dollarhide, E., Miskell, R. C., & Ware, J. K. (2015).
Self-regulatory climate: A social resource for student regulation and achievement. Teachers College
Record, 117, 1-28.
Ayers, J. (2011). No child left behind waiver applications: Are they ambitious and
achievable? Center for American Progress, Washington, DC. Retrieved from
http://files.eric.ed.gov/fulltext/ED535638.pdf
Baard, P. P. (2002). Intrinsic need satisfaction in organizations: A motivational basis of
success in for-profit and not-for-profit settings. In E. Deci and R. Ryan (Eds.) Handbook of
Self-Determination Research, (pp. 255-276). Rochester, NY: University of Rochester Press.
Baker, E. L., & Linn, R. L. (2002). Validity issues for accountability systems. CSE
Technical Report 585. National Center for Research on Evaluation, Standards, and Student
Testing. University of California, Los Angeles.
Baker, E. L., Barton, P. E., Darling-Hammond, L., Haertel, E., Ladd, H. F., Linn, R.
L., Ravitch, D., Rothstein, R., Shavelson, R. J., & Shepard, L. A. (2010). Problems with the use of
student test scores to evaluate teachers. The Economic Policy Institute (Retrieved:
http://epi.3cdn.net/724cd9a1eb91c40ff0_hwm6iij90.pdf).
Barton, P. E., & Coley, R. J. (2010). The black-white achievement gap: When progress
stopped. Princeton, NJ: Educational Testing Service. Retrieved from
http://files.eric.ed.gov/fulltext/ED511548.pdf
Booher-Jennings, J. (2005). Educational triage and the Texas accountability
system. American Educational Research Journal, 42(2), 231-268.
Bryk, A. S. (2009). Support a science of performance improvement. Phi Delta Kappan,
90(8), 597-600.
Carlson, D. (2006). Focusing state educational accountability systems:
Four methods of judging school quality and progress. Dover, NH: The Center for Assessment.
Retrieved from http://www.nciea.org/publications/Dale020402.pdf
CEP (2012). Accountability issues to watch under NCLB waivers. The George
Washington University, Center on Education Policy. Retrieved from
http://files.eric.ed.gov/fulltext/ED535955.pdf
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ:
Lawrence Erlbaum.
Darling-Hammond, L. (2006). Securing the right to learn: Policy and practice for
powerful teaching and learning. Educational Researcher, 35(7), 13-24.
Domaleski, C., & Perie, M. (2013). Promoting equity in state education accountability
systems. National Center for the Improvement of Educational Assessment, Center for
Educational Testing and Evaluation, University of Kansas.
Enders, C. K., & Tofighi, D. (2007). Centering predictor variables in cross-sectional
multilevel models: A new look at an old issue. Psychological Methods, 12(2), 121-138.
Feuer, M. J. (2008). Future directions for educational accountability: Notes for a political
economy of measurement. In K. Ryan & L. Shepard (Eds.), The future of test-based educational
accountability, (pp. 293-306). New York: Routledge.
Figlio, D. N., & Getzler, L. S. (2002). Accountability, ability and disability: gaming the
system. Working paper 9307. Cambridge, MA: National Bureau of Economic Research.
Finnigan, K. S., & Gross, B. (2007). Do accountability policy sanctions influence teacher
motivation? Lessons from Chicago’s low-performing schools. American Educational Research
Journal, 41(3), 594-630.
Forsyth, P. B., Adams, C. M., & Hoy, W. K. (2011). Collective trust: Why schools can’t
improve without it. New York, NY: Teachers College Press.
Forte, E. (2010). Examining the assumptions underlying the NCLB federal accountability
policy on school improvement. Educational Psychologist, 45(2), 76-88.
Fusarelli, L. D. (2004). The potential impact of the No Child Left Behind Act on equity
and diversity in American education. Educational Policy, 18(1), 71-94.
Haladyna, T. M., Nolen, S. R., & Haas, N. S. (1991). Raising standardized achievement
test scores and the origins of test score pollution. Educational Researcher, 42(17), 2-7.
Hall, D. (2013). A step forward or a step back? State accountability in the waiver era. The
Education Trust. Retrieved at http://files.eric.ed.gov/fulltext/ED543222.pdf
Hamilton, L.S., Schwartz, H.L., Stecher, B.M., & Steele, J.L. (2013). Improving
accountability through expanded measures of performance. Journal of Educational
Administration, 51(4), 453-475.
Harris, D. N. (2011). Value-added measures in education: What every educator needs to
know. Cambridge, MA: Harvard Education Press.
Heck, R. H. (2009). Teacher effectiveness and student achievement: Investigating a
multilevel cross-classified model. Journal of Educational Administration, 47, 227-249.
Heilig, V. J., & Darling-Hammond, L. (2008). Accountability Texas-style: The
progress and learning of urban students in a high-stakes testing context.
Educational Evaluation and Policy Analysis, 30(2), 75-110.
Ho, A. D. (2008). The problem with “proficiency”: Limitations of statistics and policy
under No Child Left Behind. Educational Researcher, 37(6), 351-360.
Howe, K. R., & Murray, K. (2015). Why School Report Cards Merit a Failing Grade.
Boulder, CO: National Education Policy Center. Retrieved from
http://nepc.colorado.edu/publication/why-school-report-cards-fail.
Hox, J. J. (2010). Multilevel analysis: Techniques and applications (2nd ed.). New York,
NY: Routledge.
Jencks, C., & Phillips, M. (1998). America’s next achievement test: Closing the
black-white test score gaps. The American Prospect, 9(40), 44-53.
Kane, T. J., & Staiger, D. O. (2002). The promise and pitfalls of using imprecise
school accountability measures. Journal of Economic Perspectives, 16(1), 91-114.
King, B., & Minium, E. (2003). Statistical reasoning in psychology and education
(4th ed.). Hoboken, NJ: Wiley.
Lee, J. (2008). Is test-driven external accountability effective? Synthesizing the evidence
from cross-state causal-comparative and correlational studies. Review of Educational Research,
78(3), 608-644.
Lee, J. (2006). Tracking achievement gaps and assessing the impact of NCLB on the
gaps: An in-depth look into national and state reading and math outcome trends. The Civil
Rights Project at Harvard University. President and Fellows of Harvard College.
Lee, J. (2002). Racial and ethnic achievement gap trends: Reversing the progress toward
equity? Educational Researcher, 31(1), 3-12.
Linn, R. L. (2008). Educational accountability systems. In K. Ryan and L. Shepard
(Eds.), The future of test-based educational accountability (pp. 3-24). New York, NY:
Routledge.
Linn, R. L. (2005). Conflicting demands of No Child Left Behind and state systems:
Mixed messages about school performance. Education Policy Analysis Archives, 13(33), 1-20.
Linn, R. L., & Haug, C. (2002). Stability of school building accountability scores and
gains. CSE Technical Report 561. National Center for Research on Evaluation, Standards, and
Student Testing. University of California, Los Angeles.
McNeil, M. (2012). States punch reset button with NCLB waivers. Education Week.
Retrieved from
http://www.edweek.org/ew/articles/2012/10/17/08waiver_ep.h32.html?tkn=NSLFJ%2BWQnkqPlIMGUAUBakJda6JiHNTaJZDt&intc=es.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from
persons’ responses and performances as scientific inquiry into score meaning. American
Psychologist, 50(9), 741-749.
Miller, D. M. (2008). Data for school improvement and educational accountability:
reliability and validity in practice. In K. Ryan & L. Shepard (Eds.), The future of test-based
educational accountability (pp. 249-262). New York: Routledge.
Mintrop, H., & Sunderman, G. L. (2009). Predictable failure of federal sanctions-driven
accountability for school improvement and why we may retain it anyway. Educational
Researcher, 38(5), 353-364.
Moller, J. (2008). School leadership in an age of accountability: Tensions between
managerial and professional accountability. Journal of Educational Change. Available on-line:
10.1007/s10833-008-9078-6.
National Center for Education Statistics. (2013). NAEP 2012: Trends in Academic
Progress, Reading 1971-2012, Math 1973-2012. US Department of Education. Retrieved from
http://nces.ed.gov/nationsreportcard/subject/publications/main2012/pdf/2013456.pdf
Neal, D. & Schanzenbach, D. W. (2010). Left behind by design: Proficiency counts and
test-based accountability. The Review of Economics and Statistics, 92(2), 263-283.
Nye, B., Konstantopoulos, S. & Hedges, L. V. (2004). How large are teacher effects?
Educational Evaluation and Policy Analysis, 26, 237-257.
OCEP & CERE. (2012). An examination of the Oklahoma State Department of
Education’s A-F report card. The Oklahoma Center for Education Policy, University of
Oklahoma, and The Center for Educational Research and Evaluation, Oklahoma State
University.
O’Day, J. A. (2002). Complexity, accountability, and school improvement. Harvard
Educational Review, 72(3), 293-329.
Oklahoma State Department of Education. (2012). Oklahoma school testing program,
Oklahoma core curriculum tests, Grades 3 to 8 assessments. Pearson.
Oklahoma State Department of Education. (2014). 2014 A-F Report Card Technical
Guide. Retrieved from:
http://ok.gov/sde/sites/ok.gov.sde/files/documents/files/AtoF_Report_Card_Technical_Guide_828-2014.pdf.
Quality Counts. (2014). District disruption and revival: School systems reshape to
compete and improve. Education Week. Retrieved from:
http://www.edweek.org/ew/toc/2014/01/09/index.html
Pearson, Inc. (2012). Technical Report of the Oklahoma School Testing Program,
Oklahoma Core Curriculum Tests, Grades 3 to 8 Assessments. Pearson, Inc.
Polikoff, M., McEachin, A., Wrabel, S., & Duque, M. (2014). The waive of the future?
School accountability in the waiver era. Educational Researcher. Retrieved from:
http://www-bcf.usc.edu/~polikoff/Waivers.pdf
Popham, J. (2007). The no-win accountability game. In C. Glickman (Ed.), Letters to the
next president: What we can do about the real crisis in public education (pp. 166-173). New
York: Teachers College Press.
Raudenbush, S. W. (2004). Schooling, statistics, and poverty: Can we measure
school improvement? Princeton, NJ: Educational Testing Service.
Reardon, S. F. (2011). The widening academic achievement gap between the rich and
poor: New evidence and possible explanations. In G. J. Duncan & R. J. Murnane (Eds.), Whither
opportunity? Rising inequality, schools, and children’s life chances (pp. 91-116). New York,
NY: Russell Sage Foundation.
Reeve, J. (2002). Self-determination theory applied to educational settings. In E. Deci
and R. Ryan (Eds.), Handbook of Self-Determination Research, (pp. 183-204). Rochester, NY:
University of Rochester Press.
Reeve, J., & Halusic, M. (2009). How K-12 teachers can put self-determination theory
principles into practice. Theory and Research in Education, 7(2), 145-154.
Reeve, J., & Jang, H. (2006). What teachers say and do to support students’ autonomy
during a learning activity. Journal of Educational Psychology, 98(1), 209-218.
Rothstein, R. (2009). Getting accountability right. Education Week.
Retrieved from
http://www.csun.edu/~krowlands/Content/SED610/reform/Getting%20Accountability%20Right.pdf.
Rothstein, R., Jacobson, R., & Wilder, T. (2008). Grading education: Getting
accountability right. New York, NY: Teachers College Press.
Ryan, R. M., & Deci, E. L. (2012). Overview of self-determination theory: An
organismic dialectical perspective. In R. Ryan (Ed.), The Oxford Handbook of Human
Motivation (pp. 3-33). Oxford: Oxford University Press.
Ryan, R. M., & Deci, E. L. (2002). Overview of self-determination theory: An organismic
dialectical perspective. In E. Deci and R. Ryan (Eds.), Handbook of Self-Determination Theory
Research, (pp. 3-36). Rochester, NY: University of Rochester Press.
Ryan, R. M., & Weinstein, N. (2009). Undermining quality teaching and learning: A
self-determination theory perspective on high-stakes testing. Theory and Research in Education, 7(2),
224-233.
Sahlberg, P. (2008). Rethinking accountability in a knowledge society. Journal of
Educational Change. Published on-line: 10.1007/s10833-008-9098-2.
Schlechty, P. C. (2010). Leading for Learning: How to transform schools into
learning organizations. San Francisco, CA: Wiley.
Schwartz, H. L., Hamilton, L. S., Stecher, B. M., & Steele, J. L. (2011). Expanded
Measures of School Performance. Technical Report: Rand Corporation.
Sirotnik, K. A. (2005). Holding accountability accountable: What ought to matter in
public education. New York: Teachers College Press.
Sirotnik, K. A. (2002). Promoting Responsible Accountability in Schools and
Education. The Phi Delta Kappan, 83(9), 662-673.
Sunderman, G. L., & Kim, J. S. (2005). Measuring academic proficiency under the No
Child Left Behind Act: Implications for educational equity. Educational Researcher, 34(8), 3-13.
US Department of Education. (2012). ESEA Flexibility – Request. Retrieved from:
http://www2.ed.gov/policy/elsec/guid/esea-flexibility/index.html
Ushomirsky, N., Williams, D., & Hall, D. (2014). Making sure all children matter: Getting
school accountability signals right. Washington, DC: The Education Trust.
Whitford, B. L., & Jones, J. (2000). Accountability, Assessment, and Teacher
Commitment: Lessons from Kentucky’s Reform Efforts. New York: State University of New
York.
Williams, G. C. (2002). Improving patients’ health through supporting the autonomy of
patients and providers. In E. Deci and R. Ryan (Eds.), Handbook of Self-Determination Theory
Research, (pp. 233-254). Rochester, NY: University of Rochester Press.
National Center for Education Statistics (2013). The Nation's Report Card: Trends in
Academic Progress 2012 (NCES 2013–456). National Center for Education Statistics, Institute
of Education Sciences, U.S. Department of Education, Washington, D.C.
i
We use quotation marks when referring to letter grades associated with a school’s ranking. No quotation marks
are used for a general reference to letter grades.