More than a decade ago, policymakers made a multi-billion-dollar bet that strengthening teacher evaluation would lead to better teaching, which in turn would boost student achievement. But overall, those efforts failed, a new study finds. Nationally, teacher evaluation reforms over the past decade had no impact on student test scores or educational attainment.
The research is the latest indictment of a massive push between 2009 and 2017, spurred by federal incentives, philanthropic investments, and a nationwide drive for accountability in K-12 education, to implement high-stakes teacher evaluation systems in nearly every state.
Prior to the reforms, nearly all teachers received satisfactory ratings in their evaluations. So policymakers from both political parties introduced more-robust classroom observations and student-growth measures, including standardized test scores, into teachers’ ratings, and then linked the performance ratings to personnel decisions and compensation.
“There was a tremendous amount of time and billions of dollars invested in putting these systems into place, and they didn’t have the positive effects reformers were hoping for,” said Joshua Bleiberg, an author of the study and a postdoctoral research associate at the Annenberg Institute for School Reform at Brown University. “There’s not a null effect in every place where teacher evaluation [reform] happened. ... [But] on average, [the effect on student achievement] is pretty close to zero.”
The evaluation reforms were largely unpopular among teachers and their unions, who argued that incorporating certain metrics, like student test scores, was unfair and would drive good educators out of the profession. Yet proponents, including the Obama administration, argued that tougher evaluations could identify, and potentially weed out, the weakest teachers while elevating the strongest ones.
“We think the goal of great teaching is to have students learn; and to have student learning be a piece of teacher evaluation, I think, actually gives the profession the respect it deserves,” said Arne Duncan, who served as President Obama’s education secretary from 2009 to 2016, in an EdWeek interview in 2015.
But teachers said the focus on student growth measures stripped away the emphasis on building relationships with students.
“It took away the overall focus on the kid and the overall focus on teaching,” said Erin Scholes, an innovation coordinator at a Connecticut middle school who has been in the classroom for 15 years. “I felt like [the reforms] hit the science of teaching rather than the art of teaching and tried to fit everyone in the same box.”
Researchers found no positive effects on student outcomes
A team of researchers from Brown and Michigan State Universities and the Universities of Connecticut and North Carolina at Chapel Hill analyzed the timing of states’ adoption of the reforms alongside district-level student achievement data from 2009 to 2018 on standardized math and English/language arts test scores. They also analyzed the impact of the reforms on longer-term student outcomes, including high school graduation and college enrollment. The researchers controlled for the adoption of other teacher accountability measures and reform efforts taking place around the same time, and found that their results remained unchanged.
They found no evidence that, on average, the reforms had even a small positive effect on student achievement or educational attainment.
The study’s authors noted that the design and implementation of the reforms fell short of the recognized best practices for performance management systems. Under a program known as Race to the Top, the Obama administration offered states $4.35 billion in competitive grants for enacting certain policy changes, including incorporating student achievement data in their evaluation systems. The government also used a waiver system that would allow states to receive some regulatory relief from stringent federal requirements if they implemented more accountability measures for teachers.
But in practice, implementation proved difficult in most places, with most teachers still receiving satisfactory ratings under the new evaluation systems. Performance-based dismissals were still rare, and states that linked evaluation ratings to compensation often offered only small bonuses or set the bar so low that most teachers qualified.
Also, the reforms decreased job satisfaction among new teachers who felt like they had little autonomy to do their best work, the paper noted. And they added significant demands to administrators’ already burdensome workload.
“It was really the worst of all worlds,” said Michael Petrilli, the president of the Thomas B. Fordham Institute, a conservative education think tank that advocated for more teacher accountability. “It was just a big paperwork exercise. It led to a lot of anxiety and bad morale. Not only did it have no findings [of positive effects on student outcomes], it had real-world consequences that were almost entirely negative.”
Tougher teacher-evaluation systems can work, Petrilli said, but there was no political will to act on the results at the time of the reforms. Teachers’ unions resisted firing teachers who received poor results, and districts were unwilling or unable to pay great teachers more, he said.
Indeed, research from 2017 found that principals continued to rate nearly all teachers as effective, even though the principals would give harsher ratings in confidence with no stakes attached.
“We just don’t have a system in the country that’s well set up to push the rapid implementation of any education reform, including teacher evaluation,” Bleiberg said. “You see a lot of superficial adoption; that’s likely to lead to the null effects overall.”
Evaluation reform has already changed course
States overhauled their teacher-evaluation systems quickly, and then many reversed course within just a few years. A National Council on Teacher Quality analysis found that the number of states that required student-growth data in teacher evaluations went from 15 in 2009 to 43 in 2015, and then back down to 34 in 2019.
The changes were in part due to the increased flexibility states now have under the Every Student Succeeds Act, which stripped the U.S. secretary of education of the power to determine how states grade their teachers.
Also, other research into the outcomes of evaluation reform has produced similarly discouraging results. For example, a $575 million effort, funded in part by the Bill & Melinda Gates Foundation, to implement new teacher-evaluation systems in three large school districts was found to have been largely ineffective in increasing student achievement.
Experts say the results show the difficulties of implementing any large-scale reform, but in particular a top-down model that was forced onto districts and adopted without much buy-in from those on the ground. And some say the evaluation reforms were done without considering other constraints on the profession.
“Yes, most of our teachers could be better at their jobs, but it’s not because they’re not trying hard enough,” said Jack Schneider, an associate professor of leadership in education at the University of Massachusetts Lowell. “It’s because they teach too much, they have too many students in their classrooms, they don’t have relevant and sustained professional development opportunities, they don’t have adequate support from school leaders who themselves are overburdened in schools. There’s a lot we could do if we wanted to strengthen the teaching profession, but most of these reforms didn’t really address the fundamental barriers that keep teachers from being their best professional selves.”
The reforms were also demoralizing for teachers, said Rebecca Garelli, a science education consultant who taught for 14 years and left the classroom partly because of the increased focus on student test scores.
“To tie those test scores to my evaluation was something I innately struggled with from the beginning,” she said. “It never made sense to me to take something so human and turn it into something so non-human.”
Even so, there are bright spots in teacher-evaluation reform, many say, most notably in Washington, D.C. The district’s teacher-evaluation system, known as IMPACT, ties student test scores to teachers’ job security and paychecks. Under the system, teachers who receive “ineffective” scores are subject to dismissal, and teachers who score “minimally effective” or “developing” could face dismissal if they don’t improve. “Highly effective” teachers, however, are eligible for financial rewards and professional opportunities.
Research has found that lower-performing teachers in the District of Columbia school system are more likely to voluntarily leave than their higher-performing counterparts. When they leave, they are replaced by teachers with higher IMPACT scores, and student achievement increases. And when they do stick around, their performance tends to improve.
Other states and districts used similar evaluation systems, but there were some key differences, the study’s authors said.
The former D.C. school chancellor, Michelle Rhee, and the local teachers’ union had a long, bitter dispute about the details of evaluation reform, but both sides made concessions, Bleiberg said. (Even so, the teachers’ union says the evaluation system has created a culture of fear in the district. And a recent study found that white teachers on average received higher scores than their Black and Hispanic peers.)
In many places, governors didn’t work with teachers’ unions before implementing evaluation reforms, Bleiberg said: “It was a reform that was all about teachers and didn’t really end up getting them on board.”
‘We know it’s possible’ to achieve positive outcomes
Still, the results in Washington and in other cities show that high-stakes teacher-evaluation systems can work, said Kate Walsh, the president of the National Council on Teacher Quality, a Washington-based group in favor of measuring teacher effectiveness through objective data like test scores.
“We know it’s possible for teacher evaluation [reform], when well-implemented, to achieve great outcomes,” she said. “We know it’s theoretically possible, and we know it’s practically possible.”
But there’s little evidence to suggest a large number of school districts can meaningfully implement any sort of reform and get positive results, Walsh said, especially in a relatively short amount of time.
“I think people were serious about it for two years max; you’re not going to get good outcomes in a couple years,” she said. “You have to do it a while before you can reap the benefits.”
Also, teacher-evaluation systems cannot be changed in a vacuum, said Garrett Landry, the founder and CEO of Steady State Impact Strategies, a consulting firm working with school districts in Texas to reform the way they identify, and reward, effective teachers.
Teachers have to have the right conditions for success, he said, and improving teacher quality has to start with ensuring principal quality. Landry said districts should anchor their teacher-evaluation systems in growth and delineate clear targets for teachers to meet.
“We don’t really have time [to waste] in education. ... If we don’t get [students] on track early, it’s really hard to catch them up,” he said. “We really need the best and brightest educators, and too many systems can’t tell me who the best educators are. Everybody looks the same on paper.”
There’s currently little political appetite to try again with teacher-evaluation reform, Bleiberg said. That’s in part due to the pandemic, which has dampened teacher morale, but he also thinks policymakers will need to take time to generate more buy-in and address the fundamental challenges of implementation.
But Walsh said the issue will come up again, as part of the cyclical nature of school reform.
“It’s not acceptable to have an evaluation system where everyone gets the same rating,” she said. “Because we didn’t do it well [the last time] doesn’t mean it can’t be done well. We’ve just got to find a different way.”