A study by a public and labor economist suggests that “value added” methods for determining the effectiveness of classroom teachers are built on some shaky assumptions and may be misleading.
The study, due to be published in February in the Quarterly Journal of Economics, is the first of a handful of papers now in the publishing pipeline that are widely seen as contributing important evidence, both pro and con, on the use of value-added assessments. Such assessments gauge the effectiveness of schools and teachers by measuring the gains that their students make on standardized tests over the course of a school year.
A small number of states have used value-added measures since the early 1990s to track how well their schools are doing. But the method is coming in for new scrutiny amid growing interest from politicians in performance-pay plans for teachers.
Proponents say value-added measures provide a fair way of deciding which teachers deserve financial rewards by objectively measuring the learning gains students make from fall to spring, rather than students’ absolute achievement levels. By gauging that progress during the school year, proponents reason, teachers are not getting unjust blame for the learning deficits that students bring to their classes or undue rewards for being blessed with a class of high achievers.
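In skeletal form, that intuition can be sketched in a few lines of code. The Python snippet below uses made-up scores and hypothetical column names, not any state’s actual model, to show how a ranking based on fall-to-spring gains can differ from one based on absolute spring achievement.

```python
# A minimal sketch of the value-added intuition, NOT any study's actual model.
# The student records and column names below are hypothetical.
import pandas as pd

students = pd.DataFrame({
    "teacher": ["A", "A", "B", "B", "C", "C"],
    "fall":    [310, 290, 420, 440, 350, 360],   # fall test scores
    "spring":  [335, 318, 430, 452, 390, 398],   # spring test scores
})
students["gain"] = students["spring"] - students["fall"]

# Ranking by absolute achievement favors teacher B, who started with high
# achievers; ranking by average gain credits teacher C for the most growth.
print(students.groupby("teacher")["spring"].mean())
print(students.groupby("teacher")["gain"].mean())
```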
The Quarterly Journal of Economics study was conducted by Jesse Rothstein, who recently left Princeton University to become an associate professor of public policy at the University of California, Berkeley. Drawing on data for 99,000 North Carolina students who were in 5th grade during the 2000-01 school year, Mr. Rothstein makes his case by using a “falsification” test: What effect, he asked, do 5th grade teachers have on their students’ test scores in 3rd and 4th grades?
Because it is impossible for even the best teachers to influence their students’ previous learning, Mr. Rothstein reasons, there should be no effect at all.
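The logic of the test can be illustrated with a short simulation. The sketch below fabricates data in which students are sorted to teachers by ability, then regresses their prior-grade scores on indicators for their current teacher; it is a stand-in for the idea, not Mr. Rothstein’s models or the North Carolina records.

```python
# A rough sketch of the falsification logic, using simulated data.
# Variable names are illustrative only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 600
ability = rng.normal(size=n)
# Simulate nonrandom sorting: stronger students are steered toward teacher "T3".
teacher5 = np.where(ability > 0.5, "T3", rng.choice(["T1", "T2", "T3"], size=n))
df = pd.DataFrame({
    "teacher5": teacher5,
    # The 4th grade score depends on ability, never on next year's teacher.
    "score_grade4": ability + rng.normal(scale=0.5, size=n),
})

# Regress the PRIOR-grade score on CURRENT-teacher indicators. Under random
# assignment the teacher terms should explain nothing; with sorting, the
# joint test comes back significant even though no causal effect exists.
model = smf.ols("score_grade4 ~ C(teacher5)", data=df).fit()
print(f"joint p-value for teacher indicators: {model.f_pvalue:.4f}")
```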
Not Random?
But Mr. Rothstein’s analysis, using three different value-added models, found quite large effects—even though he was comparing teachers in the same years, grades, and schools.
To explain the findings, he suggested that students may well not have been randomly assigned to classrooms. Instead, they may have been sorted into classes based in some way on their prior achievement. A principal might, for example, assign students with behavior problems to teachers known to have a way with problem students or reward more senior teachers with high achievers.
“Anybody who’s had a kid in elementary school has tried to exert some influence over that kind of nonrandom assignment,” said Mr. Rothstein. Yet, he added, value-added calculations are based on the assumption that students’ classroom assignments are random, overlooking the day-to-day reality of what happens in schools.
Mr. Rothstein also looks at teachers’ long-run effects on individual students and finds that they tend to decay or fade out after the first year. He also finds wide variation in teachers’ effectiveness over time. For example, only about a third of the teachers who landed in the top quintile of the study sample based on their accumulated effects over two years fell in that same category based on their one-year effects. That suggests to Mr. Rothstein that some of those teachers were misclassified.
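A rough illustration of that stability check, using simulated teacher effects rather than the study’s data, might look like the following.

```python
# A hedged illustration of the quintile-stability check, with simulated
# teacher effects rather than the study's data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n_teachers = 500
true_effect = rng.normal(size=n_teachers)          # each teacher's "true" effect
year1 = true_effect + rng.normal(size=n_teachers)  # noisy one-year estimate
year2 = true_effect + rng.normal(size=n_teachers)
two_year = (year1 + year2) / 2                     # two-year accumulated estimate

q_one = pd.qcut(year1, 5, labels=False)    # quintile ranks, 0 (bottom) to 4 (top)
q_two = pd.qcut(two_year, 5, labels=False)

# How many teachers in the top quintile by the two-year measure stay in the
# top quintile when judged on a single year?
top_by_two_year = q_two == 4
share = (q_one[top_by_two_year] == 4).mean()
print(f"Top-quintile (two-year) teachers also top by one year: {share:.0%}")
```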
“There’s a tendency to assume these value-added scores may be noisy, but they are measures of teachers’ true effects,” Mr. Rothstein said. “We can’t make that leap as quickly as we’d like to.”
His study was posted online last month by Princeton’s Woodrow Wilson School of Public and International Affairs.
Cory R. Koedel, an assistant professor of economics at the University of Missouri in Columbia, said Mr. Rothstein’s paper is “huge in the literature, because he’s pointing out some things the literature had overlooked.”
In a paper that is being revised for publication in the journal Education Finance and Policy, Mr. Koedel and his research partner, Julian R. Betts of the University of California, San Diego, found that researchers can mitigate some of the effects of student-sorting bias by incorporating more years of data from teachers’ classes into value-added calculations.
To make their case, the researchers tried to replicate Mr. Rothstein’s techniques with testing data from four waves of 4th graders moving through San Diego public schools between the 1998-99 and 2001-02 school years. Although they found similar signs of sorting bias with one year of test scores, the size of that effect was reduced with more years of data.
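In stylized form, the point is that sorting which varies from cohort to cohort tends to average out. The sketch below uses fabricated numbers, and assumes sorting shocks that are independent across years, to show a multi-year average typically landing closer to a hypothetical teacher’s true effect than a single year’s estimate.

```python
# A stylized illustration of the multi-year point, with fabricated numbers.
# It assumes year-specific sorting shocks that are independent across cohorts;
# sorting that persisted every year would NOT average out this way.
import numpy as np

rng = np.random.default_rng(2)
true_effect = 0.10          # hypothetical teacher effect, in test-score SDs
n_years = 4

noise = rng.normal(scale=0.10, size=n_years)          # estimation noise
sorting_shock = rng.normal(scale=0.15, size=n_years)  # year-specific sorting
cohort_estimates = true_effect + noise + sorting_shock

# With more cohorts, the year-specific shocks tend to wash out.
print(f"one-year estimate: {cohort_estimates[0]:+.3f}")
print(f"four-year average: {cohort_estimates.mean():+.3f}")
print(f"true effect:       {true_effect:+.3f}")
```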
For that reason, the researchers conclude, policymakers may want to be cautious about using achievement gains from a first-year teacher to make important decisions about salaries or retention. Many districts make teacher-tenure decisions in fewer than three or four years, experts said.
But in practice, sorting biases may not be as problematic as some experts think, assert the authors of a third article in the publishing pipeline.
An Experiment in L.A.
For that article, researchers Thomas J. Kane and Douglas O. Staiger use data from Los Angeles public schools to conduct what is widely considered to be the first experimental test of value-added modeling. The paper is being revised for possible publication in a not-yet-named economics journal, according to Mr. Staiger, an economics professor at Dartmouth College in Hanover, N.H.
The study was part of a larger evaluation, conducted in Los Angeles elementary schools, of the National Board for Professional Teaching Standards, an Arlington, Va.-based group that confers special status on teachers who demonstrate that they can meet its criteria. Researchers used data from the study to determine whether value-added calculations could yield the same results on teachers’ effectiveness as an experiment in which students are randomly assigned to teachers.
To find out, the researchers recruited 78 principals of elementary schools in which one or more teachers had applied to be certified by the national board.
The principals identified experienced teachers in the same grades who were comparable to the NBPTS applicants and then made up rosters of classes that they would be comfortable assigning to either the comparison teacher or the NBPTS applicant. To further control for any possible student-assignment bias, district administrators later randomly switched the rosters and notified principals that the switch had occurred.
In most cases, the researchers found, the value-added calculations matched the experimental results: They accurately identified which of the teachers in each school seemed to do a better job of producing bigger learning gains.
“What our paper suggests is that controlling for students’ prior-year test scores is enough to eliminate the bias problem. Still, I think I want to see a lot more experiments like ours done in other settings,” said Mr. Staiger. “In our experiment, all you could learn about was two teachers in the same schools, where the schools were also willing to randomize them.”
The 78 principals in the experiment, Mr. Staiger noted, were the only ones who agreed to the random-assignment procedure from among 200 principals who were invited to take part.
The study also found, as Mr. Rothstein did, that teachers’ effects on their students’ performance faded from one year to the next, which may be the more important issue, according to Mr. Staiger.
“When calculating the potential value of shifting the teacher-effectiveness distribution, we and others have typically assumed that the effects of a strong teacher persist in the children they teach,” write Mr. Staiger and Mr. Kane, who is the faculty director of the Seattle-based Bill & Melinda Gates Foundation’s project on policy innovation.
“Critics will point to Jesse’s paper and say value-added doesn’t solve the [bias] problem,” said Douglas N. Harris, an economist at the University of Wisconsin-Madison. “Defenders will say, well, we’ve got this experiment. ... We’ve got Koedel and Betts saying the problem can be reduced. We still have the reliability problem, but it’s better than the alternatives.”
“Ultimately, it’s probably even the wrong debate,” Mr. Harris added. “We need to know how to use the measure in practice to improve school performance.”
Experts say a fourth experiment, set to get under way this fall, may help shed light on how to do that.
Researchers at Mathematica Policy Research Inc. hope to randomly assign teachers with high value-added scores to classes in disadvantaged, hard-to-staff schools in seven districts. The target teachers are being offered $20,000 bonuses to agree to move to the disadvantaged schools and stay for two years.
At random, principals will fill existing vacancies by either recruiting from the “high value-added” pool or using their normal hiring procedures. But, according to Steven Glazerman, who is leading the study for the Princeton, N.J.-based research group, the final results are still a few years away.