Good teachers matter; as in every other profession, some are better than others. Researchers have even found that the very best teachers can help students overcome many of the effects of poverty and catch up to or surpass their more privileged peers.
That's why there is intense interest now in finding better ways to judge the relative effectiveness of teachers. But how should that be done? Most teacher evaluations not only fail to single out successful teachers; they also don't help principals determine which teachers need help to improve and which ones are failing their students altogether. Instead, all teachers end up being judged the same, which is to say, satisfactory.
"It's universally acknowledged: teacher evaluations are broken," said Timothy Daly, president of The New Teacher Project, a group that helps school districts recruit and train teachers.
Perhaps surprisingly, teacher-union leaders agree. Michael Mulgrew, president of New York City's United Federation of Teachers (UFT), said last spring that "the current evaluation system doesn't work for teachers; it's too subjective, lacks specific criteria and is too dependent on the whims and prejudices of principals."
So, it would seem that a system using student test scores to calculate how much "value" teachers add to their students' learning would be fairer. Indeed, Mulgrew endorsed New York state's new evaluation system, in which student achievement counts for 40 percent of a teacher's rating.
"Value-added" measurements use complex statistical models to project a student's future gains based on his or her past performance, taking into account how similar students perform. The idea is that good teachers add value by helping students progress further than expected, and bad teachers subtract value by slowing their students down.
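To make the idea concrete, here is a minimal sketch in Python. It is not Sanders's actual model or any district's formula: it assumes a student's expected year-end score comes from a simple linear projection of the prior-year score, and it treats a teacher's value added as the average gap between actual and expected scores. All numbers below are made up for illustration.

```python
# Toy illustration of the value-added idea (not any district's real formula).
import numpy as np

# Hypothetical prior-year and current-year test scores for one class
prior = np.array([61.0, 70.0, 55.0, 80.0, 66.0])
actual = np.array([68.0, 74.0, 60.0, 83.0, 73.0])

# Pretend these coefficients came from fitting prior -> current scores
# across many similar students district-wide (made-up numbers)
slope, intercept = 0.95, 8.0
expected = slope * prior + intercept

# Positive residuals mean the class beat its projection;
# the average residual stands in for the teacher's "value added"
value_added = np.mean(actual - expected)
print(f"Estimated value added: {value_added:+.1f} points")
```

Real value-added models are far more elaborate, layering in multiple years of scores and, in some versions, student and classroom characteristics, but the core logic is this comparison of actual growth against a statistical projection.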
Using value-added models to calculate teacher effectiveness wasn't possible on a wide scale until recently. In the 1990s, William L. Sanders, a statistician at the University of Tennessee, pioneered the technique with student test scores and managed to persuade the Hamilton County School Board to work with him in taking a closer look at the results.
The method, as Sanders puts it, is like measuring a child's height on a wall. It tracks a child's academic growth over the year, no matter how far ahead or behind the child was initially. Sanders discovered that teacher quality varied greatly in every school, and he and others also found that students assigned to good teachers for three consecutive years tended to make great strides, while those assigned to three poor ones in a row usually fell way behind.
Why Value-Added Is Hot Now
Hundreds of districts, including Chicago, Denver, New York City and Washington, D.C., are using such methods as a way to strengthen their teacher evaluations by factoring in student performance. The biggest push for the use of such methods has come from the Obama administration, which insisted that states competing for grants under its $4.3 billion Race to the Top program find ways to link student performance to teacher evaluations. The dozen winners of those grants are now struggling to figure out how to do just that.
At the same time, however, value-added modeling is the focus of furious debate among scholars, policymakers, superintendents, education advocates and journalists. The latest flare-up is occurring this week in New York City. The New York Times, Wall Street Journal, New York Daily News and others are seeking the value-added rankings of about 12,000 teachers in grades 4 through 8 whose students took state English and math tests.
The New York City Department of Education says it's willing to make those scores public. But the UFT is suing to block their release. Value-added ratings, the union says in its lawsuit, are "unreliable, often incorrect, subjective analyses dressed up as scientific fact." The union calls the calculations a "complex and largely subjective guessing game."
In August, the Los Angeles Times was the subject of intense criticism and praise for its series that included value-added scores for individual teachers based on years of standardized test data, a project that newspapers in New York City now want to replicate. (Disclosure: The Los Angeles Times data analysis was supported in part by a grant from The Hechinger Report.)
The documentary "Waiting for 'Superman'," directed by Davis Guggenheim, has also thrust the teacher-evaluation issue into the national spotlight, highlighting the historical disconnect between teacher job security and student performance.
Limitations of Value-Added
Value-added models aren't perfect, as even their most ardent supporters concede. Oft-cited shortcomings range from doubts about fairness to broader concerns centered on teaching goals. "When people talk about their experience with a really good teacher, they're not talking about test scores," said Aaron Pallas, professor of sociology and education at Teachers College, Columbia University. "They're talking about a teacher who gave them self-confidence, the ability to learn, an interest and curiosity about certain subjects."
Critics point out that value-added data are only as good as the standardized tests, and test quality varies greatly from state to state. There are also many ways to calculate value-added scores, and different statistical techniques yield different results. The calculations may take into account factors that can affect achievement, such as class size, a school's funding level and student demographics. Whether to include the race and poverty status of students when measuring teachers is particularly contentious, writes Douglas N. Harris, an economist at the University of Wisconsin, Madison, in a report on value-added models released this week.
Whatever the computational method, a teacher's score can vary significantly from one year to the next, and those swings could affect a teacher's reputation and salary in places that are considering linking teacher pay to performance. And while value-added models may do a decent job of highlighting the best and worst performers, they're widely considered unreliable in differentiating the good from the mediocre (or the mediocre from the terrible).
For this reason, many want value-added calculations to be used only in assessing schools and curricula, not individual teachers. But more and more, value-added data are playing a role in personnel decisions about bonuses, tenure and dismissal. In July, D.C. Schools Chancellor Michelle Rhee made waves by firing 165 teachers for poor evaluations, half of which depended on value-added data. With Adrian Fenty's loss in the Democratic primary for mayor of D.C. last month, Rhee has announced her resignation, but her teacher-evaluation system, IMPACT, will remain in place.
What Lies Ahead
Even those who champion value-added measures caution against using them as the sole means of evaluating teachers. Kate Walsh, president of the National Council on Teacher Quality, a research and advocacy group in Washington, D.C., has called value-added the best teacher-evaluation method so far. But she also says it would be a "huge mistake" to rely on it alone, or even primarily.
Randi Weingarten, president of the American Federation of Teachers, does not oppose the use of value-added data but wants to ensure evaluations are based on "classroom observations, self-evaluations, portfolios, appraisal of lesson plans, students' written work" as well.
The best uses of value-added data may well lie in the future. If educators could use the data to figure out what the most effective teachers are doing right and share that with colleagues, it would be a great boon. But while major foundation money is being spent to try to do just that, it is very difficult, especially given that great teachers often don't themselves know what they're doing right.
Whatever the future uses of value-added measures, the idea of holding teachers accountable for student performance seems here to stay.
"It's a valuable part of the conversation," said Daly, of The New Teacher Project. "It puts what matters most, student achievement, front and center as the most important responsibility for a teacher."