The day we can accurately measure a teacher's performance has finally arrived. Or so the likes of District of Columbia Schools Chancellor Michelle A. Rhee and New York City Mayor Michael R. Bloomberg would have us believe.
In a speech this past fall in Washington, but directed at the New York state legislature, Bloomberg extolled "data-driven systems" while arguing that student test scores should be linked to teacher-tenure decisions. His preferred analogy was to medicine: To prohibit the use of student test-score data in such decisions, Bloomberg explained, would be as insane and inane as "saying to hospitals, 'You can evaluate heart surgeons on any criteria you want, just not patient-survival rates.'"
John Merrow, the education correspondent for "PBS NewsHour," favors instead the analogy of the swimming instructor: If half the class nearly drowns when trying to demonstrate what they've learned, we'd be downright daft not to find fault with the teacher.
The logic behind both analogies is seductive. If someone's job is to teach you something and yet you don't learn it, or you aren't able to demonstrate you've learned it, then isn't the only reasonable conclusion that the teacher has failed you?
People like Rhee, Bloomberg, and Merrow are so certain of their positions, and so wildly confident in the data, that another perspective seems all but impossible. The clear implication is that to disagree with them, you'd have to be mentally ill, hopelessly naive, or wholly heartless.
But as seductive and seemingly straightforward as their logic appears, it also turns out to be deeply flawed.
Merrow's analogy, for instance, ignores a very important reality: A 10-year-old who's ostensibly been taught to swim has much greater motivation to successfully display his newfound knowledge in the pool than do students who've been taught fractions in math class. No math student has ever drowned because he couldn't multiply one-half by one-third.
Rhee, Bloomberg, Merrow, and the many others now beating the fashionable drum of "data-driven" accountability in education, right on up to U.S. Secretary of Education Arne Duncan and President Barack Obama, seem determined to ignore some basic truths about both education and statistical analysis.
First, when a student fails to flourish, the fault rarely lies with any single party. Rather, it tends to be a confluence of confounding factors, often involving parents, teachers, administrators, politicians, neighborhoods, and even the student himself. If we could collect data that allowed us to parse out these influences accurately, then we might be able to hold not just teachers but all parties responsible. At present, however, we are light-years away from even understanding how to collect such data.
Second, learning is not always, or easily, captured by high-stakes tests. A student's performance on a given day reflects a whole lot more than what his teacher has or hasn't taught him.
When it comes to school accountability, today's favorite catchphrase is "value-added" assessment. The idea is that by measuring what students know at both the beginning and the end of the school year, and by simply subtracting the former from the latter, we're able to determine precisely how much "value" a given teacher has "added" to his or her students' education. Then we can make informed decisions about tenure and teacher compensation. After all, why shouldn't teachers whose students learn more than most be better compensated than their colleagues? Why shouldn't teachers whose students learn little be fired?
The short answer to both questions is that our current data systems are a complete mess. We tend to collect the wrong kinds of data, partly to save money and partly because we're not all that good at statistical analysis.
The accountability measures of the federal No Child Left Behind Act, for instance, are based on cross-sectional rather than longitudinal data. In layman's terms, this means that we end up comparing how one set of 7th graders performs in a given year with how a different set of 7th graders performs the following year. Experts in data analysis agree that this is more than a little problematic. A better system, one based on longitudinal data, would instead compare how the same set of students performs year after year, thereby tracking change over time. But these are not the data we currently collect, in large part because doing so is difficult and expensive.
There's no denying that we love data. Indeed, we are enthralled by statistical analyses, even (or especially) when we don't understand them. Numbers impress. But they also tend to conceal more than they reveal.
Every educator knows that teaching is less like open-heart surgery than like conducting an orchestra, as the Stanford University professor Linda Darling-Hammond has suggested. "In the same way that conducting looks like hand-waving to the uninitiated," she says, "teaching looks simple from the perspective of students who see a person talking and listening, handing out papers, and giving assignments. Invisible in both of these performances are the many kinds of knowledge, unseen plans, and backstage moves (the skunkworks, if you will) that allow a teacher to purposefully move a group of students from one set of understandings and skills to quite another over the space of many months."
Until we get much better at capturing the nuances of such a performance, we should be wary of attempts to tie teacher tenure and compensation to student test scores.