President Obama recently announced the broad outlines of a new education plan. (“Rigor, Rewards, Quality: Obama’s Education Aims,” March 18, and “Obama Echoes Bush on Education Ideas,” April 8, 2009.) The plan has much to praise, but it also has three critically important omissions.
The new approach, like the No Child Left Behind Act and many of its precursors, relies on holding educators accountable for student performance on achievement tests. Indeed, it would make this element of education policy even more important—for example, by encouraging pay-for-performance plans. But the effects of the previous accountability programs have been disappointing: relatively small improvements on trustworthy indicators of performance, and many serious side effects. Why should we expect more of the president’s proposed variation on this familiar theme?
To avoid replicating past mistakes, the president’s accountability program will have to follow three principles:
Make it broad. An assumption underlying the No Child Left Behind law was that if we initially focused accountability on just a couple of the critically important goals of education—math and reading achievement—the rest would hold steady and wait for us to turn to them later. Abundant evidence shows this assumption to be wrong. The activities and outcomes that do not count for accountability deteriorate, sometimes seriously, as schools shift resources from them to those few things that do count.
Many areas can suffer, including untested subjects, untested aspects of the subjects used in the accountability system, performing arts, student-initiated work, and physical activity. This should be no surprise. It is just common sense, and the same problem has been found in many other fields and in private firms as well as the public sector.
Confront score inflation. Commenting on the low performance standards currently set by some states, U.S. Secretary of Education Arne Duncan was quoted as saying: “States are lying to children. They are lying to parents. They’re ignoring failure.” Indeed. But there is another reason the public has been misled: bogus increases in scores. Many schools have responded to test-based accountability in ways that inflate scores, raising them more than actual gains in achievement warrant. This does not require cheating. Scores can be inflated by many honest, although undesirable, forms of test preparation.
While not ubiquitous, score inflation is common and sometimes very large, and it is likely to hurt the most disadvantaged students the most. This too should not be surprising, because similar problems have been found in many other fields. Yet policymakers continue to ignore this inconvenient fact and use inflated scores to support exaggerated claims of success.
If we fail to confront the problem of score inflation, we will be left, once again, with an illusion of effective accountability. Dealing with inflation effectively will require numerous steps. We will need frequent auditing of score gains to ensure that improvements on the tests used for accountability are trustworthy indicators of improved learning. We have had a small number of audits over the past two decades, but these have been the exception rather than the rule because neither federal nor state reform programs have established an expectation, let alone a requirement, that this type of evaluation be conducted.
We need to monitor how educators prepare students for the accountability tests: whether they improve their instruction or resort to inappropriate forms of test preparation. We also need to evaluate new approaches to test design intended to reduce inflation. The Obama administration’s goal of developing tests that focus more on higher-order skills is laudable, but it will not by itself address the problem of score inflation.
Experiment and evaluate. This nation has tried numerous approaches to test-based accountability over the past several decades, but all of them have shared one essential trait: None has been based on sufficient evidence. They have been designed without enough hard information about their likely effectiveness and side effects. Once implemented, they have not been adequately evaluated. Scores on the accountability tests usually increase, and for a time we are greeted with claims of success. Eventually, less-encouraging data catch up with us—for example, scores on other tests less vulnerable to inflation, such as the National Assessment of Educational Progress and international comparative studies. A crisis is declared, we make up a new accountability system, and the cycle begins anew.
This failure to rely on hard evidence hinders the improvement of policy and schooling, and the failure to monitor effects on children is unacceptable. We do not tolerate this in other policy areas; think of Vioxx, the painkiller withdrawn from the market after later evidence revealed serious cardiovascular risks. The administration should avoid the temptation to say, once again, that we have it right this time. We should admit that our ideas for a better educational accountability system, however thoughtful, are partly unproven, need evaluation, and may require midcourse corrections.
The upcoming reauthorization of the No Child Left Behind law should institute routine and rigorous evaluation of the program’s effects—evaluations that do not rely on potentially inflated test scores. It should also encourage states and large districts to experiment with innovative approaches to accountability, but with a price: evaluations that will tell the rest of us whether their systems should be terminated, modified, or emulated.
There is room to argue about how best to address these three principles. But a failure to address them will give us more of the same: a narrowed educational system, bogus claims of success, and children left behind.