Only three of 24 popular school reform models have strong evidence that they improve student achievement, according to a report released last week that provides the most comprehensive rating of such programs by an independent research group.
Direct Instruction, High Schools That Work, and Success for All received the best marks from “An Educators’ Guide to Schoolwide Reform,” which was released at a press conference in Washington.
The 141-page guide from the Washington-based American Institutes for Research was commissioned by five leading education groups.
The consumer-oriented guide rates 24 whole-school reform models according to whether they improve achievement in such measurable ways as higher test scores and attendance rates. It also evaluates the assistance provided by the developers to schools that adopt their strategies, and compares the first-year costs of such programs.
“We wanted to have a document that really, critically evaluated the evidence base underpinning these programs,” said Marcella R. Dianda, a senior program associate at the National Education Association, which helped underwrite the $90,000 study. “We felt that our members really wanted that. They wanted us to get to the bottom line.”
The study comes as districts around the country seek proven, reliable solutions to the problem of low-performing schools. But as they spend greater amounts of tax dollars on the various reform models, questions remain about how well the programs work. Experts say that research such as the AIR report is needed to fill the gaps.
About 8,300 schools nationwide were using one of the 24 designs rated in the study as of Oct. 30, the report says. Congress gave a major impetus to such “whole school” reforms in 1997, when it authorized nearly $150 million in federal grants for low-performing schools to adopt “research-based, schoolwide” efforts. (“Who’s In, Who’s Out,” Jan. 20, 1999.)
Yet, according to the report, “most of the prose describing these approaches remains uncomfortably silent about their effectiveness.” That leaves schools in the tough position of deciding which model to choose with little evidence to go on.
“Before this guide came along, about the only way educators could judge the worth of some of these programs was by the quality of the developers’ advertising and the firmness of their handshakes,” said Paul D. Houston, the executive director of the American Association of School Administrators. “Now, superintendents, principals, and classroom teachers can sit down together and make reasonable decisions about which are best for their district’s needs.”
The study was sponsored by the NEA, the AASA, the American Federation of Teachers, the National Association of Elementary School Principals, and the National Association of Secondary School Principals.
Ratings Questioned
While the report is a big step forward in helping schools sort out the value of such programs, it also underscores how hard it is to judge effectiveness in education.
Last week, several of the organizations behind reform models evaluated in the report contested its ratings. In particular, developers questioned how AIR decided which studies to include as evidence of a program鈥檚 effectiveness. Several developers maintained that they have more evidence of positive results than AIR gave them credit for.
Henry M. Levin, a Stanford University economist and scholar whose Accelerated Schools program received only a “marginal” rating, described the study as “fairly amateurish.”
“Basically, they discounted anything, as far as I can tell, that comes in and changes test scores over time for a particular school,” Mr. Levin said. “And anything that said it had a comparison group was given a gold standard.”
The guide reviews all 17 whole-school models that were originally identified in the 1997 federal legislation that created the $150 million Comprehensive School Reform Demonstration Program. It also rates seven other prominent or widely used programs that schools could potentially adopt when seeking Obey-Porter grants, as the federal program is commonly known.
The evaluators used a two-step process to rate whether the programs had evidence that they raised student achievement.
First, AIR gathered almost any document about a program that reported student outcomes, including articles in scholarly journals, unpublished case studies and reports, and changes in raw test scores reported by the developers. “We tried to cast a really wide net in collecting the research,” said Rebecca Herman, the project director.
More than 130 studies were then reviewed and rated for their methodological rigor in 10 categories, based on such criteria as the quality and objectivity of the measurement instruments used, the period of time over which the data were collected, the use of comparison or control groups, and the number of students and schools included. Each study was assigned a final methodology rating by averaging across the 10 categories.
Only studies that met AIR鈥檚 criteria for rigor were used to rate whether a program was effective in raising student achievement.
For example, a number of developers submitted changes in state or local test scores as evidence that their programs were working. But “we didn’t really consider test scores alone, without some sort of context,” Ms. Herman said, “because there are a lot of things that can explain changes in test scores.”
Leaping to Conclusions?
The study gave a “strong” rating to the programs with the most conclusive research backing, namely four or more methodologically rigorous studies that found improved achievement; in at least three of those studies, the gains had to be statistically significant. A “promising” rating went to models with three or more rigorous studies that showed some evidence of success.
Reform models that earned a “marginal” rating had fewer rigorous studies with positive findings, or a higher proportion of studies showing negative or no effects. A “mixed or weak” label was assigned to programs whose study findings were ambiguous or negative. And AIR gave a “no research” rating to programs for which there were no methodologically rigorous studies.
Eight of the programs received the “no research” rating. Ms. Herman said that was not surprising, given the newness of many of the models.
“It takes a good three years to implement a reform model across a school, and another two years to come up with a decent study,” she said. “What we’re looking at is the first wave of research, and we’re hoping for an ocean to follow it.”
Janith Jordan, the vice president of Audrey Cohen College in New York City, whose design received a “no research” rating, said that “because of the fact that we are a younger design team, to leap to a conclusion about our potential or our effectiveness really is premature.”
More Research Needed
More than anything, experts said last week, the study underscores the need for strong, third-party evaluations of schoolwide reform models. Several other such evaluations are complete or in the works.
“The fact is that the capacity to do this kind of research is very limited in this country,” said Marc S. Tucker, a founder of America’s Choice, one of the 24 models reviewed. “I believe that it’s very important for the federal government to put a fair amount of money on the table to make this kind of research possible.”
Ellen Condliffe Lagemann, the president of the National Academy of Education, a group of education researchers and scholars, agreed. “It’s amazing how little evaluation there is,” she said. “Since the early 20th century, the people who have peddled the educational reform strategies that we all hear about tend to be successful because they’re the best entrepreneurs. It doesn’t necessarily have to do with any research credibility.”
AIR rated the support that developers provide to schools based on the variety of help available; the frequency of on-site technical assistance; the number of years the support is given; and the tools schools receive to help monitor their own implementation.
To prepare the tables and a profile for each program, AIR interviewed the developers, gathered and reviewed all available studies, and collected additional information from schools that used the approach.