The What Works Clearinghouse: Time for a Fresh Start (Opinion)

Robert E. Slavin

Robert E. Slavin is a co-director of the Center for Research on the Education of Students Placed at Risk, at Johns Hopkins University, in Baltimore, and the director of the Institute for Effective Education at the University of York, in the United Kingdom. He is also the chairman of the Success for All Foundation, a private nonprofit company that provides a widely used K-12 reading program. The views expressed here are solely his own.

The What Works Clearinghouse is the flagship initiative of the federal Institute of Education Sciences. Begun in 2002, it was intended, according to its Web site, to "help the education community locate and recognize credible and reliable evidence to make informed decisions," and to provide educators with a "central and trusted source of scientific evidence of what works in education." Scientifically valid and clearly written reviews of research on practical programs are essential to evidence-based reform, so the clearinghouse has been eagerly awaited by all who believe that educational practice should emphasize programs with strong evidence of effectiveness.

After five years and more than $30 million, the clearinghouse has finally begun to produce significant numbers of reports on the evidence base supporting various educational programs. But the reports make it clear that the clearinghouse has failed. Its arcane and poorly justified procedures have produced information that is neither scientifically sound nor useful to educators.

Recently, the What Works contract was awarded to a new contractor, Mathematica Policy Research Inc. This transfer provides an opportunity for a fresh start, one that is needed if the clearinghouse is to accomplish its worthy goals.

What is wrong with the What Works Clearinghouse is that although its rules appropriately emphasize random assignment, they ignore design elements with far more potential for bias than lack of random assignment. As a result, the clearinghouse gives its highest ratings for evidence of positive effects to programs supported by studies that are often very small, very brief, very biased, and/or very seriously flawed in other ways, failing to give educators valid or meaningful information on the programs they might use to improve their students' achievement.

One example is in the middle school mathematics topic area. The clearinghouse gave its top rating, "positive effects," to only one program, Saxon Math. Two randomized and four matched studies met clearinghouse standards.

To get into the top category, a program must have significant effects in at least one randomized study and one other study. The unpublished randomized study that qualified Saxon Math for the "positive effects" rating involved 46 students taught by one teacher in one high school. The only outcome measure was devised by the author and was closely aligned with the Saxon Math curriculum (but not with the curriculum used in the control group). The other small randomized study found no differences, and two of the four studies that used conventional math measures not keyed to the Saxon Math curriculum found effects favoring the control group. The median effect size across the four studies that used conventional measures was only +0.06. (An effect size is the proportion of a standard deviation separating the experimental and control groups; most researchers consider an effect size of +0.20 or larger to be educationally meaningful.)
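To make the arithmetic behind these figures concrete, here is a minimal Python sketch. It assumes the effect size is the difference between the experimental and control means divided by a single standard deviation; the article does not say whether that is a pooled or a control-group standard deviation, so the caller supplies it. The +0.20 threshold is the one cited above. The function names are hypothetical and illustrative, not the clearinghouse's own computation.

```python
from statistics import median

# Threshold cited in the article as the level most researchers treat as
# educationally meaningful.
MEANINGFUL_THRESHOLD = 0.20


def effect_size(mean_experimental: float, mean_control: float, sd: float) -> float:
    """Difference between group means expressed in standard-deviation units.

    Assumption: `sd` is whichever standard deviation the analyst chooses
    (for example pooled or control-group); the article does not specify.
    """
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (mean_experimental - mean_control) / sd


def median_effect_size(effect_sizes: list[float]) -> float:
    """Median effect size across several studies."""
    return median(effect_sizes)


def is_educationally_meaningful(es: float, threshold: float = MEANINGFUL_THRESHOLD) -> bool:
    """True if the effect size reaches the conventional +0.20 benchmark."""
    return es >= threshold
```

For example, `is_educationally_meaningful(0.06)` returns `False`: the reported median of +0.06 is well below the +0.20 mark.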

Another egregious example relates to a program called DaisyQuest, computer software designed to teach phonemic awareness in grades K-1, listed in the clearinghouse as having "positive effects" on "alphabetics." The DaisyQuest studies involved about five hours of computer instruction. Sample sizes were extremely small: 49 in one study, 27 in another, 69 in a third. Worse, outcome measures included activities taken from the DaisyQuest program (which experimental students had practiced and control students had never seen). In fact, control students were not being taught phonemic awareness at all. In one of the studies, ignored in the ratings, a comparison treatment was used in which a teacher taught phonemic awareness to a group of children, and those children scored far better on the Phonological Awareness Test than those who experienced DaisyQuest (effect size = -0.44).

Studies like those of DaisyQuest are the rule, not the exception, among programs rated "positive" in the clearinghouse's beginning-reading topic report. Other programs rated "positive" included Kaplan SpellRead, a tutoring program validated as having "positive effects" in alphabetics by an eight-week study in a single school in Newfoundland involving 47 children. Another tutoring program, Stepping Stones to Literacy, was rated "positive" for alphabetics based on a five-week study involving just 36 children, in which tutoring was delivered by project staff members.

A four-week study of an ill-defined intervention called Peer Tutoring and Response Groups evaluated a process-writing model in which 4th graders in the experimental group worked in small groups to plan, draft, edit, and finalize compositions. The small groups were composed of English-language learners and fully proficient English-speakers. On the final test used as the outcome measure, children were asked to write a composition. In the experimental group, children were allowed to help each other write their compositions, while children in the control group wrote by themselves. Not surprisingly, experimental ELL students wrote significantly more words in their compositions, with the help of their English-proficient groupmates. This outcome qualified Peer Tutoring and Response Groups for a "positive effects" rating for "English-language development" in the English-language-learning topic report.

The clearinghouse is unaccountably inconsistent from topic to topic. It requires that studies have a duration of at least a semester in math, but there is no duration requirement in reading or in programs for English-language learners. This means that if DaisyQuest, Stepping Stones to Literacy, SpellRead, or Peer Tutoring and Response Groups had been math programs, their key studies would have been excluded. The middle school math review considers only textbook programs, ignoring computer-assisted instruction and programs that focus on changing instructional processes (such as cooperative learning). Elementary math includes computer-assisted instruction, but not instructional-process programs. Other reviews of math programs have found that the instructional-process programs excluded by the clearinghouse have the strongest positive effects in the most rigorous evaluations, yet a reader of the What Works Clearinghouse would never know this.

The current clearinghouse will not pass muster among the scientific community, among educators, or among policymakers. Its Web site should be taken down immediately while new procedures are devised. The new procedures should refocus on giving educators unbiased information on the likely outcomes of programs available to them today to accomplish their most important goals, as outlined in the What Works Clearinghouse's mission statement. To this end, the procedures should place strict standards on outcome measures (to remove those biased toward the experimental group) and on duration (requiring at least 12 weeks of intervention), to ensure that highlighted studies are meaningful and fair. To give appropriate emphasis to large, unbiased studies, the clearinghouse should compromise on statistical adjustments for clustering and other statistical issues that do not introduce bias. Program ratings should not be strongly influenced by one or two small studies, but should emphasize programs evaluated with many students in many schools.

Some day, the What Works Clearinghouse could become a reliable, respected source of information regularly consulted by educators and policymakers as they make critical decisions for children. As it currently stands, however, the clearinghouse is counterproductive, communicating to educators, policymakers, and researchers that research to date has little to offer them in choosing proven or promising programs. Mathematica has an opportunity to make a fresh start with a new clearinghouse that truly represents the accumulated findings of high-quality research.

Educators and policymakers have been promised fair, meaningful, and useful information they can rely on to make wise decisions for children. With a fresh start, the What Works Clearinghouse can still fulfill this promise.