As policymakers think about the reauthorization of the No Child Left Behind Act, one of the key considerations is the measurement of school performance. Many people find the federal law’s original measures of school performance (“status” and “safe harbor”) to be unfair because they do not take into account students’ initial achievement levels. As a result, schools are judged in large part by that which is beyond their control: the amount of knowledge their students had when they entered the school. Consequently, schools are held to very different standards; some need only to produce modest learning gains, while others must produce unrealistically large gains.
In response to such criticism, the federal government initiated the growth-model pilot program in 2005, allowing states to use the progress of individual students over time in determining whether schools are making adequate yearly progress, or AYP, toward academic proficiency for all students. Many thought that the growth-model pilot would help recognize—or at least exempt from negative labeling—schools where students were making large achievement gains but not reaching proficiency because of their low initial achievement levels. Such low-status, high-growth schools fail to make AYP under the law’s original measures, even though they may be relatively more effective than other schools at raising achievement.
Seven of the nine states participating in the pilot (Alaska, Arizona, Arkansas, Florida, North Carolina, Ohio, and Tennessee; Delaware and Iowa are the two remaining) are using “projection models” that give schools credit for getting students “on track” to become proficient in the future, even if they are not currently proficient. Under NCLB’s traditional model, a status model, a school has to bring an initially low-performing 3rd grader up to proficiency by the end of the year to receive credit for her performance. Under the pilot’s projection model, the school could receive credit for this student, even if she failed the 4th or 5th grade exam, provided her learning gains were large enough that she appeared to be on track to become proficient by 6th grade.
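To make the mechanics concrete, here is a minimal sketch of the “on track” calculation just described. The scale scores, proficiency cutoffs, and the linear-trend rule are hypothetical illustrations, not any state’s actual model:

```python
# Minimal sketch of a projection model's "on track" test.
# All scores, cutoffs, and the linear-trend rule are hypothetical.

def on_track(scores_by_grade, cutoffs, target_grade):
    """Extrapolate a student's observed gains linearly and check whether
    the projected score reaches the proficiency cutoff at target_grade."""
    grades = sorted(scores_by_grade)
    first, last = grades[0], grades[-1]
    gain_per_grade = (scores_by_grade[last] - scores_by_grade[first]) / (last - first)
    projected = scores_by_grade[last] + gain_per_grade * (target_grade - last)
    return projected >= cutoffs[target_grade]

# A student below the grade-4 cutoff (1300 < 1400) who is gaining rapidly:
scores = {3: 1100, 4: 1300}            # hypothetical developmental-scale scores
cutoffs = {4: 1400, 5: 1500, 6: 1650}  # hypothetical proficiency cutoffs
print(on_track(scores, cutoffs, target_grade=6))  # True: 1300 + 200*2 = 1700 >= 1650
```

Under a pure status model, this student counts against the school in both 3rd and 4th grade; under the projection model, she counts in the school’s favor.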
In theory, the advantage of using projection models is that they give schools a few additional years to bring students up to proficiency. But, as reported in these pages, the pilot program’s growth models don’t appear to be making a big difference in the proportion of schools meeting annual goals under the federal law. (“Impact Is Slight for Early States Using ‘Growth,’” Dec. 19, 2007.) The reason for this has to do with the type of growth models being used.
In practice, projection models are extremely similar to NCLB’s original status measure. In schools where students enter with high achievement levels, the gains required to put them on track to proficiency are quite small; in schools where students enter with low achievement levels, the required gains may be unrealistically large. Consequently, under the federal growth-model program, schools are still held to different standards: some must produce large gains while others need only produce small gains. Both status and projection models require all students to reach a fixed proficiency target regardless of their initial achievement levels. It is because the status model and the pilot’s projection models are so similar that very few new schools are making AYP because of “growth” alone.
This is made worse by the fact that the projection models currently being used are often inaccurate. Florida’s is one example. Education Week reported this past December that the growth-model pilot there was having an impact: “About 14 percent of the schools that made AYP in Florida made it under the growth model but not the status model.” Unfortunately, the reason is not that students in Florida are making larger gains than students in the other pilot states, nor that Florida’s standards are more difficult. Rather, it is that Florida’s projection model is inaccurate. It assumes that students’ test scores will increase along a linear trend: if a student gains 200 points between 3rd and 4th grade, the model assumes he will gain another 200 points between 4th and 5th grade, and 200 more between 5th and 6th grade. But the state’s developmental scale is curvilinear, with students typically making significantly smaller gains as they progress through school. As a result, Florida’s projection model systematically identifies many students as on track to become proficient who in reality never will be. At their best, projection models mimic NCLB’s status model; at their worst, they allow additional schools to make AYP because of measurement error or model misspecification.
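A small numerical sketch, with made-up numbers, shows the problem: when real gains taper off but the model repeats the early gain every year, the projection lands well above where the student actually ends up:

```python
# Hypothetical numbers showing how a linear projection overshoots
# on a curvilinear scale where gains shrink in later grades.

grade4_score = 1300                          # observed after a 200-point gain in grade 4
linear_projection = grade4_score + 200 + 200 # model assumes +200 each year
actual_score = grade4_score + 120 + 80       # gains actually taper off
cutoff_grade6 = 1600                         # hypothetical grade-6 proficiency cutoff

print(linear_projection >= cutoff_grade6)  # True:  1700 -> labeled "on track"
print(actual_score >= cutoff_grade6)       # False: 1500 -> never becomes proficient
```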
While projection models are unlikely to have a significant impact on which schools are identified as making AYP, if federal policymakers decide to continue allowing them, they should at least require states to demonstrate the accuracy of their models at the student and school levels. States with longitudinal student-achievement data can do so by using their models to determine which students were on track to become proficient in past years and then comparing those projections with the observed outcomes (that is, whether the students actually became proficient). In my research, I have found that the models are not very accurate by any reasonable standard.
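As a rough sketch of such a back-test, a state could score its model’s past “on track” calls against what actually happened. The data layout and the choice of summary statistics here are illustrative assumptions:

```python
# Sketch of a back-test: score past "on track" calls against real outcomes.
# The toy data layout is hypothetical; a state would use its longitudinal records.

def accuracy(records):
    """records: (projected_on_track, became_proficient) pairs, one per student."""
    return sum(p == a for p, a in records) / len(records)

def false_positive_rate(records):
    """Share of 'on track' calls for students who never became proficient."""
    on_track = [(p, a) for p, a in records if p]
    return sum(not a for _, a in on_track) / len(on_track)

history = [(True, True), (True, False), (False, False), (True, False)]
print(f"accuracy: {accuracy(history):.0%}")                    # accuracy: 50%
print(f"false positives: {false_positive_rate(history):.0%}")  # false positives: 67%
```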
This raises the question: Are people simply wrong about growth models? Are there really very few low-status, high-growth schools? The answer is no. What most people have in mind when they think of growth models is a particular type that is not allowed under the federal pilot program: value-added models. These aren’t allowed because they do not require all students to become proficient. Unlike the growth-model pilot’s projection models, however, value-added models attempt to measure schools’ relative effectiveness by accounting for students’ initial achievement levels. Although many value-added models are extremely complex, the ones used to measure school performance can be loosely thought of as comparing the average gains of students in a school with the gains those same students would have been expected to make had they attended the “average” school. While value-added measures of school performance have their own problems, these are the models people typically think of when they envision an accountability system that includes a growth component. If value-added models were used, we would identify quite a few low-status, high-growth schools.
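Loosely following that description, a bare-bones version of the comparison might look like the following sketch, which uses the statewide average gain as a crude stand-in for the “average” school. Real value-added models are far more elaborate:

```python
# Bare-bones value-added comparison on hypothetical data. Real models
# adjust for prior achievement, demographics, and measurement error.

school_gains = {
    "School A": [210, 190, 230],   # a low-status, high-growth school
    "School B": [90, 110, 100],
}
all_gains = [g for gains in school_gains.values() for g in gains]
average_school = sum(all_gains) / len(all_gains)  # crude stand-in for the "average" school

for school, gains in school_gains.items():
    value_added = sum(gains) / len(gains) - average_school
    print(f"{school}: {value_added:+.0f} points vs. the average school")
```

On this toy data, School A shows large positive value-added even if none of its students has yet reached proficiency, which is exactly the kind of school the status and projection models fail to credit.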
Value-added models are not allowed under the growth-model pilot program because they do not adhere to the core principle of NCLB: bringing all students up to proficiency. But they do represent the fairest (albeit imperfect) way to compare schools’ effectiveness. The dilemma over which measure of school performance to use highlights an inherent tension in designing a school accountability system, between comparing schools’ relative effectiveness (value-added models) and holding them accountable for bringing all students up to high achievement levels (status or projection models). Some people thought that the pilot program’s projection models were a happy middle ground. Unfortunately, projection models do not resolve the essential tension between status and growth. They are just the same old status-model wine in a new bottle.