What students are expected to know in order to reach proficiency levels on exams in some states may be as much as four grade levels below the standards set in the states with the most rigorous assessments, according to a new report that uses international testing data to gauge states against a common measuring stick.
Released today, the report by the American Institutes for Research, a Washington-based research group, makes a case for states, as they collaborate on common standards, to use national and international benchmarking to make cutoff scores more demanding and to improve the descriptions of what it means for students to be proficient in reading and mathematics at each grade level.
The researchers used National Assessment of Educational Progress benchmarks to compare each state's standards against the benchmarks for the same subjects used in two international assessments, the Trends in International Mathematics and Science Study, or TIMSS, and the Progress in International Reading Literacy Study, or PIRLS, during 2007, the most recent year all three types of assessments were administered. Researchers then analyzed the percentage of students in each state who would meet minimum proficiency according to their state standards and the common international standards.
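To make that kind of comparison concrete, the minimal sketch below shows how the same set of student scores, once linked to a common scale, can yield very different proficiency rates depending on where the cut score is set. It is only an illustration of the general idea, not AIR's actual linking procedure, and every number in it (the score distribution and both cut scores) is hypothetical.

    # Illustrative sketch only; not AIR's linking method, and all values are hypothetical.
    # Assumes a state's scores have already been statistically linked onto a common
    # (NAEP/TIMSS-like) scale and that the linked scores are roughly normal.
    from statistics import NormalDist

    linked_scores = NormalDist(mu=500, sigma=100)  # hypothetical linked score distribution

    state_cut = 440          # hypothetical, relatively lenient state proficiency bar
    international_cut = 525  # hypothetical, more demanding internationally benchmarked bar

    def percent_proficient(dist, cut):
        """Share of students at or above the cut score, under a normal model."""
        return (1 - dist.cdf(cut)) * 100

    print(f"Proficient by state cut:         {percent_proficient(linked_scores, state_cut):.0f}%")
    print(f"Proficient by international cut: {percent_proficient(linked_scores, international_cut):.0f}%")

Under those assumed numbers, roughly 73 percent of students would clear the state bar but only about 40 percent would clear the internationally benchmarked one; differences of that kind are what the report tabulates state by state.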
Measured against the international benchmarks, the gaps between states were so great, the report notes, that the difference in actual proficiency between students in the states with the most and the least rigorous standards was double the national achievement gap between black and white students on the National Assessment of Educational Progress in 2007, which was then about two grade levels. At the 4th grade level, only Massachusetts had state standards more rigorous than the international standards, and its standards for 4th grade math were comparable to those required of a typical student in the highest-performing TIMSS countries, such as Japan, Taiwan, Singapore, and Hong Kong. For 8th grade math and 4th grade reading, only Massachusetts and South Carolina had standards comparable to those of the best-performing countries.
'Short Selling' Students
Gary W. Phillips, the AIR's vice president and chief scientist, who wrote the report, called state proficiency standards "the educational equivalent of short selling."
"Rather than betting on student success," he said in the report, "the educators sell the student short by lowering standards."
A comparison of 4th grade students scoring at the proficient level in math on 2007 state assessments vs. an internationally benchmarked common standard shows dramatic differences in what is considered proficient. Of all the states, only Massachusetts had more students perform at the proficient level on the international standard than on its own state standard.
SOURCE: American Institutes for Research
For comparison, Mr. Phillips points to two winners in the federal Race to the Top grant competition: Massachusetts and Tennessee. Massachusetts' bar for 8th grade math proficiency is two full standard deviations above Tennessee's proficiency bar; that gap, the study found, represents more than four grade levels' difference between proficient 8th graders in the two states. Tennessee changed its achievement standards this year, but such gaps remain across the states.
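The grade-level figure follows from a rough conversion implied by the report's own numbers; the quick check below treats roughly half a standard deviation as one grade level of growth, an inference from the quoted figures rather than a rule the report states.

    # Back-of-the-envelope check using the figures quoted above; the conversion
    # rate is inferred from them, not an official constant in the report.
    gap_in_sd = 2.0                # Massachusetts vs. Tennessee 8th grade math bars
    sd_per_grade_level = 0.5       # inferred: about half a standard deviation per grade level
    print(gap_in_sd / sd_per_grade_level, "grade levels")  # -> 4.0 grade levels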
"It documents again what we've long known, which is on current state tests the bar for proficiency is literally all over the map," said Michael Cohen, the president of Achieve, a Washington-based nonprofit group that works with states to evaluate their academic-content and testing standards.
"It's been that way for a while," he said. "Each state gets to do this entirely on its own, and they respond differently to the pressures of [the No Child Left Behind Act]." The federal law gauges schools' progress by gains in student proficiency against the targets the states set for performance on their respective tests.
The AIR researchers found that the percentage of students who reached proficiency in 4th grade math and reading and in 8th grade math was strongly and inversely related to the rigor of the achievement benchmarks, so much so that the report suggests low state proficiency bars may account for up to 60 percent of the gains states have reported in student performance in the years since the NCLB legislation was passed by Congress in 2001.
Similarly, a report by the Washington-based Thomas B. Fordham Institute found that the states with the highest proficiency standards have regressed to average standards since NCLB was implemented in 2002.
The AIR report also echoes a study in which the National Center for Education Statistics compared state standards with those of the National Assessment of Educational Progress. The NCES study found, for instance, that across the 2003, 2005, and 2007 assessments, the distance between the five states with the highest standards and the five with the lowest standards in 4th grade reading was comparable to the difference between NAEP's "basic" and "proficient" achievement levels.
Benchmarking a New Way
Mr. Phillips argues that states now tend to set descriptions of, and cutoff scores for, different content-proficiency levels using recommendations from panels of local educators, researchers, and other stakeholders. Such panels have access to information about whether other countries used particular test items, but usually not until the end of the standards-setting process, "when their minds are already made up," he said.
The AIR recommends instead that states use a benchmark method to set proficiency levels.
First, the state would reach a consensus on academic-content standards and field-test a representative pool of test questions based on them. It would compile the questions in order from easy to hard, and link the scaled items statistically to equivalent questions in other states and countries. Then content experts would use both the questions and performance descriptions from other states and tests to describe what students should know and be able to do at each proficiency level. Finally, those descriptions would be used to set cutoff scores for the state content assessments.
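Read as a workflow, the steps above amount to ordering field-tested questions by difficulty, expressing them on a common scale, and only then setting cut scores. The sketch below is a loose, hypothetical rendering of that sequence; the items, the linking function, and the panel's cut score are all invented for illustration, and real linking would use psychometric models rather than the fixed linear formula shown here.

    # Loose, hypothetical sketch of the workflow described above; not AIR's software.
    from dataclasses import dataclass

    @dataclass
    class Item:
        prompt: str
        difficulty: float  # field-test difficulty on the state's own scale

    def link_to_common_scale(state_value, slope=1.1, intercept=-20.0):
        """Placeholder linear link from the state scale to a common international scale."""
        return slope * state_value + intercept

    # Steps 1-2: field-tested items, compiled in order from easy to hard.
    items = sorted(
        [Item("multi-step word problem", 560.0),
         Item("two-digit addition", 430.0),
         Item("fraction comparison", 505.0)],
        key=lambda item: item.difficulty,
    )

    # Step 2 (cont.): express each item's difficulty on the common scale so panelists
    # can see where it falls relative to benchmarks in other states and countries.
    linked = [(item.prompt, link_to_common_scale(item.difficulty)) for item in items]

    # Steps 3-4: panelists write performance descriptions around those items and then
    # translate them into a cut score; the value below stands in for that judgment.
    proficient_cut = 535.0

    for prompt, difficulty in linked:
        print(f"{prompt:>25}: {difficulty:.0f}")
    print("Hypothetical proficient cut on the common scale:", proficient_cut)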
"The new method uses international benchmarking starting out, so the teachers and other panelists in the workshop have a broader foundation for what they are doing," Mr. Phillips said.
A more detailed description of the method will be published in early 2011, in the book Setting Performance Standards, Second Edition, by Gregory J. Cizek, an educational measurement and evaluation professor at the University of North Carolina at Chapel Hill. Mr. Phillips said the AIR also will update the findings after the state, TIMSS, PIRLS, and NAEP assessments are administered together again in 2011.
Three states have already taken the first step: Delaware, Hawaii, and Oregon.
In this year's spring high school math assessments, Oregon embedded sample questions from the Program for International Student Assessment, or PISA, which tests the math performance of 15-year-olds in countries in the Organization for Economic Cooperation and Development. While the sample questions did not count toward students' scores, they were used to benchmark the state test against international standards.
From there, with input from educators and researchers, the Oregon education department has recommended changing the proficiency descriptions and cutoff scores for each grade's assessments, according to Anthony Alpert, the assessment director for the department. The state board of education is set to vote this week on the new proficiency standards for math, with other subjects in the works.
If the new standards are approved, Oregon's proficiency cutoff scores would increase by half of a standard deviation at every grade level, Mr. Alpert said.
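To give a sense of scale, the hypothetical sketch below shows how a half-standard-deviation increase in a cut score can move a reported proficiency rate under a simple normal model; the distribution and the starting cut are invented for illustration and say nothing about Oregon's actual scale or results.

    # Rough illustration of what a half-standard-deviation increase in a cut score
    # can do to a reported proficiency rate; all values here are hypothetical.
    from statistics import NormalDist

    scores = NormalDist(mu=0.0, sigma=1.0)  # work in standard-deviation units
    old_cut = -0.25                         # hypothetical current proficiency cut
    new_cut = old_cut + 0.5                 # raised by half a standard deviation

    print(f"Before: {(1 - scores.cdf(old_cut)) * 100:.0f}% proficient")  # ~60%
    print(f"After:  {(1 - scores.cdf(new_cut)) * 100:.0f}% proficient")  # ~40%

Under those assumed values, the share of students labeled proficient would fall from roughly 60 percent to about 40 percent.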
"For some of our communities, international benchmarking isn't necessarily their highest priority, so we're still talking with our communities about why vertical alignment and international benchmarking is critical to our students' readiness to compete in the global workplace," Mr. Alpert said. He said he hopes to create "a [testing] system that is better, more consistent with the expectations that other states have for their kids and other countries have for their kids."