
Standards & Accountability

Cutoff Scores Set for Common-Core Tests

By Catherine Gewertz, November 17, 2014

In a move likely to cause political and academic stress in many states, a consortium that is designing assessments for the Common Core State Standards released data Monday projecting that more than half of students will fall short of the marks that connote grade-level skills on its tests of English/language arts and mathematics.

The Smarter Balanced Assessment Consortium test has four achievement categories. Students must score at Level 3 or higher to be considered proficient in the skills and knowledge for their grades. According to cut scores approved Friday night by the 22-state consortium, 41 percent of 11th graders will show proficiency in English/language arts, and 33 percent will do so in math. In elementary and middle school, 38 percent to 44 percent will meet the proficiency mark in English/language arts, and 32 percent to 39 percent will do so in math.

Level 4, the highest level of the 11th grade Smarter Balanced test, is meant to indicate readiness for entry-level, credit-bearing courses in college, and comes with an exemption from remedial coursework at many universities. Eleven percent of students would qualify for those exemptions.
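
As a rough illustration of the four-level reporting structure described above, the sketch below classifies hypothetical scale scores into achievement levels and computes the share scoring at Level 3 or higher. The cut-score values and student scores are invented for illustration only; they are not Smarter Balanced's actual numbers.

```python
# Hypothetical illustration of a four-level score report. The cut scores and
# student scale scores below are invented; they are not the consortium's data.
from bisect import bisect_right

HYPOTHETICAL_CUTS = [2500, 2600, 2700]  # invented boundaries between Levels 1|2, 2|3, 3|4

def achievement_level(scale_score):
    """Map a scale score to Level 1-4 using the hypothetical cut scores."""
    return bisect_right(HYPOTHETICAL_CUTS, scale_score) + 1

scores = [2450, 2550, 2620, 2710, 2580, 2650]          # toy sample of students
levels = [achievement_level(s) for s in scores]         # -> [1, 2, 3, 4, 2, 3]
proficient = sum(level >= 3 for level in levels)        # Level 3 or higher counts as proficient
print(f"{proficient / len(scores):.0%} at Level 3 or above")  # 50% in this toy sample
```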

The establishment of cut scores, known in the measurement field as "standard-setting," marks one of the biggest milestones in the four-year-long project to design tests for the common standards. It is also the most flammable, since a central tenet of the initiative has been to ratchet up academic expectations to ensure that students are ready for college or good jobs. States that adopted the common core have anticipated tougher tests, but the new cut scores convert that abstract concern into something more concrete.

Smarter Balanced is one of two main state consortia that are using $360 million in federal funds to develop common-core tests. The other group, the Partnership for Assessment of Readiness for College and Careers, or PARCC, is waiting until next summer, after the tests are administered, to decide on its cut scores. Smarter Balanced officials emphasized that the figures released Monday are estimates, and that states would have "a much clearer picture" of student performance after the operational test is given in the spring.

More than 40 states have adopted the common standards, the product of an initiative led by the nation's governors and state schools chiefs, and most belong to one of the assessment consortia. Seventeen of Smarter Balanced's 22 members plan to use the consortium's test this school year.

Aiming High

Smarter Balanced based its achievement projections on 4.2 million students' performance on field-test items last spring. Using cut scores that were set in meetings with hundreds of educators in Dallas this fall, the consortium estimated how many students would score at each level on its test. Two people who took part in that process confirmed that the final cut scores approved by state chiefs, in consultation with top officials in their states, were very close to those recommended by the Dallas panels.

One participant said that when the standard-setting panelists saw the data projecting how many students would fall short of proficiency marks with their recommended cut scores, "there were some pretty large concerns. And it was very evident that this was going to be a problem from a political perspective."

"The scores that came out of those rooms were close to the rigor level of NAEP," said another participant, referring to the National Assessment of Educational Progress, a federally administered test given to a nationally representative sampling of students that is considered a gold standard in the industry. "That was sure to freak out some superintendents and governors." He had anticipated that the state schools chiefs would lower the marks significantly before approving them, and he said he was "impressed and pleased" that they didn't.

If the achievement projections hold true for the first operational test next spring, state officials will be faced with a daunting public relations task: convincing policymakers and parents that the results are a painful but temporary result of asking students to dig deeper intellectually so they will be better prepared for college or good jobs.

Managing the Message

Statements by Smarter Balanced officials previewed the kinds of arguments state officials will likely have to make.

"Because the new content standards set higher expectations for students and the new tests are designed to assess student performance against these higher expectations, the bar has been raised," Joe Willhoft, the group's executive director, said in a statement on Monday. "It's not surprising that fewer students could score at Level 3 or higher. However, over time, the performance of students will improve."

Many state officials cautioned against comparing projected performance on the Smarter Balanced test with performance on their current tests, because the exams differ in design and in the material they cover. And indeed, some Smarter Balanced states are likely to see big drops.

California, the biggest state using the new test next spring, turned in proficiency rates as high as 65 percent in some grades on its most recent English/language arts tests. Two-thirds or more of Delaware's students cleared the proficiency mark on its tests in both subjects in 2014.

Some experts cautioned against assigning too much meaning to the projected levels of student performance.

Daniel Koretz, a Harvard University professor of education who focuses on assessment, said studies show that the numbers of students who score at each proficiency level can vary greatly, depending on which method of setting cut scores is used.

"I would ask to what extent what we're seeing is a difference in the way standards are set, and to what extent it's the content of the test," he said. "People typically misinterpret standards to mean more than they reasonably do. They think psychometricians have found a way to reveal the truth of the distinctions between 'proficient' and 'not proficient.' But it's just an attempt to put a label on a description of student performance."

Activists who oppose high-stakes standardized tests went further in their criticism of the anticipated Smarter Balanced performance levels.

"People should take this with a pound of salt," said Monty Neill, the executive director of the National Center for Fair & Open Testing, or FairTest, an advocacy group based in Boston. "The deliberate intent is to create more difficult standards. So when the result is that your child, your school, your district doesn't look as good, it's because the test is made deliberately more difficult.

"The big issue is that we're trying to control education through a limited set of standardized tests, and we know from No Child Left Behind that that doesn't work," he said, referring to the nearly 13-year-old federal law that made states' annual testing of students the main lever of accountability for achievement.

How Scores Will Be Used

States in the Smarter Balanced consortium must report student performance on the test in order to meet accountability requirements. Those reports could pose a political challenge for states, and put keen pressure on districts, schools, and teachers. They could be accompanied by consequences as dire as school restructuring, depending on the details of each state鈥檚 accountability plan.

It is up to each state to decide how to use the test scores. The results could be factored into teachers' evaluations, although some states have won delays in that requirement through waivers from the U.S. Department of Education. The results also could drive high-stakes decisions, such as grade-to-grade promotion and high school graduation. Few, if any, Smarter Balanced states plan to use them that way in 2014-15.

Advocates of the tests designed by Smarter Balanced and the other state consortium, PARCC, argue that the transitional stress of lower scores is justified by powerful payoffs.

Instead of each state giving its own test, half the states are giving the same two tests, allowing a shared concept of proficiency and an unprecedented level of cross-state comparison, they contend. And instead of gauging superficial knowledge through bubble sheets, the new exams plumb students' skills more deeply, with lengthy performance tasks that require students to justify their conclusions in math and supply evidence for their interpretations in English/language arts, those advocates say.

"We have an opportunity to change what assessment means inside our classrooms, an opportunity to make it really be about improving teaching and learning," said Deborah V.H. Sigman, a member of the Smarter Balanced executive committee and a deputy superintendent in California's Rocklin Unified School District.

During closed-door discussions to consider the new cut scores, some state leaders voiced uneasiness about reducing the complexity of student performance to four categories, instead of expressing its range in scale scores. Vermont abstained from the vote because of such concerns. (New Hampshire abstained for other reasons.) In response to the concerns about interpretations of the scoring categories, Smarter Balanced states approved a position paper encouraging states to take a broader view when discussing student achievement.

"There is not a critical shift in student knowledge or understanding that occurs at a single cut-score point," the paper said. "Achievement levels should be understood as representing approximations" of the levels at which students demonstrate mastery. States should consider evaluating additional data, such as grades and portfolios of student work, when evaluating student performance, the paper said.

Drawing the Line

The Smarter Balanced consortium established its cut scores in a lengthy process that began with defining what achievement should look like at each of the four levels. In September, it invited members of the public to rate the difficulty of test items online, and 2,600 did so.

In October, about 500 reviewers, mainly classroom teachers, principals, curriculum directors, and higher education faculty, gathered in Dallas to study those descriptions of achievement and to review booklets of test items, arranged in order of difficulty. In separate panels by subject and grade level, they examined the items and decided the points that distinguished the four levels of achievement.

After rounds of discussion, the results were aggregated into cut scores for each grade and subject. The panelists considered performance data from national tests such as NAEP and the ACT college-entrance exam for comparison as well. Sixty of the Dallas panelists later reviewed the cut scores across grades for consistency.
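
To make the aggregation step concrete, here is a minimal sketch of how panel judgments on an ordered item booklet might be combined into a provisional cut score. The page numbers, item-difficulty table, and scale conversion are invented for illustration; they are not Smarter Balanced's actual data or exact procedure.

```python
# A hypothetical sketch of aggregating panelists' placements in an ordered item
# booklet into one provisional cut score. All values below are invented.
from statistics import median

# Hypothetical scale-score location for each page of a 30-page ordered booklet:
# page 1 is the easiest item, page 30 the hardest.
page_difficulty = {page: 2400 + 10 * page for page in range(1, 31)}

def provisional_cut_score(placements):
    """Take each panelist's page placement (where Level 3 performance begins,
    in this sketch) and return the scale score at the median placement."""
    return page_difficulty[round(median(placements))]

# A panel clustering around page 25 yields a more demanding cut score than one
# clustering around page 7, echoing the Level 2 discussion described below.
print(provisional_cut_score([24, 25, 26, 25, 23]))  # 2650
print(provisional_cut_score([6, 7, 8, 7, 9]))       # 2470
```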

Yvonne Johnson, a parent and PTA leader from Delaware, served on the 3rd grade math cut-score-setting panel. She said that in early rounds, she was inclined to set the cutoff points much lower than those many of her fellow panelists favored.

"We would be working on Level 2, and I'd want to set it on Page 7, and everyone else was at, like, Page 25 and above," she said. "I thought, 'Wow, are these appropriate? This is a very rigorous standard.' But I learned that they're aiming for something higher. I'm used to, '2 plus 2 is 4; you're right.' But here, you'd want the student to explain why he got that answer, to justify."

In the end, Ms. Johnson said, she felt confident that the cut points were set at the "appropriate" level, with extended discussion, input, and agreement from all panelists.

Many in the field of educational measurement expressed concern, though, about Smarter Balanced鈥檚 decision to set cut scores based only on field-test data. More often, states establish those scores after the operational test is given, and that is what PARCC will do.

"It's really bizarre to set cut scores based on field-test data," said one state education department psychometrician. "You can't possibly project" accurately what proportions of students will score at the four levels of the test. He and other assessment experts said that field-test data are not good predictors of performance on the operational test because students are unfamiliar with the test, and often, teachers have had less experience teaching the material that's being tested.

And students might lack motivation to do their best on a field test, experts said.

"The good news is that whatever they're anticipating now [in student-proficiency rates] will get better," one assessment expert said of the Smarter Balanced approach.

A version of this article appeared in the December 03, 2014 edition of Education Week as Consortium Sets High Bars for Its Common-Core Tests
