A group that is developing tests for half the states in the nation has dramatically reduced the length of its assessment in a bid to balance the desire for a more meaningful and useful exam with concerns about the amount of time spent on testing.
The decision by the Smarter Balanced Assessment Consortium reflects months of conversation among its 25 state members and technical experts and carries heavy freight for millions of students, who will be tested in two years. The group is one of two state consortia crafting tests for the Common Core State Standards with $360 million in federal Race to the Top money.
From an original design that included multiple, lengthy performance tasks, the test has been revised to include only one such task in each subject (mathematics and English/language arts) and has been tightened in other ways, reducing its length by several hours.
The final blueprint of the assessment, approved by the consortium last month, now estimates it will take seven hours in grades 3-5, 7½ hours in grades 6-8, and 8½ hours in grade 11.
Earlier this fall, states' worries about too much testing time had prompted the group to offer a choice: a "standard" version of the assessment, lasting 6½ to 8 hours, or an "extended" one, which would run 10½ to 13 hours, with more items to facilitate more-detailed feedback on student performance. ("Two Versions of 'Common' Test Eyed by State Consortium," Sept. 19, 2012.)
Persistent doubts about that plan, however, led to further discussions and a decision to expand the shorter version by about 30 minutes and make it the only one offered, consortium officials said.
The computer-adaptive test will include multiple-choice, constructed-response, and technology-enhanced items. The performance tasks are far lengthier and more complex, requiring students to do things like write several short essays based on their readings from multiple articles and videos, or perform a host of calculations to figure out how to build and plant a community garden.
While many states saw value in having more performance tasks on the test, the amount of information they could yield didn't justify the additional testing hours, said Carissa Miller, the deputy superintendent for assessment, content, and school choice in Idaho, and the co-chairwoman of the SBAC executive committee. Including even one such task, which requires students to tackle longer, more complex math problems and write essays based on reading multiple texts, represents a major improvement in most states' assessment systems, she said.
鈥淚t鈥檚 a precarious balance between having a test that we get all the measurement pieces we need, and having it be so long that it becomes impractical,鈥 she said. 鈥淗aving even one very authentic performance task, [with] how much that will change instruction in states that have not had those kinds of things in the past. I think we really came to a sweet spot.鈥
Drilling Down
A key push in the latest redesign was to ensure that the test yields enough detailed information to enable reports on student performance in specific areas of math and English/language arts, Smarter Balanced officials said. The U.S. Department of Education, in particular, pressed for that, said Joe Willhoft, SBAC's executive director. And the consortium's technical-advisory committee had persistent concerns about a pared-down test's ability to report meaningfully on student-level, as opposed to classroom- or district-level, performance, SBAC leaders said.
The final version will yield overall student scores in math and in English/language arts, by four levels of performance and on a yet-to-be-designed scale, Mr. Willhoft said. It will also produce student-level scores in three areas of math (concepts and procedures, communicating reasoning, and problem-solving/modeling/data analysis) and in four areas of literacy (reading, writing, listening, and research), he said.
In the earlier, "standard" version of the test, some of those areas were combined, making it hard to judge those aspects of students' performance. Adding more items and shifting their distribution allows the test to gauge students' skills in each area, Mr. Willhoft said, while testing time was kept in check by scaling back the performance tasks and reducing the length of some reading passages.
Still, some experts see the resulting reports as being of disappointingly little instructional value.
W. James Popham, an assessment expert who serves on the Smarter Balanced technical-advisory committee, said tests can provide meaningful information only if teachers and students get more fine-grained feedback than an overall score in writing or in math "concepts and procedures."
鈥淚t鈥檚 still too broad,鈥 he said. 鈥淣o one can ferret out what students need help with. For Smarter Balanced to make a real contribution, it has to make certain that its other two pieces, the interim and formative assessments, are instructionally focused, so educators can do something with the results.鈥
The Right Balance
The evolution of the Smarter Balanced assessment showcases a persistent tension at the heart of the purpose of student testing, some experts say.
"Is it about getting data for instruction? Or is it about measuring the results of instruction? In a nutshell, that's what this is all about," said Douglas J. McRae, a retired test designer who helped shape California's assessment system. "You cannot adequately serve both purposes with one test."
That's because the more-complex, nuanced items and tasks that make assessment a more valuable educational experience for students, and yield information detailed and meaningful enough to help educators adjust instruction to students' needs, also make tests longer and more expensive, Mr. McRae and other experts said.
What Smarter Balanced did, he said, was compromise on obtaining data to guide instruction in order to produce a test that measures the results of instruction. Mr. McRae, a strong supporter of accountability, backs that approach. It's also crucial to have data that guide day-to-day instruction, he said, but that should come from separate formative and interim tests.
That's what SBAC has in mind, said Mr. Willhoft. Its end-of-year, summative tests will measure results for accountability, and those can shape what schools and districts do long term, he said.
"I'm not convinced that the end-of-year summative assessment used for accountability could be imagined to be extremely instructionally useful," Mr. Willhoft said. It's the interim and formative pieces of its system, he said, that have the potential to affect day-to-day instruction in profound ways.
The plan is to have thousands of test items and tasks in an online "bank" that teachers can draw from to custom-design interim tests on specific standards. Also available will be a bank of "formative" tools and strategies to help them judge and monitor students' learning as they go along, Mr. Willhoft said. That three-pronged approach (summative, interim, formative) makes up the "balanced" suite of tests many have sought, he said.
The final test design, with a mix of multiple-choice, constructed-response, technology-enhanced, and performance items, is a big improvement over the exams most states have now, said Deborah V.H. Sigman, California's deputy superintendent of public instruction and a member of SBAC's executive committee.
"We have a summative assessment that signals to the world that there are different ways to measure what students are learning and can do," she said. "That's a huge benefit."