Standards & Accountability

Two Versions of ‘Common’ Test Eyed by State Consortium

By Catherine Gewertz, September 17, 2012

An unprecedented assessment project involving half the states is planning a significant shift: Instead of designing one test for all of them, it will offer a choice of a longer and a shorter version. The pivot came in response to some states’ resistance to spending more time and money on testing for the common standards.

The plan under discussion here last week among state education chiefs of the Smarter Balanced Assessment Consortium represents the collision of hope and reality, as states confront what is politically and fiscally palatable and figure out how that squares with the more in-depth, and potentially more valuable, approach to testing promised by the consortium.

“There is the dream, and there’s real life,” said one state assessment director attending the meeting. “We’re trying to bridge the two the best we can.”

The evolving two-pronged approach would give states the option of using a version of the Smarter Balanced test whose multiple sessions and classroom activities span nearly 6½ hours in grades 3-5, close to seven hours in grades 6-8, and eight hours in high school, or the group’s original version, which lasts about four hours longer in grades 3-8 and about five hours longer in high school.

Because the assessments would be built on the same blueprint, with a mix of multiple-choice, constructed-response, and technology-enhanced items, as well as lengthy performance tasks, the two versions would deliver comparable results, said Joe Willhoft, the executive director of the consortium. And both would produce the school-, district- and state-level information needed to meet federal accountability requirements, he said.

Both versions would yield overall scores for each student in mathematics and English/language arts, as well as some results within each of those subjects, such as a separate score for students’ writing and research skills, or for their grasp of math concepts and procedures, Mr. Willhoft said.

But because a shorter version of the test is more limited in what it can validly say about an individual student’s performance, an extended version, with more items of each type, would be needed to make the finer-grained “claims” about each student’s learning in multiple areas of each subject that can yield richer portraits for teachers, parents, and state officials, Mr. Willhoft said.

It would be up to each state to choose which version of the assessment it uses. Early signs suggest that public antipathy toward testing and states’ tight fiscal straits are leading more than a few to consider the shorter version. It was pressure from chiefs within the Smarter Balanced consortium that prompted the group earlier this year to explore the option of two versions.

Tough Choices

Mr. Willhoft said the 25-state consortium (whose members span the country, from California and Washington state to Missouri, South Carolina, and Maine) must respond to the needs of its states, or risk losing their membership. And if the consortium loses too many states, it can’t stay in operation. Federal rules require each consortium (Smarter Balanced and the 23-member Partnership for Assessment of Readiness for College and Careers, or PARCC) to have at least 15 members to qualify for federal funding. The two consortia are using $360 million in aid under the U.S. Department of Education’s Race to the Top program to design the tests and related projects.

“Having this shorter version at least keeps them in the game,” Mr. Willhoft said. “If all we had was the original, extended version, they might walk.”

PARCC officials said there is no discussion in that group about offering two versions of its test, though Smarter Balanced officials see such a discussion as inevitable in that group as well.

The pressure within Smarter Balanced to offer a shorter version is unsettling for the group’s biggest advocates, who contend that its vision, while lengthening testing in some states, offers immense promise to make tests a more meaningful gauge of achievement and also a form of instruction.

‘An Audible Gasp’

Idaho’s current tests take three hours or less, said Carissa M. Miller, the co-chair of Smarter Balanced’s executive committee. So it’s no small thing to consider exams that could double, let alone quadruple, that amount of time.

“I presented that to district superintendents, and there was an audible gasp,” said Ms. Miller, Idaho’s deputy superintendent for assessment, content, and school choice.

But she and the state’s superintendent of public instruction, Tom Luna, believe so strongly in the value of the detailed information the longer version of the Smarter Balanced assessment will yield that they are working hard to win support from their fellow educators, she said.

“You asked for authentic assessments,” Ms. Miller said she tells them. “Authentic assessment takes time.”

Idaho has not yet decided which version of the test it will use. Neither has Missouri, according to state Commissioner of Education Chris L. Nicastro.

“There are many unanswered questions and a lot of anxiety about the tests,” she said. “The additional rigor and higher expectations of the common standards wouldn’t make it unreasonable to expect the tests to be a little bit longer. But still, we have some folks concerned about testing.”

The U.S. Department of Education, which must review and approve changes in either consortium’s assessment plan, is working with Smarter Balanced officials to refine the design of its two versions so the consortium can present them to its governing board for approval in late November, consortium officials said.

Ann Whalen, a top aide to U.S. Secretary of Education Arne Duncan, said the designs must meet key aims the department had in funding the project.

“While there are different ideas and approaches under discussion, at the end of the day, these assessments must measure critical thinking, paint a very clear picture of which students are doing well, and which need more help, indicate whether students are college- or career-ready, and give students and teachers the information they need to improve,” she wrote in an email. “This is an absolute priority for us and will help us better serve the needs of children.”

Some consortium members, and some of its closest advisers, worry privately that too many states will opt for the cheaper, shorter version of the test, leaving few, if any, to prove that the greater investment of time and resources in “tests worth teaching to” is worth it in the long run.

“They may opt for a shorter version, but what you lose in that is a greater ability to say detailed things about the depth of what students know and can do,” said Derek C. Briggs, a nationally recognized assessment expert from the University of Colorado at Boulder who serves on both consortia’s technical advisory committees.

“It’s a slippery slope,” he said. “Once you start down that path, you may start losing the advantages of a groundbreaking assessment system and it might start resembling the testing systems we have now.”

Design Challenges

Experts cautioned that it can be daunting to build shorter and longer versions of a test without sacrificing the ability to compare results from one to those of the other. It’s also difficult to create a shorter version that measures a set of standards as meaningfully and consistently as a longer version, they said. Doing so requires careful attention to a host of psychometric and statistical concerns.

Gregory J. Cizek, a professor of educational measurement and evaluation at the University of North Carolina at Chapel Hill, said there are many examples of multiple versions of tests in use, such as the Iowa Tests of Basic Skills, the TerraNova, and states’ modified assessments for students with disabilities. Longer versions of a test may deliver higher levels of reliability and validity, he said, but shorter versions can produce levels that are still quite acceptable.

The “prime validity target” in educational testing is content validity, the faithfulness with which the test measures the content of the standards, said Mr. Cizek, who serves on the Smarter Balanced technical advisory committee.

“A shorter test will reflect them a little bit less with fewer items to cover that terrain, so the validity is reduced a little bit,” he said.

The key, he said, is to take care to make only those claims about student performance that are appropriate to the validity of the assessment.

‘Good Luck and Bad Luck’

Another nationally known assessment expert, who declined to be identified because of the politically sensitive nature of the consortium work, cautioned that a shorter version of a test will have more measurement error than a longer version.

That doesn’t cause a problem when making inferences about certain results, such as the average score of all students who took the tests, he said. But he said that for others, such as the proportions of students who scored at various achievement levels, it can cause significant distortion, tending to concentrate performance at the extreme ends of the spectrum.

“It’s like free throws in basketball,” he said. “If you give people five shots, some will get all five and some will get zero. If you let them shoot 100 times, hardly anyone will get zero or 100. With a short test, there is more spurious good luck and bad luck happening.”
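The free-throw analogy can be checked with a short calculation. The sketch below is only an illustration, not anything the consortium or the expert described: it assumes every shot (or test item) is an independent try with the same success rate, and the rates of 0.7 and 0.6 and the 20 percent and 80 percent cutoffs are hypothetical values chosen for the example. It uses the exact binomial formula to compare how often a five-try and a 100-try version land a person at the extremes.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent tries at rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_all_or_nothing(n, p):
    """Chance of making zero or all n tries (the free-throw extremes)."""
    return binomial_pmf(0, n, p) + binomial_pmf(n, n, p)


def prob_in_extreme_band(n, p, low_pct=20, high_pct=80):
    """Chance the observed percent correct is <= low_pct or >= high_pct.

    Integer arithmetic is used for the cutoffs to avoid float rounding.
    """
    return sum(
        binomial_pmf(k, n, p)
        for k in range(n + 1)
        if k * 100 <= low_pct * n or k * 100 >= high_pct * n
    )


if __name__ == "__main__":
    # Hypothetical success rates, chosen only for illustration.
    for n in (5, 100):
        print(
            f"{n:>3} tries: "
            f"all-or-nothing = {prob_all_or_nothing(n, 0.7):.4f}, "
            f"extreme band = {prob_in_extreme_band(n, 0.6):.4f}"
        )
```

Under these assumptions, the same person is far more likely to land in an extreme band on the five-try version than on the 100-try version, even though the underlying skill is unchanged. That is the distortion in achievement-level proportions the expert describes.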

There are statistical methods that can be applied to enable sound results in such cases, this expert said. But he expressed doubt that states have the capacity to apply those methods consistently to ensure accurate, responsible interpretations of test results.

And when they move from interpreting the two versions of the tests for groups, as states are expected to do for accountability, to using them to make decisions about individual students, as they plan to do in deciding whether high school students are “college and career ready,” the risk increases, he said.

“Any inferences about an individual from a shorter test will be noisier and less reliable,” the expert said.

“If you’re going to make decisions about people,” he said, “you’d hate to make them based on a test where 30 percent of the time you would make a different decision if you used the long instead of the short version of the test.”

Coverage of the implementation of the Common Core State Standards and the common assessments is supported in part by a grant from the GE Foundation, at www.ge.com/foundation.
A version of this article appeared in the September 19, 2012 edition of Education Week as Two Versions of ‘Common’ Test Eyed
