The Every Student Succeeds Act invites states and districts to use interim assessments in a new way: by combining their results into one summative score for federal accountability.
But testing experts say it can be difficult to produce valid scores that way, and they warn that the approach can limit teachers' curricular choices.
The No Child Left Behind Act required states to give "high-quality, annual tests" that measure students' year-to-year progress in math and English/language arts. ESSA, the latest revision of the Elementary and Secondary Education Act, signed into law Dec. 10, says that "at the state's discretion," student achievement and growth can be measured "through a single summative assessment" or "through multiple, statewide interim assessments during the course of the academic year that result in a single summative score."
Summative tests are typically measurements of learning taken when instruction is complete, while interim tests more often measure students鈥 progress toward learning goals, after specific sections of instruction.
No one knows yet whether states will use the new option in the law, but it has the potential to make a big market bigger. Industry analysts estimate that U.S. spending on "classroom assessment" (which includes formative, interim, and benchmark tests) totals $1.3 billion per year, exceeding the $1.1 billion spent annually on statewide summative assessments.
States that explore the new option might consider interim tests from the commercial market, or they could design their own by buying or writing questions or tasks and administering them statewide at specified points in the year.
Which interim tests the U.S. Department of Education will consider acceptable for summative results is an open question, since regulations and guidance on the new law haven't been written yet. States will also have to prove to the department that their tests are valid for their intended purpose.
But states that choose the interim-testing route will have to grapple with key issues affecting the validity of their scores, and with the tests' power to shape curriculum, assessment experts say.
Bid for Better Assessment
In adding the new language, lawmakers wanted to recognize educators' desire to use "more authentic" kinds of assessment, according to a former Senate staffer who worked on early versions of the bill.
"It came from the idea that [assessment] is not just a one-time, one-day, multiple-choice exam, that doing these authentic performance-based tasks over the course of the school year should be recognized as equivalent to a one-time, one-day evaluation," the staffer said.
Designing such a system to produce valid statewide results, however, is rife with challenges.
Psychometricians cringe when policymakers and educators use a test to measure things it wasn't designed to measure, because doing so compromises the validity of the results. Combining interim test results into a summative score runs that risk, assessment experts said, though much depends on which tests states use and how.
"It seems like this could be the full employment act for psychometricians arguing back and forth about how to combine interims properly into a single summative," said Lauress L. Wise, immediate past president of the National Council on Measurement in Education, the association that sets standards for best practice in assessment. "There are ways to do it, but there aren't good examples of it yet."
States exploring the law's new option must be particularly careful if they're thinking of using off-the-shelf interim tests to produce a single summative result, said Derek C. Briggs, the chairman of the research and evaluation and methodology department at the University of Colorado-Boulder.
"It sounds appealing to use the interims you already have for that, but I don't think people appreciate that the current system of interim tests isn't designed to be a replacement for summative tests," Briggs said.
The Northwest Evaluation Association, maker of one of the most widely used interim-testing systems, the Measures of Academic Progress, or MAP, agreed.
"Our interims as they're designed today are not appropriate for use as a summative," said Donna McCahon, the company's assessment director. NWEA is currently refining a computer-adaptive testing system that blends interim and summative elements, and would be more appropriate for that use, she said.
Need to Reflect Curriculum
To be combined for a valid summative score, interim tests "need to be written with a particular curriculum in mind," Briggs said. "If they're misconnected to the curriculum, there are all kinds of problems."
For instance, one school might focus deeply on a topic for the first few months of the school year, and its students would likely do well on that interim test, Briggs said, while a school that chose to spread the same subject matter across the year would risk having its students do poorly on that first interim.
To avoid those kinds of inequities, and to create results that are valid and comparable statewide, schools would have to teach shared curriculum topics in the same order, Briggs and other experts said.
"When people realize this, they'll be concerned. It could introduce a kind of conformity most people would balk at," said Gregory J. Cizek, a professor of educational measurement and evaluation at the University of North Carolina-Chapel Hill.
To avoid pushback against "lock-step" instruction, a state would have to engage teachers, parents, policymakers, and others in a dialogue to build a strong consensus about what should be taught and when, experts said.
Fear of that kind of backlash, amplified by anger about federal intrusion on instruction sparked by the Common Core, was one reason that the Partnership for Assessment of Readiness for College and Careers, or PARCC, one of the two federally funded testing consortia, abandoned its initial "through-course" design, which would have spread out summative testing across the year.
Some districts and states are already working on blending different types of assessment into a year-end result.
New Hampshire has drawn attention for a pilot program that uses a mixture of tests from the Smarter Balanced Assessment Consortium and locally developed performance tasks. About 10 other states are investigating variations on testing that include features such as blending year-end summative tests with competency-based tests given throughout the year, said Jennifer Davis Poon, the program director of the Innovation Lab Network at the Council of Chief State School Officers, which is working with those states.
But states that venture into such projects should recognize that there are "a lot of technical hurdles to overcome," Poon said. A particular challenge in New Hampshire is figuring out how to get comparable results across locally developed tasks that vary from one district to another, she said.