When tens of millions of schoolchildren sit down at computers to take new common assessments in spring 2015, many of their peers will be taking similar tests the old-fashioned way, with paper and pencil, raising questions about the comparability of results, as well as educational equity, on an unprecedented scale.
Both state consortia that are designing tests for the Common Core State Standards are building computer-based assessments, but they will offer paper-and-pencil versions as well, as states transition fully to online testing. The Smarter Balanced Assessment Consortium plans to run the two simultaneous “modes” of testing for three years. The Partnership for Assessment of Readiness for College and Careers, or PARCC, will do so for at least one year.
In order to rely on the results, however, the consortia must show that the paper and computer modes of the tests in English/language arts and mathematics measure the same things.
The prospect of establishing such comparability between two versions of a test isn’t new. States have long used established statistical and psychometric practices to do so when they update their paper-and-pencil tests, for instance, or when they transition from paper-based tests to computer assessments. But the challenge before the two consortia ups the ante by hanging the validity of far more children’s test scores on the “linking” or “equating” process conducted by each group.
“In the assessment profession, we need to be able to back up claims we make about students’ and schools’ performance. Any threat to validity is a threat to those interpretations,” said Richard Patz, the chief measurement officer at ACT Inc., which is conducting comparability studies of its own as the Iowa City, Iowa-based company introduces a digital version of its college-entrance exam.
Thorny questions have arisen, too, about whether children who take the paper-and-pencil version of the consortia tests will be at a disadvantage, or perhaps have an edge, compared with their peers who take the computer version.
Could children in high-poverty areas, where technological readiness will likely be lower, lose something valuable by not interacting with the new tests’ technologically enhanced items, such as drawing and drag-and-drop functions? Would they actually benefit by sticking with paper exams if they are more comfortable taking tests in that mode?
Mixed Landscape
Consortia leaders say they are confident that comparability and equity questions will be fully addressed by the time the tests make their debut in 2015.
“It’s something we need to do carefully, and we intend to do it carefully,” said the executive director of the 25-state Smarter Balanced group, Joe Willhoft, who oversaw such studies as the assessment director in Washington state.
Jeffrey Nellhaus, the testing director for PARCC, which includes 18 states and the District of Columbia, said the group’s test designers are “very sensitive” to comparability questions and are planning studies to answer them.
Both of the state testing consortia will include technology-enhanced questions on their computer-based exams, such as this interactive sample item from the Smarter Balanced group.
SOURCE: Smarter Balanced Assessment Consortium
About 40 million students attend school in the states that belong to the two consortia. But much is still unknown about how many will take paper tests in 2015 and how many will use a computer. Even rough feedback, however, shows a strong likelihood that large swaths of students will be picking up their No. 2 pencils.
Survey data collected in July by Smarter Balanced, also more of an approximation than a full accounting, show a wide range of technological readiness.
Oregon, long a leader in online assessment, reported that all its districts were capable of giving tests online, while only 45 percent of California’s districts did likewise. For PARCC, Mr. Nellhaus ventured a guess of a 50-50 split, but emphasized that data on districts’ and schools’ readiness are far from complete.
The consortia will not decide who takes the paper-and-pencil version of the test and who takes the computer version, officials said. That will be up to states, and in some cases, individual districts or schools.
Ideally, test results are “indifferent” to the mode in which the test is given, said Henry Braun, a longtime researcher with Princeton, N.J.-based test-maker Educational Testing Service and now an expert in educational evaluation and measurement at Boston College. If the mode of administration helps or hampers some students, the results are distorted, he said.
Differences in Format
Assessment experts say it’s much easier to establish comparability when two tests are similar in format, such as a multiple-choice test on paper that becomes a multiple-choice test on the computer. But even then, comparability issues can arise.
A student who must read a text passage in order to answer a multiple-choice question, for example, might be able to read the entire passage on one page of the paper test, but on the computer, she must scroll up and down to do so. Such shifts can affect the performance of some students, said a longtime assessment expert at a major testing company. (Like most experts interviewed for this story, he agreed to speak only if his name was withheld because of his employer鈥檚 contracts with the assessment consortia.)
Comparability challenges deepen when tests differ significantly in format, experts said. In the case of the two state consortia, their computer-based exams (with technology-enhanced items such as interactivity and animation, and longer, more complex performance tasks) will be able to represent ideas in ways that the paper versions cannot, so establishing comparability between the two will be tougher.
“When an assessment has types of items only available in one mode, it creates a greater challenge for establishing comparability, but it’s a familiar one and it’s generally a manageable one,” said ACT’s Mr. Patz.
The other expert, however, said that while the consortia’s comparability challenge is “not a fatal problem, it needs to be thoughtfully negotiated and represented to anyone who will use those test scores.”
That source said it’s not possible to measure everything in the paper-and-pencil version that can be measured in the computer-based version.
“In the technical sense of ‘comparable,’ the two might not be comparable,” he said. “If you were successful in measuring the same things, which would be a stretch if the computer-based version’s items are truly innovative, it could well be the case that one [test] could be harder or easier than the other because of how the items are presented.”
Writing From Scratch
Assessment specialists outlined various ways to establish comparability between the paper and computer versions of a test. One is to use a set of common items in both, so test designers can compare student performance on those items in the two modes. Another is to randomly assign students to take one or the other mode of the test. Better yet, a study group of students can be selected to take both the paper and computer versions. Consortium officials said such methods are being planned or considered for field tests next spring.
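The common-item design the specialists describe can be illustrated with a toy calculation. The sketch below, with invented score data, uses simple linear equating: a paper-mode score is placed on the computer-mode scale by matching the mean and spread of performance on the items the two versions share. This is only a minimal illustration of the general idea, not either consortium's actual procedure.

```python
# Toy illustration of linear equating via common items (all data invented).
# Each list holds scores on items that appear in BOTH modes of the test.
from statistics import mean, stdev

paper_common = [12, 15, 14, 10, 13, 16, 11, 14]     # paper-and-pencil group
computer_common = [14, 17, 15, 12, 16, 18, 13, 15]  # computer group

def linear_equate(score, from_scores, to_scores):
    """Map a score from one mode's scale onto the other's by matching the
    mean and standard deviation observed on the common items."""
    z = (score - mean(from_scores)) / stdev(from_scores)
    return mean(to_scores) + z * stdev(to_scores)

# A paper score at the paper group's mean (13.125) lands at the
# computer group's mean (15.0) on the equated scale.
equated = linear_equate(13.125, paper_common, computer_common)
```

In practice the consortia would use far more elaborate models, but the core logic is the same: shared items anchor the two score scales to one another.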
Testing experts also said it’s best to create assessment questions from scratch for the paper-based assessment, rather than building paper versions of test items originally designed for the computer.
“You can’t replicate the interactivity of the computer environment on paper,” said one testing expert. “You need to build alternate forms of the test that measure the same standards [on paper].”
Mr. Willhoft from Smarter Balanced said that his group is adapting items written for the online environment to paper. Mr. Nellhaus from PARCC said its developers are writing paper items from scratch to use in place of technology-enhanced items on the computer, but more traditional item types can be used in both modes.
PARCC’s field test next spring will include paper-based as well as computer-based exams, Mr. Nellhaus said. The Smarter Balanced field test will include paper forms only for a small group of students, to study comparability, Mr. Willhoft said. “There’s no denying that there will be some items that will be difficult to translate into the paper environment,” said Mr. Willhoft. One of the consortium’s math items, for instance, asks students to click on images of a cylindrical shape and a rectangular one in an exercise about volume. “But there’s nothing inherent in a given standard that requires a certain kind of interactive item,” he said. “You can measure the same standard in different ways.”
Smarter Balanced faces an extra layer of complexity in comparability because its test is computer-adaptive, meaning it adjusts questions to the test-taker’s skill level.
“With an adaptive test, you see right away what questions a kid needs,” said Lauress L. Wise, a principal scientist with the Monterey, Calif.-based Human Resources Research Organization, which has performed quality assurance and evaluation on testing systems such as the National Assessment of Educational Progress. “With paper and pencil, you’d have to offer a lot more questions, a longer test, to make it comparable to that. If you can’t do that, you won’t be measuring the end points [of achievement] as well.”
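The adaptive routing Mr. Wise describes can be sketched with a toy selection rule: after each answer, the ability estimate moves, and the next question is the unused one whose difficulty sits closest to that estimate. Everything here (the five-item pool, the difficulties, the crude half-point ability update) is invented for illustration; it is not either consortium's algorithm.

```python
# Minimal sketch of computer-adaptive item selection (hypothetical item pool).
pool = {"q1": -2.0, "q2": -1.0, "q3": 0.0, "q4": 1.0, "q5": 2.0}  # difficulties

def next_item(ability, unused):
    """Pick the unused item whose difficulty is closest to the ability estimate."""
    return min(unused, key=lambda q: abs(pool[q] - ability))

ability = 0.0            # start in the middle of the difficulty scale
unused = set(pool)
administered = []
for correct in [True, True, False]:   # a made-up response pattern
    item = next_item(ability, unused)
    administered.append(item)
    unused.remove(item)
    # Crude step update for illustration only, not a real IRT ability estimate:
    ability += 0.5 if correct else -0.5
```

Because each examinee sees only items near his or her own level, the computer test can pin down high and low performers with few questions; a fixed paper form has to carry easy and hard items for everyone, which is why Mr. Wise says it must be longer to match that precision.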
Mr. Willhoft acknowledged that the paper version of the Smarter Balanced test will be “less precise, with a larger measurement error” at those points in the spectrum.
In seeking comparability, a key consideration is what kinds of conclusions will be drawn from the scores on the two types of tests, said Mr. Wise. The degree of comparability takes on added significance when high-stakes decisions are based on the results, he said.
“If this were a graduation test, and some kids were getting denied diplomas because they took one form or another, you could make a plausible argument why there could be a lawsuit,” Mr. Wise said. “That could get sticky.”
Quality of Tasks
The fact that paper-and-pencil tests might be more widely used in lower-income areas is something that officials at the Education Trust, which advocates school improvement for disadvantaged students, are keeping an eye on. But those potential questions of equity revolve more around the quality of the assessment, and the teaching that goes with it, than around the mode of the test, they say.
Christina Theokas, the organization’s director of research, said she worries that if the paper test is less complex and instructionally rich than the computer version, classroom instruction could mirror that.
But students aren’t necessarily at a disadvantage just by taking a paper-and-pencil test, said Sonja Brookins Santelises, the Education Trust’s vice president of K-12 policy and practice. Top-notch paper tests such as NAEP and Massachusetts’ statewide exams demonstrate that, she said. The important thing to watch is not the mode in which a test is administered, Ms. Santelises said, but “the quality of the task” and how well students are prepared for it.
“You can do a rudimentary task on a computer and have it not be beneficial, and you can have a paper-and-pencil task that’s instructionally rigorous and very beneficial,” she said. “Are students going to have access to the kind of experiences and curriculum that prepare them for those kinds of tasks? Are teachers being prepared and supported to do that?”
Ms. Santelises added: “We need to stay focused on the teaching and learning, rather than on whether we have the right technology to give a test.”
Take the test: Try your hand at interactive sample questions from the Smarter Balanced consortium.