Some test questions are likely harder to answer on tablets than on laptop and desktop computers, presenting states and districts with a new challenge as they move to widespread online assessments.
Analyses by test providers and other organizations have pointed to evidence of small but significant "device effects" for tests administered in some grades and subjects and on certain types of assessment items.
The results in many cases do not follow a clear pattern. And the most comprehensive studies to date, analyses of 2014-15 results from the millions of students who took common-core-aligned tests designed by two major assessment consortia, concluded that overall test results were comparable, despite some discrepancies on individual test items.
But much remains uncertain, even among testing experts. A recent analysis commissioned by the Partnership for Assessment of Readiness for College and Careers, for example, found that test-takers in Ohio (home to 14 percent of all students who took the 2014-15 PARCC exams) performed significantly worse when taking the exams on tablets. Those students' poor showing remains unexplained.
"Absolutely, this preliminary evidence leads to questions," said Marianne Perie, the director of the Center for Educational Testing and Evaluation at the University of Kansas. "We're so new into this and we need so much more research."
In its June report titled "Score Comparability Across Computerized Assessment Delivery Devices," the Council of Chief State School Officers offered four recommendations for states:
1. Identify the comparability concerns being addressed
From different devices to multiple test formats, a variety of factors can make student test scores not directly comparable to each other. To minimize potential threats, state officials first need to be clear about which ones they're dealing with.
2. Determine the desired level of comparability
For most states, this will mean "interchangeability," in which test scores are reported without regard to the device a student used.
3. Clearly convey the comparability claim or question
In the contemporary testing environment, states may be wise to embrace some level of flexibility, by claiming, for example, only that students took tests on the devices most likely to produce accurate results, rather than claiming that students would have received the exact same score, no matter which device they used.
4. Focus on the device
When administering tests on different devices, it's important to ensure that all devices meet recommended technical specifications, and that students are familiar with the device they will be using.
Source: Council of Chief State School Officers
The 2015-16 school year marked the first in which most state-required summative assessments in elementary and middle schools were expected to be given via technology. Over the past decade, states and districts have spent billions of dollars buying digital devices, in large measure to meet state requirements around online test delivery.
To date, however, relatively little is known about how comparable state tests are when delivered on desktop computers, laptops, tablets, or Chromebooks. The device types differ in screen size, in how students manipulate material (touchscreen vs. mouse, for example), and in how they input information (say, an onscreen vs. a detached keyboard), factors that could contribute to different experiences and results for students.
In an attempt to summarize the research to date, the Council of Chief State School Officers last month released a report titled "Score Comparability Across Computerized Assessment Delivery Devices."
"Device effects" are a real threat to test-score comparability, the report concludes, one of many potential challenges that state and district testing directors must wrestle with as they move away from paper-and-pencil exams.
From a practical standpoint, researchers say, the key to avoiding potential problems is to ensure that students have plenty of prior experience with whatever device they will ultimately use to take state tests.
Struggles in Ohio
In February, Education Week reported that the roughly 5 million students across 11 states who took the 2014-15 PARCC exams via computer tended to score lower than those who took the exams via paper and pencil. The Smarter Balanced Assessment Consortium, the creator of exams given to roughly 6 million students in 18 states that year, also conducted an analysis looking for possible "mode effects."
In addition to looking for differences in scores between computer- and paper-based test-takers, both consortia also looked for differences in results by the type of computing device that students used.
Smarter Balanced has not yet released the full results of its study. In a statement, the consortium said that its findings "indicated that virtually all the [test] items provide the same information about students' knowledge and skills, regardless of whether they use a tablet or other device."
A PARCC report titled "Spring 2015 Digital Devices Comparability Research Study," meanwhile, reached the same general conclusion: Overall, PARCC testing is comparable on tablets and computers.
But the report's details present a more nuanced picture.
Numerous test questions and tasks on the PARCC Algebra 1 and geometry exams, for example, were flagged as being more difficult for students who took the tests on tablets. On the consortium's Algebra 2 exam, some questions and tasks were flagged as being more difficult for students taking it on a computer.
The analysis of students' raw scores also found that in some instances students would likely have scored slightly differently had they taken the exam on a different device. For PARCC's end-of-year Algebra 1, geometry, and Algebra 2 exams, for example, students who used computers would likely have scored slightly lower had they been tested on tablets.
And most dramatically, the researchers found that students in Ohio who took the PARCC end-of-year and performance-based exams on tablets scored an average of 10 points and 14 points lower, respectively, than their peers who took the exams on laptops or desktop computers. The researchers concluded that those results were "highly atypical" and decided to exclude all Ohio test-takers (representing 14 percent of the study's overall sample) from their analysis.
When Ohio's results were included, though, "extensive evidence of device effects was observed on nearly every assessment."
PARCC officials were not able to definitively say why tablet test-takers performed so poorly in Ohio. They speculated that the results might have been skewed by one large tablet-using district in which students were unusually low-performing or unfamiliar with how to use the devices.
Perie of the Center for Educational Testing and Evaluation said more data, including the full extent of the apparent device effect in Ohio, should have been presented to help practitioners draw more informed conclusions.
"Typically in research, we define our parameters before looking at the results," Perie said. "If the decision to drop the anomalous state was made after looking at that data, that could be problematic."
Screen Size, Touchscreen
In its roundup of research to date, meanwhile, the CCSSO noted a number of studies that have found some evidence of device effects. Among the findings: some evidence that students taking writing exams on laptops tend to perform slightly worse than their peers who used desktop computers, and signs that students generally experience more frustration responding to items on tablet interfaces than on laptops or desktops.
The report also examines research on the impact of specific device features. Screen size, for example, was found to be a potential hurdle for students, especially for reading passages. Smaller screens that held less information and required students to do more scrolling led to lower scores, according to a 2003 study.
Touchscreens and on-screen keyboards, both features of many tablet devices, also appear to put students at a disadvantage on some test items. Technology-enhanced performance tasks that require precise inputs can be challenging on touchscreen devices, and students tend to write less (in response to essay prompts, for example) when using an onscreen keyboard.
Overall, Perie said, she would not go so far as to advise states and districts to avoid using tablets for online testing, but there are "absolutely some questions" about how students perform on them.
The CCSSO, meanwhile, offered an extended set of recommendations for states.
Ultimately, the group said, states and districts will want to be able to use test scores interchangeably, regardless of the device on which the exams are taken.
To be able to do so with confidence, they're going to have to conduct in-depth analyses of their results in the coming years, said Scott Norton, the group's strategic-initiative director for standards, assessment, and accountability. "Device comparability," he said, "is definitely something that states should be paying attention to."