When Delaware switched to computer-adaptive testing for its state assessments three years ago, officials found the results were available more quickly, the amount of time students spent taking tests decreased, and the tests provided more reliable information about what students knew, especially those at the very low and high ends of the spectrum.
But the path to launching those tests involved a significant education of students, parents, and teachers, a sizeable technology investment by the state, and the development of hundreds of test items for every exam.
As many states move to put in place online testing tied to the Common Core State Standards in 2014-15, at least 20 states have indicated they plan to use new computer-adaptive versions of the tests, and they’re looking at states like Delaware to learn some lessons.
“Adaptive testing is really beneficial and can pinpoint a student’s learning level more closely,” says Gerri Marshall, the supervisor of research and evaluation for the 15,000-student Red Clay Consolidated School District in Wilmington, Del., which piloted such tests.
Nationally, two coalitions have received federal funding to develop English/language arts and mathematics tests for the common standards. Both coalitions (the Smarter Balanced Assessment Consortium and the Partnership for Assessment of Readiness for College and Careers, or PARCC) have said their assessments will feature high-tech, interactive questions that incorporate video and graphics and are designed both to identify what students know and to be more engaging.
Both assessments will be given online, but Smarter Balanced will use adaptive testing, while PARCC will use what are known as fixed-form tests, which feature set questions that generally do not change.
Only a handful of states, including Delaware, Hawaii, and Oregon, are now using adaptive testing on a widespread basis. Even supporters acknowledge challenges to its implementation and use, considering that many school districts are currently doing little, if any, testing online.
“It’s a big philosophical shift for people,” says John Jesse, the director of assessment and accountability for the Utah department of education, which is in the process of developing its own computer-adaptive tests for the common core. “If your district is still using paper, shifting to online is big, and then shifting to adaptive testing might be too much of a move all at once.”
Seeking Greater Precision
So what exactly is the difference between a traditional test, which presents a student with a set number of test items that don’t change during test-taking, and adaptive testing?
Testing experts say that traditional, or fixed-form, exams work well with the majority of students, who hover around the level the assessment is seeking to evaluate. Test questions are developed to appeal to most students and can assess how much those students know.
However, students at the far ends of the spectrum (high achievers and struggling students) fare worse on those types of tests, which do not let teachers identify exactly what material those students have or have not mastered.
With exceptional students, a fixed test can’t determine just how extensive their knowledge may be, and for struggling learners, it can’t determine how far behind they may be. A teacher won’t know exactly how far gaps in students’ learning on certain concepts go because the test questions don’t move far in that direction.
“The range of proficiency among kids in a grade is huge,” says Jon Cohen, the executive vice president and director of assessment for the Washington-based American Institutes for Research, which is already delivering statewide adaptive tests in several states and has been selected by the Smarter Balanced consortium to do pilot and field testing and to create the adaptive-test algorithm.
“With a typical test, a kid who is struggling is not going to see many items they can get right, and a kid at the top is not going to see many items they’ll get wrong,” he says. “Kids on the ends get a less precise score.”
Adaptive tests operate from a large test-item bank. For example, for a 40-question test, an adaptive test bank might contain 800 items, Cohen says.
An algorithm guides the computer as it picks each question based on the answers given to previous questions, homing in on a student’s skill and knowledge level. Typically, a student will get about half the questions offered by the computer correct, whether he or she is a high, middle, or low performer, since the questions are tailored for that student’s particular level.
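To make the idea concrete, here is a minimal Python sketch of one way such a selection loop could work. It is not the Smarter Balanced or American Institutes for Research algorithm, which the article does not describe; the function names, the evenly spaced 800-item bank, the "staircase" update rule, and the simulated students are all assumptions chosen for illustration.

```python
def pick_next_item(bank, used, estimate):
    """Return the index of the unused item whose difficulty is closest to the estimate."""
    candidates = [i for i in range(len(bank)) if i not in used]
    return min(candidates, key=lambda i: abs(bank[i] - estimate))


def run_adaptive_test(bank, answers_correctly, n_questions, start=0.0):
    """Toy staircase: step the estimate up after a right answer, down after a wrong one."""
    estimate, step = start, 1.0
    used, history = set(), []
    for _ in range(n_questions):
        i = pick_next_item(bank, used, estimate)
        used.add(i)
        correct = answers_correctly(bank[i])
        history.append((bank[i], correct))
        estimate += step if correct else -step
        step = max(step * 0.8, 0.05)  # shrink the step, but never to zero
    return estimate, history


# Hypothetical 800-item bank with difficulties spread evenly from -4 to 4.
bank = [-4 + 8 * i / 799 for i in range(800)]


def simulated_student(ability):
    # Simplifying assumption: the student gets an item right exactly when
    # its difficulty does not exceed his or her ability.
    return lambda difficulty: difficulty <= ability


for ability in (-2.3, 0.0, 1.7):
    estimate, history = run_adaptive_test(bank, simulated_student(ability), 40)
    share_correct = sum(correct for _, correct in history) / len(history)
    print(f"ability={ability:+.1f} estimate={estimate:+.2f} share correct={share_correct:.2f}")
```

In this toy setup, the low, middle, and high performers each end up answering roughly half of their 40 questions correctly, because the loop keeps steering toward items near each student's own level. A production system would likely use a statistical model rather than a fixed step size.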
“With a computer-adaptive test, the percent correct is no longer relevant,” says Tony Alpert, the chief operating officer for Smarter Balanced. “The adaptive test is always challenging for every student, and we need to help people understand that.”
Computer-adaptive assessments aren’t scored solely on the basis of how many right or wrong answers a student gets. A student’s score depends both on the number of items he or she got right and the difficulty of the items presented. Early trials, or field tests, present items to representative samples of students to evaluate the difficulty of each item in the pool and to translate that into values that will provide a score, Cohen says.
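One common way to turn "number right plus item difficulty" into a score is the Rasch (one-parameter logistic) model from item response theory. The Python sketch below assumes that model purely for illustration; the article does not say which scoring model either coalition uses, and the function names and item difficulties here are hypothetical.

```python
import math


def p_correct(ability, difficulty):
    """Rasch (one-parameter logistic) probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def estimate_ability(difficulties, responses, lo=-6.0, hi=6.0):
    """Maximum-likelihood ability estimate, found by bisection.

    Needs at least one right and one wrong answer; otherwise the
    estimate runs off to the edge of the [lo, hi] search range.
    """
    def surplus(ability):
        # Observed number right minus the number the model expects.
        expected = sum(p_correct(ability, d) for d in difficulties)
        return sum(responses) - expected

    for _ in range(60):
        mid = (lo + hi) / 2
        if surplus(mid) > 0:
            lo = mid  # student did better than expected: ability is higher
        else:
            hi = mid
    return (lo + hi) / 2


# Two hypothetical students each answer two of four items correctly,
# but the second student was given harder items.
easy_form = estimate_ability([-2, -1, 0, 1], [1, 1, 0, 0])
hard_form = estimate_ability([0, 1, 2, 3], [1, 1, 0, 0])
print(round(easy_form, 3), round(hard_form, 3))  # -0.5 1.5
```

Both students got the same number of questions right, yet the one who faced the harder items receives the higher estimate. The difficulty values fed into such a calculation are the kind of numbers field testing is meant to supply.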
Personalization Improves Security
The biggest advantage to a computer-adaptive test, experts say, is the ability to evaluate all students at their own levels. Because of that, students often report that they are more engaged with the test and find it more interesting, says Dirk P. Mattson, the executive director of K-12 assessment for the Educational Testing Service, who is based in the nonprofit testing company’s San Antonio office. ETS, which has been hired by Smarter Balanced to develop several aspects of the computer-adaptive test, also produces the GRE, an adaptive graduate school admissions test.
“There’s a belief that this provides a more rewarding testing experience for the test-taker,” Mattson says. “A struggling student doesn’t need to be beaten over the head encountering lots of questions they can’t handle, ... and the student who is strong might welcome an additional challenge.”
In addition, because each test for each student is personalized and there are so many test questions in the bank, security risks are lessened, says Doug Kosty, the assistant superintendent for assessment and information services for the Oregon department of education. His state has used computer-adaptive testing for nine years.
It’s unlikely that students sitting near each other would encounter the same test questions in the same order, for example. A student “can’t go out on the playground and compare notes on question 14,” Kosty says. “Kids are basically guaranteed not to have the same test.”
Some educators who have used adaptive testing say the test window is shorter since students don’t always have to answer as many questions. In Delaware, students used to spend multiple hours taking state reading and math tests, says Michael Stetter, the director of accountability and resources for the Delaware department of education. The computer-adaptive tests shrank that time to one hour for reading and one hour for math, he says, making it easier for schools to schedule test times around computer labs. “We’re getting a more precise estimate of ability with the same or fewer questions,” Stetter says.
However, Smarter Balanced’s tests are expected to take 10 to 13 hours, depending on grade levels. Because of concerns from states, the coalition is now developing a shorter version it says will produce comparable results.
In addition, users of computer-adaptive testing laud the immediacy of the assessment results, which typically are posted when a student finishes the test, giving teachers the opportunity to adjust their instruction more quickly based on the results. Officials from both coalitions say some results will be available almost immediately or within days, while results from sections that contain more writing and constructed response may take several weeks.
But in the field, implementation of computer-adaptive tests can pose problems. Much like the PARCC tests, the Smarter Balanced tests will be given online, and that means schools will have to have enough devices and bandwidth. Delaware had to allocate funds to buy additional servers for districts, and the state distributed 10,000 netbooks to get schools ready; the state also had to redesign training for teachers who were going to be test administrators. Districts are raising concerns about lengthy testing windows tapping out their bandwidth for long periods of time and about having enough devices with the right specifications to run the test.
Computer-adaptive tests can also be costly to develop since so many test items are needed. “The early years of computer-adaptive testing are extremely expensive,” Stetter says. However, since his state’s development of initial computer-adaptive tests, costs have dropped, he says, as test banks can be used for a long time.
Smarter Balanced estimates that once its adaptive tests are fully developed, its test bank will contain at least 30,000 items across all grades.
“Once you have an adaptive-testing pool, you can continue to run it for a long period of time, so there are a lot of efficiencies gained,” says Walter “Denny” Way, the senior vice president of psychometric and research services for the education publisher Pearson, based in London.
Smarter Balanced received $160 million and PARCC received $170 million in federal grants to develop the common assessments. Once the tests are ready, states will be expected to pay for them, but just how much and how those payments will be structured is still being worked out.
‘Two Viable Solutions’
Experts seem to agree that computer-adaptive testing works well with multiple-choice questions, or one-word-response questions, but there are differing opinions about how it does with longer answers or with essays. That makes computer-adaptive testing more suited to some subjects than others. Oregon, for example, uses a non-adaptive writing assessment that takes students three hours to complete.
“Things that are essays or that contain more complicated projects that need to be evaluated through human judgment really can’t be administered [through computer-adaptive testing] in this situation,” says John Mazzeo, the vice president of research and development for ETS, based in Princeton, N.J.
Despite those limitations, Alpert says the English/language arts and literacy component of the Smarter Balanced assessments will be adaptive. The only exceptions will be a handful of performance tasks, which may be longer activities that take place in the classroom or offline.
As states and schools get ready to address the challenges of adaptive testing, training students, educators, and even parents becomes increasingly important, says Steve Slater, the lead psychometrician for the Oregon education department.
Melissa Fincher, the associate superintendent for assessment and accountability in Georgia, a state that has joined the PARCC coalition, says she appreciates the fact that the federal government has financed the work of both coalitions.
“The jury is still out, and I see this as an opportunity to look large-scale at the best way to assess students,” she says. “I don’t see this as an either-or situation. I’m pleased we have two viable solutions in the works.”