A previous version of this article misstated the percentage of educators who believe artificial intelligence will make standardized testing worse, according to a survey conducted by the EdWeek Research Center earlier this year.
Here's a multiple-choice question: Which of the following have educators said is a problem with current state standardized tests?
- a. Teachers don't get the test data back quickly enough.
- b. The exams are not personalized for students' interests or learning needs.
- c. The exams don't measure what students really need to know.
- d. All of the above
The correct response, d., points to the big, long-standing problems with today鈥檚 standardized tests. That raises another, more recent question that has been coming up in education circles: Can artificial intelligence mitigate those problems and help standardized testing improve significantly?
For now, there's no hard-and-fast answer to that question. While AI has the potential to help usher in a new, deeper breed of state standardized tests, there are plenty of reasons for caution.
On the one hand, testing has long been due for a facelift, many experts argue.
The tests students now take, particularly the state standardized assessments that carry significant stakes for schools and districts, were developed for a time when the "dominant testing model was a lot of students sitting in a gym, taking a pencil and paper test," said Ikkyu Choi, a senior research scientist in the research and development division of ETS, a nonprofit testing organization.
AI may be able to "provide much more engaging and relevant types of scenarios, conversations, interactions that can help us measure the things that we want to measure," Choi said, including students' ability to think critically and communicate. "We're quite interested and excited, with the caveat that there are a lot of things that we need to be aware of and be careful about."
AI's greatest potential at this moment seems to be in helping with the nuts and bolts of assessments: generating test items and scoring them more efficiently, and providing more actionable feedback to educators on their students' strengths and weaknesses.
Technologies like natural language processing, the branch of AI that interprets and generates human language, may make it possible to gauge skills that educators say most traditional tests simply cannot capture, such as creativity and problem-solving.
But the technology comes with its own problems, experts add. For one thing, AI often produces incorrect information without any clear indication of where it came from.
Plus, because AI is trained on data created by humans, it reflects human biases. In one study, AI tools gave a lower grade to an essay that mentioned listening to rap music to enhance focus than to an otherwise identical essay that cited classical music for the same purpose.
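That kind of paired-essay comparison is straightforward to automate. Below is a minimal sketch; `ai_score` is a hypothetical stand-in for whatever AI grading service is under test, and the essay text is invented. The probe scores two essays that are identical except for one detail and reports the gap.

```python
# Minimal sketch of a paired-essay bias probe. `ai_score` is a
# hypothetical stand-in for an AI grading service; real scorers may
# also be nondeterministic, hence the repeated trials.
import statistics

def ai_score(essay: str) -> float:
    # Toy placeholder on a 0-6 scale; swap in a real model call here.
    return 4.0 + 0.1 * essay.count("classical")

BASE = "Listening to {genre} music while I study helps me focus ..."

def bias_gap(genre_a: str, genre_b: str, trials: int = 20) -> float:
    """Score two essays identical except for one detail; return the mean gap."""
    a = [ai_score(BASE.format(genre=genre_a)) for _ in range(trials)]
    b = [ai_score(BASE.format(genre=genre_b)) for _ in range(trials)]
    return statistics.mean(a) - statistics.mean(b)

# A consistently nonzero gap flags possible bias worth investigating.
print(bias_gap("classical", "rap"))
```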
Educators aren't especially enthusiastic about the potential of AI to make testing better. In fact, more than a third of district and school leaders and teachers (36 percent) believe that because of AI, standardized testing will actually be worse five years from now.
Fewer than 1 in 5 (19 percent) believe the technology might improve the assessments. The survey by the EdWeek Research Center of 1,135 educators was conducted from Sept. 26 through Oct. 8 of this year.
How AI might help capture more sophisticated thinking skills
One of the most-cited problems with the current breed of state standardized tests: Teachers often don't see the results of tests their students take in the spring until the following school year, when it is typically too late to make changes to instruction that could help those students.
Multiple-choice tests are relatively easy and inexpensive to score, and much of that work can be automated, even without AI. But those exams can only capture a limited portion of students鈥 knowledge.
For instance, Matt Johnson, a principal research director in the foundational psychometrics and statistics research center at ETS, would love to be able to give students credit on an assessment for successfully working out multiple steps of a problem even if they ultimately arrive at the wrong answer because of a simple calculation error. That is essentially the approach many teachers use now.
Analyzing students' work in that way would take significant muscle and manpower for human scorers. But it might be a simpler proposition if AI tools, which can recognize and process human writing, were employed. The technology, however, hasn't reached the point where it can assess students' thinking process reliably enough to be used in high-stakes testing, Johnson said.
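To make Johnson's idea concrete, here is a toy sketch of step-level partial credit. Parsing free-form student work into labeled steps, the genuinely hard part that AI would have to supply, is assumed away; the steps and point values are invented.

```python
# Toy sketch of step-level partial credit: award points for each
# correct step of a worked solution instead of all-or-nothing.
from dataclasses import dataclass

@dataclass
class Step:
    description: str
    correct: bool

def partial_credit(steps: list[Step], max_points: float) -> float:
    """Points proportional to the number of correct steps."""
    if not steps:
        return 0.0
    return max_points * sum(s.correct for s in steps) / len(steps)

# Student sets up the problem correctly but slips on the arithmetic.
work = [
    Step("identified the right formula", True),
    Step("substituted the given values", True),
    Step("final arithmetic", False),  # simple calculation error
]
print(partial_credit(work, max_points=3))  # 2.0 points, not 0
```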
Even so, AI may help speed up scoring on richer tests, which ask students to write a constructed response or short essay in answer to a problem. Typically, grading those questions requires a team of teachers all working with the same scoring guidelines and reviewers to check the fairness of their assessments.
That, however, is where questions about bias surface. Parents have also expressed concerns about relying on machines to score student essays, on the assumption that machines would be less effective at understanding students' writing.
For the foreseeable future, human beings will still play an integral role in scoring high-stakes tests, said Lindsay Dworkin, the senior vice president of policy and government affairs at NWEA, an assessment organization.
"I don't think we're ready to take things that have historically been deeply human activities, like scoring of, you know, constructed-response items, and just hand it over to the robots," she said. "I think there will be a phased-in period where we see how it goes but we make sure it's passing through teachers' hands."
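One plausible shape for that phased-in period is a confidence-based routing rule: the AI proposes a score, anything it is unsure about goes to a teacher, and a random audit sample of the rest gets spot-checked. The sketch below is an illustration under assumed thresholds, not a description of any vendor's system.

```python
# Confidence-based routing for AI-proposed scores; the threshold and
# audit rate are illustrative assumptions.
import random

def route(confidence: float, threshold: float = 0.9,
          audit_rate: float = 0.1) -> str:
    """Decide who finalizes an AI-proposed score."""
    if confidence < threshold:
        return "teacher review"        # uncertain: a human decides
    if random.random() < audit_rate:
        return "teacher audit"         # spot-check confident scores
    return "ai score accepted"

for conf in (0.95, 0.82, 0.99):
    print(conf, "->", route(conf))
```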
Despite that gradual approach, AI may be able to offer teachers more actionable feedback on their practice so they can improve their instruction, Dworkin said.
For instance, a language arts teacher with a class of 30 kids could ask an AI tool: "Tell me what all of my students collectively did well. Tell me what they didn't do well. Tell me the skill gaps that are missing?" Dworkin said. "Is everybody failing to give me strong topic sentences? Is everybody failing to write a conclusion?"
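The rollup behind a question like that is simple once per-student rubric scores exist; the aggregation itself needs no AI at all. In the sketch below, the rubric traits, scores, and gap threshold are invented for illustration.

```python
# Class-wide rollup of per-student rubric scores to surface shared
# skill gaps. All data here is invented example data.
from collections import defaultdict

scores = {  # student -> rubric trait -> score out of 4
    "student_01": {"topic sentence": 1, "evidence": 3, "conclusion": 1},
    "student_02": {"topic sentence": 2, "evidence": 4, "conclusion": 1},
    "student_03": {"topic sentence": 1, "evidence": 3, "conclusion": 2},
}

by_trait: dict[str, list[int]] = defaultdict(list)
for rubric in scores.values():
    for trait, score in rubric.items():
        by_trait[trait].append(score)

for trait, vals in by_trait.items():
    avg = sum(vals) / len(vals)
    flag = "  <- class-wide gap" if avg < 2.0 else ""
    print(f"{trait}: {avg:.1f}/4{flag}")
```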
Big experiment on AI and testing about to begin
One high-profile experiment in using AI for standardized assessment is about to get underway. The 2025 edition of the Program for International Student Assessment, or PISA, is slated to include performance tasks probing how students approach learning and solve problems.
Students may be able to use an AI-powered chatbot to complete their work. They could ask it basic questions about a topic so that the test could focus on their thinking capability, not whether they possess background knowledge of a particular subject.
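How might a test keep its chatbot from doing the thinking for students? One possibility, sketched below, is a guarding instruction that limits the assistant to background facts. The prompt wording and the `ask_model` stand-in are assumptions for illustration, not PISA's actual design.

```python
# Sketch of constraining a test-embedded chatbot to background
# knowledge. The guard prompt and model stub are assumptions.
GUARD_PROMPT = (
    "You are a reference assistant inside an assessment. Answer brief "
    "factual background questions only. Refuse to analyze, solve, or "
    "draft any part of the student's task."
)

def ask_model(system: str, user: str) -> str:
    # Stand-in for a call to whatever chat model the assessment uses.
    return f"[model reply to {user!r}, constrained by: {system[:40]}...]"

print(ask_model(GUARD_PROMPT, "What does a museum curator do?"))
```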
That prospect, announced at a meeting of the Council of Chief State School Officers earlier this year, got an excited reaction from some state education leaders.
Their enthusiasm may reflect concerns about whether the current batch of state standardized tests captures the kinds of skills students will need in postsecondary education and the workplace.
More than half of educators (57 percent) don't believe that state standardized tests, which generally focus on math and language arts, measure what students need to know and be able to do, according to the EdWeek Research Center survey.
States are increasingly focused on creating "portraits of a graduate" that consider the kinds of skills students will need when they enter postsecondary education or the workforce. But right now, state standardized tests emphasize language arts and math skills, and that can carry big consequences, said Lillian Pace, the vice president of policy and advocacy for KnowledgeWorks, a nonprofit organization that works to personalize learning for students.
"We are missing the picture entirely on whether we're preparing students for success" by ignoring kids' ability to work across disciplines to solve more complex problems, Pace said. "What might it look like if AI opens the door for us to be able to design integrated assessments that are determining how well students are using knowledge to demonstrate mastery" of skills such as critical thinking and communication.
That prospect, though intriguing, will take significant work, even with AI's help, said Joanna Gorin, now the vice president of the design and digital science unit at ACT, a public benefit assessment corporation.
In a previous role, Gorin helped teams design a virtual task that asked students to decide whether a particular historical artifact belonged in their town's museum. The simulation required students to interview local experts and visit a library to conduct research.
The task was designed to give insight into students' communication skills and ability to evaluate information. That's the kind of test many educators would like to move toward, she said.
"States want to move [toward richer assessments] because there's incredible promise from AI, and it can potentially get them the kind of information they really want," Gorin said.
But that could come with complications, even with AI's help, she added. "At what point are [states] willing to make the trade-offs that would come along with it, in terms of cost, in terms of technology requirements, in terms of other possible effects on how they teach?"
For instance, creating and reliably scoring performance tasks with AI would require significant data, meaning a lot of students would have to participate in experimental testing, Gorin said.
Given all that, "I do not foresee full-blown performance assessment, simulation-based AI-driven assessments in K-12, high-stakes, large-scale assessment" for quite some time, Gorin said.
AI could help generate better test questions, faster
Instead, Gorin expects that AI will help inform testing in other ways, such as helping to generate test questions.
Say an educator, or a testing company, has a passage they want to use on an exam, Gorin said. "Can I use AI to say, 'What would be the best types of items to build based on this [passage]?' Or, the reverse, what passages would work best based on the types of questions that I need to generate?" she said.
AI could also write the initial draft of an item, and a human could "come in and take it from there," Gorin said. That would allow test-makers to be "more efficient and more creative," she said. Being able to create test items faster could be a key to personalizing tests to reflect students' interests and learning needs.
If a goal of an assessment were to figure out whether students understood, say, fractions, it could offer a baking enthusiast a set of questions based on a chocolate chip cookie recipe and a sports-loving student another set based on the dimensions of a football field.
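A bare-bones version of that personalization is just templating: the same fraction skill rendered against different interests. The contexts and numbers below are invented for illustration.

```python
# Same skill, different surface context: one fraction item rendered
# against different student interests. Contexts are invented examples.
TEMPLATE = ("A {thing} is divided into {parts} equal parts. "
            "If {taken} of those parts {verb}, what fraction is left?")

CONTEXTS = {
    "baking": dict(thing="tray of chocolate chip cookies", parts=8,
                   taken=3, verb="are eaten"),
    "sports": dict(thing="football field", parts=4,
                   taken=1, verb="are closed for practice"),
}

def render(interest: str) -> tuple[str, str]:
    c = CONTEXTS[interest]
    key = f"{c['parts'] - c['taken']}/{c['parts']}"  # identical skill either way
    return TEMPLATE.format(**c), key

for interest in CONTEXTS:
    stem, key = render(interest)
    print(f"[{interest}] {stem} (key: {key})")
```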
It could be possible to train AI to craft questions on different topics that measure the same skill, experts say. But it would be difficult, and pricey, to "field test" them, which entails having real students try them out to ensure fairness.
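For a sense of what field testing checks, here is a tiny sketch of one classical statistic: the share of pilot students answering each item variant correctly, which psychometricians call the item's "p-value." The pilot responses are invented.

```python
# Compare the classical difficulty (proportion correct) of two item
# variants meant to measure the same skill. Data is invented.
def difficulty(responses: list[int]) -> float:
    """Proportion correct; 1 = correct, 0 = incorrect."""
    return sum(responses) / len(responses)

cookie_variant   = [1, 1, 0, 1, 1, 0, 1, 1]
football_variant = [1, 0, 0, 1, 0, 1, 0, 1]

gap = difficulty(cookie_variant) - difficulty(football_variant)
print(f"difficulty gap between variants: {gap:+.2f}")
# A large gap suggests the two variants are not interchangeable.
```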
That means change will likely come first and most dramatically to teacher-created exams for classrooms, which may determine student grades, as opposed to state standardized tests, which evaluate how teachers and schools are performing.
In fact, teachers are already experimenting with the technology to create their own tests. One in 6 teachers has used AI to develop classroom exams, according to the EdWeek Research Center survey.
When a version of ChatGPT that could spit out remarkably human-sounding writing in minutes was released in late 2022, it seemed to come out of nowhere. Even so, it is unlikely that AI will transform standardized testing overnight.
"I think it's going to come slowly," said Johnson of ETS. "My opinion is that there will be a slow creep of new stuff. Scenario-based tasks. Maybe some personalization will come in. As we get more comfortable with the various [use] cases, you'll start seeing more and more of them."
Data analysis for this article was provided by the EdWeek Research Center.