Ed-Tech Policy

States Testing Computer-Scored Essays

By Andrew Trotter, May 29, 2002

Could a computer really be a good judge of student writing?

Pennsylvania education officials say yes. They have tested computerized essay scoring with about 30,000 students. Meanwhile, in Indiana, about 29,000 students are participating this spring in a pilot test of online essay-grading software designed by the Educational Testing Service.

Other states, and many educators, are watching those developments to decide whether they should consider using such technology.

“One of our goals was to see how online scoring compared to human scoring: they both ranked very equally,” said Mary Gaydos, a spokeswoman for the Pennsylvania Department of Education.

Still, some educators and testing experts caution that essay-scoring systems are far from perfect, and that using them to evaluate students on high-stakes exams could be a mistake.

Pennsylvania conducted three pilot tests, from 1999 to 2001, of the Intellimetric essay-scoring system, which was developed by Yardley, Pa.-based Vantage Learning. Students in grades 6, 9, and 11 used the Web-based system to take reading and writing tests.

As it is, the state has no immediate plans to replace paper-and-pencil testing with Web-based assessments, Ms. Gaydos said. She said such a decision would have to consider whether all schools have the computer capabilities to administer such tests.

Indiana is conducting a test this spring of a competing essay-grading tool called the “e-rater,” which was developed by the ETS, based in Princeton, N.J. High school students whose schools volunteered for the trial were scheduled to take Indiana’s end-of-course test for English 11 online. That test is a mixture of multiple-choice items and essay questions.

Other states are watching the trial closely.

“We’re very excited about the potential” of essay-scoring technology, said Robert Olsen, the head of the online-assessment program for the Oregon Department of Education. Oregon is in the second year of pilot-testing a multiple-choice online assessment. (“Testing Computerized Exams,” May 23, 2001.)

Essay-scoring technology could soon be added to the Oregon system. “We are in the process of completing a study in Oregon to verify the reports of the vendor [Vantage Learning] in terms of its accuracy and utility,” Mr. Olsen said, “and are very, very seriously looking at implementing it in this state.”

The Massachusetts Department of Education has also announced a test of an online writing-analysis tool that uses the Vantage Learning engine through the state’s “Virtual Education Space,” a Web site devoted to preparing students for state-sponsored assessments.

Testing the Software

If they prove effective, the new tools could have many benefits, some educators and policymakers say. Lessening the reliance on human scorers would reduce costs, for instance, and could help avert a possible shortage of scorers when state and federal mandates strain the capacity of testing programs over the next few years.

Some experts also argue that the tools could help improve online-testing systems that rely on multiple-choice questions, because tests with essay items are generally regarded as a more complete measure of student abilities than tests with multiple-choice items alone.

And online, computer-scored tests can return results to schools almost instantly, helping educators address students’ academic weaknesses soon after they’re spotted. Educators say it often takes months to get the results of paper tests.

ETS Technologies, the for-profit subsidiary of the nonprofit developer of the SAT college-entrance exam, approached the Indiana education department in January of this year and offered to set up a small pilot for online assessment, said Wes Bruce, the department’s director of the division of school assessment.

Indiana officials asked for a large-scale statewide trial that would use not the Indiana Statewide Testing for Educational Progress, the state’s high-stakes academic test, but the Core 40, a set of tests that the state has devised to get a sense of how students are performing in core academic courses. Those voluntary tests will become mandatory over the next few years.

“If you look at our [state educational accountability law], see all of its components, and the timeline for rolling it out, it will become particularly obvious why we piloted online testing this year,” said Mary Tiede Wilhelmus, the communications director of the state education department.

Human vs. Machine

People hired to score student essays typically have a four-year college degree and good writing skills, said Alison Lyden, an official at Data Recognition Corp., a testing company in Maple Grove, Minn. She said scorers, who are paid about $12 an hour, are trained before scoring student essays. And two people usually score each test independently.

Still, officials from the testing-technology companies suggest that the essay-scoring software can match the human scorers.

Generally, the computer scores a student response by comparing it with hundreds of human-scored responses to the same test item. If it looks most like a response that human experts have given, say, a 5 on a 1-to-5 scale, then the machine will assign it a 5.
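The comparison described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the method of any vendor named in this article: it assumes a simple word-count comparison (cosine similarity), and the sample responses, their scores, and the function names are all invented for the example.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count its words (a crude stand-in for real features)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_essay(essay, scored_samples):
    """Return the human-assigned score of the sample that looks most like the essay."""
    target = word_counts(essay)
    best_text, best_score = max(
        scored_samples,
        key=lambda sample: cosine(target, word_counts(sample[0])),
    )
    return best_score


# Hypothetical human-scored responses on a 1-to-5 scale (invented for illustration).
SAMPLES = [
    ("Dogs are good. I like dogs.", 1),
    ("Dogs make loyal pets because they protect the home and comfort their owners.", 5),
]

print(score_essay("A loyal dog will protect its owners and comfort them at home.", SAMPLES))
```

A real system would draw on hundreds of scored responses per item and on far richer features than raw word counts.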

The Intellimetric engine used in Pennsylvania is prepped by scanning in thousands of test items, said Scott Elliot, the chief operating officer of Vantage Learning, adding that he prefers to have 300 scored responses for each item on a test. “By learning the characteristics of 300 typical responses, it can apply that learning to score a novel response,” he said.

Once primed, the software looks for patterns in about 76 different features of the responses, some of which might not be readily discernible to every human scorer, the company maintains.

Some are structural, mechanical elements, such as spelling, punctuation, syntax, and subject-verb agreement. Other features involve content: “concepts and relationships among those concepts,” said Mr. Elliot.

“It ultimately comes down to vocabulary,” he said.

All those patterns, layered together and anchored in the human-scored samples, create an effective scorer, Mr. Elliot argued.

“The bottom line,” he said, “is our engine typically matches [human] experts more often than two [human] experts can match each other.”

And, the computer “doesn’t need a cigarette break, doesn’t need a cup of coffee, and scores the first and last essay the same,” he said.
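One common way to check a claim like Mr. Elliot's is to compute how often two scorers give the same essay the same score. The sketch below assumes that "matches" means exact agreement on the score scale. The score lists are invented for illustration and are not data from any pilot described here.

```python
def exact_agreement(scores_a, scores_b):
    """Fraction of essays on which two scorers assigned exactly the same score."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both scorers must rate the same set of essays")
    if not scores_a:
        raise ValueError("at least one essay is required")
    matches = sum(1 for a, b in zip(scores_a, scores_b) if a == b)
    return matches / len(scores_a)


# Invented scores for eight essays on a 1-to-5 scale.
human_1 = [4, 3, 5, 2, 4, 3, 1, 5]
human_2 = [4, 3, 4, 2, 3, 3, 1, 4]
engine = [4, 3, 5, 2, 4, 3, 2, 4]

print("human 1 vs. human 2:", exact_agreement(human_1, human_2))
print("engine vs. human 1: ", exact_agreement(engine, human_1))
print("engine vs. human 2: ", exact_agreement(engine, human_2))
```

Testing programs may also count scores that fall within one point of each other as agreement; that variant is not shown here.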

The essay-scoring engine created by Knowledge Analysis Technologies uses another analytical method, called “latent semantic analysis,” that is based on a broader model of English, said Lynn A. Streeter, the business-development officer of the company, based in Boulder, Colo.

It involves creating three lexicons, or collections of words: The first is a general model of English for the typical test-taker, such as a college freshman; the second is words pertaining to the subject of the test; the third is specific to each essay question, she said.

Ms. Streeter claims that having the first “general semantic space” allows the computer to recognize student responses that might be further afield from the average. For example, she said, if the word “doctor” was consistently used in a sample essay question, “then somebody writes a test essay in which they refer to a dermatologist, in our model we’d know that it’s very close to doctor and essentially means almost same thing.”
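The intuition behind the "doctor" and "dermatologist" example can be shown with a toy model. The sketch below is not latent semantic analysis itself, which also applies a matrix-factoring step (singular value decomposition) to a very large body of text. It shows only the simpler underlying idea, that words used in similar contexts end up with similar vectors. The six sentences, the stopword list, and the function names are invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Small, invented stand-in for a "general model of English."
CORPUS = [
    "the doctor examined the patient at the clinic",
    "the dermatologist examined the patient for a skin rash",
    "a doctor treats a patient with medicine",
    "the dermatologist treats skin with medicine",
    "the truck carried gravel down the road",
    "the truck driver parked on the road",
]

# Common function words are ignored so they do not dominate the comparison.
STOPWORDS = {"the", "a", "at", "for", "with", "on", "down"}


def context_vectors(sentences):
    """Map each word to counts of the other words that share a sentence with it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = [w for w in sentence.lower().split() if w not in STOPWORDS]
        for i, word in enumerate(words):
            for j, other in enumerate(words):
                if i != j:
                    vectors[word][other] += 1
    return vectors


def similarity(a, b):
    """Cosine similarity between two context vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


vectors = context_vectors(CORPUS)
print("doctor ~ dermatologist:", similarity(vectors["doctor"], vectors["dermatologist"]))
print("doctor ~ truck:        ", similarity(vectors["doctor"], vectors["truck"]))
```

In this toy corpus "doctor" and "dermatologist" never appear in the same sentence, yet they score as similar because both appear alongside words such as "patient" and "medicine," while "doctor" and "truck" share no context at all.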

Potential Problems

But the use of essay-scoring software faces some big hurdles before becoming a part of state or federally mandated academic assessments. For starters, the uneven availability of computers and high-speed Internet connections in schools is a problem.

In addition, several studies by Boston College researchers suggest that students perform better on essay tests when the test-delivery method, whether on paper or computer, is the same method they use for regular writing assignments.

For now, Ms. Streeter said, machine scoring of essays is best used to grade practice tests or to help teachers wade through student writing exercises, which would allow them to assign more such exercises. “It should be more about helping a person, than ‘you flunk,’” she said.

For example, her company’s essay-scoring tool is used in a literacy project at the University of Colorado, called “Summary Street,” in which students in grades 3-12 write summaries of book chapters they have read. The computer gives feedback on how to improve their writing and on concepts they have missed.

Michael K. Russell, a researcher at the Center for the Study of Testing, Evaluation, and Assessment, at Boston College, suggests that essay-scoring software might be best used as a diagnostic tool to analyze student essays to reveal misconceptions about academic topics.

Beyond that, Mr. Russell said, increased use of essay-scoring technologies must first be matched by more use of computers for student writing and classroom learning.

A version of this article appeared in the May 29, 2002 edition of Education Week as States Testing Computer-Scored Essays
