The End Of Multiple Choice? The Quest To Create Accurate Robot Essay Graders

What if a computer could accurately grade student essays? It could change the way we test students (and the way they’re taught). And a new $100,000 competition is trying to spark auto-grading innovation.

What’s the best way to prove you "know" something?
A. Multiple choice tests
B. Essays
C. Interviews
D. None of the above


Go ahead: argue with the premise of the question. Oh, right: you can't do that on a multiple-choice test. Essays can often better gauge what you know, and writing is integral to many jobs. But even though essays are widely acknowledged as the more useful measure, we don't ask students to write much on standardized tests, because the prospect of grading millions of essays by hand is daunting. Time for a little technical assistance?

On Monday, the William and Flora Hewlett Foundation said it is creating a $100,000 competition for software that can "reliably automate" the grading of essays on state tests. The purse will be divided into three prizes: $60,000 for first place, $30,000 for second, and $10,000 for third. If machines can fairly grade written work, then why not assess students on their writing skills?



To many people, the prospect of writing for a machine may seem chilling. Writers are told to "think about the audience." If the audience is a machine, should the prose be stilted? Will your robot judges be able to appreciate a brilliant metaphor?

Then there is the question of whether students are even prepared to be tested on their essays. What helps most writers improve is a chance to write a lot—and to get prompt and detailed feedback. Too many students lack opportunities to pen even prosaic prose during the school day or to get any constructive feedback. "In addition to improving state tests, I want to see more classroom writing," asserts Tom Vander Ark, chief executive of Open Education Solutions, a blended learning service provider that is helping orchestrate the competition. "I think kids should write 1,500 words a week—not a semester," he adds.

First things first: Current vendors need to demonstrate in open trials that their products are fair graders, says Barbara Chow, education program director at the Hewlett Foundation. "There are lots of claims" by software vendors, Chow notes. "To date, there’s no single place that works like a Turing test [i.e., that you can’t tell the difference between a human and robot grader]: Our hope is to provide a neutral, impartial platform on which to make judgments of whether automated scoring can match human raters," she says.



To establish such a baseline, eight current grading-software vendors are spending the rest of January grading more than 19,000 essays. (These vendors include the "usual suspects," say those involved.) Their results will be compared against scores awarded by living, breathing teachers.
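How closely a vendor's scores track the teachers' scores can be summarized with an agreement statistic; quadratic weighted kappa is a common choice for integer essay scores. Here is a minimal sketch of that calculation, using toy data and a hypothetical function name rather than the competition's actual scoring code:

```python
import numpy as np

def quadratic_weighted_kappa(human, machine, min_score=0, max_score=3):
    """Agreement between two raters on an integer scale: 1.0 = perfect, near 0 = chance."""
    human = np.asarray(human)
    machine = np.asarray(machine)
    n = max_score - min_score + 1
    # Observed matrix: how often each (human score, machine score) pair occurs.
    observed = np.zeros((n, n))
    for h, m in zip(human, machine):
        observed[h - min_score, m - min_score] += 1
    # Expected matrix if the two raters scored independently of each other.
    expected = np.outer(np.bincount(human - min_score, minlength=n),
                        np.bincount(machine - min_score, minlength=n)) / len(human)
    # Quadratic weights: disagreements that are further apart are penalized more.
    weights = np.array([[(i - j) ** 2 for j in range(n)] for i in range(n)], dtype=float)
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

# Toy example: human scores vs. one vendor's machine scores on six essays.
print(quadratic_weighted_kappa([3, 2, 1, 0, 2, 3], [3, 2, 2, 0, 1, 3]))
```

A result of 1.0 would mean the software and the teachers agree exactly; a result near zero would mean the software does no better than chance.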

With that baseline established, other researchers will have an open field to develop alternatives that can top those scores—and win a prize.

"Prizes are a proven strategy for mobilizing talent and resources to solve problems," says Vander Ark, who previously served as president of the X Prize Foundation, a nonprofit that uses prizes to inspire invention.



Although three months may not seem like a long time to develop sophisticated software algorithms, the Hewlett Foundation is betting that a prize will coax some researchers to refocus existing work on the task of grading essays, notes Jaison Morgan, managing principal of The Common Pool, which is also helping to coordinate the assessment prize program.

The competition will be hosted on Kaggle, a platform for managing data competitions that lets organizations post data and have it scrutinized by data scientists. (Edtech test-prep company Grockit is already using Kaggle to run a $5,000 competition to devise an algorithm that will predict whether a student will answer the next test question correctly. Some 113 teams are competing.)
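In very rough terms, a Kaggle entry of that kind turns a handful of signals about a student and a question into a probability of a correct answer. The sketch below uses invented features and a plain logistic regression; the article does not describe Grockit's data or any competitor's actual approach.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features for each (student, question) pair -- stand-ins for the real data.
rng = np.random.default_rng(0)
X = rng.random((500, 3))  # e.g., past accuracy, question difficulty, time on task
y = (X[:, 0] - X[:, 1] + 0.1 * rng.standard_normal(500) > 0).astype(int)

# Train on the first 400 pairs, then estimate each held-out student's chance
# of answering the next question correctly.
model = LogisticRegression().fit(X[:400], y[:400])
probs = model.predict_proba(X[400:])[:, 1]
print(f"held-out accuracy: {(model.predict(X[400:]) == y[400:]).mean():.2f}")
```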

Among those closely watching the process are the two organizations charged with establishing a fresh set of assessments for measuring student progress against the Common Core standards. Those assessments are still in the works, but many hope they will not be multiple-choice tests. "One big change [with the Common Core assessments] will be no more decontextualized prompts," Vander Ark says.

Translation: forget about those classic "write 500 words on your summer vacation" essays. (This could be a challenge for students who develop formulaic answers to vague prompts.) Automated essay assessments will measure how well students stay on topic or compare different texts—but they’re not nearly as good at grading the logic of an argument, concedes Vander Ark.
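"Staying on topic" is exactly the kind of signal current systems can quantify, for example by comparing an essay's vocabulary with the prompt's. The snippet below shows one hypothetical way to compute such a feature; it is an illustration, not how any vendor's product actually works.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

prompt = "Describe how a library can serve its community."
essays = [
    "A public library offers the community books, internet access, and tutoring.",
    "My summer vacation was great; we went to the beach and ate ice cream.",
]

# Score each essay by how closely its vocabulary overlaps the prompt's.
vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform([prompt] + essays)
on_topic = cosine_similarity(matrix[0], matrix[1:]).ravel()
print(on_topic)  # the off-topic vacation essay should score lower
```

A feature like this catches wandering essays, but it says nothing about whether the argument holds together, which is the gap Vander Ark concedes.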



"As education moves to digital and begins to embrace 'big data,' the potential for competitions is extraordinary," Vander Ark says. When kids have "10,000 keystroke days," smart data miners will be able to extract from that "paradata" countless observations about students’ performance, persistence, and other aspects of their motivational profile.

Every choice about testing carries the burden of unintended consequences, acknowledges Vander Ark. "When some states started requiring three-paragraph essays, that wound up being the only writing taught," he says.

The downside to automated essay grading? "If this doesn’t work, then most states in America will be giving multiple choice tests," Vander Ark says. "We need to do better than that."