2012-01-12

Co.Exist

The End Of Multiple Choice? The Quest To Create Accurate Robot Essay Graders

What if a computer could accurately grade student essays? It could change the way we test students (and the way they’re taught). And a new $100,000 competition is trying to spark auto-grading innovation.

What’s the best way to prove you "know" something?
A. Multiple choice tests
B. Essays
C. Interviews
D. None of the above


Go ahead: argue with the premise of the question. Oh yeah, you can’t do that on multiple-choice tests. Essays can often better gauge what you know, and writing is integral to many jobs. But even though nearly everyone acknowledges that essays are a more useful measure, we don’t ask students to write much on standardized tests, because grading millions of essays is daunting even to imagine. Time for a little technical assistance?

On Monday, the William and Flora Hewlett Foundation said it is creating a $100,000 competition for software that can "reliably automate" grading essays on state tests. The purse will ultimately be divided into three prizes ($60,000 for first place, $30,000 for second, $10,000 for third). If machines can fairly grade written work, then why not assess students on their writing skills?



To many people, the prospect of writing for a machine may seem chilling. Writers are told to "think about the audience." If the audience is a machine, should the prose be stilted? Will your robot judges be able to appreciate a brilliant metaphor?

Then there is the question of whether students are even prepared to be tested on their essays. What helps most writers improve is a chance to write a lot—and to get prompt and detailed feedback. Too many students lack opportunities to pen even prosaic prose during the school day or to get any constructive feedback. "In addition to improving state tests, I want to see more classroom writing," asserts Tom Vander Ark, chief executive of Open Education Solutions, a blended learning service provider that is helping orchestrate the competition. "I think kids should write 1,500 words a week—not a semester," he adds.

First things first: Current vendors need to demonstrate in open trials that their products are fair graders, says Barbara Chow, education program director at the Hewlett Foundation. "There are lots of claims" by software vendors, Chow notes. "To date, there’s no single place that works like a Turing test [i.e., where you can’t tell the difference between a human and a robot grader]. Our hope is to provide a neutral, impartial platform on which to make judgments of whether automated scoring can match human raters," she says.



To establish such a base, eight current grading software vendors are spending the rest of January grading more than 19,000 essays. (These vendors include the "usual suspects," say those involved.) Their results will be calibrated against scores awarded by living, breathing teachers.
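For readers curious what "calibrated against" human scores could look like in practice, one common agreement statistic for this kind of comparison is quadratic weighted kappa, which rewards machine scores that land close to the human score and penalizes big misses more than small ones. The sketch below is purely illustrative: the 0–3 score scale and the sample scores are invented, and this is not the competition's official scoring code.

```python
# Minimal sketch of quadratic weighted kappa: agreement between human and
# machine scores, where 1.0 means perfect agreement and 0.0 means chance-level.
from collections import Counter

def quadratic_weighted_kappa(human, machine, min_score, max_score):
    assert len(human) == len(machine)
    n = max_score - min_score + 1
    total = len(human)

    # Observed confusion matrix: how often each (human, machine) pair occurs.
    observed = [[0.0] * n for _ in range(n)]
    for h, m in zip(human, machine):
        observed[h - min_score][m - min_score] += 1

    # Expected matrix under chance agreement, built from the two marginals.
    hist_h = Counter(h - min_score for h in human)
    hist_m = Counter(m - min_score for m in machine)
    expected = [[hist_h[i] * hist_m[j] / total for j in range(n)] for i in range(n)]

    # Quadratic weights: disagreeing by 2 points costs 4x as much as by 1 point.
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            w = (i - j) ** 2 / (n - 1) ** 2
            num += w * observed[i][j]
            den += w * expected[i][j]
    return 1.0 - num / den

# Hypothetical teacher scores vs. machine scores on a 0-3 rubric.
teachers = [2, 3, 1, 0, 2, 3, 1, 2]
machines = [2, 3, 2, 0, 1, 3, 1, 2]
print(round(quadratic_weighted_kappa(teachers, machines, 0, 3), 3))
```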

With that base established, other researchers will have an open field to develop alternatives that can top those scores—and win a prize.

"Prizes are a proven strategy for mobilizing talent and resources to solve problems," says Vander Ark, who previously served as president of the X Prize Foundation, a nonprofit that uses prizes to inspire invention.



Although three months may not seem like a long time to develop sophisticated software algorithms, the Hewlett Foundation is betting, quite literally, that a prize will coax some folks to refocus existing work on the task of grading essays, notes Jaison Morgan, managing principal of The Common Pool, which is also helping to coordinate the assessment prize program.

The competition will be hosted on Kaggle, a platform for managing data competitions that lets organizations post data and have it scrutinized by scientists. (Edtech test prep company Grockit is already using Kaggle to run a $5,000 competition to devise an algorithm that will predict whether a student will answer the next test question correctly. Some 113 teams are competing.)
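To give a sense of what the Grockit challenge is asking for: an entry is any algorithm that, given a student's history of answers, outputs the probability that the next answer will be correct. The toy baseline below is not Grockit's data or any competitor's method; the smoothing constants are arbitrary, and it simply tracks each student's running accuracy.

```python
# A toy next-answer-correct predictor: smoothed running accuracy per student.
from collections import defaultdict

history = defaultdict(lambda: [0, 0])  # student_id -> [correct, attempted]

def predict_next_correct(student_id, prior=0.5, prior_weight=2.0):
    correct, attempted = history[student_id]
    # Smoothing pulls brand-new students toward the prior probability.
    return (correct + prior * prior_weight) / (attempted + prior_weight)

def observe(student_id, was_correct):
    history[student_id][0] += int(was_correct)
    history[student_id][1] += 1

# Simulated answer log for one hypothetical student.
for outcome in [True, True, False, True]:
    print(f"p(correct next) = {predict_next_correct('s1'):.2f}")
    observe("s1", outcome)
```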

Among those closely watching the process are the two organizations charged with establishing a fresh set of assessments for measuring student progress in learning the Common Core curriculum. Those programs are still in the works, but many hope they will not be multiple-choice tests. "One big change [with the Common Core assessments] will be no more decontextualized prompts," Vander Ark says.

Translation: forget about those classic "write 500 words on your summer vacation" essays. (This could be a challenge for students who develop formulaic answers to vague prompts.) Automated essay assessments will measure how well students stay on topic or compare different texts—but they’re not nearly as good at grading the logic of an argument, concedes Vander Ark.
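How might software judge whether an essay "stays on topic"? The vendors' systems are proprietary, but one simple, purely illustrative proxy is lexical overlap between the essay and its prompt. The sketch below uses TF-IDF vectors and cosine similarity; the prompt and essays are made up, and real scoring engines are far more elaborate.

```python
# Toy on-topic score: cosine similarity between an essay and its prompt.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

prompt = "Explain how the author builds an argument about solar energy."
essays = [
    "The author cites falling panel costs and rising efficiency to argue for solar power.",
    "My summer vacation was great; we went to the beach and ate ice cream.",
]

# Fit a shared vocabulary over the prompt and essays, then compare each essay
# vector to the prompt vector. Higher similarity = more on topic.
vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform([prompt] + essays)
scores = cosine_similarity(matrix[0], matrix[1:])[0]

for essay, score in zip(essays, scores):
    print(f"{score:.2f}  {essay[:50]}...")
```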



"As education moves to digital and begins to embrace 'big data,' the potential for competitions is extraordinary," Vander Ark says. When kids have "10,000 keystroke days," smart data miners will be able to extract from that "paradata" countless observations about students’ performance, persistence, and other aspects of their motivational profile.

Every choice about testing carries the burden of unintended consequences, acknowledges Vander Ark. "When some states started requiring three-paragraph essays, that wound up being the only writing taught," he says.

The downside to automated essay grading? "If this doesn’t work, then most states in America will be giving multiple choice tests," Vander Ark says. "We need to do better than that."


11 Comments

  • Robert Cummings

    To further Jacqueline's comment about the CCCC position statement, I am reprinting it here from http://www.ncte.org/cccc/resou... :

    C. Best assessment practice is direct assessment by human readers. Assessment that isolates students and forbids discussion and feedback from others conflicts with what we know about language use and the benefits of social interaction during the writing process; it also is out of step with much classroom practice. Direct assessment in the classroom should provide response that serves formative purposes, helping writers develop and shape ideas, as well as organize, craft sentences, and edit. As stated by the CCCC Position Statement on Teaching, Learning, and Assessing Writing in Digital Environments, “we oppose the use of machine-scored writing in the assessment of writing.” Automated assessment programs do not respond as human readers. While they may promise consistency, they distort the very nature of writing as a complex and context-rich interaction between people. They simplify writing in ways that can mislead writers to focus more on structure and grammar than on what they are saying by using a given structure and style.

    (Please also note: in order to leave this comment, I will have to complete a Captcha. I will "write" to a machine to prove that I am human before my writing will be accepted by a machine. Irony, anyone?)

  • Jacquelinejleigh

    This is what happens when a) politicians and b) grant money drive education.  I would like to refer readers to the well-worded section 2.C of the CCCC Position Statement which can be found on www.ncte.org.  Jackie Leigh

  • Just Do It

    e.e.cummings would have never survived machine grading. The real downside to machine grading is that then everyone will really write to the test. Creativity in writing will go out the window. 

    We do such a bad job of teaching writing now -- my son went through K-12 and I think only twice did a teacher actually write corrections on the essay and not just at the beginning or end with the grade. We are so paranoid about plagiarism that we spend all of our money on computer engines for that and not enough on giving teachers time to actually grade and comment on papers. Oh yes, and then when students get one of those papers back that has been through plagiarism scrutiny, we don't ask kids to correct their papers, but instead hand them back in after looking at them for 5 minutes so that no one will be able to copy them. We need to get real when it comes to writing. Machine scoring might be fine as a preliminary to teacher grading -- but not the end-all and be-all, though that is what it will become.

  • Elizabeth Corcoran

    We'd agree: let's use machine scoring to help kids learn the basics, including basic punctuation, sentence structure and grammar. Machine scoring on standardized tests also means that students will be able to answer with sentences, rather than just fill-in-the-dots. But we'd agree that there will always be a great need for inspirational teachers to help students master the finer aspects of writing!

  • Sherman1

    There are already online programs that accomplish this task very accurately.  They use AI and took years to perfect.  Criterion by ETS and My Access by Vantage Learning are two such programs.

  • Elizabeth Corcoran

    The first part of the Hewlett program is aimed at assessing just how good existing tools such as ETS's Criterion and My Access are. Prior to this effort, there's been no comparison testing.

  • Jeroen Fransen

    It's a good discussion and one must dream, but why not start with technology ASSISTING teachers instead of replacing them? I see a lot more opportunity in bionic or hybrid software than in full robots for the foreseeable future. Haven't we seen this same evolution in all other fields before?

  • Elizabeth Corcoran

    Separate the two points: should we spend money and time developing better tools for grading essays? How should those tools be used? There's no reason why the tools couldn't be used to assist teachers. You could imagine a scenario where the teacher uses an automated grading tool to grade 2/3 of the class one week while he/she focuses on the other 1/3 -- and then the next week moves on to a different group. All depends on the implementation.

  • Pete Laberge

    Well, most teachers don't teach anyway... now-a-days....

    So... Why should they grade?
    Now, they can just collect their pre-deposited paychecks, and read the comics!

  • Jmbodi

    Most teachers work very hard for their students, Pete. Grading tells the students how/if they are meeting the teacher's standards. Teachers are professionals (some small % are incompetent), but 99+% are the hardest working people I know.