How Do We Measure What Really Counts In The Classroom?

A new generation of assessment tools is hoping to piggyback on the wealth of online rating software to find a better, more efficient way of assessing students.

The world is caught up in an Information Age revolution, where we are all evaluating products, restaurants, doctors, books, hotels, and everything else online, but education has not yet moved past the standardized assessment, which was invented in 1914. Frederick Kelly, a doctoral student in Kansas, was looking for a mass-produced way to address a teacher shortage caused by World War I. If Ford could mass produce Model T’s, why not come up with a test for "lower order thinking" for the masses of immigrants coming into America just as secondary education was made compulsory and all the female teachers were working in factories while their men went to the European front? Even Kelly was dismayed when his emergency system, which he called the Kansas Silent Reading Test, was retained after the war ended. By 1926, a variation of Kelly’s test was adopted by the College Entrance Examination Board as the Scholastic Aptitude Test (SAT). The rest is history.

So when Kyle Peck (from Penn State) and Khusro Kidwai (of the University of Southern Maine) demoed their nonprofit, free eRubric assessment tool at Duke recently, we were all surprised at the flexibility it allowed, in a customizable and highly automated form.

An art history teacher and a professor teaching geographic information systems were both beta-testing it to grade essay and short-answer exams for hundreds of students. eRubric allowed them to assess everything from the accuracy of specific content in individual answers to logical thinking, verbal expression, and imaginative, outside-the-box application of the material; in other words, originality. In a different kind of assignment, the professors might have added categories for collaborative work, or for the ability to carry an idea from the beginning of a project to its conclusion: the kinds of skills good teachers discover but rarely have a chance to test, measure, or give good feedback on, especially in a course of 90 or 400 students. eRubric lets anyone evaluating others customize the categories to be evaluated and weight the individual categories differently on different assignments. It could be used in informal or formal education, from kindergarten through college and beyond, and it has applications for any corporate Human Resources department as well.

That’s just the beginning. If a teacher wished, she could even begin the first day of class with a blank eRubric and have students write, together, the categories and the feedback for each category. On every challenge, test, or essay they were given, they would then know how they would be judged: the terms of the assessment that would, in the end, determine their grade. Research on assessment consistently shows that we learn more if we understand, participate in, and agree with the basic learning or work goals we’re aiming at: an investment in outcomes that demonstrably improves learning.

With eRubric, the teacher decides, on any assignment, which categories apply and how to weight them. When the test papers, problem sets, or essays come in, the teacher reads each one and clicks the category boxes to generate detailed feedback in each category. eRubric also lets the teacher write an individual comment in any category, or on the whole assignment, if the pre-written comment could use more precision. eRubric then automatically emails all of this feedback (probably a page-long assessment in the end) to the student: summary grade, breakdown, general comments, specific comments. A week or so later, eRubric sends a reminder email to any student who hasn’t opened the assessment document, and notifies the professor whether or not the student has bothered.
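The workflow described above (choose categories, weight them, attach a canned comment per category, then assemble a summary for the student) could be sketched roughly like this. This is a hypothetical illustration, not eRubric's actual code; the category names, the 0-4 scale, the weights, and the comment bank are all invented for the example:

```python
# Hypothetical sketch of a weighted-rubric grader, loosely modeled on the
# workflow described in the article. Not eRubric's actual implementation.

def grade(rubric, scores):
    """rubric: {category: (weight, {score: canned_comment})}
    scores: {category: score on a 0-4 scale}
    Returns (weighted percentage, per-category feedback lines)."""
    total_weight = sum(weight for weight, _ in rubric.values())
    earned = 0.0
    feedback = []
    for category, (weight, comments) in rubric.items():
        score = scores[category]
        earned += weight * score / 4  # 4 is the top of the invented scale
        feedback.append(f"{category}: {score}/4 - {comments[score]}")
    return round(100 * earned / total_weight, 1), feedback

# Invented example rubric: accuracy weighted three times as heavily as originality.
rubric = {
    "Accuracy": (3, {4: "All key facts correct.", 2: "Several factual slips."}),
    "Originality": (1, {4: "Genuinely fresh argument.", 2: "Mostly restates the reading."}),
}
pct, lines = grade(rubric, {"Accuracy": 4, "Originality": 2})
```

The summary email the article describes would then simply combine `pct`, the `lines`, and any free-form comments the teacher adds on top of the canned ones.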

When we hear every year that the U.S. has fallen in the OECD rankings to, say, 14th in reading, 17th in science, and 25th in math, as we did in 2010, we’re always alarmed. Isn’t that a problem? It may well be, but the problem is far more complex. Americans use standardized tests earlier and more often than any other nation on the planet. Research shows that high-stakes, after-the-fact or end-of-grade, multiple-choice testing has little impact on learning motivation and little real quantitative relationship to content mastery.

When I talk to corporate trainers, they insist that, in this job market, they can hire the smartest students in the country, those who have had the highest grades through the entire school system. But because the No Child Left Behind national law began requiring standardized tests for all students in 2002, it takes them one to two years to retrain these excellent students not to think in terms of single-best-answer (multiple-choice) options. They have to make them "unlearn" the skill of guessing the best answer among five available ones (a fairly useless skill in the workplace) and "relearn" how to think about what they do or don’t really understand about a situation, whom to go to in order to find out, and what they need to do to get the best results. In other words, whether we are 1st or 17th, we’re failing to test what we really value in the workplace. There is an extreme mismatch between what we value and how we count.

On September 20 and 21, the 30 recipients of grants from our MacArthur Foundation-Gates Digital Media and Learning Competition will meet at Duke to show how far they have gotten on the badging systems they are creating. One institutional representative and one software systems developer from each team will be there to demo, discuss, learn, and innovate in a group un-conference. The institutions include Intel, the Department of Veterans Affairs, Disney, the Smithsonian Museum of Natural History, the Girl Scouts, 4-H, Carnegie Mellon, the Urban Affairs Coalition, Microsoft, Boise State University, and several K-12 schools and teachers groups. All are working to find systems that—like eRubric—allow for real-time feedback, peer contribution to an evaluation system, flexibility, and customizability—all of which inspire learning. They are also looking for ways that their systems can be automated and provide enough consistency to be meaningful in comparing results within, between, and across institutions.

Standardized testing is our past—but it doesn’t have to be our future. We’re hoping that pioneers like the developers of eRubric, and those coming together this week at Duke from institutions large and small, can build systems that work better for our age, taking advantage of the technology we now have. In this, they have behind them two decades of work by the worldwide community of web developers, who have already built peer-awarded badging systems (on TopCoder, Stack Exchange, and other online accreditation sites) that they use when finding the collaborative partners on whom their systems and livelihoods depend. If computer programmers can figure out reliable systems for rewarding everything from Python coding skills to the "fire starter" ability to breathe creativity into a project when everyone else is stuck, so can our schools. Soon we may well have automated, easy, teacher-friendly, student-inspiring assessment systems that actually measure what we value and count the kinds of knowledge and thinking that really do count, in the classroom and in the real world.



  • Frances

    Yes! Let students in on assessment criteria up front, especially when the assessment is formative. More importantly, let them participate in designing the criteria.

  • Chris

    Great article. Amassing data is of little use if you do not know how to use it or what to measure. Standardized tests succumb to the limitations enumerated in this article, specifically because they only measure a sample of a student’s intelligence rather than holistically measuring a student’s real-time performance.

    Hatch is an early education company that specializes in developing literacy and mathematical competence. If a student is unable to establish these fundamentals, then they are highly likely to remain behind their peers throughout their life. Unfortunately, research by Susan Landry of The University of Texas System’s Health and Science Center and James Baker III of Rice University details that an overwhelming number of children progress through their K-12 education without a firm basis in reading and math skills. Hatch offers classrooms tablet and whiteboard solutions to measure student development in real time while the students feel that they are at play. The software in these technologies sifts the data into 18 core competencies and provides a digital report card to teachers and administrators for additional analysis.

    As the article points out, access to a comprehensive set of real-time data allows for robust analysis and for the compilation of a student profile over time. Rather than perpetuating the use of systems that standardize intelligence and skill assessments, teachers and schools should advocate progressive and adaptive measurement metrics that focus on what really counts.

  • Nev

    Good to read this, even as someone outside of the US system.
    It's encouraging to the rest of us when someone knowledgeable can wax so enthusiastic on a subject. But do you have shares in eRubric? ;-)

    Also, can you share any pointers on how to dig deeper into "Research shows that high stakes, after-the-fact or end of grade, multiple choice testing has [...] little real quantitative relationship to content mastery."


  • Kyle Peck

    Hi, Nev. 

    To address your first question, no. Cathy and our project met through a response I made to one of her blog posts about digital badging, and the tool she describes is being developed as an open-source tool that will soon be offered at no cost to anyone interested in using it. There are no "shares" in the conventional sense, although we are looking for collaborators to invest "sweat equity" and intellectual capital by participating in the design through beta testing. We hope others will adapt the work to fit different contexts, and we hope that they, too, will share their work to elevate assessment and learning.

  • Scott

    I have arrived at the conclusion that standardized tests are not solely about the students. Firstly, they are about distrust that students are learning what "we" as teachers are expected to teach. It appears, at least in part, that there is a deep-seated distrust between those in administration and teachers - those on the front line, in the trenches. Secondly, they are about presenting a positive picture to the public - not a real one, but a positive one.

    Too often it becomes a blame game between those at "higher" levels of administration all the way up to the State, and Federal levels of government. There too frequently is a negative tension between teaching and government. It sometimes seems like there is a "scapegoat" approach to much of the process.

  • Jess

    Thanks for posting! One of my favorite quotes is by Lord Kelvin, "If you cannot measure it, you cannot improve it," and it drives a lot of my work trying to find ways to measure what we find important. 

    While I agree that there is a need for innovation in the assessment of student learning, and about the importance of connecting teaching and learning to assessment, I feel that your ire at standardized tests is a bit misdirected. Aside from the fact that standardized testing pre-dates the Kansas Silent Reading Test, I think Kelly's contribution was the mass use of multiple choice. We should be skeptical of the way that standardized tests have been used in the past. That doesn't mean that the tests are poor, but the implementation may be. Or perhaps we should examine the connection between what is taught and what is tested - the validity of the inferences that we can make from a test. Standardized tests are tools that should be used when there is a fit; I agree that we are over-reliant on them, but that doesn't mean they don't serve a purpose. I would take it a step further and state that you are arguing for a standardization of measurement through the use of rubrics. The scale is smaller, true, but you're still creating a metric that will be applied across a group. The SAT started with the same purpose - a way to standardize the measurement of learning across groups of people. I do agree that rubrics are an essential tool to communicate expectations, measure learning, and provide feedback, and I'm glad to see they are gaining momentum.

  • fernside

    As she has done so often, Cathy Davidson has produced a clear and compelling statement on an issue that is second to none in affecting our ability to make our schools work better for our kids. No matter what is said by anyone who has influence in shaping the nature of public education, what really matters is indeed what they count. While I hate to pile onto our politicians, who are feeling the wrath (which seems rather justified to me!) of many with regard to their performance, I think it is clear that what they value is indeed the test score. For them a test score is a test score is a test score. While there are some exceptions, for the most part the issue of what the test is really measuring is not something they seem to worry much about. Certainly it is OK to use other tests and means of assessment, but the test score is what counts. So to enable us to address this, one of two things needs to happen: either the holy spirit has to descend upon them and enlighten them, or we need to find better words to communicate with them and to make the case that most certainly needs to be made.

  • ibor

    "Standardized testing is our past--but it doesn’t have to be our future."  A breath of fresh air! For 47 years of teaching I found that I could easily get along without standardized testing.  I was teaching in colleges and universities.  Philosophy to boot.  But standardized testing was usually going on around me.  If we get rid of standardized testing maybe we could get rid of grades.  I found I could do that too!