2012-09-05

Co.Exist

Why Our Definition Of "Failure" Gets An F

The accountability movement is quick to label teachers and schools "failing" based on test scores and other external benchmarks. But details suggest we may be missing something.

American schools have spent close to $4.4 billion on testing in the past decade thanks to No Child Left Behind. The ideal of the accountability movement was to impose some quality standards where few existed: to sanction schools, and sometimes individual teachers, whose students don’t perform up to par. Schools labeled as "persistently low-performing"—based on the results of standardized, multiple-choice tests—may be automatically subject to closure, mass firings, or the removal of the principal. Sounds objective, right?

Two recent articles have pointed out the absurdities that sometimes result when "failure" is defined this narrowly. Kristina Rizga, writing in Mother Jones, spent a year embedded at Mission High in San Francisco and found passionate, dedicated teachers helping excited, engaged students overcome difficult backgrounds. Dropout rates are down 75% at the school, and almost nine out of 10 students go on to college. Yet it is on the list for restructuring or possibly even closure because, according to the Department of Education, it is among the lowest-performing 5% of schools in the nation. That ranking is based on tests taken in English by a student population that is majority first-generation immigrants, one in five of whom arrived in this country within the past two years.

Similarly, the Washington Post's education blogger talked to Carolyn Abbott, labeled the "worst 8th grade math teacher in New York City" based on her students' scores on state math tests.

Here, the story behind the story is equally muddy. Abbott taught the same students the year before, in seventh grade, and they scored at the 98th percentile of New York City students. That meant that as eighth-graders the next year, they were predicted to score at the 97th percentile.

But by eighth grade her students were learning math that was at least three years ahead of the concepts presented on the state test. And the exam was low-stakes—they took it at the end of the year, after they’d already gotten into high school. The smarty-pants scored at only the 89th percentile on the test, meaning they’d fallen far short of expectations—even though at the same time, 100% passed the more rigorous Regents Exam in Algebra, the material they were actually learning. The conclusion of the state’s formula was that Abbott had not "added value."
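
To see how a formula can reach that conclusion, here is a deliberately simplified sketch of the value-added arithmetic, using only the percentile figures reported in the article. The state's actual model is far more elaborate; the function below is an illustration of the underlying logic, not the real formula.

```python
# Toy illustration of value-added logic, NOT New York's actual (much more
# elaborate) model. The idea: predict this year's result from last year's,
# then attribute the residual to the teacher.

def value_added(predicted_percentile: float, actual_percentile: float) -> float:
    """Residual between actual and predicted performance.
    A negative number is read as the teacher having 'subtracted' value."""
    return actual_percentile - predicted_percentile

# Figures from the article: the class was predicted to score at the 97th
# percentile (based on its 98th-percentile 7th-grade results) but landed
# at the 89th on the low-stakes 8th-grade state test.
print(value_added(predicted_percentile=97, actual_percentile=89))  # prints -8
```

Nothing in that residual distinguishes "the teacher taught badly" from "the students were working years ahead of the test and had no stake in it," which is precisely the problem.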

"It was humiliating," Abbott said, to be labeled a failure in the New York Times and elsewhere, where the list was published. She decided to leave teaching.

All of this doesn’t mean that the quest for accountability should be abandoned, but our nation’s love affair with the multiple-choice bubble test clearly needs to come to a close after 70 years. New tools like Kickboard, Wireless Generation, and ClassDojo have the potential to help teachers capture far more fine-grained data about students’ day-to-day performance and interactions, which, besides being good for teaching, could eventually be made the basis of a more nuanced picture of what happens in schools. But this data needs to be paired with nuanced judgments by politicians and the public.


5 Comments

  • Steve Benfield

    I do have a degree involving statistics -- decision science -- and I build systems to track the performance of teachers and students and to enhance achievement. The specific case the author mentions here clearly shows that evaluating things on pure numbers alone is problematic. In addition, because we are dealing with essentially the entirety of society in our school systems -- multiple districts, non-standardized data, let alone metadata -- the problems of objective analysis through systems are immense. I agree with the article that we should not just give up, but the public needs to know that it isn't just a 'run a report and compare school A to school B' kind of exercise. Each school has a different demographic makeup, different levels of parental engagement, different transient populations, and so on. Then you look at curriculum: different per state, different per school district, no standardization of supplemental curriculum or assessments, and different funding sources per school, which dictate different purchasing habits and abilities.

    The systems you mention don't collect 'subjective data' -- although they certainly should. Student scores on assignments and tests, attendance numbers, behavior incidents, and the like are not subjective data; they measure how the student is doing. But what the systems will have a very hard time measuring are things like student motivation, home life, and background. I very much understand the numbers and issues here and it sounds like  does as well.

    In the end, just like any measurement in the real world, we are going to have to decide what level of Type II error we are willing to live with for any measurement system, and what the appeals process will be for uncovering egregious problems like the one described in this article.

  • Erin Quinn

    I'm a teacher from Canada. I wish the public were more informed about what really happens in education, and what good LEARNING is. More often than not, schools are compared using data culled from high-stakes testing. This is an incredibly narrow view of learning; in fact, it tells us VERY little about learning. Multiple-choice tests mainly assess knowledge-based skills, and most easily the learning that happens at the bottom of Bloom's taxonomy. The most complex, meaningful learning happens at the top of Bloom's taxonomy: judgement, creativity, and evaluation. These skills are very difficult to test with multiple-choice questions.

    If you are interested in reading about how to REALLY transform education, you need to read this book: http://store.tcpress.com/08077...

  • Jen Meadows

    Maybe we should evaluate whether teaching to a test is the best means of educating our youth. We take the right brain out of the school system by cutting physical education, art, and music, and then expect our students to perform above average. As others have noted, teachers have been shouting for years that test scores do not accurately depict what students actually retain. When teachers strive to connect with as many students as possible and help each one learn as well as they are able, it becomes a race to catch up on what needs to be taught to the test.

  • Christopher Burd

    I would take articles like this more seriously if I were convinced the authors were truly committed to evaluating teachers and schools and merely wanted to improve the methodologies. But the writing gives off a vibe of opposition to any strong accountability for teachers and schools.

    "New tools like Kickboard Wireless Generation, and ClassDojo have the potential to help teachers capture far more fine-grained data about students’ day-to-day performance and interactions, which besides being good for teaching, could eventually be made the basis of a more nuanced picture of what happens in schools."

    Something tells me these "new tools" will capture rather subjective information that is too "nuanced" to be easily comparable. That is, they may (perhaps) be useful for evaluating individual students' strengths and weaknesses, but not for evaluating teachers or schools against each other.

    The problems with standardized testing that the author mentions are real, and there are ways of remediating them. For example, in interpreting results one needs to take student background (class, income, culture, family structure, etc.) into account. No decisions (for students, teachers, or schools) should be made on the basis of a single test. Tests should be geared to actual curricula (which vary), not to a single ideal curriculum (though curricula do need to be evaluated against each other; some are easier). None of this is advanced thinking; anyone with an undergraduate knowledge of statistics will understand the issues. Unfortunately, that doesn't include many politicians and journalists. Sorry to be blunt, but if you can't discuss the issue at that level, you're not really discussing it.

  • dlaufenberg

    Your opening lines... "The accountability movement is quick to label teachers and schools "failing" based on test scores and other external benchmarks. But details suggest we may be missing something."  I believe teachers have been screaming this from the rafters since the get-go. Ignoring their voices all this time while the media attacked their professionalism has created a rift that will take years to patch.

    After watching this trend develop over the span of my teaching career, which started just prior to NCLB, I agree with you that a more nuanced vision of school and learning is long overdue. In the reform push, "reform" has often translated only to urban reform, which leaves places like rural America and small-town America out of the game. In my opinion, assessment should be based at the school and district level, accountable to those it serves. This fervor to publish scores and chastise accomplishes nothing productive. We need empowered students, teachers, and administrators. Building a culture of fear, intimidation, and threats is not going to yield the deep-thinking, risk-taking, problem-tackling students and classrooms we need.