Doctors have just discovered a previously unknown relationship between the long-term recovery of spinal cord injury victims and high blood pressure during their initial surgeries. This may seem like a small bit of medical news—though it will have immediate clinical implications—but what's important is how it was discovered in the first place.
This wasn’t the result of a new, long-term study, but of a meta-analysis of $60 million worth of basic research that had been written off as useless 20 years ago. The analysis was carried out by a team of neuroscientists and statisticians led by the University of California San Francisco, in partnership with the software firm Ayasdi, using mathematical and machine learning techniques that hadn’t been invented when the trials took place. The process was outlined in a paper published today in Nature Communications, and hints at the possibility of medical breakthroughs lurking in the data of failed experiments.
"What was thought to have been a boondoggle turns out to have great value," says Adam Ferguson, a principal investigator at UCSF’s Brain and Spinal Injury Center and one of the paper’s authors. Just how much is unclear until trials are conducted in humans, but the finding raises several interesting questions—notably whether scientists should publish their raw data for posterity and whether their time and funding would be better spent poring through old experiments than conducting new ones.
Ferguson’s team began by meticulously reconstructing data from multiple studies comprising some 3,000 animals, including more than 300 from the Multicenter Animal Spinal Cord Injury Study conducted at Ohio State University in the mid-1990s. Rather than draw on only published results, he and his colleagues contacted each researcher and asked for unpublished data and lab notes as well. "They were very cool about this," says Ferguson. "A lot of scientists in other disciplines wouldn’t be—they’d feel like you were auditing them."
And perhaps for good reason. A paper published in The Lancet last year estimated that fewer than half of all findings make it into print, with the remainder comprising a "long tail of dark data" that may hold the key to science’s reproducibility crisis. Spinal cord injury researchers are facing a crisis of their own. Twenty years after Christopher Reeve’s paralysis shone a spotlight on their field, there haven’t been any breakthroughs. "There are no drugs," Ferguson says. "It doesn’t have any real, agreed-upon therapeutic approach. That’s embarrassing. We should have something, at least."
Instead, they have failures. One reason is the sheer number of variables. Spinal cord injuries are enormously complex and thus still poorly understood compared to other systems. Simple causal mechanisms have proven elusive, "and that’s a real threat to discovering new therapies," says Ferguson. So he and his team thought to test the old, dark data again, this time using techniques designed for uncovering hidden relationships between large numbers of variables.
Their tool of choice was topological data analysis (TDA), a technique developed by Stanford mathematician (and paper coauthor) Gunnar Carlsson that uses concepts from geometric topology, the study of highly complex shapes, to find patterns hidden in large datasets. Carlsson is also president of Ayasdi, the firm he cofounded to combine TDA with machine learning techniques to probe datasets for relationships between variables. (Ayasdi is one of Fast Company’s Most Innovative Companies in Big Data.) Before Ferguson had thought to use it for probing spinal cord injuries, Carlsson and other researchers had successfully employed TDA to find a unique mutation in breast cancers hiding in datasets that had been publicly available for more than a decade.
What sets Ayasdi apart from traditional competitors is its black box model: The software searches for patterns without human supervision (or bias) before rendering the results as a network diagram of variables for further analysis. "It’s the reverse of traditional hypothesis-driven science," says Ferguson. "We could never have found this correlation with hypertension using traditional tools, because with thousands of variables to test, it would have never occurred to us."
Does this mean that the process of discovery is over? That all new ideas will come from machines probing data and not from human ingenuity? While he rejects this "end of theory" idea as overblown, Ferguson does believe the first step in the scientific method, observation, has been radically complicated by Big Data and is ripe for machine mediation. Or as Ayasdi CEO Gurjeet Singh told me earlier this year, "Traditionally, you have to be lucky, and then you have to have a stroke of insight. But the probability of being lucky is lower and lower over time, so you need these systems to do that work for you."
In the case of the spinal cord injury data, Ayasdi’s TDA-driven approach mostly confirmed what researchers already knew: The drugs didn’t work. But the discovery of high blood pressure’s detrimental effects on long-term recovery has immediate implications for human patients. It raises the question of whether giving hypertension drugs immediately after injury and before surgery could improve outcomes, a hypothesis Ferguson and colleagues intend to test shortly at UCSF.
In the long run, Ferguson believes retroactive data mining is "a worthwhile approach," especially considering how much less expensive it is to sift through old data again than to run new trials. "For a little more than a million dollars, we’ve opened $60 million worth of value."