Want to know what's going to happen? Look at social media!
That's been a rallying cry among data scientists for a while now. And it certainly has a nice logic to it. In theory, networks like Twitter should be ready-made for predicting disease outbreaks, civic unrest, box-office revenues, and the like. These systems are distributed. They allow information to move organically. And they're open: Anyone can use them, anyone can mine them.
The problem is that making sense of terabytes of data is hard, and mistakes slip in easily. As Google's problems with Google Flu Trends show, the reality of big data doesn't always match the hype. It's hard to map the sum of human communication, and people have reasonable privacy concerns when you try.
That's why a group of researchers took a different approach. Rather than trying to understand the entire network, they focused on local parts of it: "sensor groups" of Twitter users who were more connected than average. They reasoned that people with more followers would hear about things sooner, because they were linked to a greater number of random individuals.
To test this "sensor hypothesis," the researchers took Twitter data from 2009 and randomly selected 50,000 users as a control group. They then picked a friend of each of those 50,000 users to form the "sensor group" and measured how much sooner new information reached that group. The answer: much sooner.
"We know from network theory that they should be more central in the Twittersphere, and they indeed were," says Nicholas Christakis, one of the authors of the study. "Whereas the average Twitterer has 25 followers, the average person connected to the average Twitterer has 422 followers. When we studied the flow of information in Twitter, we found that it reached the 50,000 'friends' much sooner, on average, than it reached the average person."
Nine days sooner, in fact. Christakis, a professor at Yale, says identifying sensor groups might be a way of providing advance detection, not simply sensing for things happening in real time. "It can predict the epidemic before it strikes the general population, by taking advantage of sensor nodes as a kind of canary in a mine, tweeting before everyone else is hit by the wave," he says.
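The gap between 25 and 422 followers is an instance of what network researchers call the friendship paradox: a person reached by following a link is, on average, better connected than a person picked at random, because well-connected people sit at the end of more links. Below is a minimal Python sketch of the sampling step. It assumes a toy follow network with rich-get-richer growth rather than the study's 2009 Twitter data. `build_follow_graph` and `pick_sensors` are hypothetical helpers, and "friend" is taken to mean an account the control user follows.

```python
import random
from statistics import mean


def build_follow_graph(n_users, follows_per_user, rng):
    """Toy follow network (hypothetical, not the study's data).

    Each user follows `follows_per_user` distinct accounts. Targets are drawn
    from a pool in which an account reappears every time it gains a follower,
    so popular accounts keep getting more popular (preferential attachment).
    """
    followees = {u: set() for u in range(n_users)}
    follower_count = [0] * n_users
    pool = list(range(n_users))  # every account starts with one "ticket"
    for u in range(n_users):
        while len(followees[u]) < follows_per_user:
            v = rng.choice(pool)
            if v == u or v in followees[u]:
                continue  # no self-follows or duplicate follows
            followees[u].add(v)
            follower_count[v] += 1
            pool.append(v)  # extra ticket: more followers -> more likely to be picked
    return followees, follower_count


def pick_sensors(followees, n_control, rng):
    """Draw a random control group, then one random followee ("friend") of each."""
    control = rng.sample(range(len(followees)), n_control)
    sensors = [rng.choice(sorted(followees[u])) for u in control]
    return control, sensors


rng = random.Random(2009)
followees, follower_count = build_follow_graph(n_users=5000, follows_per_user=10, rng=rng)
control, sensors = pick_sensors(followees, n_control=1000, rng=rng)

control_mean = mean(follower_count[u] for u in control)
sensor_mean = mean(follower_count[s] for s in sensors)
print(f"control group: {control_mean:.1f} followers on average")
print(f"sensor group:  {sensor_mean:.1f} followers on average")
```

Because every user in this toy network follows the same number of accounts, picking a random friend of a random user amounts to picking a random follow link, and that favors accounts in proportion to their follower count. The absolute numbers depend entirely on the made-up network and will not match the study's figures.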
The paper, published in the journal PLoS One, suggests that local monitoring is "not just more efficient, but also more effective" than whole-network monitoring, though the approach has possible drawbacks. Smaller groups may be more prone to lobbying than full populations, for example, but that's unlikely to be a problem for, say, flu prediction.
"Public health officials around the world could use our sensor method to mount a quicker and more focused response to health epidemics," Christakis says. "It would give them more lead time to save lives."
[Image: Sneezing via Shutterstock]