False tweets: Sometimes they’re funny parodies, sometimes they’re expensive, but mostly they’re a borderline dangerous nuisance. Both the potential danger and the quantity of tweets that spread misinformation increase during emergencies, when the anxious desire to plumb Twitter’s real-time stream can short-circuit any effort to verify a given tweet’s veracity.
It’s not a problem unique to Twitter—there has always been confusion in the immediate aftermath of major news events (see, for example, Columbine)—but Twitter can accelerate even the flimsiest falsehoods’ "viral" spread. Fortunately, according to researchers at the Indraprastha Institute of Information Technology (IIIT), while they’re easy to spread, falsehoods on Twitter are also easier to spot. So easy, in fact, they may soon be spotted by a plug-in you can install in your Internet browser.
"We are in the process of building the browser plug-in," says IIIT Ph.D. student Aditi Gupta. "We should have it ready in four to five weeks."
The plug-in is currently being developed for Chrome and will later be adapted for other browsers. Its functionality is based on work that Gupta and her colleagues have done on disaster-tweeting, focusing on the Boston Marathon bombing and Hurricane Sandy. In an analysis of the Boston bombing, they found that 29% of the "most viral" content on Twitter was rumor and misinformation. No surprise there. But by constructing a "decision tree classifier" that looked at attributes of the tweets themselves (rather than the tweeters) and learned over time, they were able to filter out that misinformation with an astonishing success rate of 97%.
"The 3% of the tweets were misclassified were very different from other fake tweets," says Gupta. They had strange word structures and garbage text. For example: "RT @karwesty: â€œ@DanBrechlinRJ: Whaaat! â€œ@Shawn_930: â€œ@irishgoldengirl: Sharks in New Jersey streets! #Sandyhttp://t.co/U1OOvMOF"" @RyanB ...:" This kind of unicode-character-filled (bot-generated?) tweet may fool an algorithm, but it appears unlikely to fool Twitter-using humans.
Then again, the factors that caught 97% of highly retweeted fake tweets seem fairly blunt as well: length, negative words, uppercase letters, and "presence of punctuation and exclamation marks."
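To make those factors concrete, here is a minimal Python sketch of how such attributes might be pulled out of a tweet's text before being handed to a classifier. It is an illustration only, not the researchers' code: the function name `tweet_features`, the feature names, and the tiny `NEGATIVE_WORDS` list are all assumptions made for the example, since the article does not say which lexicon or exact measurements the IIIT team used.

```python
import string

# Hypothetical word list for illustration only; the researchers' actual
# negative-word lexicon is not described in the article.
NEGATIVE_WORDS = {"dead", "killed", "bomb", "attack", "fake", "terrible"}


def tweet_features(text: str) -> dict:
    """Compute simple content-based features from a tweet's text.

    The features mirror the factors named above: length, negative words,
    uppercase letters, and punctuation and exclamation marks.
    """
    # Lowercase each whitespace-separated token and trim surrounding
    # ASCII punctuation so that "attack!!" matches "attack".
    words = [w.strip(string.punctuation).lower() for w in text.split()]
    return {
        "length": len(text),
        "negative_words": sum(1 for w in words if w in NEGATIVE_WORDS),
        "uppercase_letters": sum(1 for c in text if c.isupper()),
        "punctuation_marks": sum(1 for c in text if c in string.punctuation),
        "exclamation_marks": text.count("!"),
    }


if __name__ == "__main__":
    print(tweet_features("Sharks in New Jersey streets! #Sandy"))
```

A classifier such as a decision tree would then learn thresholds over numbers like these from labeled examples of real and fake tweets; the sketch stops at the feature step because the article gives no detail on the trained model itself.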
The browser plug-in will apply similar techniques, using different classification schemes for tweets that are purely text and for those that link to images and videos. Those doubtful that such a classification system, created from past news events, will apply to the next one, take heart: The researchers plan to update the classification model each month, based on "new kinds of rumor and fake content data."