My Gmail inbox contains multitudes. There are all the newsletters and event listings and student loan notifications that correctly identify me as a millennial. There are old conversations with friends about life, man, and relationships. My Gchats read like Bukowski, maybe, if Bukowski cut his teeth on Xanga and Livejournal.
Rarely do I connect the ads I see with the content I scan and spit out in emails. But it does happen: Google readily acknowledges that it uses your inbox to help inform ad targeting. So which of my emails inform those ads? Is there some kind of algorithmic intelligence that preys on my most vulnerable moments? Does a machine out there assume that because I’m receiving emails about student loans, I’d like a subprime car loan?
A new project from Columbia University computer scientists suggests that advertisers could be making some very sensitive connections between your inbox, Amazon purchases, or YouTube history and the ads you’d like to see. The tool they’ve developed, called XRay, aims to show which kinds of emails, purchases, or searches bring up specific ads.
In a demo version, researchers Mathias Lecuyer, Roxana Geambasu, Augustin Chaintreau, and several more collaborators have already stumbled across some very striking links. "Depression" as a keyword in emails, for example, brings up ads about astrology readings and shamanic healing. "Divorce," a divorce attorney. "Loan" brings up an ad for a car loan, bad credit no problem.
"The web is a big black box where [companies] are using people’s data, but people don’t know how they’re using their data. XRay is a system that can reveal some of that," says computer science professor Geambasu. "In effect, it’s creating the possibility for investigators or auditors to keep an eye on what’s happening with users' data."
To find which ads show up next to specific content, the researchers created a number of different email accounts to test which ads specific messages ("Hi, I'm afraid I may be pregnant... This is really a bad time, I don't know what to do!") might yield. XRay’s greatest strength, and its biggest breakthrough, is that it can do this rather efficiently: its design means you can test several triggers ("pregnancy," "debt/broke," "cancer") with relatively few accounts.
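For readers who want a feel for how that kind of testing might work, here is a minimal Python sketch. It is not XRay's actual algorithm, which the researchers describe elsewhere. It only illustrates the general idea of spreading triggers across a few overlapping test accounts and checking which trigger best lines up with an ad. The function names, the account layout, and the toy data are all invented for illustration.

```python
def score_triggers(accounts, ad):
    """Score each trigger by how much more often `ad` appears in accounts
    that contain the trigger than in accounts that do not.

    `accounts` is a list of (triggers, ads_seen) pairs, both sets.
    This is a simple difference of rates, an assumed stand-in for
    whatever statistics the real tool uses.
    """
    all_triggers = set().union(*(trigs for trigs, _ in accounts))
    scores = {}
    for trig in sorted(all_triggers):
        seen_with = [ad in ads for trigs, ads in accounts if trig in trigs]
        seen_without = [ad in ads for trigs, ads in accounts if trig not in trigs]
        rate_with = sum(seen_with) / len(seen_with) if seen_with else 0.0
        rate_without = sum(seen_without) / len(seen_without) if seen_without else 0.0
        scores[trig] = rate_with - rate_without
    return scores


def likely_trigger(accounts, ad):
    """Return the trigger most strongly correlated with `ad`."""
    scores = score_triggers(accounts, ad)
    return max(scores, key=scores.get)


# Made-up toy data: three triggers spread over four accounts, so each
# account tests more than one trigger at a time. Not real measurements.
accounts = [
    ({"depression", "loan"}, {"shamanic healing", "car loan"}),
    ({"depression", "divorce"}, {"shamanic healing", "divorce attorney"}),
    ({"loan", "divorce"}, {"car loan", "divorce attorney"}),
    (set(), set()),  # control account with no trigger emails
]

if __name__ == "__main__":
    for ad in ("shamanic healing", "car loan", "divorce attorney"):
        print(ad, "->", likely_trigger(accounts, ad))
```

Because each account carries a different combination of triggers, an ad that follows one keyword around stands out even though no account tests that keyword alone. As with the real tool, a high score here is only a correlation.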
That said, the XRay researchers are very clear: XRay does not show causation. Just because they found that email accounts sending out messages with the word "depression" also receive shamanic healing ads, it doesn’t mean that advertisers or Google are necessarily targeting those ads to depression. For now, XRay just shows correlation—and some stronger correlations than others. And this doesn’t mean that humans are reading your email. Algorithms do the dirty work.
XRay could nevertheless reveal the potential for abuse. The subprime car loan connection to someone with financial troubles is a disturbing example. Or, let’s say someone sends a friend an email about trying and failing different diets, and later he or she is served an ad for an ice cream discount. If it’s true that advertisers are preying on people with yo-yo-ing eating habits, that would make for a very sad and manipulative state of affairs.
Take that hypothetical to the next level, and consider what happens when you click on that ice cream ad and enter your personal information. Somewhere, a data miner’s algorithm might have just categorized your identity as someone with food issues. It’s not that much of a stretch. A Senate report earlier this year found that data brokers (the people who harvest personal details from the web and outside it to help marketers target certain populations) were lumping people’s data profiles into categories like "fragile family" or "ethnic second-city strugglers."
Right now, the XRay team is working with several investigative journalists and watchdogs to help XRay expand its capacity. One day, maybe they’ll create a tool anyone can use to test their hypotheses. In the meantime, they’re working on making the tool more accurate, moving from mere correlation to showing that something or someone truly is targeting an ad to a keyword in your inbox.
"The link between correlation and causation is often hard to jump, but we believe we may be able to do that," Geambasu says. "And once we’re at causation, we need to figure out a way to break the big black box and try to assign the targeting cause to either advertisers or the Google algorithms."