Every time you visit a website, open an app, or go through a subway turnstile, you're offering up personal data. Often, this data is sold on and reused by a third party, and most of the time you have no idea it's happening, even if you've consented to the transfer.
To help make sense of these hidden processes, researchers have created theDataMap, an expanding map of personal data flows. It shows how more than 2,000 organizations collect personal information and how that information is, in turn, passed on.
"We live in a time of 'open data' when people are supposed to share their data freely," says Latanya Sweeney, who leads the project. "But it's free and open only when it gets into the vaults of private corporations. You don't know what they do with it. You don't know to whom they sell it or share it. And yet, the data can come back and harm you."
The map, which currently shows health and mobile data flows, has grown through several research projects and public interest releases. So far it covers "discharge" data (state-collected data on visits to hospitals and clinics, including diagnosis codes and billing details) and the data you give up when you use popular mobile apps.
Sweeney, director of the Data Privacy Lab at Harvard University, now wants to expand the mapping by getting the public involved. She recently won a $440,000 grant from the Knight Foundation, which will go towards organizing visualization contests and "data detective" challenges.
"What we're looking to do now is expand it through contests and tools, so we can enlist the public to help us document more of these data flows," she says. "We think there's a lot more to learn about this information."
Sweeney's team will start putting raw data online this month. Winners of the challenges will get paid trips to Washington, D.C., where they'll be able to present their findings to regulators, journalists, and advocates. The point is to help privacy defenders do their jobs. "These are the very people who would historically protect us from harm, if only they knew what data sharing was happening," she says.
Sweeney started focusing on health data in 2010 as the Obama Administration, armed with stimulus money, looked to standardize electronic medical records. Working on an oversight committee, she realized there was little transparency about how personal data was being passed around the health system. "There was just such a lack of knowledge about who was holding the medical data. The committee was constantly saying 'Well, we don't know who holds that,'" she says.
An investigation of discharge data from the state of Washington revealed that a lot of apparently anonymized information could be easily linked to individuals. Meanwhile, a study of 110 popular apps showed that most of them, Android apps in particular, shared at least names and email addresses with third parties. For example, the Drugstore.com Android app transmitted birthday, email, gender, name, password, ZIP, medical info, and username to Drugs.com; medical info to doubleclick.net, googlesyndication.com, intellitxt.com, quantserve.com, and scorecardresearch.com; email and username to google.com; and address, email, and names to googleapis.com.
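To see how a single app's flows can be read as a small map, here is a minimal Python sketch. It records only the Drugstore.com flows listed above and flips them around to show, for each piece of data, who received it. The names `FLOWS` and `recipients_by_field` are made up for this illustration and are not taken from theDataMap, whose actual data format may look quite different.

```python
from collections import defaultdict

# Flows reported above for the Drugstore.com Android app:
# recipient domain -> data fields sent to it.
# Field labels are kept exactly as listed in the article
# (so "name" and "names" stay separate entries).
FLOWS = {
    "Drugs.com": ["birthday", "email", "gender", "name",
                  "password", "ZIP", "medical info", "username"],
    "doubleclick.net": ["medical info"],
    "googlesyndication.com": ["medical info"],
    "intellitxt.com": ["medical info"],
    "quantserve.com": ["medical info"],
    "scorecardresearch.com": ["medical info"],
    "google.com": ["email", "username"],
    "googleapis.com": ["address", "email", "names"],
}


def recipients_by_field(flows):
    """Invert the map: data field -> sorted list of recipient domains."""
    inverted = defaultdict(set)
    for recipient, fields in flows.items():
        for field in fields:
            inverted[field].add(recipient)
    return {field: sorted(domains) for field, domains in inverted.items()}


if __name__ == "__main__":
    # Print each data field with the domains that received it.
    for field, domains in sorted(recipients_by_field(FLOWS).items()):
        print(f"{field}: {', '.join(domains)}")
```

Read this way, the same list shows that medical info went to six recipients and email to three, which is the kind of view a map of data flows is meant to make obvious.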
"We want to make data sharing open. It shouldn't just be open data, it should be open data sharing, so the historical protections people have had can be restored," Sweeney says.