Imagine you’re moving apartments and shopping for new furniture at a couple of stores. You see a couch you like, but you’re not sure, so you leave thinking maybe you’ll return another day. But that couch doesn’t take well to rejection. It gets up, leaves the store, and starts following you around as you shop elsewhere and even after you go home having purchased a different couch. Then you start getting offers in the mail for new mattresses.
This is basically what people experience on the Internet today. Innocently clicking on a link results in ad targeting that’s hard to shake, and our purchases quickly reveal more information than we intend. In one infamous example, Target figured out that a woman was pregnant before she had told her family, and before she had purchased any baby products.
From the credit card offers we receive to the recommendations we see on Netflix and the posts we see on Facebook, ads and marketing are the most obvious examples of our personal data being aggregated and analyzed to make predictions about us. National security, facilitated by massive and sometimes illegal data collection by the government, is clearly another. And if you work at a large information-based business, you’ve no doubt heard the terms "big data" and "bottom line" in the same sentence.
But these (mostly) benign examples that we encounter every day obscure what large-scale government and corporate data collection really means and where it's being used. Predictions about you (and millions of other strangers) are starting to shape your life deeply. Your career, your love life, major decisions about your health and well-being, and even whether you end up in jail are now governed in no small part by the digital bread crumbs you've left behind, many of which you don't even know you've dropped.
Cities have long seen big data's potential to improve government and the lives of citizens, and they are now putting it into action in some very sensitive areas of citizens' lives. New York City’s Department of Homeless Services is mining apartment eviction filings to identify who is at risk of becoming homeless so it can intervene early. And police departments all over the country have adopted predictive policing software that guides where and when officers should deploy, which has led to reduced crime in some cities.
In one study in Los Angeles, police officers deployed to certain neighborhoods by predictive policing software prevented 4.3 crimes per week, compared to 2 crimes per week when assigned to patrol a specific area by human crime analysts. Surely, a reduction in crime is a good thing. But community activists in places such as Bellingham, Washington, have grave doubts. They worry that outsiders can’t examine how the algorithms work, since the software is usually proprietary, and so citizens have no way of knowing what data the government is using to target them. They also worry that predictive policing is just exacerbating existing patterns of racial profiling. If the underlying crime data being used is the result of years of over-policing minority communities for minor offenses, then the predictions based on this biased data could create a feedback loop and lead to yet more over-policing.
On a smaller scale, but in an even more sensitive area, is child protective services. Though the data isn’t as "big" as in other examples, a few agencies are carefully exploring statistical models to guide decisions such as which children in the system are most in danger of violence, which most need a trauma screening, and which are at risk of entering the criminal justice system.
In Hillsborough County, Florida, where a series of child homicides had occurred, a private provider selected in 2012 to manage the county’s child welfare system analyzed the data. It found that the cases with the highest probability of serious injury or death had a few factors in common: a child under the age of three, a "paramour" in the home, a history of substance abuse or domestic violence, and a parent who had been in the foster care system. The provider identified nine practices to use in these cases and hired a software company to build a dashboard that provides real-time feedback. The program's success has led to it being implemented statewide.
Dating apps become popular when they actually connect people, so it's no surprise that their systems usually try to show you matches based on some formula that accounts for the kind of person you say you prefer, what your swipes and clicks reveal, and how other users behave. Apps surely increase the number of strangers you can meet, but in the quest for love, research shows that their matching algorithms are mostly meaningless. You still need to work hard to find the right person, because a formula can’t account for all the uncertainty and individuality that a lasting relationship involves.
But while they don't have a magic formula for love, dating sites are still shaping the romantic lives of the growing share of the population that uses them. Consider that Tinder has an internal rating of how desirable you are. If you're getting a lot of swipes, you'll be shown less often, to give other people a chance. Another app, Coffee Meets Bagel, steers users toward people of their own race or ethnicity, even if they say "no preference" on their profile. The company does this partly because of what its data reveal: even when users say they have no preference, in private they gravitate toward people like themselves. That may be true in general, but for any given user, it may nudge them toward a more segregated life than they would otherwise want, without their ever knowing it.
The emerging and heavily funded field of precision medicine rests on the idea that doctors can personalize diagnosis and treatment based on how other people, whether similar to you in their DNA, demographics, disease pattern, or life habits, respond to care. The goal is highly personalized health care that improves outcomes and lowers costs. The field is at an early stage, but already, responding to financial incentives in Obamacare, hospitals are using data mining to predict which patients are more likely to be readmitted within 90 days. Patients at high risk of returning are likely to receive more attentive follow-up care. At one hospital, for example, they are assigned a post-discharge coordinator, while lower-risk patients might not be.
Personal finance is another new area for algorithmic, data-driven predictions, with a number of new "robo-advisor" apps. "We’re getting used to computers actually being pretty credible in terms of the recommendations they make," says Vasant Dhar, a professor at NYU’s Stern School of Business and its Center for Data Science. "It’s not that much of a stretch, where [a computer] actually says, here’s what I suggest with your portfolio."
Even major life decisions like college admissions and hiring are being affected. You might think that a college is judging you on your merits, and while that's mostly true, it isn't the whole story. Pressured to improve their rankings, colleges are very interested in increasing their graduation rates and the percentage of admitted students who enroll. They have now developed statistical programs to pick students who will do well on these measures. These programs may take into account obvious factors like grades, but also surprising ones like applicants' sex, race, and behavior on social media. If your demographics or social media presence happen to count against you, you may find it harder to get into school, and never know why.
And what about getting a job? Consider a startup called Gild, which has built a database of tens of millions of professionals that contains data purchased from third-party providers plus "anything and everything that’s publicly available," according to CEO Sheeroy Desai. Its system identifies candidates who fit a job opening and analyzes factors that might predict their success. The company says it currently has about 10,000 recruiting and hiring managers using the platform, from employers such as Facebook, HBO, and TD Bank.
Desai says Gild’s specialty is "unifying information across very different sources." Its big data-based recommendations consider factors including job history, language and behavior on social media sites, and public work samples such as a programmer's open-source code contributions. By analyzing the job movements of millions of people, it rates candidates not only on their expertise but also on how in demand they are likely to be in the current job market. It also tries to answer questions such as when a given person is most susceptible to a new job offer, how a person’s career track predicts where they’ll be in 10 years, and how likely a person is to be a good fit at a company.
"The reason the job market is so inefficient is that we have humans making decisions," says Desai. Humans, he says, often have more nuanced judgment than a computer, but that judgment is clouded by lots of little biases that people are blind to. "At the end of the day, companies are still going to make decisions based on humans. We want to make more unbiased recommendations on who you should be interviewing."
On the plus side, recruiters have praised it for helping them find candidates they might not otherwise have considered, such as someone who didn’t go to college. On the downside, candidates trying to negotiate a higher salary against this kind of smart system might have a harder time of it. Either way, Desai says, job candidates are sometimes shocked at how much interviewers know about them ahead of time.
"I think the opportunity is a rich one. At the same time, the ethical considerations need to be guiding us," says Jesse Russell, chief program officer at the National Council on Crime and Delinquency, who has followed the use of predictive analytics in child protective services. Officials, he says, are treading carefully before using data to make decisions about individuals, especially when the consequences of being wrong, such as taking a child out of his or her home unnecessarily, are huge. And while caseworkers' decisions can be flawed or biased, so can the programs that humans design. If you rely too heavily on data that is flawed or incomplete, as could be the case in predictive policing, you risk further validating bad decisions or existing biases.
Russell’s concerns apply to many areas where big data touches our lives. What happens when a computer says you’re likely to commit a crime before you do it, and, worse, what if the data underlying that prediction is wrong and you can’t do anything about it? What happens when a dating app slowly pushes us toward a more segregated society because it shows us the people it thinks we want to see? Or when personalized medicine can save lives, but because it is based mainly on genomes sequenced from white people of European descent, it saves only some lives?
And while it’s true that analytics can already make smarter guesses than humans in many situations, people are more than their data. A world where people struggle to rise above what is expected of them (say, a college won’t admit them because they don’t seem likely to graduate) is a sad world. "There’s this danger we lose our identity as people and we become categories," says Dhar.
On the other hand, big data has the potential to vastly expand our understanding of who we are and why we do what we do. A decade ago, serious scientists would have laughed out of the room anyone who proposed studying "the human condition," a topic so broad that it seemed impossible to measure. But perhaps the most important way big data could show up in people’s lives is by letting scientists study huge, unwieldy questions that were previously out of reach.
A massive scientific undertaking to study the human condition is set to launch in January 2017. The Kavli Human Project, funded by the Kavli Foundation, plans to recruit 10,000 New Yorkers from all walks of life and measure them for 10 years. And by measured, they mean everything: all financial transactions, tax returns, GPS coordinates, genomes, chemical exposure, IQ, Bluetooth sensors around the house, whom subjects text and call, and that’s just the beginning. In all, the large team of academics expects to collect about a billion data points per person per year, at an unprecedentedly low cost per data point compared with other large research surveys.
The hope is that with so much continuous data, researchers can for the first time begin to untangle the complex, seemingly unanswerable questions that have plagued our society, from what is causing the obesity epidemic to how to disrupt the poverty-to-prison cycle. "There’s so many pressing problems that we struggle with in this society, and we are so bad at data-driven policy," says Paul Glimcher, director of the project and a professor of neural science, economics, and psychology at NYU.
For example, how do people decide what to eat? These decisions involve complex interactions among biology, behavior, and environment that have always made the question hard to study comprehensively. But if the Kavli Human Project combines geolocated food shopping and consumption data with health biomarkers, financial details, and other data, obesity experts say the result will be a "first-of-its-kind bio-behavioral, economic, and cultural atlas of diet quality and health for New York City" that can help them make breakthroughs.
Part of its potential is that it could bring the benefits of big data to those who are currently left out. "I think it’s really not been the case that [big data] has broadly impacted everyone. I think it’s impacted the people who write for The New York Times and Fast Company, and people who read The New York Times and Fast Company," Glimcher says.
Glimcher says he’s disappointed at the ways that big data tools have been used so far. "It’s just terrible," he says. "Sometimes big data is treated as if it’s an organism. And the question is how will this organism interact with us. And we really honestly hate that. We like the idea as scientists, as activists—we are big data. We are designing big data. And the challenge is to design big data that has those positive impacts, not to wait and see."