Data-driven algorithms govern many aspects of life: university admissions, resume screening, and a person’s ability to get a car or home loan. Often, using data leads to more efficient allocation of resources and better outcomes for everyone. But algorithms can come with unintended consequences—and without care, their application can result in a society we don’t want.
Typically, we think of algorithms as neutral and objective, but when software is written and trained by humans, it often encodes the biases and prejudices of the people who make and shape it. Ultimately, the biases built into algorithms can be racist and can marginalize people in lower socioeconomic groups. What’s truly worrying is that, unlike with people, the biases in algorithms are often difficult to detect, undo, and fix.
No one sets out to create a racist model, but bias often creeps into algorithms inadvertently through their training data. To give a glaring example, Tay, the Microsoft chatbot, didn’t start out as offensive, but after interactions with malicious users, it parroted offensive content. Pokémon Go is another example where bias slipped in. Various reports have noted that fewer Pokéstops tend to appear in predominantly black neighborhoods than in white communities. The reason is that data for the game originally came from another location-based game called Ingress, which was more popular with white users, who suggested the points of interest.
Sometimes, the pathway to biased training data is even more circuitous, such as when Google Photos last year mistagged a photo of a young African-American couple with the label "gorilla." A Google spokeswoman said the company was appalled by the mistake and was taking action to improve its automatic image labeling technology. Instead of "seeing" a face, these kinds of algorithms identify shapes, colors, and patterns in order to make educated guesses as to what a picture might actually show. However, it appears that Google had simply never tested its algorithm on people with darker skin tones.
All of these examples are relatively benign, but when algorithms are deployed to make decisions, they can have much more serious consequences. For instance, recidivism models, which try to predict the likelihood of people committing future crimes after their release, label people who live in poor neighborhoods as more likely to relapse into criminal behavior. As a result, these people are frequently sentenced to longer jail terms by the courts. A speech given in 2014 by then-Attorney General Eric Holder was critical of such "risk assessments." Although they were crafted with good intentions, he warned such assessments "may exacerbate unwarranted and unjust disparities that are already far too common in our criminal justice system and in our society."
Big data is increasingly being used to gauge the performance of workers. For example, software algorithms are being used to generate scores that evaluate teacher effectiveness. Teachers with the lowest scores are let go. Rating teachers is a laudable goal and could theoretically eliminate human bias in evaluation, but these algorithms have faced criticism because reducing human behavior to mathematical formulas is very hard. Student outcomes are affected by many factors, a number of which are outside the control of a teacher.
Using data to monitor business performance can have its advantages, but firms must recognize the limitations of machines and incorporate feedback to improve the algorithms over time.
Businesses may also be unaware of the unintended side effects of algorithms. There’s a danger that firms using data can unknowingly introduce bias, undoing the progress they’re making elsewhere to meet the needs of diverse customers.
Human editors at Facebook, for example, were recently accused of intentionally suppressing articles from conservative news sources in its trending list. The company denied the allegations of bias, but nevertheless replaced human editors with algorithms. Days after making the shift, Facebook was criticized again when the algorithms published a fake news story. We expect human editors to act with journalistic integrity, but replacing people with algorithms doesn’t always solve the problem. In fact, by making the news selection process opaque, Facebook made it worse. The answer may lie in figuring out a better way for humans to oversee the machines and act as bridges between publishers and platforms.
Bias in Google’s search algorithms may also give the company the power to sway 25% of the world’s national elections, according to new research. The reason is that opinions shift in the direction of candidates who are favored in search ranking results. The search giant’s algorithms are changing the way people think every minute of every day, but we don’t know how because the algorithms are a tightly guarded secret.
Used properly, big data is able to solve incredible engineering challenges. Massive data has enabled innovative products like Apple’s Siri, Tesla’s Autopilot, and Google Translate.
We must not throw out the baby with the bathwater. We should be more cautious and aware of how data can be used and sometimes misused. We should figure out ways to verify that the algorithms we produce live up to the anti-bias commitment that we demand. And we should expose what algorithms are, what they can do, and how they operate.
Working at Bloomberg, I see firsthand the transformational impact data can have on business. But I also know how important it is for the data to be accurate, and I take the utmost care to build algorithms that don’t inappropriately influence financial markets. When the stakes are so high, it’s imperative that integrity is maintained and that the numbers are trusted by all who use them.
To build accurate, reliable algorithms, there must be a commitment to use data responsibly. This should include training data scientists in ethics as well as in technical skills. A code of conduct for the industry, for example, would help spread awareness while also holding practitioners to account. For algorithms that guide significant decisions, like legal sentencing, we must develop auditing methods and screen for unintended bias, just as we would with people. Such audits should be set up and managed by independent bodies to which public institutions are accountable.
Algorithms are not neutral. Over the past decade, data-driven algorithms have transformed society. As they become more entrenched in the fabric of society, we must recognize that they can be just as bigoted as a human.
Gideon Mann is head of data science at Bloomberg LP.