Perhaps the most well-developed aspect of AI is the area of work known as machine learning, which involves recognizing patterns in data and using those patterns to make predictions. Thanks to computers that are able to process vast amounts of data through complex algorithms, those predictions can be honed to a high degree of accuracy. In this way, businesses can use their data to better anticipate customer or market behavior.
Take a question that confounds consumer-facing businesses such as online retailers and banks: Which customers are most likely to leave? Known as churn, the number of quitters is an important gauge of a company’s health, since customers who leave have to be replaced, often at great cost. If a company knew which customers to target with retention efforts, such as special offers or incentives, it could save considerable money.
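In practice, churn prediction is a standard supervised-learning exercise: train a classifier on historical records of customers who stayed or left, then score current customers by their predicted probability of leaving. The Python sketch below is purely illustrative; the file name, column names, and model choice are placeholders rather than any particular company's setup.

```python
# Hypothetical churn-prediction sketch. "customers.csv" and its columns
# ("tenure_months", "monthly_spend", "support_tickets", "churned") are
# illustrative placeholders, not a real dataset.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

customers = pd.read_csv("customers.csv")
features = customers[["tenure_months", "monthly_spend", "support_tickets"]]
labels = customers["churned"]  # 1 if the customer left, 0 otherwise

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=0
)

model = GradientBoostingClassifier().fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))

# Rank current customers by predicted probability of leaving, so retention
# offers can be aimed at the riskiest segment first.
customers["churn_risk"] = model.predict_proba(features)[:, 1]
at_risk = customers.sort_values("churn_risk", ascending=False).head(100)
```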
That sort of insight is becoming more accessible to businesses of widely varying types and sizes thanks to cloud-based tools developed and sold by some well-known tech companies.
“Imagine if all companies of the early 20th century had owned some oil, but they had to build the infrastructure to extract, transport, and refine that oil on their own,” write Chicago Booth’s Nicholas Polson and University of Texas’s James Scott in their 2018 book, AIQ: How People and Machines Are Smarter Together:
Any company with a new idea for making good use of its oil would have faced enormous fixed costs just to get started; as a result, most of the oil would have sat in the ground. Well, the same logic holds for data, the oil of the 21st century. Most hobbyists or small companies would face prohibitive costs if they had to buy all the gear and expertise needed to build an AI system from their data. But the cloud-computing resources provided by outfits such as Microsoft Azure, IBM, and Amazon Web Services have turned that fixed cost into a variable cost, radically changing the economic calculus for large-scale data storage and analysis. Today, anyone who wants to make use of their ‘oil’ can now do so cheaply, by renting someone else’s infrastructure.
Recognizing myriad uses for AI, the European Commission announced last year that between 2018 and 2020 it would invest €1.5 billion in AI research and applications, including in “the uptake of AI across Europe” via “a toolbox for potential users, with a focus on small and medium-sized enterprises, non-tech companies and public administrations.”
The inclusion of public administrations in the commission’s announcement is an acknowledgment that machine learning and AI could be applied anywhere—in the public or private sectors—that people have large quantities of data and need to make predictions to decide how best to distribute scarce resources, from criminal justice to education, health care to housing policy. The question is, should they be?
The dangers of algorithmic autonomy
It’s an axiom of computing that results are dependent on inputs: garbage in, garbage out. What if companies’ machine-learning projects come up with analyses that, while logical and algorithmically based, are premised on faulty assumptions or mismeasured data? What if these analyses lead to bad or ethically questionable decisions—either among business leaders or among policy makers and public authorities?
One of the appeals of AI as a supplement or substitute for human decision-making is that computers should be ignorant of the negative and often fallacious associations that bias people. Simply by participating in society, people absorb ideas about race, gender, or other attributes that can lead to discriminatory behavior; algorithms, in theory, shouldn’t be as impressionable.
Yet, George Washington University’s Aylin Caliskan, University of Bath’s Joanna J. Bryson, and Princeton’s Arvind Narayanan find that machine-learning systems can internalize the stereotypes present in the data they’re fed. Applying an analytical technique analogous to an implicit association test—used to ferret out the unconscious connections people make between certain words and concepts—to a commonly used machine-learning language tool, the researchers find that the tool exhibited the same biases found in human culture. Relative to African American names, European American names were more closely associated with pleasant words than with unpleasant ones; and relative to male names, female names were more closely associated with words that have familial connotations than with career-oriented words.
“Our findings suggest that if we build an intelligent system that learns enough about the properties of language to be able to understand and produce it, in the process it will also acquire historical cultural associations, some of which can be objectionable,” the researchers write.
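The analysis operates on word embeddings, the numerical representations of words that such language tools learn from large bodies of text. A rough sketch of the underlying association measure might look like the following; the scoring is simplified for illustration, and the embedding vectors and word lists are assumed to have been loaded elsewhere.

```python
# Simplified sketch of an embedding-association measure: compare how close two
# groups of target words sit to "pleasant" versus "unpleasant" attribute words.
# The embedding vectors and word lists are assumed inputs, not defined here.
import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(word_vec, pleasant_vecs, unpleasant_vecs):
    # Mean similarity to pleasant words minus mean similarity to unpleasant words.
    return (np.mean([cosine(word_vec, p) for p in pleasant_vecs])
            - np.mean([cosine(word_vec, u) for u in unpleasant_vecs]))

def bias_score(group_a_vecs, group_b_vecs, pleasant_vecs, unpleasant_vecs):
    # A positive score means group A's words lean toward "pleasant" relative to
    # group B's words, mirroring the implicit-association logic described above.
    a = np.mean([association(w, pleasant_vecs, unpleasant_vecs) for w in group_a_vecs])
    b = np.mean([association(w, pleasant_vecs, unpleasant_vecs) for w in group_b_vecs])
    return a - b
```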
The concern that AI systems could be founded upon biased data goes beyond linguistic applications. A 2016 ProPublica investigation of an algorithmic tool used to assign “risk scores” to defendants during the bail-setting process finds that the tool was more likely to misidentify black defendants as being at high risk for recidivism than white defendants, and conversely, more likely to mislabel white defendants as low-risk. Although a follow-up analysis by the tool’s creator disputes ProPublica’s conclusions, the findings nonetheless reflect the possibility that if authorities rely on AI to help make weighty decisions, the tools they turn to could be inherently partial.
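The kind of disparity ProPublica reported can be checked with a straightforward tabulation: for each group, compare how often the tool flags people who do not go on to reoffend (false positives) and how often it clears people who do (false negatives). The sketch below assumes a hypothetical data frame of predictions and outcomes; the column names are placeholders.

```python
# Illustrative error-rate comparison by group. The data frame and its columns
# (group, actual outcome, predicted high-risk flag) are hypothetical.
import pandas as pd

def error_rates_by_group(df, group_col, label_col, pred_col):
    """For each group, report the false positive rate (flagged high risk but did
    not reoffend) and the false negative rate (flagged low risk but did)."""
    rows = []
    for group, g in df.groupby(group_col):
        fp = ((g[pred_col] == 1) & (g[label_col] == 0)).sum()
        fn = ((g[pred_col] == 0) & (g[label_col] == 1)).sum()
        negatives = (g[label_col] == 0).sum()
        positives = (g[label_col] == 1).sum()
        rows.append({
            "group": group,
            "false_positive_rate": fp / negatives if negatives else float("nan"),
            "false_negative_rate": fn / positives if positives else float("nan"),
        })
    return pd.DataFrame(rows)
```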
The problem of skewed data is further compounded by the fact that while machine learning is typically useful for making predictions, it is less valuable for finding causal relationships. This can create issues when it’s used to facilitate decision-making.
Chicago Booth’s Sendhil Mullainathan and University of California at Berkeley’s Ziad Obermeyer offer the example of machine learning applied to medical data, which can identify, for example, that a history of acute sinusitis is predictive of a future stroke. But that doesn’t mean sinus infections lead to cardiovascular disease. Instead, there is a behavioral explanation for their association on a patient’s medical record: the decision to seek care.
“Medical data are as much behavioral as biological; whether a person decides to seek care can be as pivotal as actual stroke in determining whether they are diagnosed with stroke,” the researchers write.
A well-intentioned hospital administrator might use AI to help steer emergency-room resources toward the patients at highest risk of stroke. But she could end up prioritizing the patients who are simply most likely to seek treatment, exacerbating health-care inequality in the process.
“The biases inherent in human decisions that generate the data could be automated or even magnified,” Mullainathan and Obermeyer write. “Done naively, algorithmic prediction could then magnify or perpetuate some of [the] policy problems we see in the health system, rather than fix them.”
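A toy simulation makes the mechanism concrete. If a stroke only enters the data when a patient shows up to be diagnosed, a model trained to predict recorded strokes partly learns who seeks care rather than who is truly at risk. All the numbers below are invented for illustration.

```python
# Toy simulation of selection into diagnosis: the recorded outcome depends on
# both true risk and the propensity to seek care. All parameters are made up.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

true_risk = rng.uniform(0, 0.10, n)       # actual probability of stroke
care_seeking = rng.uniform(0.2, 1.0, n)   # propensity to visit the ER

had_stroke = rng.random(n) < true_risk
diagnosed = had_stroke & (rng.random(n) < care_seeking)  # recorded only if seen

# The recorded label correlates with care-seeking as well as with true risk,
# so a predictor of "diagnosed stroke" partly rewards frequent care-seekers.
print(np.corrcoef(diagnosed, true_risk)[0, 1])
print(np.corrcoef(diagnosed, care_seeking)[0, 1])
```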
Kleinberg, Ludwig, and Mullainathan, with Harvard postdoctoral fellow Himabindu Lakkaraju and Stanford’s Jure Leskovec, find in other research that judicial risk-assessment tools such as the one studied by ProPublica can be constructed to address numerous societal concerns. “The bail decision relies on machine learning’s unique strengths—maximize prediction quality—while avoiding its weakness: not guaranteeing causal or even consistent estimates,” the researchers write. Using data on arrests and bail decisions from New York City between 2008 and 2013, they find evidence that the judges making those decisions frequently misevaluated the flight risk of defendants (the only criterion judges in the state of New York are supposed to use in bail decisions) relative to the results of a machine-learning algorithm. The judges released nearly half the defendants the algorithm identified as the riskiest 1 percent of the sample—more than 56 percent of whom then failed to appear in court. With the aid of the algorithm, which ranked defendants in order of risk magnitude, the judges could have reduced the rate of failure to appear in court by nearly 25 percent without increasing the jail population, or reduced the jail population by more than 40 percent without raising the failure-to-appear rate.
What’s more, the algorithm could have done all this while also making the system more equitable. “An appropriately done re-ranking policy can reduce crime and jail populations while simultaneously reducing racial disparities,” the researchers write. “In this case, the algorithm is a force for racial equity.”
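The re-ranking exercise itself is conceptually simple: score each defendant’s flight risk, detain the highest-scoring ones up to a given jail capacity, and measure how many of the released fail to appear. Sweeping the capacity traces out the trade-off the researchers describe. The sketch below only illustrates that bookkeeping; the risk scores and outcomes would have to come from a trained model and court records.

```python
# Schematic re-ranking calculation: detain the highest-risk defendants up to a
# capacity budget, then compute failure-to-appear among everyone released.
# risk_scores and failed_to_appear are placeholder NumPy arrays.
import numpy as np

def release_by_rank(risk_scores, failed_to_appear, jail_capacity):
    """Detain the `jail_capacity` highest-scoring defendants; return the
    failure-to-appear rate among the released."""
    order = np.argsort(risk_scores)[::-1]   # riskiest first
    detained = order[:jail_capacity]
    released = order[jail_capacity:]
    return failed_to_appear[released].mean()

# Varying jail_capacity traces the trade-off curve: a lower failure-to-appear
# rate at the same jail population, or a smaller jail population at the same
# failure-to-appear rate.
```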
Stanford’s Sharad Goel, HomeAway’s Justin M. Rao, and New York University’s Ravi Shroff use machine learning to show that New York could improve its stop-and-frisk policy by focusing on the factors most statistically relevant to those stops’ outcomes. The researchers note that between 2008 and 2012, black and Hispanic people were stopped in roughly 80 percent of such incidents, despite making up about half the city’s population, and that 90 percent of stops didn’t result in any further action.
Focusing on stops from 2011 through 2012 in which the police suspected someone of possessing a weapon, the researchers find that 43 percent of stops had less than a 1 percent chance of finding a weapon. Their machine-learning model determined that 90 percent of weapons could have been recovered by conducting just 58 percent of the stops. And by homing in on three factors most likely to indicate the presence of a weapon—things such as a “suspicious bulge”—the police could have recovered half the weapons by conducting only 8 percent of the stops. Adopting such a strategy would result in a more racially equitable balance among those who are stopped.
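The arithmetic behind those numbers is a matter of ranking: order potential stops by the model’s predicted probability of finding a weapon, then count how far down the list police would have to go to recover a given share of weapons. The function below sketches that calculation with placeholder inputs.

```python
# Sketch of the stop-reduction calculation: rank stops by predicted probability
# of finding a weapon and find the fraction of stops needed to recover a target
# share of weapons. pred_prob and weapon_found are placeholder NumPy arrays.
import numpy as np

def stops_needed(pred_prob, weapon_found, target_share):
    """Return the fraction of stops (taken in descending order of predicted
    probability) needed to recover `target_share` of all weapons found."""
    order = np.argsort(pred_prob)[::-1]
    recovered = np.cumsum(weapon_found[order])
    total = weapon_found.sum()
    k = np.searchsorted(recovered, target_share * total) + 1
    return k / len(pred_prob)
```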
Results such as these hint at the potential of AI to improve social well-being. Used judiciously, it could improve outcomes without costly trade-offs—reducing jail populations without increasing crime, for instance. But as AI comes of age, those who develop and rely on it will have to do so cautiously, lest it create as many problems as it solves.
- Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner, “Machine Bias,” ProPublica, May 2016.
- Eva Ascarza, “Retention Futility: Targeting High Risk Customers Might Be Ineffective,” Columbia Business School research paper, July 2017.
- Aylin Caliskan, Joanna J. Bryson, and Arvind Narayanan, “Semantics Derived Automatically from Language Corpora Contain Human-Like Biases,” Science, April 2017.
- Jean-Pierre Dubé and Sanjog Misra, “Scalable Price Targeting,” working paper, October 2017.
- “Artificial Intelligence for Europe,” Communication from the Commission to the European Parliament, the European Council, the Council, the European Economic and Social Committee and the Committee of the Regions, April 2018.
- Sharad Goel, Justin M. Rao, and Ravi Shroff, “Precinct or Prejudice? Understanding Racial Disparities in New York City’s Stop-and-Frisk Policy,” Annals of Applied Statistics, March 2016.
- Jon Kleinberg, Himabindu Lakkaraju, Jure Leskovec, Jens Ludwig, and Sendhil Mullainathan, “Human Decisions and Machine Predictions,” NBER working paper, February 2017.
- Jon Kleinberg, Jens Ludwig, Sendhil Mullainathan, and Ashesh Rambachan, “Algorithmic Fairness,” AEA Papers and Proceedings, May 2018.
- Sendhil Mullainathan and Ziad Obermeyer, “Does Machine Learning Automate Moral Hazard and Error?” American Economic Review, May 2017.