Lines connecting nodes to make the shape of the statue "The Thinker" — Credit: Liu Zishan/Shutterstock

Chicago Booth Review Podcast Do We Trust A.I. to Make the Right Decisions?

Algorithms are everywhere. They curate our music playlists, guide our online shopping—and, increasingly, dominate the news. But while most of us have accepted the idea of algorithms making, or helping make, some of the mundane decisions in our day-to-day lives, when do we prefer human discretion to the inscrutable judgment of A.I.? And how do we keep algorithms, some of which are tasked with making predictions that are far from mundane, from exacerbating existing problems of bias and inequality? On this episode of the Chicago Booth Review Podcast, we explore questions about the role of A.I. in society.

Audio Transcript

Hal Weitzman: Most of us are getting used to giving up some of our data to algorithms. We let them recommend what we should read and view next. We let them suggest what we should buy and where we should go. We let them organize our photos, music and videos.

But how comfortable are we letting algorithms make bigger choices for us and others, like moral choices? And would we prefer humans to be in charge in some cases, even if computers are more accurate?

Welcome to the Chicago Booth Review podcast, where we bring you groundbreaking academic research in a clear and straightforward way. I’m Hal Weitzman, and in this episode we’re looking at research that sheds light on what we really think about artificial intelligence and how much control we want it to have over our lives. And we’ll hear from one of the world’s leading researchers on applying AI.

(Music)

First, some research that appeared in the Summer 2022 issue of Chicago Booth Review’s print magazine under the headline “Why We Don’t Want Algorithms to Make Moral Choices.” The article was written by our contributor Kasandra Brabaw.

Reader: As life grows increasingly digital, Americans rely on algorithms in their daily decision-making—to pick their music on Pandora, guide their program choices on Netflix, and even suggest people to date on Tinder. But these computerized step-by-step processes also determine how self-driving cars navigate, who qualifies for an insurance policy or a mortgage, and when and even if someone will get parole. So where should we draw the line on automated decision-making?

For most people, it’s when the choices turn “morally relevant,” according to Chicago Booth’s Berkeley J. Dietvorst and Daniel Bartels. Decisions that “entail potential harm and/or the limitation of one or more persons’ resources, freedoms, or rights” often result in the need to make trade-offs, something that we think humans, not algorithms, are better equipped to do, their findings suggest.

Most people expect algorithms to make recommendations on the basis of maximizing some specific outcome, and many people are fine with that in amoral domains, according to the researchers. For example, more than 1 billion people trust Google Maps to get them where they’re going. But when the issue of moral relevance intrudes, people start to object, the researchers’ experiments demonstrate.

“I suspect people feel they don’t want to toss out human discretion in matters of right and wrong, in part because they likely feel they ‘know it when they see it,’” Bartels says. Moral questions are complicated, he says, especially because one person’s morals won’t always align with another’s.

To test this theory, Dietvorst and Bartels gave 700 study participants the scenario of a health-insurance company considering using an algorithm to make certain decisions about customers’ plans. After reading about which choices the company would outsource to an algorithm, participants were asked if they, as customers of the company, would switch to a different insurer because of the algorithm. The researchers find a strong correlation between how morally relevant participants found the choices to be and their intention to switch insurance companies.

From another study, which involved participants choosing whether they’d want a human or an algorithm to make various decisions, the researchers learn that people dislike algorithms in moral situations because they focus only on outcomes, not the moral trade-offs involved. People expect human decision makers to take into consideration values such as fairness and honesty and to make a judgment for each specific case.

When weighing these trade-offs takes a back seat to achieving an optimal outcome, people become wary of the decision maker, be it human or machine, according to Dietvorst and Bartels. In another experiment involving health-insurance choices, participants disliked it when human decision makers tried to maximize costs and benefits when deciding which medications to cover, although they still trusted humans over algorithms to make the choices.

As algorithm-based decision-making becomes more widespread in businesses and other organizations, as well as our daily lives, it seems unlikely that people will fully trust algorithms to make decisions that require moral trade-offs anytime soon, the researchers find.

“People have a ton of moral rules that often conflict,” Dietvorst says. “They have complicated prioritizations, where sometimes this is the most important rule, but in other contexts, that’s the most important rule.”

For now, Dietvorst advises organizations that are considering using algorithms for a set of choices to weigh the moral relevance involved. Navigating to a destination usually involves few moral implications, but prioritizing medical patients for treatment does. Many decisions, such as picking a restaurant or a show to watch, occupy a gray area. If a decision has a clear moral impact, an organization should consider giving a human the final say. That way, the organization can capitalize on algorithms’ ability to often make better predictions than humans, but the final decision will feel more acceptable.

(Music interlude)

Hal Weitzman: Next, let’s look at a piece about some earlier research by Berkeley Dietvorst that appeared in the Autumn 2020 issue of Chicago Booth Review, titled “Even When Algorithms Outperform Humans, People Often Reject Them” and written by CBR digital editor Jeff Cockrell.

Reader: Data science has created more and better applications for algorithms, particularly those that use machine learning, to help predict outcomes of interest to humans. But has the progress of algorithmic decision aids outpaced people’s willingness to trust them? Whether humans will put their faith in self-driving cars, ML-powered employment screening, and countless other technologies depends on not only the performance of algorithms, but also how would-be users perceive that performance.

In 2015, Chicago Booth’s Berkeley J. Dietvorst, with University of Pennsylvania’s Joseph P. Simmons and Cade Massey, coined the phrase “algorithm aversion” to describe people’s tendency to distrust the predictions of algorithms, even after seeing them outperform humans. Now, further research from Dietvorst and Booth PhD student Soaham Bharti suggests that people may not be averse to algorithms per se but rather are willing to take risks in pursuit of exceptional accuracy: they prefer the relatively high variance in how well human forecasters perform, especially in uncertain contexts. If there’s a higher likelihood of getting very good forecasts, they’ll put up with a higher likelihood of very bad ones.

In a series of experiments, Dietvorst and Bharti find evidence that people have “diminishing sensitivity to forecasting error”—in general, the first unit of error a forecaster (whether human or machine) realizes feels more significant than the fifth, which feels more significant than the 15th. The perceived difference between a perfectly accurate forecast and one that’s off by a measure of two is greater than the perceived difference between a forecast that’s off by 10 and a forecast that’s off by 12.

In one experiment, participants choosing between two forecast models were willing to pay more to upgrade to a model that was more likely to be perfect than they were for other models offering the same average performance—and the same expected earnings—but lower odds of perfection. In other studies, when presented with two models offering the same average performance, participants preferred the one that offered the best chance of a perfect or near-perfect forecast, even though it was also more likely to produce a comparatively inaccurate forecast. People were willing to risk a prediction that was well off the mark for the sake of a shot at a near-perfect prediction.
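(Illustration, not part of the audio: a minimal sketch of how diminishing sensitivity to error could produce this preference. The square-root curve in `felt_error` and the two made-up forecast models are assumptions chosen for illustration; they are not the researchers' model or data.)

```python
import math


def felt_error(error: float) -> float:
    """Hypothetical 'felt' cost of a forecasting error.

    A square root is one simple concave curve: each extra unit of
    error adds less felt cost than the unit before it.
    """
    return math.sqrt(abs(error))


def expected(outcomes, cost):
    """Probability-weighted average of cost(error) over (error, prob) pairs."""
    return sum(prob * cost(error) for error, prob in outcomes)


# Going from 0 to 2 units of error feels worse than going from 10 to 12.
first_step = felt_error(2) - felt_error(0)
later_step = felt_error(12) - felt_error(10)

# Two made-up models with the same average error of 5 units.
steady_model = [(5, 1.0)]                # always off by 5
variable_model = [(0, 0.5), (10, 0.5)]   # perfect half the time, off by 10 otherwise

avg_steady = expected(steady_model, abs)
avg_variable = expected(variable_model, abs)
felt_steady = expected(steady_model, felt_error)
felt_variable = expected(variable_model, felt_error)

print(f"first step {first_step:.2f} vs later step {later_step:.2f}")
print(f"average error: steady {avg_steady:.1f}, variable {avg_variable:.1f}")
print(f"felt error:    steady {felt_steady:.2f}, variable {felt_variable:.2f}")
```

Under this assumed curve, the variable model has the lower expected felt error even though both models are equally accurate on average, which is the pattern the experiments describe.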

This premium on perfection or near perfection is what one would expect from diminishing sensitivity to error, Dietvorst says, and can help explain why people often prefer human forecasts to algorithmic ones. Algorithms generally offer consistency and greater average performance, whereas humans are “a little worse on average, but they could do anything,” he says. “They could knock it out of the park. They could have a really bad forecast. You don’t quite know what’s coming with a human.”

Dietvorst and Bharti also explored how the level of uncertainty in a decision-making setting interacts with diminishing sensitivity to error. The researchers assigned participants to forecasting tasks with varying levels of uncertainty and had them choose between their own predictions and those made by an algorithm. Even though the algorithm always followed the best possible forecasting strategy, its odds of a perfect forecast went down as uncertainty went up. This would be the case for any forecaster, algorithmic or otherwise—and yet participants became more likely to choose their own predictions as the level of uncertainty increased.

This held even when participants recognized that the algorithm performed better on average. A belief that their own predictions were more likely to be perfect—even if less accurate on average—was more important, the researchers find.

The findings have implications for business decision makers and the general public, who often face what the researchers call “irreducible uncertainty”—situations where complete certainty isn’t possible until the outcome is known. For example, there’s no way for managers to know next quarter’s product demand until the quarter ends. If they bypass a superior algorithmic forecast and trust a human instead, hoping for perfection, they’ll end up with a lower average forecast accuracy in the long run, which could lead to unnecessary inventory costs.

Similarly, beliefs that human drivers are more likely than an algorithm to make a perfect decision could cause us to delay adopting self-driving technology, even if such technology would dramatically improve traffic safety on average.

People’s diminishing sensitivity to error and preference for variance could penalize some algorithms less than others. Although he hasn’t studied whether or how these preferences vary across different types of algorithms, Dietvorst says the fact that machine-learning algorithms are able to improve their performance over time means that people may be more likely to believe they’re capable of perfect or near-perfect forecasts in the future.

When comparing an ML algorithm to one that’s constant, “you might believe that a machine-learning algorithm gives you a relatively higher chance of being near perfect,” Dietvorst says, “even if the past performance that you’ve seen from the two is identical.”

(Music interlude)

Hal Weitzman: Finally, we’ll hear from Sendhil Mullainathan, one of the world’s leading researchers in AI, its applications and implications. Mullainathan is a professor at Chicago Booth and faculty director of the school’s Center for Applied Artificial Intelligence. He believes that, in spite of the challenges of using AI, it can help address human bias. We interviewed him in June 2019 as part of our Line of Inquiry video series.

Sendhil Mullainathan: If you look at a decision maker, say a scout in a professional league going and watching a player and saying, “Oh, is this player good?” What are they looking at? They’re looking at the objective data, how many baskets did they score, how many things they do. What are they doing when they make the judgment? They’re not just translating objective data. They’re adding their own intuition. In doing so, every piece of psychology tells us they’re also adding their own bias.

People, when looking at people or looking at anyone and making judgments, tend to make judgments that are stereotypical. They tend to make judgments that reflect their prejudices. So, algorithms don’t have any of that. They’re going to reflect the bias in the data, but they’re not going to add to it. Humans, on the other hand, take the data and then add an extra layer of bias, and so one of the biggest advantages of algorithms in promoting equity is that they remove that one layer. That doesn’t mean they don’t have other problems, but those problems are manageable, and at least we’re starting in a place closer to an objective, fair reality.

Yeah, one of the unfortunate things is, if we know that the algorithm is biased because the data is biased, we might say, well, let’s get unbiased data. Practically speaking, that’s very unlikely to happen, and even if you happen to be in a particular example where the data happens to be unbiased, it’s just a faulty way of thinking.

I kind of think of this as akin to fraud. Any good financial system, auditing system, accounting system, is always alive to fraud. Even if you put in a bunch of security checks and balances to make things work, you will always be alive to fraud, and I think in building algorithms, we have to have a similar attitude. We’re never going to have a situation free of bias. We always need to be alert to the bias in the data.

Yeah, so as you’re building these algorithms or you’re using them or you’re designing them, I think there are several ways that you can be alert to bias and resolve it. I think the first and foremost is just to ask the question: What are all the ways in which the outcome that the algorithm is predicting doesn’t match the outcome of interest? ’Cause it’s never exactly the same. You have an outcome in the data set, and the outcome might be did a customer click on a link? That’s not the outcome you care about. You care about was the customer happy with the experience or did it induce them to buy more of the product? So, the difference between the thing we want to predict and the thing that we measure, that’s one big source of bias. So, even just asking the question, what are we predicting, what are we thinking we’re predicting, and what’s the difference between those, is I would say one of the first things people can do.

The second thing that people can do is to look at the output of the algorithm and actually ask, OK, well, this algorithm, how many of this group does it hire and how many of this group does it hire, or choose or predict? And by comparing the outputs, we can ask, wait, if there’s a disparity, is there a reason for that disparity? Is that the algorithm adding some bias? So I think both by looking at the inputs and then looking at what it produces, we can be very alert to, OK, was this a result of bias, or was it something more structural upstream from the algorithm?
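(Illustration, not part of the interview: a minimal sketch of the output comparison described above. The group labels, the decisions, and the `selection_rates` helper are all hypothetical and made up for this example.)

```python
from collections import Counter


def selection_rates(decisions):
    """Return the share of each group that the algorithm selected.

    `decisions` is a list of (group, was_selected) pairs.
    """
    totals = Counter()
    selected = Counter()
    for group, was_selected in decisions:
        totals[group] += 1
        if was_selected:
            selected[group] += 1
    return {group: selected[group] / totals[group] for group in totals}


# Made-up hiring decisions for two hypothetical groups, "A" and "B".
decisions = [
    ("A", True), ("A", True), ("A", False), ("A", True),
    ("B", True), ("B", False), ("B", False), ("B", False),
]

rates = selection_rates(decisions)
gap = max(rates.values()) - min(rates.values())

for group, rate in sorted(rates.items()):
    print(f"group {group}: selected {rate:.0%}")
print(f"gap between groups: {gap:.0%}")
```

A gap like this is only a prompt for the follow-up question in the interview: it does not by itself show whether the disparity comes from the algorithm or from something upstream of it.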

When we use the word fairness, we mean lots of different things. For example, we might want that two people who look similar should be treated similarly. An alternative that you might want is that you might want that after the fact, we wouldn’t want people who committed a crime, say, to be released at different rates if they’re black than white, so one is after the fact, one is before the fact, and you could imagine coming up with these kinds of notions of fairness, and you would imagine you want all of them to be true, and in fact, when you look at how people evaluate algorithms, someone might use one kind, another person might use another kind.

What we show is that, actually, these three are actually all in conflict, and you can’t have all of them. And so, that illustrates, I think, one of the recurring themes in this fairness literature, is that when you start to model these things kind of carefully, there are trade-offs. There is no easy way to get all the things that you want, so we all understand that there might be a trade-off between fairness and efficiency, but it’s more basic than that. There are trade-offs between the different kinds of fairness we want, and there are trade-offs between fairness and simplicity, and so I think this next generation of literature is making it clear, unfairness is not something that we can just say, “Oh, we want to mete it out all the time.” It comes at costs, and there’s no way to get everything that you want.

I think that the places where algorithms have shown and people are most worried about bias are actually also the places where algorithms have the greatest potential to reduce bias. So, I’ll give you two examples. Take hiring. Hiring is the place where we’re worried that, “Oh, the underlying data is biased, so the algorithm will be biased,” and that’s fair, but hiring is also the place where study after study has shown humans add a tremendous amount of bias in which resumes to look at, which people to hire conditional on the resume, et cetera, et cetera. And that bias is in addition to whatever was in the data. So if you take something like hiring, it’s ironically the places where we worry algorithms will be biased are also the places where they have the most potential to remove a lot of the biases of humans.

Same in the criminal justice system: It’s reasonable that people are worried algorithms in the criminal justice system might add bias, and we should worry about that and find ways to deal with it, but we have to remember that all of the literature to date has actually shown there’s an enormous amount of human bias that’s added to any decision, and so, again, even though we’re worried about bias by algorithms, it’s the bias by humans that makes this a great opportunity. And so, as a result, there’s this perverse set of circumstances where we should remember that the places where the data is the most biased, which is where humans are adding a lot of the bias, is the place where we should look to properly build algorithms to get rid of bias, so it’s kind of a perverse state of affairs.

When we think about whether AI will promote equity, I think one of the things I always think back to is people want to anthropomorphize technologies, especially these sort of artificial intelligence technologies, and I think in part because it has the word intelligence in it. People want to imagine these tools will have their own intelligence or humanity, almost, and so they want to know what these tools would do, but ultimately, they’re just tools, so whether AI in any context, system, social system, company, whether it promotes equity or not is going to simply be a consequence of the intentions of the people building these algorithms and their knowledgeability.

I think the science is moving forward that we can make the builders of these tools knowledgeable enough about the biases and everything that can happen and how to fix them, so that, in 10 years, what we’re left with is intention. If people want to use these tools to promote inequity, they will promote a great deal of inequity, because you can do more harm with this tool than you could just by yourself. If people want to use these tools to promote equity, they will be able to promote a great deal of equity. So, whether algorithms ultimately are good for equity or bad for equity, ultimately, simply comes down to the human motives and desires and regulations we put on people. It’s not going to be a technological problem. It’s going to be, ultimately, just a sociological problem.

Hal Weitzman: You can find a lot more on AI and the creep of algorithms into every aspect of our lives and society on Chicago Booth Review’s website at https://www.chicagobooth.edu/review. When you’re there, sign up for our weekly newsletter so you never miss the latest in business-focused academic research.

That’s it for this episode, which was produced by Andy Graham and Josh Stunkel. If you enjoyed this episode, please subscribe and please do leave us a 5-star review.

Until next time, I’m Hal Weitzman, and thanks for listening to the Chicago Booth Review podcast.
