Chicago Booth Review Podcast The ‘Professor of Uncertainty’ on AI
- September 17, 2025
- CBR Podcast
Artificial intelligence is untrustworthy. It hallucinates, making up information or extrapolating by itself. That might make it frustrating to use, but could it prove more of a feature than a bug? Chicago Booth’s Veronika Ročková uses statistical methods that exploit the randomness in AI responses to improve medical diagnoses, and even to classify galaxies more efficiently.
Veronika Ročková: However, the combination is done in a very careful way because we acknowledge the fact that the predictions might be wrong, they might be biased, and they might actually even harm the predictions in some ways.
Hal Weitzman: AI is untrustworthy. It hallucinates, making up information or extrapolating by itself. That might make it frustrating to use, but could it prove more of a feature than a bug? Welcome to the Chicago Booth Review Podcast, where we bring you groundbreaking academic research in a clear and straightforward way. I'm Hal Weitzman, and today I'm talking with Chicago Booth's Veronika Ročková, who uses statistical methods that exploit the randomness in AI responses to improve medical diagnoses and even to classify galaxies more efficiently. Veronika Ročková, welcome to the Chicago Booth Review Podcast.
Veronika Ročková: Thank you so much for having me.
Hal Weitzman: Well, we're delighted to have you because we're talking about AI, which everybody wants to talk about, and anyone who's ever used AI knows that AI cannot be trusted. You have to verify everything that AI says, and the responses can be random and varied, and most of us think variability is problematic. Are we wrong to think that those random answers are a problem? And how did you decide to turn that into an area of research?
Veronika Ročková: Well, I think this is a really important point because randomness is inherently part of our world. So whenever we have a conversation, there's going to be some randomness involved. If you asked me the very same question again, I would answer in a slightly different way. So randomness is something that people are not really used to coping with, but we should acknowledge it. Generative intelligence essentially embodies randomness. The way generative intelligence works is that we have some random input and we transform it into some random output. So the randomness is really inherent. I worry sometimes that people are not aware of the degree of randomness and somehow-
Hal Weitzman: Right, it's treated as a bug, not as a feature.
Veronika Ročková: Right. So let me explain the following concept. In statistics, we like to distinguish between a point prediction, which is essentially a single guess at some unknown quantity, and a distributional prediction. The distributional prediction is when we acknowledge all possible outcomes and attach probabilities to those outcomes. For statistical inference, it's so much better to have distributional forecasts because they really embody uncertainty, not only a typical answer but also the uncertainty around it. And what we decided to do in this work is somehow leverage the uncertainty embodied in these black-box systems, because there is randomness involved. And even though there might be some relevant information, we know that information might be biased, it might be variable, but we can still leverage it in the form of a distributional prediction.
Hal Weitzman: Okay. So AI reflects the real world, reflects how we think, that's why it's artificial intelligence.
Veronika Ročková: Well, I think that it's nice that it mimics randomness just like human intelligence.
Hal Weitzman: And you're saying randomness is just inherent in the world?
Veronika Ročková: That's correct. And as statisticians, we are used to uncertainty. We love uncertainty. You might introduce me as a professor of econometrics, but I like to think of myself as a professor of uncertainty. And so we do like to report uncertainty in terms of confidence intervals. These intervals essentially encapsulate a range of plausible outcomes. And AI is not quite doing that yet. That's where statistics research can really help enhance AI, by essentially providing some sort of platform which could be implemented on top of AI and which can report these intervals of plausibility. So uncertainty quantification, as the technical term goes, is really what I think we need to focus on more in AI research.
Hal Weitzman: Okay. Just as a side note, do you think that most of us are using AI wrong then because we're expecting it to be, quote, unquote, correct rather than expecting it to be somewhat random as we would if we were interacting with another person?
Veronika Ročková: Have you ever asked the same question twice, to a person or to an AI?
Hal Weitzman: Sure, yes. And then you get totally different answers. Oh, you mean to a person or to an AI?
Veronika Ročková: Both.
Hal Weitzman: Yes.
Veronika Ročková: And what was your conclusion from that?
Hal Weitzman: My conclusion is... Yeah, I mean, I guess you're right. I'm not really thinking it through, but the world is somewhat random and we don't plan things out.
Veronika Ročková: Yeah, I guess my conclusion would be, can I really trust this person, or can I trust this machine, if I'm getting inconsistent answers? And I think that repeated prompting, asking the same question multiple times or formulating the question differently, is somewhat important for developing trust. What we are doing in this research is we acknowledge that there is uncertainty, we try to assess the degree of uncertainty, and then we incorporate that into the analysis of real data. So take predictive systems, and I'm not talking just about large language models or ChatGPT right now, but any predictive system which has been trained on some massive amount of data that I don't have access to. The machine has seen it, I haven't seen it, but somehow I believe that there is some information in those data sets that these systems have been trained on that I can leverage. And I'm just hoping to construct some probabilistic prediction, again a distributional prediction, which gives me an idea about a typical answer of the system as well as the variance in the answer.
Hal Weitzman: So you're also a professor of trust as well as a professor of uncertainty.
Veronika Ročková: I wouldn't go as far as that.
Hal Weitzman: But your work involves a concept called Bayesian inference. So in the spirit of explaining econometrics and statistics to us, what is Bayesian inference, and why is it a good method to help us make decisions when there is this uncertainty?
Veronika Ročková: So the Bayesian way of thinking, or Bayesian statistical inference, hinges on the following principle. We may start with some prior belief, and as we go through life, or as we accrue data, we update these prior beliefs with evidence in the data. The updating is done through a certain mathematical formula, which is attributed to Thomas Bayes. That's why it's called Bayesian inference. But what it really boils down to is coherent, probabilistic updating of information. I use the word probabilistic, and that's somewhat important because probability is the main instrument of statistics. We measure uncertainty with probability in very much the same way as we measure temperature with a thermometer. So probability is really an instrument for communicating uncertainty. And the Bayes rule, the Bayes theorem, has such a beautiful way of updating information with newfound evidence. And what we obtain from Bayesian inference is uncertainty quantification, because the prior itself is a distribution.
Again, a distribution is essentially a collection of probabilities that we attach to unknown outcomes. We may start with some prior beliefs, which could come from asking experts for their opinion, or maybe we have some past data, some information that we want to leverage, and that information is a distribution. Then we collect data on some subject and we essentially modify, or shift, that prior distribution toward the evidence in the data. That shift is completed by that formula. What we end up with is the posterior distribution: prior plus data implies posterior. The posterior is again a distribution, but it encapsulates uncertainty after having seen the data, again as a distribution. What is the likelihood of this event happening? What is the likelihood of that event happening? Having a range of these different outcomes, with probabilities attached to them, is super important, because what people typically do is rely on a single prediction.
So we have some unknown target and we want to guess at the center of the target. We shoot one dart, we get some estimate, a single number, and then we may want to make a decision based on that single number. That is, I think, far riskier than making a decision based on a variety of darts we have attempted to shoot at the target. Then we have multiple possible scenarios, we know how likely these scenarios are, and we can incorporate this uncertainty into decision-making through weighted averages. Every outcome has a probability attached to it, and we take a weighted average as opposed to plugging in a single scenario. And with that concept of averaging out uncertainty, or integrating out uncertainty, you can show mathematically that you end up with decisions which carry smaller risk; you can actually prove that. So the Bayesian approach of integrating out is in fact better than just relying on one single prediction.
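To make the updating concrete, here is a minimal sketch in Python of the simplest textbook case, a Beta prior updated by binomial data. The numbers and the conjugate Beta-Binomial setup are illustrative assumptions, not anything from the research discussed here; the point is only that prior plus data yields a posterior distribution and an interval of plausible values rather than a single guess.

```python
# A minimal sketch, not from the episode: a Beta prior updated by
# binomial data. Prior beliefs about an unknown rate become a posterior
# distribution that carries a whole range of plausible values, not just
# a single point guess.
from scipy import stats

prior_a, prior_b = 2, 2            # hypothetical prior, centered near 0.5
successes, trials = 7, 10          # hypothetical observed data

# Conjugate update: prior plus data implies posterior (again a Beta)
posterior = stats.beta(prior_a + successes, prior_b + (trials - successes))

point_guess = successes / trials   # the single dart at the target
lo, hi = posterior.interval(0.95)  # a range of plausible values instead

print(f"point estimate: {point_guess:.2f}")
print(f"posterior mean: {posterior.mean():.2f}")
print(f"95% interval:   ({lo:.2f}, {hi:.2f})")
```

Running the sketch prints both the single point estimate and the interval, which is exactly the distinction between a point prediction and a distributional prediction drawn above.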
Hal Weitzman: So we're all doing Bayesian inference every day of our lives it sounds like.
Veronika Ročková: I think that's a natural way of being. We update information.
Hal Weitzman: But it sounds like we could also collect more data about those decisions. Is that right?
Veronika Ročková: We could definitely sharpen our understanding of the world, right? We could correct for our biases by collecting more data, and we could definitely reduce uncertainty by collecting more data.
Hal Weitzman: So in your research you describe this as the Bayesian alternative to other ways of using AI to augment data. What's the advantage of using a Bayesian approach here with AI?
Veronika Ročková: So again, many of the AI platforms are generative. The generative aspect refers to the idea that we don't just get a single label; we get a range of answers which vary somehow around the typical behavior. So we wanted to leverage that. As for previous data-augmentation strategies, we were in fact inspired by the prediction-powered inference framework. That framework relies on data imputation, essentially imputing labels. So when we have patients and we don't know their diagnosis, we might run patient characteristics through some system and get a predicted condition that the patient might have.
And then we could treat these predictions as true diagnoses, and we could enhance the analysis of small data sets of patients with rare diseases, where we just don't have enough data, with these imputations. What we do, however, is we don't rely on a single imputation. We rely on multiple imputations from these systems, and we also report Bayesian uncertainty. What that means is that we essentially treat the generative system as a prior distribution that can be combined with data to obtain potentially better predictions.
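As a rough illustration of the multiple-imputation idea just described, here is a short Python sketch under stated assumptions. The function `query_model` is a hypothetical placeholder for whatever generative predictor is being queried, and the probability it simulates is an arbitrary stand-in, not a figure from the research.

```python
# A minimal sketch of the multiple-imputation idea, not the paper's code.
# `query_model` is a hypothetical stand-in for any generative predictor
# (for example, an LLM prompted with patient features); querying it
# repeatedly captures the variability of its answers instead of treating
# one prediction as the true label.
import numpy as np

rng = np.random.default_rng(0)

def query_model(features, rng):
    # Hypothetical: a real system would return a predicted diagnosis.
    # Here we simulate a noisy binary label so the sketch runs on its own.
    return int(rng.random() < 0.7)

unlabeled_patients = [{"redness": 2, "itching": 1},
                      {"redness": 0, "itching": 2}]

# Many imputations per patient rather than a single hard label
imputations = np.array(
    [[query_model(x, rng) for _ in range(50)] for x in unlabeled_patients]
)

# The spread across repeated queries is the uncertainty we keep
print(imputations.mean(axis=1))   # per-patient frequency of the predicted label
```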
Hal Weitzman: So you use AI to create that prior thing, the prior information that you talked about.
Veronika Ročková: Right. So the system can, in some sense, be mined for prior information. The way statisticians have been acquiring priors obviously differs from person to person, but some traditional strategies are mining expert opinion or historical data. And if you don't have access to the massive amounts of data that these systems have been trained on, what we can do is ask the right questions, and the system will, in some ways, reveal the data it has been trained on through those questions. These answers are random, so they form a distribution, and that distribution can be combined with the analysis of real data to obtain Bayesian inference.
Hal Weitzman: Just explain, because you started talking about the medical application of this research, which is fascinating. So explain how that works there with the priors and then with the Bayesian inference.
Veronika Ročková: Okay, so I can give you an example of skin diagnosis.
Hal Weitzman: Skin diseases, right? So these are rare diseases.
Veronika Ročková: So this particular application that we have in the paper is not necessarily a rare disease, but it's a disease characterized by six conditions which are sometimes difficult to tell apart. What we have done is we have obtained an actual real data set, which has the patient characteristics, and it also has real diagnoses made by humans. And because the sample size is somewhat small, we wanted to come up with some sort of system which could provide better predictions for future patients based on information that's embedded within, say, ChatGPT. Now, you might think that this is a little bit crazy: why should ChatGPT give us even remotely close diagnoses? But you would be surprised that actually 70% of the test cases, the patients that were not involved in training the system, were correctly diagnosed. So among those six conditions, ChatGPT was able to correctly identify 70% of those.
And we thought that that's somewhat interesting, and maybe the information in ChatGPT is somewhat orthogonal to the information that we might have in the proprietary data sets that we might collect on actual real patients. So it was compelling to think about ways in which we could combine the information from that predictive system, which has not seen the proprietary data set, with our proprietary data set, and what better way to do so than through probabilities, through probabilistic updating of evidence? And the Bayes theorem has a way to guide us toward that solution. So what we have done is essentially some form of augmentation, where the real data set was augmented with predictions, essentially predicted labels, predicted diagnoses, for patients for whom we just didn't have the diagnosis. These predictions were essentially imputing some prior knowledge into the system. So that's how we accomplished Bayesian inference and Bayesian reporting of uncertainty.
Hal Weitzman: And the result is that you're more accurately able to predict, or diagnose, these conditions?
Veronika Ročková: So we ended up actually improving the prediction by 2.5%, which may or may not be meaningful, but it was an improvement and it was really nice to see that we were able to leverage information in that system.
Hal Weitzman: If you're enjoying this podcast, there's another University of Chicago Podcast Network show you should check out. It's called Big Brains. Big Brains brings you the stories behind the pivotal scientific breakthroughs and research that are reshaping our world. Change how you see the world and keep up with the latest academic thinking with Big Brains, part of the University of Chicago Podcast Network. Okay, Veronika Ročková, in the first half we talked about your research on AI and using Bayesian inference to augment AI. You talked about probability and how you've used your approach to improve the diagnosis of certain medical conditions by 2.5%, which sounds significant to me; I'm sure it's important to the people who were diagnosed and to those who were cleared. So to me, that's significant. We also talked about the fact that AI is somewhat random, and in your research you talk about using imaginary data, or fake data, generated by AI. What exactly is that? And how do you use it to form the prior guess that you talked about?
Veronika Ročková: I mentioned previously that one way to construct a meaningful prior is by looking at historical data, because the way information proliferates is that we accumulate information and update it with newfound data. So Bayesians are used to constructing priors from historical data. Now, in a lot of situations we don't have historical data, and the term imaginary data is actually not my term. That term can be attributed to Arnold Zellner, if not earlier. He was actually a professor at Chicago's business school, a Bayesian econometrician, and he came up with this concept of imaginary data for constructing priors.
So the idea is that if we had access to historical data, we would somehow convert them into a prior distribution, and the observed data that we have right now would conceivably be very similar to the historical data. He exploited that similarity by constructing a prior which took advantage of some aspects of the observed data. And he argued, well, if I had some historical data, which I don't have, say it's imaginary historical data, the data would have looked similar. And that's why we are using this prior, which kind of resembles the historical-
Hal Weitzman: Extrapolating backwards.
Veronika Ročková: In some sense. So that's the imaginary-data part. Nowadays we have generative systems, so we don't have historical data, but we do have these imaginary data. In some sense we could think, "Well, maybe the predictions from these systems are giving us some data which is somewhat compatible with the data we are actually collecting in real life." So we could leverage that compatibility. That's one way to think about prior construction. In our work, however, we went a step further. In Bayesian inference, we can try to construct priors on things we can never really observe, or we can construct priors on the things we actually will end up observing. Let me give you an example. In elasticity estimation, elasticity is essentially a coefficient which tells us how much, say, sales might go down with an increased price. That coefficient, that elasticity, we can never really truly observe.
We may try to collect some historical data and come up with a prior for how much the decrease could be, but we can never really know the true coefficient. So focusing on unobservable quantities is somewhat challenging for constructing priors. What we did instead was focus on the observables. We cannot observe the elasticity, but we can observe the actual data that we have, the actual sales. So we constructed a prior on the observables, and that's what makes our approach slightly different from these previous approaches by Arnold Zellner and others.
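Here is a hedged sketch of what a prior on observables might look like in the elasticity setting, under simplifying assumptions of my own rather than the paper's method: machine-generated price and sales pairs are pooled with the real observations at reduced weight, and the implied elasticity is read off the pooled fit.

```python
# A minimal sketch of the "prior on observables" idea under simplifying
# assumptions, not the paper's method. Instead of placing a prior on the
# unobservable elasticity, hypothetical machine-generated (price, sales)
# pairs are appended to the real data with a weight that acts like a
# volume knob on how loudly the prior information speaks.
import numpy as np

rng = np.random.default_rng(1)

# Observed data: log-price and log-sales for a handful of periods
log_p = np.log(np.array([1.0, 1.2, 1.5, 1.8, 2.0]))
log_s = 3.0 - 1.4 * log_p + rng.normal(0, 0.1, size=log_p.size)

# Hypothetical "imaginary" observables, e.g. sales predicted by a generative system
imag_p = np.log(np.linspace(1.0, 2.0, 20))
imag_s = 3.1 - 1.2 * imag_p + rng.normal(0, 0.2, size=imag_p.size)

weight = 0.3   # how much the imaginary data count relative to the real data

x = np.r_[log_p, imag_p]
y = np.r_[log_s, imag_s]
w = np.r_[np.ones(log_p.size), weight * np.ones(imag_p.size)]

# Weighted least squares on the pooled data; the slope is the implied elasticity
X = np.column_stack([np.ones(x.size), x])
beta = np.linalg.lstsq(X * np.sqrt(w)[:, None], y * np.sqrt(w), rcond=None)[0]
print(f"implied elasticity estimate: {beta[1]:.2f}")
```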
Hal Weitzman: Okay, so that's how you form the prior guess. And then there's another technique I want you to talk about and explain, which is posterior bootstrap. It sounds painful, explain what it is and what it helps us to do.
Veronika Ročková: The painful aspect is that raising oneself by one's bootstraps is challenging. That analogy has actually been used in the context of the statistical bootstrap because it refers to an impossibility. The bootstrap is a technique which bypasses the fact that in real life we just have one small data set; we don't have access to all the data out there. And oftentimes we like to report uncertainty by doing the following imaginary experiment: what if I could collect the data over and over and over again? If I had access to all these different data sets, I could see how my answers change from one data set to another.
Maybe if I collected a slightly different data set, my prediction or my estimate of the elasticity might be a little bit different. The variability from that kind of imaginary experiment is what we like to report. That's the kind of uncertainty that we often like to report, but we just don't have access to these data sets. So the bootstrap idea is that we can be self-sufficient, we can raise ourselves by our bootstraps, because we can make do with the little data set that we have.
Hal Weitzman: And project it out.
Veronika Ročková: And project it out. So what we do is, instead of collecting new data, we shake the data that we have, we shuffle and shake it. We shake it in a way that makes it sufficiently different from the data that we actually have, not identical but sufficiently different. Then we analyze these shuffled data sets and see how our answers change, and we report that variability, that uncertainty. That's a beautiful concept, which I think is a fascinating concept in statistics. However, we bring it to the Bayesian domain. The way the Bayesian approach that we use injects priors is through data augmentation. We take our observed data and we augment it with these simulated or imaginary data, and then we do the reshuffling. So the Bayesian bootstrap we use is essentially a concept which does augmentation and then reshuffling. We obtain these variants of the augmented data sets, we analyze them, and then from the variability of the answers on these augmented data sets we report the uncertainty.
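A minimal sketch of the augment-then-reshuffle idea follows, again a simplification rather than the authors' algorithm: observed data are pooled with hypothetical imaginary draws, and repeated Bayesian-bootstrap reweighting of the pooled set produces the spread of answers that gets reported as uncertainty.

```python
# A simplified sketch of augmenting a small data set with imaginary data
# and then applying a Bayesian bootstrap; numbers are made up and the
# statistic is just a mean, not the authors' estimand.
import numpy as np

rng = np.random.default_rng(2)

observed = np.array([0.9, 1.1, 1.3, 0.8, 1.0])   # small real data set
imaginary = rng.normal(1.05, 0.3, size=30)        # hypothetical synthetic draws
augmented = np.r_[observed, imaginary]

estimates = []
for _ in range(2000):
    # Bayesian bootstrap: random Dirichlet weights instead of new data
    w = rng.dirichlet(np.ones(len(augmented)))
    estimates.append(np.sum(w * augmented))        # weighted mean this round

estimates = np.array(estimates)
print(f"posterior-style mean: {estimates.mean():.3f}")
print(f"95% interval: ({np.percentile(estimates, 2.5):.3f}, "
      f"{np.percentile(estimates, 97.5):.3f})")
```

The spread of the 2,000 reweighted answers stands in for the uncertainty we would have seen had we been able to collect the data over and over again.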
Hal Weitzman: So we can turn small data sets into larger-
Veronika Ročková: Well, we can extrapolate.
Hal Weitzman: We can extrapolate. So one thing that's fascinating hearing you talk about this is in the past decade or so, there's been an explosion in so-called retractions of academic research, and some of it at least is based on people making up numbers. And it occurs to me that you are sort of doing the positive of that and saying, "Yes, we can absolutely do that and we can use it to give ourselves larger data sets that we can then analyze and get better results from."
Veronika Ročková: So I guess that the way we do that imputation is purely through the Bayesian framework. So we say, "Well, if there is some meaningful information in the system, we might like to construct a distribution from it and then combine it with the observed data." However, the combination is done in a very careful way because we acknowledge the fact that the predictions might be wrong, they might be biased, and they might actually even harm the predictions in some ways. So that's why calibration is super important. So calibration means we essentially... It's like a volume on a radio. So it's like we need to find the right, I would say, loudness for the prior to be able to convey its information, but without overtaking the observed data.
Hal Weitzman: Fascinating. Talking of calibration, I noticed that the prompt you used, and we were talking about your skin-condition research, the prompt that uses ChatGPT to impute these diagnoses, is pretty detailed. So I was wondering, how important is the wording of this prompt when you're generating these imaginary data? You've sort of talked about that when we talked about the volume.
Veronika Ročková: Obviously we wanted to give ChatGPT the best chance at giving us the best predictions it can, right? Because otherwise, why would we want to use that prior information? So we did design the prompt quite carefully, and it ended up yielding, as I mentioned previously, 70% correct diagnoses. That was a result of the careful prompting. The prompt actually starts off with a sentence, "Imagine that you are a professional advanced AI medical assistant." Then we go on to explain the six different skin conditions that we are trying to distinguish between, using their medical definitions. And then we go on by explaining the data itself, because what we have is a collection of numbers, and how do we translate these numbers to ChatGPT? We have symptoms which have a couple of, say, categorical levels, like the level of redness or itchiness, or degrees of pain, and these can be verbalized.
So if we have a number that is zero, one, or two, we can translate it as a nonexistent, mild, or severe manifestation of that symptom. We do that translation of the symptoms, and then at the end we explain very carefully what we want from ChatGPT, and that's the formatting of the answer in terms of probabilistic predictions. We are really hoping for ChatGPT to tell us, "Okay, among these six conditions, what is the probability attached to each one of them, based on these symptoms?" And what's somewhat amusing, in fact, is that when I was looking at the code, I noticed that my co-author, Sean O'Hagan, actually asked ChatGPT very politely, "Please, could you format the output in this way?" I'm not sure if that changed the quality of the predictions, but it was at least curious to find out that my student has very good manners.
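For illustration, here is a small sketch of the prompt-building step just described; the wording, field names, and symptom levels are assumptions for the example, not the actual prompt from the paper.

```python
# A sketch of verbalizing numeric symptom levels into a prompt that asks
# for probability-formatted output; illustrative wording only.
LEVELS = {0: "nonexistent", 1: "mild", 2: "severe"}

def build_prompt(symptoms: dict) -> str:
    described = ", ".join(f"{name}: {LEVELS[level]}"
                          for name, level in symptoms.items())
    return (
        "Imagine that you are a professional advanced AI medical assistant. "
        "Six skin conditions are under consideration (definitions omitted here). "
        f"A patient presents with the following symptoms: {described}. "
        "Please format your answer as a probability for each of the six "
        "conditions, summing to one."
    )

print(build_prompt({"redness": 2, "itching": 1, "pain": 0}))
```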
Hal Weitzman: He's a polite person. Absolutely. So apart from the skin condition research, you've also used this to classify galaxies. Just tell us very briefly about that.
Veronika Ročková: So image classification can be challenging. There can be a lot of demand for humans to classify images in radiology, or histological samples in cancer diagnoses, and it takes far longer for a human to look at an image and classify it than for a predictive system. This particular application looks at images of galaxies. We have, say, a small portion of human-labeled images, and then we have access to a computer-vision model which can give us the label: is it a spiral galaxy, yes or no? The goal was to estimate the proportion of spiral galaxies, and we found that using this computer-vision model, which is actually high quality, we were able to reduce the uncertainty, essentially report a narrower interval around the value which we could conceive of as the true proportion of spiral galaxies. So I think image classification is another big area. I think the applications are endless.
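Here is a deliberately simplified sketch of the proportion-estimation idea, using made-up numbers and a crude downweighting scheme of my own rather than the study's method: a small set of human labels is combined with a larger set of model predictions at reduced weight, and the resulting interval narrows.

```python
# Illustrative numbers, not the study's data: estimating the proportion of
# spiral galaxies from a small human-labeled set, with computer-vision
# model predictions on unlabeled images folded in at reduced weight.
from scipy import stats

human_spiral, human_total = 28, 50       # hypothetical human labels
model_spiral, model_total = 560, 1000    # hypothetical model predictions
weight = 0.2                              # trust placed in the model labels

# Human labels only
post_h = stats.beta(1 + human_spiral, 1 + human_total - human_spiral)

# Augmented with downweighted model predictions
post_a = stats.beta(1 + human_spiral + weight * model_spiral,
                    1 + (human_total - human_spiral)
                      + weight * (model_total - model_spiral))

print("human only:", tuple(round(v, 3) for v in post_h.interval(0.95)))
print("augmented: ", tuple(round(v, 3) for v in post_a.interval(0.95)))
```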
Hal Weitzman: Yeah. What other applications have you... I mean, give us some more quotidian, more kind of day-to-day ones rather than the galaxy perhaps, although I know that's important.
Veronika Ročková: I guess foundation models are now being fine-tuned for various tasks: speech recognition, sentiment analysis. I think vision transformers are huge. So whenever we have some foundation model which has been trained on massive amounts of proprietary data that physicians maybe don't have access to, and physicians in Europe might not have access to data sets from American hospitals and universities, there might be value in essentially exchanging these sorts of systems in terms of their predictions. So medical diagnosis is definitely one immediate thing that comes to my mind.
Hal Weitzman: Okay. Well, Veronika Ročková, this has been such a fascinating... I feel like my brain has expanded exponentially during this discussion. The Professor of Uncertainty, thank you so much for coming on the Chicago Booth Review Podcast.
Veronika Ročková: Thank you so much for having me.
Hal Weitzman: That's it for this episode of the Chicago Booth Review Podcast, part of the University of Chicago Podcast Network. For more research, analysis, and insights, visit our website at chicagobooth.edu/review. When you're there, sign up for our weekly newsletter so you never miss the latest in business-focused academic research.
This episode was produced by Josh Stunkel. If you enjoyed it, please subscribe, and please do leave us a five-star review. Until next time, I'm Hal Weitzman. Thanks for listening.