Chicago Booth Review Podcast Can You Catch AI-Written Reviews?
- December 10, 2025
- CBR Podcast
Schools are trying to restrict or at least identify the use of artificial intelligence. How effective are the tools they’re using? Chicago Booth principal researcher Brian Jabarian points out that it’s not just education that’s affected by AI writing, but also the reviews that drive consumer purchases. So can online tools help you detect if that restaurant review was written by ChatGPT? Jabarian talks about his research on AI detection tools.
Brian Jabarian: Students, but also everyone now, are adapting their writing style and their writing strategy to this type of AI detection. So they are using what are called humanizers, which are AI tools that help users cover up the fact they've been using AI by changing the style to mimic a human's text.
Hal Weitzman: Schools are trying to restrict, or at least identify the use of AI, and colleges are attempting to catch students who cut and paste generated text to use in their assignments. How effective are the tools that they're using? Welcome to the Chicago Booth Review Podcast, where we bring you ground-breaking academic research in a clear and straightforward way. I'm Hal Weitzman, and today I'm talking with Chicago Booth's Brian Jabarian about his research on AI detection tools. Jabarian points out that it's not just education that's affected, but also the reviews that drive consumer purchases. So can online tools help you detect if that restaurant review was written by ChatGPT?
Brian Jabarian, welcome back to the Chicago Booth Review Podcast.
Brian Jabarian: Thanks for having me again, Hal.
Hal Weitzman: So we are going to talk to you again about AI, of course, because you're an economist at the Center for AI here at Chicago Booth, as well as at the Center for Decision Research. And this is research we want to talk to you about: AI detection, something that those of us in the education industry care a lot about. You and Chicago Booth's Alex Imas evaluated different tools for AI detection. First of all, was there a particular incident that got you interested in this topic?
Brian Jabarian: Alex, I think, has a lot of anecdotes on how many students are using AI-
Hal Weitzman: Even here at the University of Chicago?
Brian Jabarian: Oh yeah. Yes, absolutely. I remember walking through the classroom and just seeing people constantly on ChatGPT, basically. So that's, I think, one of the anecdotes. Another one is that, actually, with Alex, we started trying to see whether non-experts, humans, could detect AI-generated content with a lab experiment at Mindworks, basically, in Chicago.
Hal Weitzman: Mindworks is our lab in downtown Chicago, yeah.
Brian Jabarian: Yeah. And people were just so bad at it, basically at 50%, flipping a coin. So, I mean, that was quite something, and...
Hal Weitzman: So people are bad at it, but apart from academic integrity, obviously we want our students not just to have their stuff written by an AI. But why should we care?
Brian Jabarian: Oh yeah, absolutely. I mean, one of the key reasons we should care about whether I am the one speaking or writing is that you want to be able to assess my performance, or any human's performance. To assess any human performance, you need to know what's coming from me and what's not coming from me, basically. So that's the first thing. The second thing is that the economy runs a lot on human documents. Legal documents being signed, you want to be sure they're from a human. You use Google Maps, I guess, to know where you should eat, and you trust the Google reviews because you assume they've been written by your fellow humans, etc. So the economy runs a lot on human-generated content, and you want to be able to ensure the integrity of that human generation.
Hal Weitzman: Okay. You've convinced me that it's important. Although now I think about that, legal document, if I get a legal document, there's all sorts of places now where you can buy cheap legal documents. Presumably, they're written to pretty much a standardized format.
Brian Jabarian: Oh, I mean-
Hal Weitzman: It's not bad?
Brian Jabarian: It's not that bad. I've been talking to a lot of my lawyer friends as well, and any laws, any legal-
Hal Weitzman: Well of course the lawyers aren't going to like it.
Brian Jabarian: No, it's not that they're not going to like it. It's just that a legal situation is highly specific to your context. So you need-
Hal Weitzman: [inaudible 00:04:00] not a standardized contract [inaudible 00:04:03] specialized.
Brian Jabarian: Yeah, for sure. It's not a standardized contract, etc., for sure. But at the end of the day, you want to be sure. Your institutions are guaranteed by some kind of trust, and this trust is ensured by the fact that you know whether something comes from an AI or not. We're not saying that we don't want AI-generated content. We're saying we want to know when content comes from an AI.
Hal Weitzman: I see. Okay. So what have you found out? Tell us what your headline is here about AI detection software.
Brian Jabarian: Yeah, so these kinds of facts have generated a lot of demand for AI detection. People are bad at this on their own, so we need AI detection. And the problem we had so far is that most of the leading AI detectors are actually commercial, private ones, and everyone is claiming to be the best in town. So basically, with Alex, we said, what if we just run a benchmark to understand who is actually the best in town, if there is one. And there is one, to our surprise, because so far, AI detection had been really bad, not working that well, but some of them are actually working super well, like Pangram, for instance. All the types of statistics we've been checking in this paper are very impressive.
Hal Weitzman: Okay, so it does work?
Brian Jabarian: Absolutely.
Hal Weitzman: Is the one finding. Another one is, they're not all the same.
Brian Jabarian: No.
Hal Weitzman: So if you just Google best AI detection, it's not going to get you the best one?
Brian Jabarian: No, don't.
Hal Weitzman: What happens if you do?
Brian Jabarian: Well, you will go on the web page, like I did, of any type of AI detector claiming they have 99% success, and you have no way to know it unless you run this type of benchmark.
Hal Weitzman: Okay. All right. So there's a big diversity of results. And one of the reasons is that it's challenging to build an AI detector. It's hard to get both what we would call a low false-negative rate, meaning catching AI text, and a low false-positive rate, meaning not falsely accusing humans of AI use when they've actually done their own hard-earned writing. It's hard to do both of these at the same time. What's the trade-off between those?
Brian Jabarian: Yeah, so the first thing is that AI writing comes from large language models, which have been trained on human texts. Because of that training, AI-generated text carries these kinds of cues of human style. And then you have human writing as well. So the two sets, human writing and AI writing, overlap sometimes, which is why you can't have the best of both worlds at the same time, a low false-negative rate and a low false-positive rate. You have to make a concession about which of the two risks you're more willing to tolerate, basically. And each of these detectors has built its own optimized algorithm for detection, which is unknown to users. You don't know how they built it. I mean, it's a classifier, it's a filter. They look for styles of writing that most likely appear in AI writing or human writing, sure. But the exact details are unknown to users because they're private, and you don't know the training data they've been using.
So some detector might be good for the text you want to test, but you don't know. They don't tell. And if they do tell, well, it's marketing communication. So you're a decision-maker, and now you have to make your choice between different AI detectors, and you want to know which one you should use, basically. And this is also why we proposed this policy framework: to help people make an informed choice without having access to the machine, whether for universities, or on social media, whatever you want to regulate.
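To make the trade-off concrete, here is a minimal sketch in Python, using made-up scores and a generic score-plus-threshold detector interface, an assumption for illustration rather than any vendor's actual algorithm. Because the human and AI score distributions overlap, moving the threshold trades false positives against false negatives.

```python
# A minimal sketch of the false-positive / false-negative trade-off.
# The score-based detector interface and the scores are assumptions for
# illustration; no vendor's actual algorithm is shown here.

def error_rates(scores_human, scores_ai, threshold):
    """Compute error rates at one decision threshold.

    scores_human: detector scores for texts known to be human-written
    scores_ai:    detector scores for texts known to be AI-generated
    Scores are in [0, 1]; higher means "more likely AI."
    """
    # False positive: a human text scored at or above the threshold.
    fpr = sum(s >= threshold for s in scores_human) / len(scores_human)
    # False negative: an AI text scored below the threshold.
    fnr = sum(s < threshold for s in scores_ai) / len(scores_ai)
    return fpr, fnr

# Toy scores whose distributions overlap, so no threshold can drive
# both error rates to zero at once.
human_scores = [0.05, 0.10, 0.20, 0.35, 0.60]
ai_scores = [0.40, 0.55, 0.70, 0.85, 0.95]

for t in (0.3, 0.5, 0.7):
    fpr, fnr = error_rates(human_scores, ai_scores, t)
    print(f"threshold={t:.1f}  FPR={fpr:.2f}  FNR={fnr:.2f}")
```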
Hal Weitzman: Okay. So as you said, you looked at these four different detectors, Pangram, Originality.AI, GPTZero, and one called RoBERTa. And as you say, there's loads out there. So these are four of the ones you looked at. How do they approach this task of AI detection, these different platforms?
Brian Jabarian: They approach it with the same goal, which is giving users peace of mind when they use the detector: telling you whether it's an AI or a human text. So at the end of the day, most of them are trying to give you a binary classification, this is AI, this is not AI. And to come to this classification, they're using an internal scoring rule, which says, past a certain degree of resemblance, I'm going to say it's an AI or a human. This scoring rule depends on the algorithm that they're using and developing-
Hal Weitzman: Which they're not going to tell us about obviously because then we would start to build an AI that would get around it?
Brian Jabarian: Well, the RoBERTa one, for instance, is open source. It's not the leading one. It's one of-
Hal Weitzman: RoBERTa? Right.
Brian Jabarian: Yeah. But it's really bad, basically. So it shows why you need [inaudible 00:08:44] detectors-
Hal Weitzman: Yeah, you found RoBERTa was the worst, right?
Brian Jabarian: Absolutely.
Hal Weitzman: And Pangram, as you said, was the best. So what made Pangram so effective?
Brian Jabarian: I mean, the way we tested their robustness is as follows: First, we collected texts that we know, without any doubt, were written by humans. So we took texts from before 2020, before GPT-2 and GPT-3 arrived, and we used a diversity of texts to resemble the economy a bit. So, short texts, Amazon reviews genuinely written by humans, short texts on Google Maps, etc., and novels, essays, blogs, whatever. That was our human sample. And then we generated copies of those human texts with each of the different LLMs, so GPT, Claude, etc. After that, we just basically tested each detector and saw which one was the best in terms of accuracy, but also in terms of the false-positive rate and false-negative rate.
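The following is a minimal sketch of that benchmark design. The detector interface (any callable that returns True for "AI") and the toy texts are assumptions for illustration; the study's actual pipeline and data are not reproduced here.

```python
# A minimal sketch of the benchmark design described above. The detector
# interface (a callable returning True for "AI") and the toy texts are
# assumptions; the study's actual pipeline and data are not reproduced.

def benchmark(detectors, human_texts, ai_texts):
    """Score each detector on known-human and known-AI samples."""
    results = {}
    for name, is_ai in detectors.items():
        # False positives: human texts wrongly flagged as AI.
        fp = sum(is_ai(t) for t in human_texts)
        # False negatives: AI texts that slip through as human.
        fn = sum(not is_ai(t) for t in ai_texts)
        total = len(human_texts) + len(ai_texts)
        results[name] = {
            "fpr": fp / len(human_texts),
            "fnr": fn / len(ai_texts),
            "accuracy": (total - fp - fn) / total,
        }
    return results

# Usage with a toy stand-in detector that flags stereotypically "AI-ish"
# vocabulary; a real benchmark would call each commercial detector's API.
toy = {"keyword_detector": lambda text: "delve" in text.lower()}
print(benchmark(
    toy,
    human_texts=["Great tacos, friendly staff, a bit slow at lunch."],
    ai_texts=["Let us delve into the rich culinary tapestry on offer."],
))
```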
Over the overall sample, Pangram is the one that always ranks first. But then we did two robustness checks, stress tests, because each of these AI detectors requires a minimum number of words to be able to function. It can vary: 50 words, 30 words, 70 words, whatever. But a huge part of the digital economy nowadays runs on short texts that we call stubs; a tweet, for instance, is a couple hundred characters. So we want the AI detectors to be able to catch AI-generated text, and not wrongfully accuse humans of using AI, in this type of environment as well. And Originality.AI, for instance, broke, didn't run properly, so we couldn't even report its results. Only Pangram consistently showed near-zero false-positive and false-negative rates, so that was very impressive.
The other type of risk is that students, but also everyone now, are adapting their writing style and writing strategy to this type of AI detection. So they are using what are called humanizers, which are AI tools that help users cover up the fact they've been using AI by changing the style to mimic a human text. So there is this race between the AI detectors and the humanizers, like StealthGPT. And here there is actually a more diverse ranking, where Pangram still ranks very well, first in terms of false-positive rate and false-negative rate. But now Originality and GPTZero switch places, depending on the type of text, basically.
Hal Weitzman: Carry the Two is the show that pulls back the curtain to reveal the mathematical and statistical gears that turn the world. Co-hosts Sadie Witkowski and Ian Martin bring unique perspectives from the fields of mathematics and statistics to convey how mathematical research drives the world around us. With each episode tackling a different topic, subscribe to Carry the Two, part of the award-winning University of Chicago Podcast Network.
Brian, in the first half we talked about your research with Alex Imas into AI detection. I know that you and I both tried to change our writing. It's amazing how one of the effects that AI has had is that it's changed the way that we write. I was a journalist for most of my career, and I got used to writing em dashes, the long dashes, and now I don't like to write them because it looks... I have it in my head that that makes it look AI.
Of course there's a backlash against that backlash, but there are certain words. My kids know... they have a joke that the word "delve" is something that only ChatGPT uses. No one uses it in real life. So we're changing our language because of AI, and that's part of this humanizing thing that you're talking about, where we're sort of trying to get around AI by being more human than we really are, or something. It's just an interesting development because, at the same time, the technology is getting better and better, and it's better able, presumably, to humanize, and generative AI in general is getting better, and we're going to use it more and more. So I don't know, where do you think it's all going?
Brian Jabarian: Yeah, thanks for asking. I mean, I'm French, so I learn English, also, in a specific way. So "delve" was a fancy word that we should use during our essays, and the em dash was also [inaudible 00:13:22]-
Hal Weitzman: Probably how it found its way in.
Brian Jabarian: Yeah. I 100% agree with what you're saying. First of all, I want to make clear that AI detection is not about preventing AI adoption. If that's what detectors were going after, it would be a lost cause, and they're not going after that. Good for them, but good for us as well. The point is just that we want to be transparent in our way of using AI. No one is saying you shouldn't use AI. We just want to know how you use AI, so we can know whether you should be paid one way or the other, basically. So there is this highly anticipated development of watermarking, to understand whether a word came from an AI machine, for instance, which is quite hard to put in place, but also to understand in which way the AI was used.
Did you start generating a document yourself as a human, and then put it through Grammarly, let's say, to just fix some grammar, like I do all the time? Or did you just copy-paste an essay directly from ChatGPT and try to get an A, for instance? Those are two completely different situations, and they don't warrant the same type of policy response.
Hal Weitzman: To go back to your earlier example, what if I generate a legal document, I read through it, and I tweak a couple of words? Now, presumably that could still get flagged as mostly written by AI, but I have gone through it diligently, I've been very careful, I've edited it. Haven't I done what I'm supposed to do? I mean, I've used it to write a first draft. I've tweaked that, but maybe haven't tweaked it very much.
Brian Jabarian: Absolutely. And the limit, of course-
Hal Weitzman: If I put good inputs in, after all, the first draft would be pretty good. So what is the point of AI, this sort of AI detection? Is it-
Brian Jabarian: This is why, with Alex, we proposed this policy framework, to basically capture... So of course we would all want AI detection to trace the way you have been using AI, in the way you describe, that's for sure, but it's quite challenging for now. We hope it'll come soon. Depending on the type of environment you are in, you basically want to decide which false-positive rate you want to enforce as your policy. In the classroom, or in a legal setting, you don't want to falsely accuse someone of having used AI, because that could end their career. So in that case, you can set a very conservative false-positive rate, which would mean, for now, letting through some kinds of AI usage in the writing, like the one you described, having used AI for a grammar check, et cetera. Well, you won't catch those people, but that's okay, because the risk you care about is not falsely accusing someone.
That's okay for a legal environment and an academic environment, but on Twitter, let's say, on social media, I think you want the reverse: you don't really care as much about falsely accusing someone of using AI on social media; you want to be sure that you detect all the AI bots constantly generating non-authentic content there.
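A minimal sketch of that policy lever, again assuming a generic score-based detector interface rather than any real vendor API: calibrate the decision threshold on known-human texts so the false-positive rate stays under a cap chosen for the setting, strict for classrooms and courts, loose for bot hunting.

```python
# A minimal sketch of the policy lever described above, assuming the same
# generic score-based detector interface; no real vendor API is implied.
# Choose the decision threshold so the false-positive rate measured on
# known-human texts stays under a cap picked for the setting.

def calibrate_threshold(scores_human, max_fpr):
    """Return the lowest threshold whose FPR on human texts is <= max_fpr."""
    for threshold in sorted(set(scores_human)) + [1.01]:
        fpr = sum(s >= threshold for s in scores_human) / len(scores_human)
        if fpr <= max_fpr:
            return threshold
    return 1.01  # flag nothing if no threshold satisfies the cap

human_scores = [0.05, 0.10, 0.20, 0.35, 0.60]

# Classroom or legal setting: almost never accuse a human (FPR cap 1%).
# With this tiny sample, the only way to guarantee that is to flag nothing,
# which is exactly the conservative behavior such a policy accepts.
strict = calibrate_threshold(human_scores, max_fpr=0.01)

# Social-media bot filtering: tolerate more false accusations (FPR cap 20%).
loose = calibrate_threshold(human_scores, max_fpr=0.20)

print(f"strict threshold: {strict}, loose threshold: {loose}")
```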
Hal Weitzman: Yeah. But you do highlight, on the other hand, that there's a taboo. It's sort of an accusation in many different environments, right? There's something that seems negative about using AI, even for the first draft. The difference you described, between the person who writes something themselves and then uses Grammarly, that's somehow better than somebody... even if the product is the same.
Brian Jabarian: I mean, the taboo, I 100% agree with you, and I think this is what we're trying to move away from: this conception of AI detection as being the cop trying to catch people using AI. This has to stop, because this is not the point of AI detection, in my opinion. I mean, if it's that, then what's the point? People are using more and more AI. I think what we just want to be able to do... but again, this is just the beginning, and we need more effort. And Pangram, for instance, already tells you, within your document, where it is more likely that you used AI, and where it's less likely. So they have this curve saying, at the beginning you probably used AI; in the middle, less; et cetera. So this is already a step towards knowing how AI was used during the writing.
But for sure, what we want is to be able to trust the entire process of using AI. There is nothing wrong with generating your first draft with an AI if you're going to work through it. Basically, what we want, I think, is to be able to capture the effort in some way, or the value added by a human. We talk a lot about AI adoption in two forms these days: one is AI augmentation, and the other is AI automation. How do you classify these two types of adoption if we don't know whether, in the document you've given me, you used AI to augment your writing or to completely replace it? That's what we want to know, because in one case, we know how you used it; in the other case, well, you didn't do anything, basically, except click a button. We want to know that. That would count as AI automation. So even to know how AI is being adopted, we need better measures of this kind of traceability and process.
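A toy sketch of the per-passage profile idea Jabarian describes: slide a window over the document and score each chunk to localize likely AI use. The scoring function here is a crude stand-in for illustration, emphatically not Pangram's method.

```python
# A toy sketch of a per-passage "where was AI likely used" profile, as
# described above. The scoring function is a crude stand-in, not
# Pangram's method.

def ai_profile(document, score, window=3):
    """Score overlapping windows of sentences to localize likely AI use."""
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    profile = []
    for i in range(max(1, len(sentences) - window + 1)):
        chunk = ". ".join(sentences[i:i + window])
        profile.append((i, score(chunk)))
    return profile

# Stand-in scorer: share of stereotypically "AI-ish" words, for demo only.
def toy_score(text):
    markers = {"delve", "tapestry", "furthermore"}
    words = [w.strip(",.").lower() for w in text.split()]
    return sum(w in markers for w in words) / max(1, len(words))

doc = ("Furthermore, we delve into a rich tapestry of flavors. "
       "The tacos were fine. I waited ten minutes. Service was friendly.")
for start, s in ai_profile(doc, toy_score):
    print(f"window starting at sentence {start}: score {s:.2f}")
```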
Hal Weitzman: One of the reasons, I guess, is that the taboo stops people admitting it. But once we get... if we do get over that taboo, which I assume at some point we will, then the whole idea of AI detection might fall away, because, who cares?
Brian Jabarian: Yeah. I mean, at some point maybe they will change the name, and say AI traceability, or AI watermarking, or AI transparency. I think the point would be just to know how people use AI. Alex had this other paper about how people under-report AI usage, and-
Hal Weitzman: Yeah, which we covered on the podcast.
Brian Jabarian: Yeah, exactly. And that type of survey, which is rare, by the way, in the survey landscape, because most surveys just ask directly, was a nice way of showing this type of taboo. And this taboo, and this fear of reporting, will go away if we just say, this is fine. This is a societal discussion. That's why this policy framework is truly some kind of welfare economics. Those risks are truly social risks. Which ones are we okay with? Which ones are we not okay with? If for a classroom you're saying, "I want you to write an essay, but I don't want you to start by generating the first draft with AI. I want you to think about it on your own." Coming up with some parts here and there, that's part of the process of thinking and becoming a PhD student, or a master's student.
Well, that would be your policy. But if you're telling me, for instance, that in a legal environment most cases are standard cases, then let's generate the first draft with AI and have the human lawyer in the loop validating it. Why not? But those are discussions that we need to have as a society, basically. Each case will warrant a different setting of the policy caps we propose: some will say we really need to be careful about false accusations, and others will say we really want to make sure we catch all the fish, basically, all of this AI-generated writing.
Hal Weitzman: Brian Jabarian, fascinating, as always, to talk to you. Thanks very much for coming on the Chicago Booth Review Podcast.
Brian Jabarian: Thanks a lot for having me again, Hal.
Hal Weitzman: That's it for this episode of the Chicago Booth Review Podcast, part of the University of Chicago Podcast Network. For more research, analysis, and insights, visit our website at chicagobooth.edu/review. When you're there, sign up for our weekly newsletter so you never miss the latest in business-focused academic research. This episode was produced by Josh Stunkel. If you've enjoyed it, please subscribe, and please do leave us a five-star review. Until next time, I'm Hal Weitzman. Thanks for listening.