The risks facing a company can be hard to ascertain in a typical earnings call transcript. After all, managers aren’t keen to highlight their challenges. “Corporate risk exposures are often subtly implied in conference call discussions rather than explicitly stated,” write Chicago Booth PhD student Alex G. Kim and Booth’s Maximilian Muhn and Valeri Nikolaev.

But the researchers find that generative large language models, a form of artificial intelligence that powers chatbots such as ChatGPT, can be used to detect elements of risk. Their research suggests that an LLM can make inferences by using the vast amount of information it was trained on to pick up on complex and nuanced relationships between statements scattered throughout a transcript’s text. Not only that, Kim notes, but “LLMs can even understand the concept of new risks, such as AI-related risks.”

A number of researchers have previously tried to pull insights on corporate risk from earnings call transcripts using natural language processing, with limited success. In this type of text mining, an NLP algorithm searches the transcripts for specific words related to a particular risk topic, then counts risk-related terms such as "uncertainty" or "liability" that appear near those words. The higher the count, the greater the implied risk. But executives, knowing that investors are analyzing their words, may choose their language carefully. Moreover, an NLP algorithm generally cannot understand the context of what is stated and can miss relevant words that don't fall close to the risk topic.
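
To make the contrast concrete, here is a minimal sketch of a keyword-proximity measure of this kind. The topic and risk word lists and the window size are illustrative assumptions, not the exact lexicons used in that earlier literature:

```python
# Illustrative keyword-proximity risk measure of the kind used in prior
# NLP studies. The word lists and the window size are assumptions for
# this sketch, not the lexicons from that earlier work.

TOPIC_WORDS = {"political": {"regulation", "election", "policy", "tariff"}}
RISK_WORDS = {"risk", "risks", "uncertainty", "uncertain", "liability"}
WINDOW = 10  # look 10 tokens to either side of each topic word


def keyword_risk_score(transcript: str, topic: str) -> float:
    """Count risk words near topic words, scaled by transcript length."""
    tokens = transcript.lower().split()
    hits = 0
    for i, token in enumerate(tokens):
        if token in TOPIC_WORDS[topic]:
            window = tokens[max(0, i - WINDOW): i + WINDOW + 1]
            hits += sum(1 for w in window if w in RISK_WORDS)
    return hits / max(len(tokens), 1)
```

Because the score depends entirely on fixed word lists and proximity, carefully worded remarks, or relevant statements far from any topic keyword, slip through unmeasured.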

The researchers used ChatGPT to analyze nearly 70,000 quarterly earnings call transcripts from almost 5,000 publicly traded companies. The transcripts covered January 2018 to March 2023, during which time there were significant changes in the uncertainties around the three types of risks the researchers focused on—political, climate, and AI.

At the time of the research, using OpenAI's GPT-3.5 Turbo as the LLM cost a mere 5 percent of what GPT-4 would have cost, and both models yielded similar results in a small test sample. Notably, GPT-3.5's training data cut off in September 2021, which allowed Kim, Muhn, and Nikolaev to use the quarterly earnings call transcripts from January 2022 to March 2023 to reexamine their results, since the LLM could not have been trained on any transcript from that period.

For each earnings call transcript, the researchers prompted the LLM to produce two outputs: a risk summary and a risk assessment. For the summary, the LLM was permitted to use only information from the transcript itself; for the assessment, that restriction was lifted, letting the model draw on its general knowledge. Each risk summary and risk assessment was generated separately for the political, climate, and AI-related topics.
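
In code, the two prompts might look like the following sketch against OpenAI's chat API. The prompt wording is paraphrased from the description of the setup, not the authors' exact prompts:

```python
# Sketch of the summary-vs.-assessment prompting setup using OpenAI's
# chat API. The prompt wording is paraphrased, not the paper's exact text.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SUMMARY_PROMPT = (
    "Summarize the {topic}-related risks discussed in the earnings call "
    "transcript below. Use only information contained in the transcript."
)
ASSESSMENT_PROMPT = (
    "Assess the {topic}-related risks facing this firm based on the "
    "earnings call transcript below. You may draw on your general "
    "knowledge in addition to the transcript."
)


def llm_risk_text(transcript: str, topic: str, assess: bool) -> str:
    """Return the LLM's risk summary (assess=False) or assessment."""
    prompt = (ASSESSMENT_PROMPT if assess else SUMMARY_PROMPT).format(topic=topic)
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": prompt},
            {"role": "user", "content": transcript},
        ],
        temperature=0,  # reduce run-to-run variation in output
    )
    return response.choices[0].message.content
```

The only difference between the two calls is whether the model is told to stay within the transcript, which is what lets the assessment draw on the model's general knowledge.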

The researchers then quantified each output as a ratio: its length in words divided by the length of the transcript it was based on. The higher the ratio, the greater the risk exposure detected.
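
That scaling step reduces to a one-line ratio; the function below is a sketch, with naming of our own choosing:

```python
def risk_exposure(risk_text: str, transcript: str) -> float:
    """Length of the LLM's risk text relative to the transcript, in words."""
    transcript_words = len(transcript.split())
    return len(risk_text.split()) / transcript_words if transcript_words else 0.0
```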

They compared the results with those found using the keyword-based search algorithm. Then, to up the ante, the researchers used their risk summary and risk assessment measures to predict the volatility of each company's stock in the period following the earnings call.
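
A minimal version of that volatility test is an ordinary regression, sketched below with toy numbers standing in for the per-call data; how the study actually constructs its volatility measure and controls is beyond this sketch:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Toy stand-in data: one row per earnings call. The numbers are
# invented purely to make the sketch runnable.
df = pd.DataFrame({
    "post_call_volatility": [0.021, 0.034, 0.018, 0.045, 0.027, 0.039],
    "llm_risk_assessment": [0.10, 0.22, 0.08, 0.30, 0.15, 0.26],
    "keyword_risk": [0.004, 0.006, 0.002, 0.005, 0.003, 0.007],
})

# Regress post-call volatility on each risk measure separately and
# compare explanatory power (R-squared).
llm_fit = smf.ols("post_call_volatility ~ llm_risk_assessment", data=df).fit()
kw_fit = smf.ols("post_call_volatility ~ keyword_risk", data=df).fit()
print(f"LLM measure R-squared: {llm_fit.rsquared:.3f}")
print(f"Keyword measure R-squared: {kw_fit.rsquared:.3f}")
```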

When it came to explaining volatility, the LLM-based measures were consistently more informative than the keyword-based ones. The risk assessments outperformed the risk summaries, which the researchers attribute to the LLM's ability to draw on its general knowledge when producing them. The results remained strong even for the period after the model's training cutoff.

All this was true for both political and climate risks. There were no public NLP-based measures for AI-related risk available for comparison, which Kim, Muhn, and Nikolaev attribute to the recency of AI disruptions in the corporate world. But when the researchers analyzed only the last two years of their data, the AI measures became significant variables in predicting stock-price volatility, suggesting the LLM had some ability to capture even relatively new risks that were not commonly seen in its training data.
