The Thinking Behind the LLM

How Booth’s Bryon Aragam is working behind the scenes to improve and understand AI models

Here's an experiment: do not think about elephants. 

What came to mind? More likely than not, you conjured up the image of a trunk and grey, papery skin. A little contrarianism aside, the brain is primed to process and work within its immediate context. Thoughts and ideas do not exist in a vacuum; surroundings and other miscellaneous details shape how the subconscious works.

What happens when AI is asked not to think about an elephant and then prompted to generate content? With the emergence of customized and highly attuned large language models (LLMs), developers are looking for ways to build models that capture more nuanced thinking patterns. Can the intricate algorithmic web behind the LLM mirror the neural networks that power human thinking?

Let’s look at another exercise: imagine you are talking to a friend about Chicago. Unprompted, they ask, “Where is the Eiffel Tower?” You’d likely answer “Paris, France.” The question might feel random, but your brain knows to brush aside irrelevant context (like facts about Chicago) to retrieve the correct answer. This kind of logical information filtering is a natural process that the human brain has evolved over time. For LLMs, however, it is far less intuitive. Models can get mixed up, allowing earlier context to bleed into their answers in a way that humans never would.

These quirks fascinate Chicago Booth’s Bryon Aragam, Associate Professor of Econometrics and Statistics and Robert H. Topel Faculty Scholar. Aragam aims to understand how LLMs are wired, running experiments on model behavior ranging from simple fact retrieval to semantics. He even tries to confuse LLMs by feeding them irrelevant information, a technique known as “context hijacking,” to test their ability to sift through details and evade distractions when generating answers. And while researchers like Aragam observe striking similarities between how LLMs and humans categorize words and ideas, these nascent models still make very simple mistakes (like calling the Windy City home to the Eiffel Tower).

Aragam has tested several popular models, including GPT, Gemma, and Llama, each trained on different data and built on different underlying architectures. Most models routinely ace simple factual-recall questions. Yet slight changes to the prompt produced unexpected results. Prepending the statement “the Eiffel Tower is not in the city of Chicago” before asking a model to identify the tower’s true location led every model tested to incorrectly answer “Chicago, IL.” Intentionally misleading prompts, even seemingly harmless ones like snippets from the Chicago Wikipedia page, could cause “hallucinations,” or false answers to simple questions.
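The setup can be reproduced in a few lines of code. The sketch below is a minimal illustration, assuming a small open-weights model served through Hugging Face’s transformers library; the model name and prompts are placeholders, not the exact ones from Aragam’s experiments. It simply compares a clean prompt with a “hijacked” one that prepends a misleading but technically true statement.

```python
# Minimal context-hijacking check: compare the model's completion of a clean
# prompt against the same prompt preceded by a misleading (but true) statement.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # placeholder open model

clean_prompt = "The Eiffel Tower is in the city of"
hijacked_prompt = (
    "The Eiffel Tower is not in the city of Chicago. "
    "The Eiffel Tower is in the city of"
)

for label, prompt in [("clean", clean_prompt), ("hijacked", hijacked_prompt)]:
    # Greedy decoding keeps the comparison deterministic.
    output = generator(prompt, max_new_tokens=5, do_sample=False)
    print(f"{label}: {output[0]['generated_text']}")
```

If the completion after the hijacked prompt drifts toward “Chicago,” the prepended context has leaked into the answer, which is the behavior the experiments probe.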

An LLM naming Chicago as home to the Eiffel Tower is not hugely consequential; a simple browser search will point to the right answer. Yet LLMs are increasingly deployed for far higher-stakes and more complex tasks, such as confidential document audits in law firms or electronic medical record analysis in hospitals. In such settings, a failure to distinguish relevant from irrelevant context could undermine audits and compliance checks, or hinder crucial stages of legal discovery.

Context hijacking demonstrates the fragility of current models and highlights the challenges of deploying autonomous AI agents at scale. Operating ideally, an LLM could navigate any new context, integrating prior knowledge without losing the thread of its reasoning. This challenge is a focus for many machine learning and AI researchers, and Aragam remains dedicated to understanding the inner workings of LLMs.

Models Making Meaning: LLMs & Semantics

Though less skilled at reading between the lines, models are unexpectedly adept at semantics: understanding how ideas, words, and concepts are associated and organized into hierarchies. Models have been carefully trained to categorize language, a fundamental cognitive process in humans. For instance, when asked whether a tomato is more similar to a carrot or a strawberry, models correctly group it with other fruits rather than with vegetables. This semantic clustering goes far beyond rote memorization; it implies an internal map of conceptual relationships.

As early as childhood, categorization is used to make sense of the world. Grouping objects, ideas, and experiences into meaningful clusters can fast-track understanding and guide decision-making. For LLMs, mirroring human-like semantic organization is critical to long-term functionality.

To understand the hidden logic behind these semantic relationships, Aragam and his team turned to the embeddings that LLMs use to generate text. One of the key steps in training an LLM is creating token embeddings: the numerical vector representations of words and word pieces. Statistically speaking, these embeddings do not need to carry any geometric or semantic meaning; remarkably, they do. Simple categories, such as “fruit” versus “vegetable,” fall naturally out of that structure.
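One way to see this structure is to measure how close the embedding vectors of related words sit to one another. The sketch below is illustrative only, using a placeholder open model rather than anything from Aragam’s work; in a well-organized embedding space, one would expect “tomato” to land closer to “strawberry” than to “carrot.”

```python
# Probe a model's token embeddings for semantic structure by comparing
# cosine similarities between word vectors.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "gpt2"  # placeholder open model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
embeddings = model.get_input_embeddings().weight  # token embedding matrix

def word_vector(word):
    # A word may split into several tokens; average their embedding vectors.
    # The leading space matches how GPT-2 tokenizes a word mid-sentence.
    ids = tokenizer(" " + word, add_special_tokens=False)["input_ids"]
    return embeddings[ids].mean(dim=0).detach()

tomato, carrot, strawberry = (word_vector(w) for w in ["tomato", "carrot", "strawberry"])
cosine = torch.nn.functional.cosine_similarity

print("tomato vs. strawberry:", cosine(tomato, strawberry, dim=0).item())
print("tomato vs. carrot:    ", cosine(tomato, carrot, dim=0).item())
```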

In other domains, the task becomes far more abstract. How, for example, might a model decide whether a statement is legally binding? How should it associate medical symptoms with specific illnesses, or judge whether a financial transaction looks routine or fraudulent? Without reliable categorization, models risk collapsing these distinctions and producing incoherent or unsafe outputs. With it, they demonstrate the beginnings of structured reasoning.

Into the Human World

Understanding flaws in language-model logic is vital to pushing AI agents into the human realm. Stress-testing LLMs to find holes in their reasoning continues to be a focus for Aragam, who likens current models to a fresh college graduate. New grads usually possess an impressive store of knowledge and can even surprise supervisors with novel insights, but they also make naive errors and need training and supervision before being granted full independence. Similarly, AI agents require careful fine-tuning, testing, and monitoring before being left to their own devices. By continuing to probe and understand the structure behind these models, researchers can make the path from “fresh graduate” to “trusted professional” a reality not only for people but for LLMs as well.