Expectations are this really central object in economics, particularly in macro and finance. But this raises a pretty fundamental challenge for our field because expectations aren’t something we directly observe. I can’t see what’s going on inside of your head when you’re making decisions under uncertainty.
Now, historically, one of the best ways we’ve had to solve this problem is by assuming rational expectations. And rational expectations essentially says that agents don’t make systematic mistakes; they make the best forecasts with the information they have available. Now, what’s great about that assumption is it sort of solves this measurement problem; it kind of cuts through this Gordian knot by tying people’s expectations to realized outcomes. But the problem is if you go and actually look at people’s expectations, you find that they’re often very far from rational, right? And this is something if you sort of self-reflect a bit, I think you can probably see in yourself as well.
Now, I started writing this paper in November of 2022 when ChatGPT came out. And what we had with ChatGPT seemed to be this machine that responded to information in a really humanlike way. So what I wanted to do in this paper is say: If we take this idea seriously, if we take seriously the idea that LLMs respond to information like humans do, that they sort of mimic our behavior and our beliefs, can we use these machines as a new source of data, as a way to generate expectations to help us move beyond rational expectations and try to engage with richer behavioral theories of the world?
So what I did was actually pretty simple. I took a particular instance of one of these LLMs, in this case just GPT-3.5, and then I took a bunch of samples of random articles from “The Wall Street Journal.” So I took about 300 articles every month from “The Wall Street Journal.” I then took the headline from each of those articles and fed them through the LLM one by one. So I gave the LLM a headline and I asked, what do you think is going to happen to, say, industrial production growth based on this piece of news? Do you think there’s gonna be an increase, a decrease, or are you uncertain?
So I took these very granular responses that the LLM gave me for things like industrial production growth, CPI, the S&P 500 Index, and I aggregated those up every month, and so I formed this time series of what I call generated expectations. And that’s sort of the main object I used for the rest of the paper.
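The pipeline described above can be sketched in a few lines. Everything here is an illustrative assumption, not the paper’s exact setup: the stub `llm_forecast` stands in for the real GPT-3.5 call, and the (increases − decreases) / total scoring is just one plausible way to aggregate the per-headline responses into a monthly index.

```python
from collections import defaultdict

# Hypothetical stand-in for the LLM call: the real pipeline sends each
# headline to GPT-3.5 and parses an "increase"/"decrease"/"uncertain" reply.
def llm_forecast(headline: str) -> str:
    # Placeholder keyword heuristic so the sketch runs without an API key.
    text = headline.lower()
    if "surge" in text or "boom" in text:
        return "increase"
    if "fall" in text or "recession" in text:
        return "decrease"
    return "uncertain"

def generated_expectation(articles):
    """Aggregate per-headline responses into one monthly index:
    (#increase - #decrease) / #articles, treating 'uncertain' as zero."""
    monthly = defaultdict(lambda: {"increase": 0, "decrease": 0, "n": 0})
    for month, headline in articles:
        resp = llm_forecast(headline)
        monthly[month]["n"] += 1
        if resp in ("increase", "decrease"):
            monthly[month][resp] += 1
    return {m: (c["increase"] - c["decrease"]) / c["n"]
            for m, c in monthly.items()}

articles = [
    ("2022-11", "Factory output surges on strong demand"),
    ("2022-11", "Economists warn a recession may loom"),
    ("2022-11", "Retailers report mixed holiday sales"),
]
print(generated_expectation(articles))  # {'2022-11': 0.0}
```

With one bullish, one bearish, and one uncertain headline in the month, the index nets out to zero; in the actual paper this aggregation is done separately for each variable (industrial production growth, CPI, the S&P 500, and so on).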
Now, there were a couple of things that the LLM did very well. The first is how this time series of generated expectations relates to, say, our best existing survey data. So we have some pretty good surveys of human return expectations over the last, say, 20 years. And if you look at the correlation between the generated return expectations and what we have in the surveys, it’s very high, right? So it seems like the LLM is picking up on the same things that the humans are.
Now, this goes beyond returns, though. You can look at, say, macroeconomic expectations. One of the classic surveys in this literature is the Survey of Professional Forecasters (SPF). And if you look at, say, the LLM’s generated expectations for industrial production growth, those are very strongly correlated with what you see in the SPF, and this holds for a pretty broad range of surveys.
So first, I guess the big thing is that the LLM seems to match the time-series variation in what we have for the surveys. But what I find goes beyond that: not only does the LLM match this time-series variation; it also seems to make a lot of the same mistakes that the humans do, right? So this is really important. If I wanna move beyond rational expectations, I need to not only pick up the correct components of the expectations; I need to pick up the errors as well.
So let me give you an example. One fact we know about a lot of human-based return expectations is that they’re negatively correlated with things that do a good job of forecasting the market, like the log dividend-price ratio. What this suggests is that human-based return expectations are often wrong, right? When I think the market’s gonna go up, it actually ends up going down. One thing I wanted to ask is: Does the LLM make that same mistake? If I look at the generated expectations for the LLM, do they exhibit the same negative correlation with things that actually forecast the market? And I find that indeed they do.
One last thing I guess I’ll highlight is I mentioned that I formed these expectations at a very granular level, right? I formed them at the article level, OK? So something I do in addition to asking the LLM to give me its forecast is I have it give me an explanation of why it thinks the S&P 500 is gonna go up based on this piece of news. So something I can do with that explanation is I can do sort of a systematic analysis to try and understand: Where did these mistakes come from? So something I find with the return expectations is that the LLM doesn’t seem to have a very good understanding of general equilibrium. It doesn’t seem to understand that if there’s some good cash-flow news today, that should show up in prices today, not in the future. And so this mistake in the LLM’s reasoning can help us to understand where this sort of disconnect from objective measures of expected returns might come from.
One thing that’s really cool about this approach is you can basically use it to form expectations data wherever you have text, right? So one thing I do in the paper is I take a sample of news over the last 120 years, OK? And I use that sample of news to form generated expectations over that entire period. So this gives me a very long time series, about 120 years of expectations, and it’s also extremely high frequency.
So this is far beyond what we’re able to do with existing survey measures. Most surveys are only really available over the past 20 to 40 years, and they’re generally only available at, say, the monthly frequency. So I formed this much longer time series of generated expectations at this very high frequency. And one thing I wanted to do with this is ask: Can we use this approach to sort of engage with some of these richer behavioral theories, particularly in asset pricing?
So one of the big behavioral stories that often comes up in asset pricing involves bubbles. There’s this idea that people’s beliefs deviate from rationality, and as a result of these deviations, prices deviate from fundamentals. So this is the classic irrational exuberance story. The problem, though, is that this is a very hard thing to test. There’s something called the joint hypothesis problem, and this is really an instance of it: you need some way of actually measuring people’s beliefs or their preferences if you wanna test these behavioral stories around bubbles.
So given this really awesome data set I built, I thought, hey, maybe I can try to engage with these theories. Maybe I can try to make some progress on this problem. So what I do is I take a sample of industry-level portfolios over the last 120 years and I define what I call a run-up. A run-up is a period where the industry’s return has been going up a lot over the last, say, two years. So I define these run-ups and I wanna ask: Can I use my measure of expectations, and in particular the systematic errors that I extract from those expectations, to predict which of these run-ups is gonna keep going up and which is going to crash? So can I basically ex ante time a bubble?
So what I do is I take my measure of systematic errors, my measure of sentiment, and I compute each industry’s beta with respect to that measure of sentiment: basically, the correlation between the industry’s return during the run-up and the sentiment series. And what I find is that those sentiment betas do a very good job of forecasting which of these run-ups is gonna keep going up and which is gonna crash, far better than anything we’ve already studied in the literature.
So one of my colleagues here, Eugene Fama, as you know, some years ago put out a call for actual ex ante evidence for the existence of bubbles, and I think what I found in my paper is exactly that. This is very concrete evidence that you can time bubbles prior to the actual crash, which is something we’ve had a very hard time doing up until this point.
You look at the history of scientific innovation, and generally the big changes come when you have some new data or some new way of looking at the world. And what I think we have with these LLMs is a new way to sort of peer inside of human minds and sort of understand where our beliefs actually come from. And I think that really opens up a lot of potential for just revolutionizing how we do economics.
Now, I guess what worries me is that this stuff is so new. There’s a lot we don’t know. There’s a lot of unknowns around these LLMs. And so we in economics really need to be engaging with people in industry. We need to be engaging with people in the computer science community to best understand how these methods work. So we can really establish a strong foundation for this generated expectations approach such that it can be sort of a cornerstone of the economics tool kit going forward.