
AI Reveals What Investors Really Think About Stocks

Using transformer models on portfolio data, researchers uncovered relationships missed by traditional financial metrics.

Asset pricing models have historically relied on categories: Does a company have a large market capitalization or small? Is it profitable or struggling? Is it value or growth? These groupings help identify similar companies, but traditional financial metrics offer limited information and don’t explain the diverse reasons investors buy the same stock. For instance, one investor might hold Apple for its chip hardware, another for its app store, and a third because it’s a large-cap stock.

Harvard’s Xavier Gabaix, Chicago Booth’s Ralph S. J. Koijen, New York University’s Robert J. Richmond, and Princeton’s Motohiro Yogo used artificial intelligence to analyze investors’ decisions and differing motivations. Their research demonstrates how AI can extract information from portfolio holdings, which represent the culmination of the private research, expert calls, proprietary analysis, and investment insights that investors employ. Accounting data and text from financial reports only capture the publicly available subset of this information.

The researchers adapted transformer models (the same architecture behind OpenAI’s ChatGPT) to financial data to create asset embeddings and investor embeddings, which are vector representations of companies and investment strategies, respectively. These vectors capture how investors group stocks into categories, including growth; sector exposure; environmental, social, and governance preferences; and sensitivity to macroeconomic issues, even when these investing themes aren’t explicitly mentioned in financial statements.

To understand embeddings, think about basketball star LeBron James. Traditional statistics indicate that he is a 6 ft., 9 in., 250-lb. forward. But fans know he can play point guard in some lineups, and center, shooting guard, small forward, or power forward in others. His role depends on who else is on the court.

Similarly, a company can play different roles for investors—in one portfolio, Apple is a growth anchor, in another it’s a defensive cash-flow play, and so on. The researchers’ model looks at the full financial lineup and generates a contextualized embedding for each stock based on its role in that portfolio.
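To make the idea of a contextualized embedding concrete, here is a toy sketch. It assumes made-up three-dimensional vectors for five tickers and uses a bare-bones form of self-attention, the mechanism transformers use to let each item weigh its neighbors, with no learned weights. It illustrates the general technique only and is not the researchers’ model: Apple’s output vector is simply pulled toward whatever else sits in the portfolio.

```python
import math

# Hypothetical static embeddings (invented numbers). Loosely, dimension 0 ~ "tech growth",
# dimension 1 ~ "defensive cash flow", dimension 2 ~ "other". Real models learn these.
STATIC = {
    "AAPL": [1.0, 0.5, 0.0],
    "NVDA": [1.0, 0.0, 0.2],
    "MSFT": [0.8, 0.4, 0.1],
    "KO":   [0.0, 1.0, 0.3],
    "PG":   [0.1, 0.9, 0.4],
}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def contextual_embedding(ticker, portfolio):
    """Single-head scaled dot-product attention without learned projections:
    the query is the ticker's static vector; keys and values are the static
    vectors of every holding in the portfolio, including the ticker itself."""
    query = STATIC[ticker]
    keys = [STATIC[t] for t in portfolio]
    weights = softmax([dot(query, k) / math.sqrt(len(query)) for k in keys])
    # Output = attention-weighted average of the holdings' vectors.
    return [sum(w * v[i] for w, v in zip(weights, keys)) for i in range(len(query))]

tech_fund = ["AAPL", "NVDA", "MSFT"]
defensive_fund = ["AAPL", "KO", "PG"]
apple_in_tech = contextual_embedding("AAPL", tech_fund)
apple_in_defensive = contextual_embedding("AAPL", defensive_fund)
print(apple_in_tech)
print(apple_in_defensive)
```

In the tech-heavy fund, Apple’s vector stays close to its tech neighbors. Next to Coca-Cola and Procter & Gamble, it shifts toward the hypothetical “defensive” dimension. A real transformer learns both the vectors and the attention weights from the holdings data.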

The researchers applied several AI techniques to US equity portfolios from 2005 to 2022, including mutual funds, exchange-traded funds, closed-end funds, variable annuity funds, and hedge funds. Rather than training the model on all available portfolio data, they deliberately held out 10 percent of the portfolios so they could test whether the model could accurately predict how investors construct portfolios it had never seen. They then employed masked asset modeling, which involves removing some stocks from portfolios and using the embeddings to predict what was removed.
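The two ingredients described above, a held-out test set and masking, can be sketched in a few lines. The 10 percent holdout comes from the article. The 15 percent masking rate, the `[MASK]` token, and the function names are hypothetical choices borrowed from common language-model practice, not details reported by the researchers.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token for a removed holding

def split_portfolios(portfolios, holdout_frac=0.10, seed=0):
    """Randomly hold out a fraction of portfolios (10% per the article) for testing."""
    rng = random.Random(seed)
    shuffled = portfolios[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * holdout_frac))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

def mask_holdings(portfolio, mask_frac=0.15, seed=0):
    """Hide a random subset of holdings (at least one); return the masked
    portfolio and a {position: ticker} map the model must recover.
    The 15% rate is an assumption, not the researchers' setting."""
    rng = random.Random(seed)
    n_mask = max(1, int(len(portfolio) * mask_frac))
    positions = sorted(rng.sample(range(len(portfolio)), n_mask))
    masked = list(portfolio)
    targets = {}
    for p in positions:
        targets[p] = masked[p]
        masked[p] = MASK
    return masked, targets

# Invented example data.
portfolios = [[f"S{i}", f"S{i + 1}", f"S{i + 2}"] for i in range(20)]
train, test = split_portfolios(portfolios)
print(len(train), len(test))  # 18 2

fund = ["AAPL", "MSFT", "NVDA", "KO", "PG"]
masked, targets = mask_holdings(fund)
print(masked, targets)
```

During training, the model would see `masked` and be scored on how well it recovers `targets`. The held-out portfolios are never used for training, so accuracy on them measures how well the embeddings generalize.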


This setup mirrors how language models predict missing words using surrounding context. Each portfolio is treated like a sentence, with its holdings ordered by position size. The model learns which stocks tend to appear together, but it also learns how the role a stock plays can shift depending on its neighbors. The embeddings reveal these roles using only portfolio data, not fundamentals, text, or price histories.
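A minimal sketch of the “portfolio as sentence” step, assuming each holding is given as a ticker and a dollar position size (the tickers and amounts below are invented): sort by size, largest first, then map tickers to integer ids the way a language model maps words to tokens. The function names are hypothetical.

```python
def portfolio_to_sentence(holdings):
    """Order holdings by position size, largest first, and return the tickers."""
    ranked = sorted(holdings.items(), key=lambda item: item[1], reverse=True)
    return [ticker for ticker, _ in ranked]

def to_token_ids(sentence, vocab):
    """Map tickers to integer ids, adding unseen tickers to the vocabulary."""
    return [vocab.setdefault(t, len(vocab)) for t in sentence]

# Hypothetical fund: ticker -> position size in dollars.
fund = {"KO": 2.5e6, "AAPL": 9.0e6, "PG": 4.0e6}
vocab = {}
sentence = portfolio_to_sentence(fund)
ids = to_token_ids(sentence, vocab)
print(sentence)  # ['AAPL', 'PG', 'KO']
print(ids)       # [0, 1, 2]
```

Sharing one vocabulary across all portfolios is what lets the model learn a single identity for each stock, which the attention layers then adjust according to the stock’s neighbors.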

The researchers generated, examined, and tested the performance of embedding vectors of different sizes, covering up to 128 dimensions. To benchmark these new representations, they compared these portfolio-data embeddings to traditional financial metrics and text-based embeddings from leading AI companies including OpenAI and Cohere. Across several benchmarks designed to explain valuations, capture comovement, and predict portfolio holdings, holdings-based embeddings compared favorably to other methods used to characterize companies.

Because embeddings are just collections of numbers with no built-in labels, the researchers used modern large language models to understand why assets and investors cluster together. For instance, they picked out the 10 companies most similar to Apple in terms of holdings-based asset embeddings, then had LLMs find the commonalities in those companies’ earnings calls to determine why investors considered these stocks similar. Using fund prospectuses rather than earnings calls, they applied the same approach to uncover the connective tissue between investors with similar investor embeddings.
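The nearest-neighbor step can be sketched with cosine similarity, a standard way to compare embedding vectors. Whether the researchers used exactly this measure is an assumption here. The vectors are invented, and the follow-up step of having an LLM read the neighbors’ earnings calls is not shown.

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def most_similar(target, embeddings, k=10):
    """Return the k tickers whose embeddings are closest to the target's (10 per the article)."""
    query = embeddings[target]
    scored = [(cosine(query, vec), t) for t, vec in embeddings.items() if t != target]
    scored.sort(reverse=True)
    return [t for _, t in scored[:k]]

# Hypothetical 3-dimensional asset embeddings; real ones have up to 128 dimensions.
embeddings = {
    "AAPL": [0.9, 0.1, 0.3],
    "MSFT": [0.8, 0.2, 0.3],
    "NVDA": [0.9, 0.0, 0.1],
    "KO":   [0.1, 0.9, 0.2],
    "XOM":  [0.0, 0.3, 0.9],
}
print(most_similar("AAPL", embeddings, k=2))  # ['MSFT', 'NVDA']
```

With real holdings-based embeddings, the returned list is the peer group whose earnings calls (or, for investors, prospectuses) would then be handed to an LLM to explain what the companies have in common.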

The findings open new investing frameworks for both money managers and researchers. For example, by tweaking embeddings to simulate how investors might behave under stress, institutions can model scenarios that have never occurred before—generating hypothetical shifts in valuation and exposure, explain Gabaix, Koijen, Richmond, and Yogo. They point to United Airlines and Zoom as two stocks exposed to COVID-19 risks.

As they note, this research bridges recent advances in AI with practical finance applications. Just as LLMs have revolutionized how we process text, these techniques could transform how we understand financial markets by aggregating investor decisions at a scale previously not possible. The same research team used embeddings based on the holdings of institutional bond investors to detect credit risk earlier and more accurately than ratings agencies. (See “AI identifies early signs of ratings downgrades.”)

Financial markets encode more information than traditional analyses capture, both projects suggest. Every portfolio decision made by investors contributes to a collective intelligence about firm value and risk—revealing not just what investors own, but why they own it. While this information was technically always in the data, it has taken modern computing power and AI to process millions of holdings and reveal it.
