by Emily Bembeneck and Manav Chaudhary
AI + Finance: Booth Research Advancing the Field
The field of finance is seeing research advances driven by AI, and Booth faculty are key figures leading this work. Two main areas of machine learning are contributing to this progress: methods for high-dimensional data and natural language processing. Below, we give a brief overview of these techniques and the specific contributions of Booth faculty in each area.
High-Dimensional Problems in Finance
The first technique underlying this new research is machine learning’s ability both to harness insights from high-dimensional data and to reduce dimensionality, making very complex data more manageable for practical applications.
Stefan Nagel, the Fama Family Distinguished Service Professor of Finance at Chicago Booth, and University of Maryland co-authors Serhiy Kozak, PhD ’13, and Shrihari Santosh, PhD ’13, apply machine learning techniques to a classic high-dimensional finance problem. Ever since Nobel laureate Eugene Fama and Kenneth French introduced a factor model to reconcile return anomalies, there has been a proliferation of factors as researchers and practitioners attempt to explain one anomaly or another. At this point, hundreds if not thousands of these “factors” or “characteristics” are used to explain differences in average returns between stocks, but with traditional methods it is only feasible to focus on a small subset. In two papers, Nagel et al. use machine learning both to harness insights from this high-dimensional setting and to reduce the dimensionality of the problem. In one paper (Kozak, Nagel, & Santosh, 2020), the authors propose an (augmented) elastic net method that finds the most efficient combination for explaining returns, using all available factors and characteristics. In the other (Kozak, Nagel, & Santosh, 2018), they take a more theoretical approach and show that, assuming investors don’t leave (too much) free money on the table, only a handful of statistical factors are needed to explain observed returns.
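To make the elastic net idea concrete, here is a minimal sketch on synthetic data. It is a plain elastic net regression fitted by coordinate descent, not the augmented estimator the authors propose. The data, the penalty values (l1, l2), and every function and variable name are assumptions chosen for illustration.

```python
import numpy as np

def soft_threshold(x, t):
    """Shrink x toward zero by t; values within [-t, t] become exactly zero."""
    return np.sign(x) * max(abs(x) - t, 0.0)

def elastic_net(X, y, l1=0.05, l2=0.01, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + l1*||b||_1 + (l2/2)*||b||_2^2."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()  # equals y - X @ b while b is all zeros
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the residual that excludes column j
            rho = X[:, j] @ resid / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, l1) / (col_sq[j] + l2)
            resid += X[:, j] * (b[j] - new_bj)  # keep resid = y - X @ b
            b[j] = new_bj
    return b

# Synthetic data (an assumption for illustration): 50 candidate characteristics,
# of which only the first three actually drive returns.
rng = np.random.default_rng(0)
n_obs, n_chars = 500, 50
characteristics = rng.standard_normal((n_obs, n_chars))
true_coefs = np.zeros(n_chars)
true_coefs[:3] = [0.5, -0.4, 0.3]
returns = characteristics @ true_coefs + 0.1 * rng.standard_normal(n_obs)

coefs = elastic_net(characteristics, returns)
selected = np.flatnonzero(coefs)
print("characteristics kept:", selected)
```

The L1 penalty sets the coefficients on uninformative characteristics to exactly zero, while the L2 penalty shrinks the remaining ones. That combination is what lets a single fit consider all candidates at once.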
"A natural question an investor may have is, given hundreds of such factors, how to best go about predicting the cross-section of stock returns. This a classic high dimensional problem where ML techniques shine."
In additional theoretical work, Nagel and co-author Ian Martin (LSE) have developed a model to understand the behavior of asset prices in an economy where investors rely on vast amounts of real-time data to learn about their target firms—an apt description of the investment problem faced in the real world. They demonstrate that investors can use machine learning techniques to learn as much as statistically feasible, leading to real-time efficient asset prices. However, because investors can't assimilate all information, markets may appear inefficient when examined by an econometrician at a later time, even though they are real-time efficient.
Dacheng Xiu, Professor of Econometrics and Statistics at Chicago Booth, along with co-authors Bryan Kelly (Yale) and current Booth student Shihao Gu, use machine learning techniques to harness high-dimensionality in order to identify non-linear relationships. Traditional econometric methods assume simple functional forms such as linearity, because otherwise estimation would be infeasible with these tools. Xiu et al. are able to show that by using neural nets and regression trees from the machine learning playbook, they can generate considerable predictive gains for asset pricing.
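The gain from dropping the linearity assumption can be seen in a small sketch. It compares an ordinary least squares fit with a shallow regression tree on synthetic data where the outcome depends on the absolute value of a predictor. This is an illustration under assumed data, not the authors' models or results. The tree here is a hand-rolled greedy one, and all names are hypothetical.

```python
import numpy as np

def fit_tree(X, y, depth):
    """Greedy regression tree; returns a leaf mean or (feature, threshold, below, above)."""
    if depth == 0 or len(y) < 20:
        return float(y.mean())
    best = None
    for j in range(X.shape[1]):
        for thr in np.quantile(X[:, j], np.linspace(0.1, 0.9, 9)):
            left = X[:, j] <= thr
            if left.sum() < 5 or (~left).sum() < 5:
                continue
            sse = (((y[left] - y[left].mean()) ** 2).sum()
                   + ((y[~left] - y[~left].mean()) ** 2).sum())
            if best is None or sse < best[0]:
                best = (sse, j, thr)
    if best is None:
        return float(y.mean())
    _, j, thr = best
    left = X[:, j] <= thr
    return (j, thr,
            fit_tree(X[left], y[left], depth - 1),
            fit_tree(X[~left], y[~left], depth - 1))

def predict_tree(tree, x):
    """Walk from the root to a leaf for a single observation x."""
    while isinstance(tree, tuple):
        j, thr, below, above = tree
        tree = below if x[j] <= thr else above
    return tree

# Synthetic data (an assumption): the outcome is V-shaped in the first predictor,
# and the second predictor is pure noise.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(600, 2))
y = 2 * np.abs(X[:, 0]) + 0.1 * rng.standard_normal(600)
X_train, y_train, X_test, y_test = X[:400], y[:400], X[400:], y[400:]

# Linear benchmark: OLS with an intercept
A_train = np.column_stack([np.ones(len(X_train)), X_train])
beta, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
linear_pred = np.column_stack([np.ones(len(X_test)), X_test]) @ beta

tree = fit_tree(X_train, y_train, depth=3)
tree_pred = np.array([predict_tree(tree, x) for x in X_test])

linear_mse = float(np.mean((y_test - linear_pred) ** 2))
tree_mse = float(np.mean((y_test - tree_pred) ** 2))
print(f"linear MSE: {linear_mse:.3f}, tree MSE: {tree_mse:.3f}")
```

A straight line cannot represent a V-shaped relationship, so the linear fit predicts roughly the sample mean everywhere. The tree recovers the shape by splitting the predictor's range into pieces.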
Natural Language Processing Applied to Finance
The world of finance generates enormous amounts of text, audio, and image data, including shareholder letters, AGM calls, Federal Reserve meeting minutes, and government regulations. Only recently, with advances in machine learning, have we been able to analyze this data systematically. Thanks to recent advances in natural language processing, we are now making progress on understanding how this data can be harnessed to generate new insights in finance.
Booth faculty Raghuram Rajan, Katherine Dusak Miller Distinguished Service Professor of Finance, Luigi Zingales, Robert C. McCormack Distinguished Service Professor of Entrepreneurship and Finance, and current Booth student Pietro Ramella used a BERT model (a transformer-based language model, from the same family of architectures that underlies ChatGPT) to analyze over 8,000 shareholder letters from 1955 to 2020. They find that the goals corporations purport to pursue have changed significantly in recent years, both in quantity and quality. Related work by Will Cassidy, PhD ’23, now Assistant Professor of Finance at WashU, uses a Latent Dirichlet Allocation model to identify White House climate policy announcements, showing that stocks earn a significant climate policy risk premium.
"We show how language models, including transformer models that feature prominently in large language models such as BERT and GPT, can handle numerical information, and in particular holdings data to estimate asset embeddings."
Ralph Koijen, AQR Capital Management Distinguished Service Professor of Finance and Fama Faculty Fellow, along with co-authors Xavier Gabaix (Harvard) and Motohiro Yogo (Princeton), has also produced innovative recent work that builds on advances in large language models. They use the technique of generating “embeddings” to better understand relationships between firms. Large language models like ChatGPT work so well partly because of embeddings, the numerical vector representations through which they capture the semantic relationships between words. Koijen et al. apply this technique to portfolio holdings, thereby generating an asset embedding model that can be applied to traditional use cases such as firm valuations, return co-movement, and asset substitution patterns. They further show that asset embeddings outperform the traditional stock characteristics used for these purposes.
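As a rough intuition for how holdings data can yield embeddings, the sketch below factorizes a tiny, made-up investor-by-asset matrix of portfolio weights with a truncated SVD. It treats each asset's loadings as its embedding. This is a generic low-rank factorization, not the estimators used in the paper. The asset names, the weights, and the helper functions are all hypothetical.

```python
import numpy as np

# Hypothetical portfolio weights: rows are investors, columns are assets.
assets = ["growth_a", "growth_b", "value_a", "value_b"]
holdings = np.array([
    [0.50, 0.40, 0.10, 0.00],   # growth-tilted investor
    [0.45, 0.55, 0.00, 0.00],   # growth-tilted investor
    [0.05, 0.00, 0.50, 0.45],   # value-tilted investor
    [0.00, 0.10, 0.40, 0.50],   # value-tilted investor
    [0.30, 0.20, 0.25, 0.25],   # diversified investor
])

def asset_embeddings(holdings, dim):
    """Embed each asset as its first `dim` singular-vector loadings, scaled by singular value."""
    centered = holdings - holdings.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    return (vt[:dim] * s[:dim, None]).T  # shape: (n_assets, dim)

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

emb = asset_embeddings(holdings, dim=2)
vec = dict(zip(assets, emb))
sim_same = cosine(vec["growth_a"], vec["growth_b"])
sim_diff = cosine(vec["growth_a"], vec["value_a"])
print(f"growth_a vs growth_b: {sim_same:.2f}, growth_a vs value_a: {sim_diff:.2f}")
```

Assets that tend to be held by the same investors end up with similar vectors, much as words that appear in similar contexts end up close together in a language model's embedding space.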
In our final example, Bryan Kelly, former Booth faculty and current Professor at Yale, and co-authors developed a regression model for text. The model lets a researcher use underlying economic or financial data to predict the words in a textual document, such as a financial news article. Once estimated, the model can also be inverted, allowing text to predict economic variables. Kelly et al. apply this method to both forecast and backcast economic variables using newspaper articles in combination with traditional variables. For example, after training on the relationship between news articles and the ICR, an asset pricing measure created in 1970, the model can backcast the ICR as far back as 1920 from newspaper articles alone.
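The two-step logic described above, first fitting words on a variable and then inverting the fit to recover the variable from words, can be sketched with a deliberately simplified linear stand-in. The paper's estimator is considerably richer than this. The simulated word counts, the sample split, and all names below are assumptions for illustration only.

```python
import numpy as np

# Simulated data (an assumption): an economic variable v and word counts whose
# rates depend on v through word-specific loadings.
rng = np.random.default_rng(0)
n_periods, n_words = 300, 30
v = rng.standard_normal(n_periods)
loadings = rng.uniform(-0.3, 0.3, n_words)
counts = rng.poisson(np.exp(3.0 + np.outer(v, loadings)))
freqs = counts / counts.sum(axis=1, keepdims=True)

# Pretend v is only observed in the later periods; earlier periods have text only.
observed = np.arange(n_periods) >= 100

def fit_word_slopes(freqs, v):
    """Forward step: OLS slope of each word's frequency on the variable."""
    v_c = v - v.mean()
    return (freqs - freqs.mean(axis=0)).T @ v_c / (v_c @ v_c)

# Forward regression on the observed sample
slopes = fit_word_slopes(freqs[observed], v[observed])

# Inversion: collapse each document into one index, then regress v on that index
z = freqs @ slopes
z_obs = z[observed]
beta = np.cov(z_obs, v[observed])[0, 1] / np.var(z_obs, ddof=1)
alpha = v[observed].mean() - beta * z_obs.mean()

# Backcast the variable for the periods where only text is available
backcast = alpha + beta * z[~observed]
corr = float(np.corrcoef(backcast, v[~observed])[0, 1])
print(f"correlation between backcast and true values: {corr:.2f}")
```

The design point is that the high-dimensional text is reduced to a single index using the forward-regression slopes, so the inverse step is a low-dimensional regression that can be applied to periods in which the variable was never recorded.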
Booth faculty are continuing to lead the field of finance forward by applying new methods and techniques from the field of machine learning to both solve old problems and generate new ways of understanding finance.
References
Cassidy, W. (2023). Elections Have Consequences: The Impact of Political Agency on Climate Policy and Asset Prices [Working paper]. University of Chicago Booth School of Business. https://williamcassidy.com/files/ElectionsHaveConsequences.pdf
Fama, E. F., & French, K. R. (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33(1), 3–56. https://doi.org/10.1016/0304-405x(93)90023-5
Gabaix, X., Koijen, R. S. J., & Yogo, M. (2023). Asset embeddings. Social Science Research Network. https://doi.org/10.2139/ssrn.4507511
Gu, S., Kelly, B. T., & Xiu, D. (2020). Empirical asset pricing via machine learning. Review of Financial Studies, 33(5), 2223–2273. https://doi.org/10.1093/rfs/hhaa009
Kelly, B. T., Manela, A., & Moreira, A. (2021). Text selection. Journal of Business & Economic Statistics, 39(4), 859–879. https://doi.org/10.1080/07350015.2021.1947843
Kozak, S., Nagel, S., & Santosh, S. (2018). Interpreting factor models. The Journal of Finance, 73(3), 1183–1223. https://doi.org/10.1111/jofi.12612
Kozak, S., Nagel, S., & Santosh, S. (2020). Shrinking the cross-section. Journal of Financial Economics, 135(2), 271–292. https://doi.org/10.1016/j.jfineco.2019.06.008
Martin, I., & Nagel, S. (2022). Market efficiency in the age of big data. Journal of Financial Economics, 145(1), 154–177. https://doi.org/10.1016/j.jfineco.2021.10.006
Rajan, R., Ramella, P., & Zingales, L. (2023). What purpose do corporations purport? Evidence from letters to shareholders [Working paper]. University of Chicago. https://www.nber.org/system/files/working_papers/w31054/w31054.pdf