A single word in a news report (a well-placed “undervalue,” for example) can drive a company’s stock price up or down. Investors who can work out which words matter, and act within a few days, stand to benefit, research suggests.

Investors and researchers have suspected for decades that text could be used to predict markets, and some have tried and failed. But by applying machine-learning techniques developed by computer scientists, Harvard’s Zheng Tracy Ke, Yale’s Bryan T. Kelly, and Chicago Booth’s Dacheng Xiu have built a model that in early tests outperformed a similar strategy based on scores from RavenPack, the leading vendor of news-sentiment scores.

Traditionally, finance researchers and market practitioners have relied on accounting data and fundamentals to predict where the market is headed. But quarterly reports arrive slowly for a market moving at warp speed, which led researchers and traders to look for other sources of predictive information, including news. To find out if news reports could be used to predict stock prices, Ke, Kelly, and Xiu borrowed machine-learning techniques used by computer scientists, who are increasingly training machines to understand text.

Efforts to predict market direction by parsing financial journalism date back to 1933, when economist and businessman Alfred Cowles III classified pieces in the Wall Street Journal as bullish, bearish, or neutral to inform trading strategies. That didn’t necessarily work (Cowles’s theoretical portfolio would have underperformed the market by more than 3 percent a year from 1902 to 1929, the researchers note), but other people have continued to pursue the idea of extracting useful information from text. Among them, Northwestern’s Scott R. Baker, Stanford’s Nicholas Bloom, and Chicago Booth’s Steven J. Davis analyzed years of newspaper articles to identify words associated with economic uncertainty, and have used those words to inform dozens of uncertainty-related indexes.

Some efforts to assess sentiment in text rely on preexisting dictionaries created for other purposes—such as the Harvard-IV Dictionary, a manually selected list of positive and negative psychosocial words, and the Loughran-McDonald Master Dictionary, developed to highlight meaningful words in financial texts and the sentiment associated with those words. The latter starts with word lists and uses US Securities and Exchange Commission filings to add terms relevant to the finance sector. For example, the dictionary added Scholes for the Black-Scholes modeling tool used with financial derivatives. 

Ke, Kelly, and Xiu created a model that, in effect, automatically generates a dictionary of relevant words and allows for contextually specific sentiment scores. Using supervised machine learning and a method that required only a laptop and basic statistical capabilities, the researchers analyzed more than 22 million articles published from 1989 to 2017 by Dow Jones Newswires. Classifying words as either positive or negative, the researchers generated article-level sentiment scores to show how news likely to be perceived as positive or negative would affect stock prices.

The first step in the process involved screening articles for words frequently associated with positive or negative returns. “Undervalue,” “repurchase,” and “surpass” are good for a share’s price, and “shortfall,” “downgrade,” and “disappointing” are bad, the model establishes. Several of the most impactful words highlighted by the research, such as “repurchase,” don’t appear in the other dictionaries used to assess sentiment. Next, the model isolated and weighted terms most likely to be informative about a stock’s future price. Finally, it gave articles sentiment scores on the basis of the words assessed.
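As a rough illustration of those three steps, the following Python sketch screens words, weights them, and scores articles on a handful of invented headlines. It is a deliberately simplified stand-in, not the researchers’ model: the headlines, the returns, the function names, the `min_count` and `margin` thresholds, and the weighting rule (a word’s share of positive-return articles, rescaled to run from -1 to 1) are all assumptions made for the example.

```python
from collections import Counter

# Invented toy data: each headline is paired with a made-up return for the stock it covers.
articles = [
    "board approves share repurchase",
    "analysts say shares undervalue growth",
    "repurchase plan lifts outlook",
    "revenue shortfall prompts downgrade",
    "downgrade follows disappointing quarter",
    "company reports quarter results",
]
returns = [0.02, 0.01, 0.03, -0.04, -0.02, 0.01]


def screen_and_weight(articles, returns, min_count=2, margin=0.25):
    """Steps 1 and 2 (simplified): keep words that skew toward positive or
    negative returns, and give each a weight between -1 and 1."""
    appears = Counter()   # number of articles that contain the word
    positive = Counter()  # number of those articles with a positive return
    for text, ret in zip(articles, returns):
        for word in set(text.lower().split()):
            appears[word] += 1
            if ret > 0:
                positive[word] += 1

    weights = {}
    for word, n in appears.items():
        if n < min_count:
            continue  # too rare to judge
        share = positive[word] / n
        if abs(share - 0.5) >= margin:  # skews clearly one way
            weights[word] = 2 * share - 1
    return weights


def score_article(text, weights):
    """Step 3 (simplified): average the weights of the screened words found
    in the article; 0.0 if none appear."""
    hits = [weights[w] for w in text.lower().split() if w in weights]
    return sum(hits) / len(hits) if hits else 0.0


weights = screen_and_weight(articles, returns)
print(weights)
print(score_article("firm announces new repurchase", weights))
print(score_article("brokerage issues downgrade", weights))
```

On this toy data only “repurchase” and “downgrade” survive the screen, because every other word either appears in a single headline or, like “quarter,” shows up equally often alongside gains and losses. A real application would need far more text and a more careful statistical treatment than this averaging rule.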

Some funds have likely been using natural language processing to trade for several years, with dubious success. A 2016 article in MIT Technology Review called analyzing language data to predict markets “one of the most promising uses of new AI techniques,” but one of the handful of funds it mentioned, Sentient, liquidated in 2018. The research by Ke, Kelly, and Xiu provides an academic framework for applying such processing to markets.

To demonstrate their model’s predictive capacity, the researchers devised a simple trading strategy to buy assets associated with positive recent news sentiment and sell assets associated with articles containing negative sentiment. The resulting portfolio outperformed a similar strategy based on scores from RavenPack, the leading vendor of news-sentiment scores. Returns didn’t begin to even out between the two until five days after an article’s publication.
