
Can a Machine-Learning Model Reason Like an Expert?

When US Airways Flight 1549 lost all power after hitting a flock of geese in 2009, Captain Chesley “Sully” Sullenberger’s background as a glider pilot helped him manage the aircraft and see landing options that the air traffic controllers couldn’t. With insufficient altitude and airspeed to reach nearby airports, he judged a glide to the Hudson River to be the passengers’ and crew’s best chance of survival. His successful water landing saved all 155 people aboard and illustrates how specialized knowledge can reveal hidden possibilities.

Research by University of California at Los Angeles graduate student Bingxuan Li, Purdue’s Pengyi Shi, and Chicago Booth’s Amy Ward suggests that a model they developed, which they dub FLAME, can similarly incorporate domain knowledge such as Sullenberger’s to ultimately produce better predictions.

Feature mining is the process of transforming raw data into meaningful variables that provide insights or enhance predictive power. This process often involves combining, filtering, or transforming existing data to create features, or variables, that capture information not readily apparent in the raw dataset. Such mining can help improve the accuracy and interpretability of machine-learning models.

For example, think about recruiters for professional sports teams. Thanks to their domain knowledge, they can look beyond a college athlete’s game stats or physical attributes and uncover other features—such as adaptability or mental toughness—that may better correlate with the player’s chance of success in the professional arena. They might determine the player’s adaptability by assessing performance changes with a different position or coach. And they might determine the player’s mental toughness by filtering the data and taking a subset of the stats to measure performance after errors.
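To make the combining-and-filtering idea concrete, here is a minimal sketch in Python of that kind of hand-crafted feature mining. The column names, stats, and definitions are hypothetical, invented for illustration rather than drawn from the research:

```python
import pandas as pd

# Hypothetical per-game stats for one college athlete.
games = pd.DataFrame({
    "coach":      ["A", "A", "A", "B", "B", "B"],
    "position":   ["guard", "guard", "forward", "forward", "guard", "guard"],
    "score":      [18, 22, 15, 17, 24, 26],
    "made_error": [False, True, False, True, False, True],
    "score_next": [22, 15, 17, 24, 26, 21],  # scoring in the following game
})

# "Adaptability": how little scoring varies when coach or position changes.
by_context = games.groupby(["coach", "position"])["score"].mean()
adaptability = -by_context.std()  # smaller spread across contexts -> more adaptable

# "Mental toughness": filter to games with an error, measure the rebound after it.
after_error = games[games["made_error"]]
mental_toughness = (after_error["score_next"] - after_error["score"]).mean()

print(f"adaptability={adaptability:.2f}, mental_toughness={mental_toughness:.2f}")
```

Neither feature appears anywhere in the raw table; each is derived by grouping or filtering the data in a way that encodes the recruiter’s domain knowledge.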

FLAME uses large language models to do this mining—which, as Li, Shi, and Ward explain, can be particularly helpful in sensitive domains where data are limited.

For example, the researchers applied FLAME to an Illinois program that offers people convicted of minor offenses community-based rehabilitation instead of jail time. One challenge for the program is that it’s hard to identify who should be offered a spot. Many crucial factors, such as the level of long-term support participants can expect to receive from their social networks, are difficult to collect directly and ethically.

The program does have basic data about participants, however. These include demographics, education, housing status, criminal history, and referral source (as in, who recommended them for the program)—and whether they successfully completed the program, dropped out, or committed new crimes. This is where FLAME’s four-step approach came into play.
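In code terms, each participant’s raw record might look something like the sketch below. The field names are stand-ins for the categories the article lists, not the program’s actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ParticipantRecord:
    # Basic data the program already collects (field names are illustrative).
    age: int
    gender: str
    education_level: str           # e.g., "high school", "some college"
    housing_status: str            # e.g., "stable", "transitional", "unhoused"
    prior_offenses: int
    referral_source: str           # who recommended them for the program
    outcome: Optional[str] = None  # "completed", "dropped_out", or "new_offense"
```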

First, it analyzed how an experienced case manager deduced a client’s challenges and support needs, reasoning that the manager likely relied on factors such as employment status, education, and living situation, along with publicly available contextual information, such as the average income in different zip codes. The framework started with 40 examples of the manager’s deductions or decisions. Then it included some of these examples in a prompt, leading an LLM (GPT-3.5) to generate 3,000 synthetic training cases. After that, it fine-tuned the LLM on the expanded dataset so that the model could capture the complex relationships in the data. Finally, it applied the fine-tuned model to infer hidden features about individuals—such as whether they might benefit from substance use treatment, educational support, or mental health services—in order to improve downstream predictive accuracy.
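Stitched together, the four steps might look roughly like the sketch below. It assumes the OpenAI Python SDK, and the file names, prompts, and example client are placeholders; the article doesn’t publish the researchers’ actual implementation.

```python
import json
from openai import OpenAI

client = OpenAI()

# Steps 1-2: seed a prompt with the ~40 expert examples (file name is a
# placeholder) and ask the LLM to generate synthetic training cases that
# mimic the case manager's reasoning.
seed_examples = open("expert_decisions.txt").read()

with open("synthetic_cases.jsonl", "w") as f:
    for _ in range(3000):
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system",
                 "content": "You emulate an experienced case manager inferring a "
                            "client's support needs from demographics, education, "
                            "housing, and local context such as zip-code income."},
                {"role": "user",
                 "content": f"Examples:\n{seed_examples}\n\nGenerate one new, "
                            "realistic client profile together with the inferred "
                            "support needs, in the same format."},
            ],
        )
        case = resp.choices[0].message.content
        # Write in the chat-format JSONL the fine-tuning endpoint expects.
        f.write(json.dumps({"messages": [
            {"role": "user", "content": "Infer this client's support needs."},
            {"role": "assistant", "content": case},
        ]}) + "\n")

# Step 3: fine-tune GPT-3.5 on the expanded dataset.
training_file = client.files.create(
    file=open("synthetic_cases.jsonl", "rb"), purpose="fine-tune"
)
job = client.fine_tuning.jobs.create(
    training_file=training_file.id, model="gpt-3.5-turbo"
)

# Step 4: once the job succeeds, job.fine_tuned_model names the new model;
# use it to infer hidden features for a new client.
inferred = client.chat.completions.create(
    model=job.fine_tuned_model,
    messages=[{"role": "user",
               "content": "Client: age 27, GED, transitional housing, referred "
                          "by a community court. Infer support needs."}],
)
print(inferred.choices[0].message.content)
```

The inferred support needs then become new columns in the dataset, alongside the variables the program already collects.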

AI recognizes complex patterns

[Infographic: The researchers’ FLAME model uses machine learning to uncover hidden insights in data and ultimately improve predictions. The graphic walks through an example of how the model can identify individuals convicted of minor offenses who would most likely succeed in a rehabilitation program.]

Incorporating the new support-needs variables alongside the model’s existing ones improved the framework’s prediction of a client’s risk of committing a new crime, information that’s integral to determining eligibility for incarceration-diversion programs such as this one. The framework also outperformed traditional multiclass-classification approaches, including logistic regression and neural networks. And the newly created variables were important in predicting whether a client would successfully complete the program.
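The downstream comparison is standard supervised learning: train the same classifier with and without the inferred features and compare held-out accuracy. Here is a hedged sketch with scikit-learn and synthetic stand-in data; the study’s actual features, labels, and baselines differ:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Synthetic stand-ins: 500 clients, 5 observed variables plus 3 LLM-inferred
# support-needs variables (e.g., substance use, education, mental health).
X_base = rng.normal(size=(500, 5))
X_inferred = rng.normal(size=(500, 3))
# Toy risk label that depends on both observed and inferred signals.
y = (X_base[:, 0] + X_inferred[:, 0] + rng.normal(size=500) > 0).astype(int)

def holdout_accuracy(X, y):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return accuracy_score(y_te, model.predict(X_te))

acc_base = holdout_accuracy(X_base, y)
acc_full = holdout_accuracy(np.hstack([X_base, X_inferred]), y)
print(f"baseline: {acc_base:.3f}  with inferred features: {acc_full:.3f}")
```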

Li, Shi, and Ward also applied FLAME to hospital discharge planning, using the Medical Information Mart for Intensive Care dataset, a comprehensive collection of deidentified patient records. By inferring factors not explicitly recorded in medical charts (for example, the availability of someone to take care of the patient at home), the researchers’ framework helped predict whether patients had been discharged to their homes or to another facility, such as skilled nursing. The addition of these factors improved prediction accuracy by an average of nearly 9 percent.

Both case studies demonstrate the framework’s value in high-stakes domains, in which it’s important to be able to explain how a model arrives at its prediction. The researchers conclude that FLAME offers two key advantages over traditional feature-mining methods: it integrates contextual information, and, by emulating human reasoning, it creates more interpretable results.
