Why AI Can't Yet Perform in the Operating Room

Digital image of operating room with AI equipment

Practicing surgeons and Booth researchers jointly investigate the limits and future of surgical AI.

Have you gotten your annual check-up this year? If so, you may have scheduled it online with an AI assistant, had the door opened by a mobile robot transporting supplies across the clinic, or been asked by your primary care physician whether you’re comfortable with AI taking visit notes. AI now has a presence in some of our most intimate spaces and interactions.

Claims (and concerns) that AI will soon be taking over as medical practitioners or evaluating test results are buzzing around the internet. Yet, one lingering exception to this total permeation of medical AI—a space already situated apart from most other exam rooms—is the operating room. This absence was glaringly obvious to Dr. Daniel A. Donoho, Pediatric Neurosurgeon at Children’s National Hospital and founder of the Surgical Data Science Collective (SDSC).

That absence prompted the joint effort behind a new study that challenged state-of-the-art AI methods to detect surgical tools in medical images and videos. Researchers from the University of Chicago Booth School of Business led a study in tandem with medical practitioners from the SDSC and the Children’s National Hospital—both based in Washington D.C.—to analyze a large dataset of video imagery from neurosurgical procedures.

Led by X.Y. Han, Assistant Professor of Operations Management, the Booth research team included Center for Applied Artificial Intelligence Predoctoral Researchers Kirill Skobelev, Eric Fithian, and Yegor Baranovski. 

Against popular convention, when it comes to medical applications of AI, more data by itself does not necessarily make for more accurate models. Quite the opposite, in fact. Massive amounts of generalized data weaken the ability of surgical AI tools to do things as simple as recognizing tools and equipment in images of surgical procedures.

The notion that a giant model—think ChatGPT or Claude—is ready out-of-the-box to be slotted into any given situation is called the ‘foundation model paradigm.’ After all, these models have been fed the whole kitchen sink of available data and information about all spheres of reality…shouldn’t they be able to tackle something as simple as locating objects in an image?

“Right now, the prevailing notion is that if you have a big enough model you’ll be able to do anything. In reality, and what we’ve found through this study with SDSC, you get much better results by using much smaller, modular AI,” said X.Y. Han. “Instead of an all-encompassing foundation model that has the entire internet pushed into it, you train a small model on a chain of tasks.”

Rather than equipping and entrusting a single surgical AI assistant with the entire spectrum of tasks and phases of surgery, multiple specialized AIs may be delegated hyper-specific tasks in a structured, ordered manner, which could then feed back to support the human medical team.
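The modular, chain-of-tasks idea described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the study's actual system: each stage stands in for a small, specialized model with one narrow job, and the stages run in a fixed order, each consuming the previous stage's output. All names and stages here are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Frame:
    """One video frame plus the annotations each stage adds to it."""
    pixels: object
    annotations: dict = field(default_factory=dict)


def detect_tools(frame: Frame) -> Frame:
    # Stand-in for a small, specialized detector (e.g. a compact object-
    # detection model trained only on surgical instruments).
    frame.annotations["tools"] = ["placeholder_tool"]
    return frame


def classify_phase(frame: Frame) -> Frame:
    # A second small model could infer the surgical phase, here informed
    # by the tools the previous stage found.
    frame.annotations["phase"] = "placeholder_phase"
    return frame


def run_pipeline(frame: Frame, stages: List[Callable[[Frame], Frame]]) -> Frame:
    # Each specialized stage runs in order and enriches the frame.
    for stage in stages:
        frame = stage(frame)
    return frame


annotated = run_pipeline(Frame(pixels=None), [detect_tools, classify_phase])
```

The design choice this sketch highlights is that each stage can be trained, validated, and replaced independently, rather than betting everything on one monolithic model.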

The caveat? Smaller models require specially curated data, which is not only expensive but difficult to capture and risky for surgeons to share. As it is, surgeons and medical staff are chronically overextended, making video capture a low priority in the OR. And while the 66 videos used in the study complied with HIPAA—all showing procedures nearly impossible to trace back to the patients—it’s a tough sell to get surgeons to put their sometimes imperfect work on display, let alone for the purpose of training widely used AI tools.

AI Stat—On Ethical Integration in the Operating Room

What’s unique about this approach—bringing together Booth researchers and practicing surgeons from SDSC—is having a direct line to the people intimately familiar with the trials and tribulations of the operating room. Through SDSC, surgeons and machine learning researchers are linked to devise AI-powered automations to reduce friction during surgery. Their vision? Surgeons could hand over HIPAA-safe videos of their procedures to an AI that timestamps specific parts of the procedure, allowing them to benchmark against other surgeons in the field, mark moments for improvement, or compare techniques.

For surgeons grappling with the exponential rise of AI in their field, the central tension has less to do with automation than with access. Millions of necessary surgeries go unperformed every year, not because solutions are missing, but because the expertise required to perform them safely is concentrated in a remarkably small number of people, poorly documented, and with a high bar to entry. That is a systemic failure with a very human cost. Add burnout to the scarcity issue and the imbalance between patient need and available care is striking.  

As is true of many fields, it’s worth wondering whether surgeons really want AI in their OR as a solution. Training smaller, specialized models will require an upfront investment of time and cooperation from surgical specialists to properly review and correct AI. Without the buy-in of the surgical community, or an intermediary like SDSC to translate surgical terms and practices to ML and vice versa, the transition will have friction.

The Irreplaceability of Human Touch, Discernment

The good news? The hyperspecialized, intuition-guided touch of a human with years of experience handling nuanced medical cases still vastly outperforms current AI models. Even non-experts—able to spot surgical tools with near-perfect accuracy—grossly outperform AI.

“AI should not replace the surgeon’s judgment; it should make that judgment more informed, more consistent, and less burdened by administrative overhead,” said Neeraj Mainkar, Chief Technology Officer at SDSC. “If AI can help with planning, documentation, workflow tracking, and retrospective analysis, the surgeon’s role becomes even more focused on what humans do best: contextual judgment, ethical responsibility, adapting to the unexpected, and communicating with patients and teams.”

Surgeons and medical staff in operating room with tools

“In surgery, the hard part is not just recognizing patterns. It is deciding what matters for a particular patient, in a particular moment, under uncertainty. I expect AI to become a cognitive support layer around the procedure, but the surgeon remains the accountable decision-maker. In that sense, the role evolves from being both operator and information processor toward being an operator, integrator, and leader of a human-AI system.”

— Neeraj Mainkar, Chief Technology Officer at SDSC

Perhaps surgery is more of an art than a science after all. Perhaps subtlety, a flair for the unconventional, or the quite human ability to make a ‘gut call’ are crucial skills in the OR.

The real question about the future of surgical AI is more philosophical in nature. Hyper-trained, specialized models may flatten surgical techniques into a single standardized template, eliminating flexibility around what it means to perform a “correct” procedure. Every patient exists in a context within which surgical tactics must be situated.

“The fundamental question about 'correctness' of surgery presupposes a match between the correct intervention or method and the correct patient,” said Dr. Donoho. “Diverse or unusual techniques will always have a role. In fact, as surgery might become more individualized, it might actually become less homogenous.”

AI may yet have a future in the surgical world. Critically, the study indicated that specialized models with fewer parameters—the internal variables that shape and produce model outputs—have higher overall success rates for equipment identification. The specialty model YOLOv12-m, with only 26 million parameters, outperformed all of the multi-billion-parameter vision language models (VLMs) used in the study. Maybe surgical AI just needs a bit—or a lot—of expert tutoring.

For the team at Booth, there may be a future in which generalist VLMs play the role of a project manager, delegating more niche tasks like object identification to smaller, specialized models. Where and when surgeons, nurses, and other medical staff can slip into the VLM’s ‘workflow’ and override its orders is a question for future research.
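The "VLM as project manager" idea above can be sketched as a simple router with a human override hook. This is a hypothetical illustration under stated assumptions: the generalist's routing decision is mocked with a keyword check, the two specialists are placeholder functions, and all names are invented for this example, not drawn from the study.

```python
from typing import Callable, Dict, Optional

# Registry of small, specialized models (placeholders for this sketch).
SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "detect_tools": lambda clip: f"tools detected in {clip}",
    "timestamp_phases": lambda clip: f"phases timestamped in {clip}",
}


def generalist_route(task: str) -> str:
    # Stand-in for a generalist VLM deciding which specialist fits the task.
    # A real system would use the VLM's reasoning, not a keyword check.
    return "detect_tools" if "tool" in task else "timestamp_phases"


def run_task(task: str, clip: str, human_override: Optional[str] = None) -> str:
    # A clinician can step into the workflow and override the routing,
    # echoing the open question of where humans intervene.
    choice = human_override or generalist_route(task)
    return SPECIALISTS[choice](clip)
```

A usage sketch: `run_task("find each tool", "clip01")` routes to the tool detector, while passing `human_override="timestamp_phases"` forces the other specialist regardless of what the generalist would choose.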

“The future state [of surgical AI], which is coming quickly, is that these systems are explicitly given some degree of autonomy,” Donoho said. “When they do so, their owners and manufacturers will each have different degrees of responsibility for their actions.”

Frictions and Future Research

Between high costs and sensitive data, the question of who incurs the cost to train the specialized, smaller models lingers unanswered. 

“We’ve had these AI tools for decades now, but incentives need to be aligned to take meaningful steps forward,” said X.Y. Han. “And we’ll need to act quickly because other global leaders with fewer restrictions around medical data are making great strides.” Nonprofit organizations like SDSC are driven by a collective of curious surgeons looking to the horizon of what is possible as new technologies emerge and evolve. But with public interest as the chief motivator, the monetary incentives to continue this research are missing.

The integration of AI will necessarily look different across hospitals and clinics with varying needs and levels of technical capacity to integrate new technologies. “Is AI the first thing that under-resourced hospitals need? Or do they need better cameras, better tools, better microscopes, endoscopes, and more clinicians?” asks Margaux Masson-Forsythe, Director of AI & Data Science at SDSC. “A single model like AI by itself won't solve multifaceted problems.” For Masson-Forsythe, ongoing consultation with frontline medical teams is essential. “We need to start by integrating [underprivileged medical teams] early on in the discussions, make sure they have the equipment they need, advise them, and partner them with better-resourced hospitals, which is what we do at SDSC.”

Organized support from a coalition of hospitals dedicated to exploring the future of AI and surgery may be the solution. This, plus buy-in from individual surgeons—and approval from their patients—to share the data from their procedures, will be needed as long as strong medical data protections like HIPAA remain in place in the U.S. 

“It’s interesting to be doing this kind of research at a business school. Medical care is a huge industry with lots of money going into care and, now, AI,” said Kirill Skobelev. “Future research could build on this study to look closely at the economics behind the widescale rollout of AI and what that will actually cost the medical industry.” 

Dr. Donoho remains optimistic. “I absolutely support AI being given the permission to intervene or provide suggestions to humans during task performance - provided those modes of interaction are carefully studied, proven to be beneficial, and monitored for their actual effects,” he said.

With a healthy dose of caution and a reminder of the need for continued oversight: “It's the last part - the monitoring of deployed systems - that is most critical and most different about AI systems. These systems will have novel and potentially unexpected modes of success and failure. We must learn from them, and teach them.”


This research was supported by the Center for Applied Artificial Intelligence and the Tolan Center for Healthcare, which provided administrative support for the acquisition of the data used by the research team. The Tolan Center aims to integrate business and medical points of view on the complex challenges facing the industry, in order to advance rigorous inquiry that will have a positive impact on the healthcare system. Explore more research at the intersection of Applied AI and healthcare.
