A Comparative Study in Surgical AI: Datasets, Foundation Models, and Barriers to Med-AGI

Recent advances in artificial intelligence (AI) have enabled models to match or surpass human experts across a range of biomedical benchmarks. However, surgical applications, which demand multimodal reasoning, real-time human interaction, and physical situational awareness, remain underrepresented in standard medical AI evaluation suites. Given the complexity of surgical environments, broadly capable AI systems could serve as valuable collaborative tools, but their readiness for such roles remains uncertain.

One promising path toward more capable surgical AI is scaling: increasing model size and expanding training datasets. This approach is particularly appealing given the millions of hours of surgical video recorded annually. However, preparing surgical data for AI training requires substantial specialized professional expertise, and the computational cost of training on such data is high. These competing factors leave AI's near-term utility in surgical practice unclear. To investigate this question, the authors conducted a case study on surgical tool detection in neurosurgery using state-of-the-art AI methods as of 2026. Despite employing models with billions of parameters and extensive training regimens, current vision-language models performed poorly on the seemingly straightforward task of identifying surgical instruments in video. Scaling experiments further showed that increasing model size and training duration yielded only diminishing performance gains. These findings suggest that significant obstacles remain before AI systems can be reliably deployed in surgical settings. Critically, some of these limitations persisted across diverse model architectures and could not be resolved simply by applying more compute, raising the question of whether data quality and label availability are the primary bottlenecks or whether deeper architectural challenges are at play.
The study identifies key contributors to these constraints and proposes potential directions for future research aimed at bridging the gap between general AI capability and the demands of real-world surgical practice.


Authored by X. Y. Han, Yegor Baranovski, Eric Fithian, Kirill Skobelev
2026
CAAI - Healthcare