Partnering to Advance the Future of AI in Health

CAAI and the UCSF-UC Berkeley Computational Precision Health program to jointly host a diverse, high-quality health data repository.

The remarkable recent advances in machine learning and data science are grounded in large-scale open datasets—ImageNet, Wikipedia, and others—as much as in GPUs and algorithms. But the health sector is starved for high-quality, open medical data: most available datasets come from narrow U.S. populations or specific contexts like intensive care units, and very few measure the real health outcomes that are most important to patient care.

This has many negative consequences. Researchers and students spend years navigating the legal and logistical hurdles required to obtain usable data or, worse, give up before they get access. And AI models trained on one dataset underperform when deployed in new clinical settings or in different populations.

In a major step forward for the field of artificial intelligence in healthcare, the UCSF UC Berkeley joint program in Computational Precision Health (CPH) is now taking a lead role in hosting and managing the Nightingale Open Science dataset—one of the world’s most diverse, open-access medical datasets. The cross-university collaboration, in partnership with the Center for Applied Artificial Intelligence (CAAI) at The University of Chicago’s Booth School of Business, expands access to a key resource to address some of the most persistent challenges in health AI research and deployment.

Nightingale Open Science, co-founded by UC Berkeley faculty member and longtime CAAI collaborator Ziad Obermeyer, is a repository of "groundtruthed" medical datasets (labeled with actual medical outcomes) that span multiple health conditions and diverse populations from around the world. All data are deidentified, preserving patient privacy.

Nightingale offers researchers the ability to develop, test, and refine AI models that improve prediction and care for a range of conditions. With a vast collection of medical images and signals, including X-rays, ECG waveforms, digital pathology, and more, the data repository can help develop machine learning approaches that identify complex patterns or signals that are difficult for humans to see or fully process.

Interdisciplinary Collaboration

Nightingale data will be available to the UCSF UC Berkeley Computational Precision Health program this fall. CPH unites Berkeley’s strengths in computer science, statistics, and public health with UCSF’s expertise in clinical medicine and informatics. CPH is an academic program delivering real-world impact—making it particularly suited to manage and leverage the Nightingale dataset for both educational and translational purposes.

This new phase of Nightingale management marks a distinctive kind of cross-university collaboration. Linking two top-tier California research institutions—UC Berkeley and UCSF—with the expertise of The University of Chicago, the project brings together leaders in data science, medicine, and public health to build infrastructure for the next generation of health AI. It reflects a growing consensus in the scientific community that no single institution can drive meaningful change in health tech alone—collaboration, diversity, and openness are critical.

Integrating Nightingale into Berkeley’s Health AI Ecosystem

The inclusion of Nightingale adds an important tool to CPH’s ongoing research and education efforts. Students and faculty at Berkeley and UCSF will be able to leverage this data resource for teaching, model development, biomarker discovery, and clinical research. Possible examples include:

  • Using electrocardiograms to predict sudden cardiac death and silent heart attacks
  • Predicting knee pain and functionality in diverse patient populations using radiographic imaging
  • Developing and improving approaches to tuberculosis prediction in global health

These types of projects reflect CPH’s commitment to transforming the health data ecosystem, not just through cutting-edge research, but by building platforms and tools that make scalable, ethical, and effective health AI possible.

Situating Nightingale within CPH continues CAAI’s work toward a more open, collaborative, and globally relevant approach to health AI. With real-world clinical outcomes, international scope, and a commitment to transparency and accessibility, Nightingale is helping to transform the health data infrastructure and paving the way for safe and equitable AI integration into clinical care.

Learn more about the Nightingale Open Science dataset.
