Nightingale Open Science

Advancing computational medicine through de-identified data and outcome-based labels.

Our Data

Our datasets offer comprehensive, structured data designed for machine learning, trend analysis, and hypothesis testing, so researchers and developers can train models and uncover insights with confidence. Explore the data and start building today.

Our datasets are curated around medical mysteries—heart attack, cancer metastasis, cardiac arrest, bone aging, Covid-19—where machine learning can be transformative. We designed these datasets with four key principles in mind:

  1. The core of each dataset is a large collection of medical images: x-rays, ECG waveforms, digital pathology (and more to come). These rich, high-dimensional signals are too complex for humans to fully see or process—so machine vision can add huge value.
  2. Each image is linked to at least one ground truth outcome: data on what happened to the patient, not a doctor’s interpretation of the image. This allows researchers to build algorithms that learn from nature—not from humans.
  3. The data are diverse: we work with health systems across the US and around the world, including under-resourced ones whose data aren't usually represented in machine learning. This lets the resulting algorithms speak to the needs of diverse populations.
  4. Access is secure and ethical: all data are completely de-identified, and as an extra precaution, downloads are not allowed. Only non-commercial use is permitted, so the knowledge generated from the data benefits everyone.