Tackling Public-Health Research’s Data-Sharing Problem

A new tool kit from the Center for Applied Artificial Intelligence helps institutions collaborate to leverage data more effectively and improve health care.

By Linda Pantale
June 29, 2021
Big Data

Linda Pantale, ’21, works as a project manager and researcher at Chicago Booth’s Center for Applied Artificial Intelligence (CAAI). Most recently, she helped oversee the center’s Computational Medicine Legal Accelerator along with her colleagues Amy Pitelka, project lead and chief legal officer of Nightingale Open Science, and Stephanie Nguyen, principal designer and researcher focused on data privacy, product design, and policy. Below, Pantale delves deep into how the initiative is helping accelerate machine learning research to improve clinical care and, hopefully, help fight off a future pandemic.

In the early days of the coronavirus pandemic, institutions and policymakers across the globe coalesced around a common goal: get health data into the hands of machine-learning researchers in order to discover more about the virus and to develop treatments and vaccines to fight off the burgeoning public-health disaster.

It was inspiring and even humbling to see these collaborations take shape. Yet, deeply ingrained institutional barriers to speedy and efficient data sharing were also making it difficult for computational medicine researchers to trade information with each other securely, quickly, and at scale.

Health data are highly protected and subject to stringent legal requirements for good reasons, not least of which is maintaining patient privacy. They are also collected and stored in a patchwork of medical centers and must be woven together to reach the minimum scale necessary to apply machine learning. Machine learning can help uncover patterns that may lead to breakthroughs in diagnoses and patient care, and it can make better predictions about treatment efficacy and disease susceptibility.

As COVID-19 began its global spread, networks of researchers offered to share cell-phone-movement data to map possible transmission hotspots of the virus, and university medical centers offered patient and clinical trial data. But getting data from one institution to another requires a legal review, data governance and security assessments, and approval from oversight committees such as the Institutional Review Board. This process can take months or even years.

Booth’s Center for Applied Artificial Intelligence (CAAI) set about figuring out how to improve public-health data sharing. In July 2020, the CAAI launched the Computational Medicine Legal Accelerator (CMLA), a coalition of individuals and organizations across 15 institutions, comprising data governance teams, researchers, computational-medicine startups, and legal offices.

The coalition spent the next year assessing the state of data sharing and diagnosing barriers. Its efforts revealed several key challenges, but also demonstrated how a few straightforward, standardized templates can be used to combat them. Based on the research, the CMLA developed a tool kit of resources intended to shave six months to a year off the review process.

The components of the tool kit—the process map template, legal white paper, template data use agreement (DUA), and data-sharing guide—are not brand-new ideas, but rather familiar enough to be adopted and improved enough to be helpful. They are informed by a deep understanding of the end users: the researchers, lawyers, and support teams focused every day on advancing scientific discovery. The hope is that research institutions, armed with these resources, will be able to give computational medicine experts the data they need to use machine learning to better fight off a future pandemic and to confront a host of other health questions.

Creating Shared Process Maps

Sharing information is complicated when many stakeholders are involved. Researchers, legal teams, data providers, and compliance and security teams (the people involved in the kinds of projects the coalition studied) seldom work in the same office, and individual players vary from project to project. Often, these efforts lack a process to thread the work and findings of all the teams together.

To address this issue, the coalition developed a template that institutions can use to outline the steps of their data-sharing process, state key questions, and share resources. The hope is that putting this information in a standardized document, or shared process map, will help align expectations and bring clarity to people on and off the team about how the different parties operate. It’s an easier, more transparent, and faster way to conceptualize the process, leaving researchers more time to do what they do best: advance science.

The CAAI offers this data-sharing-process template free to all members of the coalition and the broader university research community. As other universities use the templates and set expectations for data sharing, the CAAI will house the maps in an open-access library.

Streamlining and Simplifying the Legalese

Technology goes hand in hand with a complex legal landscape. Particularly with health data, there is little consensus on how to assess and address new risks, and staying up to date on the legal precedent and regulatory requirements for working with such data is a daunting task. The coalition’s assessment uncovered frequent incentive misalignment between researchers (who see the benefits of using clinical data) and university leadership and legal teams (who are primed to see the risk). Institutions tend toward a conservative approach to data sharing, which is a great tactic for managing a reputation but less ideal for advancing scientific research.

To give institutions a shared understanding of the challenges involved in health-data sharing, the coalition worked with the law firm Ropes & Gray to produce a white paper on the current legal landscape of computational medicine. This is a stepping-stone toward a professional consensus on the risks and opportunities of sharing health data.

The coalition also created a templated Data Use Agreement (DUA) that lays out, in a user-friendly format, the legal requirements for sharing data. DUAs are standard practice but can also drain resources. All parties can benefit from a document that is simple and standardized yet customizable. The white paper and DUA template should help institutions focus more on advancing research and spend less time lawyering agreements.

Information Tools in Action

Research in computational medicine is complex, sitting at the intersection of technology, academia, and human welfare, and getting the right information to the right people at the right time remains a problem. Increased education and training may sound like an obvious answer to this problem, but reference documents and webinars just aren’t sufficient.

With this in mind, the coalition sought to answer the question: What information is most important to know as soon as possible? It assumed that when institutions come together to share data, information is fragmented and research teams are distributed. The downloadable template it designed centralizes the most critical information and provides a framework for sharing it with all stakeholders.

The coalition drew on a model laid out by the surgeon, writer, and public-health leader Atul Gawande in his 2009 book, The Checklist Manifesto. Gawande advocates using checklists in medicine and management, arguing that they are transformative in building a culture of transparency and continuous improvement. What started out as a one-page guide to data sharing quickly turned into a seven-page interactive guide with checklists, contact list templates, and a researcher cover sheet.

The coalition hopes that, by working together with a shared set of tools and a vision of a better process, the members of the American research ecosystem will be better equipped to collaborate and address many challenges in today’s health-care landscape.

Center for Applied Artificial Intelligence

This center supports researchers from across Booth and UChicago in making revolutionary advances in the applications of AI. Their work touches fields as diverse as finance, healthcare, public policy, education, and behavioral science.