What Is Federated Learning in Health Care? And How Should Health IT Teams Prepare?

Medical research has long been stymied by patient data privacy concerns, which have often prevented researchers from gaining access to larger and more diverse data pools. Researchers have had to contend with regulatory constraints from the Health Insurance Portability and Accountability Act of 1996 (HIPAA) and other data protections, on top of hospitals' own desire to protect patient privacy. These constraints, while necessary, have slowed the pace of innovation, particularly in artificial intelligence.

As AI and machine learning take root in medicine, this paucity of data becomes an even more pressing problem. AI models need access to a lot of data, and a lot of different types of data, to improve their accuracy and sophistication. 

Enter federated learning, which gives researchers the benefit of the robust datasets they need while still maintaining patient privacy. A relatively new approach to machine learning, federated learning lets institutions collaborate on training a shared model in a decentralized way, without exchanging the underlying patient data.

Federated learning is revitalizing medical research and encouraging the adoption of AI models in clinical settings. Previously, models were often deemed too unreliable because the algorithms didn't perform particularly well when researchers tried to generalize them to broader populations. These data-hungry algorithms need data that's representative of different demographics to improve their predictive power and accuracy.

But while federated learning technologies are allowing medical research to progress, many health care institutions are too siloed to collaborate at scale across different networks and systems. A lack of data compatibility could derail researchers' best efforts, despite the passage of the 21st Century Cures Act, which sought to create standardized data protocols.

It's time for the health care industry to prepare for the next stage in the evolution of data exchange and align itself with new data protocols.

How federated learning works

The federated learning approach stands in contrast to traditional machine learning, which collects data from various sources and uploads it to a single server. But having all that data stored in one place creates cybersecurity vulnerabilities along with potential violations of patient privacy.

Traditionally, data consolidation has required a de-identification process to ensure data cannot be associated with any given individual; all identifying factors (name, address, etc.) are removed from the patient data. However, sharing data among many institutions increases the risk of re-identification, whereby anonymous information can be matched with publicly available data to determine who a certain data point, or patient, really is.
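To make the re-identification risk concrete, here is a minimal sketch of a linkage attack. Everything in it is an illustrative assumption: the records are made up, and the choice of ZIP code, birth year, and sex as the matching fields is just one plausible set of quasi-identifiers, not something taken from a real dataset.

```python
from collections import defaultdict

# Hypothetical "de-identified" research records: names removed,
# but quasi-identifiers (zip, birth year, sex) left in place.
deidentified = [
    {"zip": "02139", "birth_year": 1961, "sex": "F", "diagnosis": "type 2 diabetes"},
    {"zip": "02139", "birth_year": 1984, "sex": "M", "diagnosis": "asthma"},
    {"zip": "02144", "birth_year": 1984, "sex": "M", "diagnosis": "hypertension"},
]

# Hypothetical publicly available list that includes names.
public = [
    {"name": "A. Example", "zip": "02139", "birth_year": 1961, "sex": "F"},
    {"name": "B. Example", "zip": "02144", "birth_year": 1984, "sex": "M"},
    {"name": "C. Example", "zip": "02144", "birth_year": 1984, "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def link(deid_records, public_records, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs where the quasi-identifiers match exactly one person."""
    index = defaultdict(list)
    for person in public_records:
        index[tuple(person[k] for k in keys)].append(person["name"])

    matches = []
    for record in deid_records:
        names = index.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # a unique match means the record is re-identified
            matches.append((names[0], record["diagnosis"]))
    return matches


reidentified = link(deidentified, public)
print(reidentified)
```

In this toy data only the first record is re-identified: the second has no public match, and the third matches two people and so stays ambiguous. The more institutions that share and combine records, the more often a combination of fields becomes unique.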

So instead of researchers handling the data directly, federated learning sends algorithms out to learn from the data where it resides, on decentralized servers, which means the data never actually leaves the hospital or research institute. The data remains protected behind the participating institution's firewalls; the algorithms simply travel to the data to learn from it, and only what the model has learned, in the form of updated model parameters, is sent back to be combined. This leads to more sophisticated AI models.
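The idea of the algorithm traveling to the data can be sketched in a few lines. The example below uses federated averaging, one common aggregation scheme; the article does not name a specific algorithm, so treat that choice as an assumption. The three "sites", their synthetic data, and the function names are all hypothetical.

```python
import random


def local_update(weights, data, lr=0.2, epochs=20):
    """Train a tiny linear model y = w*x + b on one site's private data.

    Only the updated weights and the sample count are returned;
    the raw (x, y) records never leave this function.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b), n


def federated_average(updates):
    """Combine site models, weighting each by how many samples it trained on."""
    total = sum(n for _, n in updates)
    w = sum(weights[0] * n for weights, n in updates) / total
    b = sum(weights[1] * n for weights, n in updates) / total
    return (w, b)


def make_site_data(n, seed):
    """Synthetic stand-in for one hospital's data: y = 2x + 1 plus noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0, 1)
        data.append((x, 2 * x + 1 + rng.gauss(0, 0.05)))
    return data


# Three hypothetical sites of different sizes; each keeps its own data.
sites = [make_site_data(50, 1), make_site_data(80, 2), make_site_data(30, 3)]

global_weights = (0.0, 0.0)
for _ in range(50):  # communication rounds
    updates = [local_update(global_weights, site) for site in sites]
    global_weights = federated_average(updates)

print(global_weights)
```

Each round, the coordinator sees only model parameters and sample counts. Production systems typically add further protections on top of this basic loop, such as secure aggregation or differential privacy, which this sketch leaves out.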

Federated learning allows medical researchers to train AI models on larger, more diverse, and more representative datasets that can also be subject to stronger governance. The more diverse the data available, the better researchers can understand the impacts of various disorders and experimental treatments on different populations. Additionally, AI algorithms developed on more diverse data become more generalizable, and the risk of carrying over any bias that may be present in more limited data is reduced.

The need for interoperability 

But there’s a significant hurdle that medical institutions need to overcome in order to truly reap the benefits of federated learning. Medical institutions need to be able to exchange data across systems and interpret the data in a shared way. This means there needs to be a significant degree of data standardization, with data in a common format that allows for collaborative research and analytics. 

Because health care providers collect information for a multitude of reasons—insurance claims, clinical studies, patient care—the data is often stored in different formats and information models, sometimes even within the same institution. Consider, then, how different the information models could be for a health care provider in the U.S. versus a medical research center in, say, Singapore. 
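As a rough illustration of what harmonization involves, the sketch below maps records from two hypothetical sites into one shared shape. The field names, the unit conventions, and the target schema are all invented for the example; they are not drawn from any real standard or from either kind of institution mentioned above.

```python
LB_TO_KG = 0.45359237  # exact definition of the international pound in kilograms


def from_site_a(record):
    """Hypothetical Site A: weight in pounds, sex coded as 'M'/'F', ID under 'mrn'."""
    return {
        "patient_id": record["mrn"],
        "sex": {"M": "male", "F": "female"}[record["sex"]],
        "weight_kg": round(record["weight_lb"] * LB_TO_KG, 1),
    }


def from_site_b(record):
    """Hypothetical Site B: weight already in kilograms, sex spelled out, ID under 'id'."""
    return {
        "patient_id": record["id"],
        "sex": record["gender"].lower(),
        "weight_kg": round(record["weight_kg"], 1),
    }


site_a_record = {"mrn": "A-1", "sex": "F", "weight_lb": 150}
site_b_record = {"id": "B-7", "gender": "Male", "weight_kg": 81.2}

harmonized = [from_site_a(site_a_record), from_site_b(site_b_record)]
for row in harmonized:
    print(row)
```

Once both sites emit the same keys, codes, and units, a model trained at one site can interpret data at the other in the same way. Real mappings are far larger and also have to reconcile diagnosis codes, lab units, and missing fields.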

That leads to the need for a common standardized data model, such as the one defined in the United States Core Data for Interoperability (USCDI) by the Office of the National Coordinator for Health Information Technology (ONC). The USCDI establishes a standardized set of health data classes and elements to enable health information to be exchanged nationwide.

Yet the USCDI is limited to the U.S., so it doesn't solve issues in data exchange with and among foreign institutions. For international collaboration, the researchers working with the data should be responsible for determining a harmonization process. In addition, adopting common data models requires a significant investment of time and resources by health care institutions, which is why adoption to date has been partial at best.

Data governance teams at health care institutions are already tasked with ensuring the availability of accurate, high-quality, consistent, and compliant data. With federated learning, that data may be involved in collaborations more frequently, which imposes additional requirements to ensure that it is of high quality. This requires strong data governance.

Ultimately, by protecting patient privacy while also improving access to larger and more diverse data pools, federated learning enables large-scale collaborative research, more robust clinical models with better generalizability, and—most important—better health outcomes. 

About the Author

Ittai Dayan is the co-founder and CEO of Rhino Health. His background is in developing artificial intelligence and diagnostics, as well as clinical medicine and research. He is a former core member of BCG's healthcare practice and a former hospital executive. He is currently focused on contributing to the development of safe, equitable, and impactful AI in the healthcare and life sciences industries. At Rhino Health, the team is using distributed compute and federated learning as a means of maintaining patient privacy and fostering collaboration across the fragmented healthcare landscape. He served in the IDF special forces and led the largest academic-medical-center-based translational AI center in the world. He is an expert in AI development and commercialization, and a long-distance runner.
