Distributed System Architectures for Healthcare and Life Sciences

The insideAI News Guide to Healthcare & Life Sciences is a useful new resource directed toward enterprise thought leaders who wish to gain strategic insights into this exciting new area of technology. This guide is a useful new resource directed toward enterprise thought leaders who wish to gain strategic insights into this exciting new area of technology. The guide provides an overview of the utilization of big data technologies as an emerging discipline in healthcare and life sciences. It explores the characteristics of this business strategy and the benefits of leveraging big data technologies within these sectors. It also touches on the challenges and future directions of big data and analytics in the healthcare and life sciences industries. The complete insideAI News Guide to Healthcare & Life Sciences is available for download from the insideAI News White Paper Library.

insidebigdata_guide_healthcare_lifesciences_dell_emcDistributed System Architectures

Many organizations in the healthcare and life sciences industries are just now exploring the opportunities surrounding the dominant entries in the  distributed processing architectures, Hadoop and Spark, and are looking for ways for getting started. Even though Hadoop has been in the market  since 2011, it is still a relatively new technology in these sectors, while oftentimes Spark is for phase 2, for different use cases and for the advanced  user.

Difficult challenges and choices face today’s healthcare industry—researchers, clinicians and administrators have to make important decisions—often  without sufficient data. Distributed systems like Hadoop and Spark offer open source platforms to make healthcare data available and actionable —researchers explore the genetic architecture of cancer cells; nurses and physicians monitor intensive care patients; administrators submit  reimbursement claims before patients leave the hospital. Distributed computing systems are transforming healthcare.

Healthcare providers can get more valuable insights, manage costs, and provide better care options to patients by using data analytics. Big data technologies are enabling providers to store, analyze, and correlate various data sources to extrapolate knowledge. Benefits include efficient clinical decision support, lower administrative costs, faster fraud detection, and streamlined data exchange formats. It is projected that adoption of health data analytics will increase to almost 50 percent by 2017 from 10 percent in 2011, representing a 37.9 percent compound annual growth rate.

Hadoop is a strong example of a technology that allows healthcare to store data in its native form. If Hadoop didn’t exist, decisions would have to be made about what can be incorporated into the data warehouse or the electronic medical record (and what cannot). Now everything can be brought  into Hadoop, regardless of data format or speed of ingest. If a new data source is found, it can be stored immediately. No data is left behind. By the end of 2017, the number of health records of millions of people is likely to increase into tens of billions. Thus, the computing technology and  infrastructure must be able to render a cost efficient implementation of:

  1. parallel data processing that is unconstrained
  2. provide storage for billions and trillions of unstructured data sets
  3. fault tolerance along with high availability of the system

Hadoop technology is successful in meeting the above challenges faced by the healthcare industry as the MapReduce engine and Hadoop Distributed File System (HDFS) have the capability to process thousands of terabytes of data. Hadoop makes use of highly optimized, yet inexpensive commodity hardware making it a budget friendly investment for the healthcare industry.

Hadoop Use Cases

In conventional IT environments, clinical, operational and financial data are managed in data silos. Meanwhile, with the movement from paper-based to electronic health records, and with the increase in usage of machines and medical devices that produce steady streams of data, the volume of data  that healthcare institutions capture and analyze has skyrocketed, while the variety of that data has grown.

The Hadoop platform allows healthcare organizations to process and manage an ever larger influx of data in a secure and cost-effective manner to  improve quality and affordability. They can leverage the platform to bring together large volumes of detailed data from diverse sources, in a variety of  formats, and consolidate it into a single flexible and scalable system for long-term storage and analysis.

  • Cancer treatments and genomics – There has been an uptake in adopting Hadoop in the life sciences community, mostly targeting   next-generation sequencing, and simple read mapping because what developers discovered was that a number of bioinformatics problems  transferred very well to Hadoop, especially at scale.
  • Monitoring patient vitals – There are several hospitals across the world that use Hadoop to help the hospital staff work efficiently with big data.  Without Hadoop, most patient care systems could not even imagine working with unstructured data for analysis.
  • Hospital networks – Hadoop technology is used to help medical experts analyze high velocity data in real time from diverse sources such as  financial data, payroll data, and EHRs.
  • Healthcare intelligence – Hadoop technology is used to cultivate healthcare intelligence applications that assist hospitals, payers and healthcare  agencies increase their competitive advantages by devising smart business solutions.
  • Fraud detection and prevention – Using Hadoop technology, insurance companies have been successful in developing predictive models to  identify fraud by making use of real-time and historical data of medical claims, weather data, wages, voice recordings, demographics, cost of  attorneys and call center notes. Hadoop’s capability to store large unstructured data sets in NoSQL databases and using MapReduce to analyze this data helps in the analysis and detection of patterns in the field of fraud detection.

Spark Use Cases

  • Precision medicine – The promise of precision medicine is a far-reaching goal that will require sweeping changes to the ways physicians treat  patients, health data is collected, and global collaborative research is performed. Precision medicine typically describes an approach for treating and preventing disease that takes into account a patient’s individual variation in genes, lifestyle, and environment. Achieving this mission relies  on the intersection of several technology innovations and a major restructuring of health data to focus on the genetic makeup of an individual.  The healthcare ecosystem has chosen a variety of tools and techniques for working with big data, but one tool that comes up again and again in many of the new architectures is Spark. Spark is already known for being a major player in big data analysis, but it is additionally uniquely  capable in advancing genomics algorithms given the complex nature of genomics research.
  • Genomics algorithms – The transitioning of today’s popular genomics algorithms to Spark is one path that researchers are taking to take  advantage of the distributed processing capabilities of the cloud. Many of these are already being built on top of Spark. Although Spark provides many infrastructure advantages, Spark still speaks the languages that are popular with the research community. Languages like SparkR make  for an easy transition into the cloud.
  • Computational neuroscience – One example of a research project taking advantage of Spark is the Howard Hughes Brain Institute. The project’s  goal is to understand brain function by monitoring and interpreting the activity of large networks of neurons during behavior. An hour of brain  imaging for a mouse can yield 50-100 gigabytes of data. The researchers developed a library of analytical tools called Thunder which is based on  Spark using the Python API along with existing libraries for scientific computing and visualization. The core of Thunder is expressing different  neuroscience analyses in the language of RDD operations. Many computations such as summary statistics, regression and clustering can be  parallelized using MapReduce.

If you prefer, the complete insideAI News Guide to Healthcare & Life Sciences is available for download in PDF from the insideAI News White Paper Library, courtesy of Dell and Intel.