BlueData, Intel Compare Bare-Metal & Containers for Big Data Workloads

Has your business considered running Hadoop in Docker containers rather than on bare-metal? BlueData and Intel collaborated to tackle this very issue in a benchmark study of the performance for Big Data workloads.

So, can you enjoy the performance of a bare-metal environment while also taking advantage of the flexibility of Docker containers for your Big Data needs? According to the study, which evaluated performance for Big Data workloads in a bare-metal environment versus a container-based environment, you can do just that.

Results showed that, with the BlueData EPIC software platform, your business can run Hadoop in Docker containers while still maintaining the security and performance of a bare-metal environment.

In fact, the study showed that performance ratios for container-based Hadoop workloads on BlueData EPIC are equal to, and in some cases better than, bare-metal Hadoop. Take this stat, for example: the BlueData EPIC platform demonstrated an average of 2.33 percent higher performance than bare-metal for a configuration with 50 Hadoop compute nodes and 10 terabytes of data.

The recent benchmark study is documented in a new white paper, “Bare-metal performance for Big Data workloads on Docker Containers,” that provides an in-depth report of the challenges, test environments, performance metrics, detailed results, best practices and more.

So, how is the BlueData EPIC software platform able to deliver bare-metal performance with Docker containers? With BlueData, the container-based clusters “look and feel like standard physical clusters in a bare-metal deployment.” But the platform is specifically tailored to the performance needs for Big Data workloads such as Hadoop and Spark. This includes technology that boosts the input/output performance and scalability of container-based clusters, using data caching and other innovations.

To ensure a fair comparison, the Intel team measured benchmark execution times in a bare-metal environment and in a container-based environment using BlueData EPIC, with both running on identical hardware. The team used the BigBench benchmark kit, an industry-standard benchmark for measuring the performance of Hadoop-based Big Data systems.
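
As a rough illustration of how such a comparison can be turned into a single percentage, the sketch below computes a per-query performance gain from paired execution times and then averages it. This is not the white paper's method: the formula (bare-metal time divided by container time), the function names, the query labels and the timings are all assumptions made up for the example.

```python
from statistics import mean


def relative_gain_percent(bare_metal_seconds, container_seconds):
    """Percent by which the container run outperforms bare-metal for one query.

    Assumed definition (not taken from the study): the ratio of bare-metal
    execution time to container execution time, minus one. A positive value
    means the container run finished faster.
    """
    if bare_metal_seconds <= 0 or container_seconds <= 0:
        raise ValueError("execution times must be positive")
    return (bare_metal_seconds / container_seconds - 1.0) * 100.0


def average_gain_percent(timings):
    """Average the per-query gain over a mapping of
    query name -> (bare_metal_seconds, container_seconds)."""
    return mean(relative_gain_percent(b, c) for b, c in timings.values())


# Hypothetical timings in seconds; these are NOT results from the study.
sample = {
    "q01": (100.0, 100.0),  # identical times, no gain
    "q02": (210.0, 200.0),  # container run is faster
    "q03": (95.0, 100.0),   # container run is slower
}

if __name__ == "__main__":
    for name, (bare, cont) in sample.items():
        print(f"{name}: {relative_gain_percent(bare, cont):+.2f}%")
    print(f"average: {average_gain_percent(sample):+.2f}%")
```

Averaging per-query ratios is only one possible choice; comparing total elapsed time across all queries would weight long-running queries more heavily and can give a different figure.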

The study also found that the BlueData EPIC software platform helps to solve challenges that have slowed or stalled on-premises Big Data deployments in the past. Data science teams can create on-demand Hadoop and Spark clusters without having to submit requests for scarce IT resources or wait for an environment to be set up for them. Further, the technology allows multiple business units and user groups to share the same physical cluster resources. This helps businesses avoid the complexity and cost of each group needing its own dedicated Big Data infrastructure. For these reasons, BlueData EPIC software, running on Intel architecture, is becoming a popular solution stack choice for many Big Data initiatives.

In the complete white paper, BlueData and Intel explore the following:

  • Big Data challenges
  • Benchmark test environments
  • Performance metrics
  • Benchmark data model
  • Study results
  • Deployment considerations and guidance

To learn more about performance benchmarking for running Big Data workloads on Docker containers, download the full white paper.