BlueData Offers New Turnkey Solution for Fast Data with Spark, Kafka, and Cassandra

BlueData, provider of a leading infrastructure software platform for Big Data and Big Memory applications, announced a new solution for building real-time data pipelines with Spark Streaming, Kafka, and Cassandra. This new turnkey offering is designed for organizations that want to develop and test applications for analyzing “Fast Data”: real-time or near-real-time data that requires instant awareness, faster decision-making, and immediate action.

Fast Data use cases are emerging in almost every industry, ranging from fraud detection in financial transactions, to Internet of Things (IoT) monitoring of sensor-generated data, to campaign optimization and real-time bidding in advertising technology. Real-time analysis of these new high-velocity data streams (from financial markets, sensors, machine logs, social media, mobile applications, and other sources) can bring tremendous value, whether by delivering competitive business advantage, averting potential crises, or creating new revenue opportunities. But this data is perishable and may lose its operational value within a very short time. Speed is of the essence.

For data scientists and developers working with real-time pipelines, the stack of Spark-Kafka-Cassandra has quickly emerged as the best place to start. This new trinity of open source systems delivers on key requirements for Fast Data:

  • Spark: a fast in-memory data processing engine and the fastest-growing Apache open source technology. Spark Streaming is an extension of the core Spark API; it allows integration of real-time data from disparate event streams.
  • Kafka: a messaging system to capture and publish streams of data. With Spark, you can ingest data from Kafka, filter that stream down to a smaller data set, augment the data, and then push the refined data set to a persistent data store.
  • Cassandra: a scalable and resilient operational database. The refined data needs to be written to a store like Cassandra for persistence, easy application development, and real-time analytics.
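The ingest, filter, augment, and persist flow described above can be illustrated with a minimal, self-contained sketch. It uses only plain Python and in-memory stand-ins. The `Event` shape, the threshold filter, the sensor-to-site lookup, and the `InMemoryStore` class are all hypothetical and exist only for illustration. A real deployment would use Spark Streaming's Kafka integration and a Cassandra connector instead, and neither of those APIs is shown here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List


# Hypothetical event shape; real Kafka messages would arrive as bytes to deserialize.
@dataclass(frozen=True)
class Event:
    sensor_id: str
    reading: float


def ingest(messages: Iterable[Event]) -> Iterator[Event]:
    # Stand-in for a Kafka consumer: yields events in arrival order.
    yield from messages


def filter_events(events: Iterable[Event], threshold: float) -> Iterator[Event]:
    # Filter step: keep only readings above the threshold, shrinking the stream.
    return (e for e in events if e.reading > threshold)


def augment(events: Iterable[Event], locations: Dict[str, str]) -> Iterator[dict]:
    # Augment step: enrich each event with reference data (a hypothetical site lookup).
    for e in events:
        yield {
            "sensor_id": e.sensor_id,
            "reading": e.reading,
            "site": locations.get(e.sensor_id, "unknown"),
        }


class InMemoryStore:
    # Stand-in for a Cassandra table partitioned by sensor_id.
    def __init__(self) -> None:
        self.rows: Dict[str, List[dict]] = {}

    def write(self, row: dict) -> None:
        self.rows.setdefault(row["sensor_id"], []).append(row)


def run_pipeline(messages: Iterable[Event], threshold: float,
                 locations: Dict[str, str], store: InMemoryStore) -> int:
    # Wire the stages together lazily and persist each refined row; return rows written.
    count = 0
    for row in augment(filter_events(ingest(messages), threshold), locations):
        store.write(row)
        count += 1
    return count


if __name__ == "__main__":
    store = InMemoryStore()
    msgs = [Event("s1", 12.5), Event("s2", 3.0), Event("s1", 40.1)]
    written = run_pipeline(msgs, threshold=10.0, locations={"s1": "plant-a"}, store=store)
    print(written, store.rows)
```

Chaining generators keeps each stage independent and lazy. This loosely mirrors how a streaming engine applies a sequence of transformations to each record before writing it out. It is not a substitute for Spark's distributed, fault-tolerant execution.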

However, the infrastructure for the Spark-Kafka-Cassandra stack is time-consuming to assemble, and most organizations lack the skills to deploy and configure each of the necessary components. BlueData’s mission is to make this infrastructure deployment easy. The BlueData EPIC software platform is purpose-built to simplify and accelerate infrastructure deployment for Hadoop, Spark, and related tools for Big Data (and Fast Data) analytics, leveraging patent-pending innovations and Docker container technology.

The new Spark-Kafka-Cassandra solution provides a full enterprise license of BlueData EPIC software along with the professional services needed to deploy an on-premises lab environment for building real-time data pipelines. With BlueData, customers will have a multi-tenant sandbox for prototyping, developing, and testing new Fast Data applications and use cases with this popular stack (either with or without Hadoop).

This new turnkey solution includes the following:

  • An accelerated deployment for real-time streaming, with BlueData EPIC software running on five physical servers or five virtual machines.
  • A ready-to-run, fully functional data pipeline integrated with Spark Streaming, Kafka, and Cassandra for immediate use.
  • Sample datasets and sample use cases for real-time streaming, with assistance from BlueData experts to help customers get started with this new Fast Data stack.
  • Rapid prototyping and agile application development with the ability to spin up new clusters in a matter of minutes via self-service, with just a few mouse clicks.
  • Improved developer productivity with web-based Apache Zeppelin notebooks that can be shared with other users in a multi-tenant environment on shared infrastructure.

“Batch processing of large datasets was the start for many Big Data analytics initiatives. But now there’s growing demand from organizations analyzing real-time ‘data in motion’ in addition to the more traditional batch-oriented ‘data at rest’ use cases,” said Kumar Sreekanti, CEO of BlueData. “For real-time data pipelines, we’ve seen Spark Streaming together with Kafka and Cassandra emerge as a popular stack. BlueData makes it easy for enterprises to get started quickly with these new tools and technologies in a turnkey on-premises lab environment.”
