Big Data Analytics Receive a “Spark” In the Arm


In this special guest feature, Anand Venugopal, head of StreamAnalytix at Impetus Technologies, discusses real-time streaming analytics applications and how companies can use Apache Spark for data processing and analytics. Anand is an AVP at Impetus Technologies, where he heads StreamAnalytix, an open-source-enabled, enterprise-grade, multi-engine stream processing and machine learning platform that helps enterprises across industries make smart decisions and act on them in real time.

Real-time data and analytics processes are the central nervous system of today’s enterprise, which makes it no surprise that the global revenue in the business intelligence (BI) and analytics software market is forecast to reach $22.8 billion by the end of 2020.

Just as a delay in the pain signal reaching your brain would leave your hand on a hot stove, delayed insights can harm the health of an organization. Whether the trigger is an information security breach or malfunctioning equipment on the factory floor, every enterprise must recognize and react to anomalous events immediately, before significant damage occurs. There are equally compelling examples of how the ability to react immediately to both anomalous events and new revenue opportunities translates directly into a competitive advantage and new, better ways of doing business.

Electrifying big data use

Real-time streaming data analytics is catching on in enterprises as businesses demand faster insights to act on opportunities and want greater value from their data lake investments. Apache Spark is helping to drive this trend. Over the past five years, enterprise deployments of Apache Spark have risen steadily, especially since Structured Streaming was introduced in Spark 2.0. Spark is one of the most widely used streaming data engines and is becoming the de facto big data processing platform for traditional ingest and ETL, loading the data lake, machine learning and predictive analytics.

Spark's Structured Streaming APIs let enterprises “build once and deploy both as batch and streaming jobs.” In practice, the same code can run in either mode with minimal changes. As a result, analysts can more easily use the platform both for streaming data analytics and for workloads where batch processing still makes sense, with strong guarantees around data consistency and exactly-once semantics (end to end, this assumes replayable sources and idempotent sinks).
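
To make the “build once” idea concrete, here is a minimal, framework-free Python sketch. It is not Spark's API, and the names `count_by_key`, `run_batch` and `run_streaming` are hypothetical. It models the principle Structured Streaming relies on: the same aggregation logic either runs once over a complete dataset (batch) or runs incrementally over micro-batches while carrying state forward (streaming), and both paths arrive at the same answer.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

Record = Tuple[str, int]  # (key, value): a hypothetical record shape


def count_by_key(records: Iterable[Record], state: Optional[Counter] = None) -> Counter:
    """Shared business logic: sum values per key, starting from optional prior state."""
    result = Counter() if state is None else Counter(state)
    for key, value in records:
        result[key] += value
    return result


def run_batch(records: List[Record]) -> Counter:
    """Batch mode: apply the shared logic once to the complete dataset."""
    return count_by_key(records)


def run_streaming(micro_batches: Iterable[List[Record]]) -> Counter:
    """Streaming mode: apply the same logic to each micro-batch, carrying state forward."""
    state = Counter()
    for batch in micro_batches:
        state = count_by_key(batch, state)
    return state


events = [("login", 1), ("purchase", 3), ("login", 1), ("purchase", 2), ("logout", 1)]
batch_result = run_batch(events)
stream_result = run_streaming([events[:2], events[2:4], events[4:]])
print(batch_result == stream_result)  # True
```

In Spark itself, the analogous switch is roughly between `spark.read` and `spark.readStream` (and between `df.write` and `df.writeStream`), with the transformation code in between left largely unchanged; the engine, rather than the developer, takes care of the incremental state handling modeled above.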

In fact, experienced users whose Spark deployments began with plain ingest and ETL jobs are expanding into a wider range of use cases, such as insider threat detection, marketing lead detection, real-time contact center analytics, fraud reduction and many more. Spark adoption could even surpass Hadoop adoption, driven by cloud-based deployments and non-Hadoop usage of Apache Spark, as enterprises race to adopt real-time streaming analytics strategies.

Keeping a steady current

Consistent input and output of data in a real-time streaming analytics platform is critical, because errors lead to poor business decisions and missed opportunities. Spark's Structured Streaming feature is designed to address these issues. Building stream processing applications normally requires careful reasoning about end-to-end guarantees, intermediate aggregates and data consistency; Structured Streaming eases this with built-in late data handling and watermarking.
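
Watermarking can be illustrated with a simplified, pure-Python model of event-time windowed counting. This is a sketch under stated assumptions, not Spark's implementation: the class name `WatermarkedCounter`, the 60-second tumbling windows and the 10-second delay threshold are all hypothetical. As in Structured Streaming, the watermark trails the maximum event time seen so far by a fixed delay and is updated between micro-batches; in this model, events older than the watermark are dropped as too late, and a window's count is finalized once the watermark reaches the window's end.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


class WatermarkedCounter:
    """Toy model of event-time tumbling-window counts with a watermark."""

    def __init__(self, window_size: int, delay: int) -> None:
        self.window_size = window_size
        self.delay = delay
        self.max_event_time: Optional[int] = None  # largest event time seen so far
        self.open_windows: Dict[int, int] = defaultdict(int)  # window start -> running count
        self.finalized: Dict[int, int] = {}  # window start -> final count
        self.dropped: List[int] = []  # event times rejected as too late

    @property
    def watermark(self) -> float:
        if self.max_event_time is None:
            return float("-inf")
        return self.max_event_time - self.delay

    def process_micro_batch(self, event_times: Iterable[int]) -> None:
        # The watermark applied to this batch comes from earlier batches only.
        current_watermark = self.watermark
        batch = list(event_times)
        for t in batch:
            if t < current_watermark:
                self.dropped.append(t)  # older than the watermark: discard
                continue
            start = t - t % self.window_size  # assign to a tumbling window
            self.open_windows[start] += 1
        if batch:
            latest = max(batch)
            if self.max_event_time is None or latest > self.max_event_time:
                self.max_event_time = latest
        # Finalize every window whose end is at or before the updated watermark.
        for start in sorted(self.open_windows):
            if start + self.window_size <= self.watermark:
                self.finalized[start] = self.open_windows.pop(start)


counter = WatermarkedCounter(window_size=60, delay=10)
counter.process_micro_batch([5, 30, 55])    # watermark becomes 45
counter.process_micro_batch([62, 75])       # watermark becomes 65; window 0 is finalized
counter.process_micro_batch([58, 70, 130])  # 58 is dropped; watermark becomes 120
print(counter.finalized)  # {0: 3, 60: 3}
print(counter.dropped)    # [58]
```

The late event at time 58 belongs to a window that has already been finalized, which is exactly the case watermarking exists to bound. In Spark, the corresponding call is `withWatermark(eventTime, delayThreshold)` on a streaming DataFrame, typically followed by a `groupBy` over `window(...)`. One difference from this model: Spark guarantees only that data within the threshold is not dropped, while data arriving later may or may not still be aggregated.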

As Spark implementations multiply, demand will grow for productivity tools and user interfaces to manage Spark and other big data jobs. To remain competitive, businesses need real-time streaming data analytics. Companies cannot wait days, or even hours, for valuable insight into operational procedures. Businesses will also start to equip their organizations with self-service functionality for building big data and fast data analytic applications and visualizations, even without deep technical skills. This will finally make the long-predicted democratization of big data a reality.

