OpsClarity, a provider of intelligent monitoring solutions for modern fast data and streaming applications, announced that 92 percent of companies across retail, technology, banking, healthcare and life sciences are investing in real-time analysis technology for human- and machine-generated data. In its survey and report, “The 2016 State of Fast Data & Streaming Applications,” OpsClarity examined the current state of adoption and maturity of fast data and real-time stream processing technologies.
“With new fast data technologies, companies can make real-time decisions about customer intentions and provide instant and highly personalized offers, rather than sending an offline offer in an email a week later,” said Dhruv Jain, CEO and co-founder of OpsClarity. “It also allows companies to almost instantaneously detect fraud and intrusions, rather than waiting to collect all the data and process it after it is too late.”
The survey, fielded among software developers, architects and DevOps professionals from sectors as diverse as technology, finance, consumer entertainment and the public sector, found that 92 percent plan on increasing their investment in streaming data applications in the next year, 79 percent plan to reduce or eliminate investment in batch processing, and 44 percent cite a lack of expertise with new data frameworks when analyzing data pipeline failures. The survey also determined the most popular data sinks (HDFS, Cassandra and Elasticsearch), message brokers (Apache Kafka, Apache Flume and RabbitMQ) and data processing technologies (Apache Spark, MapReduce and Apache Storm).
“The ability to harness the power of real-time data analysis gives businesses a competitive edge in today’s digital economy by enabling them to become more agile and rapidly innovative,” said Jain. “However, as the underlying stream processing data frameworks and applications are relatively new and heterogeneous in nature, end-to-end monitoring and visibility into these fast data applications is critical for organizations to accelerate development and reliably operate their business-critical functions.”
About fast data
While big data is associated with collecting, storing and analyzing large volumes of data, the analysis is usually done offline, on historical data. Large-scale data processing has now moved into a new age of sophistication: fast data, which allows for real-time analysis of data. Fast data is live and interactive, enabling real-time decisions and real-time responses that can directly affect a business’s bottom line.
Business drivers for leveraging big data/fast data
The survey indicated an increasing shift toward using data processing and analysis to serve core customer-facing applications, rather than only the historical big data analytics that optimize internal business processes and aid future decision-making. Businesses can now leverage insights gleaned from multiple streams of real-time data to enable timely decisions and responses, and this type of real-time analysis is being built directly into customer-facing, business-critical applications. When asked what the key business drivers are for leveraging fast data/big data, 32 percent said these technologies power core customer-facing applications, 29 percent said they power analytics to optimize internal business processes, and 39 percent said both.
“Real-time data and stream processing are becoming central to how a modern company harnesses data,” said Jay Kreps, CEO and co-founder of Confluent. “For modern companies, data is no longer just powering stale daily reports; it’s being baked into an increasingly sophisticated set of applications, from detecting fraud and powering real-time analytics to guiding smarter customer interactions. Apache Kafka provides the real-time platform for thousands of companies, including Uber, Netflix and Goldman Sachs.”
State of adoption of fast data applications/technologies
While many of the current applications focus on batch processing, most businesses are looking to adopt stream processing technologies aggressively within the next year. According to the survey, 89 percent of software developers, architects and DevOps professionals currently use batch processing. However, more than 92 percent plan to increase their investment in stream processing applications in the next year.
Most popular data processing technologies, data sinks and message brokers
Eighty-six percent of software developers, architects and DevOps professionals use Apache Kafka as their message broker. At 70 percent, Apache Spark is the data processing technology of choice, and 54 percent prefer HDFS as their data sink.
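As a purely illustrative sketch (not taken from the survey or report), the snippet below shows how the stack those respondents favor most often fits together: Apache Kafka as the message broker, Apache Spark (Structured Streaming via PySpark) as the processing engine, and HDFS as the data sink. The broker address, topic name and output paths are hypothetical placeholders, and the Spark Kafka connector package is assumed to be available on the classpath.

```python
# Minimal sketch of a Kafka -> Spark -> HDFS streaming pipeline.
# Broker address, topic and HDFS paths are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("fast-data-pipeline-sketch")
         .getOrCreate())

# Read a live stream of events from a Kafka topic.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "customer-events")
          .load())

# Kafka delivers raw bytes; cast the payload to a string for downstream parsing.
parsed = events.selectExpr("CAST(value AS STRING) AS event_json")

# Continuously append the processed records to HDFS as Parquet files.
query = (parsed.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/customer-events")
         .option("checkpointLocation", "hdfs:///checkpoints/customer-events")
         .start())

query.awaitTermination()
```

The checkpoint location lets the query recover its position after a failure, which is exactly the kind of operational detail that the monitoring challenges discussed below revolve around.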
Enterprises show significant heterogeneity in the data frameworks they deploy, a strong indication that a wide variety of these frameworks will remain in use for the foreseeable future. The survey also revealed a strong preference for open source technologies: 47 percent of software developers, architects and DevOps professionals say they use open source exclusively, and another 44 percent use both commercial and open source.
Developing and monitoring data pipelines
The modern data pipeline is a highly complex, interconnected mesh of different heterogeneous components, each of which by itself is extremely complex and distributed. This has created a completely new set of challenges when it comes to monitoring, troubleshooting and maintaining the health of the application. Forty-four percent of the respondents agreed that it is tedious to correlate issues across the pipeline, while more than half (55 percent) claim they have no visibility into end-to-end pipeline performance.
When it comes to developing and managing data pipelines, 68 percent of the respondents identified lack of experience and the underlying complexity of the new data frameworks as their primary challenge and barrier to adoption.
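As an illustration of the kind of narrow, per-component check that end-to-end pipeline visibility has to stitch together, the sketch below measures consumer lag on a single Kafka partition with the kafka-python client, i.e. how far a stream processor has fallen behind the newest messages. The broker address, topic, partition and consumer group name are hypothetical; a real deployment would track lag across every partition and correlate it with metrics from the processing and storage tiers.

```python
# Hedged sketch of one pipeline health check: Kafka consumer lag for a
# hypothetical processing application. All names below are placeholders.
from kafka import KafkaConsumer, TopicPartition

# Use the processing application's (hypothetical) consumer group so that
# committed() reports the offsets that group has actually acknowledged.
consumer = KafkaConsumer(
    bootstrap_servers="broker:9092",
    group_id="stream-processor",
    enable_auto_commit=False,
)

partition = TopicPartition("customer-events", 0)
consumer.assign([partition])

# Newest offset available on the broker for this partition.
latest = consumer.end_offsets([partition])[partition]

# Offset the consumer group has committed so far (None if nothing committed).
committed = consumer.committed(partition)

lag = latest - (committed or 0)
print(f"partition 0 lag: {lag} messages")

consumer.close()
```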