The Future of Open Source Big Data Platforms

Three well-funded startups – Cloudera Inc., Hortonworks Inc., and MapR Technologies Inc. – emerged a decade ago to commercialize products and services in the open-source ecosystem around Hadoop, a popular software framework for processing huge amounts of data. The hype peaked in early 2014 when Cloudera raised a massive $900 million funding round, valuing it at $4.1 billion.

“The recent struggles of Cloudera and MapR have made many headlines and left some wondering what this means for the future of big data,” observed Unravel Data CEO Kunal Agarwal. “Is enterprise interest in data waning? Not at all. These companies have faltered as a result of big data’s rapid transition to the public cloud, leaving little growth potential for platforms like these that were designed for on-premises deployments. Big data is a better fit in the cloud due to its highly elastic compute requirements. In addition, modern data systems are becoming more complex, and they’re more difficult to manage on-premises than in the cloud. There’s a new data stack emerging and Hadoop is no longer the definitive big data technology: technologies like Spark and Kafka are rising to support modern data applications that use artificial intelligence and machine learning. Hadoop won’t disappear and not every data workload will go to the cloud, but the public cloud and technologies like Spark will increasingly define big data and any vendors who don’t aggressively support them will continue to suffer.”

Hortonworks went public in 2014 and Cloudera followed in 2017, but both saw shares tumble as market competition intensified and customers began moving rapidly to the cloud. Cloudera and Hortonworks merged last fall, but the stock of the combined entity has continued to fall, slicing market value by half over the last seven months. MapR announced its intentions to go public more than four years ago, but never followed through, opting instead to raise two more rounds of venture funding in 2016 and 2017. It was recently revealed that MapR may cut up to 122 jobs and shut down its Santa Clara, California headquarters if it can’t secure additional funding.

“The recent news around Cloudera and MapR is stirring up a lot of debate around the future viability of Hadoop, and really all open-sourced frameworks for managing big data workloads,” observed Chandra Ambadipudi, CEO of Clairvoyant. “A big factor is that Hadoop was greatly underestimated by the market regarding the resources needed to manage and leverage it. Hadoop did deliver on its promise as a low cost, scalable and robust open-source solution, but the talent and number of data engineers required to manage its complexity, and shortage thereof, has come to a head.”

With Cloudera now the only significant Hadoop company remaining after the MapR news, the following are some insights and thoughts on how the future of open source big data platforms is tied to the cloud (and to cloud giants like Microsoft, AWS, and Google):

  • The viability of Hadoop is in question, not because it is a bad technology (the tech is good), but because of the shortage of talent needed to manage the complexity of Hadoop as open source. The level of resources required was greatly underestimated relative to the hype.
  • The question is whether cloud giants will completely take over the space. Databricks and Snowflake are moving in to address the skills gap with big data implementations.
  • More consolidation is likely still to come in the ecosystem (something like Microsoft buying up MapR), and only time will tell whether this is good for the ecosystem, since it risks locking companies into a single vendor.
  • In a similar vein, increasingly popular platforms such as Apache Kafka may face the same challenges as open source solutions, with vendors capitalizing on Kafka just as Cloudera capitalized on Hadoop.

“As the cloud giants continue to ‘eat the world,’ the rise of platforms like Snowflake and Databricks start to address some of this talent and skills gap,” added Ambadipudi. “I wouldn’t be surprised to see further market consolidation with some of the Cloud players acquiring MapR and other Hadoop players. Kafka is rising in popularity and is seeing mass adoption due to its low latency and scalability. Just as Cloudera capitalized on Hadoop, Confluent is doing the same thing with enterprise Kafka, but may face the same challenges as an open source platform. No matter what kind of big data implementation, the skills needed today are in short supply, and the need for expert managed services will remain high.”

Contributed by Daniel D. Gutierrez, Managing Editor and Resident Data Scientist for insideAI News. In addition to being a tech journalist, Daniel is also a data science consultant, author, and educator, and sits on a number of advisory boards for various start-up companies.

Comments

  1. Why would Microsoft buy MapR? A mega cloud vendor would not throw a lifeline to a technology that aims to build on-prem data lakes. A hybrid player like Oracle or HPE might consider it. IBM sold BigInsight to Hortonworks, so they clearly have no taste for maintaining a Hadoop stack. A white box provider like Broadcom or a storage vendor like Western Digital might be more interested.