In this special guest feature, Hiro Yoshikawa, co-founder and CEO of Treasure Data, counters the growing perception that Hadoop is a failure and focuses on the lessons that organizations can learn from the rush to adopt Hadoop. Prior to Treasure Data, Hiro worked at Red Hat, and at Mitsui Ventures, the venture capital arm of Mitsui, where he recognized the potential for open source and Hadoop and advocated for being an early-stage investor in data startups.
It’s in vogue right now to say Hadoop has failed. The Hadoop detractors point out that Hortonworks isn’t on a curve of exponential growth and, although Cloudera went public, its stock hasn’t soared as some predicted. They also say that many companies invested in Hadoop and found it difficult to use and, frankly, not an improvement over their previous RDBMS-based solutions for data warehousing and analytics.
All that is true, and yet it doesn’t mean Hadoop is a failure. For all the “failures” cited in the previous paragraph, there are many counterbalancing successes. Amazon’s EMR, which is essentially Hadoop-as-a-service, is doing extremely well. Microsoft Azure has added Hortonworks to its platform under the name HDInsight and has seen it add to top-line revenues. Google has had success offering Cloud Dataproc, a fully managed solution for running Apache Spark and Hadoop clusters.
So it’s hardly an open-and-shut case that Hadoop has failed. But leaving that debate aside, the question is: What can businesses learn from the rush to adopt the latest data technology? Here are five key lessons.
1—Recognize that the way to monetize open source software is changing
The reason Amazon, Google and Microsoft are doing well and Hadoop vendors are not is that the cloud economic model has changed the way to monetize open source. Red Hat’s successful marketing strategy was to say Linux is free open source software, but someone has to make sure it runs smoothly and securely in an enterprise environment. Cloudera, Hortonworks and other vendors made the same case with Hadoop. But, in fact, they can no longer promise the same amount of added value as Red Hat could back in the day because now there’s a better alternative, namely, deploying Hadoop as a cloud service. So it’s not that Hadoop failed; it’s that the value proposition Hadoop vendors are offering has been overtaken by a newer, better alternative.
2—Evaluate your business needs first, technology second
Google has used Google File System (GFS) and MapReduce, which inspired Apache Hadoop, to create a giant business in search and advertising, but, unfortunately, that doesn’t mean you can. Google needed GFS and MapReduce not only to handle massive amounts of data, but also to handle all the unstructured and semi-structured data on the web pages it crawls. If your data is mostly structured, and you’re not indexing the entire web, you can probably get better results with an RDBMS than with Hadoop. Is that a failure of Hadoop? Clearly not; it’s just a matter of defining the need, then finding the right technology solution.
3—Focus on convenience and capabilities
Hadoop is a complex ecosystem of open-source software, and it was initially possible to monetize it by making it convenient. That’s why there’s a Hadoop distribution business, just like there’s a Linux distribution business. But Hadoop distributors are losing their edge because what’s even easier than using a Cloudera or Hortonworks distribution is to go to Amazon, Google, or Microsoft and spin up a Hadoop cluster without ever compiling or downloading anything. Combined with Lesson 2, this lesson highlights two big problems for Hadoop vendors: Many businesses don’t need Hadoop, and for those that do and are willing to embrace cloud computing, there’s a more convenient option than a distribution.
4—Focus on data rather than data technology
Your data is the lifeblood of your company, and you should beware of giving control of it to someone else, like Amazon, Facebook or Google. It’s more important to focus on the quality of your data and how you analyze it than to focus on data technology, even technology as sexy as Hadoop.
5—Remember that the developer experience matters
Developers like convenience and will migrate to environments that are the most convenient. Open source was initially convenient because, unlike proprietary software, it allowed developers to tinker with and evolve the software. Hadoop was initially popular because it was the only economically feasible way for developers to tinker and create the kind of infrastructure Google and Facebook enjoyed. But what’s even more convenient is being able to deploy the same infrastructure by simply logging in to the cloud. So, if you’re building software, focus on the developer experience and make it as good as possible; that’s a bigger consideration than whether your software should be open source or not.
Hadoop is a powerful, useful solution that meets an essential need—for the right company. If you decide you’re not that company, it doesn’t mean Hadoop is a failure. It just means you’re smart enough to understand your business and its data technology needs.