I found an interesting discussion going on in the Global Big Data & Analytics group on LinkedIn – “Why do Hadoop projects fail?” Having just returned from the Hadoop Summit 2014 in San Jose, where I saw plenty of use case examples of wildly successful Hadoop implementations, I was intrigued by the idea of itemizing the causes of failed projects. Here is the list of causes that started off the discussion:
Data Related
- Inability to ingest data
- Inability to access data in-place
- Security
Hardware/Software Related
- Inability to get a cluster running and stable
- Inability to reach a successful proof of concept
- Inability to complete a successful pilot project
Business Related
- No financially compelling use case
- Lack of exec sponsorship, budget, priority
The discussion took some interesting paths. Here is a sampling of comments accompanied by my own take:
“Lack of articulation and clarity of the business problem that one is attempting to solve linked with what data is required and how the question would be answered using the data.”
This is an excellent point, one that I’ve seen in my own data science consulting practice. If a client says “Here’s some data, now go do your magic” then I’ll run for the hills because the project is destined for failure without a singular purpose and well-defined goal. But this is true of all data science projects, not only those based on Hadoop.
“I am surprised there is no mention of organizational barriers. Unlike many other technologies, Big Data may require coordination and agreement across many business units that are not used to work together. Because one of the most common use cases is to centralize data, or create a data lake, companies need to deal with wide initiatives to enable the implementation of such projects. Business units will usually look at the benefits of such corporate initiatives at their own level, balancing this against their existing business priorities and roadmap. In order to succeed, Big Data projects may require organization structure changes.”
So true! Someone in the enterprise needs to “own” the big data project, and a Hadoop project is no different. The need may originate with finance, but in all likelihood, IT will need to weigh in since they are probably responsible for the company’s data assets. If the organization has no concept of data governance, then a big data project is going to be a stretch for its organizational capabilities. A step back to put policies in place may be well worth the time.
“I’ve led teams on two successful Hadoop projects, but I’m not sure that Hadoop projects are any different than any other technical projects, and if I think about the reasons on tech projects that I’ve been a part of and failed it’s pretty consistent with the literature on why tech projects fail (which is not often about the technology) – poor executive sponsorship, no clear objectives, poor scoping, bad requirements, poor project management, mismatched resourcing, cross functional alignment, mismanaged expectations, etc.”
The above was obviously said by someone who’s been down in the trenches on an enterprise-wide technology project – his experience speaks volumes. Great advice!
If you’ve been down this road yourself and have some wisdom to offer, please leave us a note here.
Daniel – Managing Editor, insideAI News
I was at the summit.
I did not perceive at all that Hadoop projects fail.
With all due respect, I think that if you (writer of the article) truly attended, you should say what you got out of it, not quote someone else.
That would be more valuable.
Perhaps I am getting off topic, but my take on the summit was:
– There was an overall excitement about the growing flexibility of the new Hadoop. There were technical presentations on the new components and business presentations on how to use those components, so in my view it was fairly well balanced. I would have hated for the summit to be flooded with pure marketing material about upcoming commercial versions of Hadoop.
– There were presentations that provided a lot of insight on how to extract value out of data, tying it to specific building blocks of the new Hadoop software ecosystem. Others spoke of the monetization of data.
– It was made clear that data governance and security are among the new assets. That is an obvious requirement for moving from an R&D project to a production project where you are going to deal with sensitive customer data (e.g., health care, insurance).
– There was at least one presentation that talked about the challenges of adopting Hadoop. In one of the keynote sessions, an analyst provided a summary of the evolution of analytics and concluded with some of the barriers to its adoption. If I had to summarize the main issue in one statement, it would be “you need to get educated so you can start acknowledging the competitive value that it provides to the organizations that will adopt it”. I have quite a few years of experience in HPC in a wide range of areas, and it faces quite similar barriers to adoption within the enterprise. For those who are far from understanding its value, I understand that it is normal to feel either threatened by its power or skeptical about trusting a “black box analyzer”. But machines just do, repeatedly, what we humans learn to solve, only at a larger scale (with more data, thoroughly, tirelessly). If you want to keep your job and provide vision/strategy to your company, you had better leverage the computing power to analyze those “lakes” of data.
If you want to get into what to analyze and how, a tiny bit of that was provided at the summit through some high-level machine learning examples. But it wasn’t the right conference for getting into specific algorithms for extracting value out of data. There were, though, several talks from UC Berkeley, Stanford and a company in Europe that went over their projects, again leveraging new building blocks like Spark and Storm. The capability to visualize data through efficient, fast interaction with the databases was also explained, again thanks to the new building blocks in the new Hadoop.
So overall, I did not hear even a single presentation that used the words “we failed implementing the project”, or “there was no real value out of the analysis capabilities that it provided”, or “it got cancelled unfortunately due to lack of funds”.
I hope some people read through this and appreciate the summary I provided of the summit.
Sincerely,
Joshua Mora.
Joshua, thank you for posting your review of the Hadoop Summit 2014. I agree it was a fantastic event, but I think you need to go back and read my original post. I was commenting on a discussion that occurred on a LinkedIn discussion forum, not what I saw at the Summit. In fact, I go on to say that I did NOT see anything at the Summit that remotely resembled any failures. It is my opinion that Hadoop is seeing many wonderful and successful use case examples.
I agree that organisational commitment is needed to make Big Data projects successful. Most organisations have information silos and many custodians of them. They do not want to share one line of business's data with another line of business in the same enterprise, which leads to ownership disputes and internal politics. I have seen many organisations succeed in adopting Big Data technologies over the last couple of years and scaling their operations. They had a systematic approach to Big Data strategy definition and exploration, and the highest level of commitment. There are also product vendors in the market who push different tools and confuse the decision makers.