Big Data, Hadoop & Cloud: Tackling a Chain of Emerging Challenges

In this special guest feature, Chandra Ambadipudi, CEO of Clairvoyant, provides a compelling tour through the recent history of the big data industry and how Hadoop and the cloud have made its steady acceleration possible, along with recommendations for addressing several challenges enterprises face in big data cloud implementations. Chandra co-founded Clairvoyant in 2012 and has driven the company to become a leading big data player with multiple Fortune 500 customers today. A senior leader in software engineering with a proven track record, he also co-founded BlueCanary Data, a predictive analytics product company focused on higher education, and led it through a successful acquisition last year.

Data has often been heralded as the new “oil” – a commodity more precious than any natural resource in today’s digital economy. To be fair, oil and data are not an apples-to-apples comparison. Data can “drive” an autonomous car, but you can’t fill the gas tank with ones and zeros. Still, in line with the analogy, data has traversed phases similar to those of oil exploration and drilling.

First came the data “land grab” phase. About 10 years ago, at the advent of the big data hype, companies scrambled to ensure they didn’t miss out. Then came the delineation phase, where the industry more tightly defined big data’s boundaries and applications. We’re now in an efficiency phase. Just as with oil drilling, extracting maximum value from data is all about combining the right expertise with the right technology.

For all of big data’s promises, many challenges came to light during the delineation phase and continue today as companies implement big data projects. According to Gartner, many organizations that have invested in big data projects remain stuck in the pilot stage. So what are the main challenges causing these stalls?

Big Data Challenges

Traditional storage and analysis systems have buckled under the weight of large volumes of unstructured data. Due to cost and scalability issues, companies have shifted to more agile, cost-efficient open source solutions like Apache Hadoop and Spark, as well as Lumify, MongoDB, Elasticsearch, and many others. Navigating the sea of big data tools is its own challenge, but let’s focus on Hadoop, a solution at the center of the big data transformation.

For all the difficulties many companies experienced in their Hadoop journeys over the years, the platform has now become mainstream, with significant ROI demonstrated across industries. Financial services and healthcare companies are augmenting, and in some cases completely replacing, traditional BI/DW-based data management systems with large-scale Hadoop deployments.

While Hadoop does solve many data problems, it has opened up new challenges too. Even acknowledging Hadoop’s potential, the hard truth is that Hadoop implementation and management (especially on-premises) are difficult and can end up causing more problems than they solve. Hadoop’s learning curve, and the level of expertise each industry and use case demands, can challenge a company’s internal data professionals and strain available IT resources.

Additionally, scaling Hadoop on-premises can be a challenge, requiring more investment in physical infrastructure – something many companies don’t have the resources for. This is why many enterprises are moving to cloud-based Hadoop solutions, including private, public, and hybrid cloud deployments.

Cloud Migration Challenges

Cloud-based Hadoop solutions allow companies to scale in a more agile fashion as their data needs increase. This can solve the problem of having to add more on-prem infrastructure over time, but as with any solution, migrating big data analytics to a cloud infrastructure begets its own set of challenges.
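To make that elasticity concrete, here is a minimal sketch, using boto3, of provisioning a short-lived Hadoop/Spark cluster on a managed service such as Amazon EMR. The cluster name, region, instance sizes, and release label are illustrative assumptions rather than recommendations:

```python
# Illustrative only: provision a small, transient Hadoop/Spark cluster on
# Amazon EMR. Names, region, sizes, and the release label are assumptions
# made for this sketch, not sizing guidance.
import boto3

emr = boto3.client("emr", region_name="us-east-1")  # assumed region

response = emr.run_job_flow(
    Name="example-transient-hadoop",                # hypothetical name
    ReleaseLabel="emr-6.10.0",                      # assumed EMR release
    Applications=[{"Name": "Hadoop"}, {"Name": "Spark"}],
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,                         # scale per workload
        "KeepJobFlowAliveWhenNoSteps": False,       # terminate when idle
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print("Started cluster:", response["JobFlowId"])
```

Because a cluster like this terminates when its work is done, capacity is paid for per job rather than provisioned up front, which is precisely the scaling model on-premises hardware can’t match.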

These challenges largely revolve around ensuring the performance, reliability, accessibility, and scalability of data. With big data and cloud implementations, there is also the elephant in the room: data security. This key concern has been put in the spotlight by numerous recent high-profile enterprise data breaches and an ever-growing list of industry regulations and standards, such as HIPAA (covering PHI), PCI DSS, FERPA, and GDPR.

So, what can enterprises do to tackle some of these ongoing challenges?

1. Think 5-10 years out

Enterprises adopting Hadoop (or any) big data tools need to think about what’s coming next. This is especially true for companies building their own big data platforms. Flexibility and scalability are essential, as emerging technologies like autonomous vehicles, AI, virtual reality, and IoT will generate new kinds of data faster than ever before. Big data solutions should be as future-proofed as possible; the last thing an enterprise wants is to implement new big data infrastructure and tools only to have to do it all over again in a couple of years.

2. Hire or partner with the right experts

With oil exploration and drilling, the ability to maximize land investment and rig efficiency comes from combining up-to-date technology with deep expertise in the local geology. In the same way, big data projects are about more than just implementing a Hadoop solution. As mentioned earlier, Hadoop implementation and management can be difficult, and having access to experts who understand an individual enterprise’s specific “geology” is key to success. This means hiring the right internal talent or partnering with the right experts.

3. Take a data-centric approach to security

Enterprises need to remember that when migrating big data to cloud infrastructure – whether public, private, or hybrid – the onus to secure the data is theirs, not the cloud provider’s. No matter what perimeter security measures are taken, data stored in a cloud environment remains susceptible to breach. Enterprises need to think beyond perimeter security: identify sensitive data, both structured and unstructured; secure it in the Hadoop data lake as it is ingested; and continuously monitor cloud data sources for violations.
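As a rough illustration of that data-centric approach, the Python sketch below masks values matching simple sensitive-data patterns before records land in the data lake. The patterns and the mask_record helper are hypothetical stand-ins for a real classification and tokenization pipeline:

```python
# Minimal sketch of data-centric security at ingest: scan each record for
# values that look sensitive (here, US SSNs and email addresses) and mask
# them before the record is written to the lake. The patterns and helper
# are illustrative, not a complete PII scanner.
import re

SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def mask_record(record: dict) -> dict:
    """Return a copy of the record with sensitive values masked."""
    masked = {}
    for key, value in record.items():
        text = str(value)
        for label, pattern in SENSITIVE_PATTERNS.items():
            text = pattern.sub(f"<MASKED:{label}>", text)
        masked[key] = text
    return masked

raw = {"name": "Jane Doe", "contact": "jane@example.com", "ssn": "123-45-6789"}
print(mask_record(raw))
# {'name': 'Jane Doe', 'contact': '<MASKED:email>', 'ssn': '<MASKED:ssn>'}
```

In practice, masking or tokenization of this kind would run inside the ingestion pipeline itself (for example, as a Spark transformation), with any sensitive values found in supposedly clean sources logged as violations for the monitoring step.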

Final thoughts

While big data has dropped off the hype cycle, it’s not going away; in fact, it will only get “bigger.” Likewise, the Hadoop ecosystem has matured significantly and will continue to do so, with all of the big distributions offering data science capabilities. AI is the next frontier for companies with an existing large Hadoop footprint. The desire among enterprises to migrate big data to the cloud will continue to grow, with managed services gaining momentum. All of these trends bring challenges of their own – but with the right strategy and foresight, enterprises can truly maximize big data’s value.
