The Problem with Data Lakes: How Enterprises Can Make the Most of Their Data

In this special guest feature, Bob Friday, CTO of Mist Systems, a Juniper Networks company, believes that data lakes have grown beyond what anyone initially imagined, to the point that they have become data oceans. Fortunately, there are a number of ways that enterprises can navigate these waters to make the most of their data. Bob started his career in wireless at Metricom (Ricochet wireless network) developing and deploying wireless mesh networks across the country to connect the first generation of Internet browsers. Following Metricom, he co-founded Airespace, a start-up focused on helping enterprises manage the flood of employees bringing unlicensed WiFi technology into their businesses. Following Cisco’s acquisition of Airespace in 2005, Bob became the VP/CTO of Cisco enterprise mobility and drove mobility strategy and investments in the wireless business (e.g., Navini, Cognio, ThinkSmart, Phunware, Wilocity, Meraki).

Enterprises are looking to data lakes to solve a number of network challenges, from diagnosing network problems to detecting security breaches. With the business impact of COVID-19 and changing network traffic patterns, companies increasingly must answer questions that require access to distributed datasets stored across multiple vendors, a shift that forces us to rethink how we access data to solve remote business problems. In a work-from-home future, poor conference call quality can go from a minor annoyance to a major impediment, and having the right data to solve that problem will become essential.

Although data lakes play a vital role in helping enterprises solve a number of network issues, our increasing dependence on them has created a new problem: they are simply too large, given the exponentially growing amount of data being generated. Data lakes have grown beyond what anyone initially imagined, to the point that they have become data oceans. Fortunately, there are a number of ways that enterprises can navigate these waters to make the most of their data.

The Problem with Data Oceans

The transformation from data lakes to data oceans raises a number of new challenges. In a world of remote workers, one of the biggest pain points IT teams must address is locating the root cause of an issue in the network. This becomes much harder when the team is staring at an ocean of data: there are simply too many sources of information, making it difficult to know where to begin looking for an answer.

While the vast amount of data can be overwhelming for IT and data scientists, the monetary cost of storing it can be equally overwhelming. If enterprises ignore the problem, storage costs will keep climbing, adding an expensive burden on top of an existing one. To use data most effectively, enterprises must learn how to virtualize their distributed datasets. That starts with making sure they have the right data needed to answer the question.
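To make the idea of virtualization concrete, here is a minimal sketch of a thin adapter layer that presents datasets held by different vendors as one queryable view. The vendor sources, the shared schema, and the call_mos quality metric are illustrative assumptions, not any particular product's API:

```python
import pandas as pd

# Hypothetical telemetry sources: each vendor stores its data separately,
# but a thin adapter layer presents them as one "virtual" dataset.
class VirtualDataset:
    def __init__(self, sources):
        # sources: mapping of vendor name -> callable returning a DataFrame
        self.sources = sources

    def query(self, predicate):
        # Push the same filter to every source, then stitch the results
        # together so the caller never deals with vendor boundaries.
        frames = []
        for vendor, fetch in self.sources.items():
            df = fetch()
            df = df[df.apply(predicate, axis=1)].copy()
            df["vendor"] = vendor
            frames.append(df)
        return pd.concat(frames, ignore_index=True)

# Example: two vendors reporting conference-call quality in a shared schema.
vendor_a = lambda: pd.DataFrame({"user": ["alice", "bob"], "call_mos": [4.1, 2.3]})
vendor_b = lambda: pd.DataFrame({"user": ["carol"], "call_mos": [1.9]})

virtual = VirtualDataset({"vendor_a": vendor_a, "vendor_b": vendor_b})
poor_calls = virtual.query(lambda row: row["call_mos"] < 3.0)
print(poor_calls)  # only the rows needed to answer the question at hand
```

The point of the design is that the question ("which calls were poor?") is asked once, against the virtual view, rather than once per vendor silo.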

Not All Data Is Created Equal

Before trying to organize these vast amounts of data, enterprises need to step back and ask themselves, “What is the question I need answered?” This is a necessary step before piling data into a lake. Once that question is defined, teams can narrow down where to begin looking for an answer. In this way, it all starts with using the right data.

The right data should be comprehensive, accurate and current. As simple as this sounds, data quality is easy to overlook and neglect. However, with employees relying on data lakes more than ever and transitioning to remote work, this is a good time for enterprises to rethink business strategy. That strategy should include investing in data quality management tools that help get data into shape to answer new questions and solve new problems.
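As a rough illustration, the "comprehensive" and "current" properties can be encoded as automated checks that run before data lands in the lake (accuracy checks tend to be domain-specific). The schema, column names, and freshness threshold below are assumptions made for the sketch:

```python
import pandas as pd

def quality_report(df, required_columns, timestamp_column, max_age_hours=24):
    # Hypothetical gate for two of the three properties named above:
    # comprehensive (fields present and populated) and current (records
    # are fresh). Accuracy checks would be specific to the domain.
    report = {}
    report["missing_columns"] = [c for c in required_columns if c not in df.columns]
    present = [c for c in required_columns if c in df.columns]
    report["null_fraction"] = df[present].isna().mean().to_dict()
    # Freshness: how old is the newest record in the batch?
    newest = pd.to_datetime(df[timestamp_column], utc=True).max()
    age_hours = (pd.Timestamp.now(tz="UTC") - newest).total_seconds() / 3600
    report["stale"] = age_hours > max_age_hours
    return report

# Illustrative batch of call telemetry with one missing user field.
calls = pd.DataFrame({
    "user": ["alice", None],
    "call_mos": [4.1, 2.3],
    "ts": ["2024-01-01T10:00:00Z", "2024-01-01T11:00:00Z"],
})
print(quality_report(calls, ["user", "call_mos"], "ts"))
```

A report like this makes quality problems visible at ingest time, when they are cheap to fix, instead of surfacing later as a wrong answer to an important question.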

Once enterprises have identified the right information to answer their questions and virtualized their distributed datasets, they are ready to leverage their AIOps (AI for IT operations) investment. This means moving from a paradigm of managing network elements to one of managing end-to-end user experiences on their networks and end-to-end application experiences in their data centers.
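One way to picture that shift: rather than alerting on each network element in isolation, roll element-level measurements up into a single experience score per user session. The metrics, weights, and MOS-like scale in this sketch are illustrative assumptions, not an established scoring model:

```python
# A sketch of the paradigm shift: element-level metrics become one
# end-to-end experience number per user. Weights are assumptions.
def experience_score(samples):
    """samples: element-level measurements along one user's path."""
    weights = {"latency_ms": -0.01, "packet_loss_pct": -0.5, "jitter_ms": -0.02}
    score = 5.0  # start from a perfect MOS-like score
    for s in samples:
        for metric, weight in weights.items():
            score += weight * s.get(metric, 0.0)
    return max(1.0, min(5.0, score))

# One conference call, observed across Wi-Fi, WAN, and data-center hops.
session = [
    {"latency_ms": 40, "packet_loss_pct": 0.2, "jitter_ms": 5},   # Wi-Fi
    {"latency_ms": 80, "packet_loss_pct": 1.5, "jitter_ms": 12},  # WAN
    {"latency_ms": 10, "packet_loss_pct": 0.0, "jitter_ms": 1},   # data center
]
print(experience_score(session))  # one number for the whole experience
```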

Looking Ahead: AI Self-Driving Networks

So, you have your data tagged, your distributed datasets virtualized and your AI technology implemented. What’s next? From an industry perspective, the next goal is the self-driving network. With a self-driving network, future IT departments will have a virtual AI assistant that frees up staff to work on more strategic business efforts. The virtual AI assistant of the future will detect and resolve network issues on par with a network domain expert, but at the speed of a computer.
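Reduced to a toy, the detection half of such an assistant might watch a metric’s recent baseline and escalate only when a new sample falls far outside it, much as a domain expert notices that an interface “doesn’t usually behave like that.” The metric and threshold here are assumptions for illustration:

```python
import statistics

def is_anomalous(history, latest, z_threshold=3.0):
    # Flag samples that drift far from the recent baseline.
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return latest != mean
    return abs(latest - mean) / stdev > z_threshold

# Interface errors per minute over the last 10 minutes (hypothetical).
interface_errors = [2, 3, 1, 2, 4, 3, 2, 3, 2, 3]
print(is_anomalous(interface_errors, latest=25))  # True: worth escalating
```

A production assistant would layer root-cause correlation and automated remediation on top of detection, but the principle is the same: let the machine watch the baselines so people do not have to.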

To make this goal a reality, enterprises must do the work to get there. That includes building better solutions and strategies to correlate data and virtualize access to ever-growing distributed datasets. By starting with the initial question and the right data, enterprises can create a strong foundation to make the most of the data they have and solve their most critical problems, while moving the industry toward the next level of AI-driven data solutions.
