In this special guest feature, Mike Vogt, Data Practice Lead at SPR Consulting, argues that organizations handling large amounts of data must carefully protect their information repositories. He urges IT departments to set boundaries and give clear directives on how data is used when the rest of the organization begins "swimming" in the data lake, that is, using that data to make decisions, and he offers four rules to follow. Mike is the Practice Director for Data, Analytics, and Machine Learning at SPR, a digital technology consultancy in Chicago. A technology leader experienced in implementing cutting-edge enterprise solutions, he leads practice strategy, growth, human capital management, and offerings development for data engagements across multiple industries, including healthcare, capital markets, financial services, manufacturing, and transportation.
The term “data lake” can be incredibly misleading.
When most people think of a lake, they picture a serene setting full of clear water, or a liquid playground where you can plop in a boat and swim for the day. But in business settings, the data lake, a central repository where raw data is stored, is often anything but. It's more like a murky swamp of information, with hidden hazards that can lead to bad business decisions and lost revenue.
Just as natural lakes come with rules and regulations to keep swimmers safe, it is critical for organizations that deal with large amounts of data to carefully protect their information repositories. The stakes are high: 40 percent of people recently surveyed by PwC report having made a decision based on data. But if those decisions rest on flawed, inconsistent data, the whole effort of collecting and storing it in the first place becomes pointless.
It is critical that IT departments set boundaries and give clear directives on how all data is used when the rest of their organization begins “swimming” — or using this data to make decisions — in the data lake. To do so, they should consider setting the following rules:
- Create swimmable areas: Lakes have plenty of signage communicating where it is safe to get in the water, usually because the area has been cleared of debris and hazards. For IT, the idea is the same. Context is everything when it comes to big data, so it is imperative to label every data set with which departments can use it, and for which decisions. Sales may be able to set strategy from one set of numbers, but finance may not be able to forecast the next quarter with that same set. When everyone can clearly see where it is safe to swim, data is far less likely to be used for the wrong decisions (a minimal sketch of such labeling follows this list).
- Clear the water deliberately: Instead of filtering and scrubbing the entire repository for general use, focus on the data that is most important and will be used most often for decision making. Some data is, and will remain, irrelevant to the organization, and it is wasteful to spend time cleaning it up. Be strategic about what data is filtered and what is left raw.
- Make sure there is a lifeguard on duty: Creating safe places to swim is one thing; actively watching those areas is another. IT should police the data on a consistent basis. That means setting permissions for who has access within each department, and meeting regularly with those individuals about how the data is being used to make decisions. It also means regularly verifying that the metadata and other policies that dictate how the data may be used are up to date and clearly understood across the organization (see the audit sketch after this list).
- Always keep a current map: A lake's ecosystem is constantly evolving, and the data lake is no different. With new inputs and information sources added all the time, an organization's data lake is almost certainly growing, so it is important for IT to regularly survey the data ecosystem. In many cases there will be new opportunities to help leadership make more informed decisions, or to confirm with data that past decisions were sound. Doing this brings IT much closer to the table with leadership and lets the team make a direct impact on creating value across departments.
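To make the "swimmable areas" idea concrete, here is a minimal sketch of dataset labeling with an approved-use check, written in Python. The catalog structure, dataset names, departments, and decision labels are all hypothetical illustrations; a real deployment would rely on a proper data catalog or governance platform rather than a hand-rolled dictionary.

```python
# A minimal sketch of "swimmable area" labeling, assuming a simple
# in-house catalog. The dataset names, departments, and approved uses
# below are hypothetical illustrations, not a real schema.

CATALOG = {
    "q3_pipeline_snapshot": {
        "approved_departments": {"sales"},
        "approved_decisions": {"territory_strategy", "lead_prioritization"},
        "cleansed": True,
    },
    "raw_transaction_feed": {
        "approved_departments": set(),   # no one swims here yet
        "approved_decisions": set(),
        "cleansed": False,
    },
}

def may_swim(dataset: str, department: str, decision: str) -> bool:
    """Return True only if the department is cleared to use this
    dataset for this specific decision."""
    entry = CATALOG.get(dataset)
    if entry is None or not entry["cleansed"]:
        return False  # unlabeled or uncleared water: stay out
    return (department in entry["approved_departments"]
            and decision in entry["approved_decisions"])

# Sales can plan territories from the cleansed snapshot...
assert may_swim("q3_pipeline_snapshot", "sales", "territory_strategy")
# ...but finance cannot forecast the quarter with that same set.
assert not may_swim("q3_pipeline_snapshot", "finance", "quarterly_forecast")
```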
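And for the "lifeguard" rule, a sketch of the kind of periodic policy audit IT might run, again with hypothetical access lists and an assumed 90-day review window rather than any standard policy:

```python
# A minimal sketch of the "lifeguard" audit, assuming per-dataset access
# lists and policy review dates are tracked somewhere. The 90-day review
# window is an illustrative choice, not an established standard.

from datetime import date, timedelta

REVIEW_WINDOW = timedelta(days=90)

ACCESS = {
    "q3_pipeline_snapshot": {
        "readers": {"alice@sales", "bob@sales"},
        "policy_reviewed": date(2024, 1, 15),
    },
}

def audit(today: date) -> list[str]:
    """Flag datasets whose usage policies are overdue for review,
    so IT can follow up with the people who hold access."""
    findings = []
    for name, meta in ACCESS.items():
        if today - meta["policy_reviewed"] > REVIEW_WINDOW:
            holders = ", ".join(sorted(meta["readers"]))
            findings.append(f"{name}: policy stale; check in with {holders}")
    return findings

for finding in audit(date(2024, 6, 1)):
    print(finding)
```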
As businesses increasingly look to data to influence their decisions and guide their strategy, IT is the most important custodian of the data lake, keeping it safe and navigable for anyone looking to use it. By taking on the roles of steward and lifeguard, IT ensures the organization uses its data safely and to its fullest, most valuable potential.