In this special guest feature, John Thielens of Cleo discusses how the ability to connect to non-traditional storage repositories can solve the security, access control, and scalability challenges of data lakes, which are better suited to handle today’s less structured data. John is Vice President of Technology at Cleo, responsible for crafting technology strategy, driving innovation, and architecting enterprise integration solutions for complex multi-enterprise, cloud, collaboration, mobile, and other integration challenges. He has more than 30 years of experience in the software industry. Most recently, he held the position of Chief Architect, Product Suite, and CSO with Axway. In prior roles, he served as a senior technology leader at GXS, Inovis, Tumbleweed, and other software technology companies. John holds a mathematics degree from Harvard University.
Where do you store your most important data these days? Does it all fit there?
Because companies manage and use data at greater volume, variety, and velocity than in the past, existing data architectures are evolving beyond traditional databases, data stores, and data warehouses into a more unfiltered repository known as the data lake.
The demand for increased agility and accessibility for information analysis drives the data lake movement, and for a number of good reasons. But that’s not to say that SQL databases, enterprise data warehouses, and the like will be immediately replaced by data lakes. Rather, these tools are likely to be augmented by them, as data sources, data sinks, or both.
By capturing largely unstructured data for a low cost and storing various types of data in the same place, a data lake:
- Breaks down silos and routes information into one navigable structure.
- Enables analysts to easily explore new data relationships, unlocking latent value.
- Helps deliver results faster than a traditional data approach.
So in an era where business value depends largely on how quickly and how deeply you can analyze your data, connecting your organization to a modern data lake facilitates lightning-quick decision-making and advanced predictive analytics.
Data Lake Drivers
An enhanced customer experience commonly drives data lake investment for retailers, and other verticals benefit from increased analytics as well:
- Healthcare: Health systems maintain and analyze millions of records for millions of people to improve ambulatory care and patient outcomes.
- Logistics: Transport companies manage geolocation information to map more fuel-efficient routes and improve employee safety.
- Law enforcement: Law enforcers can compare MOs across multiple databases (local, state, federal) and case management tools to solve crimes faster.
But some concerns surround the data lake concept, including security, access, and the scalability required to accommodate future streams while retaining all current data for future analysis. Essentially, companies only get out what they put into data management, and an optimized gateway ensures a proper return on data lake investment.
The Requirements
“Purpose-built systems” whose core capabilities are carrier-grade scalability, secure data transfers, and the ability to connect to non-traditional storage repositories (Hadoop, NoSQL, Software-Defined Storage, etc.) can solve the security, access control, and scalability challenges of data lakes, which are more suited to handle today’s less structured data.
The modern big data gateway, which differs from traditional ETL (Extract, Transform, Load) architectures, supports the data lake’s “schema on read” principle, meaning organizations do not need to know how they will use the data when storing it.
Schema-on-read means keeping raw, untransformed data. Without transformation on ingestion, companies can move faster and stand up new acquisition feeds quickly without designing mappings first, gaining data agility now while deferring the compelling data-use questions until later.
Additionally, transformation often discards supposedly worthless information that later turns out to be the dark matter comprising the bulk of your information universe; by retaining everything, schema-on-read keeps those future questions answerable.
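To make the schema-on-read principle concrete, here is a minimal sketch (Python used purely for illustration; the event fields and the revenue question are invented examples, not from any particular product):

```python
import json

# Events arrive in varying shapes and are stored exactly as received --
# no upfront schema, no field mapping on ingestion.
raw_events = [
    '{"user": "a1", "action": "view", "sku": "X-100"}',
    '{"user": "b2", "action": "purchase", "sku": "X-100", "amount": 19.99}',
    '{"user": "a1", "action": "view", "sku": "Y-200", "referrer": "email"}',
]

def ingest(event: str) -> str:
    """Schema-on-write (classic ETL) would parse and map fields here;
    schema-on-read simply stores the raw record untransformed."""
    return event

lake = [ingest(e) for e in raw_events]

def revenue(stored_events) -> float:
    """A question asked long after ingestion: total purchase revenue.
    The schema (an "amount" field on "purchase" events) is applied
    only now, at read time."""
    total = 0.0
    for raw in stored_events:
        record = json.loads(raw)  # schema applied on read
        if record.get("action") == "purchase":
            total += record.get("amount", 0.0)
    return total

print(revenue(lake))  # 19.99
```

Note that the third event carries a `referrer` field the first two lack; because nothing was mapped at ingestion, that extra detail is preserved and available to whatever question comes later.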
The Big Data Gateway: Challenges Solved
The promise of improved analytics and business agility is broken when data is not easily accessible, so companies must keep their data connected. After all, a data lake with stagnant (or worse, non-existent) information flows becomes more of a data swamp.
Pave the road for your organization’s advanced data initiatives with a big data gateway solution built for the access and control of today’s modern enterprise.
I agree that “some concerns surround the data lake concept, including security, access, and the scalability required to accommodate future streams while retaining all current data for future analysis.”
I recently read the Gartner report “Big Data Needs a Data-Centric Security Focus,” which concludes that “in order to avoid security chaos, Chief Information Security Officers (CISOs) need to approach big data through a data-centric approach.”
The good news is that big data distributions, like Hortonworks, have recently started to include the type of advanced security features Gartner recommends, including masking, fine-grained encryption, and data tokenization.
Ulf Mattsson, CTO Protegrity
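To distinguish two of the data-centric protections mentioned above, here is a toy sketch (invented helper names, not the API of Hortonworks, Apache Ranger, or any real security product) contrasting masking with tokenization:

```python
import secrets

def mask(card_number: str) -> str:
    """Masking: irreversibly hide all but the last four characters,
    preserving the value's length and format for display or testing."""
    return "*" * (len(card_number) - 4) + card_number[-4:]

class Tokenizer:
    """Tokenization: replace a sensitive value with a random surrogate,
    keeping the real value in a separate, access-controlled vault so it
    can be recovered by authorized services only."""

    def __init__(self):
        self._vault = {}

    def tokenize(self, value: str) -> str:
        token = "tok_" + secrets.token_hex(8)
        self._vault[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # In a real system this lookup would sit behind access controls.
        return self._vault[token]

print(mask("4111111111111111"))  # ************1111

vault = Tokenizer()
token = vault.tokenize("4111111111111111")
print(vault.detokenize(token))   # 4111111111111111
```

The key difference: a masked value can never be recovered, while a token can be exchanged for the original by a party with vault access, which is why tokenization suits analytics pipelines that must occasionally rejoin real identities.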