We recently sat down with Mike Lamble, CEO of Clarity Solution Group, a recognized data and analytics consulting firm, to discuss some of the more pressing topics in data management. Here is what he had to say.
insideAI News: There seems to be a seismic shift in the way organizations are housing data. What do you think is behind the massive adoption of these next generation data warehouses?
Mike Lamble: Flexibility, scalability, and lower TCO (total cost of ownership) all point to Hadoop for data landing and staging. The twenty-plus-year-old “data factory” unidirectional model is being challenged, if not up-ended, by approaches that go from data to dashboard in weeks rather than months, bi-directional flows between data lakes, analytic sandboxes, and EDWs, and end-user self-service analytics. Legacy environments can’t keep up affordably. Next-generation enterprise environments must support diverse data types, internal and external data, more users and usage types, and scale indefinitely. We’re entering an age where the scope of analytics is wider and data environments are federated across IT and business units, leveraging IT and non-IT horsepower. This new era overlays decades of data management knowledge onto fit-for-purpose scalable technologies, and more tightly couples enterprise information with data democratization, advanced analytics, and BI dashboards.
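To make the landing-and-staging pattern concrete, here is a minimal PySpark sketch of a Hadoop landing zone: raw files are stored as-is and exposed to self-service users without a months-long modeling cycle. The paths and names are hypothetical placeholders, not a description of any particular client environment.

```python
# Minimal landing-zone sketch: store raw data in its native format in HDFS
# and expose it for self-service querying. All paths/names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("landing-zone").getOrCreate()

# Land the data as-is; no upfront dimensional modeling is required.
raw = spark.read.json("hdfs:///landing/clickstream/2016-05-01/")

# Register a temporary view so analysts can query it immediately.
raw.createOrReplaceTempView("stg_clickstream")
spark.sql("SELECT COUNT(*) AS events FROM stg_clickstream").show()
```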
insideAI News: The Data Lake is yet another emerging paradigm that has a lot of folks scratching their heads. What are your thoughts on this and how do they relate to the almost ubiquitous use of Hadoop?
Mike Lamble: “Better, cheaper, faster” is good. Schema-less writes, fitness for all data types, commodity hardware and open source software, limitless scalability: all good, too. That said, out-of-the-box Hadoop-based Data Lakes are not industrial strength. It’s not as simple as downloading the Hadoop software, installing it on a bunch of servers, loading the Data Lake, unplugging the enterprise Data Warehouse (EDW), and, voila, you’re done. The reality is that the Data Lake architecture paradigm, a framework for an object-based storage repository that holds data in its native format until needed, oversimplifies the complexity of enabling actionable and sustainable enterprise Hadoop. An effective Hadoop implementation requires a balanced approach that addresses the same considerations with which conventional analytics programs have grappled for years: establishing security and governance, controlling costs, and supporting numerous use cases. Failure to address these concerns can be disastrous, resulting in increased operational and maintenance costs.
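The “native format until needed” idea is commonly called schema-on-read: the lake stores raw files untouched, and structure is imposed only when a consumer reads them. A minimal sketch, assuming raw JSON order events already landed in HDFS (the path and field names are hypothetical):

```python
# Schema-on-read sketch: structure lives with the query, not the storage.
# The path and field names are hypothetical illustrations.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("schema-on-read").getOrCreate()

# Define the schema at read time; the files on disk remain raw JSON.
order_schema = StructType([
    StructField("order_id", StringType()),
    StructField("customer_id", StringType()),
    StructField("amount", DoubleType()),
])

orders = spark.read.schema(order_schema).json("hdfs:///lake/raw/orders/")
orders.where("amount > 100").show()
```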
insideAI News: Speaking of data governance, how is the intersection of BI and Big Data changing the landscape?
Mike Lamble: This is an interesting subject, and we have seen tremendous progress in terms of data governance. First-generation data warehouses introduced cross-silo integration, the second generation introduced data quality management, and the third emphasized governance. Where Big Data equates to Hadoop, which translates to Data Lakes, we’re at risk of throwing away a whole lot of that progress. On the technology side, metadata management tools in Hadoop are immature. On the use case side, power users are pulling uncleansed raw data from Data Lakes and putting it to work. Duplication of effort, processes, and algorithms is inevitable. “Multiple versions of the truth” will result, something first-generation data warehouses aimed to reel in. Again, applying data governance discipline to Data Lakes is imperative.
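One concrete form that discipline can take is a promotion gate between the raw and curated zones, so power users pull from a cleansed, deduplicated layer rather than raw files. A minimal sketch, with hypothetical rules and paths:

```python
# Quality-gate sketch: validate raw records before promoting them to a
# curated zone that downstream users share. Rules and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("quality-gate").getOrCreate()

raw = spark.read.json("hdfs:///lake/raw/customers/")

# Basic governance rules: required key present, one record per customer.
clean = (raw
         .filter(F.col("customer_id").isNotNull())
         .dropDuplicates(["customer_id"]))

# The curated copy becomes the shared, single version of the truth.
clean.write.mode("overwrite").parquet("hdfs:///lake/curated/customers/")
```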
insideAI News: It’s no surprise that with the increasing number of digital channels, there is a corresponding explosion of data. How are Hadoop and the related analytics driving customer engagement?
Mike Lamble: Companies today face unprecedented challenges due to the sheer volume of data and the number of channels available. Traditional data management architectures are not optimal for new digital use cases because they’re too slow, too expensive, and limited to structured data. The information architecture required to analyze all the data coming from the digital space requires new solutions, and many consumer organizations are considering enterprise Hadoop solutions instead of traditional RDBMS technologies. To effectively engage and influence consumers, organizations need both structured and unstructured data across all consumer touch points. Hadoop extends the scope and scale of consumer analytics, and progressive organizations are actively adopting it to create personalized consumer engagement.
If implemented correctly, these insights can give consumer-focused organizations a significant competitive advantage. Conversely, if implemented incorrectly, they can result in frustrated business users and failed projects, if not lost customers. It’s no surprise that reaching consumers is more difficult than ever: the effectiveness of legacy push models for consumer engagement continues to erode, and consumers are increasingly engaging through an ever-growing number of digital platforms.
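As an illustration of blending structured and unstructured touch points, the hypothetical sketch below joins curated transaction records with raw support-chat text to flag at-risk customers. The keyword rule stands in for a real NLP sentiment model, and every table and field name is invented for the example.

```python
# Engagement sketch: combine structured spend data with unstructured chat
# logs. All tables, fields, and the keyword rule are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("engagement").getOrCreate()

txns = spark.read.parquet("hdfs:///lake/curated/transactions/")
chats = spark.read.json("hdfs:///lake/raw/support_chats/")

# Crude stand-in for sentiment scoring of free-text messages.
upset = chats.filter(F.lower(F.col("message")).rlike("cancel|refund|frustrated"))

# Customers with spend on record who also sound unhappy are flagged.
at_risk = (txns.groupBy("customer_id").agg(F.sum("amount").alias("spend"))
               .join(upset.select("customer_id").distinct(), "customer_id"))
at_risk.show()
```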
insideAI News: In your opinion, how far along the curve is the Hadoop environment? Has it reached maturity, and if so, what does a mature implementation look like?
Mike Lamble: While the Hadoop ecosystem is maturing quickly, capabilities that ensure security, business continuity, data governance, and accessibility are still less than industrial strength. Bridging the capability gap requires a strategy that delivers on enterprise needs while enabling the value of Hadoop. Data security and governance processes should be repeatable and simple to implement and enhance. A mature, enterprise-strength Hadoop (MESH) implementation is achievable. Years of EDW implementations have taught us that platform capabilities alone do not guarantee success. A MESH framework provides an interoperable matrix of architecture, governance, and enablement vectors, accelerating the real lifetime value of Hadoop. It also ensures agility across the breadth of Hadoop use cases: acquisition and ingestion, archival data management, real-time event processing, master data integration, data transformation, information delivery, discovery analytics, and machine learning.