I recently caught up with Prat Moghe, CEO of cloud data lake leader Cazena, to get his take on why getting off the ground with cloud data lakes continues to be a major frustration for enterprises. We're seeing such deployments take at least six months and millions of dollars of annual spend for in-house development and management. There's got to be a better way. Gartner has estimated the failure rate of big data projects as high as 80%. What can be done about companies that stubbornly hang on to legacy data strategies, using analytics/BI approaches that leave them ever further behind competitors who are modernizing their data stack with AI, ML and more? In this interview, Moghe offers valuable perspectives on how to accelerate your time-to-analytics.
insideAI News: For enterprises looking to gain ground (or for some, catch up) on data and analytics modernization initiatives as quickly as possible, what should they focus on to set themselves up for success over the long haul?
Prat Moghe: Data and analytics modernization requires embracing the cloud. We've seen that the companies that get ahead really think of the cloud as a transformative platform. Cloud can radically simplify the technology and process complexity of legacy platforms. The cloud also offers more flexibility, which is critical for the 'long haul' of supporting many different datasets, use cases, analytics, ML and applications.
Along with the cloud, companies need strong change agents, such as a Chief Data & Analytics Officer (CDAO) or a similar role. These leaders are champions for data outcomes. The best ones seamlessly straddle IT and the business.
The good news is that you don't need many people. You just need a few people with the right mindset, who can listen and who know how to leverage the modern cloud stack for data and analytics. Sometimes you can hire them from digital-native companies. Savvy CDAOs can make a big impact with a small team of really good data scientists. The technology and process challenges can be solved with the new cloud and SaaS data and analytics offerings available today. Companies can start making an impact very quickly with the right cloud platform and a small strategic team.
insideAI News: Where exactly are enterprises getting most tripped up along their data and analytics modernization journeys right now?
Prat Moghe: Organizations are trying to do a lot. They have to balance the people, process and technology that will help them transform and become more agile in delivering outcomes. Each of these factors has its own challenges and, taken together, they slow enterprises down. Many enterprises lack the right people to get the outcomes they want. They have processes that are too complex, often because of rigid on-premises infrastructure with legacy applications and tools. Trying to fix all three at once is a daunting task for most companies.
The technology required for data and analytics success has been particularly challenging, slowing down outcomes or causing long delivery cycles. That’s why the cloud has been so helpful – it’s taken away many of the technology and process challenges. It’s also reduced the cost of managing platforms, freeing up resources for data scientists and more strategic roles.
insideAI News: How do inefficiencies in the end-to-end orchestration of analytics pipelines affect organizations’ data scientists and engineers?
Prat Moghe: Inefficiencies in data pipelines dramatically slow down outcomes and impact. Data scientists are looking for analytics-ready data so that they can start building and deploying models. They typically don't have operational, DevOps or data engineering skills, and most aren't interested in acquiring them.
The challenge has been that to get data into an analytics-ready state, it has to pass through three key pipeline stages. First, someone has to figure out how to bring data in from source systems securely. Second, they have to wrangle that data to get it ready for analytics. Finally, they have to choose the right platform to land the data, whether a cloud data lake or something else, one that works with their preferred tools and methods and meets requirements for security, compliance and governance. All of that is really hard, and it all comes before the fourth and final pipeline stage: the actual analytics and machine learning! That complicated pipeline slows down data scientists' work and limits their impact. No one likes that.
insideAI News: How can businesses better plan and execute around arguably that most important aim of any data modernization push: putting self-service analytics access and data democratization at the fingertips of the users who actually need it?
Prat Moghe: The key is that companies need the cloud to easily get data that's sitting in IT to the business users who need it. But the cloud alone doesn't solve the self-service problem. There are many different ways to adopt the cloud, and it's important to find the right kind of cloud platform. While early adopters had to build, integrate and assemble lots of cloud services themselves in complex DIY and DevOps projects, there are now many other ways to leverage the cloud. From SaaS offerings to private, fully-managed cloud data lakes, there are lots of choices, even for enterprises in highly regulated industries. If you do it correctly, and plan for global access and security, the cloud can really empower self-service and connect data with business users.
insideAI News: Particularly following Snowflake’s high-profile public offering in late 2020, there’s been renewed focus on how to make sense of data warehouses vis-à-vis data lakes. Do you believe both are important for data modernization? Has the either-or question become an outdated narrative?
Prat Moghe: Cloud data lakes and cloud data warehouses are not an “or” – they are an “and” function. Cloud data warehouses like Snowflake are great as a single-purpose engine for BI and analytics, and they make it really easy. But that SQL data warehouse function is just one of many capabilities that enterprises need to create analytic outcomes.
Enterprises need to ingest data from many different sources, which requires different tools and engines. They need a data lake where they can store and prep data, and different platforms and engines are needed for that. And then for analytics, while Snowflake’s a great option for SQL, they’ll also need a data science platform beyond that.
So, Snowflake is just one part of the architecture that companies need for data and analytics success. Companies really need to think about the best way to combine their cloud data warehouse with their cloud data lake, their data ingestion and their different analytics tools to deliver successful outcomes.
insideAI News: As a group, are you seeing industries that are riper for data and analytics modernization than others right now?
Prat Moghe: There are opportunities in all industries, particularly where there is legacy data or data sources that haven't been leveraged in the past. Then there are use cases embraced by all kinds of companies, like Customer 360, new digital products or advanced marketing analytics. Some increasingly popular industry solutions are in manufacturing, which is using data for use cases like predictive maintenance, quality improvement and productivity. Or take insurance and financial services, which use data for everything from marketing analytics to risk profiling to fraud detection. If a company has data, it's now easier to analyze, thanks to the cloud. No one needs to be held back by technology complexity.
insideAI News: Cazena launched – relatively recently – its Instant AWS Data Lake solution. What enterprise problems is that built to solve? Whose job gets easier with an instant data lake implementation?
Prat Moghe: The way we look at it, only 4-5% of IT spend has gone to the cloud so far, and that spending has largely been driven by digital natives. The mainstream market is only just getting started. Many enterprises are still on legacy, on-prem platforms, and they're looking to migrate and modernize in the cloud.
As they modernize in the cloud, many enterprises want to build cloud data lakes. They are looking for the best way to use cloud data lakes to land, prep and analyze data with their different tools. But these teams don't have the skills for the cloud.
So, the Instant AWS Data Lake is really an easy button for the AWS-native stack for data and analytics. The Instant AWS Data Lake includes EMR, QuickSight, SageMaker, Kinesis, Redshift, Glue, Lake Formation and more. It takes the best features of the AWS analytics stack and wraps them into a production-ready, turnkey, fully-managed SaaS experience. Now, mainstream companies that want to leverage cloud data lakes can get one instantly. Instead of taking months to build, these data lakes are available in minutes. You don't need any DevOps, SecOps or CloudOps people for these cloud data lakes; all you need are data scientists and data engineers, and you're ready to go.
About the Interviewee
Prat Moghe is the founder and CEO of Cazena, whose mission is to make cloud data lakes easy for enterprises. He is a successful entrepreneur with more than 18 years of experience inventing next-generation data services and building strong teams in the technology sector. As senior vice president of strategy, products and marketing at IBM Netezza, he led a worldwide 400-person team that launched the latest Netezza data warehouse appliance, which became a market leader in price and performance, as well as IBM’s first big data appliance. Following Netezza’s sale to IBM for $1.7 billion in 2010, Prat drove the company’s growth strategy and was the force behind its thought leadership in appliances and analytics.