In this special guest feature, Ajay Anand, Vice President of Products at Kyvos Insights, gives his views on combining the speed and interactivity of OLAP with the scalability and flexibility of Hadoop. Previously, he was founder and Vice President of Products at Datameer. Ajay earned an M.B.A. and an M.S. in computer engineering from the University of Texas at Austin, and a BSEE from the Indian Institute of Technology.
Many organizations are looking for ways to offload tasks from their enterprise data warehouse to a Hadoop-based infrastructure. This is driven by the expectation that Hadoop will deliver cost-effective scalability, enabling analysis of data over longer periods and at greater granularity, as well as the flexibility to incorporate all kinds of data sources into the analysis to derive insights.
But when it comes time to deliver these benefits to business users, there are a number of challenges. The goal is to bring the benefits of Big Data analytics to business users as seamlessly and transparently as possible, without disrupting their day-to-day activities. Most business users would prefer not to change the analytics tools they are most familiar with. They should be able to keep running their current reports and queries while gaining additional insights from the data they are collecting on Hadoop. However, when business users connect existing BI tools directly to Hadoop, they are quickly disenchanted by the performance degradation they experience.
A common approach to addressing this issue is to pull data from Hadoop into an external data mart and do the analysis there. However, moving data to another database introduces delays, and there are scalability limits on the amount of data that can be processed.
Solutions are now emerging to address this problem. An approach that is gaining momentum is to bring the analytical capabilities to the data, instead of moving the data to an external data mart. This is accomplished by providing an OLAP solution directly on Big Data, combining the benefits of a scalable Hadoop infrastructure with the ease of use of OLAP-based analytics. Existing tools can then be connected to the OLAP layer to significantly improve performance. Tests have shown query times dropping from minutes to seconds once the data has been organized into OLAP cubes.
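The core idea behind an OLAP layer on Hadoop is pre-aggregation: measures are rolled up along dimension combinations once, in a batch cube-build step, so later queries touch a small set of aggregated cells instead of scanning raw data. A minimal sketch of that idea in plain Python, with hypothetical dimensions (region, product, year) and a revenue measure standing in for a real fact table:

```python
from collections import defaultdict

# Hypothetical raw fact rows: (region, product, year, revenue).
raw_rows = [
    ("EMEA", "widgets", 2015, 120.0),
    ("EMEA", "widgets", 2015, 80.0),
    ("EMEA", "gadgets", 2016, 200.0),
    ("APAC", "widgets", 2016, 50.0),
]

def build_cube(rows):
    """Pre-aggregate revenue by (region, product, year) -- the batch
    cube-build step an OLAP-on-Hadoop engine would run over HDFS data."""
    cube = defaultdict(float)
    for region, product, year, revenue in rows:
        cube[(region, product, year)] += revenue
    return dict(cube)

def query_cube(cube, region=None, product=None, year=None):
    """Answer an aggregate query from the cube: sum the cells matching
    the filter. Only pre-aggregated cells are touched, not raw rows --
    which is why cube-backed queries return in seconds, not minutes."""
    total = 0.0
    for (r, p, y), revenue in cube.items():
        if (region is None or r == region) and \
           (product is None or p == product) and \
           (year is None or y == year):
            total += revenue
    return total

cube = build_cube(raw_rows)
print(query_cube(cube, region="EMEA"))      # 400.0
print(query_cube(cube, product="widgets"))  # 250.0
```

In a real deployment the cube build is a distributed job and the query side is reached through standard SQL or MDX connectors, but the trade-off is the same: extra build time and storage in exchange for fast, repeatable aggregate queries.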
There are a number of qualitative and quantitative factors that should be considered when evaluating an OLAP solution on Hadoop, such as:
- Can it scale to deal with the size and granularity (cardinality) of the data you are looking to analyze?
- What is the response time for a diverse set of queries?
- How does it perform on cold queries, for ad hoc analysis?
- How does it perform on warm queries, for reports that are run repeatedly?
- Does it support transparent access for your business users through their tools of choice?
- Can it deal with complex relationships in your data?
- Can it deal with the diverse data formats in your data?
- Does it provide the ability for your business analysts to process data and transform it without requiring them to understand Hadoop or write code?
- Can it deal with incremental data updates?
- Can it provide concurrent access to your users without significant performance degradation?
- Is it enterprise ready to support your availability and security requirements?
- Does it support your existing security infrastructure?
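The cold- and warm-query criteria above lend themselves to a simple measurement harness. A minimal sketch, where `run_query` is a hypothetical stand-in for whatever call (JDBC, ODBC, or REST) reaches the engine under evaluation:

```python
import time
from statistics import median

def run_query(sql):
    """Hypothetical stand-in for the engine under test; replace with a
    real call into your OLAP-on-Hadoop layer."""
    return sql.upper()

def benchmark(queries, runs=3):
    """Time each query cold (first execution) and warm (median of
    repeat executions), mirroring the checklist's cold/warm criteria."""
    results = {}
    for sql in queries:
        start = time.perf_counter()
        run_query(sql)
        cold = time.perf_counter() - start
        warm_times = []
        for _ in range(runs):
            start = time.perf_counter()
            run_query(sql)
            warm_times.append(time.perf_counter() - start)
        results[sql] = (cold, median(warm_times))
    return results

stats = benchmark(["select sum(revenue) from sales group by region"])
for sql, (cold, warm) in stats.items():
    print(f"{sql[:40]}...  cold={cold:.4f}s  warm={warm:.4f}s")
```

Running the same harness with a representative mix of ad hoc and repeated report queries, at realistic data volumes and user concurrency, gives comparable numbers across candidate solutions.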
The good news is that the Hadoop ecosystem is evolving, and tools are now becoming available to provide these capabilities. We are entering a phase in which we can address the "last mile" challenges of enterprise acceptance, so that Hadoop-based Big Data infrastructures can finally deliver on their promise and provide the return on investment to justify broad adoption.