MapR Converges SQL and JSON with Apache Drill v1.6

MapR Technologies, Inc., provider of the Converged Data Platform, announced the availability of Apache Drill 1.6, which serves as the unified SQL layer for the MapR Converged Data Platform through tighter integration with MapR-DB. Customers and partners gain the flexibility to run reporting and analytics on JSON data stored in MapR-DB tables, realizing faster time-to-value from the insights gleaned from operational data.

According to Hadoop Weekly*, “The Apache Drill project has one of the fastest release velocities in the Hadoop ecosystem with a new release nearly every month.” Version 1.6 of Apache Drill, now available on the MapR Converged Data Platform, offers a new MapR-DB document database plugin, enhanced performance and scale, and an optimized experience with Tableau and other BI tools.

Interest in and adoption of Drill, which has been recognized as one of the best open source big data technologies, continue to grow. Thousands of users have downloaded Drill, and numerous organizations run it in production, interactively analyzing up to petabytes of data. Additionally, more than 6,000 BI analysts and developers worldwide have completed Drill training courses through the free MapR On-Demand Training program.

“Apache Drill is a game changer for us,” said Edmon Begoli, CTO of PYA Analytics. “Most recently, we have been able to query, in under 60 seconds, two years’ worth of flat PSV files of claims, billing, and clinical data from commercial and government entities, such as the Centers for Medicare & Medicaid Services. Drill has allowed us to bypass the traditional approach of ETL and data warehousing, convert flat files into efficient formats such as Parquet for improved performance, and use plain SQL against very large volumes of files.”

Highlights of Drill 1.6 include:

  • Flexible and operational analytics on NoSQL – The new MapR-DB document database plugin allows analysts to run SQL queries directly on JSON data stored in MapR-DB tables. The plugin offers a variety of pushdown capabilities to provide an optimal interactive experience.
  • Enhanced query performance – Improves query performance on data in Hadoop and NoSQL systems through numerous query planning enhancements, such as partition pruning and metadata caching. Query planning is up to 10 to 60 times faster than in previous releases of Drill.
  • Better memory management – Delivers greater stability and scale, enabling customers to run both larger and more numerous SQL workloads on a MapR cluster.
  • Improved integration with visualization tools like Tableau – Speeds up metadata queries and introduces client impersonation for end-to-end security, from the visualization tool down to data in Hadoop. Version 1.6 also provides enhanced SQL window functions.

Drill supports a variety of use cases. For example, media companies can instantly query and analyze incoming content delivery network (CDN) files without requiring data transformations, allowing them to analyze several terabytes of CDN logs and reduce customer attrition. High-tech chip manufacturers can develop offerings that better analyze dropped calls and share that information with their handheld device partners, thereby improving quality of service. Communications providers can instantly query and analyze cell tower logs, enabling mobile operators to proactively monitor and improve the subscriber experience.

“Operational analytics on document databases such as MapR-DB is a rapidly growing use case,” said Neeraja Rentachintala, senior director, Product Management, MapR Technologies. “For the first time, there is a stack that allows BI developers and business analysts to store and query data in native formats without cumbersome ETL or transformation, providing end-to-end flexibility and scale.”

*Hadoop Weekly #162, March 20, 2016
