Cloudera Launches Kudu, New Hadoop Storage for Fast Analytics on Fast Data


Cloudera, a leader in enterprise analytic data management powered by Apache Hadoop™, announced the public beta release of Kudu, a new columnar store for Hadoop that enables the powerful combination of fast analytics on fast data. Complementing the existing Hadoop storage options, HDFS and Apache HBase, Kudu is the first native Hadoop storage engine that supports both low-latency random access and high-throughput analytics, dramatically simplifying Hadoop architectures for increasingly common real-time use cases. A public beta of Kudu is immediately available under the Apache open source license and will be transitioned to the Apache Software Foundation incubator in the future.

Until now, developers have been forced to choose between fast analytics with HDFS and efficient updates with HBase. With the rise of streaming data in particular, demand has grown for combining the two capabilities to build real-time analytic applications on changing data, which has led developers to assemble complex architectures out of the available storage options. Kudu complements HDFS and HBase by providing fast inserts and updates alongside efficient columnar scans. This combination enables real-time analytic workloads on a single storage layer, eliminating the need for those complex architectures.

“We’ve been making Hadoop better since the very beginning,” said Charles Zedlewski, vice president, products, Cloudera. “We have an ambitious mission: to constantly drive innovation within the community to usher in the next generation of analytics supported by Hadoop, so companies can adapt to the latest technologies. Cloudera has already transformed what’s possible with Hadoop, enabling interactive data discovery and analytics with Impala and flexible data processing and streaming with Apache Spark. Kudu continues this trend by revolutionizing Hadoop’s storage architecture to better support development of real-time analytic applications, and serves as a crucial step towards solidifying Hadoop as the leading platform for modern analytics.”

Kudu’s architecture streamlines the developer experience for building analytic applications, supporting common use cases such as time series analysis, machine data analytics, and online reporting. Kudu is also designed to take advantage of changing trends in hardware and in-memory processing: it delivers strong CPU performance, makes use of RAM and flash, and, as a true columnar store, achieves high I/O efficiency. Finally, as a native, open component within Hadoop, Kudu integrates with, and provides faster query performance for, analytic frameworks that users already rely on, including Impala and Spark, enabling end-to-end analytic applications on a single platform.

Kudu was jointly engineered by Cloudera and Intel with the changing hardware landscape in mind. Intel has actively contributed to Kudu to help it take full advantage of current and future Intel processor and memory technologies, and Kudu is designed to use new persistent memory (pmem) innovations being developed through Intel’s pmem project.

“As Hadoop analytics evolve, it’s critical that they are designed with next-generation hardware in mind,” said Vin Sharma, Intel’s Director of Strategy & Products for Big Data Analytics. “Kudu is a critical milestone for Hadoop, supporting the growing need for simplified real-time applications. Intel worked with Cloudera and the community to ensure Kudu is optimized for fast analytic performance today, but is also built to use Intel’s platform advancements well into the future, such as Intel DIMMs with 3D XPoint memory.”

As an open source project, Kudu has drawn wide involvement from the community. Xiaomi, one of the world’s largest smartphone makers, has been one of the first beta users of Kudu and actively contributes to the project. Other organizations, including AtScale, Splice Machine and Zoomdata, have also been developing on Kudu.

“Xiaomi has been a long-time user of and contributor to the Hadoop ecosystem, using it to power a wide range of use cases across our business,” said Baoqiu Cui, Chief Architect at Xiaomi. “Our infrastructure team has been working with Cloudera to develop Kudu, taking advantage of its unique ability to support columnar scans and fast inserts and updates to continue to expand our Hadoop ecosystem footprint. Using Kudu, alongside interactive SQL tools like Impala, has allowed us to build a next-generation data analytics platform for real-time analytics and online reporting. We are excited to continue to work with the community to further drive Kudu and the capabilities of Hadoop as a whole.”

For organizations to continue to benefit from data-driven insights, Hadoop’s architecture has to work at the same, ever-accelerating speed at which data is being created and changed. With Kudu, the Hadoop community ushers in the next generation of Hadoop applications with storage for fast analytics on fast data.

“In the era of machine-generated data, there’s an increasing need to analyze data in human real-time. This is true across a broad range of analytic use cases, from monitoring and business intelligence to predictive modeling and recommendation,” said Curt Monash, president, Monash Research. “Kudu, Spark and the rest of the Hadoop stack are a promising approach toward eventually meeting those needs.”
