Hortonworks Delivers Spark at Scale for the Enterprise

Hortonworks, Inc. (NASDAQ: HDP), announced advancements to Hortonworks Data Platform (HDP™) with the in-memory analytic capabilities of Apache Spark. Apache Spark 1.5.2 is now generally available on HDP and includes support for Spark SQL and Spark Streaming. Hortonworks’ commitment to Spark is guided by helping customers accelerate data science, maintain seamless data access, drive innovation at the core and ultimately scale for the enterprise.

“We continue to see customers across all industries derive real value from using Spark with Hortonworks Data Platform,” said Arun Murthy, Founder and Vice President of Engineering at Hortonworks. “Our customers rely on us to guide them on their Spark journey and our ability to scale Spark against massive data sets is unparalleled. With the inclusion of Spark 1.5.2 on the Hortonworks Data Platform, customers can now get Spark that scales.”

Accelerating Apache Spark

Hortonworks continually expands its investment in Spark so that customers can deploy modern, Spark-based applications alongside Hadoop workloads in a consistent, predictable and reliable way. Hortonworks aims to give customers the easiest path for adopting Spark with Hadoop and to allow for innovation at scale. Three main areas of capability make Spark on HDP well suited to enterprise requirements:

Data Science Acceleration – Improving data science productivity by enhancing Apache Zeppelin and by contributing additional Spark algorithms and packages to ease the development of key solutions.

Seamless Data Access – Hortonworks is improving Spark’s integration with YARN, HDFS, Hive, HBase and ORC. Specifically, it is working to further optimize data access via the new Data Source API. This should allow Spark SQL users to take full advantage of the following capabilities:

  • ORC File instantiation as a table
  • Column pruning
  • Language integrated queries
  • Predicate pushdown
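
Two of the capabilities above, column pruning and predicate pushdown, can be illustrated with a small toy sketch. This is not Spark's or ORC's implementation; the `Stripe` class and `scan` function are hypothetical names invented for illustration, and the sample data is made up. The sketch assumes only the general idea that a columnar format stores values column by column in chunks and keeps min/max statistics per chunk, so a reader can skip chunks that cannot match a filter and read only the columns a query needs.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Stripe:
    """Hypothetical chunk of rows stored column by column."""
    columns: Dict[str, List[int]]

    def min_max(self, name: str) -> Tuple[int, int]:
        # A real columnar format stores these statistics precomputed;
        # here they are derived on the fly to keep the sketch short.
        values = self.columns[name]
        return min(values), max(values)


def scan(stripes, wanted, filter_col, lower_bound):
    """Return rows holding only `wanted` columns where filter_col > lower_bound.

    Also returns how many stripes were actually read.
    """
    rows = []
    stripes_read = 0
    for stripe in stripes:
        # Predicate pushdown: skip the stripe when its statistics
        # prove that no row in it can satisfy the filter.
        _, highest = stripe.min_max(filter_col)
        if highest <= lower_bound:
            continue
        stripes_read += 1
        matching = [i for i, v in enumerate(stripe.columns[filter_col])
                    if v > lower_bound]
        # Column pruning: only the requested columns are materialized;
        # other columns (here "score") are never touched.
        for i in matching:
            rows.append({c: stripe.columns[c][i] for c in wanted})
    return rows, stripes_read


# Made-up sample data: two stripes of three rows each.
stripes = [
    Stripe({"id": [1, 2, 3], "age": [21, 25, 30], "score": [7, 8, 9]}),
    Stripe({"id": [4, 5, 6], "age": [41, 38, 52], "score": [5, 6, 4]}),
]

rows, stripes_read = scan(stripes, wanted=["id"], filter_col="age", lower_bound=35)
print(rows, stripes_read)
```

In this sketch the first stripe is skipped entirely because its highest `age` value does not exceed the bound, and the `score` column is never read. A query engine working through a data source interface gains in a similar way when filters and column selections are handed down to the file reader rather than applied after a full scan.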

Innovation at the Core – Contributing additional machine learning algorithms and enhancing Spark’s enterprise security, governance, operations, and readiness.

Furthering Community Innovation

In an effort to continually spur community innovation across all open technology, Hortonworks has launched Hortonworks Community Connection, a new online collaboration destination where members can share code examples on GitHub, ask questions and build a knowledge base in this fast-changing technology era. This new online community is an extension of Hortonworks’ open source roots and underscores its commitment to open source by providing a community destination to better collaborate and interact with customers, developers and partners.

Hortonworks Community Connection currently hosts thousands of technical articles and FAQs on Hadoop, Spark and other big data technologies contributed by Hortonworks engineers and customers with deep expertise. Hortonworks Community Connection is open to everyone in the community and anyone can access the content and code. Visit and contribute to the new Hortonworks Community Connection at http://hortonworks.com/community/

 
