MapR Technologies, Inc., provider of the Converged Data Platform, announced the immediate availability of Apache Spark 1.6.1 on the MapR Converged Data Platform, making it the eighth release of the full Spark stack available to MapR customers. Additionally, the free, complete online Spark On Demand Training (ODT) courses offered through MapR Academy have achieved the highest course enrollment rate since the ODT program’s initial launch.
“We have seen a significant customer adoption of Spark for building data pipelines and advanced analytics,” said Anoop Dawar, vice president of product management, Spark and Hadoop, MapR Technologies. “MapR has fully supported the Spark stack for two years – more than any other vendor in this industry. Based on customer feedback, MapR provides early preview releases so data scientists and developers can try cutting-edge features and then follows it up with a GA release for production deployments.”
Spark continues to attract significant interest from developers, and 30% of course registrants have already become certified as MapR Certified Spark Developers. This industry credential validates a developer’s technical knowledge, skills and abilities to use Spark in an enterprise environment to process large datasets.
Apache Spark version 1.6.1 on the MapR Converged Data Platform features:
- Performance gains in the core Spark engine – With automatic memory management in Spark 1.6.1, the amounts of execution memory and storage memory can change dynamically based on workload characteristics. Execution memory can now borrow available memory from the storage region and vice versa.
- Persistence of machine learning pipelines – Spark 1.6.1 adds new machine learning features that extend persistence beyond models to the entire pipeline, including transformers and estimators. The whole workflow can be saved and reloaded without writing custom export or import code.
- Dataset API – Spark 1.6.1 introduces a new experimental interface called the Dataset API, which is an extension of the DataFrames API. Datasets rely on encoders and can be used in both Scala and Java, with Python support to be added in future releases. The biggest benefit of the new Dataset API is reduced memory usage, because it can create a more compact layout in memory when caching datasets.
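
The borrowing behavior described in the first bullet can be pictured with a small toy model. The Python sketch below is not Spark's implementation: the class `UnifiedMemoryPool`, its method names, and the `storage_share` parameter are hypothetical and chosen only for illustration, and the model leaves out details of the real engine such as evicting cached data. It only shows the idea that execution and storage share one pool, so either side can grow past its nominal region while the other side has free space.

```python
class UnifiedMemoryPool:
    """Toy model of one memory pool shared by execution and storage.

    Hypothetical illustration only; this is not Spark's memory manager.
    """

    def __init__(self, total, storage_share=0.5):
        self.total = total
        # Nominal region sizes: a starting split, not a hard limit.
        self.region = {"storage": int(total * storage_share)}
        self.region["execution"] = total - self.region["storage"]
        self.used = {"storage": 0, "execution": 0}

    def free(self):
        """Units not currently held by either side."""
        return self.total - self.used["storage"] - self.used["execution"]

    def acquire(self, kind, amount):
        """Grant up to `amount` units to `kind`; return the units granted."""
        if kind not in self.used:
            raise ValueError("kind must be 'storage' or 'execution'")
        granted = min(amount, self.free())
        self.used[kind] += granted
        return granted

    def release(self, kind, amount):
        """Give back up to `amount` units held by `kind`."""
        if kind not in self.used:
            raise ValueError("kind must be 'storage' or 'execution'")
        returned = min(amount, self.used[kind])
        self.used[kind] -= returned
        return returned

    def borrowed(self, kind):
        """Units `kind` holds beyond its nominal region."""
        return max(0, self.used[kind] - self.region[kind])


pool = UnifiedMemoryPool(total=100)   # nominal split: 50 storage, 50 execution
pool.acquire("storage", 20)           # light caching workload
pool.acquire("execution", 70)         # heavy shuffle grows into the storage region
print(pool.borrowed("execution"))     # 20

pool.release("execution", 70)         # shuffle finishes
pool.acquire("storage", 60)           # caching now grows into the execution region
print(pool.borrowed("storage"))       # 30
```

In this sketch the nominal split is only a starting point: a request is limited by the total free space in the pool, not by the requester's own region, which is what lets a shuffle-heavy job and a cache-heavy job each use most of the memory at different times.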