In this article we’ll drill down into the capabilities of Intel® Parallel Studio XE 2020, the latest release of a comprehensive, parallel programming tool suite that simplifies the creation and modernization of code. Using this newest release, software developers and architects can speed AI inferencing with support for Intel® Deep Learning Boost and Vector Neural Network Instructions (VNNI), designed to accelerate inner convolutional neural network (CNN) loops.
High Performance Tools for AI Developers
Data-centric software applications that help solve critical problems across a range of fields, such as HPC, artificial intelligence (AI) and deep learning, and scientific research, demand ever-accelerating performance and faster parallel processing. Developers are challenged to deliver high-performance, scalable, and reliable parallel code that takes advantage of current and next-generation hardware. In response, more cores, more and wider SIMD registers, and competitive features are continuously integrated into the latest Intel® Xeon® Scalable processors, and the new 2020 release of Intel® Parallel Studio XE makes it easier for developers to squeeze the highest performance out of Intel® platforms, today and for years to come.
Now in its 12th year, this suite of 10+ best-in-class tools and performance libraries continues to help developers optimize code for the latest multicore and many-core Intel® architectures, whether their focus is enterprise, cloud, HPC, or AI. Using this newest release, developers can harness the latest techniques in vectorization, multi-threading, multi-node, and memory optimization, with these newest capabilities:
- Speed AI inferencing with support in Intel® Compilers, Intel® Performance Libraries, and analysis tools for Intel® Deep Learning (DL) Boost with Vector Neural Network Instructions (VNNI) on 2nd generation Intel® Xeon® Scalable processors. Intel® Xeon® Scalable processors are built specifically for the flexibility to run complex AI workloads on the same hardware as your existing workloads, taking embedded AI performance to the next level with Intel® DL Boost. VNNI can be thought of as an AI inference accelerator integrated into every 2nd Gen Intel Xeon Scalable processor.
- Develop for large persistent memory, with DIMMs of up to 512 GB. Identify, optimize, and tune Intel® platforms for Intel® Optane™ DC Persistent Memory using Intel® VTune™ Profiler.
- Stay updated with the latest standards support, including Fortran 2018 features, C++17 (with initial C++20 support), and OpenMP 4.5/5.0.
- Understand and optimize platform configuration for applications through extended, coarse-grained profiling using platform-level collection and analysis in Intel VTune Profiler.
- Get HPC cloud support with low-latency, high-bandwidth communications for MPI applications using the AWS* Parallel Cluster* and AWS Elastic Fabric Adapter* in the Intel® MPI Library.
- Take advantage of support for the latest Intel® processors including Intel® Xeon® Scalable Processors (codenamed Cascade Lake/Cascade Lake AP/Cooper Lake/Ice Lake).
- Harness support for Amazon Linux* 2, the AWS* next-gen Linux OS that offers a secure, stable, high-performance execution environment to develop and run cloud and enterprise applications. [1]
- Access priority support for a full year to connect directly with Intel engineers and get quick answers to technical questions.
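To make the VNNI item above more concrete, the following is a minimal sketch in plain Python of the kind of arithmetic such an instruction performs: groups of four unsigned 8-bit activations are multiplied by four signed 8-bit weights, and the sum is added to a 32-bit accumulator in each lane. This is an illustrative emulation only, not Intel code. The function names `to_int32` and `vnni_dot_accumulate` are hypothetical, and the wrap-around (non-saturating) accumulation is an assumption made for this example.

```python
# Illustrative emulation of a VNNI-style int8 dot product (hypothetical helper names).


def to_int32(value):
    """Wrap a Python int to signed 32-bit (assumed non-saturating accumulation)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def vnni_dot_accumulate(acc, activations_u8, weights_s8):
    """Accumulate groups of four u8*s8 products into int32 lanes.

    acc            -- list of int32 accumulators, one per lane
    activations_u8 -- unsigned 8-bit values (0..255), four per lane
    weights_s8     -- signed 8-bit values (-128..127), four per lane
    """
    if len(activations_u8) != 4 * len(acc) or len(weights_s8) != 4 * len(acc):
        raise ValueError("expected four 8-bit values per 32-bit lane")
    if any(not 0 <= a <= 255 for a in activations_u8):
        raise ValueError("activations must fit in unsigned 8 bits")
    if any(not -128 <= w <= 127 for w in weights_s8):
        raise ValueError("weights must fit in signed 8 bits")

    result = []
    for lane, lane_acc in enumerate(acc):
        base = 4 * lane
        # Four multiplies and their sum happen as one fused step per lane.
        products = sum(
            activations_u8[base + k] * weights_s8[base + k] for k in range(4)
        )
        result.append(to_int32(lane_acc + products))
    return result


# Two lanes: the first starts from an accumulator of 10, the second from 0.
acc = vnni_dot_accumulate(
    [10, 0],
    [1, 2, 3, 4, 255, 255, 255, 255],
    [1, -1, 2, -2, 127, 127, 127, 127],
)
print(acc)  # [7, 129540]
```

Inner loops of quantized convolutions are dominated by exactly this multiply-and-accumulate pattern, which is why fusing it into a single vector operation helps inference throughput.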
Deep Learning Parallelism
Deep learning has a remarkable capacity to tackle data-centric problems across a wide spectrum of domains, such as object detection for autonomous vehicles, facial recognition, and natural language processing (NLP) for conversational AI, among many others.
Parallel programming plays a key role in deep learning by enabling software to take advantage of multicore and many-core systems to accelerate the training of deep neural networks (DNNs). Parallelism helps keep compute-intensive workloads fast, reliable, and scalable, which is particularly important when training DNNs with optimization methods such as stochastic gradient descent (SGD) and its popular weight update rules and hyperparameters: learning rate, adaptive learning rate, momentum, Nesterov momentum, AdaGrad, RMSProp, and Adam. These methods are all rooted in linear algebra and multivariable calculus (gradients built from partial derivatives). Running them in parallel can shorten training times dramatically. An excellent survey paper that explores parallelism from a theoretical perspective is “Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis,” by Ben-Nun and Hoefler.
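As a rough illustration of two of the update rules named above, here is a minimal Python sketch of the plain gradient step and the classical momentum rule. It makes several simplifying assumptions: the loss is a toy one-dimensional quadratic, f(w) = (w - 3)^2, the gradient is exact rather than estimated from a random mini-batch, and the function names and hyperparameter values are chosen only for this example.

```python
# Toy comparison of two weight update rules (illustrative only).


def grad(w):
    """Exact gradient of the toy loss f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


def sgd(w, lr=0.1, steps=100):
    """Plain gradient step: w <- w - lr * grad(w)."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w


def sgd_momentum(w, lr=0.1, beta=0.9, steps=300):
    """Classical momentum: a velocity term accumulates past gradients."""
    velocity = 0.0
    for _ in range(steps):
        velocity = beta * velocity - lr * grad(w)
        w += velocity
    return w


w_plain = sgd(0.0)
w_momentum = sgd_momentum(0.0)
print(round(w_plain, 3), round(w_momentum, 3))  # both approach the minimum at 3.0
```

In real training, `grad` would be replaced by a gradient computed over a mini-batch of examples, and that per-batch work is where most of the opportunity for parallelism lies.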
Using tools like the Intel® Parallel Studio XE 2020 tool suite allows developers to capitalize on this parallelism in deep learning, specifically parallel deep convolutional neural network (CNN) training. CNN training is a computationally intensive task whose parallelization has become critical in order to complete the training within an acceptable time period. Many of the tool suite’s features enable the use of deep learning technology.
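One common way to parallelize training is synchronous data parallelism: each worker computes a gradient on its own shard of the data, the gradients are averaged, and a single update is applied. The sketch below shows that structure in Python on a tiny linear model, not a CNN. Everything in it (the function names, the toy data set, the learning rate, and the use of a thread pool) is an assumption made for illustration; Python threads show the pattern but will not deliver a real speedup for this pure-Python arithmetic.

```python
# Structure of synchronous data-parallel gradient descent (illustrative only).
from concurrent.futures import ThreadPoolExecutor


def shard_gradient(params, shard):
    """Mean-squared-error gradient for y ~ w*x + b on one data shard."""
    w, b = params
    gw = gb = 0.0
    for x, y in shard:
        err = (w * x + b) - y
        gw += 2.0 * err * x
        gb += 2.0 * err
    n = len(shard)
    return gw / n, gb / n


def data_parallel_step(params, shards, lr, pool):
    """One synchronous step: per-shard gradients, averaged, then applied."""
    grads = list(pool.map(lambda s: shard_gradient(params, s), shards))
    gw = sum(g[0] for g in grads) / len(grads)
    gb = sum(g[1] for g in grads) / len(grads)
    return params[0] - lr * gw, params[1] - lr * gb


def train(data, workers=4, lr=0.05, steps=2000):
    # Equal-size shards, so the average of shard gradients equals the
    # gradient over the full data set.
    shards = [data[i::workers] for i in range(workers)]
    params = (0.0, 0.0)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(steps):
            params = data_parallel_step(params, shards, lr, pool)
    return params


# Toy data drawn from the line y = 2x + 1.
data = [(x / 10.0, 2.0 * (x / 10.0) + 1.0) for x in range(20)]
w, b = train(data)
print(round(w, 3), round(b, 3))  # close to 2.0 and 1.0
```

In a production setting the same pattern would typically be expressed with compiled, vectorized kernels on each worker and a collective operation such as an MPI all-reduce for the averaging step.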
Multiple Editions
Intel® Parallel Studio XE 2020 comes in three editions, each catering to specific levels of developer needs:
- Composer Edition – Includes Intel® C++ and Fortran compilers, performance libraries, and performance-optimized Python* libraries.
- Professional Edition – Includes everything in the Composer Edition, plus performance profiling, a memory and thread debugger, and design tools to simplify adding threading and vectorization.
- Cluster Edition – Includes everything in the Professional Edition, plus an MPI library, MPI profiling and error-checking tools, and an advanced cluster diagnostic expert system tool.
Conclusion
This latest release of a tried-and-proven tool suite simplifies the creation and modernization of code and accelerates workloads using the latest techniques in vectorization, multi-threading, multi-node, and memory optimization. It combines industry-leading, standards-based compilers, award-winning numerical libraries, performance profilers, and code analyzers so that C/C++, Fortran, and Python developers can confidently optimize software and deliver high-performance code that scales efficiently on today’s and future Intel® platforms.
[1] Supported features of tools and libraries may vary by instances and configurations.