Ohio State Launches High-Performance Deep Learning Project

Deep learning is one of the hottest topics at SC16 this year. Now, DK Panda and his team at Ohio State University have announced an exciting new High-Performance Deep Learning (HiDL) project that aims to bring HPC technologies to the DL field.

The High-Performance Deep Learning project was created by the Network-Based Computing Laboratory of The Ohio State University. The availability of large data sets like ImageNet and the massively parallel computation offered by modern HPC devices like NVIDIA GPUs have fueled a renewed interest in Deep Learning (DL) algorithms, triggering the development of DL frameworks such as Caffe, Torch, TensorFlow, and CNTK. However, most DL frameworks have been limited to a single node. The objective of the HiDL project is to exploit modern HPC technologies and solutions to scale out and accelerate these frameworks.

"As a first step, we have co-designed the popular Caffe framework with CUDA-aware MPI libraries (specifically with the MVAPICH2-GDR 2.2 release)," said Dr. Panda.
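With a CUDA-aware MPI library such as MVAPICH2-GDR, GPU buffers can be handed directly to MPI calls, which is the basic building block for this kind of co-design. The following is a minimal sketch, not OSU-Caffe code, of data-parallel gradient averaging over a CUDA-aware MPI library; the buffer size and variable names are illustrative assumptions.

```c
/*
 * Minimal sketch (illustrative, not OSU-Caffe code): data-parallel gradient
 * summation with a CUDA-aware MPI library such as MVAPICH2-GDR. The library
 * accepts GPU device pointers directly, so no explicit cudaMemcpy to a host
 * staging buffer is needed before the collective.
 */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const size_t n = 1 << 20;             /* illustrative gradient length */
    float *d_grad;                         /* gradients computed on the GPU */
    cudaMalloc((void **)&d_grad, n * sizeof(float));
    cudaMemset(d_grad, 0, n * sizeof(float));

    /* CUDA-aware MPI: pass the device pointer straight to MPI_Allreduce.
       Each rank contributes its local gradients; the sum lands back on every
       GPU, ready to be scaled by 1/size before the weight update. */
    MPI_Allreduce(MPI_IN_PLACE, d_grad, (int)n, MPI_FLOAT, MPI_SUM,
                  MPI_COMM_WORLD);

    if (rank == 0)
        printf("Reduced gradients across %d ranks\n", size);

    cudaFree(d_grad);
    MPI_Finalize();
    return 0;
}
```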

OSU-Caffe 0.9 Features:

  • Based on NVIDIA’s Caffe fork (caffe-0.14)
  • MPI-based distributed training support
  • Efficient scale-out support for multi-GPU node systems
  • New workflow that overlaps layer computation with communication (see the sketch after this list)
  • Efficient parallel file readers to optimize I/O and data movement
    • Takes advantage of Lustre Parallel File System
  • Exploits efficient large message collectives in MVAPICH2-GDR 2.2
  • Tested with
    • Various CUDA-aware MPI libraries
    • CUDA 7.5
    • Various HPC Clusters with K80 GPUs, varying number of GPUs/node, and InfiniBand (FDR and EDR) adapters
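To illustrate how layer-wise communication can be overlapped with computation, here is a minimal sketch using a non-blocking MPI collective (MPI_Iallreduce). It is a host-memory illustration under assumed layer counts and sizes, not the actual OSU-Caffe workflow.

```c
/*
 * Minimal sketch (illustrative, not the OSU-Caffe workflow): overlap the
 * all-reduce of one layer's gradients with back-propagation of the next
 * layer by using non-blocking MPI collectives. Layer sizes are assumptions.
 */
#include <mpi.h>
#include <stdlib.h>

#define NUM_LAYERS 4

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    float      *grads[NUM_LAYERS];
    size_t      len[NUM_LAYERS] = {4096, 2048, 1024, 512}; /* per-layer sizes */
    MPI_Request reqs[NUM_LAYERS];

    for (int l = 0; l < NUM_LAYERS; ++l)
        grads[l] = calloc(len[l], sizeof(float));

    /* Walk the layers in reverse, as back-propagation does. As soon as a
       layer's gradients are ready, start a non-blocking all-reduce and move
       on to the previous layer, so communication of layer l overlaps with
       computation of layer l-1. */
    for (int l = NUM_LAYERS - 1; l >= 0; --l) {
        /* ... back-propagate layer l here (compute stand-in omitted) ... */
        MPI_Iallreduce(MPI_IN_PLACE, grads[l], (int)len[l], MPI_FLOAT,
                       MPI_SUM, MPI_COMM_WORLD, &reqs[l]);
    }

    /* Before the weight update, wait for all outstanding reductions. */
    MPI_Waitall(NUM_LAYERS, reqs, MPI_STATUSES_IGNORE);

    for (int l = 0; l < NUM_LAYERS; ++l)
        free(grads[l]);
    MPI_Finalize();
    return 0;
}
```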

To download the OSU-Caffe 0.9 library and the associated user guides, please visit: http://hidl.cse.ohio-state.edu

DK Panda will present more details in several talks at SC16.
