DAOS Delivers Exascale Performance Using HPC Storage So Fast It Requires New Units of Measurement

Forget what you previously knew about high-performance storage and file systems. New I/O models for HPC such as Distributed Asynchronous Object Storage (DAOS) have been architected from the ground up to make use of new NVM technologies such as Intel® Optane™ DC Persistent Memory Modules (Intel Optane DCPMMs). With latencies measured in nanoseconds and bandwidth measured in tens of GB/s, new storage devices such as Intel DCPMMs redefine the measures used to describe high-performance nonvolatile storage.

DAOS is an extremely lightweight I/O infrastructure that operates end-to-end in user space with full operating system bypass to deliver the performance capability of sub-microsecond storage devices. DAOS offers a shift away from the traditional HPC block-based, high-latency POSIX storage model to one that inherently supports fine-grained data access and unlocks the performance of next generation storage systems.

The Argonne Leadership Computing Facility will be the first major production deployment of the DAOS storage system as part of Aurora, the first US exascale system, coming in 2021. The DAOS storage system is designed to provide the metadata operation rates and bandwidth required for I/O-intensive workloads on an exascale-level machine. [1]

Designed to bypass traditional POSIX I/O bottlenecks

The key to DAOS performance when used in combination with Intel Optane DCPMMs is that the CPU can communicate with storage directly from user space through memory load/store semantics via Direct Access (DAX), available with second-generation Intel® Xeon® Scalable processors.

The difference is significant, as illustrated in Figure 2: direct user-space access eliminates the performance-limiting layers of the traditional POSIX I/O path that increase latency and reduce storage bandwidth, such as the operating system call path, the page cache, the block driver, and the PCIe bus.
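
To make the user-space data path concrete, here is a minimal sketch (using PMDK's libpmem rather than the DAOS API itself) of storing data durably with no system call, page cache, or block driver on the data path. The /mnt/pmem mount point is an assumption for illustration; build with cc demo.c -lpmem.

    /* Minimal libpmem sketch: map a file on a DAX filesystem and persist
     * a store entirely from user space (assumes /mnt/pmem is DAX-mounted). */
    #include <libpmem.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        size_t mapped_len;
        int is_pmem;

        /* Map the file directly into our address space. */
        char *addr = pmem_map_file("/mnt/pmem/demo", 4096, PMEM_FILE_CREATE,
                                   0666, &mapped_len, &is_pmem);
        if (addr == NULL) {
            perror("pmem_map_file");
            return 1;
        }

        /* Ordinary store instructions write the data... */
        strcpy(addr, "hello, persistent memory");

        /* ...and a user-space cache flush makes it durable. */
        if (is_pmem)
            pmem_persist(addr, mapped_len);
        else
            pmem_msync(addr, mapped_len); /* fallback for non-pmem storage */

        pmem_unmap(addr, mapped_len);
        return 0;
    }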

Overall, the DAOS API and data model provide a new, scalable storage foundation for structured and unstructured data sets that can capitalize on the performance of Intel Optane non-volatile memory. DAOS supports non-blocking data and metadata operations, and a burgeoning software ecosystem already offers a number of rich data models built on it.

Figure 2: DAOS is a non-POSIX rich storage API that provides a new foundation for HPC I/O

DAOS is open-source, available now

DAOS is available now. Simply download it from the DAOS GitHub repository and install it to evaluate the benefits in your environment. Numerous paths can be followed to test and integrate DAOS into your HPC application environment:

  • A high-performance MPICH and ROMIO middleware layer is provided, which means many HPC codes will realize full DAOS performance immediately, without code modification.
  • Similarly, libdfs implements files and directories over the DAOS API by encapsulating a POSIX namespace in a DAOS container. This library can be linked directly with the application or mounted locally through FUSE (see the POSIX sketch after this list).
  • Native C, Python, and Go interfaces are provided, which means most HPC applications can be ported to directly utilize DAOS.
  • Other popular I/O middleware layers have been integrated with DAOS, including HDFS for Hadoop codes.
  • Persistent Memory Over Fabric (PMOF) enables replication of data remotely between machines with persistent memory.
  • For those in a rush, the DAOS software can be tested without persistence using DRAM via tmpfs.
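
As a minimal illustration of the POSIX path mentioned above, the sketch below writes a file through a dfuse mount of a DAOS POSIX container. The /mnt/dfuse mount point is an assumption for illustration, and no DAOS-specific calls appear in the application code.

    /* Plain POSIX I/O landing in DAOS via a dfuse-mounted container
     * (assumes the container is already mounted at /mnt/dfuse). */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        const char msg[] = "checkpoint step 42\n";

        int fd = open("/mnt/dfuse/checkpoint.dat",
                      O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd < 0) {
            perror("open");
            return 1;
        }
        if (write(fd, msg, sizeof(msg) - 1) < 0)
            perror("write");
        close(fd);
        return 0;
    }

Linking libdfs into the application instead of going through the FUSE mount avoids the extra kernel round trip and recovers more of the native DAOS performance.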

Intel® Optane™ DCPMM technology redefines storage performance – even without DAOS

By ingeniously packaging Intel Optane memory in a DDR form factor so it can sit on the processor’s memory bus, Intel Optane DCPMM requires new units of measure for the performance of non-volatile storage. When properly optimized, Intel Optane DCPMM devices can deliver tens of GB/s of throughput at nanosecond latencies, versus the single-digit GB/s at microsecond latencies seen with high-performing NAND-based flash storage devices. [2] Even without DAOS, Intel Optane DCPMM can make key storage applications up to 17 times faster. [3] To do so, simply use the Intel Persistent Memory Development Kit (PMDK) to perform the transactional operations necessary to keep persistent data consistent and durable, or run with the XFS, EXT4, or NTFS file systems optimized to use Intel® Optane™ persistent memory.
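
As a hedged sketch of the PMDK transactional path mentioned above, the example below uses libpmemobj to update a persistent counter so that a crash leaves either the old value or the new one, never a torn write. The pool path and layout name are assumptions for illustration; build with cc tx.c -lpmemobj.

    /* Crash-consistent update with PMDK's libpmemobj (the pool path is an
     * assumption; the pool file lives on a DAX-mounted filesystem). */
    #include <libpmemobj.h>
    #include <stdint.h>
    #include <stdio.h>

    struct counter {
        uint64_t value;
    };

    int main(void)
    {
        /* Create the pool on first run, open it on later runs. */
        PMEMobjpool *pop = pmemobj_create("/mnt/pmem/counter.pool", "counter",
                                          PMEMOBJ_MIN_POOL, 0666);
        if (pop == NULL)
            pop = pmemobj_open("/mnt/pmem/counter.pool", "counter");
        if (pop == NULL) {
            perror("pmemobj_create/open");
            return 1;
        }

        PMEMoid root = pmemobj_root(pop, sizeof(struct counter));
        struct counter *c = pmemobj_direct(root);

        /* The transaction undo-logs the range before modifying it, so the
         * increment is atomic with respect to a crash or power failure. */
        TX_BEGIN(pop) {
            pmemobj_tx_add_range(root, 0, sizeof(struct counter));
            c->value += 1;
        } TX_END

        printf("counter = %lu\n", (unsigned long)c->value);
        pmemobj_close(pop);
        return 0;
    }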

Along with storage, Intel Optane DCPMM can also serve as a non-persistent memory tier that augments main memory. This means that “fat” computational nodes can contain many terabytes of main memory delivering performance comparable to DRAM for many real-world applications, [4] which helps HPC simulations that struggle with memory capacity limits on conventional DRAM-only systems, without requiring any changes to the application. [5]

Figure 3: There are many form factors and ways to use Intel® Optane™ memory

Overall, a burgeoning software ecosystem shows what is possible for many HPC applications, including those that use extremely large, globally accessible data sets. Intel-optimized Apache Spark, for example, doubles throughput and reduces runtime by up to 40% for data-centric workloads. Other partner technologies such as SAP HANA, Redis Labs, AsiaInfo, and Aerospike, along with a number of ISVs, demonstrate how the bottleneck imposed by today’s popular HPC parallel distributed file systems, such as Lustre and GPFS, can be eliminated.

In terms of raw performance, the graphic below shows that memory-bus-attached Intel Optane DCPMM persistent memory can be accessed in nanoseconds. By comparison, a modern NAND-based SSD, which plugs into the PCIe bus and communicates using the NVM Express protocol, takes over 80 µs to read a single block of data.

Figure 4: Extreme performance with Intel® Optane™ DC Persistent Memory

Summary

Forget what you previously knew about high-performance storage and file systems. In combination, DAOS plus Intel Optane DCPMM provide storage that is more performant (lower latency, higher bandwidth and IOPS), more scalable (millions of MPI tasks and thousands of storage servers), and more capable (fine-grained data access, a native object API, a key-value store) than previous-generation storage for HPC applications and workflows.

Find more information about how DAOS revolutionizes high-performance storage with Intel Optane DC Persistent Memory.

[1] https://www.marketscreener.com/INTEL-CORPORATION-4829/news/Intel-Data-Centric-Portfolio-Accelerates-Convergence-of-High-Performance-Computing-and-AI-Workload-28774002/

[2] https://lenovopress.com/lp1083.pdf

[3] https://ucsdnews.ucsd.edu/pressrelease/intels_optane_dimms

[4] https://arxiv.org/pdf/1903.05714.pdf

[5] https://www.research.ed.ac.uk/portal/en/publications/an-early-evaluation-of-intels-optane-dc-persistent-memory-module-and-its-impact-on-highperformance-scientific-applications(00807d1c-b95f-449a-8aaf-7e95d8fb6e45).html

For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. Check with your system manufacturer or retailer or learn more at intel.com.

Intel, the Intel logo, and Optane are trademarks of Intel Corporation or its subsidiaries.

Other names and brands may be claimed as the property of others. 

© Intel Corporation.