Intel® Enterprise Edition for Lustre* Software (Intel® EE for Lustre*) has taken a leap toward greater enterprise capabilities and improved features for High Performance Computing (HPC) with the release of version 3.0. This latest version includes new security enhancements, dynamic LNet configuration support, ZFS snapshots, and other features requested by the HPC community inside and outside the enterprise. It also adds the Intel® Omni-Path Architecture drivers.
Intel® EE for Lustre* software version 3.0 contains all the major updates to the core Lustre code that are present in the Intel® Foundation Edition for Lustre* 2.7 release tree. It includes the latest stability fixes and performance enhancements from extensive production deployments and comprehensive test frameworks.
Support for Intel® Omni-Path Architecture Fabric
Momentum is growing behind Intel® Scalable System Framework, which includes the recently announced Intel® HPC Orchestrator, and behind the company’s Intel® Omni-Path Architecture (Intel® OPA) fabric. Intel EE for Lustre software 3.0 now supports Intel OPA on systems running Red Hat Enterprise Linux* (RHEL) 7.2 and newer, allowing Lustre to take advantage of the fabric’s high data rates and low latency. The Intel® OPA driver is not available for RHEL 6.x based systems.
Enhanced Security for Open Networks
Traditionally, Lustre has run in physically secure environments, where users with access are known and limited. As Intel EE for Lustre software gains traction in more enterprise environments, and more institutions use Lustre for their general purpose file system to take advantage of Lustre’s performance, built-in security features have become much more important. Thus, Intel EE for Lustre software 3.0 includes support for authentication, network encryption, and policy-based client security.
Lustre uses Kerberos authentication and encryption. Kerberos in Lustre can authenticate individual clients, users, and the servers to allow clients to mount the file system and give users access to the files. Authentication is mutual—client to server and server to client—using standard Linux* user credentials and Kerberos-generated keys. Additionally, Kerberos provides encryption for information sent over the network. The technology has several encryption algorithms at its disposal, and Intel® processors integrate hardware acceleration for the algorithms used in Lustre. With Kerberos in Lustre, Intel® EE for Lustre* software can establish trust between Lustre servers and clients, and optionally, support encrypted network communications.
On the client side, Lustre supports Mandatory Access Control (MAC) with SELinux*. While not everyone enables it on their clients, SELinux is a mature access-control platform for Linux systems. It was originally developed by the United States National Security Agency (NSA), and it is available in many Linux distributions to enforce access control policies, including Multi-Level Security (MLS). SELinux policy does not propagate to the servers, however.
Single client metadata concurrency boosts performance
Before the version 3.0 client, RPCs that modified file system metadata were issued serially, one at a time, which limited the rate at which a client could access and update data. Version 3.0 removes that restriction: multiple metadata RPCs can now be in flight simultaneously, per client, for both read and write transactions. This can improve the performance of applications with metadata-intensive workloads and per-client scaling for these workloads.
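A rough model shows why lifting the serial restriction matters. The sketch below applies Little's law to a single client: if each metadata RPC takes a fixed round-trip time, the achievable rate is bounded by the number of RPCs in flight divided by that time. The latency and in-flight values are illustrative assumptions, not measurements from Lustre, and the model ignores server-side contention.

```python
def max_rpc_rate(in_flight: int, latency_s: float) -> float:
    """Upper bound on RPCs per second for one client (Little's law)."""
    if in_flight < 1 or latency_s <= 0:
        raise ValueError("in_flight must be >= 1 and latency_s must be > 0")
    return in_flight / latency_s


# Hypothetical numbers, chosen only for illustration.
LATENCY_S = 0.0005  # assumed 0.5 ms round trip per metadata RPC

serial_rate = max_rpc_rate(1, LATENCY_S)      # one modifying RPC at a time
concurrent_rate = max_rpc_rate(8, LATENCY_S)  # assumed 8 RPCs in flight

print(f"serial bound:     {serial_rate:.0f} RPCs/s")
print(f"concurrent bound: {concurrent_rate:.0f} RPCs/s")
```

In this idealized model the bound scales linearly with the number of RPCs in flight; real gains depend on the workload and on how quickly the metadata server can service the requests.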
Intel continues commitment to OpenZFS on Lustre
OpenZFS* is a popular file system used on Lustre clusters. Intel EE for Lustre software version 3.0 updates the OpenZFS/SPL software to the latest stable version, 0.6.5, from the ZFSonLinux project. This brings with it several performance and stability improvements. One of the most significant additions is the ability to set the dataset record size up to 1 MB, which improves I/O throughput for large files.
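One way to see the effect of the larger record size is to count how many records a large file occupies. The sketch below compares the ZFS default record size of 128 KiB with the 1 MB maximum, using a hypothetical 4 GiB file; the file size is an assumption for illustration. On a real system the property is typically set per dataset with `zfs set recordsize=1M <pool>/<dataset>`.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def records_needed(file_bytes: int, recordsize_bytes: int) -> int:
    """Number of ZFS records needed to hold a file (integer ceiling division)."""
    if recordsize_bytes <= 0:
        raise ValueError("recordsize_bytes must be positive")
    return -(-file_bytes // recordsize_bytes)


file_size = 4 * GIB  # hypothetical large file

default_records = records_needed(file_size, 128 * KIB)  # ZFS default recordsize
large_records = records_needed(file_size, 1 * MIB)      # 1 MB maximum

print(f"128 KiB records: {default_records}")
print(f"1 MiB records:   {large_records}")
```

Fewer, larger records mean fewer block pointers to track and larger contiguous reads and writes per record, which is where the throughput benefit for large files comes from.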
Additionally, version 3.0 supports OpenZFS snapshots, a data protection feature that enables checkpointing of a file system volume. Common use cases for this feature include the following:
- Quick undo/undelete/roll-back in case of user/administrator error
- Prepare a consistent, read-only view of data for backup
- Prepare for a software upgrade
Intel has developed a mechanism in Lustre that uses ZFS to take a coordinated snapshot of an entire Lustre file system, provided that all of the storage targets in the file system are formatted using ZFS. The snapshot covers the whole file system and can then be mounted on a Lustre client as a separate namespace, where it appears as a separate Lustre instance.
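As a sketch of what driving this mechanism might look like, the helper below only assembles command lines; it does not run them. It assumes this release exposes the same `lctl snapshot_create` and `lctl snapshot_mount` subcommands (with `-F` for the file system name, `-n` for the snapshot name, and `-c` for a comment) as the community Lustre tools. The file system name `demo` and snapshot name `pre_upgrade` are hypothetical.

```python
import shlex
from typing import List


def snapshot_create_cmd(fsname: str, name: str, comment: str = "") -> List[str]:
    """Build an lctl command that snapshots all targets of one file system."""
    argv = ["lctl", "snapshot_create", "-F", fsname, "-n", name]
    if comment:
        argv += ["-c", comment]
    return argv


def snapshot_mount_cmd(fsname: str, name: str) -> List[str]:
    """Build the lctl command that mounts a snapshot's targets on the servers."""
    return ["lctl", "snapshot_mount", "-F", fsname, "-n", name]


# Hypothetical names; assumed to be run on the MGS node by an administrator.
create = snapshot_create_cmd("demo", "pre_upgrade", comment="before upgrade")
mount = snapshot_mount_cmd("demo", "pre_upgrade")

for argv in (create, mount):
    print(" ".join(shlex.quote(a) for a in argv))
```

Check the exact options against the documentation shipped with the installed release before using them.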
Dynamic LNet configuration simplifies LNet management and tuning
Dynamic LNet configuration (DLC) is a powerful extension of the LNet software that simplifies system administration tasks for Lustre networking. It allows an operator to add and remove interfaces and make changes to LNet while LNet is still active, without removing and reloading the kernel modules. This enables dynamic tuning and optimization while Lustre is still running on the target node. DLC also applies to LNet routers; they can be added, removed, and updated without affecting other Lustre network traffic.
With Intel EE for Lustre software version 3.0, LNet can now be managed from the command line or through a C API, rather than exclusively through kernel module options. Intel’s version still supports kernel options, but the new tools make it easier to apply changes and to audit the running configuration.
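The command-line side of DLC is the `lnetctl` utility. The sketch below wraps a few of its subcommands in small helper functions so that changes can be previewed before they are applied. The helper names, the network name `tcp1`, and the interface `eth1` are assumptions for illustration; by default nothing is executed.

```python
import shlex
import subprocess
from typing import List


def lnet_net_add(net: str, interface: str) -> List[str]:
    """Command to bring up an LNet network on an interface at runtime."""
    return ["lnetctl", "net", "add", "--net", net, "--if", interface]


def lnet_net_del(net: str) -> List[str]:
    """Command to remove an LNet network without unloading kernel modules."""
    return ["lnetctl", "net", "del", "--net", net]


def lnet_net_show() -> List[str]:
    """Command to print the running network configuration for auditing."""
    return ["lnetctl", "net", "show"]


def run(argv: List[str], dry_run: bool = True) -> str:
    """Return the command line; execute it only when dry_run is False."""
    line = " ".join(shlex.quote(a) for a in argv)
    if not dry_run:
        subprocess.run(argv, check=True)  # requires root on a Lustre node
    return line


# Hypothetical network and interface names; dry run only prints the commands.
print(run(lnet_net_add("tcp1", "eth1")))
print(run(lnet_net_show()))
print(run(lnet_net_del("tcp1")))
```

Keeping the dry run as the default makes it easy to review a planned change on a production node before committing to it.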
Online Lustre File System Consistency Checks (LFSCK)
LFSCK is an administrative tool first introduced in Lustre software release 2.3 for checking and repairing attributes specific to a mounted Lustre file system. LFSCK does this without downtime, and it can be run on the largest Lustre file systems with negligible disruption to normal operations.
Development of the LFSCK feature has progressed across multiple Lustre software releases. In this Intel EE for Lustre software release, LFSCK has the following functionality:
- Verify and repair the Object Index (OI) table, which is used internally to map Lustre File Identifiers (FIDs) to MDT internal ldiskfs inode numbers. An OI Scrub traverses the OI table and makes corrections where necessary.
- Namespace scanning verifies and repairs directory FID-in-Dirent and LinkEA.
- New in this release, layout scanning verifies and repairs MDT-OST file layout inconsistencies.
- Also new in this release, layout scanning verifies and repairs inconsistencies between multiple MDTs.
LFSCK is similar in concept to an offline FSCK repair tool for a local file system, but LFSCK is implemented to run as part of the Lustre file system, while the file system is mounted and in use.
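LFSCK scans are started and monitored with `lctl`. The sketch below builds the commands for starting a namespace or layout scan on one MDT and for reading its status; it only assembles the command lines and does not run them. The target name `demo-MDT0000` is hypothetical, and the helper names are assumptions for illustration.

```python
from typing import List

# Scan types covered by this sketch.
LFSCK_TYPES = ("namespace", "layout")


def lfsck_start_cmd(mdt: str, check: str) -> List[str]:
    """Build the lctl command that starts one LFSCK scan type on an MDT."""
    if check not in LFSCK_TYPES:
        raise ValueError(f"unsupported LFSCK type: {check!r}")
    return ["lctl", "lfsck_start", "-M", mdt, "-t", check]


def lfsck_status_cmd(mdt: str, check: str) -> List[str]:
    """Build the lctl command that reads the status of a scan on an MDT."""
    if check not in LFSCK_TYPES:
        raise ValueError(f"unsupported LFSCK type: {check!r}")
    return ["lctl", "get_param", "-n", f"mdd.{mdt}.lfsck_{check}"]


# Hypothetical target name; commands are meant to run on the MDS that hosts it.
start = lfsck_start_cmd("demo-MDT0000", "layout")
status = lfsck_status_cmd("demo-MDT0000", "layout")

print(" ".join(start))
print(" ".join(status))
```

Because the scan runs against a mounted file system, the status command can be polled while clients continue to work.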
Updates to Intel® Manager for Lustre* software
Intel EE for Lustre software version 3.0 also updates Intel® Manager for Lustre* software to include support for the latest RHEL operating system releases. The managed mode platform supports the High Availability (HA) framework updates in RHEL 7.2, as well as continuing to support RHEL 6.x servers.
The user interface has been upgraded with new workflows and improvements to responsiveness, performance, layout, and navigation. More customization options have been added to the GUI, including customization of common Corosync parameters. Alerts that pop up on the display can now be addressed from within the alert window, rather than having to change context. Search improvements have been made to the status screens, and operators can construct their own queries.
Intel and the Lustre community continue their development commitments to Lustre’s performance and functionality, giving it more capabilities requested by enterprise users while not sacrificing its performance as an HPC file system.