Analytics, AI, and Machine Learning continue to make extensive inroads into data-oriented industries, presenting significant opportunities for enterprises and research organizations. However, the potential for AI to improve business performance and competitiveness demands a different approach to managing the data life-cycle. Here are five key areas to consider carefully when creating and developing an AI data platform that ensures better answers, faster time to value, and the capability to scale rapidly.
Saturate Your AI Platform
Given the heavy investment organizations are making in GPU-based compute systems, the data platform must be capable of keeping Machine Learning systems saturated across throughput, IOPS, and latency, eliminating the risk of under-utilizing this resource.
Saturation-level I/O means eliminating application wait times. For the storage system, this requires different responses depending on application behavior. GPU-enabled in-memory databases start up faster when they are populated quickly from the data warehousing area. GPU-accelerated analytics demand large thread counts, each with low-latency access to small pieces of data. Image-based deep learning for classification, object detection, and segmentation benefits from high streaming bandwidth, random access, and fast memory-mapped calls. In a similar vein, recurrent networks for text and speech analysis benefit from high-performance random small-file access.
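As a rough illustration of what saturation means in numbers, the sketch below estimates the sustained read throughput and read rate a storage system would need so that a group of GPUs never waits on input. The function names, the parameters, the headroom factor, and the example figures are all assumptions made for illustration; they are not measurements from any particular system.

```python
def required_read_throughput_gbs(num_gpus, samples_per_sec_per_gpu,
                                 avg_sample_bytes, headroom=1.2):
    """Estimate sustained read throughput (GB/s) needed to keep GPUs fed.

    All inputs are hypothetical planning figures. `headroom` is an assumed
    safety margin (1.2 = 20% above the raw consumption rate).
    """
    if min(num_gpus, samples_per_sec_per_gpu, avg_sample_bytes) < 0:
        raise ValueError("inputs must be non-negative")
    if headroom < 1.0:
        raise ValueError("headroom must be at least 1.0")
    bytes_per_sec = num_gpus * samples_per_sec_per_gpu * avg_sample_bytes
    return bytes_per_sec * headroom / 1e9


def required_read_iops(num_gpus, samples_per_sec_per_gpu,
                       reads_per_sample=1, headroom=1.2):
    """Estimate read operations per second for small-file random access."""
    if min(num_gpus, samples_per_sec_per_gpu, reads_per_sample) < 0:
        raise ValueError("inputs must be non-negative")
    if headroom < 1.0:
        raise ValueError("headroom must be at least 1.0")
    return num_gpus * samples_per_sec_per_gpu * reads_per_sample * headroom


if __name__ == "__main__":
    # Assumed example: 8 GPUs, each consuming 2,000 images/s of ~150 KB each.
    gbs = required_read_throughput_gbs(8, 2000, 150_000)
    iops = required_read_iops(8, 2000)
    print(f"throughput needed: {gbs:.2f} GB/s, read rate needed: {iops:.0f} IOPS")
```

An estimate like this only sets a floor. Latency and access pattern, as described above, determine whether a system that meets the throughput figure on paper actually keeps the GPUs busy.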
Build Massive Ingest Capability
Ingest for storage systems means write performance: coping with large concurrent streams from distributed sources at huge scale. Successful AI implementations extract more value from data, and that success in turn tends to drive the collection of even more data. Systems should deliver balanced I/O, performing writes just as fast as reads, along with advanced parallel data placement and protection. New data sources added to augment and improve acquisition can then be accommodated at any level while the platform concurrently serves Machine Learning compute platforms.
Flexible and Fast Access to Data
Flexibility for AI means addressing data maneuverability. As AI-enabled data centers move from initial prototyping and testing towards production and scale, a flexible data platform should provide the means to scale independently in multiple areas: performance, capacity, ingest capability, flash-to-HDD ratio, and responsiveness for data scientists. Such flexibility also implies expanding a namespace without disruption, eliminating data copies and complexity during growth phases. For organizations entering AI, flexibility also means good performance regardless of the choice of data formats.
Scale Simply and Economically
A successful AI program can start with a few terabytes of data and ramp to petabytes. While flash should always be the media for live AI training data, it can become economically infeasible to hold hundreds of terabytes or petabytes of data entirely on flash. Alternative hybrid models can suffer from limitations around data management and data movement. Loosely coupled architectures that combine all-flash arrays with separate HDD-based data lakes create complicated environments for managing hot data efficiently.
Integration and data movement techniques are key here. Start small with a flash deployment and then choose your scaling strategy according to demand: either scale with flash only, or combine flash with deeply integrated HDD pools that move data transparently and natively at scale.
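The scaling choice above can be framed as a simple capacity split. The sketch below assumes that only a fraction of the total data set is hot (in active use for training) at any time and sizes the flash and HDD tiers accordingly. The function name, the hot-fraction parameter, and the example figures are hypothetical and serve only to illustrate the arithmetic.

```python
def plan_tiers(total_tb, hot_fraction):
    """Split total capacity into a flash tier (hot data) and an HDD tier.

    `hot_fraction` is an assumed share of the data that must sit on flash;
    a value of 1.0 corresponds to a flash-only deployment.
    """
    if total_tb < 0:
        raise ValueError("total_tb must be non-negative")
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    flash_tb = total_tb * hot_fraction
    hdd_tb = total_tb - flash_tb
    # The ratio is undefined when there is no HDD tier at all.
    ratio = flash_tb / hdd_tb if hdd_tb > 0 else None
    return {"flash_tb": flash_tb, "hdd_tb": hdd_tb, "flash_to_hdd_ratio": ratio}


if __name__ == "__main__":
    # Assumed growth path: start flash-only, then add HDD as data accumulates.
    for total_tb, hot_fraction in [(10, 1.0), (200, 0.5), (1000, 0.25)]:
        print(total_tb, "TB total ->", plan_tiers(total_tb, hot_fraction))
```

A split like this is only useful if the platform can move data between the tiers without manual copies, which is why the integration between flash and HDD pools matters more than the ratio itself.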
Select a Partner Who Understands the Whole Environment
Since delivering performance to the application is what matters, not just how fast the storage can push out data, integration and support services must span the whole environment, delivering faster results. This underscores the importance of partnering with a provider that really understands every aspect of the environment—from containers, networks, and applications all the way to file systems and flash. Expert platform tuning to your workflow and growth direction is paramount to removing barriers in your path to value from AI, and enabling the extraction of more insights from data.
The new AI data center must be optimized to extract maximum value from data: ingesting, storing, and transforming data, and then feeding that data through hyper-intensive analytics workflows. This requires a data platform that isn't constrained by protocol or file system limitations and that doesn't end up being excessively costly at scale. Any AI data platform provider chosen to help accelerate analytics and Machine Learning must have deep domain expertise in dealing with data sets and I/O that far exceed the capabilities of standard solutions, and must have the tools readily at hand to create tightly integrated solutions at scale.
About the Author
Kurt Kuckein is the Director of Marketing for DDN Storage, and is responsible for linking DDN’s innovative storage solutions with a customer-focused message to create greater awareness and advocacy. In this role, Kurt oversees all marketing aspects including brand development, digital marketing, product marketing, customer relations, and media and analyst communications. Prior to this role, Kurt served as Product Manager for a number of DDN solutions since joining the company in 2015. Previous roles include Product Management and Product Marketing positions at EMC and SGI. Kurt earned an MBA from Santa Clara University and a Bachelor of Arts in Political Science and Philosophy from the University of San Diego.