How to Move a Petabyte of Data to the Cloud in Four Steps

Organizations of all sizes are increasingly using public cloud infrastructure, but those with hundreds of terabytes or petabytes of data find the shift to cloud more complex, more disruptive, and less flexible than is commonly presumed. The business value of cloud storage is desirable, but large data volumes present large challenges for migration, compatibility, and agility.

Below are four steps for transitioning even petabyte-scale data to cloud environments, offered by experts at SwiftStack, a company that powers hybrid cloud storage for enterprises.

1. “Drift and Shift” to cloud-native storage

By definition, data that is not yet in the cloud is stored in silos, each with specific data access protocols. Trying to “lift and shift” this data to the public cloud is extremely complicated. A “drift and shift” strategy is more practical: convert storage to a cloud-native format while it still runs on on-premises hardware. Data remains where it is today, so this step is both low cost and low risk, and it can be done over time. The business benefits of cloud storage can be achieved on premises, and the data will be ready to move to public cloud when the time is right.

2. Automate operations

Data management software with built-in automation that operates based on policies set and controlled by IT makes it possible for even a single administrator to manage a multi-petabyte hybrid cloud infrastructure in a global organization. Define the service objectives for protection, synchronization, location, access, capacity usage, etc., and let the software control the placement of data and its delivery to applications. Users consume storage with the right policies for their applications. As the business demands evolve, so can the policies controlled by IT.
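
As a rough illustration of policy-driven placement, the sketch below shows how a policy set by IT could determine where copies of data land. It is a minimal sketch, not SwiftStack's implementation: the `StoragePolicy` class, the `place` function, the region names, and the capacity figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoragePolicy:
    """Hypothetical service objectives that IT defines once."""
    name: str
    replicas: int           # how many copies to keep
    allowed_regions: tuple  # locations where copies may live


def place(policy, free_capacity_tb):
    """Choose the allowed regions with the most free capacity."""
    candidates = [r for r in policy.allowed_regions if r in free_capacity_tb]
    if len(candidates) < policy.replicas:
        raise ValueError(f"policy {policy.name!r} needs {policy.replicas} regions")
    # Rank candidate regions from most to least free space.
    ranked = sorted(candidates, key=lambda r: free_capacity_tb[r], reverse=True)
    return ranked[:policy.replicas]


# Hypothetical free capacity per location, in terabytes.
free_tb = {"onprem-nyc": 120, "onprem-lon": 300, "public-cloud-a": 900}
archive = StoragePolicy("archive", 2, ("onprem-lon", "public-cloud-a", "onprem-nyc"))

print(place(archive, free_tb))  # ['public-cloud-a', 'onprem-lon']
```

Changing the policy object, for example raising `replicas` or narrowing `allowed_regions`, changes placement without touching the applications that consume the storage.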

3. Stay flexible

All key public cloud providers (Amazon, Google, Microsoft, and Rackspace) use object storage platforms for long-term retention and governance of the end user’s data. While object storage is their default, there are enough differences under the hood, and enough proprietary technology, that moving a petabyte, or even part of a petabyte, from one provider to another may be prohibitively difficult. Data management that spans all locations and clouds provides the cross-cloud compatibility and flexibility that architects are looking to build into their infrastructure. This keeps IT in full control, allows universal management regardless of location, and prevents provider lock-in.

4. Metadata mastery

Due to technical limitations, legacy storage such as SAN and NAS systems was simply not built with metadata in mind. Cloud-native storage retains metadata with the object data, rather than in a separate database that only its own application can read. Cloud storage – whether public cloud, cloud-native on-premises, or a combination – is an ideal medium in which to take advantage of metadata. Harnessing, organizing, and analyzing metadata associated with petabytes of business data would have been unthinkable just a few years ago.
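
To make the idea of metadata traveling with the object concrete, here is a minimal in-memory sketch. It does not use any real storage API: the `ObjectStore` class, its `put` and `find` methods, and the sample keys and tags are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    key: str
    data: bytes
    # Metadata is kept with the object, not in a separate application database.
    metadata: dict = field(default_factory=dict)


class ObjectStore:
    """Toy object store used only to illustrate metadata queries."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, **metadata):
        self._objects[key] = StoredObject(key, data, dict(metadata))

    def find(self, **criteria):
        """Return the sorted keys of objects whose metadata matches every criterion."""
        return sorted(
            key
            for key, obj in self._objects.items()
            if all(obj.metadata.get(name) == value for name, value in criteria.items())
        )


store = ObjectStore()
store.put("scans/0001.tif", b"...", project="genome-7", retention="7y")
store.put("scans/0002.tif", b"...", project="genome-8", retention="7y")
store.put("logs/app.log", b"...", project="genome-7", retention="90d")

print(store.find(project="genome-7"))  # ['logs/app.log', 'scans/0001.tif']
```

Because the tags live on the objects themselves, any tool with access to the store can run the same query, which is the property the paragraph above contrasts with application-private databases.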

“Pricing based on consumption, elastic scalability, improved collaboration, and other key advantages of the public cloud are attainable goals, but those with large data volumes must be mindful of their unique environment,” said Joe Arnold, SwiftStack president and chief product officer. “Fortunately, these organizations will also find that the right cloud data management tactics and tools will unleash more value from that data, and respond as business needs and workloads evolve.”

SwiftStack brings the fundamental attributes of public cloud resources into the enterprise infrastructure: scalability, agility, and pricing based on consumption. Legacy applications can access and consume storage via file services, and cloud-native applications via object. Users gain the freedom to move workloads between private data centers and public clouds like Amazon or Google, and from cloud to cloud, according to administrative policies. Whether on premises or in public cloud, data remains under the management control of internal IT, residing wherever it is needed by users and applications. SwiftStack runs on standard commodity hardware, with a choice of servers, drives, and networking, and it scales linearly, so capacity and throughput can be added independently and cost effectively.

 
