Petabytes to Zettabytes: Operational Challenges of Cluster Infrastructure

A new digital future is taking shape — one where soon, most new data will be distributed and processed at the edge, outside of cloud data centers. IDC reports that the global datasphere – the amount of data created and consumed in the world each year – will grow from 45 zettabytes (ZB) in 2019 to 175 ZB by 2025. The majority – 59% – of that generated data is expected to be stored outside the public cloud.

Storing, operating on and processing that massive amount of data will require more infrastructure at the edge or in private clouds, and that build-out will continue for years to come. This new infrastructure will look much different from public cloud infrastructure, which consists of relatively few but immense data centers filled with endless rows of equipment. Instead, the physical data center footprint is extending out through smaller, more distributed sites, located as close to data sources as economics permit, complementing the existing centralized core.

“As we forge ahead into a new distributed infrastructure world in this new Zettabyte Generation,” says Jarrett Appleby, CEO of Appleby Strategy Group, a global advisory business, “organizations must plan how to be more modern and agile while structuring their environments and global distributed infrastructure to handle new applications collecting and analyzing data in real time.”

Infrastructure and how it will be managed is undergoing a metamorphosis due to several key factors.

Public Cloud Has Forever Raised the Bar for All IT

As more and more enterprises adopt cloud-based services, their expectations for operational support and capabilities are aligning with what they experience from global public cloud providers, whose services are delivered by huge operations teams with seemingly limitless budgets. This creates an ever-widening operational gap between business expectations and what enterprises can achieve using their own resources and budgets, or those of their managed service providers.

The expectation is that modern, up-to-date tech stacks are ubiquitous. Delivering on this falls to operations teams, who need the appropriate tools to automate and orchestrate resources and environments. There’s a lot of catching up to do – it is a major commitment of time and resources.

Data Generation Outside the Cloud

As organizations go through their digital transformation journeys, they are seeking information about their key processes and workflows, and as a result, data everywhere is exploding. Today, however, less than 10 percent of data is created outside the data center or cloud. Gartner predicts this figure will explode to more than 50 percent by 2022, and to 75 percent by 2025.

Organizations operating in the real world, including brick-and-mortar retail, manufacturing, transportation, energy, healthcare and more, are putting the infrastructure in place to measure, collect and analyze their data to understand and improve their performance.

Not only does this data need to be analyzed, it is also used to create tight feedback loops for continuous improvement – and those loops must become ever faster to respond to changing conditions.

Data Gravity is Driving Distributed Infrastructure

As more data is generated, it needs to be aggregated and processed to extract value and gain insights. With the increasing prevalence of AI and machine learning, organizations will also incorporate existing historical data – the more the better. All of this results in pockets of data gravity, manifested in distributed storage locations.

Thus, a storage cluster with data gravity attracts an increasing number of associated applications, manifested in compute clusters. It is easier and more cost-effective to add a compute cluster next to the data than to keep moving the data elsewhere.
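To make that tradeoff concrete, consider a rough back-of-the-envelope comparison. The sketch below is illustrative only – the dataset size, growth rate, egress price and link bandwidth are assumptions, not measurements from any particular environment or vendor price list:

```python
# Illustrative only: a back-of-the-envelope comparison of shipping a growing
# dataset out of a site versus processing it in place. Every figure below is
# an assumption for the sketch, not vendor pricing or a measured link.

DATASET_TB = 500            # assumed size of the dataset already at the site
MONTHLY_GROWTH_TB = 50      # assumed new data generated per month
EGRESS_COST_PER_GB = 0.05   # assumed network/egress cost in $ per GB
LINK_GBPS = 10              # assumed WAN bandwidth out of the site

def monthly_transfer_cost(tb_per_month: float) -> float:
    """Recurring cost of shipping each month's new data elsewhere."""
    return tb_per_month * 1024 * EGRESS_COST_PER_GB

def transfer_time_days(tb: float, link_gbps: float) -> float:
    """Days needed to move `tb` terabytes over the assumed link."""
    gigabits = tb * 8 * 1024          # TB -> gigabits (binary approximation)
    return gigabits / link_gbps / 86_400

if __name__ == "__main__":
    print(f"Monthly egress cost: ${monthly_transfer_cost(MONTHLY_GROWTH_TB):,.0f}")
    print(f"Days to copy the full {DATASET_TB} TB once: "
          f"{transfer_time_days(DATASET_TB, LINK_GBPS):.1f}")
```

Even with these modest assumed figures, the recurring transfer bill and the multi-day copy window make placing compute next to the data the more attractive option.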

“Historically network-centric and network-only locations are evolving to increasingly host compute and storage infrastructure,” says Sean Iraca, Founder and Principal of Double Time Consulting. “As a result, the future of cloud – public and private – is distributed and brings with it new operational challenges that have yet to be effectively addressed.”

Operators have done a phenomenal job of building high-capacity distributed networks across the globe. As silicon and optical economics continue to shrink the effective footprint of networking, these networking locations are a natural choice for deploying additional compute and storage infrastructure, effectively transforming distributed networks into distributed compute, storage and networking locations.

While 5G edge computing deployments are one example of this, other areas of the ecosystem are adjusting to this reality as well. Data centers don’t grow on trees, and the proliferation of new multi-access edge computing (MEC) data center solutions focused on extending facilities beyond the centralized core is well documented. Data center operators like Equinix that historically focused on providing space, power and interconnection are enjoying an incredibly successful moment and have started offering compute services.

Orchestration Challenges for Operating Distributed Infrastructure at Scale

Unlike the centralized public cloud core, where many applications are relatively static, locations are few, and resources are plentiful, the challenge for the industry is how to dynamically manage and orchestrate workloads across many sites where resources are scarce.

How do we make all the elements of a cluster workload-ready? This involves coordinating the physical resources – the servers, storage and networking – along with all the software-related infrastructure to create a cloud-like environment in one location. Then how do we do the same thing at 100 other locations?
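One way to picture answering that question is to treat a “workload-ready” site as data: a declarative template that is rendered into per-location definitions and reconciled everywhere. The sketch below is a minimal illustration in Python – the SiteSpec fields, version numbers and site names are hypothetical, not any specific product’s schema:

```python
# A minimal sketch of treating a "workload-ready" site as data:
# one declarative template rendered into many per-location definitions.
# SiteSpec fields, versions and site names are hypothetical.
from dataclasses import dataclass, replace

@dataclass
class SiteSpec:
    name: str
    servers: int          # physical resources the site must provide
    storage_tb: int
    k8s_version: str      # software infrastructure expected on top
    cni: str              # assumed cluster networking choice

# One template, one hundred locations: the definition is written once and
# stamped out everywhere instead of being hand-built per site.
template = SiteSpec(name="template", servers=12, storage_tb=200,
                    k8s_version="1.28", cni="calico")
sites = [replace(template, name=f"edge-{i:03d}") for i in range(100)]

print(len(sites), "site definitions generated from a single spec")
```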

Mastering Foundational Infrastructure

Foundational infrastructure is the cluster hardware and software that provides the resources and services applications depend upon. During the age of the cloud, this technology advanced rapidly as organizations embarked on their digital transformation journeys, and managing it is further complicated when these environments are distributed across multiple locations.

The first step in managing this in an agile manner is understanding the assets themselves. Infrastructure inevitably changes over time through software updates and upgrades, which can be a challenge to keep current. Applications have their own release cadences, each with its own hardware and software compatibility requirements. This complexity is no longer something that can be managed in a spreadsheet.
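What replaces the spreadsheet is a structured, queryable inventory. The following is a minimal sketch, assuming a simple in-memory model – the field names, version strings and compatibility matrix are illustrative, not drawn from any real product:

```python
# A minimal sketch, assuming a simple in-memory inventory, of the asset record
# that replaces the spreadsheet: per-node hardware, firmware and software
# versions plus a basic compatibility check. Names and versions are illustrative.
from dataclasses import dataclass

@dataclass
class NodeAsset:
    hostname: str
    site: str
    bmc_firmware: str
    nic_firmware: str
    os_version: str
    k8s_version: str

# Hypothetical compatibility matrix: OS versions each application release supports.
APP_SUPPORTED_OS = {
    "app-2.4": {"ubuntu-20.04", "ubuntu-22.04"},
    "app-3.0": {"ubuntu-22.04"},
}

def incompatible_nodes(inventory: list[NodeAsset], app_release: str) -> list[str]:
    """Hostnames whose OS version the given application release does not support."""
    supported = APP_SUPPORTED_OS.get(app_release, set())
    return [n.hostname for n in inventory if n.os_version not in supported]

inventory = [
    NodeAsset("edge-001-n1", "edge-001", "2.81", "22.31.1", "ubuntu-20.04", "1.27"),
    NodeAsset("edge-002-n1", "edge-002", "2.85", "22.36.0", "ubuntu-22.04", "1.28"),
]
print(incompatible_nodes(inventory, "app-3.0"))   # -> ['edge-001-n1']
```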

Once the foundational infrastructure can be accurately understood, the next step is codifying the deployment and management of the appropriate software infrastructure. The layering of software infrastructure continues to grow – we have seen the rise of microservices and software-defined everything – and each layer adds its own set of service requirements that must be met. Managing an application’s configuration options across the permutations of these layers leads to dizzying combinatorics that make deployment, management and troubleshooting an arduous task that slows everyone down.
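The arithmetic behind that combinatorial explosion is easy to see. The layer names and option counts below are assumptions chosen only to illustrate the scale of the problem:

```python
# Illustrative arithmetic only: how configuration permutations across
# infrastructure layers multiply. Layer names and option counts are assumptions.
from math import prod

layer_options = {
    "server firmware": 3,
    "operating system": 4,
    "container runtime": 2,
    "Kubernetes release": 4,
    "storage driver": 3,
    "CNI plugin": 3,
}

combinations = prod(layer_options.values())
print(f"{combinations} distinct stack permutations to qualify and support")
# -> 864, before any application-level configuration options are counted
```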

Too Many Snowflakes

Operations is treated as an expense, so teams must balance cost management against the required application service levels. With most organizations under tight budgetary constraints, they must consider how to do meaningfully more with less, which can lead to re-evaluating the status quo.

Historically, networking, server, and storage technologies have each required uniquely focused attention over the decades. This independent evolution has resulted in silo-specific skills and cross-silo complexity. Compare that to operations in hyperscale environments; cloud operators have ruthlessly standardized on equipment, protocols, stacks and tools to limit the number of items that need to be mastered.

In highly centralized, monolithic hyperscale environments, economies of scale have afforded billions in investment and out-of-the-box thinking, including customized versions of that equipment, those protocols, stacks and tools – and the result is state-of-the-art operations, all reflected in the price of the service. Few organizations can afford to make investments of that size.

Because each enterprise’s infrastructure is unique, the toolset needed to support it is unique as well. This requires a sizable investment by enterprises and poses challenges of its own for the IT executives and teams tasked with developing and maintaining these systems.

Focus on Innovation and Not Cluster Frustrations

While infrastructure challenges are broad and wide, infrastructure operations and management decisions will be driven by a complex cocktail of data access, intelligent, value-driven cluster management, and access to the right talent and expertise to make distributed infrastructure possible.

When enterprises reach a scale where public cloud is no longer cost-effective, they need to transition to modern infrastructure as a service (IaaS) and storage as a service (STaaS) solutions that dramatically simplify the configuration and management of the complex infrastructure needed to run today’s modern applications in private clouds.

These solutions help organizations stay on top of complex data issues by automating infrastructure provisioning and cloud resource life cycle management. Organizations can now deploy and manage new active archive systems at scale, software publishers are simplifying on-prem customer deployments, and developers can focus on application development and innovation instead of the operational effort of managing clusters.

About the Author

Mark Yin is the CEO of Platina Systems, the first to provide robust, out-of-the-box automation of infrastructure provisioning and cloud resource life cycle management spanning bare metal resources to Kubernetes clusters. Mark started Platina Systems in 2014 and has focused his work on streamlining infrastructure provisioning and operation, and on providing a foundational cloud operations software stack for any service (e.g., IaaS, PaaS or SaaS) over immutable infrastructure.
