When it comes to maintaining SLAs, monitoring the health of complex application environments, and identifying, diagnosing and responding to issues as they arise, DevOps and data science teams face the difficult task of predicting the universe of problems they might encounter, the types of data they’ll need, and how long they’ll need to keep it. Compounding the challenge, the more successful the application, the greater the volume of data it generates. While it’s tempting to keep everything forever, doing so is neither practical nor affordable.
Hindsight is 20/20
DevOps and data science teams aren’t clairvoyant, and the expectations placed on them aren’t necessarily fair. They make their best guess given the circumstances and constraints under which they must operate. When they guess wrong, the results can be very serious – missed SLAs, lost revenue, reputational damage in the marketplace – driven by a combination of the known unknowns, issues that have been identified but whose probability and impact can’t be quantified, and the unknown unknowns, the issues that blindside you completely.
History is not the best guide
Having a good sense of the types of issues you’ve seen in the past helps guide decisions about which data to keep – which sources, tables, columns and rows, plus whatever unstructured data might be sufficient to address the problems you’ve seen before and expect to see again. But just when you think you’ve seen it all, that seemingly innocuous alert you keep snoozing is suddenly correlated with a minor but growing bug being discussed in a customer forum, and the combination reveals a major problem. That’s the kind of StuffHTF moment that makes you want to save as much data as possible.
It’s usually not feasible to save everything – the local high-performance storage for your app simply doesn’t have the capacity, and cloud storage is generally too expensive. Cloud storage also introduces additional complexity in ensuring appropriate availability and in securing those backups against ransomware and other attack vectors.
Striking the right balance
In an ideal scenario, more data is definitely better and increases the probability of faster resolution when issues arise. Traditional centralized object storage has been the de facto standard – both the go-to solution and the most limiting factor. While it is fast and readily available, it comes at a steep price.
Some cloud storage tiers are far less expensive, but the data isn’t instantly available and rapid retrieval would be prohibitively expensive. Other low-cost options require significant additional cost and complexity to deliver high availability and geographic redundancy.
With many providers, multiple storage tiers, and multiple regions and data centers to choose from, developers face a further challenge: their efforts to balance performance, price and availability create new tradeoffs in complexity and security.
Emerging technologies show promise
Although centralized cloud storage has been the dominant model for over a decade, emerging solutions like decentralized storage show promise. Decentralized cloud storage addresses a number of the challenges of storing large volumes of backup data for the long term:
- Decentralized cloud object storage uses crowd-sourced capacity instead of building and operating data centers, a significant savings that is passed along to customers
- Storing data on a distributed, decentralized network of third-party hardware providers represents a fundamental shift in the security and privacy paradigm
- Decentralized storage is inherently multi-region, with no single point of failure
- Decentralized storage is capable of very fast transfers by leveraging the massive parallelism inherent in distributed systems
With decentralized cloud storage, developers have much more flexibility in terms of the amount of data stored and the frequency with which data is backed up. With costs that are 80% lower than prices currently charged by hyperscale centralized cloud storage providers, no minimum storage terms and no mystery fees, developers can store a lot more data for the same cost.
Much lower bandwidth costs also make it easier to adopt a multi-cloud architecture without vendor lock-in or punitive transfer fees that make interoperability cost prohibitive. Moreover, default multi-region high availability and enterprise-grade SLAs mean less complexity and cost to ensure data is available when needed. And when that data is needed, it’s needed fast: the performance of parallel transfers delivers a fast RTO and RPO (Recovery Time Objective and Recovery Point Objective), ensuring your application can recover rapidly in the event of a catastrophic failure.
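As a concrete illustration, many object stores (decentralized ones included) expose an S3-compatible interface, so a backup and restore workflow can be scripted with standard tooling. The Python sketch below assumes such a gateway; the endpoint URL, credentials, bucket and object names are placeholders, and the thread pool simply shows how parallel transfers keep recovery time short.

```python
import concurrent.futures

import boto3

# All endpoint, credential, bucket and key values below are placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url="https://gateway.example.com",  # your provider's S3-compatible gateway
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)
BUCKET = "app-backups"


def back_up(paths):
    """Upload a batch of local backup files under a nightly/ prefix."""
    for path in paths:
        s3.upload_file(path, BUCKET, f"nightly/{path}")


def restore(keys, max_workers=16):
    """Download many objects in parallel; parallel transfers are what keep RTO low."""
    def fetch(key):
        s3.download_file(BUCKET, key, key.replace("/", "_"))
        return key

    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        for key in pool.map(fetch, keys):
            print("restored", key)


if __name__ == "__main__":
    back_up(["metrics.db", "traces.tar.gz"])
    restore(["nightly/metrics.db", "nightly/traces.tar.gz"])
```

Because the interface is standard S3, the same script can point at a centralized provider as well, which is part of what keeps a multi-cloud setup free of lock-in.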
How decentralized storage works
Decentralized cloud object storage works by encrypting data, splitting it into small pieces, and distributing those pieces across a network of tens of thousands of storage nodes located all over the world. When an object is uploaded to decentralized cloud storage, it is encrypted by default, split into 80 or more pieces and distributed across thousands of diverse nodes and ISPs in nearly 100 countries. There is no single point of failure or centralized location, meaning outages, downtime, ransomware and data breaches are virtually impossible. If any node goes offline for any reason, your file can be reconstituted from just 29 of its distributed pieces. And, in addition to user-assigned access grants to ensure privacy, our edge-based security model with delegated authorization provides flexible and ultra-secure access management.
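To make the encrypt, split and reconstruct idea concrete, here is a toy Python sketch. It is not Storj’s implementation: the AES-GCM encryption step, the small prime-field polynomial code and the 4-of-12 piece counts are illustrative stand-ins for the production scheme’s optimized erasure coding and 29-of-80 pieces. The property it demonstrates is the one described above: any k of the n pieces are enough to rebuild the object.

```python
import os
import random

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

P = 257  # small prime field; every byte value 0..255 fits in one field symbol


def _interpolate(points, x):
    """Evaluate at x the unique degree < k polynomial through `points`, mod P."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj == xi:
                continue
            num = num * (x - xj) % P
            den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # pow(den, P-2, P) = modular inverse
    return total


def split(data, k, n):
    """Split `data` into n pieces such that any k of them can rebuild it."""
    padlen = (-len(data)) % k
    data += bytes(padlen)
    pieces = [(x, []) for x in range(n)]
    for g in range(0, len(data), k):
        # Each k-byte group defines a polynomial through (0, b0) .. (k-1, b_{k-1});
        # piece x stores that polynomial's value at x.
        points = list(enumerate(data[g:g + k]))
        for x, symbols in pieces:
            symbols.append(_interpolate(points, x))
    return pieces, padlen


def reconstruct(pieces, k, padlen):
    """Rebuild the original bytes from any k pieces produced by split()."""
    pieces = pieces[:k]
    out = bytearray()
    for g in range(len(pieces[0][1])):
        points = [(x, symbols[g]) for x, symbols in pieces]
        out.extend(_interpolate(points, i) for i in range(k))
    return bytes(out[:-padlen]) if padlen else bytes(out)


if __name__ == "__main__":
    # Encrypt first (as decentralized storage does by default), then split.
    key = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)
    ciphertext = AESGCM(key).encrypt(nonce, b"application backup payload", None)

    k, n = 4, 12  # toy numbers; the production figures cited above are 29 of 80
    pieces, padlen = split(ciphertext, k, n)

    random.shuffle(pieces)  # any k surviving pieces will do
    recovered = reconstruct(pieces[:k], k, padlen)
    assert AESGCM(key).decrypt(nonce, recovered, None) == b"application backup payload"
    print(f"rebuilt {len(ciphertext)} ciphertext bytes from {k} of {n} pieces")
```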
Innovation expands horizons
Just as cloud computing and hyperscale data centers represented a paradigm shift in how infrastructure, applications and services are delivered – in ways previously unfathomable in an on-premises world – decentralized cloud storage represents the next seismic shift in the evolution of the cloud. It enables many new technologies and architectures that today are too slow, not cost effective, or simply not possible.
About the Author
John Gleeson is COO, Storj