Planes, Trains and Data Growth

In this special guest feature, Kevin Dudak from Spectra Logic looks at how often data storage boils down to issues of Space, Time, and Money.

Kevin Dudak

Working for a storage manufacturer, I get to see a lot of different storage challenges. How do we store massive data sets for long periods of time effectively? So, I should not have been surprised when I experienced my own miniature version of this challenge. I just returned last week from the 50th Annual Reno Air Races. A friend got me into taking pictures a couple of years ago, and this year I took a bit more than 15,000 pictures at the event. In fact, during the last few years, I have taken close to 50,000 pictures at the Air Races.

My data set might be small compared to some, but it brings up many of the same challenges faced in the data center, and being in the storage industry, I can't help but think we can surely come up with a better way.

All of my issues seem to be tied to Space, Time and Money.

Space: I am running out of storage space, and my current storage system at home, a 6-year-old Windows Home Server, does not have room for more hard drives. My home office doesn't really have room for another device; I am short of floor space. This mirrors my conversations with many organizations; some are always tight on capacity, and sometimes on real estate. Scaling capacity can be challenging at times, especially when the physical limitations of systems are reached.

Time: I just don’t have time to manage a growing storage system at home. I would rather be out playing with toys than configuring a new storage server. I am sure I could create my own NAS system at home starting with an open source product, but I don’t have time to build or support it. I am also starting to struggle with the amount of time it takes to find things. Much of this is my own fault, as I appear to enjoy listening to the camera take pictures a little too much. The fact is we all have a limited amount of time to devote to things like storage management, even at work. Learning multiple systems is hard, and eventually we become generalists instead of experts.

Money: Budget is a constant challenge. I don't think that will ever change. I would love to buy a small enterprise-grade NAS system for the house, but can't afford the box or the power burn. Big Data environments are seeing similar problems. The data sets are getting so large that it is hard to afford the infrastructure to support the storage, let alone the actual storage systems.

So, what would be the ideal storage system? One that offers:

  • Extreme scalability – While I might only need a dozen Terabytes or so for my pictures, big data needs hundreds of Petabytes to an Exabyte over time.
  • Self-managing – It is great that disk arrays rebuild failed drives, but it would be nice if the machines did more to take care of themselves. Keeping track of media health as well as data integrity would be fantastic. Don't just tell me the system is broken; tell me how to fix it, or better yet, that it already fixed itself while we were talking.
  • Easy to use – I don’t really want to have to come up with a new way to operate. I’d like to use the same interfaces and workflows I already know. A little change here or there is fine, but most organizations don’t have time to learn everything new every couple of years.
  • Staying power – I want this to last a long time. While I might have to upgrade components, I don't want to have to deal with massive data migrations or other disruptive changes.

In some ways, this sounds 'cloud-like.' However, the cloud doesn't really work for me. I like a lot of the flexibility, but I want the real-time access and security of having it here, on-site.

I know lots of different companies are working on their next-generation storage systems. I hope they are thinking of these things. And if you happen to have the perfect storage solution for my house, please let me know.