Recently, George Crump at Storage Swiss wrote about tape vs. cloud, and it prompted me to share my perspective on this fun storage industry debate.
My position is that the debate about "tape vs. cloud" is flawed.
The cloud includes cold-storage tiers that are physically tape libraries in the cloud provider’s data center, so the tape vs. cloud argument is erroneous: the cloud encompasses tape.
The discussion is better framed as “manage tape yourself vs. consume tape seamlessly in the cloud.”
Roll your own: tape for the DIYer
Advocates for tape media sell against their biggest threat: the cloud. But let’s be clear that what they’re trying to sell is an approach where you’ll need to buy tape media plus a tape library device and assign some poor soul on your IT team as the operator.
It’s like saying that planting a garden is the most cost-effective way to feed your family. While that’s certainly true, most people lack the time, skill, and patience for it.
Many people prefer to order their groceries online and have them delivered, knowing it is more expensive than planting a garden, because their most precious resource (time) is better spent on higher-value tasks that can yield a greater return than the cost savings of a do-it-yourself approach.
The reality is that there’s an opportunity cost to spending time on mundane tasks, like managing tape.
Organizations that embrace the cloud do so for efficiency reasons. The transformation typically involves rethinking your perspective on IT: In essence, no longer is IT merely a cost center; instead it becomes an enabler of business agility, innovation, and growth.
Tape in the cloud: a new user experience
Consuming tape in the cloud looks like this: Using software-based methods, I place my data in the cloud, paying for storage based on actual usage. With a simple tiering policy configured in a Web interface, I can target the datasets in my cloud storage that will move to the cloud’s tape tier seamlessly. Tiering in the cloud to the cold-storage level of tape has the immediate benefit of reducing my cloud storage spend. I can realize storage economics almost equivalent to do-it-yourself tape, yet I never have the hassles of the following:
- Acquiring a tape library and tape media,
- Operating a tape library,
- Worrying about rotating the media as it ages,
- Worrying about an offsite storage strategy,
- Worrying about the physical aspects of security and data protection.
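To make the tiering policy described above concrete, here is a minimal Python sketch of the idea. It is not HubStor's or any cloud provider's actual API: the `StoredItem` and `TieringPolicy` classes, the tier names, and the "idle for N days under a path prefix" rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class StoredItem:
    path: str
    size_bytes: int
    last_accessed: date
    tier: str = "hot"  # hypothetical tier names: "hot" or "archive"


@dataclass
class TieringPolicy:
    # Hypothetical rule: archive anything under `path_prefix` that has not
    # been accessed for at least `min_idle_days`.
    path_prefix: str
    min_idle_days: int

    def matches(self, item: StoredItem, today: date) -> bool:
        idle = today - item.last_accessed
        return (
            item.tier == "hot"
            and item.path.startswith(self.path_prefix)
            and idle >= timedelta(days=self.min_idle_days)
        )


def apply_policy(items, policy, today):
    """Move matching items to the archive tier and return the bytes moved."""
    moved = 0
    for item in items:
        if policy.matches(item, today):
            item.tier = "archive"
            moved += item.size_bytes
    return moved
```

The point of the sketch is that the policy is just data (a prefix and an age threshold). Once it is set, moving datasets to the cold tier requires no operator and no handling of media.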
The beauty of tape in the cloud, with a solution like HubStor managing it, is that you can see, search, and manage all your data holistically, regardless of what tier it’s on. Goodbye storage silos.
Recalling a file from the cloud’s tape tier involves a rehydration process that can take several hours before your data is back on disk and accessible. That’s the speed of tape no matter what. But at least with the cloud model, your effort is to simply initiate a retrieval request. If the file you’re after is on tape media physically, an automated rehydration process takes place—at no point are you handling tape media.
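The retrieval flow described above (initiate a request, wait for rehydration, then read) can be sketched with a small in-memory stand-in. Everything here is hypothetical: the `ArchiveTier` class, its method names, and the 12-hour default delay are assumptions for illustration, not a real provider's interface or service level.

```python
class ArchiveTier:
    """In-memory stand-in for a cloud archive tier (hypothetical, not a real API)."""

    def __init__(self, rehydrate_hours=12.0):
        # Assumed delay; actual rehydration times vary by provider and option.
        self.rehydrate_hours = rehydrate_hours
        self._archived = {}  # path -> bytes held on the cold tier
        self._ready_at = {}  # path -> hour at which rehydration completes

    def archive(self, path, data):
        self._archived[path] = data

    def request_recall(self, path, now_hours):
        """Start rehydration and return the hour at which the file is readable.

        Repeating the request for the same path keeps the original ready time.
        """
        if path not in self._archived:
            raise KeyError(path)
        return self._ready_at.setdefault(path, now_hours + self.rehydrate_hours)

    def read(self, path, now_hours):
        """Return the data if rehydration has finished, otherwise None."""
        ready_at = self._ready_at.get(path)
        if ready_at is None or now_hours < ready_at:
            return None
        return self._archived[path]
```

The caller's only effort is the `request_recall` call. The wait is unavoidable, but it is the provider's automation that does the work during it.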
A GDPR-compliant, discovery-ready secondary storage strategy
Magnetic tape arrived on the scene in the 1950s, courtesy of IBM. Since that time, the business world has changed. Legal discovery and demanding data privacy legislation such as the EU’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) put emphasis on an organization’s ability to search within its stored data, analyze information without delay, and act upon targeted information swiftly.
Perhaps my biggest complaint with tape is its cumbersome nature and the lack of agility it imposes on any type of discovery or precision data management.
When I worked for ZANTAZ, I remember the data restoration team was always busy and they were a major revenue generator for the company. That’s because big companies facing litigation or an audit needed help with data stored on literally thousands of tapes. The scope of the work was too much for most IT teams to deal with internally, so they outsourced the cumbersome effort of extracting the data off various tape media, loading it onto disk, and full-text indexing the content for search. Some of the jobs were big, multi-million-dollar projects.
While tape offers the lowest storage cost around, a simple GDPR right-of-access request (or a CCPA right-to-know request) can reveal the expensive side of tape.
Discovery search and data management with cloud’s tape
We’ve taken steps to optimize for the cloud’s tape tier at HubStor, specifically as it relates to indexing.
One challenge with any full-text indexing platform is that the stored indices become corrupt on occasion. In most cases, an index failure of this kind requires a re-index operation to rebuild the search index. But if the data is on tape, well, that’s going to be painful because the indexing engine will need to retrieve files to obtain their rendered text.
However, HubStor optimizes search so that indexing occurs before tiering to tape, and any re-indexing may occur without ever needing to recall the data from tape.
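One way to picture this: if the rendered text of each file is captured at indexing time and kept on an online tier, a lost or corrupt index can be rebuilt from that text alone. The sketch below illustrates that general idea under this assumption. It is not a description of HubStor's actual implementation, and the `ArchiveIndexer` class and its methods are hypothetical.

```python
from collections import defaultdict


class ArchiveIndexer:
    """Toy full-text index that retains extracted text for rebuilds."""

    def __init__(self):
        # Assumption: extracted text stays on an online tier even after the
        # original file has been moved to the cold (tape) tier.
        self.text_store = {}            # path -> extracted text
        self.index = defaultdict(set)   # term -> set of paths

    def ingest(self, path, extracted_text):
        """Index a file's text before the file is tiered to archive storage."""
        self.text_store[path] = extracted_text
        self._add_to_index(path, extracted_text)

    def _add_to_index(self, path, text):
        for term in text.lower().split():
            self.index[term].add(path)

    def rebuild_index(self):
        """Rebuild the index from retained text, without recalling any file."""
        self.index = defaultdict(set)
        for path, text in self.text_store.items():
            self._add_to_index(path, text)

    def search(self, term):
        return sorted(self.index.get(term.lower(), set()))
```

The trade-off in this sketch is a modest amount of extra online storage for the retained text in exchange for never paying the rehydration delay during a re-index.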
Altogether, the cloud’s tape experience with HubStor’s data management layer provides organizations with the following advantages:
- The storage economics of tape without the upfront cost of a tape library or the hassle of managing tape.
- Offsite storage by design, with added options for in-cloud backup (to the same cloud region, or a geographically separate region).
- A reliable discovery search that avoids the content indexing pitfalls of the past.
- Data management capabilities such as legal hold, retention, chargeback, classification, role-based access, search, cases, and reporting—all applicable to the information stored on the cloud’s tape tier, and all without involving recall from tape. (The only time you need to rehydrate a piece of information from tape is if you must open or export it.)
- A hardened security stack that protects against malware and ransomware.