It’s commonly said that public cloud storage is cheap, but that its data transfer costs make it unsuitable for long-term archiving.
Who’s making this argument? From what I’m seeing, it’s coming from vendors selling private cloud or on-prem storage solutions.
They do a good job of making it sound like cloud storage has this hidden cost that makes it more expensive over time. You don’t pay to retrieve your data in the other models, do you? And the argument usually stops there, without any quantification.
It sounds like a legitimate argument. But as we’ll see, it isn’t.
Qualifying data transfer costs
Retrieval activity is initially difficult for an organization to estimate. It isn’t something we measure before going to the cloud. But remember, we’re talking about archiving here, and the fact is that most data becomes low-touch – if not completely inactive – after just 30 days.
Here are a few factors that’ll strongly influence your monthly data transfer activity:
- What workloads are you archiving?
- How aggressive are your archiving policies? If you’re archiving content created within the last 30 days, you should expect more retrieval.
- Are you providing end-users access to the archive? Not at all? Just select groups? Everyone? The more users accessing the archive, the more overall retrieval you should expect.
- What is your average monthly eDiscovery collection volume?
Quantifying data transfer costs
So let’s run some numbers and look at data transfer costs specifically, proportionally, and relatively.
For our examples we’ll use 100 terabytes (TB) archived in HubStor, a consumption-based cloud archive built on public cloud economics. The following chart shows data transfer costs in a given month by volume:
Indeed, data transfer costs do not exist when you own and operate the storage, so at first glance these costs induce sticker shock. But let’s dig a little deeper. Here are a few initial things to consider:
- This is purely retrieval data transfer. In our cloud model with HubStor, which transparently marks up the underlying Microsoft Azure cloud infrastructure, there’s no cost for data transfer on the way in. Each month includes 5 gigabytes (GB) of outbound data transfer at no cost. Beyond 5 GB per month, there’s a cost (around 10 cents) per GB retrieved.
- Exceeding 5% retrieval in a month is rare, especially when your archive is larger than 100 TB. Remember, we are talking about archiving here, or more precisely active archiving, which implies day-to-day user access on demand. But even in an active archive, an organization accumulates large volumes of data over time that simply aren’t touched. We’ve seen customers with 50% of their archive mapping to orphan users – where the data owner is disabled or nonexistent in their directory. The point here is that a typical archive – having content older than 30 days – sees very low retrieval activity, even when the entire knowledge worker community is authorized to access it.
- The 100% retrieval scenario would come into play only if your organization decided to leave, meaning a complete migration out. While the data transfer cost for the full 100 TB seems high, it is inexpensive compared to typical data migrations that require professional services and third-party software licenses (which, depending on the migration scenario, can run anywhere from $500 to several thousand dollars per TB).
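To make the arithmetic behind these points concrete, here is a minimal sketch of the retrieval pricing described above. It is not HubStor's billing logic. It assumes 1 TB = 1,000 GB and treats the "around 10 cents" rate as exactly $0.10 per GB, and the function and constant names are made up for illustration. The retrieval percentages are simply the ones discussed in this article.

```python
# Assumptions (not taken from the article's charts): 1 TB = 1,000 GB, and the
# "around 10 cents" rate is treated as exactly 10 cents per GB.
GB_PER_TB = 1_000
FREE_GB_PER_MONTH = 5      # free outbound allowance stated in the article
RATE_CENTS_PER_GB = 10     # approximate rate, rounded for illustration
ARCHIVE_TB = 100


def transfer_cost_usd(retrieved_gb: int) -> float:
    """Monthly outbound transfer cost in dollars for a retrieval volume in GB."""
    billable_gb = max(0, retrieved_gb - FREE_GB_PER_MONTH)
    # Multiply in whole cents first, then convert to dollars at the end.
    return billable_gb * RATE_CENTS_PER_GB / 100


for percent in (5, 10, 20, 25, 50, 100):
    retrieved_gb = ARCHIVE_TB * GB_PER_TB * percent // 100
    cost = transfer_cost_usd(retrieved_gb)
    print(f"{percent:>3}% ({retrieved_gb:>7,} GB): ${cost:>9,.2f}")

# A full exit (100% retrieval) versus the low end of the services-led
# migration range quoted above ($500 per TB).
full_exit_usd = transfer_cost_usd(ARCHIVE_TB * GB_PER_TB)
services_low_end_usd = 500 * ARCHIVE_TB
print(f"Full exit: ${full_exit_usd:,.2f} vs. at least ${services_low_end_usd:,} with services")
```

Under these assumptions, a 5% month costs roughly $500 in data transfer, and retrieving the entire 100 TB costs roughly $10,000, against at least $50,000 at the low end of the services-based migration range.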
Let’s now take a look at data transfer costs as a proportion of the cloud archive's monthly subscription fee.
This chart clearly shows that data transfer costs can range drastically based on retrieval volume. A few things to consider:
- In the above chart the ‘Everything Else’ dollar amount represents all other cloud consumption and costs associated with the 100 TB cloud archive environment (databases, web apps, storage containers, virtual machines, software, support, etc.). It's much more than just vanilla cloud storage.
- We can see that data transfer is an insignificant line item for retrieval activity less than 25% (25 TB). But it becomes a significant portion when retrieving 50% (50 TB) or more.
- In our experience, archive retrieval almost always sits below 10% in any given month. It would be a rare month to see 20% retrieval of an archive. Who is accessing 20% or more of their data that is older than 30 days each and every month? Likely no organization on earth. But as we’ll see, consumption-based archiving with public cloud is still price competitive even at 100% retrieval.
According to a 2014 storage survey by TwinStrata (now EMC/Dell):
"almost 60 percent of data heavy organizations store more than half a petabyte (PB) of inactive data and that more than a quarter of these organizations are storing at least three quarters of a petabyte of data that will rarely (if ever) be used again."
So let’s put these costs in even greater perspective. The following chart compares fully-loaded monthly consumption fees of the cloud archive against in-house storage cost estimates from leading analyst firms.
- The ESG figure is derived from this costing and Forrester’s is found here. Both in-house storage figures in the above chart are fully-burdened costs for usable capacity, and both amortize the upfront capital expense of acquiring the storage over a 4-year life span. In other words, they measure everything that goes into maintaining 100 TB of storage in the enterprise (not just the upfront capital expenditure to acquire the storage).
- The price tag of a likely 5 to 10 TB monthly retrieval volume is quite insignificant, giving us an incredibly low cost for a 100 TB active archive environment relative to what it takes to have the same 100 TB capacity in-house.
- Even if you were to see 100% retrieval on a 100 TB archive – meaning the entire archive is retrieved – your cloud archive’s monthly consumption costs are still lower than the Total Cost of Ownership (TCO) of in-house storage. Again, to be clear, this is a cloud archive built on public cloud and basing its pricing as a markup on consumption.
Data transfer costs can indeed be a significant portion in the cloud archive consumption model. But relative to in-house storage costs, it’s nowhere near being a factor that would make public cloud economics unsuitable for long-term archiving.
Could it be that traditional storage vendors want you to compare apples to oranges, setting only the upfront purchase price of their offering against the complete TCO picture of their biggest threat, the public cloud? A recent study by 451 Research concluded that enterprise spending on public cloud storage is poised to double over the next two years, at the expense of traditional on-premises storage.
Although data transfer costs are specific to public cloud storage, there are more cost factors in the in-house storage model that you simply never see in public cloud (facilities, power/cooling, backup, usable capacity, etc.). And remember that most storage equipment has a life span – typically 4 years – requiring an expensive (and sometimes painful) data migration at refresh time along with another large capital expenditure on the new gear. Remember this the next time you hear that cloud storage costs are recurring long-term. In-house storage costs are too; even more so, just differently.
And if any of these cost figures for a 100 TB environment sound big, then our storage cost white paper is a must-read: it breaks down and compares the fully-burdened cost models of a cloud archive versus in-house storage.