While trying to identify the most common pain points customers experience when protecting their data, I came across an interesting survey published by ESG Research in September 2015. While 22% of the respondents named ‘increasing cost of protection’ as their biggest challenge, another 15% cited ‘keeping pace with capacity of data to protect’, making it the second most common response. Cost is certainly an important roadblock for most respondents – however, it is the latter concern that offers us a new perspective and food for thought.
In my experience providing consultation to solution architects on data protection opportunities, I have observed that only a few truly understand the difference between long-term backup and archival. The misconception mostly arises because we approach both solutions from the perspective of their final storage targets. Although both rely on similar storage media at the backend, such as tape, object storage, and cheaper disks, the real difference lies at the source.
When long-term backups are scheduled, we take a normal copy and configure it to be retained for 10–15 years on the target storage platform. However, we do not displace or delete any of the content from the source. Essentially, we are left with the same data volume to manage and protect in every subsequent backup cycle.
Archival differs in that it does not copy the data but actually moves it from its source location to the target. This is achieved by leaving behind small placeholders called ‘stubs’, which, when opened, give users access to their files even though the file data now lives elsewhere. Archival as a strategy therefore reduces the scope of the data that must be managed and protected within the customer environment.
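The move-plus-stub mechanism can be illustrated with a minimal sketch. Real archival products implement stubs at the filesystem or application layer; the function and paths below are purely hypothetical, standing in for that behaviour:

```python
import shutil
from pathlib import Path

def archive_with_stub(source: Path, archive_dir: Path) -> Path:
    """Move a file to archive storage and leave a small stub behind.

    Unlike a backup (which copies), this removes the data from the
    source tier, shrinking the volume that future backups must cover.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / source.name
    shutil.move(str(source), str(target))       # data actually leaves the source
    stub = source.with_suffix(source.suffix + ".stub")
    stub.write_text(f"ARCHIVED -> {target}\n")  # tiny placeholder pointing at the new home
    return stub
```

The key contrast with a long-term backup is the `shutil.move`: a backup job would use a copy here and leave the source data in place, so the next backup cycle would scan the same volume again.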
If we look at the lifecycle of data in production environments, all of it is fresh and accessed frequently by end users when first created. Two years down the line, perhaps only 50% of the data is still active, with the rest accessed mainly for reporting purposes. After five years, only 20–30% of the data is accessed or altered, while approximately 60% of it is touched only on an ad-hoc basis. This ad-hoc data contains duplicate files, historical records, and old end-user data, which should be archived. Otherwise, it occupies the costlier production storage medium and keeps demanding recurring backups, security measures, and compliance controls, alongside other cost-heavy risk-based policies.
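A back-of-the-envelope calculation shows why that 60% matters for backup scope. The data-set size and backup frequency below are illustrative assumptions; only the 60% stale fraction comes from the lifecycle figures above:

```python
# Hypothetical environment: 100 TB of production data, weekly full backups.
total_tb = 100.0
stale_fraction = 0.60        # ad-hoc-only share after five years (from the text)
backup_cycles_per_year = 52  # weekly fulls (assumption)

# Long-term backup alone: the whole volume is reprotected every cycle.
backup_only_tb = total_tb * backup_cycles_per_year

# Archive the stale 60% first: only the active 40% stays in backup scope.
with_archival_tb = total_tb * (1 - stale_fraction) * backup_cycles_per_year

print(f"Data reprotected per year, backup only:   {backup_only_tb:,.0f} TB")
print(f"Data reprotected per year, with archival: {with_archival_tb:,.0f} TB")
```

Under these assumptions, archiving first cuts the yearly protected volume from 5,200 TB to 2,080 TB – and the same 60% reduction flows through to backup windows, network traffic, and licensing tied to protected capacity.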
According to industry estimates, over 40% of organizations will have supplanted their long-term backup strategy with archival by 2020 – and this trend will soon go mainstream in an era where data is growing exponentially, into the zettabytes. I believe it is only by keeping a regular check on archival and backup strategies that we can relieve customers of their recurring hurdles – the ever-increasing cost and scope of data protection.
- We need to understand the difference between long-term backup and archival, and identify the relevant use cases.
- Long-term backup incurs large direct and indirect costs, often unnecessarily.
- We can save a significant portion of the cost typically dedicated to data protection by archiving ‘stale’ data instead of blindly backing it up.