
Optimal data archival with Glacier: Understanding Archival data and its challenges

21 September 2017

Backing up to Amazon Glacier offers many benefits. Amazon Glacier is an extremely low-cost storage service that revolutionizes data archiving and backup. Glacier also lets you scale your storage space as needed.

Amazon Web Services (AWS) handles all the operational procedures required for data retention, including securing and protecting your data. This further reduces costs and eliminates the operational overhead of managing physical storage infrastructure. Glacier also works with other AWS services, such as the on-demand computing of Amazon Elastic Compute Cloud (Amazon EC2).

Before exploring what Glacier is and what its benefits are, we need a clear understanding of archival data and of the challenges of in-house archive solutions.

What is Archival Data?

Before you back up to Amazon, it is imperative to understand which data qualifies as archival data. Most of the data archived in Glacier is infrequently accessed and is sometimes referred to as cold data. Amazon Glacier has been specifically designed to store archive data: typically older data that is no longer actively accessed but is still important and must be retained for future reference. Data in Glacier is stored for long periods of time; retention periods can be measured in months but are often measured in years, if not decades.


There are several reasons why archive data is retained for long periods. Some data is retained for business reasons. For instance, a media and entertainment company may need to retain its archived content indefinitely because that content makes up the company's core assets. Another reason is compliance with laws and regulations, which increasingly require a variety of data types to be retained for longer periods.

Challenges with In-House Archive Solutions

Conventional infrastructure and service providers require customers to make large upfront payments. After the initial payment, maintaining and managing the infrastructure adds further costs. The total budget also includes the resources spent keeping the infrastructure operational, such as power, cooling and similar operating costs.

Another challenge is that companies have to predict how much storage they will require: a combination of what they need at present and what they expect to need in the future, based on storage requirements compiled over past months or years. Whatever forecasting technique is used, companies end up buying capacity ahead of demand, and because no prediction is 100% accurate, part of that purchased infrastructure sits idle, sometimes for long periods. It is a fixed asset that has consumed considerable resources while providing no value.
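To make the idle-capacity problem concrete, here is a small Python sketch that measures how much of an up-front purchase goes unused over a year. The function names and the usage figures are hypothetical, chosen only to illustrate the arithmetic; they are not measurements from any real deployment.

```python
def idle_gb_months(provisioned_gb, monthly_usage_gb):
    """Capacity bought up front but left unused, summed in GB-months."""
    return sum(max(provisioned_gb - used, 0) for used in monthly_usage_gb)


def utilization(provisioned_gb, monthly_usage_gb):
    """Fraction of the purchased GB-months that actually held data."""
    stored = sum(min(u, provisioned_gb) for u in monthly_usage_gb)
    return stored / (provisioned_gb * len(monthly_usage_gb))


# Hypothetical year: 100,000 GB bought on day one, usage growing 5,000 GB a month.
usage = [40_000 + 5_000 * month for month in range(12)]
print(idle_gb_months(100_000, usage))         # 390000
print(round(utilization(100_000, usage), 3))  # 0.675
```

In this made-up scenario roughly a third of the purchased capacity-months sit idle, even though the forecast of 100,000 GB was never exceeded.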

Even after spending these resources and consuming the budget, it is still difficult, time-consuming and even more costly to ensure high durability of your data. Creating and maintaining copies of your data and distributing them across several data centers in geographically distinct sites is also a challenge. As a result, a reliable disaster recovery plan is hard to achieve with traditional methods and technology.

For data stored on in-house infrastructure, performing data integrity checks is also costly and risky.


Benefits of Amazon Glacier

Now that we know what archival data is and what challenges traditional methods and technology impose, we can explore what makes Glacier the best option for archiving your data. These benefits apply to all kinds of data: healthcare data such as research or patient information, business operational data, IP video surveillance data and more. Glacier is a multipurpose archive store that can be adapted to each user's requirements.

Cost Effective: For every application

Glacier doesn’t require any capital commitments at all, you do not need to spend on infrastructure; with Glacier all of those costs are gone. The cost of storing data is $0.01/GB/month. The difference between this cost and the cost incurred using traditional infrastructure is astronomical.

Because you are not spending on infrastructure, the associated resource and operational costs are removed as well. There are no management or maintenance costs, and no additional staff is needed. Amazon provides a console for managing your archives at no additional cost.

Durability: Data loss is almost impossible

All of Amazon’s services offer a durability of 11 nines: 99.9999999999% per year. This means that if you have one billion objects, then the chance of data loss is one object per year.

Amazon performs regular data integrity checks, and Glacier is built to self-heal automatically. Glacier supports Secure Sockets Layer (SSL) encryption for data in transit and lets you control who can access your data.
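Integrity checking also starts at upload time: the Glacier API expects a SHA-256 tree hash of the payload so the service can verify that the data arrived intact. Below is a minimal Python sketch of that checksum, following the documented scheme of hashing 1 MiB chunks and then combining the hashes pairwise. SDKs such as boto3 normally compute this for you; the function name is hypothetical and the handling of an empty payload is an assumption.

```python
import hashlib

MIB = 1024 * 1024


def sha256_tree_hash(data: bytes) -> str:
    """SHA-256 tree hash over 1 MiB chunks, returned as a hex string."""
    # Leaf level: hash each 1 MiB chunk (empty input treated as one empty chunk, an assumption).
    level = [hashlib.sha256(data[i:i + MIB]).digest()
             for i in range(0, len(data), MIB)] or [hashlib.sha256(b"").digest()]
    # Combine adjacent pairs until one root hash remains;
    # an unpaired hash at the end of a level is carried up unchanged.
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                next_level.append(hashlib.sha256(level[i] + level[i + 1]).digest())
            else:
                next_level.append(level[i])
        level = next_level
    return level[0].hex()
```

For payloads of 1 MiB or less, the tree hash is simply the ordinary SHA-256 digest of the data.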

Flexibility: Unlimited Storage Space

Glacier doesn’t impose any size limitations on the storage space. You don’t have to purchase a set limit of storage. You pay for what you consume and as your requirement increase; you can increase your storage space as well. The storage space can be scaled up to as much as you want.

You can create up to 1,000 vaults per AWS Region with a single account; to go beyond that, you can use multiple accounts. Each vault can hold an unlimited number of archives. An archive can be a single image or a zip file comprising many different files.
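Here is a sketch of the "zip file as one archive" idea: bundle several files in memory with Python's `zipfile`, then hand the bytes to a Glacier client's `upload_archive` call. The client is assumed to be a boto3 Glacier client (`boto3.client("glacier")`); the helper names, vault name and description are hypothetical, and any object exposing the same `upload_archive` method will work, which keeps the function testable without AWS.

```python
import io
import zipfile


def bundle_files(files: dict) -> bytes:
    """Pack {archive_path: bytes} into one in-memory zip file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    return buffer.getvalue()


def upload_bundle(glacier_client, vault_name, files, description):
    """Upload the zip as a single Glacier archive and return its archive ID.

    glacier_client is assumed to be boto3.client("glacier"); boto3 fills in
    the account ID and upload checksum for this call.
    """
    response = glacier_client.upload_archive(
        vaultName=vault_name,
        archiveDescription=description,
        body=bundle_files(files),
    )
    return response["archiveId"]
```

A single `upload_archive` request is limited to 4 GB; larger bundles need Glacier's multipart upload instead. Keep the returned archive ID, since it is what you use later to retrieve the archive.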

Security: SSL Encryption

Amazon Web Services (AWS) is the largest cloud service provider and holds certifications from various industry regulatory authorities. Its clients include individual consumers, enterprises and governments, so it has long handled sensitive and mission-critical data. Several procedures are in place to keep your data protected from malware and cyber-attacks, and encryption such as SSL helps AWS provide secure and reliable cloud backup.

Simplicity: Single Console Management

Glacier removes all the infrastructure otherwise needed to archive data. The extensive requirements of traditional technology make archiving complex; with no infrastructure involved, archiving becomes simpler, more convenient and easy to manage. Amazon provides a single console for managing all of your archived data.

Multiple Services: Combine diverse options for an effective archiving strategy

Amazon’s compatibility of one storage service with the other opens up a lot of options for users with different kinds of data. Enterprises have to deal with three types of data: data that is accessed frequently, data that isn’t accessed frequently and archival data. A frequently accessed data object slowly transitions from most accessed to archival data. Users can start with Amazon’s S3 service, then after a certain period of time has passed since the creation of the object; move the data to S3-IA and finally to Glacier for archival. This can be done with a very simple code and doesn’t require excessive management.

For instance, if you back up emails to the cloud, an S3 lifecycle rule can control where each email is kept. A year later, those emails may still be needed for reference, but keeping them in the frequently accessed storage tier (S3 Standard) is far from economical. With a simple rule, you can move your emails from S3 to S3 Standard-IA and then archive them in Glacier, where they can stay indefinitely at negligible cost.
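A sketch of such a rule using the S3 lifecycle configuration format, applied through boto3's `put_bucket_lifecycle_configuration`. The `emails/` prefix, rule ID, bucket name and day counts are hypothetical. The 30-day check reflects S3's requirement that objects be at least 30 days old before transitioning to STANDARD_IA; the client is assumed to be `boto3.client("s3")`.

```python
def email_archive_rule(prefix="emails/", ia_days=30, glacier_days=365):
    """Build one S3 lifecycle rule that tiers objects under `prefix` by age."""
    # S3 only transitions objects to STANDARD_IA once they are at least 30 days old.
    if ia_days < 30:
        raise ValueError("STANDARD_IA transitions need objects at least 30 days old")
    if glacier_days <= ia_days:
        raise ValueError("the Glacier transition should come after the STANDARD_IA one")
    return {
        "ID": "archive-old-emails",  # hypothetical rule name
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_days, "StorageClass": "GLACIER"},
        ],
    }


def apply_rules(s3_client, bucket, rules):
    """Set the bucket's lifecycle rules; s3_client is assumed to be boto3.client("s3").

    This call replaces any lifecycle configuration the bucket already has.
    """
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": rules},
    )
```

Usage would look like `apply_rules(boto3.client("s3"), "example-mail-backup", [email_archive_rule()])`. Because the call overwrites existing rules, any rules already on the bucket should be included in the same list.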

Conclusion

Archiving data is necessary for a variety of reasons. Using traditional technology for archiving incurs high costs and remains inefficient. Amazon Glacier resolves the issues and challenges of traditional archiving technology. Glacier offers many benefits for archiving: its storage costs are exceptionally low, it is secure and reliable, it provides unlimited storage space, and it is still easy to manage. Glacier works with other Amazon services, so users can build an archiving structure that moves data between storage tiers.
