AWS

Amazon Glacier in 5 min

What is Amazon Glacier?

Amazon Glacier is an extremely low-cost storage service that provides secure, durable, and flexible storage for data backup and archival. Remember that it is not a backup tool in itself; it is a storage service for backups and archives. It is a managed service from Amazon, so there is no need to worry about capacity planning.

You can store an unlimited amount of data at a low price, $0.004 per gigabyte per month. The main purpose of the service is long-term retention: you can keep data for years or decades to meet compliance requirements.
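As a quick sanity check on that price, here is a rough sketch that estimates storage cost from the $0.004 per GB-month figure quoted above. The function name is made up for illustration, and the estimate ignores request, retrieval, and data-transfer charges as well as regional price differences.

```python
from decimal import Decimal

# Price quoted in this post; actual pricing varies by region and over time.
PRICE_PER_GB_MONTH = Decimal("0.004")

def estimate_storage_cost(size_gb, months=1):
    """Return an estimated storage-only cost in USD (hypothetical helper)."""
    return PRICE_PER_GB_MONTH * Decimal(size_gb) * Decimal(months)

# 10 TB (taken as 10,000 GB) kept for 7 years (84 months)
print(estimate_storage_cost(10_000, months=84))
```

Under those assumptions, 10 TB kept for 7 years comes to about $3,360 in storage charges.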

Core components:

1. Archives
2. Vaults

Archives:
An archive is your actual data (like an object in S3); everything you store is stored as an archive. An archive can contain any data, such as photos, videos, or documents. You can upload a single file as an archive, or aggregate multiple files into a TAR or ZIP file and upload that as one archive. The total volume of data and the number of archives you can store are unlimited. Individual Amazon Glacier archives can range in size from 1 byte to 40 terabytes.
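To illustrate the "aggregate multiple files into a TAR" idea, here is a minimal sketch using only the Python standard library. The helper name and the sample file names are made up, and the 40 TB limit is interpreted as 40 TiB for the size check, which is an assumption.

```python
import io
import tarfile

# Upper bound quoted above; treating "40 terabytes" as 40 TiB is an assumption.
MAX_ARCHIVE_BYTES = 40 * 1024**4

def bundle_files(files):
    """Pack a {name: bytes} mapping into one in-memory TAR and return its bytes."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    payload = buffer.getvalue()
    if not 1 <= len(payload) <= MAX_ARCHIVE_BYTES:
        raise ValueError("archive size outside the 1 byte to 40 TB range")
    return payload

# Two small placeholder files bundled into a single archive payload
payload = bundle_files({"report.txt": b"q1 numbers", "photo.jpg": b"\xff\xd8\xff"})
```

Bundling many small files this way means one archive ID to track instead of many.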

Vault:

A vault is a container for archives and a way to group them together, so you can organize your data in Amazon Glacier using vaults. You can create up to 1,000 vaults per AWS Region per account. Each archive is stored in a vault of your choice.

You may control access to your data by setting vault-level access policies using the AWS Identity and Access Management (IAM) service. You can also attach notification policies to your vaults. These enable you or your application to be notified when data that you have requested for retrieval is ready for download.
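A notification policy is essentially an SNS topic plus a list of events. The sketch below builds that configuration as a plain dictionary; the helper name, vault name, and topic ARN are placeholders, and the commented lines show how it might be passed to boto3's Glacier client, assuming boto3 and AWS credentials are available.

```python
def notification_config(sns_topic_arn):
    """Build a vault notification config that fires when retrieval jobs finish."""
    return {
        "SNSTopic": sns_topic_arn,
        "Events": ["ArchiveRetrievalCompleted", "InventoryRetrievalCompleted"],
    }

config = notification_config("arn:aws:sns:us-east-1:123456789012:example-topic")

# Possible use with boto3 (left commented out because it needs credentials):
# import boto3
# glacier = boto3.client("glacier")
# glacier.create_vault(vaultName="example-vault")
# glacier.set_vault_notifications(
#     vaultName="example-vault", vaultNotificationConfig=config)
```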

Note: Vault operations can be performed through the AWS console.

How to create and work with vaults:
https://docs.aws.amazon.com/amazonglacier/latest/dev/getting-started-create-vault.html
https://docs.aws.amazon.com/amazonglacier/latest/dev/working-with-vaults.html

How to upload data to Glacier:

You cannot upload files as archives through the AWS console. There are two options:

1. Amazon Glacier is supported by the AWS SDKs for Java, .NET, PHP, and Python (Boto), and by third-party tools.

2. If you are not comfortable using an SDK or a tool, the easier way is to upload your data to an Amazon S3 bucket and configure a lifecycle rule to move it to Glacier. You may incur additional storage costs. With this method you can still see the objects in the S3 console; only the storage class will show as “Glacier”.
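For option 1, a direct upload with the Python SDK might look like the sketch below. It assumes a boto3 Glacier client is passed in; the helper name, vault name, and file name are placeholders.

```python
def upload_bytes(client, vault_name, data, description):
    """Upload `data` as a single archive and return the archive ID Glacier assigns."""
    response = client.upload_archive(
        vaultName=vault_name,
        archiveDescription=description,
        body=data,
    )
    # Store this ID: later retrieval and deletion requests need it.
    return response["archiveId"]

# Possible use (requires boto3 and AWS credentials; names are placeholders):
# import boto3
# glacier = boto3.client("glacier")
# with open("backup-2018.tar", "rb") as f:
#     archive_id = upload_bytes(glacier, "example-vault", f.read(), "2018 backup")
```

Passing the client in as a parameter keeps the function easy to try against a stub before pointing it at a real account.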

Integrated lifecycle management with Amazon S3:

Amazon Glacier works together with Amazon S3 lifecycle rules to help you automate archiving of Amazon S3 data and reduce your overall storage costs. After the number of days you specify, your objects are moved from S3 to Glacier automatically, so you do not need to move them manually.
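A lifecycle rule of this kind is just a small configuration document. Here is a sketch that builds one as a Python dictionary; the helper name, rule ID, prefix, bucket name, and 90-day threshold are all illustrative, and the commented call shows how it could be applied with boto3's S3 client.

```python
def glacier_transition_rule(prefix, days, rule_id="archive-to-glacier"):
    """Build a lifecycle configuration that moves objects under `prefix` to Glacier."""
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }

config = glacier_transition_rule("logs/", 90)

# Possible use with boto3 (needs credentials, so it is left commented out):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-bucket", LifecycleConfiguration=config)
```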

https://docs.aws.amazon.com/amazonglacier/latest/dev/getting-started-upload-archive.html

How to restore the data from Glacier:

Restoration falls into two cases, depending on how the data reached Glacier:

1. Data archived to Glacier from Amazon S3 by a lifecycle rule. Follow this link to restore the data to S3: https://aws.amazon.com/premiumsupport/knowledge-center/restore-glacier-tiers/

2. Data uploaded directly through the AWS SDKs or third-party tools: https://docs.aws.amazon.com/AmazonS3/latest/dev/restoring-objects.html
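The two cases use different request shapes. The sketch below builds both as plain dictionaries: one for an S3 restore request, one for a Glacier retrieval job on an archive that was uploaded straight to a vault. The helper names are made up, and the commented boto3 calls use placeholder bucket, key, and vault names.

```python
VALID_TIERS = {"Expedited", "Standard", "Bulk"}

def s3_restore_request(days, tier="Standard"):
    """Request body for restoring an S3 object; the copy stays readable for `days` days."""
    if tier not in VALID_TIERS:
        raise ValueError(f"unknown retrieval tier: {tier}")
    return {"Days": days, "GlacierJobParameters": {"Tier": tier}}

def vault_retrieval_job(archive_id, tier="Standard"):
    """Job parameters for retrieving an archive stored directly in a vault."""
    if tier not in VALID_TIERS:
        raise ValueError(f"unknown retrieval tier: {tier}")
    return {"Type": "archive-retrieval", "ArchiveId": archive_id, "Tier": tier}

# Possible use with boto3 (needs credentials, so it is left commented out):
# s3.restore_object(Bucket="example-bucket", Key="logs/2017.tar",
#                   RestoreRequest=s3_restore_request(7, "Bulk"))
# glacier.initiate_job(vaultName="example-vault",
#                      jobParameters=vault_retrieval_job(archive_id))
```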

Third-party tools:

Alternatively, you can use third-party tools to restore data.

  • Commvault
  • NetApp
  • CloudBerry
  • S3 Explorer
  • FastGlacier

Durability:

  • Average annual durability of 99.999999999% (11 nines) for an archive.
  • Data is automatically replicated across a minimum of 3 AZs.
  • Cross-region and cross-account replication is possible.
  • Data integrity checks are performed periodically.

Data Retrieval Features:

What is data retrieval? Because this service is designed to store large amounts of data cheaply, getting the data back takes time. Data archived through S3 is normally restored into S3. Amazon provides three options to retrieve your data:

  • Expedited (for archives up to 250 MB): typically 1 to 5 minutes
  • Standard: typically 3 to 5 hours
  • Bulk: typically 5 to 12 hours

Provisioned Capacity Units:

  • Each unit ensures that at least 3 Expedited retrievals can be performed every 5 minutes, with up to 150 MB/s of aggregate retrieval throughput
  • $100 per month per unit
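A back-of-the-envelope sizing based on the two figures above can be sketched as follows. The helper names are hypothetical, and the calculation assumes throughput adds up linearly across units, which is a simplification.

```python
import math

# Figures quoted above; check current pricing before relying on them.
THROUGHPUT_MB_S_PER_UNIT = 150
PRICE_USD_PER_UNIT_MONTH = 100

def capacity_units_needed(target_mb_per_s):
    """Units required to cover a target throughput (assumes linear scaling)."""
    return max(1, math.ceil(target_mb_per_s / THROUGHPUT_MB_S_PER_UNIT))

def monthly_capacity_cost(units):
    """Monthly cost in USD for the given number of units."""
    return units * PRICE_USD_PER_UNIT_MONTH

units = capacity_units_needed(400)
print(units, monthly_capacity_cost(units))
```

For a 400 MB/s target this works out to 3 units, or $300 per month.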

Amazon Glacier Select:

Amazon Glacier Select allows queries to run directly on data stored in Amazon Glacier without retrieving the entire archive. An analytics application can call the Amazon Glacier Select API to retrieve only the data relevant to its query from the archive. Prior to Glacier Select, an Amazon Glacier archive had to be completely restored before the data could be used. Charges are based on the amount of data scanned.

Encryption by default:

Amazon Glacier automatically encrypts data at rest using Advanced Encryption Standard (AES) 256-bit symmetric keys and supports secure transfer of your data over Secure Sockets Layer (SSL).

Multi-part Upload:

As with S3 multipart upload, you can use multipart upload in Glacier to speed up the upload of large archives. When you send a request to initiate a multipart upload, Amazon Glacier returns a multipart upload ID, which is a unique identifier for your multipart upload. Any subsequent multipart upload operations require this ID. The ID is valid for at least 24 hours.
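Each part of a multipart upload is identified by the byte range it covers. The sketch below computes those ranges for a given archive size; the function name is made up, and the part-size check reflects Glacier's rule that part sizes are 1 MiB multiplied by a power of two, up to 4 GiB.

```python
MIB = 1024 * 1024
MAX_PART_SIZE = 4 * 1024 * MIB  # 4 GiB

def part_ranges(total_size, part_size=MIB):
    """Return the byte-range string for each part of a multipart upload."""
    is_power_of_two = part_size > 0 and part_size & (part_size - 1) == 0
    if not (MIB <= part_size <= MAX_PART_SIZE and is_power_of_two):
        raise ValueError("part size must be 1 MiB times a power of two, up to 4 GiB")
    ranges = []
    for start in range(0, total_size, part_size):
        end = min(start + part_size, total_size) - 1  # last part may be shorter
        ranges.append(f"bytes {start}-{end}/*")
    return ranges

ranges = part_ranges(2_621_440)  # a 2.5 MiB archive in 1 MiB parts

# Possible use with boto3 (needs credentials; vault name is a placeholder):
# upload_id = glacier.initiate_multipart_upload(
#     vaultName="example-vault", partSize=str(MIB))["uploadId"]
# glacier.upload_multipart_part(vaultName="example-vault", uploadId=upload_id,
#                               range=ranges[0], body=first_chunk)
# Completing the upload additionally needs the total size and the
# SHA-256 tree hash of the whole archive.
```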

Additional information:
 https://docs.aws.amazon.com/amazonglacier/latest/dev/uploading-archive-mpu.html

Immutable archives:

Data stored in Amazon Glacier is immutable, meaning that after an archive is created it cannot be updated. This ensures that data such as compliance and regulatory records cannot be altered after they have been archived.

Vault Lock:

Amazon Glacier Vault Lock allows you to easily deploy and enforce compliance controls on individual Glacier vaults via a lockable policy. You can specify controls such as “Write Once Read Many” (WORM) in a Vault Lock policy and lock the policy from future edits. Once the policy is locked, it can no longer be changed, so the controls it defines stay in force for the data in the vault.
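A WORM-style control can be expressed as a policy that denies deletion until an archive reaches a minimum age. Here is a sketch of such a policy built in Python; the helper name, statement ID, vault ARN, and one-year retention period are placeholders chosen for illustration.

```python
import json

def worm_lock_policy(vault_arn, min_age_days):
    """Deny archive deletion until an archive is at least `min_age_days` old."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "deny-delete-before-retention",
            "Principal": "*",
            "Effect": "Deny",
            "Action": "glacier:DeleteArchive",
            "Resource": vault_arn,
            "Condition": {
                "NumericLessThan": {"glacier:ArchiveAgeInDays": str(min_age_days)}
            },
        }],
    })

lock_policy = worm_lock_policy(
    "arn:aws:glacier:us-east-1:123456789012:vaults/example-vault", 365)

# Possible use with boto3 (needs credentials, so it is left commented out):
# lock_id = glacier.initiate_vault_lock(
#     vaultName="example-vault", policy={"Policy": lock_policy})["lockId"]
# After testing the policy, make it permanent:
# glacier.complete_vault_lock(vaultName="example-vault", lockId=lock_id)
```

Locking is a two-step process so that the policy can be tested before it becomes permanent.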

Vault Access Policies:

Vault access policies allow you to easily manage access to your individual Glacier vaults. You can define an access policy directly on a vault to grant vault access to users and business groups internal to your organization, as well as to your external business partners.

A vault lock policy is different from a vault access policy: a lock policy cannot be changed once it is locked, while an access policy can be edited at any time. The two can be used together.

How to implement Glacier Vault Lock with vault access policies:
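For comparison with a lock policy, a vault access policy that lets another AWS account retrieve archives might look like the sketch below. The helper name, statement ID, account ID, and vault ARN are placeholders.

```python
import json

def retrieval_access_policy(vault_arn, account_id):
    """Allow the given AWS account to start retrieval jobs and fetch their output."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "allow-partner-retrieval",
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
            "Action": ["glacier:InitiateJob", "glacier:GetJobOutput"],
            "Resource": vault_arn,
        }],
    })

access_policy = retrieval_access_policy(
    "arn:aws:glacier:us-east-1:123456789012:vaults/example-vault", "999999999999")

# Possible use with boto3 (needs credentials, so it is left commented out):
# glacier.set_vault_access_policy(
#     vaultName="example-vault", policy={"Policy": access_policy})
```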

https://docs.aws.amazon.com/amazonglacier/latest/dev/vault-lock.html

Audit Logs:

1. Amazon Glacier supports audit logging with AWS CloudTrail, which records Amazon Glacier API calls for your account and delivers these log files to you.

2. These log files provide visibility into actions performed on your Amazon Glacier assets. For instance, you can determine which users have accessed a vault over the last month or identify who deleted a particular archive and when.

3. Using audit logging can help you implement compliance and governance objectives for your cloud-based archival system.
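To answer a question like "who deleted an archive and when", you can filter the CloudTrail records for Glacier events. The sketch below does this over a small hand-written sample; the function name and the sample records are invented for illustration, and real log files carry many more fields per record.

```python
import json

def glacier_events(log_text, event_name):
    """Return (time, caller ARN) pairs for Glacier API calls named `event_name`."""
    records = json.loads(log_text).get("Records", [])
    return [
        (r["eventTime"], r.get("userIdentity", {}).get("arn", "unknown"))
        for r in records
        if r.get("eventSource") == "glacier.amazonaws.com"
        and r.get("eventName") == event_name
    ]

# Made-up sample in the shape of a CloudTrail log file
sample_log = json.dumps({"Records": [
    {"eventTime": "2018-03-01T10:15:00Z",
     "eventSource": "glacier.amazonaws.com",
     "eventName": "DeleteArchive",
     "userIdentity": {"arn": "arn:aws:iam::123456789012:user/alice"}},
    {"eventTime": "2018-03-02T08:00:00Z",
     "eventSource": "s3.amazonaws.com",
     "eventName": "DeleteObject",
     "userIdentity": {"arn": "arn:aws:iam::123456789012:user/bob"}},
]})

deletions = glacier_events(sample_log, "DeleteArchive")
```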

Vault Inventory:

1. Amazon Glacier maintains an inventory of all archives in each of your vaults for disaster recovery or occasional reconciliation. The vault inventory is updated approximately once a day.

2. You can request a vault inventory as either a JSON or CSV file. It contains details about the archives, including the size, creation date, and the archive description if one was provided during upload.

3. The inventory represents the state of the vault as of the most recent inventory update.

You can also view the vault details in the console.
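Once you have a JSON inventory, summarizing it takes a few lines. The sketch below works on a hand-written sample that follows the JSON inventory layout; the function name, archive IDs, descriptions, and hash values are placeholders.

```python
import json

def summarize_inventory(inventory_json):
    """Return the inventory date, archive count, and total size in bytes."""
    inventory = json.loads(inventory_json)
    archives = inventory["ArchiveList"]
    return {
        "as_of": inventory["InventoryDate"],
        "archive_count": len(archives),
        "total_bytes": sum(a["Size"] for a in archives),
    }

# Made-up sample in the shape of a JSON vault inventory
sample_inventory = json.dumps({
    "VaultARN": "arn:aws:glacier:us-east-1:123456789012:vaults/example-vault",
    "InventoryDate": "2018-03-01T00:00:00Z",
    "ArchiveList": [
        {"ArchiveId": "id-1", "ArchiveDescription": "2017 invoices",
         "CreationDate": "2018-01-05T12:00:00Z", "Size": 1048576,
         "SHA256TreeHash": "placeholder-hash-1"},
        {"ArchiveId": "id-2", "ArchiveDescription": "2017 photos",
         "CreationDate": "2018-01-06T09:30:00Z", "Size": 2097152,
         "SHA256TreeHash": "placeholder-hash-2"},
    ],
})

summary = summarize_inventory(sample_inventory)
```

Because the inventory is only refreshed about once a day, the summary reflects the vault as of "as_of", not necessarily its current contents.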

Access Control with IAM:

Amazon Glacier uses AWS Identity and Access Management (IAM) to help you securely control access to AWS and your Amazon Glacier data. You can create users in IAM, assign them individual security credentials (i.e., access keys, passwords, and multi-factor authentication devices), and attach IAM policies that grant the intended users only the permitted activities on each Amazon Glacier vault.

Thanks!
