OSE briefly used Amazon Glacier to store some old backups of our data after Dreamhost notified us on 2018-03-20 that we were violating their unlimited storage policy by storing backups on their servers.
In 2019, we left Amazon Glacier for Backblaze for the following reasons:
- Backblaze is cheaper once you account for the minimum archive-retention charges in Glacier's fine print
- Backblaze is way, way easier to use
Restore from Glacier
We use Amazon Glacier for cheap long-term backups. Glacier is one of the cheapest options for storing about a TB of data, but it can also be very difficult to use, and data retrieval costs are high.
Glacier has no notion of files & dirs. Archives are uploaded to Glacier into vaults. Archives are identified by a long UID & a description. At OSE, we use the tool 'glacier-cli' to simplify large uploads; this tool uses the description field as a file name. For each tarball, I uploaded a corresponding metadata text file that lists all the files inside it (this should save costs if someone doesn't know which archive to download, since the metadata file is significantly smaller than the tarball archive itself).
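A metadata file list like this can be produced with tar's list mode before encrypting & uploading. A sketch (the file names here are examples, matching the naming pattern used elsewhere on this page):

```shell
# create a small metadata file listing the tarball's contents
tar -tvf hetzner1_20170901-052001.tar.bz2 > hetzner1_20170901-052001.fileList.txt

# compress it; it then gets encrypted & uploaded alongside the tarball
bzip2 hetzner1_20170901-052001.fileList.txt
```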
Archives >4G require splitting into multiple parts & providing the API with a tree checksum of the parts. This is a very nontrivial process, and most of our backups are >4G. Therefore, we use the tool glacier-cli, which does most of this tedious work for you.
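For context, the "tree checksum" Glacier requires is a SHA-256 Merkle tree over 1 MiB chunks: hash each chunk, then repeatedly hash concatenated pairs of binary digests until one remains. A bash sketch of that computation (requires openssl & xxd; glacier-cli does this for you, so this is illustration only):

```shell
#!/bin/bash
# compute the AWS Glacier SHA-256 tree hash of a file:
# hash each 1 MiB chunk, then repeatedly hash concatenated pairs of digests
treehash() {
  local tmp; tmp=$(mktemp -d)
  split -b 1048576 "$1" "$tmp/chunk."
  local hashes=() f
  for f in "$tmp"/chunk.*; do
    hashes+=( "$(openssl dgst -sha256 -binary "$f" | xxd -p -c 256)" )
  done
  while [ "${#hashes[@]}" -gt 1 ]; do
    local next=() i
    for (( i=0; i<${#hashes[@]}; i+=2 )); do
      if (( i+1 < ${#hashes[@]} )); then
        # hash the concatenation of the two *binary* digests
        next+=( "$(echo -n "${hashes[i]}${hashes[i+1]}" | xxd -r -p \
                   | openssl dgst -sha256 -binary | xxd -p -c 256)" )
      else
        # odd one out is promoted to the next level unchanged
        next+=( "${hashes[i]}" )
      fi
    done
    hashes=( "${next[@]}" )
  done
  rm -rf "$tmp"
  echo "${hashes[0]}"
}
```

For a file of 1 MiB or less, the tree hash is just the plain SHA-256 of the file, which makes for an easy sanity check.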
If you don't already have it installed (try executing `glacier.py`), you can install the glacier-cli tool as follows:
# install glacier-cli prereqs
yum install python-boto python2-iso8601 python-sqlalchemy

# install glacier-cli
mkdir -p /root/sandbox
cd /root/sandbox
git clone git://github.com/basak/glacier-cli.git
cd glacier-cli
chmod +x glacier.py
./glacier.py -h

# create symlink in $PATH
mkdir -p /root/bin
cd /root/bin
ln -s /root/sandbox/glacier-cli/glacier.py
Sync Vault Contents
The AWS console will show you the vaults you have, the number of archives each contains, and the total size in bytes. It does *not* show you the archives you have in your vault (ie: their IDs, descriptions, & individual sizes). In order to get this, you have to pay (and wait ~4 hours) for an inventory job. glacier-cli keeps a local copy of this inventory data, but--if you haven't updated it recently--you should probably refresh it anyway. Here's how:
# set creds (check keepass for 'ose-backups-cron')
export AWS_ACCESS_KEY_ID='CHANGEME'
export AWS_SECRET_ACCESS_KEY='CHANGEME'

# query glacier to get an up-to-date inventory of the given vault (this will take ~4 hours to complete)
# note: to determine the vault name, it's best to check the aws console
glacier.py --region us-west-2 vault sync --max-age=0 --wait <vaultName>

# now list the contents of the vault
glacier.py --region us-west-2 archive list <vaultName>
The glacier-cli tool uses the archive description as the file name. You cannot restore by the archive id using glacier-cli. Here's how to restore by the "name" of the archive:
# create tmp dir (make sure not to download big files into dirs that are themselves being backed-up daily!)
stamp=`date +%Y%m%d_%T`
tmpDir=/var/tmp/glacierRestore.$stamp
mkdir $tmpDir
chown root:root $tmpDir
chmod 0700 $tmpDir
pushd $tmpDir

# download the encrypted archive
time glacier.py --region us-west-2 archive retrieve --wait <vaultName> <archive1> <archive2> ...
The above command will take many hours to complete. When it does, the file(s) will be present in your cwd.
Decrypt Archive Contents
OSE's backup data holds very sensitive content (ie: passwords, logs, etc.), so it's encrypted before being uploaded to 3rd parties.
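For reference, the encryption side is a symmetric gpg pass with the same keyfile used for decryption below. A sketch (file names are the examples used elsewhere on this page; the AES cipher matches the "AES encrypted data" shown in the decrypt transcript):

```shell
# encrypt a file with the shared passphrase keyfile before uploading to Glacier
# note: on gpg >= 2.1 you may also need --pinentry-mode loopback
gpg --batch --symmetric --cipher-algo AES \
    --passphrase-file /root/backups/ose-backups-cron.key \
    --output hetzner1_20170901-052001.fileList.txt.bz2.gpg \
    hetzner1_20170901-052001.fileList.txt.bz2
```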
Use gpg and the 4K 'ose-backups-cron.key' keyfile (which can be found in keepass) to decrypt this data as follows:
Note: Depending on the version of `gpg` installed, you may need to omit the '--batch' option.
[root@hetzner2 glacierRestore]# gpg --batch --passphrase-file /root/backups/ose-backups-cron.key --output hetzner1_20170901-052001.fileList.txt.bz2 --decrypt hetzner1_20170901-052001.fileList.txt.bz2.gpg
gpg: AES encrypted data
gpg: encrypted with 1 passphrase
[root@hetzner2 glacierRestore]#
There should now be a decrypted file. A metadata file list can be decompressed with `bunzip2`; a full backup tarball can be listed or extracted with `tar`.
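For example (file names taken from the transcript above; substitute your own):

```shell
# a decrypted *.fileList.txt.bz2 metadata file just needs bunzip2
bunzip2 hetzner1_20170901-052001.fileList.txt.bz2
less hetzner1_20170901-052001.fileList.txt

# a decrypted backup tarball: list it first, then extract what you need
tar -tvjf hetzner1_20170901-052001.tar.bz2 | less
tar -xjvf hetzner1_20170901-052001.tar.bz2
```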
On 2018-07-06, we deprecated our managed hosting hetzner1 server, replacing it with hetzner2, a dedicated server with root access that had more resources _and_ cost less per month.
All of the files from hetzner1 were uploaded to Glacier for safe long-term storage in case they ever need to be recovered.