In one of the environments I was managing, the problem was simple - the archive space was running out and management was tired of re-inventing the wheel every couple of years by basically re-building their archiving storage system. They wanted to switch to tape...
...without most of the costs associated with "going tape". They refused to purchase additional equipment (tape drives, racks, libraries etc.) and demanded that the storage previously used by backups (NAS) was returned to production environment asap. Crazy? Yep.
Anyway, I did not get a chance to see how this story ended, as I switched jobs and became the mostly VMware/Linux guy I am today. That said, last week I decided to come back to this problem (as a thinking exercise) and try to figure out what the perfect solution in this scenario would be.
The answer turned out to be... virtual tape.
If we look at it purely from a cost perspective, it is as follows:
- Virtual Tape Storage (EC2) - used as the initial storage when the tape backups are uploaded before they go to Glacier - $0.025/GB/month
- Virtual Tape Storage Archive (Glacier) - used when tapes are archived - $0.005/GB/month
On top of that, there is also the cost of the gateway appliance itself. The way this works is: you are billed for the first 100GB written to the gateway ($125.00), and anything beyond that is free of further charges - or, as Amazon puts it:
Up to a maximum of $125.00 per gateway per month. The first 100 GB per account is free.
So in summary, the cost is $125.00 per gateway, per month, as long as we keep uploading more than 100GB monthly to AWS (which the client we're discussing obviously did).
If you add this all up, the cost of AWS VTL seems almost too good to be true. I was mostly surprised by how cheap the gateway itself is. If you're interested in the details, you can check out their full pricing here.
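The numbers above are easy to turn into a back-of-the-envelope estimate. Below is a minimal Python sketch based only on the rates quoted in this post ($0.025/GB/month for tapes still in EC2-backed storage, $0.005/GB/month for Glacier-archived tapes, and a gateway fee that accrues over the first 100GB written, capped at $125.00 per month) - these are assumptions from the text, so check the current AWS pricing page before relying on them.

```python
# Back-of-the-envelope AWS VTL cost model, using only the rates
# quoted in this post (assumptions, not official AWS pricing).
VTS_RATE = 0.025       # $/GB/month, virtual tapes in EC2-backed storage
ARCHIVE_RATE = 0.005   # $/GB/month, tapes archived to Glacier
GATEWAY_CAP = 125.00   # $/month cap on the per-gateway fee
GATEWAY_RATE = GATEWAY_CAP / 100  # fee accrues over the first 100 GB written

def monthly_cost(written_gb, vts_gb, archive_gb):
    """Estimate one month's bill for a single gateway."""
    gateway_fee = min(written_gb * GATEWAY_RATE, GATEWAY_CAP)
    return gateway_fee + vts_gb * VTS_RATE + archive_gb * ARCHIVE_RATE

# e.g. 500 GB written this month, 1 TB still "hot", 5 TB archived:
print(monthly_cost(500, 1000, 5000))  # 125 + 25 + 25 = 175.0
```

Even with several terabytes archived, the bill stays dominated by the flat gateway fee - which is exactly what made the numbers attractive for this client.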
In this setup guide, we're going to assume the client's backup is based on Veeam Backup & Replication 9.5 U3 and VMware vSphere 6.7.
We are also going to assume the client already has access to an AWS account.
1. Download and configure the VTL appliance
- After logging into AWS, open up the AWS Management Console,
- In the AWS services menu, select `Storage Gateway`,
- The `Create Gateway` wizard should automatically pop up at this stage; if it doesn't, select `Create Gateway`,
- Select `VMware ESXi` as the host platform and download the image.
The image will be saved in `.ovf` format. The deployment process is pretty straightforward:
- Log in to your vSphere environment (vCenter)
- Right-click on the datacenter which should be hosting the appliance
- Select `Deploy OVF Template`
- Follow the steps in the deployment wizard to finish
Once the image has been deployed to your environment, open up the console and finish the setup. There are two important things to configure here:
- The default username is `sguser` and the password is `sgpassword`; you might want to change these to something more secure,
- The appliance probably received an IP address from DHCP. The VTL Gateway has to be in the same subnet as the Veeam B&R server, so you might need to adjust that.
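Whether the DHCP-assigned address actually landed in the Veeam server's subnet is easy to double-check before going further. A stdlib-only Python helper (the addresses below are made up for illustration):

```python
import ipaddress

def same_subnet(ip_a: str, ip_b: str, prefix: int) -> bool:
    """Return True if both hosts fall into the same network for the given prefix length."""
    net_a = ipaddress.ip_interface(f"{ip_a}/{prefix}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{prefix}").network
    return net_a == net_b

# Hypothetical addresses: Veeam B&R server vs. the VTL appliance
print(same_subnet("192.168.10.5", "192.168.10.20", 24))  # True  - same /24
print(same_subnet("192.168.10.5", "192.168.20.20", 24))  # False - needs a static IP
```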
With this configuration part finished, we can move on to storage. The VTL Gateway requires at least two virtual disks (cache & upload buffer), and you need to add those manually from vSphere. I recommend creating at least a 512GB vmdk for the cache and a 256GB vmdk for the upload buffer.
- Exit the VTL console screen and right click on its object (VM) in the vCenter inventory
- Open the hardware tab and select `Add new hardware`,
- Select `Add new disk` from the list of available hardware,
- Specify the size of the new disk and adjust other relevant settings
- Repeat this operation for every disk needed.
- When finished adding the disks, confirm and close the dialog.
Once the drives are added, we can reboot the appliance.
After a few minutes, if we go back to the AWS console, it should be asking for an IP address to connect to. Enter the LAN IP address of the appliance - do not worry about port configuration; the connection will be established over HTTPS and always initiated from within your network.
After configuring the connection between the local appliance and AWS, we can finally create some tapes. As I didn't want to invest too much into this project, I created only three 100GB tapes, but you can go higher if needed. The 100GB tape is the smallest available option.
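Since 100GB is the smallest tape available, sizing the pool is just a matter of dividing the expected archive volume by the tape size. A throwaway helper (the barcode-style prefix is arbitrary, not an AWS requirement):

```python
import math

TAPE_SIZE_GB = 100  # smallest virtual tape size available

def tapes_needed(archive_gb: int, prefix: str = "TAPE") -> list[str]:
    """Return one barcode-style label per tape required for the given volume."""
    count = math.ceil(archive_gb / TAPE_SIZE_GB)
    return [f"{prefix}{i:02d}" for i in range(1, count + 1)]

# 250 GB of archive data needs three 100 GB tapes:
print(tapes_needed(250))  # ['TAPE01', 'TAPE02', 'TAPE03']
```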
2. Configuring Veeam
- Log in to your Veeam management server,
- After logging in, go to the `Control Panel` and open the `iSCSI Initiator`,
- `Add` your AWS VTL Gateway Appliance (by IP address) as a target portal,
- Veeam should think for a second and finally discover multiple iSCSI targets,
- Connect to every discovered target. These are your tape drives,
- After connecting to all targets, close the `iSCSI Initiator` window,
- Back in the main Veeam console, select `Tape Infrastructure` and add a new tape server,
- Run through the wizard - it's very straightforward and shouldn't cause any problems.
Your new tape server should now be ready for use in Veeam. Before we'll be able to run any backups or backup copy jobs to it, we'll still need to do a few more things...
3. GFS Media Pools
GFS media pools control how we are going to store our tape media. This is a very important step in our setup: as we'll be using both AWS EC2 and AWS Glacier, these rules determine how data is balanced between the two storage tiers.
- First, log into the Veeam Console and select `Tape Infrastructure`, if you're not already there,
- Right-click on `Tape Infrastructure` and select `Add GFS Media Pool`,
- On the next page, select the free tapes which should belong to that particular pool,
As AWS is going to charge per tape in the event of a recovery operation, it is best to split the pools into groups based on the type of VM they will serve as a backup repository for. This way, you won't have to pull a massive number of tapes for a single VM recovery. At the same time, if the backup going to tape is of the DR type (i.e. a backup of an entire Active Directory environment), it might be better not to split the pool, as recovering tapes from different pools takes longer than recovering multiple tapes from a single one.
- After selecting all the required tapes, click `Next`,
- On the following page, select the retention settings needed and then open up the `Advanced` window,
- In the `Advanced` window, there is one very important setting which we want to enable - `Move all offline tapes into the following media`. Selecting this will make Veeam move archived (finished) backups to AWS Glacier,
- Next, close the `Advanced` window and proceed to the next page,
- On the last page - `Encryption` - choose which kind of extra encryption Veeam should use for this storage pool. It is recommended to use some kind of encryption, as backing up to AWS is considered an off-site backup.
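The pool-splitting trade-off described above comes down to how many tapes a single restore has to pull back from Glacier. A toy Python model makes it concrete (pool layouts and tape counts below are invented for illustration):

```python
# Toy model: each pool is a set of tapes; restoring a VM pulls back
# the whole pool it lives in, and AWS charges per retrieved tape.
def restore_tapes_pulled(pool_of_vm: dict[str, str],
                         pool_sizes: dict[str, int],
                         vm: str) -> int:
    """Number of tapes retrieved from Glacier to restore one VM."""
    return pool_sizes[pool_of_vm[vm]]

# One monolithic pool holding everything vs. pools split by VM type:
monolithic = restore_tapes_pulled({"web01": "all"}, {"all": 12}, "web01")
split = restore_tapes_pulled({"web01": "web"}, {"web": 3, "db": 9}, "web01")
print(monolithic, split)  # 12 3
```

With the split layout, a single-VM restore pulls only the 3 tapes of its own pool instead of all 12 - the per-tape retrieval charges scale accordingly.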
4. Creating GFS Tape Jobs
Finally, with our media pool and vault created, we can move on to creating a tape backup job.
A tape backup job will start and then wait for the next backup to occur, just like a regular backup copy job - this means the tape job controller checks every hour for new, complete (i.e. not in-progress) backup files. Once a backup job finishes, the tape job controller will send an API request to the library and the job will start up.
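That hourly check boils down to a periodic filter over the backup chain: each poll picks up files that are complete and not yet on tape. A simplified stand-alone sketch (the file names and status values are invented for illustration, not Veeam's actual API):

```python
def backups_ready_for_tape(backups: list[dict]) -> list[str]:
    """Mimic the hourly check: only finished backups not yet on tape qualify."""
    return [b["name"] for b in backups
            if b["status"] == "complete" and not b["on_tape"]]

# A hypothetical backup chain at poll time:
chain = [
    {"name": "job1-full.vbk", "status": "complete",    "on_tape": False},
    {"name": "job2-incr.vib", "status": "in_progress", "on_tape": False},
    {"name": "job3-full.vbk", "status": "complete",    "on_tape": True},
]
print(backups_ready_for_tape(chain))  # ['job1-full.vbk']
```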
- To create a new tape backup job, select `New Backup to Tape Job` from the main window of the Veeam Console,
- On the next page, under `Backup Files`, select the backup job(s) which should be saved onto tape (remember to select the last full or differential backup),
- Click `Next` and on the following page, select the media pool created in the previous step,
- On the next page, select both the `Eject Media upon completion` and `Export the following media sets upon job completion` options. Skipping this part will prevent the tapes from being transferred over to AWS Glacier,
- Move on to the last page and configure the schedule.
Having all of that done, we can just relax and wait for the backup to be saved onto our virtual tapes in AWS.
What do you think?
I must admit, when I first heard about backing up archive data to the cloud or "virtual tapes", I wasn't too optimistic about it. Seeing the price of cloud storage - reasonable for anything under 1TB and straight-out crazy for anything else - I was certain that tape jobs could only happen locally.
Well, not for the first time and definitely not the last, I was wrong in my predictions. Backing up to AWS seems like a reasonably priced alternative if the backup infrastructure in the environment has access to a fast enough uplink. Obviously, it's not for everyone (remote sites, huge data backups etc.), but it's certainly perfect for clients such as the one I described in the intro - desperate to off-load the upkeep of their local infrastructure and adamant about stabilizing costs on a monthly basis.
Have you ever used any virtual tapes in your backup environment? Send me your opinions @wilk_it_wizard or #sshguru