NOTICE: This project has moved to https://bitbucket.org/ozmt/ozmt The focus has moved from EC2 related functions to management of OpenZFS in general. A new project name has been coined: Open Zfs Mangement Tools (OZMT) It contains pool, replication, backup, and snapshot management scripts. EC2 ZFS tools, historical reference: A collection of tools to setup a ZFS pool of EBS devices on EC2. Author: Chip Schweiss firstname.lastname@example.org Background: For a project I am managing we decided we want to backup our data in the cloud instead of traditional tapes being shipped off site. We already decided on using OpenIndiana for file storage with ZFS, and wanted to utilize ZFS send & receive to push changes to the cloud. After numerous failed attempts to get OpenIndiana running on EC2, I moved on to ZFS on Linux (ZoL) on an Amazon Ubuntu 12.04 instance. Goals: 1. Highly secure, meaning encrypted data in the cloud. Keys are tightly controlled. 2. Relatively inexpensive. 3. Fully automated. 4. Scalable 5. Possibly be used for disaster recovery in the cloud. This collection of scripts were developed to deploy and maintain our ZFS backups in the cloud. We anticipate approximately 1 TB total data to be backed up, with less than 50GB to start. EBS storage is fixed in size when deployed and cannot be resized. This may change in the future, but at the time of this development it is the case. ZFS cannot restripe, but it can grow vertically if the underlying devices are replaced one at a time with larger devices. I built my initial pool of 40 1GB EBS devices, striped in 5 vdevs of 8 EBS devices as raidz1. The grow script I wrote will replace one EBS at a time in each vdev, wait for a resilver and replace the next until all have been replace. This allows for in place growth of the storage and keeps the pool balanced and never requires restripe. It can grow in increments of 40GB raw storage until the EBS devices are at the EC2 max. (1TB at the time of writing). If your concerned about not having ZFS level redunancy while growing the pool you can setup a raidz2 pool. I've decided this is not an issue since EBS itself is already redunant underneith. I run a scrub before hand to make sure all data is good on all EBS blocks first. The primary ZFS server also has ec2-api tools installed and controls the backup operation. The EC2 instance is only turned on once a day to receive the day's worth of snapshots then shutdown. This keeps running time per month very low and keeps the data locked since it is encrypted on EBS devices. The encryption key is never stored on the EC2 instance. It is supplied through an ssh tunnel from the primary site. The solution: Security first: Our primary server, still OpenIndiana 151a5 at the time of this writing hosts a zpool with many zfs folders that we want to backup and keep our data secure. Since the ZoL instance will have everything accessible in a decrypted form when the system is pushing data, we decided we would build a staging pool that we would copy all of our data to and encrypt on the file level. This is done with with public key encryption so a restore process would require supplying the private key which is not stored on the system at all. The procedure: The staging pool will be synced to our EC2 instance using ZFS send / receive. Using a snapshot schedule, the primary zfs folders each get a snapshot policy that maintains a set number of hourly, mid-day, daily, weekly, monthy, bi-annual, and annual snap shots. After each snapshot cycle data is synced to the local staging pool and encrypting at the file level with GPG. The scripts utilize 'zfs diff' to build the working set of files to copy, update, rename (mv), or delete on the staging pool. Daily the primary server launches the ZoL instance on EC2, mounts the zpool and syncs the changes. The challenges: Data transfer rates: Even initially the transfer rate over SSH was not acceptible. We started with approximately 50 GB which blew up to a 75GB zfs send. This took approximately 30 hours to push to EC2. Should we make any significant changes, the tranfer rate would be a problem. An acceloration was needed. Many techniques and tools were examined. mbuffer helped a little bit along with some TCP tuning, but nothing significant. I settled on bbcp (http://www.slac.stanford.edu/~abh/bbcp/) because it fit the use case perfectly. It can take data in and/or out via named pipe which is perfect for zfs send/receive. With mbuffer and bbcp, I can consistantly push the 75GB in under one hour. There is still a lot of room in our network, but this rate is quite resonable. bbcp introduced another security issue of concern. The data it sends over the Internet is not encrypted. The setup is all done via SSH, but the transfer channels it sets up are not secured. To deal with this the primary server generates an encyption key and passes it to the receiving end via SSH and pipes data through openssl before passing to bbcp. This had zero impact on our transfer rate. Memory Requirements: I hoped to make the receiving system to be a m1.micro instance. It fell over after only a small burst of data being sent to it. m1.small survived a little longer, but again fell over. At this point I learned that ZoL needs at least 4GB ram. I tried adding swap space on the instance storage, but it did not help. Then I moved to using m1.medium. It has 3.75GB ram. Seem to be working fine until I add bbcp to pipe and it too fell over after several GB of data transfer. I have not been able to get an m1.large to fall over. Unfortunately, just launching an m1.large cost 32 cents. Enter spot instances. m1.large can be had for a fraction of the cost, typically 2.9 cents an hour in my zone.However, it needs to be able to launch from an AMI image and must gracefully handle an unplanned termination. These requirements currently being worked out. UPDATE 28/Dec/2012: Turns out, large and even extra large instances will hang. This appears to be related to a recenly fixed bug in ZoL: https://github/zfsonlinux/spl/issues/174 I have not had time to test this yet. The zinger: In the middle of this development Amazon release their Glacier archive service. The storage cost is 1/10 the amount of EBS. It cannot have a ZFS file system natively, however doing something like an initial full zfs send and daily incrementals, each to an new 'archive' in the vault, a 90 day cycle could be created to start a new full set and incrementals. No file level recover would be possible, but that is what snapshots are for. So long as you don't have a high data change rate this could be utilized at a fraction of the cost. Another approach that might be feasable is using a file level archiving and sending increments by using 'zfs diff'. I would like to build a solution around Clacier, but is not currently on my road map. UPDATE 26/Dec/2012: Because of delays in our FDA project, this had been on hold for a little while. It's back in full swing now. The ZoL EC2 instance kept hanging, even on extra large instances, This seems to be attributed to a bug in ZoL. In the mean time I discovered the glacier-cmd python script. I have temporarily abandoned the EC2 instance and implemented scripts to utilize Glacier. There are some quirks with glacier-cmd script, but for the most part it is getting the job done. I'm closely following mt-aws-glacier at https://github.com/vsespb/mt-aws-glacier. It shows a lot of promise also. Once it has support for STDIN, I'll probably make the used of the two scripts modular. Requirements for EC2 backup: * m1.large or larger instance. Using anything less that 4GB ram runs into vmap allocation errors and kernel freezes when only a few GB have been inserted to ZFS pool. * ec2-api-tools need to be installed and in the search path or referenced in the zfs-config.sh file. * JAVA_HOME must be set * EC2 keys need to be on the system and referenced in the zfs-config.sh file If bbcp network accelerator is used bbcp binary must be in the excution $PATH and pwgen must be installed. If scripts are run on a machine other than the EC2 instance: (Recommended) * SSH Public key authentication needs to be configured between the controlling machine and the EC2 instance. * Since the EC2 instance will be under a different IP each boot turn off IP address checking on the ssh client by adding CheckHostIP no StrictHostKeyChecking ask to /etc/ssh/ssh_config If your primary system is Solaris/OpenIndiana many of the necessary packages can easily be installed from http://www.opencsw.org. First copy zfs-config.sh.example to zfs-config.sh and configure it for your machine. 1. To create the pool of EBS devices run create-zfs-volumes.sh. 2. Run create-zfs-pool.sh to create the pool. If you have specified encryption in the config you will be prompted for the encryption key. To expand your pool vertically, you must use raidz1 or better. 1. Change the EBS size in the zfs-config.sh file. 2. Run grow-zfs-pool.sh Using encryption: Make sure you have the right packages and modules installed. On Ubuntu do the following: sudo apt-get install cryptsetup sudo echo aes-x86_64 >> /etc/modules sudo echo dm_mod >> /etc/modules sudo echo dm_crypt >> /etc/modules Some commands will prompt you for the encryption key. You will need to supply the crypto key after boot by calling setup-crypto.sh and mountall. Running from a remote system: These scripts are designed to run from outside the or inside the EC2 instance using the Amazon EC2 API. Everything that must be run locally on the EC2 instance will be prepended with a variable called 'remote' that is set in the config file. This is meant to be something like 'ssh email@example.com'. Make sure you have public key authentication setup to your EC2 instance. The config script has other variables that must be set for this to work properly. See the config example for more information.