arnie: Simple Incremental Backups
In a single sentence, arnie is: a tremendously simple system for performing incremental backups to remote untrusted hosts, with support for encrypted files on the remote host. While our scripts are simple and recent, we believe they are robust and are in active use on the author's own servers, and we provide a comprehensive suite of tests to prove it.
Our goal is to provide a very simple and robust script to perform incremental backups.
The original context for the creation of this program was that we had access to a remote but untrusted online host with plenty of storage, where we wanted to send backup archives of a source code repository. We wanted incremental backups in order to minimize bandwidth requirements, and we wanted the archive files to be encrypted on the remote host (in case it gets broken into--it's untrusted).
We tried many other open source offerings for this problem, such as dar, duplicity, hdup2, etc., but writing a few simple tests--if you do not tests your backups, you have no backups--brought problems to the surface with most of these systems, and more complex backup solutions like Amanda required more setup time and work to setup than writing my own simplistic solution.
I suspect that the widespread availability of cheap online storage space (e.g. IBackup) will make this simple scheme for backups attractive for many other people. This is why I provide this code as open source.
The system consists in two self-contained Python scripts: the archiver (arnie-archive) and the restorer (arnie-restore).
Essentially, the arnie archiver creates a tar archive with all the files that have changed since the last backup. The history of what had been previously backed up is kept in a simple text file in the root of the archived directory (.arniehistory), and the full list of files available at the moment of backup is also included in the tar archive itself. Optionally, the history file can be stored in an alternate location (see options).
To archive, you typically run a command like this:
arnie-archive /path/to/important-dir remotehost:/path/to/backups
The name of the archive file generated is output on stdout. The date of the backup archive is contained in the filename, so do not rename the file (there is an option to select the prefix that the archive files are given). To perform a full backup, invoke the archive script with the --full option, or simply delete the history file before running the archiver.
Restoring backups is similarly very simple: we open the latest tar archive, get its list of necessary files to restore, and search backwards in time within all the available archive files to find them. To restore, you login to the remote host (or fetch the archives locally) and run:
arnie-restore /path/to/backups /path/to/restored-directory [<dir/file> ...]
The original archived directory will be restored to /path/to/restored-directory. After re-creating the files, the restore script normally runs a verification of the MD5 sums of all the restored files, so if something ever went wrong you will be notified at that point. If you do not want to restore the entire archive, you can specify the list of directories or files to restore (specified directories will be restored recursively).
Note that if the backup files are encrypted, you will need to decrypt them manually before running the restorer, for example:
gpg --decrypt-files backup.*
The restorer will ignore encrypted files in the backups directory.
In addition, the following features are provided (see options):
- Automatically compress archives in gzip or bzip2 format;
- Automatically encrypt archives using a GnuPG key (specify the key Id or name);
- Automatically send the archive file to a remote host using scp. This is just a convenience: alternatively you can capture the name of the archive and send it any way you like (ftp, other...);
- Tracks and restores permissions changes on directories and files;
- Works with empty directories;
- Works with symbolic links;
- An alternate location for the history file can be specified;
- Regexp patterns for excluding files in the backup can be provided;
- You can restore at any of the times the backups were made (the restore script has an option);
- The archives are simply stored as GNU tar files, so you can open them manually if so desired.
The arnie tools do not incrementally archive changes in user/group ids nor access, modification or creation times (however, they could very easily be modified to do so and I might do that at some point). They do, however, track permission changes, and obviously added files, deleted files, and file content changes (the file comparison is carried out using a simple MD5 sum).
Also note that the incremental archives produced by these programs are not of optimally minimal size, i.e. we do not store diffs, but rather we store entire files when they change, and we do not track file location moves. Therefore this might not be a suitable solution if you have a large binary file from which only a few blocks are changing between backups (the entire file will be backed up every time). Our solution is meant to work well on directories with many small to intermediate size files (such as a CVS source code repository).
There is no special manual, the manual consists in this README file and the --help switch will provide ample details about invocation. Run the scripts like this to get full details:
Simply copy the arnie/bin/arnie-archive and arnie/bin/arnie-restore scripts to /usr/bin.
If you want to run the test suite, download the source code and see the instructions under the test subdirectory (very simple, one Python script to run). In case you have a question, it should be extremely easy to add new tests to the test code.
Send email to the author: Martin Blais <email@example.com>.
If you would like to add new features, please contact the author for advice. Please contribute your changes or additional test code back to the author, we will be happy to include your changes if they are generic and could benefit others.
We have tried to minimize the dependencies of the arnie scripts in order to keep them simple and working in as many environments as posarniele.
- Python 2.4 or greater;
- If you use encryption of the archives, you need to have GnuPG installed (the gpg executable);
- If you send files to a remote host from the archive script, you need to have the ssh and scp clients installed.
We invoke the client programs rather than use Python bindings for these last few options, in order to avoid yet more dependencies.
We have not tested arnie under Windows yet, but there is no reason why it should not work. If you encounter problems or success using it on Windows, please let me know via email.
Copyright (C) 2005 Martin Blais. All Rights Reserved. This code is distributed under the GNU General Public License;
We are NOT responsible for any data loss resulting from the use of these programs. Please test your backups regularly, this is YOUR responsibility. If you do not test your backups, you do NOT have backups. Test the archive and restore on the actual data that you will use it own as well. If you don't understand the importance of this, we suggest you do not use the software.