
Somersault

Somersault is a backup tool designed for backing up to remote storage. Initially Somersault only supports Amazon S3 (which provides cheap, reliable, remote storage), but future plans include FTP, WebDAV, etc. Somersault's focus on backing up to remote storage means that it will only transfer changed files (similar to rsync). This makes backups quick and inexpensive.

This document explains Somersault, from install to usage to support. I suggest reading the Install, Description, and Instructions sections first. These will give you a feel for what Somersault does and how to go about making a backup.

Somersault is released under the Apache License, Version 2.0.

Changes

Version 0.4.0

Version 0.3.1

Version 0.3.0

  • Added locking capabilities

    • Locks are used to avoid conflicts against the underlying file systems.
    • Locks are automatically acquired and released; no management is needed by the user.
    • Added --unlock command to force unlocking if Somersault fails to release a lock.
  • Changed --backup back to --scenario (and re-documented the command to more accurately specify its purpose)

Version 0.2

  • Added a GUI browser for the contents of a File System (see new --browser command).
  • Changed --scenario to --backup (for clarity)
  • Added new "shared" filters and comparators; see SharedFilter and SharedComparator.

Install

Requirements

Installation Steps

  1. Extract Somersault package in a new directory
  2. (Optional) Install Unlimited Strength Java(TM) Cryptography Extension Policy Files (see FAQ)

Description

Somersault is a command-line only application. There is no GUI at this time.

At its very core, all Somersault does is copy files from the source file system to the destination file system. This process is broken down into two steps: 1) figure out what actions to perform (Building), 2) execute the actions (Execution). An action is anything that Somersault needs to do: creating directories, removing files, transferring files.
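
To make the two steps concrete, here is a small sketch in Java. It is not Somersault's actual code: the class, the Action type, and the idea of modelling a file system as a set of paths are all assumptions made for illustration, and directory handling is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: these names are hypothetical, not Somersault's API.
// A "file system" is modelled as a plain set of file paths.
class TwoPhaseSketch {
    enum Kind { TRANSFER_FILE, REMOVE_FILE }

    static final class Action {
        final Kind kind;
        final String path;
        Action(Kind kind, String path) { this.kind = kind; this.path = path; }
    }

    // Step 1 (Building): work out the actions without changing anything.
    static List<Action> build(Set<String> source, Set<String> destination) {
        List<Action> actions = new ArrayList<>();
        for (String path : new TreeSet<>(source)) {
            if (!destination.contains(path)) {
                actions.add(new Action(Kind.TRANSFER_FILE, path));
            }
        }
        for (String path : new TreeSet<>(destination)) {
            if (!source.contains(path)) {
                actions.add(new Action(Kind.REMOVE_FILE, path));
            }
        }
        return actions;
    }

    // Step 2 (Execution): apply the actions to the destination.
    static void execute(List<Action> actions, Set<String> destination) {
        for (Action action : actions) {
            if (action.kind == Kind.TRANSFER_FILE) {
                destination.add(action.path);
            } else {
                destination.remove(action.path);
            }
        }
    }
}
```

In this sketch, building again after a successful execution yields an empty action list, because the destination now mirrors the source.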

Somersault uses two mechanisms to decide which files to copy. First, a user-specified filter indicates which files to copy. Second, if the same file path exists in the destination, a user-specified comparator compares the source and destination files to determine if the source is newer than the destination.
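
The decision can be pictured roughly as follows. This is a sketch under my own assumptions, not the real implementation: the Entry type and the shouldCopy method are hypothetical, and the filter and comparator are stood in for by standard Java functional interfaces.

```java
import java.util.function.BiPredicate;
import java.util.function.Predicate;

// Illustrative only: Entry and shouldCopy are hypothetical names.
class CopyDecisionSketch {
    static final class Entry {
        final String path;
        final long modifiedMs;
        Entry(String path, long modifiedMs) { this.path = path; this.modifiedMs = modifiedMs; }
    }

    // destination is null when the path does not exist in the destination.
    static boolean shouldCopy(Entry source, Entry destination,
                              Predicate<Entry> filter,
                              BiPredicate<Entry, Entry> comparator) {
        if (!filter.test(source)) {
            return false;                 // the filter rejected the file
        }
        if (destination == null) {
            return true;                  // nothing to compare against, so copy
        }
        // The comparator is consulted only when both sides have the file.
        return comparator.test(source, destination);
    }
}
```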

Somersault calls the source and destination locations "file systems". There are two kinds of file systems: indexed and non-indexed. An indexed file system is one where a special file, called the index, stores all the information about the file system that Somersault needs. Also, only indexed file systems support compression and encryption.

The reason for indexed file systems is usually performance: speed, bandwidth, and/or cost. Because an indexed file system stores all its information in a single file, Somersault can easily download that single file and use just that index to figure out what is different between the source and destination. Hence it is usually the destination that is indexed, but it's possible for the source to also be indexed and/or for the destination to not be indexed.

Somersault stores the source and destination definition, filter, and other options in a file called a Scenario file. The Scenario file is an XML file that is designed to be modified by hand. The best way to go about creating a scenario file is to use the "examples" option to generate example scenario files and copy-and-paste from those to make your own.

Instructions

Here are the steps I suggest for creating a backup. (These are the same steps I use to set up my backups.) These steps should be followed in order, but you should expect to go back to previous steps several times as you refine earlier steps.

  1. Think about what you want to back up - This step sounds obvious, but it requires more attention than one might think at first glance. Many people use lots of programs, and programs are not usually uniform about where they put their data. You'll come back to this step many times throughout this process.
  2. The "examples" option - Run Somersault with the "examples" option. This will generate lots of example Scenario files that you can piece together to create your own Scenario file. I recommend starting simple and refining the Scenario file as you work through these instructions.
  3. The "buildonly" option - Run Somersault with your Scenario file and the "buildonly" option. This will create all the actions Somersault thinks it should run but it will stop before actually executing those actions. I suggest you look at the output (or log file) for the phrases "Filter Accept", "Filter Reject", "Comparator Accept", and "Comparator Reject". These tell you what Somersault wanted to back up and what it ignored. (You will only see the "Comparator" statements if the same file already exists in the destination.)
  4. The "dryrun" option - Run Somersault with your Scenario file and the "dryrun" option. This will execute a pretend backup -- Somersault will go through the motions of backing up but will not actually perform a backup. It's a good opportunity to check if you encounter any permission errors when Somersault tries to actually open the files.
  5. Backup! - Run Somersault with your Scenario file and without the "dryrun" option to perform your backup.
  6. Re-run the backup - I suggest running the previous step again. You should end with zero actions because the source and destination are the same.
  7. Test restore - I strongly suggest that you run a restore to test your backup. This is a step many people skip, and then later find out that the backup is not what they wanted. There are two ways you can do this:

    • Run Somersault with the "restore" and "buildonly" options. You should end up with zero actions.
    • Create a new Scenario file based on your original Scenario file, change the source to some temporary location, run a full restore to that new (empty) source. You can then browse the temporary location and see what was actually backed up.
  8. Store your Scenario file in a safe place - Store your Scenario file in a place other than your backup. You will need the Scenario file to restore your data if you lose your original data. This is especially important if you use encryption.

Commands

Commands direct Somersault to do something. You must specify one (and only one) command.

--scenario <filename>

Specifies the scenario file name (or full file path), which makes the destination mirror the source. By default, with no other command specified, the source will be backed up to the destination.

--examples <directory>

Writes several example scenario files to the provided directory. This is the best way to start writing your own scenario file.

--browser <filename>

Allows browsing the file system using a GUI.

--unlock

Forcibly removes all locks from the destination file system. This should be used if Somersault crashes or (more likely) Somersault is forcibly killed by the user. This command was added with the locking feature that prevents multiple Somersaults from working on the same file system.

Options

Options modify the way the above Commands work.

--restore

A restore reverses the backup operation to copy files from the destination to the source. A restore will NOT delete files from the destination that do not exist in the source, and will NOT overwrite files with directories and vice versa. It is highly recommended that you restore to an empty directory.

If used with the "browser" command, the destination will be displayed instead of the source.

--reconcile

Reconcile is a correction operation that should be run if a backup is halted or crashes. Reconciliation should only be applied to indexed FileSystems. Reconciliation compares the index and the raw data stored in the FileSystem, and deletes any data that does not match up, whether it is extra data in the FileSystem or extra entries in the index.
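
As a rough picture of what reconciliation keeps, the sketch below treats the index and the stored data as two sets of paths and retains only what appears in both. This is a simplification of my own; the class and method names are hypothetical, and the real operation works on the actual index and stored data rather than on sets.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: a set-based model of reconciliation.
class ReconcileSketch {
    // Returns the paths that survive reconciliation: those present in BOTH
    // the index and the raw data. Extra index entries and extra data are dropped.
    static Set<String> reconcile(Set<String> indexEntries, Set<String> storedData) {
        Set<String> consistent = new HashSet<>(indexEntries);
        consistent.retainAll(storedData);
        return consistent;
    }
}
```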

--dryrun

No changes will be applied to the destination. Somersault goes through all the motions of performing a backup but does not modify the destination. (Note: There may be some minor differences between an actual run and a dry-run. The most notable difference is performance -- since files are not being transferred, the backup is much faster.) This option does nothing if "buildonly" is specified.

--buildonly

Creates actions but does not execute them. If there are zero actions the two file systems are equal (according to the filter in the scenario file).

--gui

Displays a GUI with more detailed status information while Somersault is working. Highly recommended when not using Somersault as part of an unattended script. The GUI will not close automatically when finished, so you can still see the final results if you left it unattended.

Filters

The various filters provided by Somersault are the key to deciding which files to back up. They are also used to figure out which files should be compressed.

A filter is a boolean expression: it either accepts or rejects each file tested. Complex filters with multiple checks are built by combining filters with the AndFilter and OrFilter.

NameFilter

This is probably the most useful filter. It filters on the file or directory name using a regular expression.

PathFilter

The best filter for large scale pruning of the directories searched. This filters on the parent path for a file or directory using regular expressions. Directories are separated by the "/" character. The path is from the root of the source FileSystem, which may be a sub-directory of the real file system. To match a specific directory, use something like "/dirname/".
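
Here is a small sketch of why the surrounding "/" characters matter. It uses the standard java.util.regex classes and assumes two things the text above does not spell out: that the parent path starts and ends with "/", and that the pattern must match the whole path. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Illustrative only: not Somersault's PathFilter implementation.
class PathFilterSketch {
    // Assumption: the entire parent path must match the regular expression.
    static boolean accepts(String regex, String parentPath) {
        return Pattern.matches(regex, parentPath);
    }
}
```

With the pattern ".*/cache/.*", the parent path "/home/user/cache/" is accepted but "/home/user/cachet/" is not, because the "/" on each side pins the match to a complete directory name.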

AndFilter

Logically ANDs multiple filters together. If all sub-filters accept the file, then the AndFilter accepts the file too.

OrFilter

Logically ORs multiple filters together. If one or more of the sub-filters accept the file, then the OrFilter accepts the file too.

BooleanFilter

This is not really a "filter" per se. Depending on how you set it up, it either accepts or rejects all files.

FileFilter

Accepts files, rejects directories. This will most commonly be used to qualify other filters so that they apply only to files.

DirectoryFilter

Accepts directories, rejects files. This will most commonly be used to qualify other filters so that they apply only to directories. This filter returns false for files.

FileAttributesFilter

Filters on a file's attributes: executable, readable, writable. This filter returns true for directories, because that simplifies the most common usage -- when not inside a NotFilter.

NotFilter

Reverses (or negates) a child filter. If the child filter accepts, this rejects; if the child filter rejects, this accepts. This is most useful for excluding files that match other filters.
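
To show how the boolean filters fit together, here is a sketch that expresses the same logic with Java's standard Predicate type. It is only an analogy: real filters are defined in the Scenario file, and the Entry type and method names below are hypothetical. The example excludes files (but not directories) whose names end in .tmp or .bak.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Illustrative only: models And/Or/Not/File/Name filters with java.util.function.Predicate.
class FilterCompositionSketch {
    static final class Entry {
        final String name;
        final boolean directory;
        Entry(String name, boolean directory) { this.name = name; this.directory = directory; }
    }

    // Stand-in for FileFilter: accepts files, rejects directories.
    static final Predicate<Entry> FILE = entry -> !entry.directory;

    // Stand-in for NameFilter: the whole name must match the regular expression.
    static Predicate<Entry> nameFilter(String regex) {
        Pattern pattern = Pattern.compile(regex);
        return entry -> pattern.matcher(entry.name).matches();
    }

    // NOT( FILE AND ( name is *.tmp OR name is *.bak ) )
    static Predicate<Entry> excludeScratchFiles() {
        Predicate<Entry> scratchName =
                nameFilter("(?i).*\\.tmp").or(nameFilter("(?i).*\\.bak")); // OrFilter
        Predicate<Entry> scratchFile = FILE.and(scratchName);               // AndFilter
        return scratchFile.negate();                                        // NotFilter
    }
}
```

Qualifying the name check with the file check means a directory that happens to be called "archive.bak" is still accepted and searched.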

SizeFilter

Accepts files that fall within the size range. The range is in bytes and the end values are inclusive. This filter returns true for directories, because that simplifies the most common usage -- when not inside a NotFilter.
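
A minimal sketch of the rule just described, with hypothetical names. Both ends of the range are inclusive, and directories are always accepted.

```java
// Illustrative only: not Somersault's SizeFilter implementation.
class SizeFilterSketch {
    static boolean accepts(boolean isDirectory, long sizeBytes, long minBytes, long maxBytes) {
        if (isDirectory) {
            return true; // directories always pass, so their contents are still searched
        }
        return sizeBytes >= minBytes && sizeBytes <= maxBytes; // inclusive at both ends
    }
}
```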

DepthFilter

Filters on the depth of the file or directory. The depth is a range and the end values are inclusive. Root files are depth 1.
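
The depth rule can be sketched like this. The names are hypothetical, and the sketch assumes paths are written like "/dir/file.txt", relative to the root of the FileSystem.

```java
// Illustrative only: not Somersault's DepthFilter implementation.
class DepthFilterSketch {
    // Counts the non-empty path segments, so "/a.txt" has depth 1.
    static int depth(String path) {
        int depth = 0;
        for (String segment : path.split("/")) {
            if (!segment.isEmpty()) {
                depth++;
            }
        }
        return depth;
    }

    // Both ends of the range are inclusive.
    static boolean accepts(String path, int minDepth, int maxDepth) {
        int d = depth(path);
        return d >= minDepth && d <= maxDepth;
    }
}
```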

SharedFilter

Allows sharing filters between multiple Scenarios. This filter performs no filtering itself but allows loading filters stored in a separate file, so that multiple Scenarios can use the same filters. You can use relative or absolute paths. If relative paths are used, they are resolved from the current file's location.

Comparators

The various comparators provided by Somersault determine whether a file has been changed. They are only used when a file exists in both the source and destination file systems.

BooleanComparator

This is not really a "comparator" per se. Depending on how you set it up, it always considers a file either different or the same. The value true means always different; false means always the same.

HashComparator

Hashes the files and compares the hashes to determine if the files are different. This uses the Hash Info specified in the Scenario file. The Hash Info should be the same between two file systems (otherwise the Hashes will always be different). Indexed file systems store the hash in the index to avoid re-computing the hash every time.
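
The sketch below shows the idea using the standard java.security.MessageDigest class. It is not Somersault's code: the names are hypothetical, and treating the Hash Info as nothing more than an algorithm name is my assumption.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative only: not Somersault's HashComparator implementation.
class HashComparatorSketch {
    // Assumption: the "Hash Info" is modelled as just an algorithm name.
    static byte[] hash(byte[] content, String algorithm) {
        try {
            return MessageDigest.getInstance(algorithm).digest(content);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("Unknown hash algorithm: " + algorithm, e);
        }
    }

    // True means the files are considered different.
    static boolean isDifferent(byte[] sourceHash, byte[] destinationHash) {
        return !MessageDigest.isEqual(sourceHash, destinationHash);
    }
}
```

Hashing identical content with two different algorithms (say "SHA-256" on one side and "MD5" on the other) always reports a difference, which is why the Hash Info should be the same on both file systems.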

ModifyDateComparator

Compares the last modified dates to determine if the files are different. Files whose last modified dates differ by less than "granularityMs" milliseconds are considered to be unchanged.
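
A sketch of the granularity check, assuming it is applied to the absolute difference between the two timestamps in milliseconds. The names are hypothetical, and how the real comparator treats a difference exactly equal to "granularityMs" is my guess.

```java
// Illustrative only: not Somersault's ModifyDateComparator implementation.
class ModifyDateComparatorSketch {
    // True means the files are considered different.
    static boolean isDifferent(long sourceModifiedMs, long destinationModifiedMs, long granularityMs) {
        long gapMs = Math.abs(sourceModifiedMs - destinationModifiedMs);
        return gapMs >= granularityMs; // gaps smaller than the granularity are ignored
    }
}
```

A tolerance like this helps when the two file systems store timestamps at different resolutions, so that a rounded timestamp is not mistaken for a change.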

MultiSomerFileComparator

A special comparator that allows multiple comparators to be used in succession. If any of the child comparators decide that the files are different, the files will be considered different. Otherwise the files are considered the same.
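
In other words, the child comparators are ORed together. A sketch with hypothetical names, where each child is a standard Java BiPredicate that returns true for "different":

```java
import java.util.List;
import java.util.function.BiPredicate;

// Illustrative only: not Somersault's MultiSomerFileComparator implementation.
class MultiComparatorSketch {
    static final class Entry {
        final long size;
        final long modifiedMs;
        Entry(long size, long modifiedMs) { this.size = size; this.modifiedMs = modifiedMs; }
    }

    // Two example children; each returns true when it thinks the files differ.
    static final BiPredicate<Entry, Entry> SIZE = (s, d) -> s.size != d.size;
    static final BiPredicate<Entry, Entry> MODIFIED = (s, d) -> s.modifiedMs != d.modifiedMs;

    // Children run in order; the first one that reports a difference decides.
    static boolean isDifferent(Entry source, Entry destination,
                               List<BiPredicate<Entry, Entry>> children) {
        for (BiPredicate<Entry, Entry> child : children) {
            if (child.test(source, destination)) {
                return true;
            }
        }
        return false; // no child saw a difference, so the files are the same
    }
}
```

One reason to order the children is cost: a cheap check such as size can settle the question before a more expensive one runs.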

SizeComparator

Compares the files' sizes to determine if they are different.

SharedComparator

Allows sharing comparators between multiple Scenarios. This comparator performs no comparing itself but allows loading comparators stored in a separate file, so that multiple Scenarios can use the same comparators. You can use relative or absolute paths. If relative paths are used, they are resolved from the current file's location.

Encryption

Somersault supports encryption via the Bouncy Castle libraries. Bouncy Castle is one of the best-known and most established encryption libraries. The raw API allows using any encryption algorithm that the Bouncy Castle library supports.

The JCE is not supported, simply because it's a huge pain to set up correctly. It requires modifying the JVM installation, which normal users are unlikely to do. (It was supported at one time but has been removed.)

FAQ

How are symbolic links handled?

Somersault has limited ability to handle symbolic links since it is written in Java. It does its best not to follow them, but the implementation is not perfect. (Java 7 will have better support for identifying symbolic links.)

How are file-permissions handled?

Somersault has extremely limited support for file-permissions since it is written in Java. Somersault can read any files in the source that the current user has access to, and writes files to the destination with the current user as the owner. (Java 7 will have better support for identifying symbolic links.)

Why does Somersault use regular expressions instead of the simpler glob (*.*) style patterns?

Simply, regular expressions are more powerful. They offer flexibility that glob patterns cannot match. Any glob pattern can be easily converted into a regular expression: replace every "*" with ".*", start the pattern with "(?i)" to make it case-insensitive, and begin and end the rest of the pattern with ".*".
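
The conversion recipe above can be written out as a few lines of Java. This is just a sketch of that recipe with hypothetical names; it is not a function that Somersault provides.

```java
import java.util.regex.Pattern;

// Illustrative only: a literal implementation of the conversion recipe.
class GlobToRegexSketch {
    static String convert(String glob) {
        // "(?i)" makes the match case-insensitive; ".*" is added at both ends.
        return "(?i)" + ".*" + glob.replace("*", ".*") + ".*";
    }

    static boolean matches(String glob, String name) {
        return Pattern.matches(convert(glob), name);
    }
}
```

Note that the recipe leaves "." unescaped, so in the resulting expression it matches any character. If you want a literal dot, write it as "\." in your regular expression.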

Using "\" in a PathFilter doesn't work on MS Windows!

Somersault uses only "/" as the directory separator character on all operating systems, including MS Windows.

Somersault uses a lot of memory!

First, no one has actually said that to me; I'm pre-empting the question. Second, Somersault is designed to use as little memory as possible, but Somersault uses Java, and Java likes to allocate big chunks of memory, more than the application actually needs. Each file processed requires new resources, but once done with a file Somersault releases those resources. However, Java doesn't necessarily give the memory back to the OS. If you need to keep Somersault's memory usage under control, use the various memory control options when running Somersault (link). Remember, unused memory is wasted memory. Somersault is used under fairly heavy load (tens of thousands of files).