Somersault

Somersault is a backup tool designed to allow backing up to remote
storage. Initially Somersault only supports Amazon S3
(which provides cheap, reliable, remote storage), but future plans
include FTP, WebDAV, etc. Somersault's focus on backing up to
remote storage means that it will only transfer changed files
(similar to rsync). This makes backups quick and inexpensive.

This document explains Somersault, from install to usage to
support. I suggest reading the Install, Description, and
Instructions sections first. These will give you a feel for what
Somersault does and how to go about making a backup.

Somersault is released under the
Apache License, Version 2.0

Changes

Version 0.4.0

Version 0.3.1

Version 0.3.0

  • Added locking capabilities

    • Locks are used to avoid conflicts against the underlying file
      systems.
    • Locks are automatically acquired and released; there is no
      management needed by the user.
    • Added --unlock command to force unlocking if Somersault fails
      to release a lock.
  • Changed --backup back to --scenario (and re-documented the
    command to more accurately specify its purpose)

Version 0.2

  • Added a GUI browser for the contents of a File System (see new
    --browser command).
  • Changed --scenario to --backup (for clarity)
  • Added new "shared" filters and comparators; see SharedFilter
    and SharedComparator.

Install

Requirements

Installation Steps

  1. Extract Somersault package in a new directory
  2. (Optional) Install Unlimited Strength Java(TM) Cryptography
    Extension Policy Files (see FAQ)

Description

Somersault is a command-line only application. There is no GUI at
this time.

At its very core, all Somersault does is copy files from the source
file system to the destination file system. This process is broken
down into two steps: 1) figure out what actions to perform
(Building), 2) execute the actions (Execution). An action is
anything that Somersault needs to do: creating directories,
removing files, transferring files.
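
As a rough illustration of the two steps above, here is a minimal sketch. It is not Somersault's actual code: the class and method names (TwoPhaseSketch, build, execute) are hypothetical, actions are plain strings, and each file system is modeled as a simple set of paths.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: hypothetical names, not Somersault's real classes.
class TwoPhaseSketch {

    // Building: work out which actions are needed, without changing anything.
    static List<String> build(Set<String> source, Set<String> destination) {
        List<String> actions = new ArrayList<>();
        for (String path : new TreeSet<>(source)) {
            if (!destination.contains(path)) {
                actions.add("TRANSFER " + path);
            }
        }
        for (String path : new TreeSet<>(destination)) {
            if (!source.contains(path)) {
                actions.add("REMOVE " + path);
            }
        }
        return actions;
    }

    // Execution: apply the previously built actions to the destination.
    static void execute(List<String> actions, Set<String> destination) {
        for (String action : actions) {
            String[] parts = action.split(" ", 2); // verb first, then the path
            if (parts[0].equals("TRANSFER")) {
                destination.add(parts[1]);
            } else {
                destination.remove(parts[1]);
            }
        }
    }

    public static void main(String[] args) {
        Set<String> source = new TreeSet<>(Arrays.asList("/docs/a.txt", "/docs/b.txt"));
        Set<String> destination = new TreeSet<>(Arrays.asList("/docs/b.txt", "/docs/old.txt"));

        List<String> actions = build(source, destination);
        System.out.println(actions);
        execute(actions, destination);
        System.out.println(build(source, destination).size() + " actions on the second run");
    }
}
```

Keeping the two steps separate is what makes it possible to stop after Building and inspect the actions before anything is changed.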

Somersault uses two mechanisms to decide which files to copy.
First, a user specified filter that indicates which files to copy.
Second, if the same file path exists in the destination, a user
specified comparator compares the source and destination file to
determine if the source is newer than the destination.
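
The decision described above can be sketched as follows. This is an assumption-laden illustration, not the real implementation: FileEntry and shouldCopy are hypothetical names, and the filter and comparator are modeled with the standard java.util.function interfaces.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.function.Predicate;

// Illustrative only: hypothetical names, not Somersault's real classes.
class CopyDecisionSketch {

    // Minimal stand-in for a file: its path and last modified time.
    static final class FileEntry {
        final String path;
        final long modifiedMs;

        FileEntry(String path, long modifiedMs) {
            this.path = path;
            this.modifiedMs = modifiedMs;
        }
    }

    static boolean shouldCopy(FileEntry source,
                              Map<String, FileEntry> destination,
                              Predicate<FileEntry> filter,
                              BiPredicate<FileEntry, FileEntry> comparator) {
        if (!filter.test(source)) {
            return false; // rejected by the filter: never copied
        }
        FileEntry existing = destination.get(source.path);
        if (existing == null) {
            return true; // not in the destination yet: the comparator is not consulted
        }
        return comparator.test(source, existing); // same path exists: comparator decides
    }

    public static void main(String[] args) {
        Map<String, FileEntry> destination = new HashMap<>();
        destination.put("/notes.txt", new FileEntry("/notes.txt", 1_000L));

        Predicate<FileEntry> textFilesOnly = e -> e.path.endsWith(".txt");
        BiPredicate<FileEntry, FileEntry> sourceIsNewer = (s, d) -> s.modifiedMs > d.modifiedMs;

        FileEntry changed = new FileEntry("/notes.txt", 2_000L);
        System.out.println(shouldCopy(changed, destination, textFilesOnly, sourceIsNewer));
    }
}
```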

Somersault calls the source and destination locations "file
systems". There are two kinds of file systems: indexed and
non-indexed. An indexed file system is one where a special file,
called the index, stores all the information about the file system
that Somersault needs. Also, only indexed file systems support
compression and encryption.

The reason for indexed file systems is usually performance: speed,
bandwidth, and/or cost. Because an indexed file system stores all
its information in a single file, Somersault can easily download
that single file and use just that index to figure out what is
different between the source and destination. Hence it is usually
the destination that is indexed, but it's possible for the source to
also be indexed and/or for the destination to not be indexed.

Somersault stores the source and destination definition, filter,
and other options in a file called a Scenario file. The Scenario
file is an XML file that is designed to be modified by hand. The
best way to go about creating a scenario file is to use the
"examples" option to generate example scenario files and
copy-and-paste from those to make your own.

Instructions

Here are the steps I suggest in creating a backup. (These are the
same steps I use to set up my backups.) These steps should be
followed in order, but you should expect to go back to previous
steps several times as you refine earlier steps.

  1. Think about what you want to back up - This step sounds obvious,
    but it requires more attention than one might think at first
    glance. Many people use lots of programs, and programs are not
    usually uniform about where they put their data. You'll come back
    to this step many times throughout this process.
  2. The "examples" option - Run Somersault with the "examples"
    option. This will generate lots of example Scenario files that you
    can piece together to create your own Scenario file. I recommend
    starting simple and refining the Scenario file as you work through
    these instructions.
  3. The "buildonly" option - Run Somersault with your Scenario file
    and the "buildonly" option. This will create all the actions
    Somersault thinks it should run but it will stop before actually
    executing those actions. I suggest you look at the output (or log
    file) for the phrases "Filter Accept", "Filter Reject", "Comparator
    Accept", and "Comparator Reject". This tells you what Somersault
    wanted to back up and what it ignored. (You will only see the
    "Comparator" statements if the same file already exists in the
    destination.)
  4. The "dryrun" option - Run Somersault with your Scenario file
    and the "dryrun" option. This will execute a pretend backup --
    Somersault will go through the motions of backing up but will not
    actually perform a backup. It's a good opportunity to check if you
    encounter any permission errors when Somersault tries to actually
    open the files.
  5. Backup! - Run Somersault with your Scenario file and without
    the "dryrun" option to perform your backup.
  6. Re-run the backup - I suggest running the previous step again.
    You should end with zero actions because the source and destination
    are the same.
  7. Test restore - I strongly suggest that you run a restore to
    test your backup. This is a step many people skip, and then later
    find out that the backup is not what they wanted. There are two
    ways you can do this:

    • Run Somersault with the "restore" and "buildonly" options. You
      should end up with zero actions.
    • Create a new Scenario file based on your original Scenario
      file, change the source to some temporary location, run a full
      restore to that new (empty) source. You can then browse the
      temporary location and see what was actually backed up.
  8. Store your Scenario file in a safe place - Store your Scenario
    file in a place other than your backup. You will need the Scenario
    file to restore your data if you lose your original data. This is
    especially important if you use encryption.

Commands

Commands direct Somersault to do something.
You must specify one (and only one) command.

--scenario <filename>

Specifies the scenario file name (or full file path), which makes
the destination mirror the source. By default, with no other
command specified, the source will be backed up to the
destination.

--examples <directory>

Writes several example scenario files to the provided directory.
This is the best way to start writing your own scenario file.

--browser <filename>

Allows browsing the file system using a GUI.

--unlock

Forcibly removes all locks from the destination file system. This
should be used if Somersault crashes or (more likely) Somersault is
forcibly killed by the user. This command was added with the
locking feature that prevents multiple Somersaults from working on
the same file system.

Options

Options modify the way the above Commands work.

--restore

A restore reverses the backup operation to copy files from the
destination to the source. A restore will NOT delete files from the
destination that do not exist in the source, and will NOT overwrite
files with directories and vice versa. It is highly recommended
that you restore to an empty directory.

If used with the "browser" command, the destination will be
displayed instead of the source.

--reconcile

Reconcile is a correction operation that should be run if a backup
is halted or crashes. Reconciliation should only be applied to
indexed FileSystems. Reconciliation compares the index and the raw
data stored in the FileSystem, and deletes any data that does not
match up, whether it is extra data in the FileSystem or extra
entries in the index.
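
To make the idea concrete, the two kinds of mismatch that reconciliation looks for can be sketched as set differences. This is only a sketch under simplifying assumptions: the index and the stored data are modeled as sets of names, and ReconcileSketch and its methods are hypothetical, not part of Somersault.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: hypothetical names, not Somersault's real classes.
class ReconcileSketch {

    // Data present in the file system that the index does not know about.
    static Set<String> extraData(Set<String> indexEntries, Set<String> storedData) {
        Set<String> extra = new TreeSet<>(storedData);
        extra.removeAll(indexEntries);
        return extra;
    }

    // Index entries whose data is missing from the file system.
    static Set<String> extraEntries(Set<String> indexEntries, Set<String> storedData) {
        Set<String> extra = new TreeSet<>(indexEntries);
        extra.removeAll(storedData);
        return extra;
    }

    public static void main(String[] args) {
        Set<String> index = new TreeSet<>(Arrays.asList("a", "b", "c"));
        Set<String> data = new TreeSet<>(Arrays.asList("b", "c", "d"));

        System.out.println("Data to delete: " + extraData(index, data));
        System.out.println("Index entries to delete: " + extraEntries(index, data));
    }
}
```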

--dryrun

No changes will be applied to the destination. Somersault goes
through all the motions of performing a backup but does not modify
the destination. (Note: There may be some minor differences between
an actual run and a dry run. The most notable difference is
performance: since files are not being transferred, the backup is
much faster.) This option does nothing if "buildonly" is
specified.

--buildonly

Creates actions but does not execute them. If there are zero
actions the two file systems are equal (according to the filter in
the scenario file).

--gui

Displays a GUI with more detailed status information while
Somersault is working. Highly recommended when not using Somersault
as part of an unattended script. The GUI will not close
automatically when finished, so you can still see the final results
after leaving it unattended.

Filters

The various filters provided by Somersault are the key to deciding
which files to back up. They are also used to figure out which files
should be compressed.

A filter is a boolean expression: it either accepts or rejects each
file tested. Complex filters with multiple checks are built by
combining filters with the AndFilter and OrFilter.

NameFilter

This is probably the most useful filter. It filters on the file or
directory name using a regular expression.

PathFilter

The best filter for large-scale pruning of the directories
searched. This filters on the parent path for a file or directory
using regular expressions. Directories are separated by the "/"
character. The path is from the root of the source FileSystem, which
may be a sub-directory of the real file system. To match a specific
directory, use something like "/dirname/".
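
Here is a small sketch of matching a parent path against a regular expression with java.util.regex. This document does not say whether Somersault requires the pattern to match the whole path or just part of it, so the sketch shows both behaviours. PathFilterSketch and its methods are hypothetical names.

```java
import java.util.regex.Pattern;

// Illustrative only: hypothetical names, not Somersault's real classes.
class PathFilterSketch {

    // True if the pattern is found anywhere inside the parent path.
    static boolean findsIn(String regex, String parentPath) {
        return Pattern.compile(regex).matcher(parentPath).find();
    }

    // True only if the pattern matches the entire parent path.
    static boolean matchesWhole(String regex, String parentPath) {
        return Pattern.compile(regex).matcher(parentPath).matches();
    }

    public static void main(String[] args) {
        String parentPath = "/projects/app/cache/";

        System.out.println(findsIn("/cache/", parentPath));
        System.out.println(matchesWhole("/cache/", parentPath));
        System.out.println(matchesWhole(".*/cache/.*", parentPath));
    }
}
```

Including the "/" on both sides of the directory name keeps the pattern from also matching directories such as "cached".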

AndFilter

Logically AND's multiple filters together. If all sub-filters
accept the file, then the AndFilter accepts the file too.

OrFilter

Logically OR's multiple filters together. If one or more of the
sub-filters accept the file, then the OrFilter accepts the file
too.

BooleanFilter

This is not really a "filter" per se; depending on how you set it
up, it either accepts or rejects all files.

FileFilter

Accepts files, rejects directories. This will most commonly be used
to qualify other filters to apply only to files.

DirectoryFilter

Accepts directories, rejects files. This will most commonly be used
to qualify other filters to apply only to directories. This filter
returns false for files.

FileAttributesFilter

Filters on a file's attributes: executable, readable, writable.
This filter returns true for directories, because that simplifies
the most common usage -- when not inside a NotFilter.

NotFilter

Reverses (or negates) a child filter. If the child filter accepts,
this rejects; if the child filter rejects, this accepts. This
is most useful for excluding files that match other filters.
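
The AndFilter, OrFilter, and NotFilter described above behave like ordinary boolean operators. As an analogy only (this is not how scenario files are written, and the names below are hypothetical), the same combinations can be expressed with java.util.function.Predicate:

```java
import java.util.function.Predicate;

// Illustrative only: hypothetical names, not Somersault's real classes.
class FilterCompositionSketch {

    // Minimal stand-in for a file or directory.
    static final class Entry {
        final String name;
        final boolean directory;

        Entry(String name, boolean directory) {
            this.name = name;
            this.directory = directory;
        }
    }

    // Plays the role of a FileFilter: accepts files, rejects directories.
    static final Predicate<Entry> IS_FILE = e -> !e.directory;

    // Plays the role of a NameFilter matching temporary files.
    static final Predicate<Entry> IS_TEMP = e -> e.name.endsWith(".tmp");

    // AndFilter(FileFilter, NotFilter(NameFilter)): files that are not temporary.
    static final Predicate<Entry> NON_TEMP_FILES = IS_FILE.and(IS_TEMP.negate());

    // OrFilter(DirectoryFilter, the filter above): every directory plus non-temporary files.
    static final Predicate<Entry> BACKUP_FILTER = IS_FILE.negate().or(NON_TEMP_FILES);

    public static void main(String[] args) {
        System.out.println(BACKUP_FILTER.test(new Entry("report.doc", false)));
        System.out.println(BACKUP_FILTER.test(new Entry("scratch.tmp", false)));
        System.out.println(BACKUP_FILTER.test(new Entry("photos", true)));
    }
}
```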

SizeFilter

Accepts files that fall within the size range. The range is in
bytes and the end values are inclusive. This filter returns true
for directories, because that simplifies the most common usage --
when not inside a NotFilter.
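
A minimal sketch of the rule just described: an inclusive byte range, with directories always accepted. The class and parameter names are hypothetical.

```java
// Illustrative only: hypothetical names, not Somersault's real classes.
class SizeFilterSketch {

    static boolean accepts(boolean isDirectory, long sizeBytes, long minBytes, long maxBytes) {
        if (isDirectory) {
            return true; // directories always pass
        }
        return sizeBytes >= minBytes && sizeBytes <= maxBytes; // both ends inclusive
    }

    public static void main(String[] args) {
        long oneMegabyte = 1024L * 1024L;

        System.out.println(accepts(false, oneMegabyte, 0L, oneMegabyte));
        System.out.println(accepts(false, oneMegabyte + 1, 0L, oneMegabyte));
        System.out.println(accepts(true, 0L, 10L, 20L));
    }
}
```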

DepthFilter

Filters on the depth of the file or directory. The depth is a range
and the end values are inclusive. Root files are depth 1.
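
One way to picture depth, as a sketch only: count the path segments, so that a file directly under the root has depth 1. This assumes "/"-separated paths measured from the root of the file system. The names are hypothetical.

```java
// Illustrative only: hypothetical names, not Somersault's real classes.
class DepthFilterSketch {

    // Counts the non-empty segments of a "/"-separated path.
    static int depth(String path) {
        int depth = 0;
        for (String segment : path.split("/")) {
            if (!segment.isEmpty()) {
                depth++;
            }
        }
        return depth;
    }

    // Inclusive range check on the depth.
    static boolean accepts(String path, int minDepth, int maxDepth) {
        int d = depth(path);
        return d >= minDepth && d <= maxDepth;
    }

    public static void main(String[] args) {
        System.out.println(depth("/readme.txt"));
        System.out.println(depth("/docs/letters/june.txt"));
        System.out.println(accepts("/docs/letters/june.txt", 1, 2));
    }
}
```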

SharedFilter

Allows sharing filters between multiple Scenarios. This filter
performs no filtering itself but allows loading filters stored in a
separate file, so that multiple Scenarios can use the same filters.
You can use relative or absolute paths. If relative paths are used,
they are resolved from the current file's location.

Comparators

The various comparators provided by Somersault determine whether a
file has been changed. They are only used when a file exists in
both the source and destination file systems.

BooleanComparator

This is not really a "comparator" per se; depending on how you set
it up, it either always considers a file different or always the
same. The value true means always different, false means always the
same.

HashComparator

Hashes the files and compares the hashes to determine if the files
are different. This uses the Hash Info specified in the Scenario
file. The Hash Info should be the same between two file systems
(otherwise the Hashes will always be different). Indexed file
systems store the hash in the index to avoid re-computing the hash
every time.
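
As a sketch of the idea, the following compares two byte arrays by hashing them with the JDK's MessageDigest. It is not Somersault's implementation: the real hash settings come from the Hash Info in the Scenario file, SHA-256 is used here only as an example algorithm, and the names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Illustrative only: hypothetical names, not Somersault's real classes.
class HashCompareSketch {

    static byte[] hash(byte[] content, String algorithm) {
        try {
            return MessageDigest.getInstance(algorithm).digest(content);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown hash algorithm: " + algorithm, e);
        }
    }

    // Both sides must be hashed the same way, or the hashes will never match.
    static boolean isDifferent(byte[] source, byte[] destination, String algorithm) {
        return !Arrays.equals(hash(source, algorithm), hash(destination, algorithm));
    }

    public static void main(String[] args) {
        byte[] original = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] edited = "hello!".getBytes(StandardCharsets.UTF_8);

        System.out.println(isDifferent(original, original, "SHA-256"));
        System.out.println(isDifferent(original, edited, "SHA-256"));
    }
}
```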

ModifyDateComparator

Compares the last modified dates to determine if the files are
different. Files whose last modified dates differ by less than
"granularityMs" milliseconds are considered to be unchanged.
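
A minimal sketch of a granularity check, assuming the comparison uses the absolute difference between the two timestamps in milliseconds. The class and method names are hypothetical.

```java
// Illustrative only: hypothetical names, not Somersault's real classes.
class ModifyDateSketch {

    // Differences smaller than the granularity count as unchanged.
    static boolean isDifferent(long sourceMs, long destinationMs, long granularityMs) {
        return Math.abs(sourceMs - destinationMs) >= granularityMs;
    }

    public static void main(String[] args) {
        long granularityMs = 2_000L;

        System.out.println(isDifferent(10_000L, 11_500L, granularityMs));
        System.out.println(isDifferent(10_000L, 15_000L, granularityMs));
    }
}
```

A tolerance like this is useful because some file systems store timestamps at a coarse resolution (FAT, for example, records modification times in two-second steps), so the same file can report slightly different times on each side.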

MultiSomerFileComparator

A special comparator that allows multiple comparators to be used in
succession. If any of the child comparators decide that the files
are different, the files will be considered different. Otherwise
the files are considered the same.
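
The "any child says different" rule can be sketched like this. The FileEntry type and the two example child comparators are hypothetical stand-ins, not Somersault classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Illustrative only: hypothetical names, not Somersault's real classes.
class MultiComparatorSketch {

    // Minimal stand-in for a file: its size and last modified time.
    static final class FileEntry {
        final long sizeBytes;
        final long modifiedMs;

        FileEntry(long sizeBytes, long modifiedMs) {
            this.sizeBytes = sizeBytes;
            this.modifiedMs = modifiedMs;
        }
    }

    // Runs the children in order and stops at the first one that reports a difference.
    static boolean isDifferent(List<BiPredicate<FileEntry, FileEntry>> children,
                               FileEntry source, FileEntry destination) {
        for (BiPredicate<FileEntry, FileEntry> child : children) {
            if (child.test(source, destination)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<BiPredicate<FileEntry, FileEntry>> children = new ArrayList<>();
        children.add((s, d) -> s.sizeBytes != d.sizeBytes);   // size check
        children.add((s, d) -> s.modifiedMs != d.modifiedMs); // date check

        FileEntry source = new FileEntry(100L, 2_000L);
        FileEntry destination = new FileEntry(100L, 1_000L);
        System.out.println(isDifferent(children, source, destination));
    }
}
```

Putting cheap checks first means an expensive comparison only has to run when the cheap ones find no difference.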

SizeComparator

Compares the files' sizes to determine if they are different.

SharedComparator

Allows sharing comparators between multiple Scenarios. This
comparator performs no comparing itself but allows loading
comparators stored in a separate file, so that multiple Scenarios
can use the same comparators. You can use relative or absolute
paths. If relative paths are used, they are resolved from the
current file's location.

Encryption

Somersault supports encryption via the Bouncy Castle libraries.
Bouncy Castle is one of the most well-known, stalwart encryption
libraries. The raw API allows using any encryption algorithm that
the Bouncy Castle library supports.

The JCE is not supported simply because it's a huge pain to set up
correctly. It requires modifying the JVM installation, which normal
users are unlikely to do. (It was supported at one time but has
been removed.)

FAQ

How are symbolic links handled?

Somersault has limited ability to handle symbolic links since it is
written in Java. It does its best not to follow them, but the
implementation is not perfect. (Java 7 will have better support for
identifying symbolic links.)

How are file-permissions handled?

Somersault has extremely limited support for file-permissions since
it is written in Java. Somersault can read any files in the source
that the current user has access to, and writes files to the
destination with the current user as the owner.
(Java 7
will have better support for identifying symbolic links.)

Why does Somersault use regular expressions instead of the simpler glob (*.*) style patterns?

Simply, regular expressions are more powerful. They offer
flexibility that glob patterns cannot match. Any glob pattern can
be easily converted into a regular expression: replace every "*"
with ".*", start the pattern with "(?i)" to make it
case-insensitive, and begin and end the pattern with ".*".
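
The conversion rule above can be sketched in a few lines. Note one addition that is my own assumption rather than part of the rule as stated: the literal parts of the glob are quoted with Pattern.quote so that a "." in the glob matches only a real dot. Only "*" is handled, and GlobToRegexSketch is a hypothetical helper, not something shipped with Somersault.

```java
import java.util.regex.Pattern;

// Illustrative only: hypothetical helper, not part of Somersault.
class GlobToRegexSketch {

    static String toRegex(String glob) {
        // "(?i)" makes the pattern case-insensitive; ".*" lets it start anywhere.
        StringBuilder regex = new StringBuilder("(?i).*");
        String[] literals = glob.split("\\*", -1); // the text between the "*" characters
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append(".*"); // each "*" in the glob becomes ".*"
            }
            if (!literals[i].isEmpty()) {
                regex.append(Pattern.quote(literals[i])); // assumption: treat the rest literally
            }
        }
        return regex.append(".*").toString(); // ".*" lets the pattern end anywhere
    }

    public static void main(String[] args) {
        String regex = toRegex("*.txt");

        System.out.println(regex);
        System.out.println(Pattern.matches(regex, "Notes.TXT"));
        System.out.println(Pattern.matches(regex, "notestxt"));
    }
}
```

Because the result begins and ends with ".*", it matches the glob anywhere in the name, so "*.txt" converted this way also accepts "notes.txt.bak". Drop the trailing ".*" if the pattern should be anchored at the end.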

Using "\" in a PathFilter doesn't work on MS Windows!

Somersault uses only "/" as the directory separator character on
all operating systems.

Somersault uses a lot of memory!

First, no one has actually said that to me; I'm pre-empting the
question. Second, Somersault is designed to use as little memory as
possible, but Somersault uses Java, and Java likes to allocate big
chunks of memory, more than the application actually needs. Each
file processed requires new resources, and once done with a file
Somersault releases those resources, but Java doesn't necessarily
give the memory back to the OS. If you need to keep Somersault's
memory usage under control, use the various memory control options
when running Somersault.
Remember, unused memory is wasted memory. Somersault is used
under fairly heavy load (tens of thousands of files).