
This is a collection of simple scripts for incremental backup schedules with
Dar.

Files

  • save_first.sh BASE_PATHS+ - create full backup
  • save_inc.sh BASE_PATHS+ - create new backup increment in reference to a
    previous one or a full backup
  • save_meta.sh - backup important meta data (e.g. manually
    selected packages, mounts, partitions etc.)
  • ftp_push.sh - push backup files to ftp(s) server
  • ftp_cleanold.sh - remove old backups from local cache directory
    and remote ftp(s) server

Example

# crontab -l
# m h  dom mon dow   command
00  08 *   *   1     cd /srv/backup && ~/backup-scripts/save_first.sh /etc /home /var && silence ~/backup-scripts/ftp_push.sh config.sh
00  08 *   *   2-7   cd /srv/backup && ~/backup-scripts/save_inc.sh   /etc /home /var && silence ~/backup-scripts/ftp_push.sh config.sh
00  09 28  *   *     cd /srv/backup && silence ~/backup-scripts/ftp_cleanold.sh config.sh

This means that every Monday at 8 o'clock a full backup is created, and on
every other day an incremental one. Each day the new backup data is
transferred to a remote sftp server. The directory /srv/backup is a local cache
directory, because the scripts create the backup files in the current working
directory (cwd). On the 28th of each month, i.e. near its end, the backup data
of the previous month is deleted locally and remotely.

The silence utility suppresses the output of a command unless something goes
wrong (i.e. the exit status is non-zero). Alternatively, chronic from moreutils
can be used for the same purpose.
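
The behaviour described above can be approximated with a few lines of shell.
This is only a sketch of the idea, not the actual implementation of silence or
chronic, and quiet_unless_fail is a hypothetical name:

```shell
# Run a command, buffer its output, and print it only if the command fails.
# The subshell body ( ... ) keeps the variables local to the function.
quiet_unless_fail() (
    out=$("$@" 2>&1)      # capture stdout and stderr together
    status=$?
    if [ "$status" -ne 0 ]; then
        printf '%s\n' "$out" >&2
    fi
    exit "$status"        # propagate the exit status of the command
)
```

For example, quiet_unless_fail true prints nothing, whereas
quiet_unless_fail sh -c 'echo oops; exit 3' prints oops on stderr and returns 3,
so cron only sends mail for failed runs.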

FTPS

The backup scripts take as arguments one or more backup base paths. The ftp
scripts take as argument a config file which defines the ftp command, e.g. for
ftps:

$ cat config.sh
FTP="ftp-ssl -e -z secure backup.example.net"

In addition, the file ~/.netrc should contain the credentials for the ftp
client:

machine backup.example.net
login juser
password geheim

SFTP

With sftp, i.e. file transfer over ssh, a configuration could look like this:

$ cat config.sh
FTP="lftp -e 'source ~/sftp.lftp'"
$ cat sftp.lftp
open sftp://juser:geheim@backup.example.net

Note that public-key authentication with the OpenSSH sftp client should be
preferred, but in case only keyboard authentication is available (think:
existing legacy appliance), the lftp client makes it easy to supply a password
non-interactively.

Extra Options

Sometimes it is convenient to add extra flags to dar - for example for
excluding certain subtrees from backup. You can do this via setting the
DARFLAGS environment variable. For example:

$ DARFLAGS="-P www/testdir" bash save_first.sh /var

Another use case for DARFLAGS is to enable compression:

$ DARFLAGS="-z" bash save_first.sh /var

Encryption

When pushing backups to a backup server there is always the possibility that it
gets compromised. Even more so when the backup server is not under your control.

Thus, for some use cases it makes sense to encrypt the backups before pushing
them to a backup server.

Fortunately, dar already supports symmetric encryption/decryption via command
line options. One just has to take care that the keys are supplied via a
separate run control file and not as direct command line arguments. Otherwise
they would show up in ps/top/etc. and thus would be observable by other users.

The above example modified to enable encryption:

00  08 *   *   1     cd /srv/backup && DARFLAGS="-B /home/juser/dar.cfg" silence ~/backup-scripts/save_first.sh /etc /home /var
00  08 *   *   2-7   cd /srv/backup && DARFLAGS="-B /home/juser/dar.cfg" silence ~/backup-scripts/save_inc.sh   /etc /home /var

Where the run control file contains the keys, e.g.:

$ cat dar.cfg
all:
-K muchsecret
reference:
-J muchsecret

The first key is used for encrypting the archive, the 2nd one is used for
reading a reference archive when doing an incremental backup.

The constructs all: and reference: are targets defined by Dar's conditional
configuration syntax. Dar versions before 2.4 don't recognize reference. Thus,
with old versions, one has to work around that by maintaining two separate
configuration files, one for the first (full) backup and one for the
subsequent incremental backups. Just specifying both -K and -J under the
global target yields the Dar warning '-J is only useful with -A option, for
the archive of reference' during a full backup.
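
For such old Dar versions, the two-file workaround could look like the
following sketch. The function name and the file names dar_first.cfg and
dar_inc.cfg are made up for this example, and the key is the placeholder used
above:

```shell
# Write two run control files into the directory given as $1.
# The subshell body ( ... ) confines the umask change to this function.
write_dar_cfgs() (
    umask 077                      # files are created readable by the owner only
    dir=$1
    # For save_first.sh: only the key of the new archive.
    printf '%s\n' '-K muchsecret' > "$dir/dar_first.cfg"
    # For save_inc.sh: additionally the key of the archive of reference.
    printf '%s\n' '-K muchsecret' '-J muchsecret' > "$dir/dar_inc.cfg"
)
```

The Monday job would then use DARFLAGS="-B /home/juser/dar_first.cfg" and the
jobs on the other days DARFLAGS="-B /home/juser/dar_inc.cfg".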

Note that (at least with some dar versions) long options are not supported
in the run control file.

Make sure that the run control file is only readable by the backup user.
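
A minimal sketch of how the permissions could be tightened and checked;
secure_cfg is a hypothetical helper and not part of these scripts:

```shell
# Restrict the run control file given as $1 to its owner and show the result.
secure_cfg() {
    chmod 600 "$1" || return 1
    # Print only the mode column of ls -l for a quick visual check.
    ls -l "$1" | cut -c1-10
}
```

When run as the backup user, e.g. via secure_cfg /home/juser/dar.cfg, it
should print -rw-------.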

Contact

I appreciate feedback and comments:

mail@georg.so
gsauthof@sdf.lonestar.org

Restore

Note that when dealing with encrypted archives, one has to add a
-B dar.cfg switch to each dar command, where dar.cfg contains
-K mypassword.

Full restore

# for i in 2011-07-1[1-5]_-home*.dar; do dar -x "${i%.1.dar}" -wa; done

This restores the full backup of /home, beginning with the first full backup
(Monday, 2011-07-11) and then applying the increments from 2011-07-12 through
2011-07-15. It relies on the shell expanding the file name pattern in
lexicographically sorted order (which is the default), so that the archives are
applied oldest first.
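
To check the order and the archive basenames before actually restoring, the
dar commands can be printed first. A small sketch; print_restore_cmds is a
hypothetical helper:

```shell
# Print one dar extract command per archive file name given as argument.
# ${f%.1.dar} strips the slice suffix, since dar expects the archive basename.
print_restore_cmds() {
    for f in "$@"; do
        printf 'dar -x %s -wa\n' "${f%.1.dar}"
    done
}

print_restore_cmds 2011-07-11_-home_inc_0_.1.dar 2011-07-12_-home_inc_1_.1.dar
# dar -x 2011-07-11_-home_inc_0_ -wa
# dar -x 2011-07-12_-home_inc_1_ -wa
```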

Restore a few files

To get a listing of all files:

$ dar -l full_or_incremental_basename

Or via dry run (-e) extract (-x):

$ dar -v -e -x full_or_incremental_basename

A specific file - dry run first (-e):

$ dar -v -e -g some/path/file -x full_or_incremental_basename

Dependencies

File scheme

The archive files created by save_first.sh and save_inc.sh are named like this:

2011-07-11_-home_inc_0_.1.dar
2011-07-11_-var_inc_0_.1.dar
2011-07-12_-etc_inc_1_.1.dar
2011-07-12_-home_inc_1_.1.dar
2011-07-12_-var_inc_1_.1.dar
2011-07-13_-etc_inc_2_.1.dar
2011-07-13_-home_inc_2_.1.dar
2011-07-13_-var_inc_2_.1.dar

Here, the .1.dar suffix (slice number and extension) is added by dar, each '/'
of the base path is encoded as '-', '_' is used as delimiter, and increment 0
is the full backup.
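
The naming scheme can be reproduced with a short function. This is a sketch
derived from the file names listed above, not a copy of the code in
save_first.sh or save_inc.sh; archive_basename is a hypothetical name:

```shell
# Build an archive basename from date ($1), base path ($2) and increment ($3).
archive_basename() (
    day=$1; path=$2; inc=$3
    encoded=$(printf '%s' "$path" | tr '/' '-')   # encode '/' as '-'
    printf '%s_%s_inc_%s_\n' "$day" "$encoded" "$inc"
)

archive_basename 2011-07-12 /home 1
# 2011-07-12_-home_inc_1_
```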

Alternatives

Obnam

An alternative for incremental backups is Obnam. Obnam stands for
'obligatory name'. Despite its awful name it has an impressive
feature list and thus is a candidate to consider. Its manual is
commendable: it clearly describes the well-designed command line interface
and explains different use cases.

Unfortunately, the performance of Obnam is suboptimal. For example, in a
simple test, the initial backup of a typical home directory (60 GB, local
disk) was done at 2.4 MiB/s - where the destination was a USB 2.0 disk
drive (the tested version of Obnam was 1.6.1 on a Fedora 20 system with
lots of RAM/GHz). In that test advanced features like compression and
encryption were even disabled - and since it was the initial backup, no
catalogue comparisons were done.

See also a related stack exchange question for similar reports.

In comparison, other backup programs are write-limited by the maximal write
rate of that USB 2.0 device (~28 MiB/s) - even when features like
compression, encryption and incremental mode are enabled.

Conclusion: Obnam is too slow to be useful.

BTRFS based backup

BTRFS is a copy-on-write file system that supports fast
snapshotting. Thus, using it for incremental backups suggests itself.
Basically, the backup involves simply rsyncing locations to a BTRFS mount,
creating daily/weekly etc. read-only snapshots (which are normal filesystem
locations) and that's it. For encryption, the BTRFS filesystem can be
created on a luks-encrypted device-mapper device.
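
A minimal sketch of one such backup run. It assumes that the destination is a
BTRFS mount which already contains a subvolume named current; this layout, the
function name and the RUN switch are assumptions made for this example. By
default the function only prints the commands:

```shell
# Sync a location into the 'current' subvolume and freeze it as a snapshot.
snapshot_backup() (
    src=$1      # location to back up, e.g. /home
    dest=$2     # BTRFS mount point, e.g. /mnt/backup
    stamp=$3    # snapshot name, e.g. the current date
    # RUN unset: dry run via echo. RUN set to the empty string: execute.
    run=${RUN-echo}
    $run rsync -a --delete "$src/" "$dest/current/" || exit 1
    $run btrfs subvolume snapshot -r "$dest/current" "$dest/$stamp"
)
```

Calling snapshot_backup /home /mnt/backup 2011-07-11 prints the rsync and the
btrfs command; with RUN= prepended to the call they are executed instead, which
usually requires root privileges.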

See also btrarch, which is a collection of scripts to automate
such a backup scheme.

Advantages:

  • speed - especially when doing incremental backups I've observed a
    speedup by a factor of 2 compared to Dar. In my tests I used rsync in
    whole-file-copy mode (which is also the default when syncing between
    local disks), thus, the speedup does not come from a reduced number of
    transferred bytes.
  • easy retrieval - the restore of the last or any previous snapshots
    can be done via simple filesystem commands. No need to restore
    several increments on each other or to construct some kind of
    catalogue.
  • data integrity - since BTRFS checksums all filesystem data, errors
    are detected. The checksums are verified during normal filesystem
    operation - but it is also possible to explicitly verify
    a complete volume (cf. btrfs-scrub(8)).