Author: Marc 'BlackJack' Rintsch
Date: 2014-04-26
Version: 0.4
Copyright: This document has been placed in the public domain.

1   Name -- an archiver.

2   Synopsis [-h|--help|--version] [options] file(s)

3   Description

The program creates compressed tar archives from files and directories. In contrast to the original tar, it builds a list of file names first and sorts it in a way that should give a better compression ratio.

It also makes the common task of archiving one directory a bit easier by providing a short option that infers the file name of the archive from the name of the directory and the compression algorithm. See Examples for details.

3.1   Why sort the names?

Way back in the old DOS days the RAR archiver had, and still has, three advantages over ZIP archives when it comes to compression ratio:

  1. RAR uses a slower but better compression algorithm than the standard deflate algorithm of ZIP archives,
  2. it creates solid archives instead of compressing each file separately, and thus benefits from redundancy between files,
  3. and files are grouped by file name extensions in order to have files with similar contents close to each other and benefit from 2. even more.

With tar archives, 1. is true if bzip2 compression is used, and 2. is always true because a tar archive is created first and then compressed as a whole.

But standard tar programs do not group files by file name extension. That is what this program does.
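
The grouping idea can be sketched in a few lines of Python. This is only an illustration, not the script's actual sort order: the function name extension_key and the choice of (extension, path) as the sort key are assumptions made for this example.

```python
import os


def extension_key(path):
    """Hypothetical sort key: group by extension first, then by full path."""
    name = os.path.basename(path)
    extension = os.path.splitext(name)[1].lower()
    return (extension, path)


names = [
    'src/main.c',
    'docs/readme.txt',
    'src/util.h',
    'docs/notes.txt',
    'src/util.c',
    'Makefile',
]

# Files with the same extension end up next to each other.
for name in sorted(names, key=extension_key):
    print(name)
```

With such a key, files without an extension come first, followed by the .c, .h and .txt files as contiguous groups, so similar content sits close together in the tar stream before it is compressed.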

4   Requirements

The script requires Python 2.7. Optionally the lzma module can be used to create LZMA compressed archives.
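
An optional dependency like this is commonly handled with a guarded import. The following is a minimal sketch of that pattern, not the script's own code; the function name available_algorithms is made up for the example.

```python
try:
    import lzma  # optional: only needed for LZMA compressed archives
except ImportError:
    lzma = None


def available_algorithms():
    """Hypothetical helper: list the algorithms usable on this system."""
    algorithms = ['none', 'gzip', 'bzip2']
    if lzma is not None:
        algorithms.append('lzma')
    return algorithms


print(available_algorithms())
```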

5   Commandline Options

--version show program's version number and exit
-h, --help show this help message and exit
 write archive to this file instead of STDOUT.
-a, --auto-name
 infer archive file name from the first given directory name. This only works if there is just one directory name given as argument. The archive is named: <directory_name>.tar[.<algorithm>]
--list just dump the sorted file names to STDOUT -- don't create archive.
 select compression algorithm from none, gzip, bzip2, or lzma. [bzip2]
-z, --gzip use gzip compression.
-j, --bzip2 use bzip2 compression. [default]
-J, --lzma use LZMA compression.
-b BLOCKS, --blocking-factor=BLOCKS
 BLOCKS x 512 per record [20]
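
To make the effect of the blocking factor concrete, here is a small worked calculation. It assumes the usual tar layout (512 byte blocks, two zero blocks marking the end of the archive) and uses a made-up helper name, padded_archive_size; it does not show how the script itself applies the option.

```python
BLOCK_SIZE = 512  # size of one tar block in bytes


def padded_archive_size(content_bytes, blocking_factor=20):
    """Round the archive size up to a whole number of records."""
    record_size = blocking_factor * BLOCK_SIZE
    records = -(-content_bytes // record_size)  # ceiling division
    return records * record_size


# One 1000 byte file: 512 (header) + 1024 (data, padded to whole blocks)
# + 1024 (two zero blocks that mark the end of the archive).
content = 512 + 1024 + 1024

print(padded_archive_size(content))     # default factor 20 -> 10240
print(padded_archive_size(content, 1))  # factor 1 -> 2560
```

With the default of 20 the uncompressed archive is padded with zero bytes up to the next multiple of 10240 bytes; with a factor of 1 only the blocks required by the format remain.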

6   Examples

Compress the contents of directories and all their subdirectories (arguments of one invocation per line):

  foo/ > foo.tar.bz2
  -o foo_and_bar.tar.bz2 foo/ bar/

Create the archives foo.tar.bz2 and bar.tar.gz with the auto naming option (arguments of one invocation per line):

  -a foo/
  --auto-name --gzip bar/
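
The auto naming rule can be sketched as follows. This is an assumption-laden illustration rather than the script's implementation: the names SUFFIXES and auto_name are hypothetical, and the suffix table only covers the cases this document shows (.bz2 and .gz) plus no suffix for "none". The suffix used for LZMA is not documented here and is therefore left out.

```python
import os

# Suffixes taken from the examples in this document.
SUFFIXES = {'none': '', 'gzip': '.gz', 'bzip2': '.bz2'}


def auto_name(directory, algorithm='bzip2'):
    """Hypothetical helper: <directory_name>.tar[.<suffix>]"""
    # normpath() drops a trailing slash, so 'foo/' becomes 'foo'.
    base = os.path.basename(os.path.normpath(directory))
    return base + '.tar' + SUFFIXES[algorithm]


print(auto_name('foo/'))          # foo.tar.bz2
print(auto_name('bar/', 'gzip'))  # bar.tar.gz
```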

7   History

0.4.0: 2014-04-26

LZMA compression added if the lzma module is available. BZIP2 is still the default compression method.

Requirements bumped to Python 2.7.

0.3.0: 2005-08-26

Added -b/--blocking-factor option. Setting it to 1 prevents some blocks full of zero bytes from being appended to the archive. This may save some bytes, but generally those blocks are compressed very effectively anyway.

The program does not crash anymore if it comes across files that can't be read. A warning is printed instead.

Directories given at the command line are archived now too. Before this fix only the contents of the directory were archived but not the top level directory name itself.

0.2a : 2005-01-23

Fixed a really stupid bug that made it impossible to create archives by redirecting the output into a file.

While compressing, each file name is written to stderr, prefixed with the percentage of files already processed.

The user can select the compression algorithm (none, gzip or bzip2) and the auto naming feature (-a) was implemented.

0.1a : 2005-01-22

Initial release. Can be used to create archives but has a severe bug: it silently ignores problems while creating the file list.

8   ToDo

  • Sort extensions by extension list instead of alphabetically.
  • Group backup files (*{~,.bck,.bak}) with their "master" files and sort just by name without extension.
  • Exclude list/patterns.
  • Option to add a prefix to every file.
  • Change attributes like uid/gid, uname/gname.
  • Color output with ANSI escape sequences if stderr is a tty.

9   Bugs

  • Silently ignores problems while creating the file list.