WARC Tools

This is the WARC Tools project by Hanzo, developed with help from the Internet Archive and supported by the International Internet Preservation Consortium.

Coming Soon

We haven't got around to adding more documentation yet, but we will.

Meta

This wiki uses Creole syntax and is fully compatible with the Creole 1.0 specification.

The wiki itself is a Mercurial (hg) repository. This means you can clone it, edit it locally or offline, add images or any other file type, and push it back to us. Pushed changes go live immediately.

Go ahead and try:

$ hg clone http://bitbucket.org/hanzo/warc-tools/wiki

Wiki pages are normal files with the .wiki extension. You can edit them locally and create new ones.
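Because pages are plain files, ordinary scripts can work with a local clone. The sketch below is a hypothetical helper, not part of warc-tools or Bitbucket: it assumes a page named Foo is stored as Foo.wiki somewhere under the clone directory, and the function names list_pages and create_page are made up for illustration.

```python
import os


def list_pages(root):
    """Return the names of all .wiki pages under root, without the extension.

    Hypothetical helper: assumes every page is a plain file ending in ".wiki".
    """
    pages = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Mercurial's own metadata directory.
        if ".hg" in dirnames:
            dirnames.remove(".hg")
        for filename in filenames:
            if filename.endswith(".wiki"):
                rel = os.path.relpath(os.path.join(dirpath, filename), root)
                pages.append(rel[: -len(".wiki")])
    return sorted(pages)


def create_page(root, name, text):
    """Write a new page as <root>/<name>.wiki and return its path.

    Refuses to overwrite an existing page.
    """
    path = os.path.join(root, name + ".wiki")
    if os.path.exists(path):
        raise ValueError("page already exists: %s" % path)
    with open(path, "w") as handle:
        handle.write(text)
    return path
```

After creating or editing pages, the usual Mercurial workflow applies from inside the clone: hg add for new files, hg commit, then hg push.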

Syntax highlighting

You can also highlight snippets of text; we use the excellent Pygments library for this.

Here's an example of some Python code:

def wiki_rocks(text):
    # Prefix the input with "funky", e.g. wiki_rocks("town") returns "funkytown".
    formatter = lambda t: "funky" + t
    return formatter(text)

You can check out the source of this page to see how that's done. Make sure to bookmark the vast library of Pygments lexers: we accept the 'short name' or the 'mimetype' of any lexer listed there.
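The short-name and mimetype lookups correspond to two functions in Pygments itself. The sketch below shows them from Python; it assumes Pygments is installed (pip install Pygments) and uses HtmlFormatter only as a stand-in, since this page doesn't say how the wiki renderer actually calls Pygments.

```python
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import get_lexer_by_name, get_lexer_for_mimetype

SNIPPET = '''def wiki_rocks(text):
    formatter = lambda t: "funky" + t
    return formatter(text)
'''

# The Python lexer can be found by its short name ...
by_name = get_lexer_by_name("python")
# ... or by one of its mimetypes.
by_mimetype = get_lexer_for_mimetype("text/x-python")

# Render the snippet as HTML; HtmlFormatter wraps the output in a
# <div class="highlight"> element by default.
html = highlight(SNIPPET, by_name, HtmlFormatter())
```

Either lookup returns the same lexer, so a code block tagged with the short name "python" or the mimetype "text/x-python" should highlight identically.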

Have fun!

Installation Guide

  1. Download Hanzo WARC Tools from http://code.hanzoarchives.com/warc-tools/downloads
  2. Install dependencies
    Fedora tip: search for the package names below under Applications -> System Tools -> Add/Remove Software.
    1. Python setuptools (python3-setuptools and python-setuptools)
    2. unittest2 (python-unittest2)
    3. Python 2.6

  3. Extract files
    $ tar -xvf hanzo-warc-tools-f8cd94bebe53.tar.gz
  4. Install
    1. Navigate to the extracted directory
    2. Run setup
      $ ./setup.py build
      $ ./setup.py install
  5. The following commands should now be available at the terminal prompt:
    warcvalid.py
    warcdump.py
    warcfilter.py
    warc2warc.py
    arc2warc.py
    warcindex.py
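To confirm step 5 without trying each command by hand, a short script can look for the scripts on PATH. This is a hypothetical helper, not something shipped with warc-tools; the names find_on_path and missing_commands are made up, and it only checks that an executable file with each name exists on PATH, not that the command runs correctly. It avoids shutil.which so that it also works under Python 2.6.

```python
import os

# The commands listed in step 5 of the installation guide.
WARC_COMMANDS = [
    "warcvalid.py",
    "warcdump.py",
    "warcfilter.py",
    "warc2warc.py",
    "arc2warc.py",
    "warcindex.py",
]


def find_on_path(command, path=None):
    """Return the full path of an executable file named command, or None.

    Searches each directory in path (defaults to $PATH) in order.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_commands(commands=WARC_COMMANDS, path=None):
    """Return the commands that could not be found on the search path."""
    return [c for c in commands if find_on_path(c, path) is None]


if __name__ == "__main__":
    missing = missing_commands()
    if missing:
        print("Not found on PATH: %s" % ", ".join(missing))
    else:
        print("All warc-tools commands are on PATH.")
```

If some commands are reported missing after a successful install, the directory that setup.py installed the scripts into is probably not on your PATH.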
