[TOC]
## WHATS THIS
It will download files for you if you give it RSS-feeds of your
favorite trackers.
How it works:
    RSS -> leech -> files -> directory ---------------+
                                                      |
    downloaded files <- client <- directory watcher <-+
leech is implemented in sh + curl + xsltproc + grep + sed + curl again.
For periodic checks you might want to use cron, e.g.

    # crontab -l
    */30 * * * * CONFIG_DIR=/etc/leech DOWNLOADS_DIR=/mnt/usb/store/schedule /usr/sbin/leech

will run leech every 30 minutes, checking feeds and downloading all matched
files.
## WHY ITS GOOD
* No long-running processes - no long-running memory consumption on
your NAS
* No Python/Perl/PHP/Java/Whatever required
* Still does the job
## USAGE
    CONFIG_DIR="/etc/leech" DOWNLOADS_DIR="/mnt/downloads/schedule" leech
leech will download RSS-feeds specified in ``/etc/leech/foods``,
transform them with xsltproc into text, match against expressions in
``/etc/leech/wild-downloads`` (and ``/etc/leech/downloads``), exclude
files matching ``/etc/leech/reverse-downloads`` and will run cURL to
download everything to ``/mnt/downloads/schedule``.
``DOWNLOADS_DIR`` might be omitted to download files to current directory.
``CONFIG_DIR`` might also be omitted if you're fine with using
``/etc/leech`` by default.
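For example, a minimal configuration could look like this (contents are
illustrative). ``/etc/leech/foods`` with a single feed URL:

    http://example.com/rss.xml

and ``/etc/leech/wild-downloads`` with one match expression per line:

    i386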
You might also want to use ``sbin/leech-match-test`` to test if expressions
in your downloads configuration match filenames you need: [how to].
[how to]: #markdown-header-test-download-filters-configuration
## RECIPES
A recipe is a script that leech calls to download a single file. The
following recipes are currently available:
* ``leech-default`` - default, will just download files to
``$DOWNLOADS_DIR``
* ``leech-transmission`` - Transmission specific recipe, will add URLs
directly to Transmission instead of downloading to disk. Might be used
to work around some issues (e.g. inotify is not compiled into your kernel
and Transmission crashes because of this)
* ``leech-defmission`` - downloads the file with ``leech-default`` (with
cookie authentication support), then adds the downloaded file to
Transmission using ``leech-transmission``
* ``leech-aria2`` - aria2 specific recipe
See ``$CONFIG_DIR/default`` for examples of using recipes.
If you need to write your own recipe, take a look at any of the existing
recipes; they are pretty straightforward.
### cooking with your own recipe
leech will set the following environment variables for each call of a
recipe:
* ``$LEECH_URL`` - file URL extracted from feed
* ``$LEECH_DOWNLOADS_DIR`` - ``$DOWNLOADS_DIR`` originally passed
to leech or set by defaults
* ``$LEECH_TIMEOUT`` - URL processing timeout as set in configuration
* ``$LEECH_FEED_URL`` - feed URL ``$LEECH_URL`` originates from
* ``$LEECH_TARGET_DIR`` - download directory override for external
software (e.g. aria2)
leech makes one recipe call per URL. A recipe should exit 0 on success
and exit with an error code otherwise. The exit code will be printed to
screen as in "Downloading: ... Failed: 9 ()"; stderr (if any) will be
printed in the brackets, as in "Downloading: ... Failed: 1 (can't connect
to host 192.168.1.1:9091)".
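In case you want a starting point, here is a rough sketch of what a recipe
does (written as a shell function for brevity - a real recipe is a
standalone script - and the cURL flags below are an assumption, not
``leech-default``'s actual code):

```shell
#!/bin/sh
# Hypothetical recipe sketch: fetch $LEECH_URL into $LEECH_DOWNLOADS_DIR.
# leech exports LEECH_URL, LEECH_DOWNLOADS_DIR, LEECH_TIMEOUT and
# LEECH_FEED_URL before each call.
my_recipe() {
    cd "${LEECH_DOWNLOADS_DIR:-.}" || return 1
    # a non-zero exit status becomes the "Failed: N" code on screen
    curl -fsS --max-time "${LEECH_TIMEOUT:-60}" -O -- "$LEECH_URL"
}
```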
## KNOWN ISSUES
* OpenWrt's leech only supports Unicode encodings - see TROUBLESHOOTING
for a workaround
* It only supports [RFC822][] dates in RSS - see TROUBLESHOOTING for a
workaround
[RFC822]: http://www.ietf.org/rfc/rfc0822.txt
## HOWTO
### test download filters configuration
``leech-match-test`` can check whether the matching patterns in your
``$CONFIG_DIR`` are suitable for the files you want to download.
Note that this tool **won't** actually download feeds from the Net; it
only tests that the filename you provide on the command line matches the
patterns in ``wild-downloads`` or ``downloads``.
It is this way because the RSS feed might not have the files you want at
that moment - the feed could be truncated to a week or so, but you might
still want to check whether a file would be downloaded if it appeared in
the feed, right?
    leech-match-test "stuff.iso"
Or, if you're linking this filter to the feed from example.com:
    leech-match-test "stuff.iso" "example.com"
### apply filters to specific feed only
You need to put the feed's **host** in front of your filter expression.
For instance:

    example.com i386

in ``$CONFIG_DIR/wild-downloads``, or

    example.com.*i386

in ``$CONFIG_DIR/downloads``.
> **Don't put the entire URL with ampersands (&) and everything else -
this is unsupported and won't work**.
How this works: all filters are matched not only against filenames, but
against the feed URL too. The URL is always on the left, so you need to
put the host on the left side of your filter expression as well,
otherwise it might not work.
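Conceptually (a simplified sketch, not leech's actual code) the matching
looks like this:

```shell
#!/bin/sh
# Sketch: the feed URL sits to the left of the item name in the line
# being matched, so "example.com.*i386" anchors the host correctly.
# Both the line and the pattern here are illustrative.
line="http://example.com/feed.rss some-distro-i386.iso"
if printf '%s\n' "$line" | grep -q "example.com.*i386"; then
    echo "would download"
fi
```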
### match everything in the specific feed
    example.com
This will match anything that has http://example.com attached to it.
### match everything from every feed
Put the following into ``downloads``:

    .*

This is the usual wildcard you could use in the regex-like ``downloads``.
### reverse-matching filters
Put filters for files you want to avoid downloading under
``$CONFIG_DIR/reverse-downloads``. After leech finds the list of files
matching ``$CONFIG_DIR/wild-downloads`` (and ``$CONFIG_DIR/downloads``),
it will also apply the reverse-matching filters to this list and exclude
the unwanted files.
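The effect is roughly that of ``grep -v`` over the matched list (again a
sketch, not leech's actual implementation):

```shell
#!/bin/sh
# Sketch: drop entries matching a reverse-downloads pattern ("x86_64"
# here is an illustrative pattern, not a real configuration entry).
matched="distro-i386.iso
distro-x86_64.iso"
printf '%s\n' "$matched" | grep -v "x86_64"
```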
### HTTP cookie authentication
You might pass the ``$COOKIE`` environment variable to leech:

    COOKIE="pw=mysecret;" /usr/sbin/leech
    COOKIE="/home/alex/cookie.txt" /usr/sbin/leech

Internally leech uses cURL, so the cookie string/file should be compatible
with cURL. cURL understands the ``key=value;`` cookie format and the
Netscape HTTP cookie file format.
cURL man page: http://curl.haxx.se/docs/manpage.html
For cookie authentication in Transmission, set ``$DOWNLOAD_RECIPE``
to ``leech-defmission``, then use the ``$COOKIE`` environment variable as
usual.
### override download directory in aria2 or Transmission
You can set the ``$TARGET_DIR`` variable (since leech 1.6) either in the
environment, e.g.

    TARGET_DIR="/a/b/c/" /usr/sbin/leech

or uncomment it in ``leech/default`` to set it for all leech invocations.
You also need to use a corresponding recipe which would set required
flags to override downloads directory.
Note though that if you set this variable in both the environment and the
config, the config takes priority.
## WHATS INSIDE
* ``config/default`` - main configuration file
* ``config/foods`` - feeds file
* ``config/wild-downloads`` - rules for files downloading (simplified)
* ``config/downloads`` - rules for files downloading (using regular
expressions)
* ``config/reverse-downloads`` - reverse-matching rules for files
downloading (regexes)
* ``sbin/leech`` - main script
* ``sbin/leech-match-test`` - matching tool
* ``sbin/leech-default`` - default download recipe
* ``sbin/leech-transmission`` - Transmission specific recipe
* ``sbin/leech-defmission`` - default meets transmission
* ``sbin/leech-aria2`` - aria2 recipe
* ``sbin/leech-config`` - internal configuration stuff
* ``sbin/leech-wild-magic`` - a little bit of magic
* ``sbin/rfc822tounix`` - RFC 822 to Unix-time conversion utility
* ``share/leech/leech.xsl`` - XSL transformation (preprocessing stuff)
## INSTALL AS USER
It should work out of the box if you have xsltproc, curl and mktemp
installed.
* edit ``config/foods`` and add RSS feeds
* edit ``config/wild-downloads`` and add DL rules
Now you should be able to run ``CONFIG_DIR=config ./sbin/leech`` and see
it downloading feeds and files (if any, to current directory).
* ``crontab -e`` and add cron job as described above, with correct paths
to ``CONFIG_DIR``, ``DOWNLOADS_DIR`` and correct path to main script.
## INSTALL AS SUPERUSER
Check "Downloads" section, there should be package(s) you need. In case
they're not there, please email me about this problem
([aleksey.tulinov@gmail.com][]).
Installation process does everything you need for normal use (except cron
and downloads configuration). If you want to check whether it's running
correctly:
* run ``leech`` by hand - it will show you warnings and force leech to run

Configuration files are under ``/etc/leech``.
* edit ``/etc/leech/foods`` and add RSS feeds
* edit ``/etc/leech/wild-downloads`` and add DL rules
* ``crontab -e`` and add cron job as described above.
* (optional) don't forget to enable cron (if it isn't already):
``/etc/init.d/cron enable`` on OpenWrt
## TROUBLESHOOTING
If you think something is wrong, or just want to make sure everything is
OK, you can always run leech in manual mode and observe its output. See
above for how to do it.
### it works from command line, but doesn't work in cron
You probably didn't put the full path to the script, as in
``/usr/sbin/leech``. Otherwise, you can see what happens by writing a log
file, as in

    */30 * * * * CONFIG_DIR=/etc/leech DOWNLOADS_DIR=/mnt/schedule /usr/sbin/leech >>/mnt/leech.log 2>&1

Note that I've redirected stdout (``>>/mnt/leech.log``) and stderr
(``2>&1``), so each time leech runs it will print all output to
``/mnt/leech.log``.
If you did that and see that leech is complaining about some command not
being found (e.g. ``line 209: xsltproc: not found``) but this program is
definitely installed, take a look at this [SO question][] and answers to
it. It has detailed description of the issue when some program(s) might be
missing from ``$PATH`` when shell script is launched from cron, as well as
several suggestions how to fix this.
[SO question]: https://stackoverflow.com/questions/10129381/crontab-path-and-user
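One common fix is to set ``PATH`` explicitly at the top of the crontab
(the paths below are illustrative - check where your tools actually live,
e.g. with ``which xsltproc``):

    PATH=/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
    */30 * * * * CONFIG_DIR=/etc/leech DOWNLOADS_DIR=/mnt/schedule /usr/sbin/leech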
### leech prints error about parsing of feeds in some outdated encoding, e.g. cp1251
Try to re-encode your feed with a web service like [pipes.yahoo.com][].
It will give you a UTF-8 encoded feed.
You might also e-mail the webmaster of the feed and kindly remind him of
the current date. There is a [link][] to the United States Naval
Observatory Time Service Department to prove that 1995 is over.
[pipes.yahoo.com]: http://pipes.yahoo.com
[link]: http://tycho.usno.navy.mil/simpletime.html
### leech doesnt download URLs with special characters
A URL can't contain arbitrary special characters by design. All characters
outside of a [very specific range][] should be [percent-encoded][], and
apparently someone forgot to do this.
Proper reference for e-mailing the webmaster: [RFC 3986][]
[very specific range]: http://en.wikipedia.org/wiki/Url#List_of_allowed_URL_characters
[percent-encoded]: http://en.wikipedia.org/wiki/Percent_encoding
[RFC 3986]: http://tools.ietf.org/html/rfc3986
### leech prints timestamp parsing error
    WARNING: RSS timestamp (2012-07-17 04:34:08) can't be parsed correctly
RSS requires timestamps to be in [RFC 822][] format. Having dates in
another format means your feed is broken and doesn't follow the standard.
I don't really want to support broken feeds, but leech will still work if
you set the ``$HISTORY`` value in ``default`` to a value greater than the
age of the oldest record in the broken feed.
For instance, if the oldest record in the feed is two weeks old, set
``$HISTORY`` to 15 days.
With ``$HISTORY`` set correctly, leech won't download files twice. You
could also send the webmaster this link to the RSS [specification][].
Hope this helps.
[specification]: http://validator.w3.org/feed/docs/rss2.html#ltpubdategtSubelementOfLtitemgt
[RFC 822]: http://www.ietf.org/rfc/rfc0822.txt
### leech prints "Failed: N" on some (all) downloads
I would print human-readable error messages, but alas, those messages are
nowhere to be found. You can consult the cURL error [codes][] to see what
is happening.
Usually it means that the server is down or there is a typo somewhere in
the feed's URL. One special case is code 33, which seems to be a cURL bug;
it shouldn't appear often, just ignore it.
Code 48 is reported to appear if you have a curl/libcurl version mismatch.
[codes]: http://curl.haxx.se/libcurl/c/libcurl-errors.html
### leech doesnt download magnet links
If you see the following error:

    Downloading: magnet?:... Failed: 6
You are probably using the default (cURL) download recipe. Normal
downloading software (which cURL is) doesn't understand magnet links; you
need a torrent client to download files linked by magnets. Therefore you
either need to ignore magnet download errors, or switch to the
Transmission/aria2 recipe.
The Transmission-specific recipe will add magnet links directly to
Transmission using the ``transmission-remote`` utility; Transmission will
do all the magic needed to magnetize the link and download the files.
## UNDER THE HOOD
The script will create temporary files in the corresponding directory
(``/tmp``, typically):
* ``/tmp/leech.lunch.XXXXXX`` - contains downloaded feed
* ``/tmp/leech.wild.XXXXXX`` - merged dl rules from ``wild-downloads``
and ``downloads``
* ``/tmp/leech.reverse.XXXXXX`` - reverse-matching rules
It will also create ``.leech.db`` with a list of already downloaded files
in ``$PERSISTENCE``, or in ``$DOWNLOADS_DIR`` if ``$PERSISTENCE`` is not
set (the default). To ensure your privacy, this file contains only the
MD5 sums of downloaded URLs and the times the downloads happened. The
database is periodically cleaned up; old (no longer needed) records are
deleted.
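A record in that database is conceptually just this (a sketch - the exact
on-disk format is leech's internal business):

```shell
#!/bin/sh
# Sketch: only a hash of the URL plus a timestamp gets stored,
# never the URL itself ("url" below is an illustrative value).
url="http://example.com/file.torrent"
hash=$(printf '%s' "$url" | md5sum | cut -d' ' -f1)
echo "$hash $(date +%s)"
```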
Files matching the ``config/wild-downloads`` and ``config/downloads``
rules go directly to ``$DOWNLOADS_DIR``. In case of incomplete file
retrieval, cURL will resume the download.
## QUESTIONS?
[aleksey.tulinov@gmail.com][]
[aleksey.tulinov@gmail.com]: mailto:aleksey.tulinov@gmail.com