[TOC]

## WHAT'S THIS

It will download files for you if you give it RSS feeds of your favorite trackers. How it works:

    RSS -> leech -> files -> directory ---------------+
                                                      |
    downloaded files <- client <- directory watcher <-+

leech is implemented in sh + curl + xsltproc + grep + sed + curl again. For periodic checks you might want to use cron, e.g.:

    # crontab -l
    */30 * * * * CONFIG_DIR=/etc/leech DOWNLOADS_DIR=/mnt/usb/store/schedule /usr/sbin/leech

will run leech every 30 minutes, checking feeds and downloading all matched files.

## WHY IT'S GOOD

* No long-running processes - no long-running memory consumption on your NAS
* No Python/Perl/PHP/Java/whatever required
* Still does the job

## USAGE

    CONFIG_DIR="/etc/leech" DOWNLOADS_DIR="/mnt/downloads/schedule" leech

leech will download the RSS feeds listed in ``/etc/leech/foods``, transform them into text with xsltproc, match the results against expressions in ``/etc/leech/wild-downloads`` (and ``/etc/leech/downloads``), exclude files matching ``/etc/leech/reverse-downloads``, and run cURL to download everything to ``/mnt/downloads/schedule``.

``DOWNLOADS_DIR`` may be omitted to download files to the current directory. ``CONFIG_DIR`` may also be omitted if you're fine with the default ``/etc/leech``.

You might also want to use ``sbin/leech-match-test`` to test whether the expressions in your downloads configuration match the filenames you need: [how to].

[how to]: #markdown-header-test-download-filters-configuration

## RECIPES

A recipe is a script leech calls to download a single file. The following recipes are currently available:

* ``leech-default`` - the default; just downloads files to ``$DOWNLOADS_DIR``
* ``leech-transmission`` - Transmission-specific recipe; adds URLs directly to Transmission instead of downloading to disk. Can be used to work around some issues (e.g. inotify is not compiled into your kernel and Transmission crashes because of this)
* ``leech-defmission`` - downloads the file with ``leech-default`` (with cookie authentication support), then adds the downloaded file to Transmission using ``leech-transmission``
* ``leech-aria2`` - aria2-specific recipe

See ``$CONFIG_DIR/default`` for examples of using recipes. If you need to write your own recipe, take a look at any of the existing ones - they are pretty straightforward.

### cooking with your own recipe

leech will set the following environment variables for each call of a recipe:

* ``$LEECH_URL`` - file URL extracted from the feed
* ``$LEECH_DOWNLOADS_DIR`` - ``$DOWNLOADS_DIR`` as originally passed to leech or set by defaults
* ``$LEECH_TIMEOUT`` - URL processing timeout as set in configuration
* ``$LEECH_FEED_URL`` - feed URL that ``$LEECH_URL`` originates from
* ``$LEECH_TARGET_DIR`` - download directory override for external software (e.g. aria2)

There is one recipe call for each URL. A recipe should exit 0 on success and exit with an error code otherwise. The exit code will be printed to screen, as in "Downloading: ... Failed: 9 ()". stderr (if any) will be printed in brackets, as in "Downloading: ... Failed: 1 (can't connect to host 192.168.1.1:9091)".

## KNOWN ISSUES

* OpenWrt's leech only supports Unicode encodings - see TROUBLESHOOTING for a workaround
* It only supports [RFC822][] dates in RSS - see TROUBLESHOOTING for a workaround

[RFC822]: http://www.ietf.org/rfc/rfc0822.txt

## HOWTO

### test download filters configuration

``leech-match-test`` can check whether the matching patterns in your ``$CONFIG_DIR`` are suitable for the files you want to download. Note that this tool **won't** actually download feeds from the Net; it only tests that the filename you provided on the command line matches the patterns in ``wild-downloads`` or ``downloads``.
It is this way because the RSS feed might not contain the files you want at that moment - a feed could be truncated to a week or so, but you might still want to check whether a file would be downloaded if it appeared in the feed, right?

    leech-match-test "stuff.iso"

Or, if you're linking this filter to the feed from example.com:

    leech-match-test "stuff.iso" "example.com"

### apply filters to specific feed only

You need to put the feed's **host** in front of your filter expression. For instance:

    example.com i386

in ``$CONFIG_DIR/wild-downloads``, or

    example.com.*i386

in ``$CONFIG_DIR/downloads``.

> **Don't put the entire URL with ampersands (&) and everything else - this is unsupported and won't work**.

How this works: all filters are matched not only against filenames, but against the feed URL too. The URL is always on the left, so you need to put the host on the same side of your filter expression, otherwise it might not work.

### match everything in the specific feed

    example.com

This will match anything that has http://example.com attached to it.

### match everything from every feed

Put the following into ``downloads``:

    .*

This is the usual wildcard you could use in the regex-like ``downloads``.

### reverse-matching filters

Put filters for files you want to avoid downloading under ``$CONFIG_DIR/reverse-downloads``. After leech finds the list of files matching ``$CONFIG_DIR/wild-downloads`` (and ``$CONFIG_DIR/downloads``), it will also apply reverse-matching filters to this list and exclude unwanted files.

### HTTP cookie authentication

You can pass the ``$COOKIE`` environment variable to leech:

    COOKIE="pw=mysecret;" /usr/sbin/leech
    COOKIE="/home/alex/cookie.txt" /usr/sbin/leech

Internally leech uses cURL, so the cookie string/file should be compatible with cURL. cURL understands the ``key=value;`` cookie format and the Netscape HTTP Cookie File format.
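If your tracker needs more than a simple ``key=value;`` string, a Netscape-format cookie file works too. A minimal sketch of creating one (the host, cookie name and value are placeholders - substitute your tracker's):

```shell
# Netscape HTTP Cookie File: one cookie per line, 7 tab-separated fields:
# domain, include-subdomains flag, path, secure flag, expiry (Unix time), name, value
printf '# Netscape HTTP Cookie File\n' > cookie.txt
printf 'example.com\tFALSE\t/\tFALSE\t2147483647\tpw\tmysecret\n' >> cookie.txt

# Then point leech at the file (commented out - requires leech installed):
# COOKIE="$PWD/cookie.txt" /usr/sbin/leech
```

cURL decides between the two forms automatically: if the ``$COOKIE`` value names an existing file, it is read as a cookie file, otherwise it is treated as a cookie string.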
cURL man page: http://curl.haxx.se/docs/manpage.html

For cookie authentication in Transmission, set ``$DOWNLOAD_RECIPE`` to ``leech-defmission``, then use the ``$COOKIE`` environment variable as usual.

### override download directory in aria2 or Transmission

You can set the ``$TARGET_DIR`` variable (in leech 1.6) either in the environment, e.g.

    TARGET_DIR="/a/b/c/" /usr/sbin/leech

or uncomment it in ``leech/default`` to set it for all leech invocations. You also need to use a corresponding recipe which will set the flags required to override the downloads directory. Note that if you set this variable both in the environment and in the config, the config takes priority.

## WHAT'S INSIDE

* ``config/default`` - main configuration file
* ``config/foods`` - feeds file
* ``config/wild-downloads`` - rules for file downloading (simplified)
* ``config/downloads`` - rules for file downloading (using regular expressions)
* ``config/reverse-downloads`` - reverse-matching rules for file downloading (regexes)
* ``sbin/leech`` - main script
* ``sbin/leech-match-test`` - matching tool
* ``sbin/leech-default`` - default download recipe
* ``sbin/leech-transmission`` - Transmission-specific recipe
* ``sbin/leech-defmission`` - default meets Transmission
* ``sbin/leech-aria2`` - aria2 recipe
* ``sbin/leech-config`` - internal configuration stuff
* ``sbin/leech-wild-magic`` - a little bit of magic
* ``sbin/rfc822tounix`` - RFC 822 to Unix-time conversion utility
* ``share/leech/leech.xsl`` - XSL transformation (preprocessing stuff)

## INSTALL AS USER

It should work out of the box if you have xsltproc, curl and mktemp installed.

* edit ``config/foods`` and add RSS feeds
* edit ``config/wild-downloads`` and add download rules

Now you should be able to run ``CONFIG_DIR=config ./sbin/leech`` and see it downloading feeds and files (if any, to the current directory).

* ``crontab -e`` and add a cron job as described above, with correct paths to ``CONFIG_DIR``, ``DOWNLOADS_DIR`` and the main script.
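The user-level setup can be condensed into a few commands run from the source tree; the feed URL and the ``iso`` rule are just illustrative placeholders:

```shell
# Prepare a local configuration directory next to the source tree.
mkdir -p config

# One feed URL per line in "foods" (placeholder URL - use your tracker's feed).
echo 'http://example.com/rss' >> config/foods

# One simplified matching rule per line in "wild-downloads".
echo 'iso' >> config/wild-downloads

# Manual run; downloaded files land in the current directory.
# Once this works, move the invocation into crontab as shown earlier.
# CONFIG_DIR=config ./sbin/leech
```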
## INSTALL AS SUPERUSER

Check the "Downloads" section - the package(s) you need should be there. In case they're not, please email me about this problem ([aleksey.tulinov@gmail.com][]).

The installation process does everything you need for normal use (except cron and downloads configuration). If you want to check whether it's running correctly:

* ``leech`` will show you a warning and force leech to run

Configuration files are under ``/etc/leech``.

* edit ``/etc/leech/foods`` and add RSS feeds
* edit ``/etc/leech/wild-downloads`` and add download rules
* ``crontab -e`` and add a cron job as described above
* (optional) don't forget to enable cron (if it's not enabled): ``/etc/init.d/cron enable`` for OpenWrt

## TROUBLESHOOTING

If you think something is wrong, or just want to make sure everything is OK, you can always run leech in manual mode and observe its output. See above for how to do it.

### it works from command line, but doesn't work in cron

You probably didn't put the full path to the script, as in ``/usr/sbin/leech``. Otherwise, you can see what happens by writing a log file, as in

    */30 * * * * CONFIG_DIR=/etc/leech DOWNLOADS_DIR=/mnt/schedule /usr/sbin/leech >>/mnt/leech.log 2>&1

Note that I've redirected stdout (``>>/mnt/leech.log``) and stderr (``2>&1``), so each time leech runs it will print all its output to ``/mnt/leech.log``.

If you did that and see that leech is complaining about some command not being found (e.g. ``line 209: xsltproc: not found``) although the program is definitely installed, take a look at this [SO question][] and the answers to it. It has a detailed description of the issue where some program(s) might be missing from ``$PATH`` when a shell script is launched from cron, as well as several suggestions on how to fix it.

[SO question]: https://stackoverflow.com/questions/10129381/crontab-path-and-user

### leech prints error about parsing of feeds in some outdated encoding, e.g. cp1251

Try to re-encode your feed with a web service like [pipes.yahoo.com][]. The latter will give you a UTF-8 encoded feed. You might also e-mail the webmaster of the feed and kindly remind him of the current date. There is a [link][] to the United States Naval Observatory Time Service Department to prove that 1995 is over.

[pipes.yahoo.com]: http://pipes.yahoo.com
[link]: http://tycho.usno.navy.mil/simpletime.html

### leech doesn't download URLs with special characters

A URL can't contain any special characters by design. All characters outside of a [very specific range][] should be [percent-encoded][], and apparently someone forgot to do this. A proper reference for e-mailing the webmaster: [RFC 3986][].

[very specific range]: http://en.wikipedia.org/wiki/Url#List_of_allowed_URL_characters
[percent-encoded]: http://en.wikipedia.org/wiki/Percent_encoding
[RFC 3986]: http://tools.ietf.org/html/rfc3986

### leech prints timestamp parsing error

    WARNING: RSS timestamp (2012-07-17 04:34:08) can't be parsed correctly

RSS requires timestamps to be in [RFC 822][] format. Having dates in another format means your feed is broken and doesn't follow the standard. I don't really want to support broken feeds, but leech will still work if you set the ``$HISTORY`` value in ``default`` to a value greater than the age of the oldest record in the broken feed. For instance, if the oldest record in the feed is two weeks old, set ``$HISTORY`` to 15 days. With ``$HISTORY`` set correctly, leech won't download files twice. You could also send the webmaster this link to the RSS [specification][]. Hope this helps.

[specification]: http://validator.w3.org/feed/docs/rss2.html#ltpubdategtSubelementOfLtitemgt
[RFC 822]: http://www.ietf.org/rfc/rfc0822.txt

### leech prints "Failed: N" on some (all) downloads

I would print human-readable error messages, but alas those messages are nowhere to be found. You can consult the cURL error [codes][] to see what is happening. Usually it means that the server is down or there is a typo somewhere in the feed's URL.
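As a quick reference, the "N" leech prints is cURL's exit code; a small (hypothetical, not shipped with leech) helper like this covers the values you'll see most often, with meanings taken from the libcurl error list linked above:

```shell
# Hypothetical helper: translate the most common cURL exit codes
# (the "N" in leech's "Failed: N") into human-readable hints.
explain_curl_code() {
  case "$1" in
    6)  echo "couldn't resolve host - typo in the feed URL?" ;;
    7)  echo "failed to connect - server down?" ;;
    28) echo "operation timed out" ;;
    *)  echo "unknown code $1 - see the libcurl error list" ;;
  esac
}

explain_curl_code 6   # prints: couldn't resolve host - typo in the feed URL?
```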
One special case is code 33, which seems to be a cURL bug; it shouldn't appear often, just ignore it. Code 48 is reported to appear if you have a curl/libcurl version mismatch.

[codes]: http://curl.haxx.se/libcurl/c/libcurl-errors.html

### leech doesn't download magnet links

If you see the following error:

    Downloading: magnet?:... Failed: 6

you are probably using the default (cURL) download recipe. Normal downloading software (which cURL is) doesn't understand magnet links; you need a torrent client to download files linked by magnets. Therefore you either need to ignore magnet download errors, or switch to the Transmission/aria2 recipe. The Transmission-specific recipe will add magnet links directly to Transmission using the ``transmission-remote`` utility, and Transmission will do all the magic needed to magnetize the link and download the files.

## UNDER THE HOOD

The script will create temporary files in the corresponding directory (``/tmp`` apparently):

* ``/tmp/leech.lunch.XXXXXX`` - contains a downloaded feed
* ``/tmp/leech.wild.XXXXXX`` - merged download rules from ``wild-downloads`` and ``downloads``
* ``/tmp/leech.reverse.XXXXXX`` - reverse-matching rules

It will also create ``.leech.db`` with the list of already downloaded files, in ``$PERSISTENCE``, or in ``$DOWNLOADS_DIR`` if ``$PERSISTENCE`` is not set (the default). To ensure your privacy, this file contains only the MD5 sum of downloaded URLs and the time when the download happened. The database is periodically cleaned: old (no longer needed) records are deleted.

Files matching ``config/wild-downloads`` and ``config/downloads`` rules go directly to ``$DOWNLOADS_DIR``. In case of incomplete file retrieval, cURL will resume the download.

## QUESTIONS?

[aleksey.tulinov@gmail.com][]

[aleksey.tulinov@gmail.com]: mailto:aleksey.tulinov@gmail.com