Overview

This project fetches images from imageboards using Scrapy.

Current

  • danbooru.donmai.us
  • gelbooru.com
  • konachan.com

Planned

  • idol.sankakucomplex.com
  • behoimi.org
  • chan.sankakucomplex.com
  • e621.net
  • genso.ws
  • hijiribe.donmai.us
  • ichijou.org
  • nekobooru.net
  • r34.booru.org
  • safebooru.org
  • sonohara.donmai.us
  • tentaclerape.net
  • yande.re
  • zerochan.net

Install

sudo pip install scrapy

Or install it into a virtualenv:

virtualenv .env --system-site-packages
.env/bin/pip install scrapy --upgrade

Install Tor and Privoxy

On Debian/Ubuntu:

sudo apt-get install tor privoxy

Configure Tor.

Uncomment or add this line in /etc/tor/torrc:

ControlPort 9051

Configure Privoxy.

Uncomment or add this line in /etc/privoxy/config:

forward-socks5   /               127.0.0.1:9050 .

Restart services:

sudo service tor restart
sudo service privoxy restart
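With both services running, Scrapy requests can be routed through Privoxy, which forwards them to Tor. A minimal sketch of a downloader middleware that does this follows. It assumes Privoxy listens on its default address, 127.0.0.1:8118; the class name and the PRIVOXY_URL constant are hypothetical and are not taken from this project.

```python
# Hypothetical downloader middleware: sends every request through Privoxy.
# Assumption: Privoxy listens on its default address, 127.0.0.1:8118.
PRIVOXY_URL = 'http://127.0.0.1:8118'


class PrivoxyProxyMiddleware(object):
    """Set the proxy for each outgoing request unless one is already set."""

    def process_request(self, request, spider):
        # Scrapy uses the proxy URL found in request.meta['proxy'].
        request.meta.setdefault('proxy', PRIVOXY_URL)
        # Returning None lets Scrapy continue processing the request.
        return None
```

A class like this would be enabled through the DOWNLOADER_MIDDLEWARES setting.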

Configuration

In the file fetchersettings.py, set these variables:

HTTPCACHE_DIR = 'path/to/http_cache'
IMAGES_STORE = 'path/to/images_store'
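Absolute paths avoid any ambiguity about where a relative path is resolved. As a sketch, both values could be derived from a single base directory; the DATA_DIR name and its location under the home directory are assumptions made for illustration only.

```python
import os

# Assumption: keep all fetched data under one base directory (hypothetical).
DATA_DIR = os.path.join(os.path.expanduser('~'), 'imageboard_data')

# Absolute paths, so the result does not depend on the working directory.
HTTPCACHE_DIR = os.path.join(DATA_DIR, 'http_cache')
IMAGES_STORE = os.path.join(DATA_DIR, 'images_store')
```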

Usage

Show available spiders:

scrapy list

Grab all pictures from danbooru:

scrapy crawl danbooru

Grab pictures by tag:

scrapy crawl danbooru -a tag='shimakaze_(kantai_collection)'

Grab pictures and save the images to another folder:

scrapy crawl danbooru -s IMAGES_STORE='new/path'
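The -a and -s options can be combined in a single invocation. The sketch below builds such a command line in Python, which may be handy when starting crawls from a script; the function name build_crawl_command and its parameters are hypothetical helpers, not part of this project.

```python
def build_crawl_command(spider, tag=None, images_store=None):
    """Return the argument list for a `scrapy crawl` invocation."""
    command = ['scrapy', 'crawl', spider]
    if tag is not None:
        # -a passes an argument to the spider.
        command += ['-a', 'tag=' + tag]
    if images_store is not None:
        # -s overrides a setting for this run only.
        command += ['-s', 'IMAGES_STORE=' + images_store]
    return command
```

The returned list can be passed to subprocess.check_call; because no shell is involved, tags containing parentheses need no extra quoting.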