scrapy-warc

A stand-alone spider for Scrapy that saves the downloaded content in WARC archives.

Usage:

./scrapywarc.py seeds-file output-dir
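
For readers unfamiliar with the output format, the following is a minimal sketch of what a single WARC/1.0 "response" record looks like on disk. It is not taken from scrapywarc.py; the function names build_warc_record and append_record are hypothetical, and the sketch uses only the Python standard library rather than whatever this spider uses internally.

```python
import gzip
import uuid
from datetime import datetime, timezone


def build_warc_record(target_uri: str, http_payload: bytes) -> bytes:
    """Return one WARC/1.0 'response' record as raw bytes (hypothetical helper)."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {now}",
        f"WARC-Target-URI: {target_uri}",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(http_payload)}",
    ]
    # Header lines are CRLF-terminated; a blank line separates them from the block.
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # Every record ends with two CRLF pairs.
    return head + http_payload + b"\r\n\r\n"


def append_record(path: str, record: bytes) -> None:
    """Append the record to a .warc.gz file as its own gzip member (hypothetical helper)."""
    with gzip.open(path, "ab") as archive:
        archive.write(record)
```

Writing each record as a separate gzip member is the usual convention for .warc.gz files, since it lets tools seek to an individual record without decompressing the whole archive.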