1. Juan Manuel Caicedo Carvajal
  2. scrapy-warc

scrapy-warc

A stand-alone spider for Scrapy that saves the downloaded pages in WARC archives.

Usage:

./scrapywarc.py seeds-file output-dir
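
For readers unfamiliar with the WARC format, the following is a minimal sketch of what writing one archived response can look like. It is not the code used by scrapywarc.py: the function names `build_warc_record` and `append_record` are hypothetical, only the Python standard library is used, and the record is assumed to be a WARC/1.0 `response` record wrapping the raw HTTP response bytes.

```python
import gzip
import uuid
from datetime import datetime, timezone


def build_warc_record(url: str, http_response: bytes) -> bytes:
    """Return one WARC/1.0 'response' record for a raw HTTP response.

    Hypothetical helper for illustration; not part of scrapy-warc.
    """
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {now}",
        f"WARC-Target-URI: {url}",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(http_response)}",
    ]
    # The header block is terminated by an empty line (CRLF CRLF).
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # Each record ends with two CRLF pairs after the content block.
    return head + http_response + b"\r\n\r\n"


def append_record(path: str, record: bytes) -> None:
    """Append the record to `path` as its own gzip member."""
    with gzip.open(path, "ab") as f:
        f.write(record)
```

Compressing each record as a separate gzip member is a common layout for `.warc.gz` files, since it lets a reader seek to a single record without decompressing the whole archive. Whether this spider compresses its output that way is not stated here.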