WebCrawl is a website spider I wrote many years ago after being frustrated by httrack, which is supposed to be the best. Httrack took far too much CPU time in GUI mode, didn't give good enough feedback in console mode, and didn't provide enough control overall. So I wrote a web crawler backend and a console frontend for it. It is much easier to use than httrack, and it has a number of features that httrack doesn't, such as rewriting URLs and page content on the fly with regular expressions or arbitrary processing code, which enables a few interesting and important scenarios. I use it to mirror sites for offline reading, mainly to make sure I still have a copy even if the site gets taken down.
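
To give a rough idea of what rewriting URLs on the fly with regular expressions can look like, here is a minimal sketch in Python. It is not WebCrawl's actual code or API; the rule list, the function names (rewrite_url, rewrite_links), and the example.com URLs are all assumptions made up for illustration.

```python
import re

# Hypothetical rewrite rules, applied in order: (compiled pattern, replacement).
REWRITE_RULES = [
    (re.compile(r"^http://"), "https://"),   # force https
    (re.compile(r"/index\.html$"), "/"),     # collapse a trailing index.html
]

# Matches href="..." or src='...' and captures the prefix, quote, and URL.
LINK_ATTR = re.compile(r'''((?:href|src)\s*=\s*)(["'])(.*?)\2''', re.IGNORECASE)


def rewrite_url(url, rules=REWRITE_RULES):
    """Apply each rewrite rule to the URL in turn and return the result."""
    for pattern, replacement in rules:
        url = pattern.sub(replacement, url)
    return url


def rewrite_links(html, rules=REWRITE_RULES):
    """Rewrite the URL inside every quoted href/src attribute in a page."""
    def replace(match):
        prefix, quote, url = match.group(1), match.group(2), match.group(3)
        return prefix + quote + rewrite_url(url, rules) + quote
    return LINK_ATTR.sub(replace, html)


if __name__ == "__main__":
    print(rewrite_url("http://example.com/docs/index.html"))
    print(rewrite_links('<a href="http://example.com/index.html">home</a>'))
```

A real crawler would probably use an HTML parser rather than a regular expression to find links, but the idea is the same: every URL and every page passes through user-supplied rules before it is fetched or saved.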

Maybe httrack has improved since I wrote this. I don't know.