1. Frederic De Groef
  2. csxj-crawler



Uses Python 2.6.

Third-party dependencies

  • scrapy's HtmlXPathSelector : because any BeautifulSoup-based app is a half-assed implementation of XPath anyway.
  • BeautifulSoup : quickly navigate data in HTML pages (legacy; will probably be replaced by scrapy at some point)
  • chardet : useful to fight encoding issues
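
As a rough illustration of the kind of encoding problem chardet helps with, here is a minimal sketch that is not taken from the crawler's code. The helper name `to_unicode` and its `detect` and `fallback` parameters are hypothetical. It assumes only that `chardet.detect()` returns a dict with an `encoding` key, which may be `None` when no guess can be made.

```python
def to_unicode(raw_bytes, detect=None, fallback="utf-8"):
    """Decode raw page bytes using a guessed encoding.

    `detect` is any callable with the same return shape as chardet.detect;
    it is a parameter (an assumption of this sketch) so the helper can be
    exercised without chardet installed.
    """
    if detect is None:
        import chardet  # third-party, imported lazily
        detect = chardet.detect
    guess = detect(raw_bytes)
    # chardet may report no encoding at all; fall back to a default then.
    encoding = guess.get("encoding") or fallback
    # "replace" keeps the crawl going when a few bytes cannot be decoded.
    return raw_bytes.decode(encoding, "replace")
```

Calling `to_unicode(page_bytes)` with no other arguments uses chardet's own guess. Passing a stub detector is only meant for testing.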


License: still thinking about it. Until then, this is not public domain and I retain full copyright.