csxj-crawler / README.rst

Uses Python 2.6.

Third-party dependencies

  • scrapy's HtmlXPathSelector : because any BeautifulSoup-based app is a half-assed implementation of XPath anyway.
  • BeautifulSoup : quickly navigate data in HTML pages (legacy; will probably be replaced by scrapy at some point)
  • chardet : useful for fighting encoding issues
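
As a rough illustration of the kind of encoding problem chardet helps with, here is a minimal sketch that guesses the encoding of a raw page before decoding it. The names `decode_page` and `fallback` are hypothetical and are not taken from the crawler's code; the only chardet call used is `chardet.detect`, and the sketch assumes that falling back to UTF-8 is acceptable when no guess is available.

```python
# Hypothetical helper, not part of csxj-crawler: decode raw page bytes
# using chardet's best guess, with a fixed fallback encoding.
try:
    import chardet
except ImportError:
    # Assumption for this sketch: without chardet, always use the fallback.
    chardet = None


def decode_page(raw, fallback="utf-8"):
    """Return `raw` (a byte string) decoded to unicode text."""
    encoding = None
    if chardet is not None:
        # detect() returns a dict with 'encoding' and 'confidence' keys;
        # 'encoding' is None when chardet cannot make a guess.
        encoding = chardet.detect(raw)["encoding"]
    try:
        # Undecodable bytes are replaced rather than raising an error.
        return raw.decode(encoding or fallback, "replace")
    except LookupError:
        # chardet named an encoding this Python build does not know.
        return raw.decode(fallback, "replace")
```

Calling `decode_page(raw_html)` on the bytes of a downloaded page gives text that can then be handed to the HTML parser. Detection is a statistical guess, so short or mixed-encoding inputs may still be decoded incorrectly.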

Licence

Still thinking about it. Until then, this is not public domain and I retain full copyright.