Jacob Perkins / scrapy

Commits

Author Commit Message Date Builds
Jacob Perkins
LogVisisted spider middleware and IgnoreVisisted downloader middleware
Daniel Graña
url_query_cleaner: do not append ? if query is empty
Daniel Graña
url_query_cleaner: add exclude and non-unique parameters support, also remove untested exception catching code and add missing tests
Daniel Graña
Automated merge with ssh://hg.scrapy.org/scrapy-0.8
Steven Almeroth
FormRequest.from_response doc fix. closes #155
Pablo Hoffman
Automated merge with http://hg.scrapy.org/scrapy-0.8/
Pablo Hoffman
added note about installing Zope.Interface in windows platforms
Daniel Graña
Automated merge with ssh://hg.scrapy.org/scrapy-0.8
Daniel Graña
Remove Sphinx warning introduced by shorter title overline
Lucian Ursu
#154: Language fixes to the documentation
Pablo Hoffman
Added scrapy.utils.py26.json to use the python2.6 json module when available, otherwise fall back to the simplejson module or scrapy.xlib.simplejson. This way we can always assume json is available and avoid conditional code.
Pablo Hoffman
fixed import
Pablo Hoffman
bugfix for python < 2.6
Pablo Hoffman
moved copytree() function from utils.python to utils.py26
Pablo Hoffman
added scrapy.service and scrapy.tac for running from twistd
Daniel Graña
images: avoid signing images based on spider name or request hostname, use request.meta instead
Daniel Graña
update ENCODING_ALIASES setting default value in settings documentation topic
Daniel Graña
Minimize the effect of http://bugs.python.org/issue8271 on TextResponses by replacing the str.decode errors policy with a custom `replace`-like error handler
Pablo Hoffman
added a couple additional TwistedPluginSpiderManager tests
daniel
add missing dropin.cache file required by default spidermanager tests
Daniel Graña
gb2312 and gbk encodings were superseded by gb18030
Pablo Hoffman
made Spider name required again (do not default)
Daniel Graña
SEP-012: bugfix backward compatibility of Spider.domain_name and Spider.extra_domain_names
Pablo Hoffman
use a default name for spiders constructed without names
Pablo Hoffman
Added support for passing generic arguments to spider constructors (refs #152), extended Spider tests, added unittests for TwistedPluginSpiderManager
Pablo Hoffman
Automated merge with http://hg.scrapy.org/scrapy-0.8
Pablo Hoffman
added missing default values to file exporter doc
Rolando Espinoza
SEP12 implementation
  * Rename BaseSpider.domain_name to BaseSpider.name. This patch implements the domain_name to name change in the BaseSpider class and changes all spider instantiations to use the new attribute.
  * Add allowed_domains to spider. This patch implements the merging of spider.domain_name and spider.extra_domain_names into spider.allowed_domains for offsite checking purposes. Note t…
Rolando Espinoza
cleanup and refactor of parse & fetch commands
  * removed scrapy.utils.fetch
  * each command schedules requests and starts the scrapy engine
  * fetch command instantiates BaseSpider if the given url does not match any spider or matches more than one
  * parse command schedules the url if exactly one spider matches
  * parse and fetch don't support multiple urls as parameters
  * force spider behavior: --spider moved from BaseCommand to only the commands fetch, parse, crawl
Rolando Espinoza
spidermanager refactoring * Implements find/create method in Spider Manager API, removed fromdomain and fromurl This method is now in charge of spider resolution, it must return spider object from its argument or raise KeyError if no spider is found. This method obsoletes from_domain and from_url methods. The default implementation of resolve only searches against spider.name, it won't use spider.a…
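
The json fallback described in the scrapy.utils.py26.json commit above can be sketched roughly as follows. This is only an illustration of the import-fallback pattern, not the code from that commit: the helper name import_first_available is hypothetical, and the candidate module list is taken from the commit message.

```python
import importlib


def import_first_available(*module_names):
    """Return the first module in module_names that can be imported.

    Hypothetical helper for illustration; not part of scrapy.
    """
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # This candidate is not installed; try the next one.
            continue
    raise ImportError(
        "none of these modules could be imported: %s" % ", ".join(module_names)
    )


# Candidate order follows the commit message: stdlib json first,
# then simplejson, then the copy bundled under scrapy.xlib.
json = import_first_available("json", "simplejson", "scrapy.xlib.simplejson")
```

Callers can then use the bound name json without checking which implementation was picked, which is the "avoid conditional code" goal the commit message mentions.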