Commits

Frederic De Groef
updated version
Tag: v0.4.99-20120311-dev
Frederic De Groef
updated frontpage scraper for lavenir.net
Frederic De Groef
updated version
Frederic De Groef
better stats about last update. Commented out the deprecated functions.
Frederic De Groef
using all the frontpage scrapers
Frederic De Groef
added frontpage scraper for 7sur7
Frederic De Groef
updated readme and version
Frederic De Groef
added frontpage items extractor for levif.be
Frederic De Groef
updated url classification unittest
Frederic De Groef
return empty list for blogposts
Frederic De Groef
added frontpage items extractor for rtbfinfo
Frederic De Groef
extracted locale setup to utils, should be used everywhere.
Frederic De Groef
updated imports
Frederic De Groef
process new errors
Frederic De Groef
new date extraction
Frederic De Groef
safeguard: don't get article data from the database if no days are available
Frederic De Groef
bumped version
Frederic De Groef
fixed text cleanup in dhnet, so we keep paragraphs
Frederic De Groef
return blogpost list
Frederic De Groef
article retrieval for sudinfo temporarily deactivated
Frederic De Groef
started sudinfo revamping using scrapy
Frederic De Groef
Don't classify an empty url. Added a unittest.
Frederic De Groef
new frontpage scraper
Frederic De Groef
reorganized imports
Frederic De Groef
started sudinfo from the remnants of sudpresse. Moving to scrapy.
Frederic De Groef
removed sudpresse from crawling system
Frederic De Groef
added l'avenir into the crawler system
Frederic De Groef
photoset detection, embedded object (frame) detection, better handling of links
Frederic De Groef
handle pages with photoalbums
Frederic De Groef
scraper for www.lavenir.net: first version, testing needed.