csxj-crawler / csxj / datasources / lavenir.py

Commit history, newest first (author: commit message):

Frederic De Groef: [lavenir] updated date parser to match dates with no specified time
Frederic De Groef: [lavenir] added detection of 'ghost links'
Frederic De Groef: removed useless print
Frederic De Groef: Fixed the article/blogpost url classifier. (tag: v0.4.99-20120626-dev)
Frederic De Groef: removed useless print
Frederic De Groef: lists not sets, dammit
Frederic De Groef: updated frontage parsing for lavenir.net, to reflect cms changes
Frederic De Groef: fixed get_frontpage_toc()
Frederic De Groef: lavenir.net back in the download queue
Frederic De Groef: updated frontage scrapper for lavenir.net
Frederic De Groef: new date extraction
Frederic De Groef: return blogpost list
Frederic De Groef: reorganized imports
Frederic De Groef: photosets detection, embedded objects (frames) detection, better handling of links
Frederic De Groef: handle pages with photoalbums
Frederic De Groef: scrapper for www.lavenir.net: first version, testing needed.
Frederic De Groef: db-wide funcs