Commits

Author Commit Message Labels Comments Date
Frederic De Groef
merge
Frederic De Groef
[La Libre] recursively sanitize text content, rebuilds complete urls instead of yielding just the path
Frederic De Groef
change locale only on unix-ish systems
Frederic De Groef
[la libre] data extraction should be done
Frederic De Groef
moved word_count() to utils module
Frederic De Groef
[la libre] fetch news item links on the frontpage
Frederic De Groef
bitbucket rst parser looks borken, again
Frederic De Groef
bitbucket rst parser looks borken
Frederic De Groef
removed todo.rst
Frederic De Groef
everything in readme, in fact
Frederic De Groef
added todo file
Frederic De Groef
[le soir] removed useless deps
Frederic De Groef
added readme file
Frederic De Groef
[le soir] extracting intro paragraph from story header. Filtering out the random useless <span>
Frederic De Groef
[le soir] extract author name from header
Frederic De Groef
improved report
Frederic De Groef
extract and sanitize the text content from an article
Frederic De Groef
remove the __repr__() because of weird encoding issues
Frederic De Groef
temporarily deactivated the htmlentities conversion
Frederic De Groef
using real datetime objects. Disabled article text retrieval for the moment
Frederic De Groef
reflecting changes in sample data datastruct
Frederic De Groef
cleaned up the report in sample usage
Frederic De Groef
clarified the separation between 'main story' and 'other story'
Frederic De Groef
fetch data for articles listed on the frontpage. Filter out internal blogs
Frederic De Groef
cosmetic changes
Frederic De Groef
detect two_columns stories in 'Le Soir' frontpage
Frederic De Groef
added sample data with missing links containers
Frederic De Groef
check if we can actually find the links containers before digging in
Frederic De Groef
get article data from rss feed items
Frederic De Groef
hgignore