Author Commit Message Labels Comments Date
Juliette De Maeyer
updated the extract_text_content_and_links function
Juliette De Maeyer
found a new example of an article with an embedded tweet that is not detected
Juliette De Maeyer
fixed the extract_links_from_sidebar function to actually extract links ("lire aussi" links were missing, tags were OK)
Juliette De Maeyer
added 7s7 parser and made it work (somewhat) properly
Frederic De Groef
[dhnet] enhanced embedded media detection (esp. for scripts)
Frederic De Groef
reprocess entire database, not just .5%
Frederic De Groef
bumped version
Tags: v0.4.99-20121124-dev
Frederic De Groef
[dhnet] be more defensive for embedded media detection. handles twitter widgets
Frederic De Groef
added helper module to extract info from live embedded twitter widgets
Frederic De Groef
whole database reprocess: added error logging
Frederic De Groef
reorganised sample data
Frederic De Groef
bumped version
Tags: v0.4.99-20121021-dev
Frederic De Groef
added a utility to reprocess errors loaded from an error file
Frederic De Groef
added a utility to reprocess all the raw html into a new database
Frederic De Groef
[sudinfo] extract_article_data() supports url and file-like objects
Frederic De Groef
[lalibre] extract_article_data() supports url and file-like objects
Frederic De Groef
accessor to the reprocessed date/time for a batch
Frederic De Groef
better output
Frederic De Groef
save ALL the errors
Frederic De Groef
double checking the written error count
Frederic De Groef
save listed errors in a file instead of reprocessing on the fly
Frederic De Groef
added a reprocessing attempt (without saving) when listing errors
Frederic De Groef
[lavenir] updated date parser to match dates with no specified time
Frederic De Groef
don't assume we have all types of errors in the list
Frederic De Groef
don't print the date if there are no errors
Frederic De Groef
updated error listing tool
Frederic De Groef
[sudinfo] detect all urls in paragraphs. Filter out potential blogposts on the frontpage.
Frederic De Groef
[sudinfo] sometimes, embedded docs don't have a title. Go figure
Frederic De Groef
added ghost link detection. just because
Frederic De Groef
bumped version
Tags: v0.4.99-20121002-dev