Commits

Author Commit Message Labels Comments Date
Juliette De Maeyer
[lesoir_new] added embedded media extraction for kewego player (11h02)
Juliette De Maeyer
[7sur7] tagged url naming conventions: kept the "sidebar box" info in tagging article tags (let's tag more tags!)
Juliette De Maeyer
[lesoir_new] created new parser for recent lesoir (new CMS launched in October)
Juliette De Maeyer
[lesoir] added kewego video extraction in the function that extracts embedded media
Juliette De Maeyer
[lesoir] fixed some typos
Juliette De Maeyer
[lesoir] added 'sidebar box' tag to links extracted from the sidebar
Juliette De Maeyer
[lesoir] added in-text link extraction
Juliette De Maeyer
[lesoir] created function to extract embedded media: generic iframe + storify
Frederic De Groef
fixed incorrect import path
Frederic De Groef
use source url for title when extracting info from embedded iframes
Frederic De Groef
[dhnet] extracted embedded media detectors to ipm_utils.py, to reuse with lalibre (which uses the same cms)
Frederic De Groef
Merge
Juliette De Maeyer
allegedly fixed all errors
Frederic De Groef
Merge
Frederic De Groef
[lalibre] pep8 stuff
Juliette De Maeyer
Merge
Juliette De Maeyer
[septsursept] added the function that parses twitter widgets (twitter_utils)
Frederic De Groef
Merge
Frederic De Groef
[hgignore] ignore merge tmp files
Frederic De Groef
[lalibre] reorganised sample data func
Frederic De Groef
[sudinfo] minor rename
Frederic De Groef
[lavenir] renamed function for sample data testing
Juliette De Maeyer
deleted useless comment
Juliette De Maeyer
Merge
Juliette De Maeyer
fixed first error from reprocessing: url in embedded video (kewego) player
Juliette De Maeyer
Merge
Juliette De Maeyer
end of work session
Juliette De Maeyer
removed the tag "video" from the embedded media extraction: it did not tag all videos, and it's an iframe with a YouTube link anyway
Juliette De Maeyer
improved the extraction of embedded media to detect tweets, twitter widgets and some videos
Juliette De Maeyer
embedded media detection: finds the two boxes (art_aside, bottom_box) where there could be embedded media