Encoding problems with DLibScraper
Issue #25
resolved
When scraping from D-Lib Magazine, the scraper should decode HTML entities. For example, the author field should not contain the undecoded M&ouml;nnich, Michael but instead Mönnich, Michael.
Please implement the decoding using StringEscapeUtils.unescapeHtml(). To see how other scrapers do it, open the call hierarchy for that method.
Also add a JUnit test for the URL http://www.dlib.org/dlib/may08/monnich/05monnich.html.
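To make the expected behaviour concrete, here is a minimal, dependency-free sketch of the decoding step. It is not the scraper's actual code: the class `EntityDecodingSketch` and its small entity table are hypothetical stand-ins, written only so the example runs without Apache Commons Lang on the classpath. In the real scraper, the call would be `StringEscapeUtils.unescapeHtml()` as requested above, which covers the full HTML entity set.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for StringEscapeUtils.unescapeHtml(); illustration only.
class EntityDecodingSketch {

    // Tiny, illustrative subset of named entities (assumption: enough for this example).
    private static final Map<String, String> NAMED = Map.of(
            "ouml", "\u00F6", "auml", "\u00E4", "uuml", "\u00FC",
            "amp", "&", "lt", "<", "gt", ">", "quot", "\"");

    // Matches decimal (&#246;), hexadecimal (&#xF6;) and named (&ouml;) references.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[0-9]{1,7}|#[xX][0-9a-fA-F]{1,6}|[A-Za-z][A-Za-z0-9]*);");

    static String unescapeHtml(String s) {
        if (s == null) {
            return null;
        }
        Matcher m = ENTITY.matcher(s);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String body = m.group(1);
            String replacement;
            if (body.startsWith("#x") || body.startsWith("#X")) {
                replacement = fromCodePoint(body.substring(2), 16, m.group());
            } else if (body.startsWith("#")) {
                replacement = fromCodePoint(body.substring(1), 10, m.group());
            } else {
                // Unknown names are left untouched rather than dropped.
                replacement = NAMED.getOrDefault(body, m.group());
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // The digit count is capped by the regex, so parseInt cannot overflow here.
    private static String fromCodePoint(String digits, int radix, String raw) {
        int cp = Integer.parseInt(digits, radix);
        return Character.isValidCodePoint(cp) ? new String(Character.toChars(cp)) : raw;
    }

    public static void main(String[] args) {
        // Author string in the undecoded form described in this issue.
        System.out.println(unescapeHtml("M&ouml;nnich, Michael"));
    }
}
```

A JUnit test for the URL above would presumably assert the same thing against the scraped author field, namely that it contains the decoded name and no remaining `&...;` sequences.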
Comments (4)
- reporter: edited description
- reporter: The scraper seems to be fixed now. Did you also add a JUnit test?
- Account Deleted: Issue #25 is fixed. JUnit test is added.
- Account Deleted changed status to resolved: JUnit test is added.