#25 - Encoding problems with DLibScraper

Issue #25 resolved

Robert Jäschke created an issue 2014-01-15

When scraping from D-Lib Magazine, the scraper should decode HTML entities. E.g., the author field should not contain the entity-encoded form M&ouml;nnich, Michael but instead the decoded form Mönnich, Michael.

Please implement the decoding using StringEscapeUtils.unescapeHtml(). To see how other scrapers do it, open the call hierarchy for that method.
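For readers without the code base at hand, here is a minimal sketch of what the requested decoding step does. It is a stdlib-only stand-in, not the scraper's actual code: the class name EntityDecodeSketch, its decode() method, and the small entity table are assumptions made for illustration. In the scraper itself the same step would presumably be a single call to StringEscapeUtils.unescapeHtml() from Apache Commons Lang, which covers the full HTML entity set.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for StringEscapeUtils.unescapeHtml(); illustration only. */
public class EntityDecodeSketch {

    // Deliberately tiny table; the real library knows all named HTML entities.
    private static final Map<String, String> NAMED = Map.of(
            "amp", "&", "lt", "<", "gt", ">", "quot", "\"",
            "auml", "\u00E4", "ouml", "\u00F6", "uuml", "\u00FC", "szlig", "\u00DF");

    // Matches named (&ouml;), decimal (&#246;) and hexadecimal (&#xF6;) references.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);");

    public static String decode(String input) {
        if (input == null) {
            return null;
        }
        Matcher m = ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String body = m.group(1);
            String replacement = m.group(); // default: leave the reference untouched
            try {
                if (body.startsWith("#x") || body.startsWith("#X")) {
                    int codePoint = Integer.parseInt(body.substring(2), 16);
                    replacement = new String(Character.toChars(codePoint));
                } else if (body.startsWith("#")) {
                    int codePoint = Integer.parseInt(body.substring(1));
                    replacement = new String(Character.toChars(codePoint));
                } else {
                    replacement = NAMED.getOrDefault(body, m.group());
                }
            } catch (IllegalArgumentException e) {
                // Out-of-range or invalid code point: keep the original text.
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints the author name with the entity decoded.
        System.out.println(decode("M&ouml;nnich, Michael"));
    }
}
```

Unknown entity names are left as they are in this sketch, so a missing table entry does not corrupt the field. In the scraper, the decoding would be applied to the extracted author (and similar) fields before the BibTeX is assembled.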

Also add a JUnit test for the URL http://www.dlib.org/dlib/may08/monnich/05monnich.html.

Comments (4)

Robert Jäschke reporter
- edited description
- 2014-01-15T10:49:26+00:00
Robert Jäschke reporter
The scraper seems to be fixed now. Did you also add a JUnit test?
- 2014-02-07T14:34:46+00:00
Former user Account Deleted
Issue #25 is fixed. A JUnit test has been added.
- 2014-02-12T11:24:43+00:00
Former user Account Deleted
- changed status to resolved
JUnit test is added.
- 2014-02-12T12:31:03+00:00

Assignee: –

Type: bug

Priority: minor

Status: resolved

Component: scraper

Milestone: –

Version: 2.0.42

Votes: 0

Watchers: 0
