- changed status to open
scraping issues for computer.org
Scraping the URL http://www.computer.org/csdl/mags/co/2001/02/r2026.pdf results in a weird error message that shows the content of the page instead of the URL.
Please try to find out why and how this happens.
Comments (6)
-
reporter -
I tried to reproduce this in a JUnit test. It seems that either the website doesn't export valid BibTeX or our scraper must be improved. Error: scraped BibTex not valid
-
reporter There's the IEEEComputerSocietyScraper that should already handle this page. Why does it not work?
-
The getDownloadURL method in the scraper replaces -.* with -reference.bib. That causes a problem when we want to scrape data from the URL http://www.computer.org/csdl/mags/co/2001/02/r2026.pdf, because it does not contain a "-" suffix, so the pattern never matches. We need to extend getDownloadURL to handle URLs that have the suffix .pdf.
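A minimal sketch of what such an extension could look like, not the actual scraper code. The class name, the regular expression, and the example URL ending in -abs.html are hypothetical, and it assumes that the BibTeX export for a .pdf URL lives at the same path with -reference.bib in place of .pdf:

```java
import java.util.regex.Pattern;

public class DownloadUrlSketch {

    // Matches either a trailing "-something" suffix or a trailing ".pdf"
    // in the last path segment (hypothetical pattern, not the original one).
    private static final Pattern SUFFIX = Pattern.compile("(-[^/]*|\\.pdf)$");

    // Builds the assumed BibTeX export URL by replacing the matched suffix.
    // URLs that match neither alternative are returned unchanged.
    public static String getDownloadURL(String url) {
        return SUFFIX.matcher(url).replaceFirst("-reference.bib");
    }

    public static void main(String[] args) {
        // URL from this issue (no "-" suffix, ends in .pdf)
        System.out.println(getDownloadURL(
                "http://www.computer.org/csdl/mags/co/2001/02/r2026.pdf"));
        // Hypothetical URL with a "-" suffix, as handled before
        System.out.println(getDownloadURL(
                "http://www.computer.org/csdl/mags/co/2001/02/r2026-abs.html"));
    }
}
```

Under these assumptions both calls yield http://www.computer.org/csdl/mags/co/2001/02/r2026-reference.bib. Anchoring the pattern at the end of the URL keeps a hyphen in an earlier path segment from being replaced.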
-
reporter Thanks. Please see my comments at the corresponding commit.
-
- changed status to resolved
Resolved with new test data for the URL with the suffix .pdf.