scraping issues for computer.org

Issue #2472 resolved
Robert Jäschke created an issue

Scraping the URL http://www.computer.org/csdl/mags/co/2001/02/r2026.pdf results in a weird error message that shows the content of the page instead of the URL.

Please try to find out why and how this happens.

Comments (6)

  1. Mohammed Abed

    I tried to reproduce this in a JUnit test. It seems that either the website does not export valid BibTeX or our scraper needs to be improved. The error is: scraped BibTex not valid

  2. Robert Jäschke reporter

    There's the IEEEComputerSocietyScraper that should already handle this page. Why does it not work?

  3. Mohammed Abed

    The getDownloadURL method in the scraper replaces the suffix -.* with -reference.bib. This causes a problem when we scrape the URL http://www.computer.org/csdl/mags/co/2001/02/r2026.pdf, because that URL has no - suffix. So we need to extend getDownloadURL to handle URLs that end in .pdf.

    https://bitbucket.org/bibsonomy/bibsonomy/commits/1b4ca543fce5db205e3bd868d51c087d05bb075c?at=bibsonomy-scraper#chg-bibsonomy-scraper/src/main/java/org/bibsonomy/scraper/url/kde/ieee/IEEEComputerSocietyScraper.java
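The change described in comment 3 could look roughly like the following. This is only a minimal sketch, not the code from the linked commit: the class name DownloadUrlSketch, the method name toBibtexUrl, and both regular expressions are assumptions made for illustration. In particular, the pattern here restricts the -.* match to the last path segment, and the -abs.html URL in the usage note is a hypothetical example of a URL with a - suffix. Whether computer.org actually serves BibTeX at the resulting address is not verified here.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not the real IEEEComputerSocietyScraper code.
final class DownloadUrlSketch {

    // Suffix of the BibTeX export URL, as named in comment 3.
    private static final String BIBTEX_SUFFIX = "-reference.bib";

    // Approximation of the existing "-.*" replacement, limited to the
    // last path segment (assumption).
    private static final Pattern DASH_SUFFIX = Pattern.compile("-[^/]*$");

    // New case: URLs that end in ".pdf" and carry no "-" suffix.
    private static final Pattern PDF_SUFFIX =
            Pattern.compile("\\.pdf$", Pattern.CASE_INSENSITIVE);

    private DownloadUrlSketch() {
    }

    /**
     * Builds the BibTeX download URL for a page URL, or returns null
     * if the URL has neither a "-" suffix nor a ".pdf" suffix.
     */
    static String toBibtexUrl(final String url) {
        if (url == null) {
            return null;
        }
        // Existing behaviour: replace the "-..." suffix.
        if (DASH_SUFFIX.matcher(url).find()) {
            return DASH_SUFFIX.matcher(url).replaceFirst(BIBTEX_SUFFIX);
        }
        // Extension: replace a trailing ".pdf".
        if (PDF_SUFFIX.matcher(url).find()) {
            return PDF_SUFFIX.matcher(url).replaceFirst(BIBTEX_SUFFIX);
        }
        return null;
    }
}
```

With this sketch, toBibtexUrl("http://www.computer.org/csdl/mags/co/2001/02/r2026.pdf") and toBibtexUrl("http://www.computer.org/csdl/mags/co/2001/02/r2026-abs.html") both yield http://www.computer.org/csdl/mags/co/2001/02/r2026-reference.bib, while a URL with neither suffix yields null so the caller can report that the page is not supported.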
