Scrapers grabbing PDFs

Issue #2456 open
Sebastian Böttger created an issue

Extend scraper to get PDFs automatically.

Comments (6)

  1. Sebastian Böttger reporter
    • changed status to open

    There are two ways to implement a PDF-grabbing scraper:

    • Let a server-side Java scraper grab the PDF (not possible due to legal issues).
    • Therefore, grab the PDF on the client side (in the browser) using JavaScript, then send the received PDF to the server via AJAX.
  2. Robert Jäschke

    They work nicely for JabRef, since JabRef runs on the user's computer. However, in BibSonomy, JabRef would need to run on the server (since it's Java). This would not be possible for legal reasons.

  3. Oliver Kopp

    I know that browser code should be TypeScript or JavaScript. The full-text fetchers do not have hundreds of lines of code. For instance, the ACS fetcher issues a single GET request to return the URL of the full-text PDF. See https://github.com/JabRef/jabref/blob/c6aa7dac3c76cbbdd5142cb43e084ea32e89ec47/src/main/java/net/sf/jabref/logic/importer/fetcher/ACS.java

    I assume that the resulting two AJAX requests can simply be made on the browser side of BibSonomy, too.
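
A rough sketch of how this could look in the browser, not a tested implementation. It assumes the publisher's landing page exposes the PDF location in a `citation_pdf_url` meta tag with `name` before `content`, which will not hold for every site. The names `extractPdfUrl`, `grabAndUploadPdf`, `FetchFn`, and the upload URL are hypothetical and not part of BibSonomy or JabRef. Two requests fetch the PDF, and a third sends it to the server as described in the first comment.

```typescript
// Minimal shapes so the sketch does not depend on DOM typings.
interface RequestOptions {
  method?: string;
  headers?: Record<string, string>;
  body?: ArrayBuffer;
}
interface MinimalResponse {
  ok: boolean;
  text(): Promise<string>;
  arrayBuffer(): Promise<ArrayBuffer>;
}
type FetchFn = (url: string, init?: RequestOptions) => Promise<MinimalResponse>;

// Assumption: the landing page carries <meta name="citation_pdf_url" content="...">.
function extractPdfUrl(html: string): string | null {
  const match = html.match(
    /<meta\s+name=["']citation_pdf_url["']\s+content=["']([^"']+)["']/i
  );
  return match?.[1] ?? null;
}

// Hypothetical flow: locate the PDF, download it, then pass it on to the server.
async function grabAndUploadPdf(
  landingUrl: string,
  uploadUrl: string, // placeholder endpoint, not a real BibSonomy API
  fetchFn: FetchFn
): Promise<boolean> {
  // Request 1: fetch the landing page and look for the PDF URL.
  const page = await fetchFn(landingUrl);
  if (!page.ok) return false;
  const pdfUrl = extractPdfUrl(await page.text());
  if (pdfUrl === null) return false;

  // Request 2: download the PDF itself.
  const pdf = await fetchFn(pdfUrl);
  if (!pdf.ok) return false;
  const bytes = await pdf.arrayBuffer();

  // Request 3: send the received PDF to the server.
  const upload = await fetchFn(uploadUrl, {
    method: "POST",
    headers: { "Content-Type": "application/pdf" },
    body: bytes,
  });
  return upload.ok;
}
```

In a page, `fetchFn` could be a thin wrapper around the browser's `fetch`. Requests to a publisher's domain are subject to the browser's same-origin policy, so this would presumably only work where the publisher allows cross-origin requests or where the code runs with extended permissions, for example in a browser extension.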
