Scrapers grabbing PDFs

Daniel Zoller

changed status to wontfix

legal issues :(

2015-03-11T12:58:52+00:00

Sebastian Böttger reporter

changed status to open

There are two possibilities for implementing a PDF-grabbing scraper:

1. Let the server-side Java scraper grab the PDF (not possible due to legal issues).
2. Instead, grab the PDF on the client side (in the browser) using JavaScript, and afterwards send the received PDF to the server via AJAX.
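A minimal sketch of the client-side variant in TypeScript, under stated assumptions: the upload endpoint and the form field names (`file`, `sourceUrl`) are hypothetical and not part of BibSonomy. The fetch function is passed in as a parameter so it can be swapped or mocked. Note that a browser script can only read a cross-origin response if the publisher's server permits it via CORS headers, which this sketch does not address.

```typescript
// Hypothetical sketch: the browser downloads the PDF, then forwards it to the
// server. Endpoint path and form field names are assumptions, not BibSonomy's API.
interface MinimalResponse {
  ok: boolean;
  status: number;
  blob(): Promise<Blob>;
}

type FetchFn = (
  url: string,
  init?: { method?: string; body?: FormData },
) => Promise<MinimalResponse>;

async function grabAndUploadPdf(
  pdfUrl: string,
  uploadEndpoint: string, // hypothetical server endpoint
  fetchFn: FetchFn = (url, init) => fetch(url, init),
): Promise<number> {
  // Step 1: fetch the PDF inside the user's browser.
  const pdfResponse = await fetchFn(pdfUrl);
  if (!pdfResponse.ok) {
    throw new Error(`PDF download failed with HTTP ${pdfResponse.status}`);
  }
  const pdfBlob = await pdfResponse.blob();

  // Step 2: send the received bytes to the server as multipart form data.
  const form = new FormData();
  form.append("file", pdfBlob, "fulltext.pdf"); // hypothetical field name
  form.append("sourceUrl", pdfUrl); // hypothetical field name

  const uploadResponse = await fetchFn(uploadEndpoint, { method: "POST", body: form });
  if (!uploadResponse.ok) {
    throw new Error(`Upload failed with HTTP ${uploadResponse.status}`);
  }
  return uploadResponse.status;
}
```

The two steps map directly onto the two halves of the sentence above: one request to the publisher, one AJAX request to the server.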

2015-03-11T15:16:18+00:00

Robert Jäschke

marked as task

2015-06-10T11:43:53+00:00

Oliver Kopp

JabRef has FullTextFetchers. See https://github.com/JabRef/jabref/blob/c6aa7dac3c76cbbdd5142cb43e084ea32e89ec47/src/main/java/net/sf/jabref/logic/importer/FulltextFetcher.java Maybe they can be included in BibSonomy, too.
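For illustration, the idea behind such fetchers could be carried over to browser-side TypeScript roughly as follows. This is a hypothetical sketch, not a port of JabRef's `FulltextFetcher` interface: the names `FulltextFinder`, `EntryMetadata` and `findFirstFullText` are invented here. Each source is asked in turn, and the first PDF URL found wins.

```typescript
// Hypothetical names; not JabRef's or BibSonomy's actual API.
interface EntryMetadata {
  doi?: string;
  title?: string;
}

interface FulltextFinder {
  name: string;
  // Resolves to a full-text PDF URL, or undefined if this source has none.
  findFullText(entry: EntryMetadata): Promise<string | undefined>;
}

// Ask each finder in order and return the first URL found, if any.
async function findFirstFullText(
  entry: EntryMetadata,
  finders: FulltextFinder[],
): Promise<{ finder: string; url: string } | undefined> {
  for (const finder of finders) {
    try {
      const url = await finder.findFullText(entry);
      if (url !== undefined) {
        return { finder: finder.name, url };
      }
    } catch {
      // A failing source (e.g. a network error) should not stop the others.
    }
  }
  return undefined;
}
```

Keeping each source behind the same small interface means new publishers can be added without touching the lookup loop.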

2016-07-27T05:48:26+00:00

Robert Jäschke

They work nicely for JabRef, since JabRef runs on the user's computer. In BibSonomy, however, the JabRef code would need to run on the server (since it is Java), which is not possible for legal reasons.

2016-07-27T07:04:45+00:00

Oliver Kopp

I know that browser code should be TypeScript or JavaScript. However, the full-text fetchers do not have hundreds of lines of code. For instance, the ACS fetcher issues a single GET request to obtain the URL of the full-text PDF. See https://github.com/JabRef/jabref/blob/c6aa7dac3c76cbbdd5142cb43e084ea32e89ec47/src/main/java/net/sf/jabref/logic/importer/fetcher/ACS.java

I assume that the resulting two AJAX requests can simply be made on the browser side of BibSonomy, too.
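One reading of the two requests: first fetch the publisher's landing page and extract the PDF link from it, then download the PDF from that link. The sketch below covers the first step. The landing-page URL builder and the link heuristic (an `href` ending in `.pdf` or containing `/pdf/`) are assumptions for illustration, not the actual logic of JabRef's ACS fetcher.

```typescript
// Hypothetical heuristic: return the first href that looks like a PDF link,
// resolved against the landing page URL.
function extractPdfLink(html: string, baseUrl: string): string | undefined {
  const hrefPattern = /href\s*=\s*["']([^"']+)["']/gi;
  let match: RegExpExecArray | null;
  while ((match = hrefPattern.exec(html)) !== null) {
    const href = match[1];
    if (/\.pdf($|\?)/i.test(href) || href.includes("/pdf/")) {
      return new URL(href, baseUrl).toString(); // resolve relative links
    }
  }
  return undefined;
}

type TextFetch = (url: string) => Promise<{ ok: boolean; text(): Promise<string> }>;

// Request 1: load the landing page for a DOI and extract the PDF URL.
async function resolvePdfUrl(
  doi: string,
  landingPageFor: (doi: string) => string, // hypothetical URL builder
  fetchFn: TextFetch = (url) => fetch(url),
): Promise<string | undefined> {
  const landingUrl = landingPageFor(doi);
  const response = await fetchFn(landingUrl);
  if (!response.ok) return undefined;
  return extractPdfLink(await response.text(), landingUrl);
}
```

A second `fetch` of the returned URL would then retrieve the PDF itself. A regular expression is used instead of an HTML parser only to keep the sketch dependency-free; in a browser, `DOMParser` would be the more robust choice.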

2016-07-27T08:08:39+00:00
