When I upload a PDF, the bibliographic metadata should be extracted so that I can post it.

Issue #2836 open
Robert Jäschke created an issue

What we can extract

There are two types of bibliographic metadata:

  1. The metadata of the article in the PDF itself (e.g., title and authors; it is probably difficult to extract much more).
  2. The bibliographic metadata of the referenced articles.

How we extract it

We use Grobid via the grobid-pdf branch, where the Grobid class implements the getBibTeX method.
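
To make the expected output concrete, here is a minimal sketch of how the two types of metadata could be held and rendered as BibTeX. It does not call Grobid and is not the code from the grobid-pdf branch: the class names ExtractedEntry and ExtractionResult, the entry keys, and the choice of the @misc entry type are all assumptions made for illustration, and the real getBibTeX signature may differ.

```java
import java.util.List;

/** Hypothetical holder for one extracted entry (not a Grobid or BibSonomy class). */
final class ExtractedEntry {
    final String key;
    final String title;
    final List<String> authors;

    ExtractedEntry(String key, String title, List<String> authors) {
        this.key = key;
        this.title = title;
        this.authors = authors;
    }

    /** Renders title and authors as a minimal BibTeX record; @misc is an assumption. */
    String toBibTeX() {
        StringBuilder sb = new StringBuilder();
        sb.append("@misc{").append(key).append(",\n");
        sb.append("  title = {").append(title).append("},\n");
        // BibTeX separates multiple authors with the keyword "and".
        sb.append("  author = {").append(String.join(" and ", authors)).append("}\n");
        sb.append("}");
        return sb.toString();
    }
}

/** Hypothetical result of one extraction: the article itself plus its references. */
final class ExtractionResult {
    final ExtractedEntry article;
    final List<ExtractedEntry> references;

    ExtractionResult(ExtractedEntry article, List<ExtractedEntry> references) {
        this.article = article;
        this.references = references;
    }

    /** Concatenates the article and all references, separated by blank lines. */
    String toBibTeX() {
        StringBuilder sb = new StringBuilder(article.toBibTeX());
        for (ExtractedEntry reference : references) {
            sb.append("\n\n").append(reference.toBibTeX());
        }
        return sb.toString();
    }
}
```

With such a structure, the article entry could be offered for posting directly, while the references could be handed to a batch view for review.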


All tasks are collected in the milestone PDFextraction.

Comments (2)

  1. Robert Jäschke reporter

    I propose the following implementation order:

    1. issue #2841: Extend PostPublicationController to support PDF extraction
    2. issue #2842: proper error handling
    3. issue #2837: PDF upload dialog for PDF extraction
    4. issue #2838: Improve model conversion from Grobid to BibSonomy
    5. issue #2839: title + author extraction from PDFs using Grobid
    6. issue #2843: Extend batch edit view for Grobid PDF extraction
    7. issue #2840: Enrich extracted metadata using metadata from BibSonomy database

    After the first step, extraction should basically work and could be released as a beta. Each subsequent step would improve the handling.
