BibSonomy #2836 - When I upload a PDF, the bibliographic metadata shall be extracted such that I can post it.

Issue #2836 open

Robert Jäschke created an issue 2018-08-16

What we can extract

There are two types of bibliographic metadata:

1. The metadata of the article in the PDF itself (e.g., title and authors; extracting much more than that is probably difficult).
2. The bibliographic metadata of the referenced articles.

How we extract it

We use Grobid via the grobid-pdf branch, in which the Grobid class implements the getBibTeX method.
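The issue does not show the Grobid class from the grobid-pdf branch, so the following is only a minimal sketch of how the two types of metadata could be requested from a running GROBID service, not the BibSonomy implementation. It assumes GROBID's REST endpoints `/api/processHeaderDocument` (metadata of the article itself) and `/api/processReferences` (metadata of the referenced articles), with BibTeX requested through the `Accept: application/x-bibtex` header instead of the default TEI XML. The class `GrobidRequests`, its `Target` enum, the `upload.pdf` file name and the base URL `http://localhost:8070` (GROBID's default port) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical helper; NOT the Grobid class from BibSonomy's grobid-pdf branch.
final class GrobidRequests {

    // The two types of metadata from the issue, mapped to (assumed) GROBID REST endpoints.
    enum Target {
        HEADER("/api/processHeaderDocument"), // title, authors, ... of the uploaded article
        REFERENCES("/api/processReferences"); // entries of the article's bibliography

        final String path;

        Target(String path) {
            this.path = path;
        }
    }

    // Builds a POST that uploads the PDF as the "input" form field and asks for BibTeX.
    static HttpRequest build(URI grobidBase, Target target, byte[] pdf) {
        String boundary = "grobid-" + UUID.randomUUID();
        return HttpRequest.newBuilder(grobidBase.resolve(target.path))
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .header("Accept", "application/x-bibtex") // otherwise GROBID answers with TEI XML
                .POST(HttpRequest.BodyPublishers.ofByteArray(multipartBody(boundary, pdf)))
                .build();
    }

    // Minimal multipart/form-data body containing a single file part.
    static byte[] multipartBody(String boundary, byte[] pdf) {
        String head = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"input\"; filename=\"upload.pdf\"\r\n"
                + "Content-Type: application/pdf\r\n\r\n";
        String tail = "\r\n--" + boundary + "--\r\n";
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.writeBytes(head.getBytes(StandardCharsets.UTF_8));
        out.writeBytes(pdf);
        out.writeBytes(tail.getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }
}
```

Sending such a request with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` should return the BibTeX as a string, which a getBibTeX-style method could then hand on to the posting workflow. Keeping request construction separate from sending makes it testable without a GROBID instance.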

Subtasks

All tasks are collected in the milestone PDFextraction.

Comments (2)

Robert Jäschke reporter
I propose the following implementation order:
1. issue #2841: Extend PostPublicationController to support PDF extraction
2. issue #2842: proper error handling
3. issue #2837: PDF upload dialog for PDF extraction
4. issue #2838: Improve model conversion from Grobid to BibSonomy
5. issue #2839: title and author extraction from PDFs using Grobid
6. issue #2843: Extend batch edit view for Grobid PDF extraction
7. issue #2840: Enrich extracted metadata using metadata from BibSonomy database
After the first step, extraction should basically work and could be released as a beta. Each subsequent step would improve the handling.
- 2018-08-16T15:18:05+00:00
Daniel Zoller
- changed status to open
- 2019-11-24T14:39:42+00:00

Assignee: –

Type: enhancement

Priority: major

Status: open

Component: webapp

Milestone: PDFextraction

Version: 3.8.10

Votes: 0

Watchers: 1
