Enrich extracted metadata using metadata from BibSonomy database

Issue #2840 new
Robert Jäschke created an issue

This is a subtask of issue #2836.

The metadata extracted by Grobid might be incomplete or erroneous. We could try to improve it using the metadata stored in BibSonomy. This would require:

  1. Searching for each post using (parts of) the existing metadata.
  2. Choosing the most appropriate result (if step 1 has been done properly, we can trust our search engine and simply use the first result).
  3. Merging the metadata of the two posts.
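
Steps 1 and 2 could be sketched roughly as follows. This is only an illustration: the field names, `build_query`, `find_candidate`, and the `search` callable are hypothetical and do not correspond to existing BibSonomy or Grobid APIs. Posts are modelled as plain dictionaries of strings.

```python
from typing import Callable, Dict, List, Optional

Post = Dict[str, str]


def build_query(extracted: Post, fields=("title", "author")) -> str:
    """Join the non-empty values of the chosen fields into one query string."""
    parts = [extracted[f].strip() for f in fields if extracted.get(f, "").strip()]
    return " ".join(parts)


def find_candidate(extracted: Post,
                   search: Callable[[str], List[Post]]) -> Optional[Post]:
    """Step 1: search with parts of the extracted metadata.
    Step 2: trust the search engine's ranking and take the first result.

    `search` is a placeholder (assumption) for whatever search backend
    is used; it takes a query string and returns ranked posts.
    """
    query = build_query(extracted)
    if not query:
        return None  # nothing usable was extracted, so do not search at all
    results = search(query)
    return results[0] if results else None
```

Passing the search backend in as a callable keeps the sketch independent of the actual search component, which this issue leaves open.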

Step 3 is probably the most difficult step.
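
One possible merge policy for step 3, as a minimal sketch: keep the extracted values, fill in fields that are missing or empty from the BibSonomy post, and overwrite only those fields that are explicitly listed. The policy itself, the function name `merge_posts`, and the `overwrite` parameter are assumptions for illustration, not a decided design.

```python
from typing import Dict, Iterable

Post = Dict[str, str]


def merge_posts(extracted: Post, stored: Post,
                overwrite: Iterable[str] = ()) -> Post:
    """Merge a stored post into an extracted one.

    - Fields that are missing or blank in `extracted` are filled from `stored`.
    - Fields named in `overwrite` take the stored value even on conflict.
    - Blank stored values never replace anything.
    """
    overwrite = set(overwrite)
    merged = dict(extracted)  # do not mutate the caller's dictionary
    for field, value in stored.items():
        if not value.strip():
            continue
        current = merged.get(field, "")
        if not current.strip() or field in overwrite:
            merged[field] = value
    return merged
```

For example, merging `{"title": "Foo", "year": ""}` with a stored post `{"title": "Foo Bar", "year": "2010"}` keeps the extracted title and fills in the year, while passing `overwrite={"title"}` would also take the stored title. Which fields should be trusted from which source is exactly the open question of this step.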

This could be optional for the posts that represent the bibliographic metadata of the citations, but it is necessary for the post that represents the uploaded PDF itself, since very likely only the title and authors could be extracted. (This is also how we currently do it.)

I did not assign a component, since this is the step where we probably need a new component. If not, this will probably go into the webapp.
