#1726 - Publication data sent through KARLA's "send to bibsonomy / puma" button is not correctly parsed

Issue #1726 closed

Former user created an issue 2012-10-29

Reported by telekoma: When you search for a book, e.g. by the ISBN "3-8273-7293-3" (Verteilte Systeme. Grundlagen und Paradigmen), and send it to PUMA / BibSonomy via the "senden an" ("send to") button, the author field still contains braces, and many other fields contain unnecessary whitespace.

bzg.png

Comments (8)

Robert Jäschke
This issue is more complicated than it seems: everything that is stored in BibSonomy is sent through the BibTeXParser. The parser currently does not remove or normalize LaTeX formatting, hence the braces you see there. We did not remove it because we wanted to keep everything as the users entered it. This should probably be changed now. Another option would be to add a normalization step to the scrapers; however, this would cause double parsing.
- 2012-10-31T17:25:28+00:00
Robert Jäschke
The solution would be to add a TexDecode step to the AbstractPublicationController's handleScraper() method.

Therefore, we need a method that gets a BibTeX object and cleans all its fields. Maybe we have something like that already.

Another (probably better) idea would be to add another parseBibTeX() method to the SimpleBibTeXParser with a boolean parameter that selects whether to clean the BibTeX. That would nicely encapsulate the LaTeX handling in the BibTeX parser.
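
A minimal sketch of the boolean-flag idea, not the actual SimpleBibTeXParser code. Everything here is an assumption made for illustration: the class `CleaningParserSketch`, the helper `cleanValue`, and the use of a plain `Map` of field names to values as a stand-in for the parsed BibTeX object. The cleaning itself is deliberately crude (it drops every brace and collapses whitespace), so it would also remove braces that protect capitalization.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch; a Map of fields stands in for the parsed BibTeX object. */
class CleaningParserSketch {

    /** Existing behaviour: keep every field exactly as it was entered. */
    static Map<String, String> parseBibTeX(Map<String, String> rawFields) {
        return parseBibTeX(rawFields, false);
    }

    /** New overload: the caller decides whether the fields are cleaned. */
    static Map<String, String> parseBibTeX(Map<String, String> rawFields, boolean clean) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : rawFields.entrySet()) {
            String value = field.getValue();
            result.put(field.getKey(), clean ? cleanValue(value) : value);
        }
        return result;
    }

    /** Crude normalization: drop all braces, collapse runs of whitespace, trim. */
    static String cleanValue(String value) {
        return value.replaceAll("[{}]", "").replaceAll("\\s+", " ").trim();
    }
}
```

With this shape, the scraper path could call `parseBibTeX(fields, true)` while manually entered posts keep going through the one-argument method unchanged.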
- 2012-10-31T17:36:47+00:00
Former user Account Deleted
Commented by hks: Alternatively, one could remove the call .replaceAll(CURLY_BRACKETS, "") from TexDecode's Java code and add lines like 00C4 {"A} to the resource file.
- 2012-11-09T02:29:35+00:00
Former user Account Deleted
Commented by hks: I forgot: this would enable TexDecode to handle a String containing the entire BibTeX file. The TexDecode step could then be done by the scrapers without double parsing.
- 2012-11-09T02:36:52+00:00
Robert Jäschke
I am not sure that would work, because it is probably not necessary to surround "A with {}, and those cases would then be missed. If we added both versions to the file, it would double its size, and we would need to change TexDecode to use a LinkedHashMap instead of a TreeMap, because the order in which the patterns are applied would then matter. In addition, it is not clear in which order the regular expressions are matched (and we would depend on that order).
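
To illustrate why the map type matters, here is a small sketch that is not TexDecode's code. It assumes the two patterns are the LaTeX umlaut macro `\"A` with and without surrounding braces, and it applies them as literal replacements rather than regular expressions; the class `PatternOrderSketch` and its methods are hypothetical. A `TreeMap` iterates in sorted key order, and the backslash sorts before `{`, so the unbraced pattern runs first and the braces are left behind. A `LinkedHashMap` iterates in insertion order, so the braced pattern can be placed first.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch showing that the iteration order of the map changes the result. */
class PatternOrderSketch {

    /** Inserts the braced pattern first, then the unbraced one. */
    static Map<String, String> fill(Map<String, String> patterns) {
        patterns.put("{\\\"A}", "\u00C4"); // braced form of the macro
        patterns.put("\\\"A", "\u00C4");   // bare form of the macro
        return patterns;
    }

    /** Applies every pattern as a literal replacement, in the map's iteration order. */
    static String applyAll(Map<String, String> patterns, String input) {
        String text = input;
        for (Map.Entry<String, String> pattern : patterns.entrySet()) {
            text = text.replace(pattern.getKey(), pattern.getValue());
        }
        return text;
    }

    public static void main(String[] args) {
        String input = "{\\\"A}pfel";
        // Insertion order: the braced pattern is applied first.
        System.out.println(applyAll(fill(new LinkedHashMap<>()), input));
        // Sorted order: the bare pattern is applied first, the braces survive.
        System.out.println(applyAll(fill(new TreeMap<>()), input));
    }
}
```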

The only option I see for solving the issue by modifying TexDecode is to omit the brace removal step and then, for each single match, check whether there are braces around the match and, if so, remove them. However, this sounds complicated.
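
A rough sketch of that per-match check, under stated assumptions: it is not TexDecode's implementation, the class `BraceAwareDecodeSketch` and its two-entry macro table are hypothetical, and macros are matched as literal strings instead of regular expressions. For each hit it looks at the characters directly before and after the macro and swallows the braces only when both are present; all other braces are left alone.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: decode macros and remove only the braces that wrap a match. */
class BraceAwareDecodeSketch {

    /** Tiny stand-in for the resource file; the real table is much larger. */
    static final Map<String, String> MACROS = new LinkedHashMap<>();
    static {
        MACROS.put("\\\"A", "\u00C4");
        MACROS.put("\\\"o", "\u00F6");
    }

    static String decode(String input) {
        String text = input;
        for (Map.Entry<String, String> macro : MACROS.entrySet()) {
            text = replaceMacro(text, macro.getKey(), macro.getValue());
        }
        return text;
    }

    /** Replaces each occurrence of macro; a directly surrounding {...} pair is dropped too. */
    static String replaceMacro(String text, String macro, String replacement) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        int hit;
        while ((hit = text.indexOf(macro, pos)) >= 0) {
            int end = hit + macro.length();
            boolean braced = hit > pos && text.charAt(hit - 1) == '{'
                    && end < text.length() && text.charAt(end) == '}';
            // Copy the text before the match, leaving out the opening brace if braced.
            out.append(text, pos, braced ? hit - 1 : hit);
            out.append(replacement);
            // Continue after the match, skipping the closing brace if braced.
            pos = braced ? end + 1 : end;
        }
        out.append(text, pos, text.length());
        return out.toString();
    }
}
```

Because unrelated braces such as `{DNA}` are untouched, this variant could in principle run over a whole BibTeX string, though it does not handle nested forms like `{\"{A}}`.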
- 2012-11-09T09:51:19+00:00
Daniel Zoller
- removed responsible
- edited description
- 2015-09-18T12:45:39+00:00
Daniel Zoller
- changed status to open
- 2016-08-16T21:26:00+00:00
Daniel Zoller
- edited description
- changed status to closed
too old
- 2021-09-24T14:41:48+00:00

Assignee: –

Type: bug

Priority: minor

Status: closed

Component: –

Milestone: –

Version: –

Votes: 0

Watchers: 0
