Publication data sent through "KARLA's" send to bibsonomy / puma button is not correctly parsed

Issue #1726 closed
Former user created an issue

Reported by telekoma: When you search for a book, e.g. by the ISBN "3-8273-7293-3" (Verteilte Systeme. Grundlagen und Paradigmen), and send it to puma / bibsonomy via the "senden an" button, the author field still contains braces, and many other fields contain unnecessary whitespace.

Comments (8)

  1. Robert Jäschke

    This issue is more complicated than it seems: everything that is stored in BibSonomy is sent through the BibTeXParser. The parser currently does not remove or normalize LaTeX formatting, which is why you see those braces. The reason for not removing it was that we wanted to keep everything as the users entered it. This should probably be changed now. Another option would be to add a normalization step to the scrapers. This would, however, cause double parsing.

  2. Robert Jäschke

    The solution would be to add a TexDecode step to the AbstractPublicationController's handleScraper() method.

    Therefore, we need a method that gets a BibTeX object and cleans all its fields. Maybe we have something like that already.

    Another (probably better) idea would be to add another parseBibTeX() method to the SimpleBibTeXParser with a boolean parameter that selects whether to clean the BibTeX. That would nicely encapsulate the LaTeX handling in the BibTeX parser.
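
    A minimal sketch of the flag-based idea, for illustration only. It is not taken from the BibSonomy code base: the class CleaningParserSketch and its methods parseFields() and cleanField() are hypothetical stand-ins, the field values are invented, and the cleaning step here only strips braces and collapses whitespace rather than calling the real TexDecode.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in; NOT the real SimpleBibTeXParser. */
final class CleaningParserSketch {

    /** Strips curly braces and collapses runs of whitespace in one field value. */
    static String cleanField(String value) {
        return value.replaceAll("[{}]", "").replaceAll("\\s+", " ").trim();
    }

    /**
     * Stand-in for a parse method with a boolean switch: the field values are
     * returned unchanged when clean is false and normalized when it is true.
     */
    static Map<String, String> parseFields(Map<String, String> rawFields, boolean clean) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : rawFields.entrySet()) {
            String value = field.getValue();
            result.put(field.getKey(), clean ? cleanField(value) : value);
        }
        return result;
    }
}
```

    With the flag set to false, existing callers would keep the "store what the user entered" behaviour; only the scraper path would need to pass true.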

  3. Former user Account Deleted

    Commented by hks: Alternatively, one could remove the call .replaceAll(CURLY_BRACKETS, "") from TexDecode's Java code and add lines like 00C4 {"A} to the resource file.

  4. Former user Account Deleted

    Commented by hks: I forgot: this would enable TexDecode to handle a String containing the entire BibTeX file. The TexDecode step could then be done by the scrapers without double parsing.

  5. Robert Jäschke

    I am not sure that would work: it is probably not necessary to surround "A with {}, so the braced patterns would miss the unbraced cases. If we added both versions to the file, it would double in size, and we would need to change TexDecode to use a LinkedHashMap instead of a TreeMap, because the order in which the patterns are applied would then matter. In addition, it is not clear in which order the regular expressions are matched (and we would depend on that order).

    The only option I see for solving the issue by modifying TexDecode is to omit the brace removal step and then, for each match, check whether it is surrounded by braces and, if so, remove them. However, this sounds complicated.
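
    A rough sketch of the per-match brace check, under stated assumptions. It is not the actual TexDecode code: the class BraceAwareDecodeSketch, its two-entry MACROS table, and the decode() method are hypothetical, and the macros are written in the standard LaTeX form \"A. Each macro is tried in a braced and an unbraced form within one pattern, so braces are removed only when they directly enclose a match.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch; NOT the real TexDecode. */
final class BraceAwareDecodeSketch {

    // Tiny illustrative table (assumption); the real resource file is much larger.
    private static final Map<String, String> MACROS = new LinkedHashMap<>();
    static {
        MACROS.put("\\\"A", "\u00C4"); // LaTeX \"A -> A with diaeresis
        MACROS.put("\\\"o", "\u00F6"); // LaTeX \"o -> o with diaeresis
    }

    /** Replaces each known macro, consuming braces only if they wrap the macro itself. */
    static String decode(String input) {
        String result = input;
        for (Map.Entry<String, String> entry : MACROS.entrySet()) {
            String macro = Pattern.quote(entry.getKey());
            // The braced alternative is listed first so "{macro}" is consumed as a whole.
            Pattern pattern = Pattern.compile("\\{" + macro + "\\}|" + macro);
            result = pattern.matcher(result)
                    .replaceAll(Matcher.quoteReplacement(entry.getValue()));
        }
        return result;
    }
}
```

    Braces that do not directly enclose a known macro are left untouched, so no global brace removal step is needed. Forms such as \"{A} are not covered by this sketch.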
