Name parsing gets confused when a part of the name starts with a character in braces

Issue #98 new
Jellby created an issue

Consider these examples:

>>> from pybtex.database import Person
>>> print Person('Charles Darwin')
Person(u'Darwin, Charles')

OK. But what if I want the abbreviation to be "Ch. Darwin" instead of "C. Darwin"?

>>> print Person('{Ch}arles Darwin')
Person(u'{Ch}arles Darwin')

Too bad, "{Ch}arles" is interpreted as a last name. You can say I'm trying to game the system, but what other option is there with special characters?

>>> print Person('{\AA}ke Jonsson')
Person(u'Jonsson, {\\AA}ke')
>>> print Person(u'{Å}ke Jonsson')
Person(u'{\xc5}ke Jonsson')

It works fine with TeX-encoded characters, but these are converted to Unicode early when a .bib file is processed, so the second form is what is actually seen by the time the names are split. At least, that is what I see when using the sphinxcontrib-bibtex plugin. In any case, in my opinion, for the purpose of name parsing:

{Ch} should be considered as an uppercase letter
{Å} should be considered as an uppercase letter
{\relax D} could be considered as a lowercase letter (in case I want a "De" particle to be parsed as a "von" part).

May I suggest the following change to the is_von_name function in database/__init__.py? It seems to do what I want:

                    if brace_level == 0 and char.isalpha():
                        return char.islower()
                    elif brace_level == 1 and char.startswith('\\'):
                        return special_char_islower(char)
                    elif brace_level == 1 and char.isalpha():
                        return char.islower()

(the last two lines are my addition)
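
To make the intent of the added branch easier to check in isolation, here is a minimal standalone sketch in Python 3. It is not the pybtex implementation: the function name first_letter_is_lower is hypothetical, and the sketch deliberately gives up (returns None) as soon as it meets a backslash, leaving TeX commands to whatever special-character handling already exists. It only illustrates the rule "a letter at brace level 0 or 1 decides the case of the name part".

```python
def first_letter_is_lower(part):
    """Classify a name part by its first letter at brace level 0 or 1.

    Hypothetical helper for illustration only; not part of pybtex.
    Returns True (lowercase, i.e. a candidate "von" part), False
    (uppercase), or None when the sketch cannot decide.
    """
    brace_level = 0
    for char in part:
        if char == '{':
            brace_level += 1
        elif char == '}':
            brace_level -= 1
        elif char == '\\':
            # TeX commands such as \AA are out of scope for this sketch.
            return None
        elif brace_level <= 1 and char.isalpha():
            # Level 0 is the existing rule; level 1 is the proposed addition.
            return char.islower()
    # No letter found at brace level 0 or 1.
    return None


for part in ['Charles', '{Ch}arles', '{Å}ke', 'de', '{d}e']:
    print(part, first_letter_is_lower(part))
```

With this rule, '{Ch}arles' and '{Å}ke' are classified as uppercase, like 'Charles', while 'de' and '{d}e' are classified as lowercase.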

Comments (2)

  1. Matthis Thorade

    Possibly related: https://github.com/mcmtroffaes/sphinxcontrib-bibtex/issues/105

    Consider the following bib entry, which was giving me problems when using sphinx + sphinxcontrib-bibtex:

    @Article{VsirokyEtAl2011,
      Title = {Experimental analysis of Sphinx abbreviations},
      Author = {O'Donnelldagg, James and M{\"u}llermeister, Dirk and {\v{S}}irok\'{y}, Jan and M{\o}ller-P{\'e}dersen, Bengt},
      Journal = {Applied Automata},
      Doi = {10.1000/182},
      Volume = {182},
      Number = {7},
      Pages = {3079--3087},
      Year = {2011},
      Publisher = {Doi System}
    }
    
  2. Matthis Thorade

    Note: It seems braces anywhere in the name are problematic, not just if a name starts with a brace.
