Name parsing gets confused when a part of the name starts with a character in braces

Consider these examples:

>>> from pybtex.database import Person
>>> print Person('Charles Darwin')
Person(u'Darwin, Charles')

OK. But what if I want the abbreviation to be "Ch. Darwin" instead of "C. Darwin"?

>>> print Person('{Ch}arles Darwin')
Person(u'{Ch}arles Darwin')

Too bad: "{Ch}arles" is interpreted as a last name. You could say I'm trying to game the system, but what other option is there with special characters?

>>> print Person('{\AA}ke Jonsson')
Person(u'Jonsson, {\\AA}ke')
>>> print Person(u'{Å}ke Jonsson')
Person(u'{\xc5}ke Jonsson')

It works fine with TeX-encoded characters, but these are converted early when a .bib file is processed, so the second form is what the name splitter actually sees. At least, that is what I observe when using the sphinxcontrib-bibtex plugin. In any case, in my opinion, for the purpose of name parsing:

{Ch} should be considered as an uppercase letter
{Å} should be considered as an uppercase letter
{\relax D} could be considered as a lowercase letter (in case I want a "De" particle to be parsed as a "von" part).

May I suggest the following change to the is_von_name function in database/__init__.py? It seems to do what I want:

                    if brace_level == 0 and char.isalpha():
                        return char.islower()
                    elif brace_level == 1 and char.startswith('\\'):
                        return special_char_islower(char)
                    elif brace_level == 1 and char.isalpha():
                        return char.islower()

(the last two lines are my addition)
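
To make the intended rule easier to check in isolation, here is a minimal stand-alone sketch of it. It is not pybtex's actual is_von_name: the function starts_lowercase and its character-by-character scan are hypothetical, and the special_char_islower below is only a stand-in that assumes a TeX command can be judged by the first letter of its name (so \AA counts as uppercase and \relax as lowercase).

```python
def special_char_islower(text):
    """Stand-in helper (assumption): judge a TeX command such as
    r'\\AA' or r'\\relax D' by the first letter of the command name."""
    for ch in text.lstrip('\\'):
        if ch.isalpha():
            return ch.islower()
    return False


def starts_lowercase(name_part):
    """Return True if the first significant letter of name_part is
    lowercase, looking at brace level 0 and brace level 1 only."""
    brace_level = 0
    for i, ch in enumerate(name_part):
        if ch == '{':
            brace_level += 1
        elif ch == '}':
            brace_level -= 1
        elif brace_level == 1 and ch == '\\' and name_part[i - 1] == '{':
            # A backslash right after a top-level brace: TeX special character.
            return special_char_islower(name_part[i:])
        elif brace_level <= 1 and ch.isalpha():
            # Level 0 is the existing rule; level 1 is the proposed addition.
            return ch.islower()
    return False
```

Under these assumptions, starts_lowercase('{Ch}arles') and starts_lowercase(u'{\xc5}ke') both return False, so neither part would be taken for a "von" part, while starts_lowercase('{\\relax D}e') returns True.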
