Issue #1449 new

Non-Latin headings are not converted into proper anchor links

Konstantin Molchanov
created an issue

If a heading includes non-Latin characters, they are ignored when the anchor link is generated.

If none of the characters in a heading is Latin, an anchor link like "id1" is created.

It gets really bad if you have two headings that share a Latin word while the rest is non-Latin, e.g. "Инструкция по Searchanise" and "Модуль Searchanise": both anchor links are then built from the same word, Searchanise, so they clash.
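For illustration, the behavior described above can be reproduced with a small sketch. This is a simplified approximation of an ASCII-only ID generator, not docutils' actual code; the names `ascii_only_id` and `assign_ids` are hypothetical, and the "id1", "id2" fallback numbering is an assumption based on the anchors reported in this issue.

```python
import re
import unicodedata


def ascii_only_id(heading):
    """Build an anchor ID from ASCII letters and digits only (illustrative)."""
    # Decompose accented characters, then drop everything that is not ASCII.
    text = unicodedata.normalize("NFKD", heading.lower())
    text = text.encode("ascii", "ignore").decode("ascii")
    # Collapse whitespace and replace runs of other characters with a hyphen.
    text = re.sub(r"[^a-z0-9]+", "-", " ".join(text.split()))
    # Strip leading digits/hyphens and trailing hyphens.
    return re.sub(r"^[-0-9]+|-+$", "", text)


def assign_ids(headings):
    """Assign a unique ID per heading, falling back to id1, id2, ... (assumed scheme)."""
    used = set()
    counter = 1
    result = []
    for heading in headings:
        candidate = ascii_only_id(heading)
        # An empty or already used ID is replaced by a numbered fallback.
        while not candidate or candidate in used:
            candidate = "id%d" % counter
            counter += 1
        used.add(candidate)
        result.append(candidate)
    return result


print(assign_ids(["Бесплатный", "Инструкция по Searchanise", "Модуль Searchanise"]))
# prints ['id1', 'searchanise', 'id2']
```

Under these assumptions, a purely Cyrillic heading yields an empty ID and receives a numbered fallback, while two headings that share the word Searchanise compete for the same ID and the second one also ends up with a numbered fallback.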

You can check a live example here: http://searchanise-supporters-guide.readthedocs.org/ru/latest/

Cyrillic anchor names are valid and should be used (see Wikipedia for example: http://ru.wikipedia.org/wiki/Pantera#.D0.92.D0.BB.D0.B8.D1.8F.D0.BD.D0.B8.D0.B5_.D0.B8_.D1.82.D0.B5.D0.BD.D0.B4.D0.B5.D0.BD.D1.86.D0.B8.D0.B8).
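To make the Wikipedia anchor above readable, here is a minimal sketch that decodes it. It assumes the fragment is ordinary UTF-8 percent-encoding with "." written in place of "%"; the helper name `decode_dot_anchor` is hypothetical.

```python
import re
from urllib.parse import unquote


def decode_dot_anchor(fragment):
    """Decode an anchor that uses '.XX' where percent-encoding would use '%XX'."""
    # Only rewrite dots that are followed by exactly two uppercase hex digits.
    percent_encoded = re.sub(r"\.([0-9A-F]{2})", r"%\1", fragment)
    # unquote() decodes the percent-escapes as UTF-8 by default.
    return unquote(percent_encoded)


fragment = (
    ".D0.92.D0.BB.D0.B8.D1.8F.D0.BD.D0.B8.D0.B5_"
    ".D0.B8_"
    ".D1.82.D0.B5.D0.BD.D0.B4.D0.B5.D0.BD.D1.86.D0.B8.D0.B8"
)
print(decode_dot_anchor(fragment))
# prints Влияние_и_тенденции
```

The decoded anchor is the Cyrillic heading text itself, with underscores in place of spaces.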

Thanks!

Comments (7)

  1. Konstantin Molchanov reporter

    Apologies for the misleading example. Here are some concrete links:

    http://searchanise-supporters-guide.readthedocs.org/ru/latest/widget.html#id1 (heading—Бесплатный)

    http://searchanise-supporters-guide.readthedocs.org/ru/latest/magento.html#searchanise (heading—После установки расширения Searchanise админка недоступна)

    http://searchanise-supporters-guide.readthedocs.org/ru/latest/admin.html#id2 (heading—Клиентская панель управления Searchanise; note that in this case, for some reason, even the word Searchanise is ignored).

    I'm using docutils v. 0.11.

    Should I redirect the issue to docutils then?

  2. Takayuki Shimizukawa

    Thanks. I'm looking for an example of the case you mentioned:

    It gets really bad if you have two headings that share a Latin word while the rest is non-Latin, e.g. "Инструкция по Searchanise" and "Модуль Searchanise": both anchor links are then built from the same word, Searchanise, so they clash.

    If this behavior really occurs, it's a bug. However, if these two headings generate two different IDs (such as "id1" and "id2"), I think that is docutils' current specification.

    Should I redirect the issue to docutils then?

    Yeah, whether it is a bug or the specification, I think that is the faster and more direct way.

  3. Konstantin Molchanov reporter

    OK, I've posted it as a bug at docutils: https://sourceforge.net/p/docutils/bugs/254/

    I think it is a bug rather than the desired behavior since many resources like Wikipedia use non-Latin anchor names and are OK with it.

    UPD: I haven't gotten any response so far, and, judging by how they handle other tickets, I won't get one soon. This is really sad since the feature is crucial for all users who use non-Latin alphabets.

  4. Gleb Goncharov

    Hi!

    This issue is really annoying for me since I too use Russian headings.

    The Docutils issue tracker on SourceForge appears to be kind of dead—the issues are not even reviewed.

    Is there any chance this issue could be fixed in Sphinx? I really don't think the Docutils people are going to do anything about it anytime soon.

    Thanks!

  5. Takayuki Shimizukawa

    I think it is too hard for Sphinx to override or substitute the ID generation function, because that function sits deep inside docutils. If Sphinx overrode it with a monkey patch, I think the result would be fragile.
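For reference, the ID generation being asked for could look roughly like the sketch below. It only shows a Unicode-preserving ID function; the name `unicode_id` is hypothetical, and the sketch deliberately does not show how to wire such a function into docutils or Sphinx, which is the hard part described in this comment.

```python
import re
import unicodedata


def unicode_id(heading):
    """Build an anchor ID that keeps non-Latin letters (illustrative sketch)."""
    # Compose characters first so that letters such as "й" stay a single code point.
    text = unicodedata.normalize("NFC", heading).lower()
    # Replace every run of non-word characters (spaces, punctuation) with a hyphen.
    text = re.sub(r"\W+", "-", text)
    # Drop hyphens left over at either end.
    return text.strip("-")


print(unicode_id("Инструкция по Searchanise"))  # prints инструкция-по-searchanise
print(unicode_id("Модуль Searchanise"))         # prints модуль-searchanise
```

With IDs like these, the two example headings from the report no longer compete for the same anchor.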
