Cyrillic letters in repository filenames result in an error

Issue #281 new
Alexander Nikitin created an issue

Hello.

I have a problem displaying repository contents when the repository contains files with Cyrillic letters in their names:

When using the default setting of default_encoding = utf8 with lang = ru, I get the following behaviour:

  1. User interface is ok: 1.user_interface_is_ok.png

  2. File name and its contents are not ok: 2.file_name_and_file_content_are_not_ok.png

  3. File's URL doesn't work: 3.url_link_to_file_with_cyrillic_doesn't_work.png

When using the setting default_encoding = utf8,cp1251 with lang = ru, I get the following behaviour:

  1. User interface is still ok

  2. File name and its contents are ok now: 2.1.file_name_and_file_content_are_ok_now.png

  3. But the file link still doesn't work: 3.url_link_to_file_with_cyrillic_still_doesn't_work.png

Attached is the hg repository that was used for the tests.

Comments (6)

  1. Mads Kiilerich

    As I think I mentioned on another issue: it looks like Kallithea is running in a Python environment where Python doesn't know how to encode non-ASCII characters. On Linux, that would be the case if LANG=C, and it can be fixed with, for example, LANG=en_US.utf8.

    Can you try that? What is your platform and setup?
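One way to check what Python actually picked up from the environment is to print the encodings it reports. The following is only a minimal sketch using standard-library calls; the helper name report_encodings is made up for illustration and is not part of Kallithea.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings Python derived from the environment.

    Hypothetical helper for illustration; not part of Kallithea.
    """
    return {
        # Encoding used to convert file names to and from bytes.
        "filesystem": sys.getfilesystemencoding(),
        # Encoding the locale settings (LANG, LC_ALL, ...) ask for.
        "preferred": locale.getpreferredencoding(False),
        # Encoding of standard output, if it has one.
        "stdout": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for name, value in sorted(report_encodings().items()):
        print("%s: %s" % (name, value))
```

Running it with the same environment as the server process (for example from the same shell that runs paster serve) shows whether the locale settings are the ones expected.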

  2. Alexander Nikitin reporter

    (venv) nikitin@ubuntu:/srv/kallithea/venv$ uname -a

    Linux ubuntu 4.4.0-78-generic #99-Ubuntu SMP Thu Apr 27 15:29:09 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

    (venv) nikitin@ubuntu:/srv/kallithea/venv$ lsb_release -a

    No LSB modules are available.
    Distributor ID: Ubuntu
    Description: Ubuntu 16.04.2 LTS
    Release: 16.04
    Codename: xenial

    (venv) nikitin@ubuntu:/srv/kallithea/venv$ pip freeze | grep Kallithea

    Kallithea==0.3.2

    (venv) nikitin@ubuntu:/srv/kallithea/venv$ echo $LANG

    en_US.UTF-8

    I start the Kallithea instance with

    (venv) nikitin@ubuntu:/srv/kallithea/venv$ paster serve my.ini

  3. Alexander Nikitin reporter

    Oh, I forgot to mention another part of this bug: Kallithea corrupts Cyrillic file names in zip archives.

    4.bad_cyrillic_filename_in_zip.png

  4. Alexander Nikitin reporter

    OK, I've done some debugging (well, I'm not a Python programmer :) ).

    I think that _file_paths in class MercurialChangeset contains data that doesn't match the request.

    For example, in def get_node(self, path):

    I have the "human readable" path value (from the parameter):

    path ::: текст с кириллицей и пробелами.txt
    

    and a file name stored as hex-escaped bytes in self._file_paths:

    self._file_paths :::

    ['.hgignore', '\xf2\xe5\xea\xf1\xf2 \xf1 \xea\xe8\xf0\xe8\xeb\xeb\xe8\xf6\xe5\xe9 \xe8 \xef\xf0\xee\xe1\xe5\xeb\xe0\xec\xe8.txt']
    

    These hex-escaped bytes are the file name in cp1251 encoding.
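The claim about cp1251 can be checked directly on the byte string from the dump above. This is a small Python 3 sketch (Kallithea 0.3.2 itself runs on Python 2); the variable names are chosen for illustration only.

```python
# Byte string copied from the self._file_paths dump above.
stored = (b'\xf2\xe5\xea\xf1\xf2 \xf1 '
          b'\xea\xe8\xf0\xe8\xeb\xeb\xe8\xf6\xe5\xe9 \xe8 '
          b'\xef\xf0\xee\xe1\xe5\xeb\xe0\xec\xe8.txt')

# Path value that get_node() received.
requested = u'текст с кириллицей и пробелами.txt'


def is_valid_utf8(data):
    """Return True if the byte string can be decoded as UTF-8."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


# Decoding the stored bytes as cp1251 gives back the requested name.
decoded = stored.decode('cp1251')
matches_cp1251 = decoded == requested

print(matches_cp1251)
print(is_valid_utf8(stored))
```

The stored bytes are not valid UTF-8, which is consistent with the file name only displaying correctly once cp1251 is added to default_encoding.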

  5. Alexander Nikitin reporter

    Some more updates for this issue

    The path parameter of def get_node(self, path): is passed as a unicode string (or UTF-8, I'm not sure).

    self._file_paths contains the file name as hex-escaped bytes in cp1251 encoding.

    That's why the method def get_node(self, path): cannot find path in Mercurial's _file_paths list.
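The mismatch can be reproduced outside Kallithea. The sketch below is Python 3 and only illustrates the idea; decode_with_fallbacks is a hypothetical helper, not Kallithea's or Mercurial's API, and its encoding list simply mirrors the default_encoding = utf8,cp1251 setting from the report.

```python
def decode_with_fallbacks(raw, encodings=("utf8", "cp1251")):
    """Decode bytes using the first encoding that succeeds.

    Hypothetical helper for illustration only.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Nothing fit: keep going with replacement characters.
    return raw.decode(encodings[0], "replace")


# Values taken from the debugging output in the comment above.
file_paths = [
    b'.hgignore',
    b'\xf2\xe5\xea\xf1\xf2 \xf1 '
    b'\xea\xe8\xf0\xe8\xeb\xeb\xe8\xf6\xe5\xe9 \xe8 '
    b'\xef\xf0\xee\xe1\xe5\xeb\xe0\xec\xe8.txt',
]
path = u'текст с кириллицей и пробелами.txt'

# Neither the text nor its UTF-8 bytes match the cp1251 bytes.
found_as_text = path in file_paths
found_as_utf8 = path.encode('utf-8') in file_paths

# After decoding both sides to text, the lookup succeeds.
decoded_paths = [decode_with_fallbacks(p) for p in file_paths]
found_after_decoding = path in decoded_paths
```

Comparing on decoded text rather than raw bytes is one possible way to make such a lookup independent of the encoding the repository was committed with; whether that fits Kallithea's design is not something this sketch can settle.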

  6. Alexander Nikitin reporter

    As for the second part of this bug (Kallithea corrupts Cyrillic names in zip archives):

    The Cyrillic file name is encoded as cp1252 instead of cp1251, so this may be an issue in Mercurial's API or a misconfiguration of the environment variables in my Kallithea setup.
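The effect of mixing up the two code pages can be shown in a few lines. This Python 3 sketch uses a shortened, made-up file name; it illustrates what cp1251 bytes look like when read as cp1252 and does not claim to reproduce the exact code path that builds the zip archive.

```python
# Shortened example name (an assumption, not the real file name).
name = u'текст.txt'

# Bytes as a cp1251 working copy would store them.
raw = name.encode('cp1251')

# Reading the same bytes with the wrong code page gives Latin letters.
as_cp1252 = raw.decode('cp1252')

# Reading them with the right code page gives the original name.
as_cp1251 = raw.decode('cp1251')

print(as_cp1252)
print(as_cp1251)
```

If the names in the archive look like the first printed line, the bytes themselves are intact and only the code page used to interpret them is wrong.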
