Encoding Problems

Issue #757 new
Kyrodan created an issue


  • Using hg on Windows (the default encoding after a default installation is cp1252, and in our scenario it is not possible to change this)
  • A directory with a non-ASCII character in its name, e.g. a German umlaut
  • A text file in this directory, created with Notepad (saved with UTF-8 encoding), containing non-ASCII characters
  • Everything pushed to RhodeCode on Linux (Ubuntu 12.04), where the default encoding of hg is UTF-8

Case 1: default_encoding = cp1252 in production.ini

  1. RhodeCode shows the directory name correctly in the "files" view
  2. RhodeCode shows the file content incorrectly (multi-byte UTF-8 characters appear as separate single characters)
  3. paster make-index cannot build the index; it fails with this error:
2013-02-06 16:02:12.686 INFO  [rhodecode.model] initializing db for mysql://rhodecode:XXXXX@localhost/rhodecode
2013-02-06 16:02:12.791 INFO  [rhodecode.model.scm] scanning for repositories in /opt/repositories
Traceback (most recent call last):
  File "/opt/rhodecode/venv/bin/paster", line 9, in <module>
    load_entry_point('PasteScript==1.7.5', 'console_scripts', 'paster')()
  File "/opt/rhodecode/venv/local/lib/python2.7/site-packages/paste/script/command.py", line 104, in run
    invoke(command, command_name, options, args[1:])
  File "/opt/rhodecode/venv/local/lib/python2.7/site-packages/paste/script/command.py", line 143, in invoke
    exit_code = runner.run(args)
  File "/opt/rhodecode/venv/local/lib/python2.7/site-packages/rhodecode/lib/utils.py", line 673, in run
    return super(BasePasterCommand, self).run(args[1:])
  File "/opt/rhodecode/venv/local/lib/python2.7/site-packages/paste/script/command.py", line 238, in run
    result = self.command()
  File "/opt/rhodecode/venv/local/lib/python2.7/site-packages/rhodecode/lib/indexers/__init__.py", line 138, in command
  File "/opt/rhodecode/venv/local/lib/python2.7/site-packages/rhodecode/lib/indexers/daemon.py", line 416, in run
  File "/opt/rhodecode/venv/local/lib/python2.7/site-packages/rhodecode/lib/indexers/daemon.py", line 408, in update_indexes
  File "/opt/rhodecode/venv/local/lib/python2.7/site-packages/rhodecode/lib/indexers/daemon.py", line 355, in update_file_index
    i, iwc = self.add_doc(writer, path, repo, repo_name)
  File "/opt/rhodecode/venv/local/lib/python2.7/site-packages/rhodecode/lib/indexers/daemon.py", line 143, in add_doc
    node = self.get_node(repo, path)
  File "/opt/rhodecode/venv/local/lib/python2.7/site-packages/rhodecode/lib/indexers/daemon.py", line 131, in get_node
    node = repo.get_changeset().get_node(n_path)
  File "/opt/rhodecode/venv/local/lib/python2.7/site-packages/rhodecode/lib/vcs/backends/hg/changeset.py", line 344, in get_node
    % (path, self.short_id))
rhodecode.lib.vcs.exceptions.NodeDoesNotExistError: There is no file nor directory at the given path: 'Testeinstellungen f\xc3\xbcr Netbeans-Runtime/ProE.txt' at revision '4a595ca265bb'
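
One possible reading of this error, offered as an assumption and not a confirmed diagnosis: the directory name was committed on Windows as cp1252 bytes ('f\xfcr'), while the lookup uses the UTF-8 bytes shown in the message ('f\xc3\xbcr'). A minimal Python 3 sketch of that mismatch follows. The dict is a hypothetical stand-in for the repository manifest, and `lookup` is an illustrative helper, not RhodeCode code.

```python
# Hypothetical stand-in for a repository manifest keyed by raw path bytes.
# Assumption: the path was committed on Windows, so it is stored as cp1252.
name = "Testeinstellungen f\u00fcr Netbeans-Runtime/ProE.txt"  # \u00fc is u-umlaut

stored_key = name.encode("cp1252")  # contains b"f\xfcr"
utf8_key = name.encode("utf-8")     # contains b"f\xc3\xbcr", as in the error

manifest = {stored_key: "file node"}

def lookup(path_bytes):
    """Return the node for path_bytes, or None if the bytes do not match."""
    return manifest.get(path_bytes)

print(lookup(stored_key))  # found: the bytes match what was stored
print(lookup(utf8_key))    # None: same text, different bytes
```

If this assumption holds, the name can be displayed correctly and still fail to resolve, because display only needs a successful decode while lookup needs the exact original bytes.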

Case 2: default_encoding = utf-8 in production.ini

  1. RhodeCode shows directory names correctly in the "files" view, but I can't navigate into this directory. Nothing happens, but after reloading the page, two identical messages appear at the top: "There is no file nor directory at the given path: 'Testeinstellungen f\xc3\xbcr Netbeans-Runtime' at revision '4a595ca265bb'"
  2. RhodeCode shows the file content correctly (each UTF-8 character appears as a single character)
  3. paster make-index cannot build the index; same error as above

Comments (2)

  1. Marcin Kuzminski repo owner

    Non-UTF-8 encodings are always troublesome :)

    As for problem 3, I can probably fix that, but I don't understand number 2. Can you explain a little more?

    By the way, did you try putting a list in default_encoding, e.g. default_encoding=cp1252,utf8?

  2. Kyrodan reporter

    I tried setting default_encoding to both encodings, but it did not solve my problem.
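
    A possible reason a list does not help, sketched under the assumption that the encodings are tried in order and the first one that decodes without error wins (how RhodeCode actually treats the list is not confirmed here). The function `decode_with_fallback` is a hypothetical name used only for illustration, written in Python 3:

```python
def decode_with_fallback(data, encodings):
    """Decode bytes with the first encoding that does not raise an error."""
    for enc in encodings:
        try:
            return data.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: never fail, but mark undecodable bytes.
    return data.decode(encodings[0], errors="replace")

utf8_bytes = "f\u00fcr".encode("utf-8")     # b"f\xc3\xbcr"
cp1252_bytes = "f\u00fcr".encode("cp1252")  # b"f\xfcr"

# cp1252 first: both UTF-8 bytes are valid cp1252, so decoding "succeeds"
# and yields two wrong characters ("f\u00c3\u00bcr") instead of one umlaut.
garbled = decode_with_fallback(utf8_bytes, ["cp1252", "utf-8"])

# utf-8 first: strict UTF-8 rejects the lone byte 0xfc, so cp1252 is tried next.
recovered = decode_with_fallback(cp1252_bytes, ["utf-8", "cp1252"])
```

    Under that assumption, cp1252 accepts almost any byte sequence, so with cp1252 listed first the UTF-8 entry would rarely be reached. Listing utf8 first would at least let strict UTF-8 decoding fail over to cp1252 for the path names.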

    Regarding number 2:

    All my source code files are encoded in UTF-8, and even on Windows this is no problem. But the default encoding on Windows for paths and so on is cp1252 (because Windows is "broken" and Mercurial tries to make the best of it).

    If I navigate to a view where the content of a file is displayed (e.g. the changeset view), the content is displayed in the default encoding. So if the file is UTF-8 and the default encoding is cp1252 (set because of problem number 1, navigation in the file view), it is displayed "weird". And vice versa: if I set the default encoding to UTF-8, navigation in the file view does not work, but the content is displayed correctly.

    To solve this problem, it may be necessary to take a look at the "strategy" that Mercurial uses for encodings: http://mercurial.selenic.com/wiki/EncodingStrategy

    So it seems there are three types of data which may need to be handled separately.
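
    As I read that page, the three types are file contents, file names, and metadata such as commit messages and user names. A rough Python 3 sketch of treating them separately, assuming one encoding per kind of data. The mapping and the `to_text` helper are hypothetical names for illustration, not part of RhodeCode or Mercurial:

```python
# Hypothetical mapping of data kind -> encoding for this scenario.
# These names are illustrative and are not RhodeCode or Mercurial API.
ENCODING_BY_KIND = {
    "filename": "cp1252",  # assumption: names were committed on Windows
    "content": "utf-8",    # assumption: source files are saved as UTF-8
    "metadata": "utf-8",   # commit messages and user names
}

def to_text(data, kind):
    """Decode bytes using the encoding assumed for this kind of data."""
    return data.decode(ENCODING_BY_KIND[kind], errors="replace")

# A cp1252 directory name and UTF-8 file content both decode cleanly
# once each is handled with its own encoding.
dir_name = to_text(b"Testeinstellungen f\xfcr Netbeans-Runtime", "filename")
file_text = to_text("Gr\u00fc\u00dfe".encode("utf-8"), "content")
```

    With a single default encoding for everything, one of the two values above would always come out wrong, which matches what I see in case 1 and case 2.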
