Regression: Unicode comments fail to be posted

Issue #275 resolved
Konstantin Veretennicov created an issue

Steps to reproduce:

  1. Clean install Kallithea (on Windows, from sources, revision a1f8bf0)
  2. Create 2 users
  3. Create a repo and a PR
  4. Post an inline comment with Unicode characters

Expected: all users can post comments containing Unicode characters.

Actual: only the PR owner can post comments containing Unicode characters; for other users, posting fails.

There is an error in the server log:

  File "c:\kallithea\kallithea\lib\celerylib\tasks.py", line 307, in send_email
    % (' '.join(recipients), headers, subject, body, html_body))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 150: ordinal not in range(128)
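
The message is what Python 2 reports when a byte string holding non-ASCII bytes is implicitly coerced to unicode, for example by `%`-formatting it together with unicode values. A minimal sketch of the same failure follows, written for Python 3, where the decode has to be spelled out. The sample text is an assumption (0xd1 is the UTF-8 lead byte of many Cyrillic letters); the report does not show the actual comment, and `decode_as_ascii` is a hypothetical helper, not Kallithea code.

```python
# Hypothetical comment text: CYRILLIC SMALL LETTER ES, UTF-8 encoded as b'\xd1\x81'.
body = u"\u0441".encode("utf-8")


def decode_as_ascii(data):
    """Mimic Python 2's implicit bytes -> unicode coercion, which uses the ASCII codec."""
    return data.decode("ascii")


try:
    decode_as_ascii(body)
    error = None
except UnicodeDecodeError as exc:
    # The first byte, 0xd1, is outside range(128), so the ASCII codec rejects it.
    error = exc

print(error)
```

Decoding the same bytes with `body.decode("utf-8")` succeeds, which is why keeping the rendered text as unicode (or decoding it with the right codec) avoids the error.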

Comments (5)

  1. Konstantin Veretennicov reporter

    Confirmed on Ubuntu as well.

    The following patch fixes it:

    diff -r a1f8bf0428c5 kallithea/model/notification.py
    --- a/kallithea/model/notification.py   Sat Apr 15 01:56:27 2017 +0200
    +++ b/kallithea/model/notification.py   Sun Apr 30 19:56:08 2017 +0200
    @@ -342,4 +342,4 @@
                     })
    
             log.debug('rendering tmpl %s with kwargs %s', base, _kwargs)
    -        return email_template.render(**_kwargs)
    +        return email_template.render_unicode(**_kwargs)
    

    I wonder, though, whether Mako should be configured globally to always emit Unicode. It currently has output_encoding='utf-8', probably set somewhere by TG2; I couldn't find it in the Kallithea code.

  2. Mads Kiilerich

    I guess the problem is caused by running the WSGI application in an environment where the encoding is set to 7-bit ASCII. I can reproduce the behaviour on Linux with LANG=C gearbox serve.

    I would be surprised if this is the only problem you see. Can you, for example, create repositories with non-ASCII characters in the name?

    The Kallithea WSGI application must return encoded Unicode and must therefore know what encoding the system uses, for example in the file system (also on Windows, where the Python stack uses the 8-bit API).
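
One way to check what encodings the Python process actually sees is to print them from the same environment the server runs in. This is only a sketch using standard-library calls; `describe_encodings` is a hypothetical helper, not part of Kallithea, and the values depend on the Python version and on settings such as LANG.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings the current Python process reports."""
    return {
        # Codec used for implicit str/bytes conversions.
        "default": sys.getdefaultencoding(),
        # Codec used to convert file names to and from the OS.
        "filesystem": sys.getfilesystemencoding(),
        # Codec derived from the locale (LANG, LC_ALL, ...) without changing it.
        "locale": locale.getpreferredencoding(False),
    }


encodings = describe_encodings()
for name in sorted(encodings):
    print("%-10s %s" % (name, encodings[name]))
```

Running it once normally and once with LANG=C set should show whether the environment falls back to an ASCII-only encoding.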

  3. Konstantin Veretennicov reporter

    We avoid non-ASCII paths and filenames in general; those are fraught with issues. PR comments are different, though: sometimes an accented character gets in, other times it's a typographic quote that was copied and pasted.

    I tried to create a Unicode-named repo through the Kallithea UI, and it worked (I don't know if it would blow up somewhere later). Adding a Unicode-named file to it also worked. Hope that helps.
