[doc, unicode] UTF-8 issues in the changelog (and elsewhere)

Andrej Shadura created an issue

For example, ‘Vernooij’ shows as ‘Vernoo?’.

  1. Mads Kiilerich

    It works fine here. I guess it is caused by running the server with LANG=C, which disables some Unicode handling.

    It should perhaps be documented or made independent of the env settings.
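A minimal sketch of how the reported symptom can arise, assuming the name is stored with the single-code-point "ĳ" ligature (U+0133) and that some layer encodes it to ASCII with replacement, as can happen under LANG=C. The helper `render` is hypothetical and only illustrates the effect; it is not taken from Kallithea or Mercurial.

```python
# Assumption: the changelog stores the surname with the "ĳ" ligature (U+0133),
# which is one code point and therefore becomes a single '?' in ASCII.
name = "Vernoo\u0133"


def render(text, charset):
    """Encode text for the given charset, replacing unencodable characters with '?'."""
    return text.encode(charset, errors="replace").decode(charset)


ascii_view = render(name, "ascii")   # what an ASCII-only environment can show
utf8_view = render(name, "utf-8")    # round-trips without loss

print(ascii_view)
print(ascii(utf8_view))  # printed in escaped form so it is safe on any terminal
```

The single replaced character matches the ‘Vernoo?’ output in the report, which is consistent with an ASCII fallback somewhere in the rendering path.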

  2. Mads Kiilerich

    I'm not entirely sure.

    In some areas Mercurial makes naive guesses/assumptions about which encoding is used. It might thus be necessary to run Kallithea (and thus hg) in an environment with, for example, HGENCODING=UTF-8 (or perhaps LANG=UTF-8 ... but that might also have other consequences). I guess it should be tested/reviewed, and the code or documentation changed accordingly.
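One way to try the HGENCODING suggestion is to build the server's environment explicitly before launching it. The sketch below is an assumption about how that could be wired up; `server_environment` is a hypothetical helper, and the command used to start the server is deliberately left out.

```python
import os


def server_environment(base_env=None, encoding="UTF-8"):
    """Return a copy of the environment with HGENCODING set explicitly.

    base_env defaults to the current process environment. LANG and the
    LC_* variables are left untouched, since changing them may have
    other consequences.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HGENCODING"] = encoding
    return env


# Example with a hypothetical starting environment that uses the C locale.
env = server_environment({"LANG": "C"})
print(env)
```

The resulting dictionary can be passed as the `env` argument of `subprocess.Popen` (or `subprocess.run`) for whatever command starts the server.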

  3. Thomas De Schampheleire

    Some info: I encountered errors when users added Unicode characters in changeset/pull request comments, pull request titles or descriptions, ... It turned out that this was caused by the PostgreSQL database having encoding SQL_ASCII rather than the recommended UTF-8 (you can check this with 'psql -l').

    This, in turn, was caused by LC_CTYPE=C being set when the database was first created. Recreating the database (and migrating the existing data) with LC_CTYPE unset, so that all databases are in UTF-8, made these issues disappear.

    For reference, LANG was always set to en_US.UTF-8 here.
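The 'psql -l' check can also be scripted. The following is a rough sketch, assuming the listing was produced in unaligned, tuples-only form (for example with `psql -l -A -t`), where fields are separated by `|` and the encoding is the third field. The function name and the sample rows are hypothetical.

```python
def non_utf8_databases(listing):
    """Return {database: encoding} for every row whose encoding is not UTF8."""
    offenders = {}
    for line in listing.splitlines():
        fields = line.split("|")
        if len(fields) < 3:
            # Skip blank lines and continuation lines of multi-line fields.
            continue
        name, encoding = fields[0], fields[2]
        if encoding.upper() != "UTF8":
            offenders[name] = encoding
    return offenders


# Hypothetical sample listing: name|owner|encoding|collate|ctype|privileges
sample = "\n".join([
    "kallithea|kallithea|SQL_ASCII|C|C|",
    "postgres|postgres|UTF8|en_US.UTF-8|en_US.UTF-8|",
])

result = non_utf8_databases(sample)
print(result)
```

Any database reported here would be a candidate for being recreated with an explicit UTF-8 encoding, for instance via `createdb -E UTF8 -T template0`, followed by migrating the existing data.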

  4. Andrej Shadura reporter

    I think this can now be closed; we've addressed a bunch of related issues since this bug was reported.

  5. Thomas De Schampheleire

    We should still update the documentation, though: users need to make sure that the database is created with UTF-8 encoding.

