Pull requests

#5 Declined
Repository
immerrr
Branch
default
Repository
edgimar
Branch
default

Fix display of non-ASCII characters in default user locale encoding

Author
  1. immerrr
Reviewers
Description

I came across issue #38 today and confirmed that I'm affected too. The curses documentation at http://docs.python.org/2/library/curses.html states explicitly that curses won't handle non-ASCII characters unless locale.setlocale() is called.

The patch is trivial and fixes the issue for me.
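The patch itself is not reproduced on this page. As a rough sketch of the kind of call the curses documentation describes, and not necessarily what the patch does, an application can adopt the user's locale before initializing curses. The helper name `init_locale` is hypothetical, and the fallback to the "C" locale is an assumption added for robustness.

```python
import locale


def init_locale():
    """Adopt the locale from the environment and return the encoding in effect.

    Hypothetical helper: meant to run before curses.initscr() so that
    ncurses interprets non-ASCII bytes using the user's locale.
    """
    try:
        # An empty string means "use LANG / LC_ALL from the environment".
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # The environment names a locale that is not installed;
        # fall back to the portable "C" locale (assumption, not from the PR).
        locale.setlocale(locale.LC_ALL, "C")
    return locale.getpreferredencoding()
```

Strings passed to curses would then be encoded with the returned encoding, which is what makes a setting such as `LANG=en_US.UTF-8` take effect.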

  • Issues #38: Add UTF-8 support resolved

Comments (8)

  1. Mark Edgington repo owner

    see Issue #12 -- there was a time when we did have this included, but it was removed to deal with this issue. If you have a way of solving both problems (or can show that the former is no longer an issue), please update your pull request.

  2. immerrr author

    I've tried the steps described in issue #12 but failed to reproduce the failure. That is, the characters displayed in crecord were obviously wrong, since they were not in UTF-8, but other than that everything worked.

    $ hg --version
    Mercurial Distributed SCM (version 2.5.1)
    (see http://mercurial.selenic.com for more information)
    
    $ echo $LANG
    en_US.UTF-8
    
  3. Mark Edgington repo owner

    Ok -- I would feel more comfortable with pulling the changes if you were able to reproduce the failure (e.g. by updating to a previous hg version) and identify how or when it was fixed. If it's clear that something in hg has changed which fixes this problem, it would be helpful to know when, so that if the change is recent, we can accommodate users with older versions of hg.

  4. immerrr author

    On second thought, the error message described in issue #12 could well have been valid, if the byte sequence in question contained a segment that doesn't map to a proper UTF-8 character. It's a perfectly valid, though rare, case to commit a file encoded in something other than the user's default locale encoding, or even to include files in different encodings in a single commit. Especially considering the latter case, falling back to a single-byte encoding such as latin1 should definitely be an option, but I don't see how crecord could guess this by itself, so the user should choose it. Two possible ways come to mind:

    • either rely on the Unix way of conveying the desired locale configuration to executed programs, which means the user should define (or override) the LANG/LC_ALL environment variable:
    $ LANG=C hg crecord
    
    • or add a command-line parameter for such an override, to be handled by crecord itself:
    $ hg crecord --encoding=C
    

    The first one is easier, as it requires no action on crecord's side, but the second can be more suitable when crecord is used as an extension by other software, since that software may lack a way to specify a custom environment for the subprocesses it executes.
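The fallback idea from this comment can be sketched as follows. This is only an illustration under stated assumptions, not crecord's code: `decode_for_display` and its `encoding` parameter are hypothetical names, the sketch is written for Python 3, and it uses Python codec names (such as "utf-8") rather than locale names (such as "C").

```python
import locale


def decode_for_display(data, encoding=None):
    """Decode raw bytes for display, falling back to latin-1.

    Hypothetical helper: `encoding` stands in for a user-supplied
    override; when omitted, the locale's preferred encoding is used.
    """
    enc = encoding or locale.getpreferredencoding()
    try:
        return data.decode(enc)
    except UnicodeDecodeError:
        # latin-1 maps every byte to a code point, so this never fails,
        # though the resulting characters may not be the intended ones.
        return data.decode("latin-1")
```

With this shape, a file that is not valid in the user's encoding still shows up (possibly with wrong characters) instead of aborting the session, and an explicit override remains available to callers that cannot control the environment.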

  5. immerrr author

    About reproducing the issue: I'll try to do that in the near future, but judging by the error message I'd guess that a newer version of Python was more likely what fixed it. I just checked: the issue was reported in May 2010 and Python 2.7 came out in July.

  6. Mark Edgington repo owner

    Done. Enjoy. I'm declining this because I merged (rebased) it locally. I don't know how to close a pull request without either declining or merging w/ bb.