trim message correctly

Issue #90 open
Shun-ichi Goto created an issue

util.describe_revision() trims the message string to 80 characters (presumably the terminal width) for display, but this may break a UTF-8 byte sequence. I got a UnicodeDecodeError with the fixutf8 extension (a good companion to hgsubversion for Japanese users).
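
To illustrate the failure mode, here is a minimal sketch in Python 3 (not the hgsubversion code itself; the helper name `byte_trim_breaks` and the sample message are made up for this example). It cuts a UTF-8 byte string at a fixed byte offset and checks whether the result still decodes.

```python
def byte_trim_breaks(message, limit=80, encoding="utf-8"):
    """Return True if cutting the encoded message at `limit` bytes
    leaves an incomplete multi-byte sequence at the end.

    Hypothetical helper for illustration only.
    """
    raw = message.encode(encoding)
    truncated = raw[:limit]  # naive trim: counts bytes, not characters
    try:
        truncated.decode(encoding)
    except UnicodeDecodeError:
        return True
    return False


# Every character in this sample takes 3 bytes in UTF-8, and 80 is not a
# multiple of 3, so the cut lands in the middle of a character.
sample = "日本語のコミットメッセージ" * 4
broken = byte_trim_breaks(sample)

# A pure-ASCII message is unaffected, since each character is one byte.
ascii_broken = byte_trim_breaks("a" * 200)
```

With the sample above the trimmed bytes no longer decode, while the ASCII message trims cleanly.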

The attached patch fixes it by counting the column width of each character of the decoded string, so wide characters like Japanese kanji work fine.

This patch also uses hgutil.termwidth() to get the actual terminal width, and limits the message to termwidth()-1 columns to leave room for the appended LF.
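
The approach described above could be sketched roughly as follows. This is not the attached patch: it is a Python 3 approximation, the names `column_width` and `trim_to_columns` are hypothetical, and `shutil.get_terminal_size()` stands in for hgutil.termwidth(). It also ignores combining characters, which a complete solution would need to treat as zero-width.

```python
import shutil
import unicodedata


def column_width(ch):
    """Columns a single character occupies on a terminal.

    East Asian Wide (W) and Fullwidth (F) characters take two columns;
    everything else is assumed to take one (a simplification).
    """
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def trim_to_columns(raw, columns, encoding="utf-8"):
    """Trim an encoded message to at most `columns` terminal columns.

    The bytes are decoded first, so the cut always falls on a character
    boundary and the result is still valid in `encoding`.
    """
    text = raw.decode(encoding)
    used = 0
    kept = []
    for ch in text:
        width = column_width(ch)
        if used + width > columns:
            break
        kept.append(ch)
        used += width
    return "".join(kept).encode(encoding)


# Leave one column free so the trailing LF does not wrap onto a new line.
limit = shutil.get_terminal_size().columns - 1
line = trim_to_columns(("日本語" * 50).encode("utf-8"), limit)
```

Because the trimming happens on the decoded string, the returned bytes always decode cleanly, whatever the terminal width turns out to be.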

Comments (7)

  1. Shun-ichi Goto reporter

    I think that's a safe way to avoid wrapping at the 80th character on the terminal. At least the Windows command prompt and Emacs's pty produce two lines for 80 characters + LF.

  2. Augie Fackler repo owner

    In looking at this patch, I'm pretty sure that encoding.encoding will always be utf-8 and not the user's actual encoding. Will that be problematic?

  3. Shun-ichi Goto reporter

    Printing log messages in utf-8 is not what we expect, and we would be happy if the messages from hgsubversion were encoded in the local encoding. But it is not so important, because filenames are utf-8 anyway. And manipulating encoding.encoding is a delicate matter, because other extensions may change it or expect it to have a particular meaning.

    In my case, I'm also using the fixutf8 extension on Japanese Windows (shift_jis encoding) to work with a utf-8 repository (made by hgsubversion). With this combination, hgsubversion's (Mercurial's) output is caught and converted to the local encoding (detected by getting the terminal's codepage via the Windows API). It also changes encoding.encoding to utf-8 and wraps the ui.write() function to do the conversion. If hgsubversion produces messages in an encoding other than encoding.encoding, fixutf8 will be confused.

  4. Dirkjan Ochtman

    612b8d753549 starts using termwidth and inlines describe_revision(). It doesn't do anything special for the encoding, though. Does it just need to be recoded from utf-8 to encoding.encoding?
