UTF-8 in commit message displayed incorrectly

Issue #3742 resolved
Roman Starkov
created an issue

Take a look at this commit message: https://bitbucket.org/rstarkov/tankiconmaker/changeset/359b0ab55abd

It contains several typographic apostrophes, all of which are displayed incorrectly. The actual commit message contains them properly encoded as UTF-8.

Could you please support this scenario properly? UTF-8 is the de facto standard encoding for Unicode.

Comments (5)

  1. Dylan Etkin

    Hi Roman,

    Mercurial will preserve the encoding of the commit message as it was originally created.

    The raw commit for what you have linked to also displays the same way.

    I believe that the characters you refer to are not the UTF-8 representation of an apostrophe.



  2. Roman Starkov reporter
    • changed status to new

    Hi Dylan,

    That’s true; the "raw" output also shows the same mangled characters, but I’m not convinced it’s completely "raw". If you clone the repo, and then execute this command:

    hg log --template "{desc}" -r 359b0ab55abd -l 1 > description.txt

    then you will get the commit message correctly encoded in UTF-8, albeit without the BOM. Do you require a BOM to correctly detect UTF-8, or is this supposed to work?

    Note how the commit message has all apostrophes encoded as E2 80 99, which is the correct UTF-8 for ’ (U+2019, the typographic apostrophe), not the same as ' (U+0027, the ASCII apostrophe).
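The byte sequence mentioned above can be checked with a few lines of Python. This is only an illustrative sketch: the variable names are made up for the example, and the choice of Windows-1252 as the wrong decoder is an assumption about how the mangling might arise, not something confirmed in this thread.

```python
# U+2019 RIGHT SINGLE QUOTATION MARK, the typographic apostrophe.
apostrophe = "\u2019"

# Encode it as UTF-8 and show the bytes in hex.
utf8_bytes = apostrophe.encode("utf-8")
print(" ".join(f"{b:02x}" for b in utf8_bytes))  # e2 80 99

# The ASCII apostrophe is a different character with a single-byte encoding.
ascii_bytes = "'".encode("utf-8")
print(" ".join(f"{b:02x}" for b in ascii_bytes))  # 27

# ASSUMPTION: a viewer that wrongly decodes the UTF-8 bytes as Windows-1252.
# Each of the three bytes then becomes its own character.
mangled = utf8_bytes.decode("cp1252")
print(mangled)  # â€™
```

If the characters shown on the changeset page match the three-character sequence printed last, that would suggest the bytes are valid UTF-8 being decoded with a single-byte code page somewhere along the way.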

    I’m no Mercurial expert, so do point out if I’m doing something wrong. Hope I’m not doing the wrong thing by changing the status back to "new".

  3. Roman Starkov reporter

    It seems that hg log actually takes the "true" commit message and converts it to my system’s non-Unicode code page, so it sounds like Bitbucket is not at fault here.

    (though Mercurial most definitely did not preserve the encoding of my commit message for some reason, and I’ll ask some Mercurial experts what’s up with that)
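The kind of conversion described in this comment can be sketched in Python. This is not Mercurial's code: the message bytes are hypothetical, and the two target encodings (Windows-1252 and plain ASCII) are assumptions chosen only to show what transcoding to a legacy code page can do to a typographic apostrophe.

```python
# Hypothetical commit message stored as UTF-8: "don’t" with U+2019.
stored = b"don\xe2\x80\x99t"

# Step 1: interpret the stored bytes as UTF-8 text.
text = stored.decode("utf-8")

# Step 2a (ASSUMPTION: local code page is Windows-1252).
# U+2019 exists there as the single byte 0x92, so the output is no longer UTF-8.
as_cp1252 = text.encode("cp1252")
print(as_cp1252)  # b'don\x92t'

# Step 2b (ASSUMPTION: local encoding cannot represent U+2019 at all).
# With errors="replace" the character is silently turned into "?".
as_ascii = text.encode("ascii", errors="replace")
print(as_ascii)  # b'don?t'
```

Either way, the bytes written to a redirected file would differ from the bytes stored in the repository. Mercurial reads an `HGENCODING` environment variable to choose the encoding it uses for output, so setting it to `utf-8` before running the command may be worth trying when comparing results.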
