.. -*-restructuredtext-*-

FixUtf8 Extension

This extension is not distributed with Mercurial.

Author: Stefan Rusek

Download site: (Requires Mercurial 1.1 or later and Python 2.5 or later)

This extension is still in beta; use it at your own risk.


This extension corrects filename encoding problems on Windows.

Windows internally stores all command line arguments and filenames as Unicode UTF-16 (16-bit character strings). For backward compatibility with Windows 3.x, it also provides functions that return them as non-Unicode 8-bit character strings. Python 2.x and Mercurial call the non-Unicode functions, which causes Mercurial to misbehave with filenames that contain Unicode characters. This extension resolves the issue by making sure that the Unicode functions are called instead. Since Mercurial expects 8-bit character strings, the extension converts the strings to UTF-8 before returning them to Mercurial.
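
The difference between the two conversions can be illustrated with a minimal Python 3 sketch. It does not call any Windows API and is not the extension's code; the helper names ``to_ansi`` and ``to_utf8`` and the choice of code page ``cp1252`` are assumptions made for illustration only.

```python
# Illustration only: pure-Python stand-ins for the two conversion paths.
# The helper names and the cp1252 code page are assumptions for this sketch.

def to_ansi(name, codepage="cp1252"):
    """Mimic a lossy 8-bit conversion: characters the code page
    cannot represent are replaced with '?'."""
    return name.encode(codepage, errors="replace")


def to_utf8(name):
    """Convert a Unicode name to an 8-bit string without losing data."""
    return name.encode("utf-8")


name = "отчёт.txt"        # a filename with Cyrillic characters

ansi = to_ansi(name)      # the Cyrillic letters cannot survive cp1252
utf8 = to_utf8(name)      # still an 8-bit string, but nothing is lost

print(ansi)                          # the original name is unrecoverable
print(utf8.decode("utf-8") == name)  # the UTF-8 bytes round-trip
```

Both results are 8-bit strings, which is what Mercurial expects, but only the UTF-8 form can be decoded back to the original name.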

There is one case where fixutf8 cannot add support for Unicode. Because the repository object for the current working directory is created before extensions are loaded, fixutf8 cannot fix the problem of a repository that resides within a directory whose path contains Unicode characters. However, fixutf8 has no problem with directories inside the repository whose names contain Unicode characters.

Ideally, enable the extension before you need international filenames. If your repository already contains international filenames, you need to fix them as described under "Fixing existing filenames".

In order for Unicode characters to display properly, you should change the Windows console font from "Raster Fonts" to "Lucida Console".

Fixing existing filenames

To fix your filenames, run the following commands:

>hg addremove -s 100
>hg commit -m "Fix filenames"
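
Why these commands help can be sketched with a toy model in Python 3. It rests on an assumption that is not stated above: that filenames committed before the extension was enabled are stored in a legacy code page (``cp1252`` here), while the extension reports the same files under their UTF-8 names. The function ``pair_exact_renames`` is hypothetical and is not Mercurial's implementation; it only imitates the idea of pairing files whose content is 100% identical.

```python
# Toy model, not Mercurial's code. Assumes the old name was stored as
# cp1252 bytes and the new name is reported as UTF-8 bytes.

def pair_exact_renames(missing, untracked):
    """Pair each missing file with an untracked file that has
    byte-for-byte identical content (similarity of 100%)."""
    by_content = {}
    for new_name, data in untracked.items():
        by_content.setdefault(data, []).append(new_name)

    renames = {}
    for old_name, data in missing.items():
        candidates = by_content.get(data)
        if candidates:
            renames[old_name] = candidates.pop(0)
    return renames


name = "café.txt"
content = b"hello\n"

# The tracked name no longer matches any name reported on disk ...
missing = {name.encode("cp1252"): content}
# ... and the same file shows up as untracked under its UTF-8 name.
untracked = {name.encode("utf-8"): content}

renames = pair_exact_renames(missing, untracked)
print(renames)
```

In this model the file appears once as missing and once as untracked, and matching on identical content lets the pair be recorded as a rename rather than as an unrelated removal and addition.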


Configure your .hgrc to enable the extension by adding the following lines:

[extensions]
fixutf8 = path/to/

Compatibility with TortoiseHg

Where TortoiseHg uses Python or Mercurial filename API calls, it is compatible with FixUtf8. This means that most of the dialogs work fine. Shell integration does not work perfectly, because the TortoiseHg shell extension converts filenames to non-Unicode 8-bit character strings before passing them to Mercurial.