Problem with 8-bit bytestrings

Issue #316 resolved
ismdiego created an issue

I have successfully installed RhodeCode on Windows, and I have to say it is marvelous. Good job, congratulations!

But I found a problem with some of my repositories. As I am Spanish, I use some non-ASCII letters in file names. Mercurial and/or TortoiseHg work with them without problems, but when I add the repository to the RhodeCode location and restart the server, it fails while adding the new repositories to the installation and says:

{{{
#!python
sqlalchemy.exc.ProgrammingError: (ProgrammingError) You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.

u'INSERT INTO repositories (repo_name, clone_uri, repo_type, user_id, private, statistics, downloads, description, created_on, fork_id, group_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)' (u'RepoGroup/Project0945', None, 'hg', 2, 0, 0, 1, 'Folder with some strange letters like \xc3\xb1 and so on', '2011-12-02 21:01:47.559000', None, 1)
}}}
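The bytes `\xc3\xb1` in the description parameter are the UTF-8 encoding of "ñ", so the description reached the database driver as an undecoded byte string while the other text parameters were already Unicode. A minimal sketch of the usual remedy, decoding to Unicode before the insert, assuming the bytes are UTF-8. The helper `to_unicode` is hypothetical and is not RhodeCode's code; the sketch is written for Python 3, although the traceback above comes from Python 2.

```python
def to_unicode(value, encoding="utf-8"):
    """Return text for a byte string; pass any other value through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Description bytes as they appear in the failing INSERT parameters.
raw_description = b"Folder with some strange letters like \xc3\xb1 and so on"
description = to_unicode(raw_description)
print(description)
```

Values that are not byte strings, such as `None` or integers in the parameter tuple, pass through untouched, so the helper could be applied to every parameter.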

Comments (7)

  1. Marcin Kuzminski repo owner

    Can you send the full stack trace for this error? Also, does it happen only when you add a repo outside of RhodeCode, so that it crashes during the initial scan at startup? Does it crash when you add it through the web interface?

    I see non-ASCII chars in the description. Is that provided from .hgrc?

  2. ismdiego reporter

    Hi Marcin,

    Excuse me, I have been away for some days and could not answer your questions. I have investigated the problem further, and it is only related to the "Description" field, as you said. It is provided in .hgrc, in the [web] section.

    I have tried with .hgrc encoded in UTF-8 (with and without BOM) and in ANSI. All tests failed, so currently that field cannot contain any non-ASCII (non-7-bit) characters if RhodeCode is to handle it correctly (at least with v1.2.3).
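The three encodings tried above can be told apart when the file is read. The following is a sketch only, assuming that "ANSI" on this Windows machine means code page 1252; `decode_hgrc_bytes` is a hypothetical name and not part of vcs or RhodeCode.

```python
import codecs


def decode_hgrc_bytes(raw, fallback="cp1252"):
    """Decode raw .hgrc bytes, trying UTF-8 (with or without BOM) first."""
    try:
        # "utf-8-sig" decodes UTF-8 and drops a leading BOM if there is one.
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is cp1252 ("ANSI").
        return raw.decode(fallback)


line = "description = A\u00f1o"  # "Año", a made-up sample value
utf8_no_bom = line.encode("utf-8")
utf8_with_bom = codecs.BOM_UTF8 + utf8_no_bom
ansi = line.encode("cp1252")

for raw in (utf8_no_bom, utf8_with_bom, ansi):
    print(decode_hgrc_bytes(raw))
```

Trying UTF-8 first is a deliberate choice in this sketch: cp1252 text containing accented letters is rarely valid UTF-8, so the fallback is normally reached only for genuinely non-UTF-8 files.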

    I have prepared 4 sample repositories for you to play with:

    • -> repo with .hgrc encoded in UTF-8 without BOM and description field using non-ASCII chars
    • -> repo with .hgrc encoded in UTF-8 with BOM and description field using non-ASCII chars
    • -> repo with .hgrc encoded in ANSI and description field using non-ASCII chars
    • -> repo with .hgrc encoded in ANSI and description field using ONLY ASCII chars (WORKS!)

    By the way, you can see that all the repositories have 2 files and 1 folder with special chars. As I created these files on Windows (without using the fixutf8 Mercurial extension), the file/folder name encoding is ANSI (if I recall correctly).

    If you unzip the "" repository and add it to RhodeCode, it is served with no problem. But if you then click on "files" and then on a file/folder (with special chars) to display its content, an error like this is shown: "There is no file nor directory at the given path: 'ficher\xef\xbf\xbdn1.txt' at revision 'a96b762f6ed9'"

    If you prefer, as this is a related but different problem, I can start another issue only with that "" and its details.
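The bytes `\xef\xbf\xbd` in that path are the UTF-8 encoding of U+FFFD, the Unicode replacement character, which suggests the original byte was lost in a lossy decode before the lookup. The sketch below reproduces the effect under an assumption: the stored name is taken to be cp1252 with the byte 0xF3 ("ó") at that position, which is only a guess, since the real file name is not shown in this thread.

```python
# Assumption: the on-disk name holds a single cp1252 byte where the
# replacement character now appears; 0xF3 is a guess for illustration.
stored_name = b"ficher\xf3n1.txt"

# Decoding non-UTF-8 bytes as UTF-8 with errors="replace" substitutes U+FFFD
# for the byte that cannot be decoded.
lossy = stored_name.decode("utf-8", errors="replace")

# Encoding the result back to UTF-8 no longer yields the stored name,
# so a lookup by this path cannot match the file.
round_tripped = lossy.encode("utf-8")

print(ascii(lossy))
print(round_tripped)
```

Once the replacement has happened the original byte cannot be recovered, so any fix has to decode the name with the right encoding in the first place.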

  3. Marcin Kuzminski repo owner

    Thanks for the detailed description. I already did some fixes in the vcs library to try to properly read .hgrc files with non-ASCII chars. I'll test whether this helps here. You can also test it by fetching the latest tip of vcs.

    I'll get back to you when I have some details.


  4. ismdiego reporter

    Hi Marcin,

    Great news! Don't worry about taking long to fix this. The software is free, so no complaints :-) I can only feel gratitude for it.

    By the way, I don't know if you also fixed the other problem I found with the "" repository and file names with "special chars". I am talking about the message: "There is no file nor directory at the given path: 'ficher\xef\xbf\xbdn1.txt' at revision 'a96b762f6ed9'"

    I will check tomorrow with the latest 1.2.4 release and tell you if it works.

    Many thanks for your hard work!

  5. ismdiego reporter

    Hi Marcin,

    I have updated my installation to the latest version, 1.3.3, and found some small problems that I have just reported (tested with a fresh installation from scratch).

    This issue is definitely fixed, but the second one I told you about (incorrect file names) is still present. I have described it in more detail (with sample, reproducible data) in issue #398.

    Thanks for your great work!
