Patch to fix an int to str parse error and an encoding fatal error

Issue #8 on hold
Michael De Wildt
created an issue

Issue 1: When a repoid was set in the hg config file, the extension cast the value to int, which caused a Python type error when the value was concatenated into the string HTTP request.

Issue 2: In some circumstances we were getting encoding errors when the HTTP request was being formed, so I added a line to ignore any offending non-ASCII chars when the form data is created.

Please see the attached patch.
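
The type error described in Issue 1 can be illustrated with a minimal sketch. The function name `build_query` and the `repository=` parameter are hypothetical and are not taken from the extension or the attached patch; the point is only that an int has to be converted to a string before it is concatenated into request text.

```python
def build_query(repo_id):
    # Hypothetical helper: repo_id may be an int if the config value was cast.
    # str() makes the concatenation safe for both int and str inputs.
    return "repository=" + str(repo_id)


# Concatenating the int directly raises TypeError, which is the failure
# described in Issue 1.
try:
    "repository=" + 42
except TypeError as exc:
    error = exc

query = build_query(42)
```

Keeping the value as a string from the moment it is read from the config would avoid the conversion altogether.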

Comments (9)

  1. mdelagra repo owner

    I had to remove the call to encode form data as unicode. A user reported to me that it can fail, because sometimes a unicode string will get passed into the method and "unicode(content, errors='ignore')" will fail if the content variable is already unicode.

    Unfortunately I could reproduce neither his nor your error, but in my exploratory testing I found that this change was stripping out legitimately encoded special characters that were causing no issues for reviewboard. The change and test are here:

    https://bitbucket.org/mdelagra/mercurial-reviewboard/changeset/0603a67f16a1

    Can you show me how to reproduce the issue? If so, maybe I can find a fix that is more friendly to special characters.

  2. Michael De Wildt reporter

    The issue presents itself in Python 2.7's httplib.py file on line 801.

    It appears that we are trying to mix unicode and non-unicode characters. The problem character is the control character '0xc2'.

    I can reproduce the problem with the following code:

    msg = unicode('Hello World')
    message_body = '\xc2'
    msg += message_body
    

    The final line yields the following traceback:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

    You can add 'ascii' or 'utf-8' as the second parameter for the unicode function and it still yields the same error.

    Maybe we can just strip out the problem char as it is only whitespace? Thoughts?

  3. mdelagra repo owner

    You've lost me. \xc2 is a capital a with a circumflex, right? That's being inserted by your application somehow? I'm still not clear on what specific circumstances lead to this issue.

  4. Michael De Wildt reporter

    I cannot get to the bottom of it either because it does not happen on my systems (Windows and Ubuntu 10.4).

    It does happen on two of my colleagues' machines, both of which run Fedora with Python 2.7.

    Also, \xc2 is not the only problem character; if you remove it, others will cause the same error.

    I am basically posting as much info up here as I can, just in case someone else has the problem and can work out what is going on.

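
The mix of unicode and byte strings discussed in the comments can be sketched as follows. This is a hedged illustration, not the fix adopted in the repository: the helper name `to_text` is hypothetical, and the assumption that the offending bytes are UTF-8 is not confirmed anywhere in the thread. The helper decodes only when it is handed a byte string, so a value that is already unicode passes through untouched, and undecodable bytes are replaced rather than raising.

```python
def to_text(content, encoding="utf-8"):
    """Return content as text, decoding only when it is a byte string.

    Hypothetical helper; the UTF-8 default is an assumption.
    """
    if isinstance(content, bytes):
        # "replace" substitutes U+FFFD for bytes that cannot be decoded,
        # instead of raising UnicodeDecodeError.
        return content.decode(encoding, "replace")
    # Already text: decoding it again is what would fail.
    return content


# Text plus a byte string, normalised before concatenation.
msg = to_text(u"Hello World") + to_text(b"\xc2\xa0")

# A lone 0xc2 byte is an incomplete UTF-8 sequence and becomes U+FFFD.
lone = to_text(b"\xc2")
```

On Python 2.7 `bytes` is an alias for `str`, so the same check applies there. In UTF-8 the two bytes `\xc2\xa0` encode a no-break space, which may be why the character looked like whitespace, although the thread does not establish that this is what was happening.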