UnicodeDecodeError: 'utf8' codec can't decode byte 0xa3 in position 97: invalid start byte

Issue #137 resolved
Marc Sanfaçon created an issue

I got this while scanning the changelog. Not sure what causes this... We might have accented characters (French) in the log, but I scanned the 20 changesets that are supposed to be displayed and found nothing.

{{{
74 <span class="removed tooltip" tooltip_title="${_('removed')}${h.literal(changed_tooltip(cs.removed))}">${len(cs.removed)}</span>
75 <span class="changed tooltip" tooltip_title="${_('changed')}${h.literal(changed_tooltip(cs.changed))}">${len(cs.changed)}</span>
76 <span class="added tooltip" tooltip_title="${_('added')}${h.literal(changed_tooltip(cs.added))}">${len(cs.added)}</span>
77 </div>
78 %if len(cs.parents)>1:
79 <div class="merge">
80 ${_('merge')}<img alt="merge" src="${h.url("/images/icons/arrow_join.png")}"/>
81 </div>
82 %endif
}}}

{{{
/usr/lib/python2.6/encodings/utf_8.py, line 16: return codecs.utf_8_decode(input, errors, True)
/usr/local/lib/python2.6/dist-packages/MarkupSafe-0.12-py2.6-linux-x86_64.egg/markupsafe/__init__.py, line 71: return unicode.__new__(cls, base)
/usr/local/lib/python2.6/dist-packages/RhodeCode-1.1.6-py2.6.egg/rhodecode/templates/changelog/changelog.html, line 77: </div>
/usr/local/lib/python2.6/dist-packages/RhodeCode-1.1.6-py2.6.egg/rhodecode/templates/base/base.html, line 78: ${next.main()}
/usr/local/lib/python2.6/dist-packages/Mako-0.4.0-py2.6.egg/mako/runtime.py, line 711: callable_(context, *args, **kwargs)
/usr/local/lib/python2.6/dist-packages/Mako-0.4.0-py2.6.egg/mako/runtime.py, line 722: result = template.error_handler(context, error)
/usr/local/lib/python2.6/dist-packages/Mako-0.4.0-py2.6.egg/mako/runtime.py, line 713: _render_error(template, context, e)
/usr/local/lib/python2.6/dist-packages/Mako-0.4.0-py2.6.egg/mako/runtime.py, line 692: _exec_template(inherit, lclcontext, args=args, kwargs=kwargs)
/usr/local/lib/python2.6/dist-packages/Mako-0.4.0-py2.6.egg/mako/runtime.py, line 660: **_kwargs_for_callable(callable_, data))
/usr/local/lib/python2.6/dist-packages/Mako-0.4.0-py2.6.egg/mako/template.py, line 305: as_unicode=True)
/usr/local/lib/python2.6/dist-packages/Pylons-1.0-py2.6.egg/pylons/templating.py, line 240: return literal(template.render_unicode(**globs))
/usr/local/lib/python2.6/dist-packages/Pylons-1.0-py2.6.egg/pylons/templating.py, line 218: return render_func()
/usr/local/lib/python2.6/dist-packages/Pylons-1.0-py2.6.egg/pylons/templating.py, line 243: cache_type=cache_type, cache_expire=cache_expire)
/usr/local/lib/python2.6/dist-packages/RhodeCode-1.1.6-py2.6.egg/rhodecode/controllers/changelog.py, line 79: return render('changelog/changelog.html')
/usr/local/lib/python2.6/dist-packages/Pylons-1.0-py2.6.egg/pylons/controllers/core.py, line 57: return func(**args)
/usr/local/lib/python2.6/dist-packages/Pylons-1.0-py2.6.egg/pylons/controllers/core.py, line 105: result = self._perform_call(func, args)
/usr/local/lib/python2.6/dist-packages/Pylons-1.0-py2.6.egg/pylons/controllers/core.py, line 162: response = self._inspect_call(func)
/usr/local/lib/python2.6/dist-packages/Pylons-1.0-py2.6.egg/pylons/controllers/core.py, line 211: response = self._dispatch_call()
/usr/local/lib/python2.6/dist-packages/RhodeCode-1.1.6-py2.6.egg/rhodecode/lib/base.py, line 50: return WSGIController.__call__(self, environ, start_response)
/usr/local/lib/python2.6/dist-packages/Pylons-1.0-py2.6.egg/pylons/wsgiapp.py, line 312: return controller(environ, start_response)
/usr/local/lib/python2.6/dist-packages/Pylons-1.0-py2.6.egg/pylons/wsgiapp.py, line 107: response = self.dispatch(controller, environ, start_response)
/usr/local/lib/python2.6/dist-packages/Routes-1.12.3-py2.6.egg/routes/middleware.py, line 131: response = self.app(environ, start_response)
/usr/local/lib/python2.6/dist-packages/Beaker-1.5.4-py2.6.egg/beaker/middleware.py, line 152: return self.wrap_app(environ, session_start_response)
/usr/local/lib/python2.6/dist-packages/RhodeCode-1.1.6-py2.6.egg/rhodecode/lib/middleware/simplehg.py, line 67: return self.application(environ, start_response)
/usr/local/lib/python2.6/dist-packages/RhodeCode-1.1.6-py2.6.egg/rhodecode/lib/middleware/simplegit.py, line 101: return self.application(environ, start_response)
/usr/local/lib/python2.6/dist-packages/WebError-0.10.3-py2.6.egg/weberror/evalexception.py, line 431: app_iter = self.application(environ, detect_start_response)
}}}
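
For reference, the error in the title can be reproduced in isolation. The following is a minimal sketch, assuming Python 3 (the traceback above comes from Python 2.6, where the codec is reported as 'utf8'). The file name is made up for illustration and is not taken from the reporter's repository. Byte 0xa3 is a UTF-8 continuation byte, so it cannot begin a character, which is what "invalid start byte" means:

```python
# Hypothetical file name stored as raw bytes in a legacy single-byte
# encoding rather than UTF-8 (an assumption made for this illustration).
raw_name = b"prix_\xa3.txt"


def try_strict_decode(data):
    """Return the decoded text, or a short error description if data is not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.reason and exc.start carry the same details shown in the traceback.
        return "%s at position %d" % (exc.reason, exc.start)


result = try_strict_decode(raw_name)
print(result)
```

The same name encoded as UTF-8 decodes without error, which matches the later advice in this thread that names containing accented characters need to be stored as UTF-8.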

Comments (16)

  1. Marc Sanfaçon reporter

    I found the changeset with the problem. It causes no problem on the command line when running hg log. When I skip over that changeset, everything works fine (I used '1' as the page size to test).

  2. Marc Sanfaçon reporter

    Here is the commit message of the changeset. It does not look like it contains any accented (non-ASCII) characters.

    * Bugs correction,
    + Add home screen for the app instead of loading the web page directly
    + Add the option to delete the interfaces
    * Change option panel
  3. Marcin Kuzminski repo owner

    It's not in the commit message but in the file names: cs.added etc. are the files changed in the revision.

  4. Marcin Kuzminski repo owner

    RhodeCode does not accept file names that are not encoded in UTF-8. Everything has to be unicode in RhodeCode, so file names have to be converted to unicode.

    If code like this fails on the file name: 'filepath'.decode('utf-8', 'replace')

    it will fail in RhodeCode.

    File names, commit messages, and user names in the repository should not contain accented characters; if they must, they should be encoded in UTF-8.

  5. Marc Sanfaçon reporter

    Is there any way to trap this error? As you know, changesets in HG cannot be changed.

  6. Marcin Kuzminski repo owner

    I can propose 3 solutions.

    1. If it's a very recent commit, strip the repo back to the revision that introduced the wrong files. That removes the wrong changesets, but the other devs need to reclone or strip their local copies.

    2. Use hg convert --filemap, see http://mercurial.selenic.com/wiki/ConvertExtension#A--filemap. It can remove or rename files throughout the changeset history, so you could rename the wrong files or remove them from history. It requires the same follow-up actions as solution 1.

    3. Patch RhodeCode to not use unicode. The changes are small; visit me on the #rhodecode IRC channel for a howto.

  7. Marc Sanfaçon reporter

    It works just fine in hgserve & hgweb. I would think that this should work in Rhodecode also, no?

    1- Not really easy/doable, since we would need to do this on all the cloned repos. We're talking about more than 50, including some automated build & test servers.

    2- Same reason as above.

  8. Marcin Kuzminski repo owner

    "It works just fine in hgserve & hgweb. I would think that this should work in Rhodecode also, no?"

    Mercurial and hgweb operate purely on byte strings, while RhodeCode operates on unicode.

    There are a few tricks I could use to make RhodeCode not throw errors in such cases (but the file names would display wrongly).

    Care to send me a small repo with a few changesets containing some files (they can be empty, only the names matter) that have accented characters and cause RhodeCode to crash?

  9. Marc Sanfaçon reporter

    I don't know if that's enough, but I created a zip file containing empty files with the names that cause problems. I also included a bundle of the revision adding those files, but since you don't have the same repo, I don't think it will work.

    Let me know if you need more.


  10. Marcin Kuzminski repo owner

    That's enough, thank you. I'll release a fixed version of RhodeCode that uses a safe_unicode function.

    By the way, 1.2 beta should not be affected by this issue, if you would like to test it while I'm applying the patches to stable.
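
The general idea of decoding file names without raising can be sketched as follows. This is an illustration only, assuming Python 3: `decode_with_fallback` and its default list of encodings are hypothetical, and this is not RhodeCode's actual safe_unicode, whose implementation does not appear in this thread. The helper tries each encoding in turn and, if none fits, substitutes replacement characters instead of failing:

```python
def decode_with_fallback(data, encodings=("utf-8", "latin-1")):
    """Decode bytes by trying each encoding in order; never raises UnicodeDecodeError.

    Hypothetical helper for illustration; the encoding list is an assumption.
    """
    if isinstance(data, str):
        # Already text, nothing to decode.
        return data
    for enc in encodings:
        try:
            return data.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: keep the valid parts and mark undecodable bytes with U+FFFD.
    return data.decode("utf-8", "replace")


# Hypothetical names: one in a legacy single-byte encoding, one in UTF-8.
legacy_name = decode_with_fallback(b"caf\xe9.txt")
utf8_name = decode_with_fallback("caf\u00e9.txt".encode("utf-8"))
unknown_name = decode_with_fallback(b"\xa3.txt", encodings=("ascii",))
```

The trade-off is the one noted in comment 8: the page renders instead of crashing, but a name decoded with the wrong guess, or with replacement characters, may display incorrectly.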

