UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 1: ordinal not in range(128)

Issue #147 invalid
Udo Spallek created an issue

Related Issue #9

When I try to create a Kallithea fork of the repository https://code.google.com/p/hgnested/, an error is raised. The hg log contains an author named "Cédric", whose name includes the character u'\xe9'.
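
The error message can be reproduced outside Kallithea. The following is a minimal sketch, written for Python 3 rather than the Python 2.7 shown in the traceback; the helper name `first_non_ascii` is hypothetical. It shows that encoding the author name with the ASCII codec fails at position 1, the position the error message reports.

```python
def first_non_ascii(text):
    """Return (position, character) of the first character that the
    ASCII codec cannot encode, or None if the text is pure ASCII."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as err:
        # err.start is the index of the first offending character.
        return err.start, text[err.start]
    return None


author = "C\xe9dric"  # "Cédric"
print(first_non_ascii(author))
```

Any layer that falls back to the ASCII codec for such a string will fail in the same way, so the question for the rest of the thread is which layer does that.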

TIA and best regards Udo

File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/weberror/errormiddleware.py', line 162 in __call__
  app_iter = self.application(environ, sr_checker)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/beaker/middleware.py', line 155 in __call__
  return self.wrap_app(environ, session_start_response)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/routes/middleware.py', line 131 in __call__
  response = self.app(environ, start_response)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/pylons/wsgiapp.py', line 107 in __call__
  response = self.dispatch(controller, environ, start_response)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/pylons/wsgiapp.py', line 312 in dispatch
  return controller(environ, start_response)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/kallithea/lib/base.py', line 383 in __call__
  return WSGIController.__call__(self, environ, start_response)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/pylons/controllers/core.py', line 211 in __call__
  response = self._dispatch_call()
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/pylons/controllers/core.py', line 162 in _dispatch_call
  response = self._inspect_call(func)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/pylons/controllers/core.py', line 105 in _inspect_call
  result = self._perform_call(func, args)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/pylons/controllers/core.py', line 57 in _perform_call
  return func(**args)
File '<string>', line 2 in index
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/kallithea/lib/auth.py', line 782 in __wrapper
  return func(*fargs, **fkwargs)
File '<string>', line 2 in index
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/kallithea/lib/auth.py', line 841 in __wrapper
  return func(*fargs, **fkwargs)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/kallithea/controllers/summary.py', line 180 in index
  return render('summary/summary.html')
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/pylons/templating.py', line 243 in render_mako
  cache_type=cache_type, cache_expire=cache_expire)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/pylons/templating.py', line 218 in cached_template
  return render_func()
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/pylons/templating.py', line 240 in render_template
  return literal(template.render_unicode(**globs))
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/mako/template.py', line 452 in render_unicode
  as_unicode=True)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/mako/runtime.py', line 803 in _render
  **_kwargs_for_callable(callable_, data))
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/mako/runtime.py', line 835 in _render_context
  _exec_template(inherit, lclcontext, args=args, kwargs=kwargs)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/mako/runtime.py', line 855 in _exec_template
  _render_error(template, context, compat.exception_as())
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/mako/runtime.py', line 864 in _render_error
  result = template.error_handler(context, error)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/mako/runtime.py', line 853 in _exec_template
  callable_(context, *args, **kwargs)
File '/var/local/kallithea/data/templates/base/root.html.py', line 209 in render_body
  __M_writer(escape(next.body()))
File '/var/local/kallithea/data/templates/base/base.html.py', line 42 in render_body
  __M_writer(escape(next.main()))
File '/var/local/kallithea/data/templates/summary/summary.html.py', line 241 in render_main
  runtime._include_file(context, u'../changelog/changelog_summary_data.html', _template_uri)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/mako/runtime.py', line 730 in _include_file
  callable_(ctx, **_kwargs_for_include(callable_, context._data, **kwargs))
File '/var/local/kallithea/data/templates/changelog/changelog_summary_data.html.py', line 79 in render_body
  __M_writer(escape(h.person(cs.author)))
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/kallithea/lib/helpers.py', line 518 in person
  user = user_or_none(author)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/kallithea/lib/helpers.py', line 487 in user_or_none
  user = User.get_by_username(author_name(author), case_insensitive=True, cache=True)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/kallithea/model/db.py', line 541 in get_by_username
  return q.scalar()
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py', line 2215 in scalar
  ret = self.one()
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py', line 2184 in one
  ret = list(self)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/kallithea/lib/caching_query.py', line 80 in __iter__
  return self.get_value(createfunc=lambda:
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/kallithea/lib/caching_query.py', line 99 in get_value
  ret = cache.get_value(cache_key, createfunc=createfunc)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/beaker/cache.py', line 305 in get
  return self._get_value(key, **kw).get_value()
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/beaker/container.py', line 385 in get_value
  v = self.createfunc()
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/kallithea/lib/caching_query.py', line 81 in <lambda>
  list(Query.__iter__(self)))
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py', line 2227 in __iter__
  return self._execute_and_instances(context)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py', line 2242 in _execute_and_instances
  result = conn.execute(querycontext.statement, self._params)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py', line 1449 in execute
  params)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py', line 1584 in _execute_clauseelement
  compiled_sql, distilled_params
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py', line 1691 in _execute_context
  context)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py', line 331 in do_execute
  cursor.execute(statement, parameters)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 1: ordinal not in range(128)

Comments (13)

  1. Thomas De Schampheleire

    Which database are you using? I have seen similar backtraces when adding data with Unicode characters (such as review comments or pull request descriptions) to a PostgreSQL database. After a long investigation, it turned out that the database used the SQL_ASCII encoding rather than UTF8. Recreating the database in UTF8 solved the problem for me.

    To detect if you are in that situation, run 'psql -l' and check the Encoding column.

    In my case, the default UTF8 encoding was not chosen during database creation (initdb) because one of the LC_* environment variables was set to C. Unsetting that variable (LC_CTYPE in my case), leaving only LANG and LC_ALL (set to a UTF-8 compatible value), before creating the database again solved it.

    It is apparently not possible to convert a live database, but the migration was painless. Essentially, I took a pg_dumpall and a 'pg_dump kallithea', and manually fixed the SQL_ASCII references in these files. Then, in the new database, I recreated the kallithea user with the right permissions and imported the data from the pg_dump file (kallithea db only), piping the dump in on stdin.
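
    The manual fix of the SQL_ASCII references could be scripted roughly as follows. This is only a sketch: the function name `fix_dump_encoding` and the sample table `notes` are hypothetical, and it assumes the data in the dump is already valid UTF-8, since relabeling the encoding does not convert any bytes.

```python
import re

# Matches the encoding name only where it follows an "encoding =" assignment,
# e.g. "SET client_encoding = 'SQL_ASCII';" or "... ENCODING = 'SQL_ASCII' ...".
_ENCODING_ASSIGNMENT = re.compile(r"(encoding\s*=\s*)'SQL_ASCII'", re.IGNORECASE)


def fix_dump_encoding(dump_text, new_encoding="UTF8"):
    """Return dump_text with SQL_ASCII encoding assignments rewritten."""
    return _ENCODING_ASSIGNMENT.sub(
        lambda m: m.group(1) + "'" + new_encoding + "'", dump_text)


sample = (
    "SET client_encoding = 'SQL_ASCII';\n"
    "INSERT INTO notes VALUES ('SQL_ASCII is mentioned here');\n"
)
print(fix_dump_encoding(sample))
```

    Restricting the replacement to assignments leaves row data that merely mentions SQL_ASCII untouched.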

  2. Udo Spallek reporter

    Thanks for the pointer. You are right, we use Postgres as our database. Unfortunately, it already uses UTF-8 as its encoding:

       Name    |   Owner   | Encoding |   Collate   |    Ctype    | Access privileges
    -----------+-----------+----------+-------------+-------------+-----------------------
     kallithea | kallithea | UTF8     | de_DE.UTF-8 | de_DE.UTF-8 |

    I will try to examine the dump.

  3. Udo Spallek reporter

    In the dump, I found SQL_ASCII only in the header, as SET client_encoding = 'SQL_ASCII'; in the database itself, however, the settings look sane:

    kallithea=# show client_encoding ;
     client_encoding 
    -----------------
     UTF8
    
    kallithea=# show server_encoding ;
     server_encoding 
    -----------------
     UTF8
    

    Any idea what I can do next? TIA Udo

  4. Mads Kiilerich

    Something in your stack must be forcing everything to be encoded as plain ascii.

    Try to reproduce it while running with paster serve, and try to reproduce it with a simple SQLite database ... just to figure out what makes the problem occur.

  5. Udo Spallek reporter

    We use kallithea 0.2.1.

    My locale settings are:

    LANG=de_DE.UTF-8
    LANGUAGE=
    LC_CTYPE="de_DE.UTF-8"
    LC_NUMERIC="de_DE.UTF-8"
    LC_TIME="de_DE.UTF-8"
    LC_COLLATE="de_DE.UTF-8"
    LC_MONETARY="de_DE.UTF-8"
    LC_MESSAGES="de_DE.UTF-8"
    LC_PAPER="de_DE.UTF-8"
    LC_NAME="de_DE.UTF-8"
    LC_ADDRESS="de_DE.UTF-8"
    LC_TELEPHONE="de_DE.UTF-8"
    LC_MEASUREMENT="de_DE.UTF-8"
    LC_IDENTIFICATION="de_DE.UTF-8"
    LC_ALL=

  6. Mads Kiilerich

    Please consider contributing documentation improvements that can help others to avoid this problem.

  7. Thomas De Schampheleire

    @udono Good that you found a solution. I still wonder, though, what was causing this problem. Would you mind testing the following cases (with the change in postgresql.conf undone):

    - LANG=de_DE.UTF-8 + LC_ALL=de_DE.UTF-8 (all other LC_* variables unset)
    - LANG=en_US.utf8 (all LC variables unset; this is the situation I have and I know works)
    - LC_ALL=en_US.UTF-8 + LANG=en_US.utf8

    Obviously you will have to restart Kallithea between each test.

    This analysis can serve as input into improving the documentation. Thanks a lot in advance...
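
    When comparing such cases, it helps to know which variable actually decides a locale category. The sketch below follows the POSIX precedence (LC_ALL first, then the category's own variable, then LANG, with empty values treated as unset). The helper name `effective_locale` is hypothetical, and the "C" fallback is an assumption about typical systems.

```python
import os


def effective_locale(category="LC_CTYPE", environ=None):
    """Resolve a locale category: LC_ALL wins, then the category's own
    variable, then LANG. Empty values count as unset."""
    env = os.environ if environ is None else environ
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name, "")
        if value:
            return value
    return "C"  # assumed fallback when nothing is set


cases = {
    "case 1": {"LANG": "de_DE.UTF-8", "LC_ALL": "de_DE.UTF-8"},
    "case 2": {"LANG": "en_US.utf8"},
    "case 3": {"LC_ALL": "en_US.UTF-8", "LANG": "en_US.utf8"},
}
for label, env in cases.items():
    print(label, effective_locale(environ=env))
```

    All three cases resolve to a UTF-8 locale for LC_CTYPE, so any difference in behaviour between them would have to come from somewhere other than the locale variables.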

  8. Udo Spallek reporter

    > I still wonder though what was causing this problem.

    IMHO Kallithea has UTF-8 problems when client_encoding is set to SQL_ASCII in postgresql.conf.

    A solution could be to enforce the client encoding[1] in Kallithea/SQLAlchemy.

    Workarounds are (a) to remove the SQL_ASCII setting or (b) to set client_encoding to utf8 in postgresql.conf.

    [1] http://initd.org/psycopg/docs/connection.html#connection.set_client_encoding
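
    One way to enforce the client encoding from the application side is to request it in the database URL, since libpq accepts client_encoding as a connection parameter. The sketch below only manipulates the URL string; the helper name `with_client_encoding` and the credentials are hypothetical, and whether Kallithea 0.2.1 with its bundled SQLAlchemy passes the parameter through has not been verified here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_client_encoding(db_url, encoding="utf8"):
    """Return db_url with a client_encoding query parameter set,
    replacing any client_encoding value that is already present."""
    parts = urlsplit(db_url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "client_encoding"]
    query.append(("client_encoding", encoding))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_client_encoding("postgresql://kallithea:secret@localhost/kallithea"))
```

    Setting the encoding per connection keeps the fix local to the application and leaves the server-wide default in postgresql.conf alone.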

    =============
    All Scenarios
    =============
    
    dispatch.wsgi
    =============
    
    Reset general locale setup:
    
    import os
    
    # Clear every locale variable so each scenario starts from a clean state.
    os.environ['LANG'] = ''
    os.environ['LC_CTYPE'] = ''
    os.environ['LC_NUMERIC'] = ''
    os.environ['LC_TIME'] = ''
    os.environ['LC_COLLATE'] = ''
    os.environ['LC_MONETARY'] = ''
    os.environ['LC_MESSAGES'] = ''
    os.environ['LC_PAPER'] = ''
    os.environ['LC_NAME'] = ''
    os.environ['LC_ADDRESS'] = ''
    os.environ['LC_TELEPHONE'] = ''
    os.environ['LC_MEASUREMENT'] = ''
    os.environ['LC_IDENTIFICATION'] = ''
    os.environ['LC_ALL'] = ''
    
    
    Scenario A
    ==========
    postgresql.conf
    ---------------
    Change from:
        client_encoding = utf8
    to:
        client_encoding = SQL_ASCII
    
    
    Scenario A1
    -----------
    LANG=de_DE.UTF-8 + LC_ALL=de_DE.UTF-8 (all other LC_* variables unset)
    os.environ['LANG'] = 'de_DE.UTF-8'
    os.environ['LC_ALL'] = 'de_DE.UTF-8'
    ::
        $ systemctl restart uwsgi-emperor
    
    Same Error
    
    
    Scenario A2
    -----------
    LANG=en_US.utf8 (all LC variables unset; this is the situation I have and I
    know works).
    os.environ['LANG'] = 'en_US.UTF-8'
    ::
        $ systemctl restart uwsgi-emperor
    
    Same Error
    
    
    Scenario A3
    -----------
    LC_ALL=en_US.UTF-8 + LANG=en_US.utf8
    os.environ['LANG'] = 'en_US.UTF-8'
    os.environ['LC_ALL'] = 'en_US.UTF-8'
    ::
        $ systemctl restart uwsgi-emperor
    
    Same Error
    
    
    Scenario B
    ==========
    postgresql.conf
    ---------------
    Remove option:
        client_encoding = utf8
    
    and use Postgres default.
    
    
    Scenario B1
    -----------
    LANG=de_DE.UTF-8 + LC_ALL=de_DE.UTF-8 (all other LC_* variables unset)
    os.environ['LANG'] = 'de_DE.UTF-8'
    os.environ['LC_ALL'] = 'de_DE.UTF-8'
    ::
        $ systemctl restart uwsgi-emperor
    
    Works Perfect
    
    
    Scenario B2
    -----------
    LANG=en_US.utf8 (all LC variables unset; this is the situation I have and I
    know works).
    os.environ['LANG'] = 'en_US.UTF-8'
    ::
        $ systemctl restart uwsgi-emperor
    
    Works perfect
    
    
    Scenario B3
    -----------
    LC_ALL=en_US.UTF-8 + LANG=en_US.utf8
    os.environ['LANG'] = 'en_US.UTF-8'
    os.environ['LC_ALL'] = 'en_US.UTF-8'
    ::
        $ systemctl restart uwsgi-emperor
    
    Works perfect
    
    
    Scenario C
    ==========
    postgresql.conf
    ---------------
    Set:
        client_encoding = utf8
    
    
    Scenario C1
    -----------
    LANG=de_DE.UTF-8 + LC_ALL=de_DE.UTF-8 (all other LC_* variables unset)
    os.environ['LANG'] = 'de_DE.UTF-8'
    os.environ['LC_ALL'] = 'de_DE.UTF-8'
    ::
        $ systemctl restart uwsgi-emperor
    
    Works Perfect
    
    
    Scenario C2
    -----------
    LANG=en_US.utf8 (all LC variables unset; this is the situation I have and I
    know works).
    os.environ['LANG'] = 'en_US.UTF-8'
    ::
        $ systemctl restart uwsgi-emperor
    
    Works perfect
    
    
    Scenario C3
    -----------
    LC_ALL=en_US.UTF-8 + LANG=en_US.utf8
    os.environ['LANG'] = 'en_US.UTF-8'
    os.environ['LC_ALL'] = 'en_US.UTF-8'
    ::
        $ systemctl restart uwsgi-emperor
    
    Works perfect
    
  9. Mads Kiilerich

    ASCII means no unicode. Why would anybody want to configure that if they want unicode? And if they do, is that Kallithea's problem?

    If some system vendor defaults to that, their users should file a bug and ask them to default to unicode or make it very clear how to change it. As a last resort, the Kallithea documentation could explain how to work around it.

    How did it end up that way in your case?

    I can see you use uwsgi. I haven't tried it but it seems appealing. It would be nice if we could get some documentation of how to configure that. ;-)

  10. Udo Spallek reporter

    > ASCII means no unicode. Why would anybody want to configure that if they want unicode? And if they do, is that Kallithea's problem?

    I investigated a little further and found a nice solution: Kallithea just needs to be started with the environment variable PGCLIENTENCODING='UTF8'. I tested this with client_encoding set to SQL_ASCII in postgresql.conf and it works perfectly.

    Additionally, to my shame, it is already documented in Kallithea: http://docs.kallithea-scm.org/en/latest/setup.html#apache-s-wsgi-config

    > If some system vendor defaults to that, their users should file a bug and ask them to default to unicode or make it very clear how to change it. As a last resort, the Kallithea documentation could explain how to work around it. How did it end up that way in your case?

    It came from our DebOps setup: https://github.com/debops/ansible-postgresql/search?utf8=%E2%9C%93&q=client_encoding

    > I can see you use uwsgi. I haven't tried it but it seems appealing. It would be nice if we could get some documentation of how to configure that. ;-)

    You can find the configuration in the link above. I do not know more than that.
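
    The PGCLIENTENCODING workaround could be applied in the WSGI entry script along these lines. This is a sketch under assumptions: the helper name `force_pg_client_encoding` is hypothetical, and it only has an effect if it runs before the first database connection is opened.

```python
import os


def force_pg_client_encoding(encoding="UTF8", environ=None):
    """Set PGCLIENTENCODING so libpq-based drivers default to `encoding`.

    Returns the previous value (or None) so a caller can log or restore it.
    """
    env = os.environ if environ is None else environ
    previous = env.get("PGCLIENTENCODING")
    env["PGCLIENTENCODING"] = encoding
    return previous


# Near the top of the WSGI entry script, before the application is loaded:
force_pg_client_encoding()
print(os.environ["PGCLIENTENCODING"])
```

    Because the variable is read by the PostgreSQL client library, it overrides the server-side default for this process only.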

  11. Mads Kiilerich

    Thanks for following up.

    But if the admin explicitly set ASCII in postgresql.conf, should Kallithea really try to overrule it?

    I don't see exactly your problem documented in setup.html ... but the documentation is a bit unclear and can be read in many ways ;-)

  12. Thomas De Schampheleire

    I also wonder why you had SQL_ASCII in postgresql.conf; that is not standard.

    For the documentation, I think we can list the following attention points:

    - Is the database itself in UTF8?
    - Is there no explicit override in postgresql.conf?
    - Is there nothing in the environment when starting Kallithea that disables UTF8?
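
    These attention points could be turned into a small self-check, sketched below. All names are hypothetical, the caller is assumed to supply the database encoding (for example from 'psql -l') and the text of postgresql.conf, and the environment check covers only PGCLIENTENCODING, not the locale variables.

```python
import re

# An active (uncommented) client_encoding line; the "=" is optional in
# postgresql.conf and the value may be single-quoted.
_SETTING = re.compile(
    r"^\s*client_encoding(?:\s*=\s*|\s+)'?([A-Za-z0-9_-]+)'?", re.IGNORECASE)


def configured_client_encoding(conf_text):
    """Return the last active client_encoding value in conf_text, or None."""
    value = None
    for line in conf_text.splitlines():
        active = line.split("#", 1)[0]  # drop comments
        match = _SETTING.match(active)
        if match:
            value = match.group(1)  # later entries override earlier ones
    return value


def _is_utf8(name):
    return name.upper().replace("-", "") == "UTF8"


def encoding_warnings(database_encoding, conf_text, environ):
    """Return a list of human-readable warnings, empty if all checks pass."""
    warnings = []
    if not _is_utf8(database_encoding):
        warnings.append("database encoding is %s" % database_encoding)
    conf_value = configured_client_encoding(conf_text)
    if conf_value and not _is_utf8(conf_value):
        warnings.append("postgresql.conf sets client_encoding = %s" % conf_value)
    env_value = environ.get("PGCLIENTENCODING")
    if env_value and not _is_utf8(env_value):
        warnings.append("PGCLIENTENCODING is %s" % env_value)
    return warnings


conf = "#client_encoding = utf8\nclient_encoding = SQL_ASCII  # set by provisioning\n"
print(encoding_warnings("UTF8", conf, {}))
```

    With the reporter's setup (UTF8 database, SQL_ASCII override in postgresql.conf), only the second check fires, which matches the diagnosis reached in this thread.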
