translate utf8mb4 charset when detected from server side to client side, for _DecodingRowProxy operations

Issue #2771 resolved
kondi created an issue

this has been edited by mike to show an actual bug

from sqlalchemy import *
e = create_engine("mysql://scott:tiger@localhost/test?charset=utf8mb4&use_unicode=0")
e.connect()
#!

Traceback (most recent call last):
...

    self._detect_ansiquotes(connection)
  File "/Users/classic/dev/sqlalchemy/lib/sqlalchemy/dialects/mysql/base.py", line 2806, in _detect_ansiquotes
    mode = row[1] or ''
  File "/Users/classic/dev/sqlalchemy/lib/sqlalchemy/dialects/mysql/base.py", line 3309, in __getitem__
    return item.decode(self.charset)
LookupError: unknown encoding: utf8mb4

Comments (13)

  1. Mike Bayer repo owner

    without further detail given, this issue would appear to have nothing to do with SQLAlchemy - the "charset" parameter are straight pass-thrus to the database API and the database itself, SQLAlchemy has no awareness of a "LookupError" and by default defers all unicode behaviors to the DBAPI and database itself in the case of MySQL. A stack trace would help to reveal the source of this issue.

    sample test passes without issue:

    from sqlalchemy import *
    e = create_engine("mysql://scott:tiger@localhost/test", echo=True)
    m = MetaData()
    t = Table('t', m, Column('data', String(50)), mysql_engine='InnoDB', mysql_charset='utf8mb4')
    
    m.create_all(e)
    
    e.execute(t.insert(), data="hello world")
    

    output:

    CREATE TABLE t (
        data VARCHAR(50)
    )ENGINE=InnoDB CHARSET=utf8mb4
    
    
    2013-07-03 11:28:18,033 INFO sqlalchemy.engine.base.Engine ()
    2013-07-03 11:28:18,049 INFO sqlalchemy.engine.base.Engine COMMIT
    2013-07-03 11:28:18,049 INFO sqlalchemy.engine.base.Engine INSERT INTO t (data) VALUES (%s)
    2013-07-03 11:28:18,049 INFO sqlalchemy.engine.base.Engine ('hello world',)
    2013-07-03 11:28:18,051 INFO sqlalchemy.engine.base.Engine COMMIT
    
  2. Mike Bayer repo owner

    start of a patch:

    diff --git a/lib/sqlalchemy/dialects/mysql/base.py b/lib/sqlalchemy/dialects/mysql/base.py
    index b8392bd..9bc8a48 100644
    --- a/lib/sqlalchemy/dialects/mysql/base.py
    +++ b/lib/sqlalchemy/dialects/mysql/base.py
    @@ -3296,8 +3296,16 @@ class _DecodingRowProxy(object):
         # sets.Set(['value']) (seriously) but thankfully that doesn't
         # seem to come up in DDL queries.
    
    +    _mysql_compat = {
    +        "utf8mb4": "utf8"
    +        # TODO: other MySQL charsets that need Python
    +        # translation
    +    }
    +
         def __init__(self, rowproxy, charset):
             self.rowproxy = rowproxy
    +        if charset in self._mysql_compat:
    +            charset = self._mysql_compat[charset]
             self.charset = charset
    
         def __getitem__(self, index):
    
  3. Thomas Grainger
    diff --git a/lib/sqlalchemy/dialects/mysql/base.py b/lib/sqlalchemy/dialects/mysql/base.py
    index b8392bd..9bc8a48 100644
    --- a/lib/sqlalchemy/dialects/mysql/base.py
    +++ b/lib/sqlalchemy/dialects/mysql/base.py
    @@ -3296,8 +3296,16 @@ class _DecodingRowProxy(object):
         # sets.Set(['value']) (seriously) but thankfully that doesn't
         # seem to come up in DDL queries.
    
    +    _mysql_compat = {
    +        "utf8mb4": "utf8"
    +        # TODO: other MySQL charsets that need Python
    +        # translation
    +    }
    +
         def __init__(self, rowproxy, charset):
             self.rowproxy = rowproxy
    -         self.charset = charset
    +         self.charset = self._mysql_compat.get(charset, charset)
    
         def __getitem__(self, index):
    
  4. Thomas Grainger

    The encodings supported by MySQL are:

    ["big5", "dec8", "cp850", "hp8", "koi8r", "latin1", "latin2", "swe7", "ascii", "ujis", "sjis", "hebrew", "tis620", "euckr", "koi8u", "gb2312", "greek", "cp1250", "gbk", "latin5", "armscii8", "utf8", "ucs2", "cp866", "keybcs2", "macce", "macroman", "cp852", "latin7", "utf8mb4", "cp1251", "utf16", "cp1256", "cp1257", "utf32", "binary", "geostd8", "cp932", "eucjpms"]
    

    The encodings not supported by my Python are:

    ["dec8", "hp8", "koi8r", "swe7", "koi8u", "armscii8", "ucs2", "keybcs2", "macce", "utf8mb4", "binary", "geostd8", "eucjpms"]
    
  5. Thomas Grainger

    I've googled each possible missing encoding and the python alias and found:

    {
        'koi8r': 'koi8_r',
        'koi8u': 'koi8_u',
        'ucs2': 'utf-16-be',  # can't be used as client connection encoding
        'utf16': 'utf-16-be',  # MySQL's uft16 is always bigendian
        'utf8mb4': 'utf8', # real utf8
        'eucjpms': 'ujis',
    }
    
  6. Mike Bayer repo owner

    OK, so that, a couple of tests, and if we could put authoritative guidance in the mysql dialect docs for utf8mb4, we're in good shape.

  7. Log in to comment