databases/mysql.py cannot handle unicode COMMENTS of the schema

Issue #372 resolved
Former user created an issue

If a table schema includes a unicode COMMENT text then the mysql.py raise UnicodeEncodeError.

Traceback (most recent call last):
  File "./list_service_machines.py", line 11, in ?
    cstm_table = Table('SRVC', metadata, autoload=True)
  File "/usr/lib/python2.4/site-packages/sqlalchemy/schema.py", line 143, in __call__
    metadata.get_engine().reflecttable(table)
  File "/usr/lib/python2.4/site-packages/sqlalchemy/engine/base.py", line 505, in reflecttable
    self.dialect.reflecttable(conn, table)
  File "/usr/lib/python2.4/site-packages/sqlalchemy/databases/mysql.py", line 364, in reflecttable
    tabletype = self.moretableinfo(connection, table=table)
  File "/usr/lib/python2.4/site-packages/sqlalchemy/databases/mysql.py", line 384, in moretableinfo
    desc_fetched = str(desc_fetched)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 672-675: ordinal not in range(128)

Comments (8)

  1. Former user Account Deleted

    I applied following fix.

    --- sqlalchemy/databases/mysql.py.~1~   2006-11-13 08:45:32.000000000 +0900
    +++ sqlalchemy/databases/mysql.py       2006-11-20 05:54:43.000000000 +0900
    @@ -380,9 +380,10 @@
             desc_fetched = c.fetchone()[1](1)
    
             # this can come back as unicode if use_unicode=1 in the mysql connection
    -        if type(desc_fetched) is unicode:
    -            desc_fetched = str(desc_fetched)
    -        elif type(desc_fetched) is not str:
    +        #if type(desc_fetched) is unicode:
    +        #    desc_fetched = str(desc_fetched)
    +        #elif type(desc_fetched) is not str:
    +        if type(desc_fetched) not in (str, unicode):
                 # may get array.array object here, depending on version (such as mysql 4.1.14 vs. 4.1.11)
                 desc_fetched = desc_fetched.tostring()
             desc = desc_fetched.strip()
    
  2. Mike Bayer repo owner

    hi -

    that patch you have will break if unicode data is returned from the server, since youve removed the conversion from unicode to string. The comment right above it illustrates why this conversion is necessary.

    It would be helpful if you could attach the DDL for the table you're trying to reflect, and the fix would probably involve converting to string with encoding_errors set to IGNORE or something similar.

  3. Former user Account Deleted

    Okey,

    follwing is a my sample DDL.

    • SRVCTEND field have a COMMENT which is written is Japanese.

      CREATE TABLE SRVC ( SRVCSRVC int(8) unsigned zerofill NOT NULL default '00000000', SRVCTREG datetime default NULL, SRVCTSIN datetime default NULL, SRVCTEND date default NULL COMMENT '利用期限', SRVCSTMP timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP, PRIMARY KEY (SRVCSRVC) ) ENGINE=MyISAM DEFAULT CHARSET=ujis;

    I've tested with following script.

    import sys
    from sqlalchemy import *
    db = create_engine('mysql://user:pass@localhost/db?charset=utf8&use_unicode=1')
    db.echo = True
    
    metadata = BoundMetaData(db)
    session = create_session(bind_to=db)
    
    srvc = Table('SRVC', metadata, autoload=True)
    
    
    
    
    $ python sqlalchemy_test.py
    2006-12-15 09:44:35,632 INFO sqlalchemy.engine.base.Engine.0x..d0 show variables like 'lower_case_table_names'
    2006-12-15 09:44:35,633 INFO sqlalchemy.engine.base.Engine.0x..d0 None
    2006-12-15 09:44:35,642 INFO sqlalchemy.engine.base.Engine.0x..d0 describe SRVC
    2006-12-15 09:44:35,642 INFO sqlalchemy.engine.base.Engine.0x..d0 {}
    2006-12-15 09:44:35,660 INFO sqlalchemy.engine.base.Engine.0x..d0 SHOW CREATE TABLE SRVC
    2006-12-15 09:44:35,660 INFO sqlalchemy.engine.base.Engine.0x..d0 {}
    Traceback (most recent call last):
      File "sqlalchemy_test.py", line 11, in ?
        srvc = Table('SRVC', metadata, autoload=True)
      File "/usr/lib/python2.4/site-packages/sqlalchemy/schema.py", line 143, in __call__
        metadata.get_engine().reflecttable(table)
      File "/usr/lib/python2.4/site-packages/sqlalchemy/engine/base.py", line 505, in reflecttable
        self.dialect.reflecttable(conn, table)
      File "/usr/lib/python2.4/site-packages/sqlalchemy/databases/mysql.py", line 364, in reflecttable
        tabletype = self.moretableinfo(connection, table=table)
      File "/usr/lib/python2.4/site-packages/sqlalchemy/databases/mysql.py", line 384, in moretableinfo
        desc_fetched = str(desc_fetched)
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 672-675: ordinal not in range(128)
    $
    

    I think, to avoid this problem, COMMENT attribute should be removed from desc_fetched before converting to str.

    --- /usr/lib/python2.4/site-packages/sqlalchemy/databases/mysql.py.~1~  2006-11-13 08:45:32.000000000 +0900
    +++ /usr/lib/python2.4/site-packages/sqlalchemy/databases/mysql.py      2006-12-15 10:10:44.000000000 +0900
    @@ -381,7 +381,7 @@
    
             # this can come back as unicode if use_unicode=1 in the mysql connection
             if type(desc_fetched) is unicode:
    -            desc_fetched = str(desc_fetched)
    +            desc_fetched = str(re.sub(r"COMMENT '.*?'", "", desc_fetched))
             elif type(desc_fetched) is not str:
                 # may get array.array object here, depending on version (such as mysql 4.1.14 vs. 4.1.11)
                 desc_fetched = desc_fetched.tostring()
    
  4. Log in to comment