Error when there are accented chars in the fieldnames using mssql

Issue #1264 resolved

Former user created an issue 2008-12-26

Hi,

Another problem appeared working against the Microsoft Dynamics NAV database (MSSQL). The error raises whenever a field with special chars like accented vocals (typical in spanish) are involved. I suppose that sqlalchemy thinks that the fieldnames are encoded in utf8, but it looks as if they are encoded in unicode. The collation property of the database is Modern_Spanish_CI_AS. When I have used ADO I must write the fieldname encoded in unicode.

C:\Python26\lib\site-packages\sqlalchemy-0.5.0rc5dev-py2.6.egg\sqlalchemy\databa
ses\mssql.py:878: DeprecationWarning: object.__new__() takes no parameters
  return super(MSSQLDialect, cls).__new__(cls, *args, **kwargs)
Traceback (most recent call last):
  File "exportOrdersFromNAV2.py", line 328, in <module>
    ).filter(PurchaseHeader.documentType==1).one()
  File "c:\python26\lib\site-packages\SQLAlchemy-0.5.0rc5dev-py2.6.egg\sqlalchem
y\orm\query.py", line 1061, in one
    ret = list(self[0:2](0:2))
  File "c:\python26\lib\site-packages\SQLAlchemy-0.5.0rc5dev-py2.6.egg\sqlalchem
y\orm\query.py", line 961, in __getitem__
    return list(res)
  File "c:\python26\lib\site-packages\SQLAlchemy-0.5.0rc5dev-py2.6.egg\sqlalchem
y\orm\query.py", line 1101, in __iter__
    return self._execute_and_instances(context)
  File "c:\python26\lib\site-packages\SQLAlchemy-0.5.0rc5dev-py2.6.egg\sqlalchem
y\orm\query.py", line 1104, in _execute_and_instances
    result = self.session.execute(querycontext.statement, params=self._params, m
apper=self._mapper_zero_or_none(), _state=self._refresh_state)
  File "c:\python26\lib\site-packages\SQLAlchemy-0.5.0rc5dev-py2.6.egg\sqlalchem
y\orm\session.py", line 753, in execute
    clause, params or {})
  File "c:\python26\lib\site-packages\SQLAlchemy-0.5.0rc5dev-py2.6.egg\sqlalchem
y\engine\base.py", line 819, in execute
    return Connection.executors[c](c)(self, object, multiparams, params)
  File "c:\python26\lib\site-packages\SQLAlchemy-0.5.0rc5dev-py2.6.egg\sqlalchem
y\engine\base.py", line 869, in _execute_clauseelement
    return self.__execute_context(context)
  File "c:\python26\lib\site-packages\SQLAlchemy-0.5.0rc5dev-py2.6.egg\sqlalchem
y\engine\base.py", line 896, in __execute_context
    return context.get_result_proxy()
  File "c:\python26\lib\site-packages\SQLAlchemy-0.5.0rc5dev-py2.6.egg\sqlalchem
y\engine\default.py", line 266, in get_result_proxy
    return base.ResultProxy(self)
  File "c:\python26\lib\site-packages\SQLAlchemy-0.5.0rc5dev-py2.6.egg\sqlalchem
y\engine\base.py", line 1410, in __init__
    self._init_metadata()
  File "c:\python26\lib\site-packages\SQLAlchemy-0.5.0rc5dev-py2.6.egg\sqlalchem
y\engine\base.py", line 1444, in _init_metadata
    colname = item[0](0).decode(self.dialect.encoding)
  File "C:\Python26\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 29-31: invalid d
ata

The value of item in line 1444 in sqlalchemy/engine/base.py is:

('Bizak$Purchase Header_Tipolog\xeda de Abono', <type 'int'>, None, 10, None, None, False)

The table Header has a field called de Abono, note the accented 'i' in the first word. And self.dialect.encoding is:

'utf8'

I hope all this can help. I used the last mercurial trunk to test this.

Thank you.

1264.diff

Comments (14)

Mike Bayer repo owner
- changed milestone to 0.5.xx
Since you haven't posted a full round trip test case, I don't know if your issue is related to reflection, table configuration, encoding configuration, specific DBAPI in use (not specified here), or other issues.

SQLA supports unicode strings as identifier names. The proper approach for most DBAPIs is to encode the unicode string into the specified encoding before sending. This is heavily dependent on what the DBAPI allows. pymssql in particular has absolute zero support for unicode identifiers or data.

I'm not familiar with the level of our support for MSSQL overall although the unicode schema tests are currently enabled for this platform which would suggest they are working.

In any case, if you are building the Table manually, you must use a Python unicode object for each multibyte identifier name, not an encoded string.

If the issue is dependent on reflection, it may be that the MSSQL dialect is not decoding incoming schema identifiers to unicode objects. If this is the case, I'd suggest not using reflection for this particular table as a workaround, and this ticket can be made specific to reflection.
- 2008-12-26T12:07:06+00:00
Mike Bayer repo owner
- changed component to mssql
- assigned issue to
- 2008-12-26T12:25:18+00:00
Michael Trier
The unicode schema tests, although enabled, all fail on MSSQL with pyodbc.
- 2008-12-26T15:09:08+00:00
Mike Bayer repo owner
OK so, someone needs to provide for us a working round trip using pyodbc alone that properly understands multibyte characters. Once we see that its possible with pyodbc, an approach can be built for SQLAlchemy.
- 2008-12-26T20:13:22+00:00

Mike Bayer repo owner

hey michael -

what happens if you apply this patch ?

Index: lib/sqlalchemy/databases/mssql.py
===================================================================
--- lib/sqlalchemy/databases/mssql.py   (revision 5532)
+++ lib/sqlalchemy/databases/mssql.py   (working copy)
@@ -1199,7 +1199,7 @@
     supports_sane_multi_rowcount = False
     # PyODBC unicode is broken on UCS-4 builds
     supports_unicode = sys.maxunicode == 65535
-    supports_unicode_statements = supports_unicode
+    supports_unicode_statements = False
     execution_ctx_cls = MSSQLExecutionContext_pyodbc

     def __init__(self, **params):

All this does is cause SQLA to encode the string into the configured encoding before passing the statement along.

2008-12-27T00:21:42+00:00

Michael Trier
zzzeek: with that change the table names get created all messed up. For instance the table name Unitéble2 gets created as UnitÃ©ble2. This causes the INSERTS and so forth to be unable to locate the table. It just seems like we have a mismatch somewhere.

I've been working with the UnicodeSchemaTest trying to pin down the problem. Notice the issue here when I run that test, the acute in u'méil' gets stored as \xe9 (which is standard), but in the order by it's showing up as \x82. Ideas?
```
(Pdb) print tt1.select(order_by=desc(u'méil'))
SELECT unitable1.[m\xe9il](m\xe9il), unitable1.[\u6e2c\u8a66](\u6e2c\u8a66)
FROM unitable1 ORDER BY m\x82il DESC
```
- 2008-12-27T11:11:46+00:00

Michael Trier

It turns out with the xe9 / x82 issue it was that the order by was actually using a different character. So the information was correct. It doesn't solve the problem though, so that wasn't related to the issue. Here's a stack trace if it's useful at all:

======================================================================
ERROR: test_reflect (__main__.UnicodeSchemaTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test\sql\unicode.py", line 103, in test_reflect
    self.assert_(tt1.select(order_by=desc(u'm├⌐il')).execute().fetchall() ==
  File "S:\sqlalchemy.git\lib\sqlalchemy\sql\expression.py", line 1127, in execute
    return e._execute_clauseelement(self, multiparams, params)
  File "S:\sqlalchemy.git\lib\sqlalchemy\engine\base.py", line 1204, in _execute_clauseelement
    return connection._execute_clauseelement(elem, multiparams, params)
  File "S:\sqlalchemy.git\lib\sqlalchemy\engine\base.py", line 1279, in _execute_clauseelement
    return proxy.execute(self, super(ProxyConnection, self).execute, elem, *(multiparams or [**(params or {}))
  File "S:\sqlalchemy.git\test\testlib\assertsql.py", line 261, in execute
    result = execute(clauseelement, *multiparams, **params)
  File "S:\sqlalchemy.git\lib\sqlalchemy\engine\base.py", line 819, in execute
    return Connection.executors[c](]),)(self, object, multiparams, params)
  File "S:\sqlalchemy.git\lib\sqlalchemy\engine\base.py", line 869, in _execute_clauseelement
    return self.__execute_context(context)
  File "S:\sqlalchemy.git\lib\sqlalchemy\engine\base.py", line 896, in __execute_context
    return context.get_result_proxy()
  File "S:\sqlalchemy.git\lib\sqlalchemy\engine\default.py", line 266, in get_result_proxy
    return base.ResultProxy(self)
  File "S:\sqlalchemy.git\lib\sqlalchemy\engine\base.py", line 1410, in __init__
    self._init_metadata()
  File "S:\sqlalchemy.git\lib\sqlalchemy\engine\base.py", line 1444, in _init_metadata
    colname = item[0](0).decode(self.dialect.encoding)
  File "C:\Python25\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-3: invalid data

2008-12-27T12:10:14+00:00

Mike Bayer repo owner
OK you need to break this test down into smaller steps. that specific error is occurring when we get a result set and there's multibyte column names in the cursor.description. that's actually fairly promising. Its likely that cursor.description in this case already contains python unicode strings or perhaps they are double-encoded in some way. we can make that decode() conditional if need be.
- 2008-12-27T12:40:46+00:00
Michael Trier
Just for the record. We discussed this on IRC. The problem is with how pyodbc is returning the cursor.description. It's always returning it as latin-1, so the default conversion to utf-8 is failing. We discussed possibly adding a dialect option to support description_encoding where it could be set to 'latin-1' for the pyodbc dialect.
- 2008-12-27T22:24:02+00:00
Former user Account Deleted
Replying to empty:

Just for the record. We discussed this on IRC. The problem is with how pyodbc is returning the cursor.description. It's always returning it as latin-1, so the default conversion to utf-8 is failing. We discussed possibly adding a dialect option to support description_encoding where it could be set to 'latin-1' for the pyodbc dialect.

Just to confirm, I was using the pyodbc dialect when the exception appeared.
- 2008-12-28T06:43:28+00:00
Michael Trier
How does this look?
- 2008-12-30T01:01:08+00:00
Michael Trier
- changed milestone to 0.5.0
- changed status to resolved
I corrected this in dfd80ba089c0d0637f54cbd6b21332d5f5115999. If the implementation is not right let me know.
- 2008-12-30T01:41:42+00:00
Former user Account Deleted
Replying to empty:

I corrected this in dfd80ba089c0d0637f54cbd6b21332d5f5115999. If the implementation is not right let me know.

I confirm that now it works. Thank you.
- 2008-12-30T02:47:14+00:00
Mike Bayer repo owner
- removed milestone
Removing milestone: 0.5.0 (automated comment)
- 2014-02-23T18:14:05+00:00
Log in to comment

Assignee: –

Type: bug

Priority: critical

Status: resolved

Component: mssql

Milestone: –

Votes: 0

Watchers: 0