Unicode decoding errors not configurable
Python's codec encode/decode methods accept an optional argument that specifies how encoding/decoding errors are handled.
For example:
>>> value = '\x92'
>>> value.decode('utf-8')
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 0: unexpected code byte
>>> value.decode('utf-8','replace')
u'\ufffd'
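For comparison, here is a small sketch of the same behavior in Python 3, where the input has to be a bytes literal. The helper name try_decode is only illustrative; 'strict', 'replace' and 'ignore' are the standard codec error handlers.

```python
# Python 3 sketch: the lone byte 0x92 is not valid UTF-8.
value = b'\x92'


def try_decode(data, errors):
    """Decode data as UTF-8 with the given error handler.

    Returns the decoded text, or the exception class name if decoding fails.
    """
    try:
        return data.decode('utf-8', errors)
    except UnicodeDecodeError as exc:
        return type(exc).__name__


# Compare the three most common handlers on the same input.
results = {errors: try_decode(value, errors)
           for errors in ('strict', 'replace', 'ignore')}
print(results)
```

With 'strict' the decode raises, with 'replace' the bad byte becomes U+FFFD, and with 'ignore' it is dropped silently.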
This is a problem for me because our database is supposed to contain only unicode data, but it was somehow coerced into storing non-unicode data. As a result, I cannot retrieve anything right now.
I propose changing 3 lines:
sqlalchemy/types.py
class String(Concatenable, TypeEngine):
    def __init__(self, length=None, convert_unicode=False, assert_unicode=None, unicode_errors='strict'):
        self.length = length
        self.convert_unicode = convert_unicode
        self.assert_unicode = assert_unicode
        self.unicode_errors = unicode_errors

    def result_processor(self, dialect):
        if self.convert_unicode or dialect.convert_unicode:
            def process(value):
                if value is not None and not isinstance(value, unicode):
                    return value.decode(dialect.encoding, self.unicode_errors)
                else:
                    return value
            return process
        else:
            return None
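The idea behind the proposed change can be tried outside SQLAlchemy. The following is only a sketch in Python 3 (so it checks against str rather than unicode); make_result_processor is a hypothetical standalone function, not part of SQLAlchemy, and its parameters stand in for dialect.encoding, unicode_errors and convert_unicode above.

```python
def make_result_processor(encoding, errors='strict', convert_unicode=True):
    """Hypothetical standalone analogue of the proposed result_processor.

    Returns a function that decodes raw bytes with the given error handler,
    or None when no conversion is requested.
    """
    if not convert_unicode:
        return None

    def process(value):
        # Leave NULLs and already-decoded text untouched.
        if value is not None and not isinstance(value, str):
            return value.decode(encoding, errors)
        return value

    return process


lenient = make_result_processor('utf-8', errors='replace')
strict = make_result_processor('utf-8')

print(repr(lenient(b'caf\x92')))  # malformed byte is replaced, no exception
```

Keeping 'strict' as the default means existing behavior is unchanged unless a caller opts in to a more lenient handler.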
Comments (8)
-
Account Deleted pass the test that ignores unicode codec error
-
Account Deleted Changes were needed in these places, in addition to bind_processor and result_processor:
- types.py: String.__init__() to accept unicode_error='ignore'
- types.py: String.adapt() so subtypes of String get the right parameters
- database/mysql.py: MSString.__init__() needs to pass unicode_error on to String.__init__()
There are two other places that call String.__init__(): one in mssql.py, MSString(), and the other in types.py, MSNVarChar(). The patch uploaded on 04/04/09 18:22:05 didn't affect those two places.
The patch passes the test with mysql; the test does not run with sqlite (dbapi exception on non-conforming unicode).
-
repo owner - changed milestone to 0.6.0
-
repo owner - changed status to resolved
This is in 05d5fc11d92e4d46ba9af1fd2e1bc2ad11353d19. There are many caveats, and it is definitely not a feature people should switch on casually if not on MySQL.
-
repo owner - removed status
- changed status to open
-
repo owner - removed milestone
Removing milestone: 0.6.0 (automated comment)
An immediate workaround is to use a custom TypeDecorator which does the decode.
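A minimal sketch of such a TypeDecorator, written for Python 3 and a current SQLAlchemy rather than the version discussed in this ticket. The class name LenientUnicode, its encoding and errors parameters, and the choice of LargeBinary as the underlying type are all assumptions made for illustration; the point is only that the decode happens in process_result_value, where the error handler can be chosen.

```python
from sqlalchemy.types import LargeBinary, TypeDecorator


class LenientUnicode(TypeDecorator):
    """Hypothetical type: fetches raw bytes and decodes them leniently."""

    impl = LargeBinary  # assumption: the driver hands back undecoded bytes
    cache_ok = True

    def __init__(self, encoding='utf-8', errors='replace', *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.encoding = encoding
        self.errors = errors

    def process_bind_param(self, value, dialect):
        # Encode text on the way in so the column always holds bytes.
        if isinstance(value, str):
            return value.encode(self.encoding)
        return value

    def process_result_value(self, value, dialect):
        # Decode on the way out; malformed sequences follow self.errors.
        if isinstance(value, (bytes, bytearray, memoryview)):
            return bytes(value).decode(self.encoding, self.errors)
        return value
```

A column declared as Column(LenientUnicode(errors='ignore')) would then drop undecodable bytes instead of raising. Whether the driver really returns raw bytes for a given column depends on the database and DBAPI, so this needs checking against the actual setup.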