Unicode decoding errors not configurable

Issue #1257 resolved
Former user created an issue

The Python codec encode/decode methods support the option of specifying how to handle encoding/decoding errors.

For example:

>>> value = '\x92'
>>> value.decode('utf-8')
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 0: unexpected code byte

>>> value.decode('utf-8','replace')
u'\ufffd'

This is a problem for me because our database is supposed to contain only unicode data, but someone managed to coerce it into storing non-unicode data. With strict decoding, I cannot retrieve anything right now.

I propose changing 3 lines:

sqlalchemy/types.py

class String(Concatenable, TypeEngine):

    def __init__(self, length=None, convert_unicode=False, assert_unicode=None, unicode_errors='strict'):
        self.length = length
        self.convert_unicode = convert_unicode
        self.assert_unicode = assert_unicode
        self.unicode_errors = unicode_errors

    def result_processor(self, dialect):
        if self.convert_unicode or dialect.convert_unicode:
            def process(value):
                if value is not None and not isinstance(value, unicode):
                    return value.decode(dialect.encoding,self.unicode_errors)
                else:
                    return value
            return process
        else:
            return None
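To illustrate the intended behaviour outside of SQLAlchemy, here is a minimal standalone sketch in Python 3 syntax (where the raw value is `bytes` rather than `str`). The helper name `make_result_processor` is hypothetical and not part of SQLAlchemy; it only mirrors the shape of the proposed `result_processor` with a configurable error mode.

```python
def make_result_processor(encoding="utf-8", errors="strict"):
    """Return a function that decodes raw byte values into text.

    Hypothetical helper: `errors` is handed straight to bytes.decode(),
    so any standard codec error handler name can be used.
    """
    def process(value):
        # Pass through None and already-decoded strings unchanged.
        if value is None or isinstance(value, str):
            return value
        return value.decode(encoding, errors)
    return process


strict = make_result_processor()
lenient = make_result_processor(errors="replace")
skipping = make_result_processor(errors="ignore")

bad = b"caf\x92"  # 0x92 is not valid UTF-8 at this position

print(repr(lenient(bad)))   # invalid byte becomes U+FFFD
print(repr(skipping(bad)))  # invalid byte is dropped
try:
    strict(bad)
except UnicodeDecodeError as exc:
    print("strict mode failed:", exc.reason)
```

With 'strict' as the default, existing behaviour would stay the same unless a column explicitly opts in to 'replace' or 'ignore'.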

Comments (8)

  1. Former user Account Deleted

    In addition to bind_processor and result_processor, the change was needed in these places:

    1. types.py String.__init__() to accept unicode_error='ignore'

    2. types.py String.adapt() so subtypes of String get the right parameters

    3. database/mysql.py MSString.__init__() needs to pass on unicode_error to String.__init__()

    There are two other places that call String.__init__(): MSString() in mssql.py and MSNVarChar() in types.py. The patch uploaded on 04/04/09 18:22:05 didn't affect those two places.

    The patch passes the test with mysql, but the test doesn't run with sqlite (dbapi exception on non-conforming unicode).
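The points about String.adapt() and the subclass constructors can be sketched roughly as follows. This is a simplified illustration in Python 3 syntax, not the actual patch: the classes are stand-ins that only borrow the names from this thread, the `national` option is invented for the example, and the keyword is spelled `unicode_errors` as in the original proposal.

```python
class String:
    """Stand-in for the generic string type (not the real SQLAlchemy class)."""

    def __init__(self, length=None, convert_unicode=False,
                 unicode_errors="strict"):
        self.length = length
        self.convert_unicode = convert_unicode
        self.unicode_errors = unicode_errors

    def adapt(self, impltype):
        # Copy every constructor argument, including the error mode,
        # so the dialect-specific subtype behaves like the original.
        return impltype(
            length=self.length,
            convert_unicode=self.convert_unicode,
            unicode_errors=self.unicode_errors,
        )


class MSString(String):
    """Stand-in for a dialect-specific subtype with one extra option."""

    def __init__(self, length=None, national=False, **kwargs):
        # Forward the shared keyword arguments to String.__init__().
        super().__init__(length, **kwargs)
        self.national = national


generic = String(length=50, convert_unicode=True, unicode_errors="ignore")
adapted = generic.adapt(MSString)
print(type(adapted).__name__, adapted.unicode_errors)
```

If adapt() or a subclass constructor drops the keyword, the adapted type silently falls back to 'strict', which is presumably why each of the call sites listed above has to be touched.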
