Standardized bind/result processors collection

Issue #1627 resolved
Former user created an issue

(original reporter: ged) As you know, I've been playing with a C extension lately. I am at the point where I have translated all the RowProxy methods which benefit from more speed, and all common result processors (the result is quite encouraging btw). I have also reviewed the result processors for "common" types in all dialects and noticed there is quite a bit of duplication.

The code seem to be either copy-pasted from other dialects or worse: recoded slightly differently to do (apparently) the same work. The most glaring example is the Boolean type: AFAICT, every dialect except PostgreSQL redefine it, but they have all the same intent: convert a boolean to an int and back. For this particular case, I think it would make sense to move that code to types.py.

Even in more "complex" types, such as Numeric, the processors seem to return one of a few predefined conversion functions: decimal_to_float or float_to_decimal. I think it would be a good idea to make an explicit module with a collection of those "most often used conversion functions". This would have several advantages: - less code duplication - avoiding subtle implementation differences when the requirements of the DBAPI are the same. This is especially True for bind_processors, which sometimes accept different types on different dialects, usually without a good reason to do so (AFAICT). - less code to write for dialect implementors - I would only have to try ... except for the C processor in one place, and not in all dialects. - it would make it more obvious when one dialect doesn't use one of those common functions that it does something special.

For example, the _PGNumeric result_processor would look like this:

    def result_processor(self, dialect, coltype):
        if self.asdecimal:
            if coltype in (700, 701):
                return processors.float_to_decimal
            elif coltype == 1700:
                # pg8000 returns Decimal natively for 1700
                return None
            else:
                raise exc.InvalidRequestError("Unknown PG numeric type: %d" % coltype)
        else:
            if coltype in (700, 701):
                # pg8000 returns float natively for 701
                return None
            elif coltype == 1700:
                return processors.decimal_to_float
            else:
                raise exc.InvalidRequestError("Unknown PG numeric type: %d" % coltype)

I should also note that some processors (such as String and Decimal -- for the Decimal lookup) require local storage. Those usually correspond to Python closures in the current code. That does not translate well into C. Defining factories for those would be the solution:

def string_to_unicode_result_processor_factory(encoding):
    decoder = codecs.getdecoder(encoding)
    def process(value):
        if value is not None:
            # decoder returns a tuple: (value, len)
            return decoder(value)[0](0)
        else:
            return value

which can then use the C processor (or in fact, the C class in this case):

try:
    from demo import UnicodeResultProcessor
    def string_to_unicode_result_processor_factory(encoding):
        result_processor = UnicodeResultProcessor(encoding)
        return result_processor.process
except ImportError:
    ... (same as above)

What do you think? I'm willing to do all the changes myself, I just need an official blessing from you. If you agree with the general idea, does "processors" as the module name also suit you?

Comments (5)

  1. Mike Bayer repo owner

    the general idea sounds great. where would the C source file live ? wondering if we should convert types.py into a package.

  2. Former user Account Deleted
    • assigned issue to

    (original author: ged) My idea was to have two files (at least for the SQL layer) and 1 module. Something like sqlalchemy/cextensions.c and sqlalchemy/processors.c

  3. Log in to comment