Implementing next on ResultProxy

Issue #4077 resolved
Philip Martin created an issue

Since ResultProxy references a DB API cursor is there any reason why the next protocol is not implemented on ResultProxy? Given it can iterator over query results in a loop, it seems to me that it would make sense from a Python language standpoint if it implemented next. I am not aware of how other DB API cursors operate but using psycopg2 one could:

import psycopg2
select_statement = 'select * from generate_series(1, 5)'
conn = psycopg2.connect()
cursor = conn.cursor()
cursor.execute(select_statement)
next(cursor)

Whereas in Sqlalchemy, calling next on the result issues a TypeError:

from sqlalchemy import create_engine
engine = create_engine()
conn = engine.connect()
results = conn.execute(select_statement)
next(results)

Comments (9)

  1. Mike Bayer repo owner

    OK, per pep249 they are doing it, and also having __iter__() return self, so for us that looks like:

    diff --git a/lib/sqlalchemy/engine/result.py b/lib/sqlalchemy/engine/result.py
    index 3aae932f2..f7a2fdcfa 100644
    --- a/lib/sqlalchemy/engine/result.py
    +++ b/lib/sqlalchemy/engine/result.py
    @@ -862,12 +862,15 @@ class ResultProxy(object):
                 self.closed = True
    
         def __iter__(self):
    -        while True:
    -            row = self.fetchone()
    -            if row is None:
    -                return
    -            else:
    -                yield row
    +        return self
    +
    +    def __next__(self):
    +        row = self.fetchone()
    +        if row is None:
    +            raise StopIteration()
    +        return row
    +
    +    next = __next__
    
         @util.memoized_property
         def inserted_primary_key(self):
    

    I'm a little nervous about StopIteration being slow so I want to run a performance test on that, to see if Python's iteration internals take away the usual overhead of an exception throw in this case.

  2. Mike Bayer repo owner

    relying upon __next__ with StopIteration is unfortunately 20% slower in the general case for both python 2.7 and 3.6:

    from __future__ import print_function
    
    import timeit
    
    
    # short result set since we want to see
    # about when the iteration stops
    collection = list(range(3))
    len_collection = len(collection)
    
    
    class Result(object):
        def __init__(self):
            self.index = -1
    
        def get_next(self):
            self.index += 1
            if self.index >= len_collection:
                return None
            else:
                return collection[self.index]
    
    
    class Go(object):
        def __init__(self):
            self.result = Result()
    
        def get_next(self):
            return self.result.get_next()
    
    
    class Go1(Go):
        def __iter__(self):
            while True:
                elem = self.get_next()
                if elem is None:
                    return
                else:
                    yield elem
    
    
    class Go2(Go):
        def __iter__(self):
            return self
    
        def __next__(self):
            elem = self.get_next()
            if elem is None:
                raise StopIteration()
            else:
                return elem
    
        next = __next__
    
    
    def run(result):
        list(result)
    
    print(timeit.timeit("run(Go1())", "from __main__ import run, Go1"))
    print(timeit.timeit("run(Go2())", "from __main__ import run, Go2"))
    
    [classic@photon2 sqlalchemy]$ /opt/python3.6/bin/python3 test.py 
    5.241073623998091
    5.97535327001242
    [classic@photon2 sqlalchemy]$ /opt/python2.7/bin/python test.py 
    5.01953792572
    5.92616796494
    

    which implies we'd need to build __next__ totally separately from __iter__. not a big deal but uglier.

  3. Philip Martin reporter

    Mike, again, thank you for your responsiveness and your time. A 20% performance decrease is a lot for the general case even if StopIteration is considered more Pythonic.

    For reference in the standard library, the next implementation on DictReader does not rely on StopIteration.. It seems like this is a pretty similar circumstance regarding the performance needs of both ResultProxy and DictReader.

  4. Philip Martin reporter

    Semantically speaking though it probably would be more Pythonic to have ResultProxy.fetchone() wrap next versus the other way around and ResultProxy.first() call next before closing the transaction. I think this makes more sense in that someone would expect the iter and next methods to be the direct method for iterating over a record stream.

  5. Mike Bayer repo owner

    For reference in the standard library, the next implementation on DictReader does not rely on StopIteration.. It seems like this is a pretty similar circumstance regarding the performance needs of both ResultProxy and DictReader.

    you have to raise StopIteration, or in the example you posted it is likely relying on the next() called within to raise StopIteration. That's the behavioral contract of next() / __next__().

    ResultProxy.fetchone() wrap next versus the other way around and ResultProxy.first() call next before closing the transaction.

    fetchone() is the classic interface for the cursor so that is the "low level" method here. Iterators are slow, not just the StopIteration but also the "yield" approach in __iter__() produces overhead as well. There may also be confusion or bugs created for third party libraries, subclasses, etc. in reversing these methods (e.g. a class that tries to override fetchone() would now fail).

  6. Mike Bayer repo owner

    Add next(), next() to ResultProxy

    Added __next__() and next() methods to :class:.ResultProxy, so that the next() builtin function works on the object directly. :class:.ResultProxy has long had an __iter__() method which already allows it to respond to the iter() builtin. The implementation for __iter__() is unchanged, as performance testing has indicated that iteration using a __next__() method with StopIteration is about 20% slower in both Python 2.7 and 3.6.

    Change-Id: I70569a4c48ad85a3c21a7ad422f270a559926cfb Fixes: #4077

    → <<cset 339416b821ed>>

  7. Mike Bayer repo owner

    if you take a look at all four of fetchone(), first(), fetchmany(), and fetchall() on ResultProxy, you can see that it would be pretty jarring to arbitrarily have just fetchone() and first() use the iterator protocol directly while not fetchmany() and fetchall(), and you can also see the implementation of first() is not a complete superset of fetchone() in any case.

  8. Log in to comment