Implementing next on ResultProxy

Issue #4077 resolved

Philip Martin created an issue 2017-09-14

Since ResultProxy references a DB API cursor is there any reason why the next protocol is not implemented on ResultProxy? Given it can iterator over query results in a loop, it seems to me that it would make sense from a Python language standpoint if it implemented next. I am not aware of how other DB API cursors operate but using psycopg2 one could:

import psycopg2
select_statement = 'select * from generate_series(1, 5)'
conn = psycopg2.connect()
cursor = conn.cursor()
cursor.execute(select_statement)
next(cursor)

Whereas in Sqlalchemy, calling next on the result issues a TypeError:

from sqlalchemy import create_engine
engine = create_engine()
conn = engine.connect()
results = conn.execute(select_statement)
next(results)

Comments (9)

Philip Martin reporter
- edited description
- 2017-09-14T22:17:17+00:00

Mike Bayer repo owner

changed milestone to 1.2
marked as enhancement

OK, per pep249 they are doing it, and also having __iter__() return self, so for us that looks like:

diff --git a/lib/sqlalchemy/engine/result.py b/lib/sqlalchemy/engine/result.py
index 3aae932f2..f7a2fdcfa 100644
--- a/lib/sqlalchemy/engine/result.py
+++ b/lib/sqlalchemy/engine/result.py
@@ -862,12 +862,15 @@ class ResultProxy(object):
             self.closed = True

     def __iter__(self):
-        while True:
-            row = self.fetchone()
-            if row is None:
-                return
-            else:
-                yield row
+        return self
+
+    def __next__(self):
+        row = self.fetchone()
+        if row is None:
+            raise StopIteration()
+        return row
+
+    next = __next__

     @util.memoized_property
     def inserted_primary_key(self):

I'm a little nervous about StopIteration being slow so I want to run a performance test on that, to see if Python's iteration internals take away the usual overhead of an exception throw in this case.

2017-09-15T14:46:32+00:00

Mike Bayer repo owner

relying upon __next__ with StopIteration is unfortunately 20% slower in the general case for both python 2.7 and 3.6:

from __future__ import print_function

import timeit


# short result set since we want to see
# about when the iteration stops
collection = list(range(3))
len_collection = len(collection)


class Result(object):
    def __init__(self):
        self.index = -1

    def get_next(self):
        self.index += 1
        if self.index >= len_collection:
            return None
        else:
            return collection[self.index]


class Go(object):
    def __init__(self):
        self.result = Result()

    def get_next(self):
        return self.result.get_next()


class Go1(Go):
    def __iter__(self):
        while True:
            elem = self.get_next()
            if elem is None:
                return
            else:
                yield elem


class Go2(Go):
    def __iter__(self):
        return self

    def __next__(self):
        elem = self.get_next()
        if elem is None:
            raise StopIteration()
        else:
            return elem

    next = __next__


def run(result):
    list(result)

print(timeit.timeit("run(Go1())", "from __main__ import run, Go1"))
print(timeit.timeit("run(Go2())", "from __main__ import run, Go2"))

#!

[classic@photon2 sqlalchemy]$ /opt/python3.6/bin/python3 test.py 
5.241073623998091
5.97535327001242
[classic@photon2 sqlalchemy]$ /opt/python2.7/bin/python test.py 
5.01953792572
5.92616796494

which implies we'd need to build __next__ totally separately from __iter__. not a big deal but uglier.

2017-09-15T15:03:15+00:00

Mike Bayer repo owner
https://gerrit.sqlalchemy.org/518
- 2017-09-15T15:15:38+00:00
Philip Martin reporter
Mike, again, thank you for your responsiveness and your time. A 20% performance decrease is a lot for the general case even if StopIteration is considered more Pythonic.

For reference in the standard library, the next implementation on DictReader does not rely on StopIteration.. It seems like this is a pretty similar circumstance regarding the performance needs of both ResultProxy and DictReader.
- 2017-09-15T21:53:50+00:00
Philip Martin reporter
Semantically speaking though it probably would be more Pythonic to have ResultProxy.fetchone() wrap next versus the other way around and ResultProxy.first() call next before closing the transaction. I think this makes more sense in that someone would expect the iter and next methods to be the direct method for iterating over a record stream.
- 2017-09-15T22:14:28+00:00
Mike Bayer repo owner
For reference in the standard library, the next implementation on DictReader does not rely on StopIteration.. It seems like this is a pretty similar circumstance regarding the performance needs of both ResultProxy and DictReader.

you have to raise StopIteration, or in the example you posted it is likely relying on the next() called within to raise StopIteration. That's the behavioral contract of next() / __next__().

ResultProxy.fetchone() wrap next versus the other way around and ResultProxy.first() call next before closing the transaction.

fetchone() is the classic interface for the cursor so that is the "low level" method here. Iterators are slow, not just the StopIteration but also the "yield" approach in __iter__() produces overhead as well. There may also be confusion or bugs created for third party libraries, subclasses, etc. in reversing these methods (e.g. a class that tries to override fetchone() would now fail).
- 2017-09-15T23:11:07+00:00
Mike Bayer repo owner
- changed status to resolved
Add next(), next() to ResultProxy

Added __next__() and next() methods to :class:.ResultProxy, so that the next() builtin function works on the object directly. :class:.ResultProxy has long had an __iter__() method which already allows it to respond to the iter() builtin. The implementation for __iter__() is unchanged, as performance testing has indicated that iteration using a __next__() method with StopIteration is about 20% slower in both Python 2.7 and 3.6.

Change-Id: I70569a4c48ad85a3c21a7ad422f270a559926cfb Fixes: ~~#4077~~

→ <<cset 339416b821ed>>
- 2017-09-18T13:50:13+00:00
Mike Bayer repo owner
if you take a look at all four of fetchone(), first(), fetchmany(), and fetchall() on ResultProxy, you can see that it would be pretty jarring to arbitrarily have just fetchone() and first() use the iterator protocol directly while not fetchmany() and fetchall(), and you can also see the implementation of first() is not a complete superset of fetchone() in any case.
- 2017-09-18T13:51:09+00:00
Log in to comment

Assignee: –

Type: enhancement

Priority: major

Status: resolved

Component: engine

Milestone: 1.2

Votes: 0

Watchers: 1