Patch adding support for multiple result sets in ResultProxy

Issue #1635 new
Former user created an issue

This patch adds support for multiple result sets in the ResultProxy. It basically just adds a "nextset" method and replaces all the calls to close in the fetch* methods with a call to nextset, which closes the proxy if there are no more results available or no nextset method.

Comments (26)

  1. Mike Bayer repo owner

    the ResultProxy can't work without parsing cursor.description and having an accurate sense of what columns are present in each row. What happens to cursor.description when nextset() is called ? What happens to ResultRow.__getitem__() when called with a non-integer value ?

    also if you send an email address I can add it to the CC.

  2. Former user Account Deleted

    Whoops, yes, that slipped my mind. I'm guessing just a call to self._init_metadata() after the self.cursor.nextset() is called?

    Email is damoxc@gmail.com

  3. Former user Account Deleted

    After adding the call to ResultProxy._init_metadata() accessing a columns value via RowProxy.__getitem__() works fine when using a string key, also ResultProxy.keys contain the correct column names.

    Is there anything else I should be checking??

  4. Mike Bayer repo owner

    I'll try to get out every point:

    • I think this should focus on 0.6 for now since we've made major changes to ResultProxy in 0.6, both in its construction and the degree to which it's used - due to RETURNING support the interaction of ResultProxy is more complicated now. Its possible for an 0.5 backport but 0.6 is really where the major new features are going.

    • The hasattr(cursor, 'nextset') is needlessly expensive and isn't consistent with the usual method of feature discovery. Dialects that support nextset() should supply a class-level supports_nextset flag, and that's what would be checked here.

    • The additional overhead of calling nextset() should be minimized as much as possible. Need to see if call counts go over the mark in test/aaa_profiling/test_zoomark and test_zoomark_orm.

    • Does MySQL-python support nextset() when plain execute() is called ? or is callproc() required ? Hopefully not since SQLA doesn't use callproc() as of yet.

    • Lots of tests are needed. I'm not sure which DBAPIs support nextset() but assuming its MySQL and MSSQL, tests need to be present which illustrate multiple result sets being fetched for both of those backends. The result sets should have radically different column sets so that the cursor.description re-init is well exercised. This is enough of a feature to support a new module test/sql/test_nextset.py , if its possible to build a test that is generic between multiple backends (not sure if the MySQL case requires building stored procedures to work).

    • of course, the full suite of unit tests needs to pass 100% for MySQL, MSSQL and other nextset()-supporting dialects (and everything else too). Note that the current MSSQL-pyodbc dialect is actually calling cursor.nextset() on an INSERT statement in some cases, as it appends "; select scope_identity()" in some cases to work around a known pyodbc quirk regarding access to the scope_identity() function - so this case must be covered when testing.

  5. Former user Account Deleted

    Replying to zzzeek:

    I'll try to get out every point:

    I think this should focus on 0.6 for now since we've made major changes to ResultProxy in 0.6, both in its construction and the degree to which it's used - due to RETURNING support the interaction of ResultProxy is more complicated now. Its possible for an 0.5 backport but 0.6 is really where the major new features are going.

    Okay, sounds sensible, I'll install 0.6 in which case for testing.

    [BR]

    The hasattr(cursor, 'nextset') is needlessly expensive and isn't consistent with the usual method of feature discovery. Dialects that support nextset() should supply a class-level supports_nextset flag, and that's what would be checked here.

    Agreed.

    [BR]

    The additional overhead of calling nextset() should be minimized as much as possible. Need to see if call counts go over the mark in test/aaa_profiling/test_zoomark and test_zoomark_orm.

    Limiting this to only calling it on dialects that support it would definitely be an improvement.

    [BR]

    Does MySQL-python support nextset() when plain execute() is called ? or is callproc() required ? Hopefully not since SQLA doesn't use callproc() as of yet.

    Lots of tests are needed. I'm not sure which DBAPIs support nextset() but assuming its MySQL and MSSQL, tests need to be present which illustrate multiple result sets being fetched for both of those backends. The result sets should have radically different column sets so that the cursor.description re-init is well exercised. This is enough of a feature to support a new module test/sql/test_nextset.py , if its possible to build a test that is generic between multiple backends (not sure if the MySQL case requires building stored procedures to work).

    Just did a simple test case of:

    >>> cursor.execute('SELECT 1, 2, 3; SELECT 4, 5, 6')
    1L
    >>> cursor.fetchone()
    (1L, 2L, 3L)
    >>> cursor.nextset()
    1
    >>> cursor.fetchone()
    (4L, 5L, 6L)
    

    Although only a basic test seems to show that sprocs are not needed.

    [BR]

    of course, the full suite of unit tests needs to pass 100% for MySQL, MSSQL and other nextset()-supporting dialects (and everything else too). Note that the current MSSQL-pyodbc dialect is actually calling cursor.nextset() on an INSERT statement in some cases, as it appends "; select scope_identity()" in some cases to work around a known pyodbc quirk regarding access to the scope_identity() function - so this case must be covered when testing.

    I'll go through and build a more extensive patch against svn trunk and implement the tests. I'll attach it when I'm done!

  6. Former user Account Deleted

    (original author: ged) Replying to zzzeek:

    • The additional overhead of calling nextset() should be minimized as much as possible. Need to see if call counts go over the mark in test/aaa_profiling/test_zoomark and test_zoomark_orm.

    I'm pretty sure this would not cause any significant overhead (both time and call count) for dialects which don't support it (after the supports_nextset change is done). The overhead of one call shouldn't be noticeable at that point. However, I'm a bit more concerned about the time overhead of the actual call to cursor.nextset() for dialects which support it, when the user does not use the functionality. My concern comes from the fact that I have no idea how fast it is. I suppose it's a near instantaneous when there is not more data, but that would need to be tested on all dialects which support it, and if this is not the case, I'd strongly -1 on the current implementation and would favor some kind of optional code path instead.

  7. Former user Account Deleted

    (original author: ged) Also I think the current patch is buggy regarding exception handling: it would happen twice: once in nextset, then exception is reraised, and recatched in fetch* and rehandled there, then reraised...

  8. Former user Account Deleted

    Replying to ged:

    Also I think the current patch is buggy regarding exception handling: it would happen twice: once in nextset, then exception is reraised, and recatched in fetch* and rehandled there, then reraised...

    Should there just not be exception handling in nextset(), or perhaps change it to nextset(raise_exc=True)?

  9. Former user Account Deleted

    (original author: ged) Replying to guest:

    Replying to ged:

    Also I think the current patch is buggy regarding exception handling: it would happen twice: once in nextset, then exception is reraised, and recatched in fetch* and rehandled there, then reraised...

    Should there just not be exception handling in nextset(), or perhaps change it to nextset(raise_exc=True)?

    The later seem fine to me, but wait for zzzeek's take on this as my opinion is purely informative.

    One more point: I am personally not sure calling nextset automatically is a good idea after all: it sure looks convenient in the common case, but it is both a bit surprising (normally calling fetchall several times on a cursor returns the same data) and prevents to fetch the results of the same set several times (unless the user manually copies the result of each set, which seem to be a sign of bad design to me). I think that leaving the user call nextset manually would be better. We of course need a way to tell the proxy to not close the cursor after the first fetchall, but I'll leave that problem to you guys :)

  10. Mike Bayer repo owner

    the problem with not calling nextset() is that the resultproxy autocloses when all results are fetched. Doing away with autoclose would be a huge inconvenience to a tremendous amount of code.

  11. Former user Account Deleted

    (original author: ged) Replying to zzzeek:

    the problem with not calling nextset() is that the resultproxy autocloses when all results are fetched. Doing away with autoclose would be a huge inconvenience to a tremendous amount of code.

    I know, hence my last sentence... IMO, the best option is to detect that there are several resultsets (eg by looking for ';' in the query) and not close the cursor in that case but that might be tricky to implement and/or not 100% reliable. The only other option I see is to use a monkeypatch strategy:

    res = xxx.execute()
    res.autoclose = False
    set1 = res.fetchall()
    res.nextset()
    set2 = res.fetchall()
    res.close()
    

    If you want to implement an "automatic nextset" behavior in addition to the above, that's fine with me, but the manual nextset behavior needs to be in, IMO. Again, a monkeypatch would do nicely:

    res = xxx.execute()
    res.automatic_multiset = True 
    # or yet another option:
    #res.autoclose = "nextset" 
    set1 = res.fetchall()
    set2 = res.fetchall()
    
  12. Mike Bayer repo owner

    Replying to ged:

    I know, hence my last sentence... IMO, the best option is to detect that there are several resultsets (eg by looking for ';' in the query) and not close the cursor in that case but that might be tricky to implement and/or not 100% reliable.

    oh, no that's not reliable nor very performant. The usual case for multiple return sets is a stored procedure call.

    I think we may have to go with execution options here, similarly to the way we are looking into doing "stream_results" in #1619. it would be an option for the text() and select() constructs. i think I should take a look at the various options we're considering and ensuring theres a consistent system for all of them (including autocommit, which should move to select().options() too).

    Additionally I'd even say that, as a self documenting measure, calling nextset() on the ResultProxy would raise an error saying that "enable_nextset" wasn't enabled for the execution. prefer that to silent failure.

  13. Mike Bayer repo owner

    So we now have execution_options() ready to go. This patch can be resubmitted, using multiple_results=True as the statement option to indicate that autoclose should be disabled and replaced with nextset() (or that autoclose just shuts off and nextset() must be called, that might be better to start). the patch also requires that cursor.description be re-initialized within _init_metadata(). and, we really would need to make RowProxy and friends aware of this, as each row maintains a reference to its parent. with _init_metadata() being called repeatedly, we likely need to rearrange the internal structure such that the "parent" of a row remains referencing an object that corresponds to its result set. so a lot of work here.

  14. Former user Account Deleted

    (original author: ged) About the last part (RowProxy changes & related), I would like to review any change before it gets committed, as to not have something which is incompatible with a decent C implementation, and if possible, which is consistent with my current C implementation.

  15. Mike Bayer repo owner

    Replying to ged:

    About the last part (RowProxy changes & related), I would like to review any change before it gets committed, as to not have something which is incompatible with a decent C implementation, and if possible, which is consistent with my current C implementation.

    I've committed in c3702fa5169fef29ca13427abea2f6d41449c658 the basic idea of detaching the metadata from the ResultProxy, and while I was going to present as a patch, halfway through it became apparent that this is a much better way to go overall as we lose the redundancy of PickledResultProxy and we also cleanly detach RowProxy objects from the parent. This should make the C implementation easier since ResultProxyMetadata is where the intensive stuff occurs - you should be able to leave ResultProxy alone.

  16. Mike Bayer repo owner

    Also, it removes the close() method from RowProxy. I'm trying to get 0.6beta1 out the door so I wanted this change to be present.

  17. Former user Account Deleted

    guys, is there any way how to get several row sets until this issue will be fixed ?

  18. Mike Bayer repo owner

    pull out the DBAPI cursor from the result proxy and work with it directly:

    result = connection.execute("whataver")
    cursor = result.cursor
    
    for row in cursor.fetchall():
       # etc.
    cursor.nextset()
    for row in cursor.fetchall():
       # etc.
    
  19. Mike Bayer repo owner

    extra notes on this feature from https://github.com/zzzeek/sqlalchemy/pull/71:

    with all of that, the feature is still fundamentally broken until we add support to the compiler system for multiple result sets. the ability to associate a series of columns/types with result columns (even with string SQL, see http://docs.sqlalchemy.org/en/rel_0_9/core/sqlelement.html#sqlalchemy.sql.expression.TextClause.columns) is critical, and particularly we have users running stored procedures and stuff using this feature. some new kind of API has to be devised for how this might work. I think perhaps we would assume that the linkage between compiled SQL elements and multiple result sets would still involve a single SQL element that then returns multiple results (like a stored proc, or custom SQL element). So we might need a generic method on any Selectable called "additional_columns()" or something like that. This part is outside of your realm so I'd have to look into it. But adding this feature means I have to jump on this part of it b.c. it will be asked for very quickly.

  20. Log in to comment