Patch: Ordering of ShardedQuery results

Issue #1043 resolved
Former user created an issue

This patch series adds the capability to perform an ordered merge of results from multiple shards in a ShardedQuery. The ordering relation can be specified either with a user-supplied comparator function applied to the result entities, or with SQLAlchemy expressions similar to order_by().

The series also contains a patch for using iterators to return results in the non-ordered case as well.
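
For readers unfamiliar with the comparator-based variant, here is a minimal sketch in plain Python of merging per-shard results with a user-supplied comparator. It is not the code from the patch series: `merge_with_comparator`, `by_age_then_name`, and the dictionary rows are hypothetical names chosen for illustration, and it assumes each shard's results already arrive sorted under the same comparator.

```python
import heapq
from functools import cmp_to_key


def merge_with_comparator(shard_results, compare):
    """Lazily merge already-sorted iterables using a cmp-style function.

    Hypothetical helper: `compare(a, b)` returns a negative, zero, or
    positive number, like a classic comparator.
    """
    return heapq.merge(*shard_results, key=cmp_to_key(compare))


def by_age_then_name(a, b):
    # Illustrative comparator: order by age, then by name.
    ka, kb = (a["age"], a["name"]), (b["age"], b["name"])
    return (ka > kb) - (ka < kb)


# Stand-ins for two shards, each already ordered by the comparator.
shard_a = [{"name": "ann", "age": 25}, {"name": "cat", "age": 40}]
shard_b = [{"name": "bob", "age": 25}, {"name": "dan", "age": 31}]

merged = list(merge_with_comparator([shard_a, shard_b], by_age_then_name))
print([row["name"] for row in merged])
```

The expression-based variant would presumably derive an equivalent ordering from the order_by()-style expressions instead of taking a function from the user.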

The current version of the patch series can be found at http://raidi.us/sqlalchemy/ .

Comments (7)

  1. Michael Trier

    Would it be possible to clarify this a bit so it's clear which patch you're interested in getting applied and what it will do for SQLAlchemy?

  2. Former user Account Deleted

    Because I changed employers in the meantime, I haven't had time to keep this patchset up to date. Between that and the delay caused by my bad timing WRT the release schedule, it almost certainly does not apply against the current trunk right now. If you are interested in adding this feature, I will try to get it cleaned up and rebased. I wouldn't even bother trying the current versions of this patchset, assuming Query/ShardedQuery have changed at all.

    The added feature is thus: when order_by() is used in a multiple-shard query, each of the databases involved returns ordered results, but the default behaviour of simply concatenating them produces unordered output. This change adds a new generative method to ShardedQuery, tentatively named _merge_by(). When it is invoked with the same expressions as are supplied to order_by(), it causes the result sets to be merged according to the specified ordering, instead of concatenating them.

    There are several other changes in this patchset that may be obsolete on current trunk, such as fixing several places where full lists of results were being realized, instead of using the generator interfaces. With these changes, plus the virtue of using a merge and not a sort, A) the feature can be implemented as a generator, and B) it works in linear time. As a generator it theoretically uses more-or-less constant space with databases that can "stream" their result sets, although PG was the only DBMS that had a DBAPI that supported this at the time these patches were written.

    The most relevant thread discussing this change was here: http://groups.google.com/group/sqlalchemy/browse_thread/thread/c9e84c87dd1a9fd2/d54f492498b5432c

    -Kyle
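
The difference between concatenating and merging described in this comment can be sketched in plain Python, without any SQLAlchemy code. This is an illustration only, not the patch: `shard_rows` is a hypothetical stand-in for a streaming per-shard cursor, and the integer rows are made up. It assumes each shard yields rows already in order.

```python
import heapq
from itertools import chain


def shard_rows(rows, log, label):
    """Hypothetical streaming cursor: records each row fetch in `log`."""
    for row in rows:
        log.append(label)
        yield row


# Each shard is individually ordered, as with a per-shard ORDER BY.
shard_1 = [1, 4, 9]
shard_2 = [2, 3, 10]

# Concatenation keeps each shard's order but not the global order.
concatenated = list(chain(shard_1, shard_2))
print(concatenated)

# A merge is a generator: it pulls rows from the shards only on demand.
fetch_log = []
merged = heapq.merge(
    shard_rows(shard_1, fetch_log, "s1"),
    shard_rows(shard_2, fetch_log, "s2"),
)
first_two = [next(merged), next(merged)]
print(first_two, "rows fetched so far:", len(fetch_log))
```

With a heap-based merge such as `heapq.merge`, the cost is proportional to the total number of rows times the logarithm of the number of shards, so it is linear in the row count for a fixed number of shards, and only about one pending row per shard is held in memory at a time.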

  3. Mike Bayer repo owner

    still not seeing this as something we're prepared to support here- I'd prefer for now if users of the "sharded" extension could continue to extend it as they see fit. I'd love to see an external project on github that provides more comprehensive "sharding" behavior. As it stands, the biggest advantage to the sharded extension right now is that it's very short and simple, so that users can understand it completely and extend it as needed.
