- attached stress.py
[PATCH] Optimize SA speed (up to 2x for large raw SQL queries)
(original reporter: ged) Today I played a bit at optimizing SA queries and the results are quite encouraging. I've only looked at optimizing raw SQL queries so far (but it also pays off in the ORM).
The patch is not complete so far, as it breaks some functionality which I didn't have time to rewrite: PickledResultProxy, BufferedColumnResultProxy, and possibly other rarely used stuff (the whole Elixir test suite passes though).
See attached ugly test script I used to time this stuff (sorry, Elixir based, I was so excited I wanted to test this quickly) along with timing on my computer.
The speedup is more pronounced for larger queries (especially those with more columns) and for columns which don't need processing (String vs Unicode). The speedup is much less impressive through the ORM, but it still achieves a 10% boost pretty easily, and much more on larger queries involving a lot of columns which don't need to be processed. Yeah, that's probably a rare use case, but it doesn't hurt.
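The String-vs-Unicode distinction above can be illustrated with a toy model (illustrative only; this is not SQLAlchemy's actual code, and the names are made up): columns whose type needs no conversion can store None instead of an identity function, so fetching skips one Python function call per column per row.

```python
# Toy model of per-column result processing. Columns with no
# type-level conversion (e.g. String on most DBAPIs) store None,
# so process_row() avoids a function call for them entirely.

def make_converters(types):
    # hypothetical: only "unicode" columns need a converter here
    return [(lambda v: v.upper()) if t == "unicode" else None
            for t in types]

def process_row(raw_row, converters):
    # skip the call when there is no converter for the column
    return tuple(v if conv is None else conv(v)
                 for v, conv in zip(raw_row, converters))

convs = make_converters(["string", "unicode", "string"])
print(process_row(("a", "b", "c"), convs))  # ('a', 'B', 'c')
```

Over a 100-column, 1000-row result the saved calls add up, which is consistent with the larger speedup reported for unprocessed columns.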
PS: Sorry, couldn't resist the catchy summary ;-)
Comments (10)
-
Account Deleted - attached query_speedup.patch
(original author: ged) Patch against trunk (r6430)
-
repo owner - we'll get it to work, but it's true the PickledResultProxy and BufferedColumnResultProxy have to change. There are advantages to the abstracted approach of "populate dict", even if it's a speed hump here.
-
Account Deleted - assigned issue to
(original author: ged) What are those advantages? Yes, it seems "cleaner" that way, but honestly I don't see any concrete advantage: it's not easier to understand, it doesn't seem more extensible, and it's slower.
-
repo owner - I've got most of the patch working here. I'm going to say "easier to understand" is a complete 50/50 split here. One thing I have to work around is that PickledResultProxy and BufferedColumnResultProxy require a dictionary of all possible keys to integer indexes. In the old version, the three-tuple rec had this data. So I've figured out a way for those two objects to "reverse-engineer" that information based on the existing colfuncs dict, but it's quite awkward:

    funcs_to_index = dict(
        (c, i) for i, c in resultproxy._colfuncs.iteritems()
        if isinstance(i, int)
    )
    self._props = dict(
        (k, funcs_to_index[c]) for k, c in resultproxy._colfuncs.iteritems()
        if isinstance(k, (basestring, int)) and c in funcs_to_index
    )
So I am still looking for a way to get ResultProxy to have its nice fast dict of colfuncs but also still have the integer index, which might mean putting tuples back in colfuncs. I don't know what the speed overhead would be. I've yet to do any speed tests on the approach overall; of note is that the zoomark tests do not report any notable method call savings (the tests just pass, meaning method call counts have not moved more than 5%) - so I'm curious how this could be "2x" faster. Do you feel like showing me some of your simple benchmarks so I can get an understanding of what aspects here provide the speed improvement?
-
repo owner oh one is attached, duh.
-
repo owner K great, it saves 2x the time and method calls for a large result. good that I have a baseline. not sure why zoomark didn't show this.
-
repo owner OK this is the test I'm using:
    from sqlalchemy import *
    import timeit

    engine = create_engine('sqlite://')
    m = MetaData(engine)

    NUM_FIELDS = 100
    NUM_RECORDS = 1000

    t = Table('table', m,
              *[Column("field%d" % fnum, String) for fnum in range(NUM_FIELDS)])
    m.create_all()

    t.insert().execute(
        [dict(("field%d" % fnum, u"value%d" % fnum)
              for fnum in range(NUM_FIELDS))
         for r_num in range(NUM_RECORDS)]
    )
    print "insert complete"

    print timeit.repeat(
        "[tuple(row) for row in t.select().execute().fetchall()]",
        "from __main__ import t", number=10, repeat=5)
here's unmodified trunk:
[2.9099102020263672, 2.8764309883117676, 2.8733901977539062, 2.8830940723419189, 2.8701350688934326]
here's the original patch:
[1.4123070240020752, 1.4064900875091553, 1.4115869998931885, 1.3944659233093262, 1.3996870517730713]
and here's a change that turns the colfuncs dict to store tuples, somewhat like it was before but still with the speedups:

    rec = (colfunc, i)
    colfuncs[i] = rec
the extra overhead here is that RowProxy now has to say:

    return self.__colfuncs[key][0](self.__row)
but the change makes the job of alternate result sets like PickledResultProxy much easier if we maintain those integer indexes. The test then returns:
[1.4030461311340332, 1.3983778953552246, 1.3870720863342285, 1.3840348720550537, 1.4013819694519043]
which is basically identical. So I'm going to go forward trying to get that approach to work with the two alternate result sets that need it.
Similarly, the number of method calls when running through profile for both versions of the patch is identical (106568 on py2.6.2).
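The tuple-storing approach settled on above can be sketched as a minimal self-contained class (illustrative names only; this is not the actual SQLAlchemy RowProxy). Each colfuncs entry is (processing_function, integer_index): slot 0 serves fast key lookup, slot 1 lets pickling-style result sets recover positions without the reverse-engineering pass.

```python
# Minimal sketch of a RowProxy whose colfuncs dict stores
# (colfunc, index) tuples under both integer and string keys.
class MiniRowProxy:
    def __init__(self, row, colfuncs):
        self.__row = row
        self.__colfuncs = colfuncs

    def __getitem__(self, key):
        # the "extra overhead": one extra tuple indexing per lookup
        return self.__colfuncs[key][0](self.__row)

    def key_to_index(self):
        # what a pickling result set needs: string key -> position
        return {k: rec[1] for k, rec in self.__colfuncs.items()
                if isinstance(k, str)}

def colfunc_for(i):
    return lambda row: row[i]

colfuncs = {}
for i, name in enumerate(["id", "name"]):
    rec = (colfunc_for(i), i)
    colfuncs[i] = rec
    colfuncs[name] = rec

r = MiniRowProxy((7, "ed"), colfuncs)
print(r["name"], r[0], r.key_to_index())  # ed 7 {'id': 0, 'name': 1}
```

Since both timing runs above are essentially identical, the per-lookup tuple indexing evidently costs nothing measurable while keeping the integer indexes available.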
-
repo owner - changed status to resolved
OK, it's in 52b1ace6768a7f00b411d5f8b86ad8d1f84666b8. The only change was that the colfuncs dict still stores a 3-tuple, the second two values of which are only used by the special-purpose result sets for pickling/converting the dict to other formats. The call count improvement is entirely maintained vs. the original patch, and a test is added. Thanks very much for all the effort on this!
-
repo owner - removed milestone
Removing milestone: 0.6.0 (automated comment)
(original author: ged) Timing test