[PATCH] Optimize SA speed (up to 2x for large raw SQL queries)

Issue #1586 resolved
Former user created an issue

(original reporter: ged) Today I played a bit at optimizing SA queries, and the results are quite encouraging. I've only looked at optimizing raw SQL queries so far (but it also pays off in the ORM).

The patch is not complete yet, as it breaks some functionality which I didn't have time to rewrite: PickledResultProxy, BufferedColumnResultProxy, and possibly other rarely used pieces (the whole Elixir test suite passes, though).

See the attached (admittedly ugly) test script I used to time this (sorry, it's Elixir-based; I was so excited I wanted to test this quickly), along with timings on my computer.

The speedup is more pronounced for larger queries (especially ones with more columns) and for columns that don't need processing (String vs. Unicode). The speedup is much less impressive through the ORM, but it still achieves a 10% boost pretty easily, and much more on larger queries involving a lot of columns that don't need processing... Yeah, that's probably a rare use case, but well... it doesn't hurt.
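The per-column effect described here can be illustrated without SQLAlchemy at all: rows whose columns all pass through untouched skip an entire per-value processing loop. This is a standalone sketch with made-up names and a made-up converter, not the patch itself:

```python
import timeit

NUM_FIELDS = 100
NUM_RECORDS = 1000

# fake raw rows, shaped like what a DBAPI cursor might return
raw_rows = [tuple("value%d" % f for f in range(NUM_FIELDS))
            for _ in range(NUM_RECORDS)]

def decode(value):
    # stand-in for a type conversion such as String -> Unicode
    return value.upper()

def fetch_processed():
    # every column runs through a converter (the Unicode-like case)
    return [tuple(decode(v) for v in row) for row in raw_rows]

def fetch_passthrough():
    # no column needs processing (the String-like case): rows pass through
    return list(raw_rows)

slow = min(timeit.repeat(fetch_processed, number=10, repeat=3))
fast = min(timeit.repeat(fetch_passthrough, number=10, repeat=3))
print("processed: %.3fs  passthrough: %.3fs" % (slow, fast))
```

The gap between the two timings grows with NUM_FIELDS, which matches the observation that wider results benefit most.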

PS: Sorry, couldn't resist the catchy summary ;-)

Comments (10)

  1. Mike Bayer repo owner
    • changed component to engine
    • changed milestone to 0.6.0

we'll get it to work, but it's true the PickledResultProxy and BufferedColumnResultProxy have to change. There are advantages to the abstracted approach of "populatedict", even if it's a speed hump here.

  2. Former user Account Deleted
    • assigned issue to

    (original author: ged) What are those advantages? Yes it seems "cleaner" that way, but honestly, I don't see any concrete advantage: it's not easier to understand, it doesn't seem more extensible, and it's slower.

  3. Mike Bayer repo owner

I've got most of the patch working here. I'm going to say "easier to understand" is a complete 50/50 split. One thing I have to work around is that PickledResultProxy and BufferedColumnResultProxy require a dictionary of all possible keys to integer indexes. In the old version, the three-tuple rec had this data, so I've figured out a way for those two objects to "reverse-engineer" that information from the existing colfuncs dict, but it's quite awkward:

            funcs_to_index = dict((c, i) for i, c in resultproxy._colfuncs.iteritems() if isinstance(i, int))
            self._props = dict(
                (k, funcs_to_index[c]) for k, c in resultproxy._colfuncs.iteritems()
                    if isinstance(k, (basestring, int)) and c in funcs_to_index
            )
    

    so I am still looking for a way to get ResultProxy to have its nice fast dict of colfuncs but also still keep the integer index, which might mean putting tuples back in colfuncs; I don't know what the speed overhead of that would be. I've yet to do any speed tests on the approach overall. Of note is that the zoomark tests do not report any notable method call savings (the tests just pass, meaning method call counts have not moved more than 5%), so I'm curious how this could be "2x" faster. Do you feel like showing me some of your simple benchmarks so I can get an understanding of which aspects provide the speed improvement?
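The "reverse-engineering" snippet can be rendered self-contained, which may make the awkwardness easier to see. This is a hypothetical Python 3 rendition (so `iteritems()` becomes `items()`; the colfuncs content below is invented for illustration):

```python
# Hypothetical stand-ins for "colfuncs": callables that fetch and
# optionally process one column of a raw row.
def upper_col(row):
    return row[0].upper()

def plain_col(row):
    return row[1]

# The fast ResultProxy dict: the same colfunc object is stored under
# both the string key and the integer index of its column.
colfuncs = {"name": upper_col, 0: upper_col, "value": plain_col, 1: plain_col}

# Step 1: invert to colfunc -> integer index.
funcs_to_index = {c: i for i, c in colfuncs.items() if isinstance(i, int)}

# Step 2: key -> integer index, for result sets that must address
# columns by position (e.g. when pickling).
props = {
    k: funcs_to_index[c]
    for k, c in colfuncs.items()
    if isinstance(k, (str, int)) and c in funcs_to_index
}
print(props)  # {'name': 0, 0: 0, 'value': 1, 1: 1}
```

It works because the colfunc objects themselves are the only link between a name key and its column position, which is exactly the information the old three-tuple rec carried directly.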

  4. Mike Bayer repo owner

    K great, it saves 2x the time and method calls for a large result. Good that I have a baseline; not sure why zoomark didn't show this.

  5. Mike Bayer repo owner

    OK this is the test I'm using:

    from sqlalchemy import *
    import timeit
    
    engine = create_engine('sqlite://')
    
    m = MetaData(engine)
    
    NUM_FIELDS = 100
    NUM_RECORDS = 1000
    
    t = Table('table', m, *[Column("field%d" % fnum, String) for fnum in range(NUM_FIELDS)])
    m.create_all()
    
    t.insert().execute(
        [dict(("field%d" % fnum, u"value%d" % fnum) for fnum in range(NUM_FIELDS)) for r_num in range(NUM_RECORDS)]
    )
    print "insert complete"
    
    print timeit.repeat("[tuple(row) for row in t.select().execute().fetchall()]",
                        "from __main__ import t",
                        number=10, repeat=5)
    

    here's unmodified trunk:

    [2.9099102020263672, 2.8764309883117676, 2.8733901977539062, 2.8830940723419189, 2.8701350688934326]
    

    here's the original patch:

    [1.4123070240020752, 1.4064900875091553, 1.4115869998931885, 1.3944659233093262, 1.3996870517730713]
    

    and here's a change that turns the colfuncs dict to store tuples, somewhat like it was before but still with the speedups:

    rec = (colfunc, i)
    colfuncs[i] = rec
    

    the extra overhead here is that RowProxy now has to say:

    return self.__colfuncs[key][0](self.__row)
    

    but the change makes the job of alternate result sets like PickledResultProxy much easier if we maintain those integer indexes.
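The tuple-based scheme can be sketched as a standalone toy; the class and helper names here are hypothetical, not the real SQLAlchemy code:

```python
# Toy RowProxy whose colfuncs dict stores (colfunc, index) tuples under
# both the column name and the integer position.
class RowProxy(object):
    def __init__(self, row, colfuncs):
        self.__row = row
        self.__colfuncs = colfuncs

    def __getitem__(self, key):
        # the extra [0] hop is the overhead discussed above
        return self.__colfuncs[key][0](self.__row)

def make_colfuncs(names, processors):
    """Build the dict: each (colfunc, index) rec is reachable by name or index."""
    colfuncs = {}
    for i, (name, proc) in enumerate(zip(names, processors)):
        if proc is None:
            # no processing needed: plain positional fetch
            colfunc = lambda row, i=i: row[i]
        else:
            colfunc = lambda row, i=i, proc=proc: proc(row[i])
        rec = (colfunc, i)
        colfuncs[i] = colfuncs[name] = rec
    return colfuncs

colfuncs = make_colfuncs(["id", "name"], [None, str.upper])
row = RowProxy((7, "ed"), colfuncs)
print(row["id"], row["name"], row[1])  # 7 ED ED
```

A pickling result set can then recover key-to-position information by reading `rec[1]` directly instead of reverse-engineering it from the function objects.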

    the test then returns:

    [1.4030461311340332, 1.3983778953552246, 1.3870720863342285, 1.3840348720550537, 1.4013819694519043]
    

    which is basically identical. So I'm going to go forward trying to get that approach to work with the two alternate result sets that need it.

    Similarly, the number of method calls when running through profile for both versions of the patch is identical (106568 on py2.6.2).
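A total call count like the 106568 quoted here can be obtained with the stdlib profiler; this sketch uses a stand-in workload, not the SA benchmark:

```python
import cProfile
import io
import pstats

def workload():
    # stand-in workload: makes a predictable large number of calls
    return sum(len(str(n)) for n in range(1000))

pr = cProfile.Profile()
pr.enable()
workload()
pr.disable()

# pstats.Stats exposes the aggregate call count as total_calls
stats = pstats.Stats(pr, stream=io.StringIO())
print("total calls:", stats.total_calls)
```

Comparing this single number between two versions of a patch is a quick way to confirm, as above, that a refactoring added no per-row call overhead.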

  6. Mike Bayer repo owner

    OK, it's in 52b1ace6768a7f00b411d5f8b86ad8d1f84666b8. The only change was that the colfuncs dict still stores a 3-tuple, the second two values of which are only used by the special-purpose result sets for pickling/converting the dict to other formats. There is no call count overhead versus the original patch, and a test is added. Thanks very much for all the effort on this!
