Performance degradation on version 1.0.xx

Issue #3915 resolved
Xiaojun Lin created an issue

We are developing with openstack. Lately we’ve upgraded openstack from Juno to Mitaka, with sqlalchemy from 0.8.6 to 1.0.11. Some performance degradation of database queries was observed. Especially when the query involves join of 10 or more tables.

We did some tests by executing join queries of 10 tables using sqlalchemy 0.8.6 and 1.0.11 and compared the average time spent in a single query.

It comes out that for the same query, sqlalchemy 1.0.11 takes 113ms on average while 0.8.6 takes 86ms. With some operations take hundreds of such queries, the time becomes noticeable to users.

The debugging logs show that 1.0.11 version does something more, for example:

DEBUG:sqlalchemy.orm.path_registry:set 'memoized_setups' on path 'EntityRegistry((<Mapper at 0x7f16b6cd8f10; VM>,))' to '{}' DEBUG:sqlalchemy.orm.path_registry:set 'eager_row_processor' on path 'Mapper|VM|t_vms -> VM.users1' to '<sqlalchemy.orm.util.ORMAdapter object at 0x7f16b1c08b90>' DEBUG:sqlalchemy.orm.path_registry:set 'memoized_setups' on path 'EntityRegistry((<Mapper at 0x7f16b6cd8f10; VM>, <RelationshipProperty at 0x7f16b6cd5cf8; users1>, <Mapper at 0x7f16b6cd8f10; VM>))' to '{}' DEBUG:sqlalchemy.orm.path_registry:set 'eager_row_processor' on path 'Mapper|VM|t_vms -> VM.users2' to '<sqlalchemy.orm.util.ORMAdapter object at 0x7f16b1bb1c50>' DEBUG:sqlalchemy.orm.path_registry:set 'memoized_setups' on path 'EntityRegistry((<Mapper at 0x7f16b6cd8f10; VM>, <RelationshipProperty at 0x7f16b6cd5d70; users2>, <Mapper at 0x7f16b6cd8f10; VM>))' to '{}' DEBUG:sqlalchemy.orm.path_registry:set 'eager_row_processor' on path 'Mapper|VM|t_vms -> VM.users3' to '<sqlalchemy.orm.util.ORMAdapter object at 0x7f16b1bcd990>' DEBUG:sqlalchemy.orm.path_registry:set 'memoized_setups' on path 'EntityRegistry((<Mapper at 0x7f16b6cd8f10; VM>, <RelationshipProperty at 0x7f16b6cd5de8; users3>, <Mapper at 0x7f16b6cd8f10; VM>))' to '{}' DEBUG:sqlalchemy.orm.path_registry:set 'eager_row_processor' on path 'Mapper|VM|t_vms -> VM.users4' to '<sqlalchemy.orm.util.ORMAdapter object at 0x7f16b1b6b590>' DEBUG:sqlalchemy.orm.path_registry:set 'memoized_setups' on path 'EntityRegistry((<Mapper at 0x7f16b6cd8f10; VM>, <RelationshipProperty at 0x7f16b6cd5e60; users4>, <Mapper at 0x7f16b6cd8f10; VM>))' to '{}' DEBUG:sqlalchemy.orm.path_registry:set 'eager_row_processor' on path 'Mapper|VM|t_vms -> VM.users5' to '<sqlalchemy.orm.util.ORMAdapter object at 0x7f16b1b76f10>' DEBUG:sqlalchemy.orm.path_registry:set 'memoized_setups' on path 'EntityRegistry((<Mapper at 0x7f16b6cd8f10; VM>, <RelationshipProperty at 0x7f16b6cd5ed8; users5>, <Mapper at 0x7f16b6cd8f10; VM>))' to '{}' DEBUG:sqlalchemy.orm.path_registry:set 'eager_row_processor' on path 'Mapper|VM|t_vms -> VM.users6' to '<sqlalchemy.orm.util.ORMAdapter object at 0x7f16b1b15d50>' DEBUG:sqlalchemy.orm.path_registry:set 'memoized_setups' on path 'EntityRegistry((<Mapper at 0x7f16b6cd8f10; VM>, <RelationshipProperty at 0x7f16b6cd5f50; users6>, <Mapper at 0x7f16b6cd8f10; VM>))' to '{}' DEBUG:sqlalchemy.orm.path_registry:set 'eager_row_processor' on path 'Mapper|VM|t_vms -> VM.users7' to '<sqlalchemy.orm.util.ORMAdapter object at 0x7f16b1b36c10>' DEBUG:sqlalchemy.orm.path_registry:set 'memoized_setups' on path 'EntityRegistry((<Mapper at 0x7f16b6cd8f10; VM>, <RelationshipProperty at 0x7f16b6ce4050; users7>, <Mapper at 0x7f16b6cd8f10; VM>))' to '{}' DEBUG:sqlalchemy.orm.path_registry:set 'eager_row_processor' on path 'Mapper|VM|t_vms -> VM.users8' to '<sqlalchemy.orm.util.ORMAdapter object at 0x7f16b1ad28d0>' DEBUG:sqlalchemy.orm.path_registry:set 'memoized_setups' on path 'EntityRegistry((<Mapper at 0x7f16b6cd8f10; VM>, <RelationshipProperty at 0x7f16b6ce40c8; users8>, <Mapper at 0x7f16b6cd8f10; VM>))' to '{}' DEBUG:sqlalchemy.orm.path_registry:set 'eager_row_processor' on path 'Mapper|VM|t_vms -> VM.users9' to '<sqlalchemy.orm.util.ORMAdapter object at 0x7f16b1b36f90>' DEBUG:sqlalchemy.orm.path_registry:set 'memoized_setups' on path 'EntityRegistry((<Mapper at 0x7f16b6cd8f10; VM>, <RelationshipProperty at 0x7f16b6ce4140; users9>, <Mapper at 0x7f16b6cd8f10; VM>))' to '{}' DEBUG:sqlalchemy.orm.path_registry:set 'eager_row_processor' on path 'Mapper|VM|t_vms -> VM.users10' to '<sqlalchemy.orm.util.ORMAdapter object at 0x7f16b1b76650>' DEBUG:sqlalchemy.orm.path_registry:set 'memoized_setups' on path 'EntityRegistry((<Mapper at 0x7f16b6cd8f10; VM>, <RelationshipProperty at 0x7f16b6ce41b8; users10>, <Mapper at 0x7f16b6cd8f10; VM>))' to '{}'

But I’ve no idea what it is doing, and I am not sure if it is the case.

Does anyone who also encounter the same problem? Or are there any suggestions on how to deal with the performance degradation?

Comments (23)

  1. Mike Bayer repo owner

    This looks like query setup stuff but you absolutely need to provide a succinct test script illustrating the queries in use. The "baked" query system also mitigates query setup overhead, as I develop openstack also here's some proposed code for Neutron that would reduce this overhead:. https://review.openstack.org/#/c/430973/

    Also note that Python logging is slow and not very informative (since how do you know 0.8 logs everything that 1.0 does?). You need to use cProfile to get meaningful profiling results, however when I receive your test code I can run that.

    So please attach an MCVE. can't do anything without specific examples (as the bug posting guidelines include). Thanks!

  2. Xiaojun Lin reporter

    Great! The code demo you provided is exactly what we need, since the actual problem we encountered is slowness of neutron port show interface. We also tried to adapt the "baked" way, but it involves too much code modifications of neutron, and our knowledge to sqlalchemy is not so sufficient. We'll try to adapt your code demo and see how the result comes. On the other hand, I think we should still look into the performance issue of sqlalchemy itself. I will provide the test script as soon as possible.

    Thank you again for the quick response!

  3. Mike Bayer repo owner

    OK, so this is the same thing I already heard about, as Neutron has this very heavy reliance on a single enormous joinedload query that they hammer on like mad, this is an area that I've not tried to optimize as normally, if one is doing a JOIN of ten tables they are already in the weeds.

    I can see the overhead is related to aliased columns so we'll start with this profile that hits a lot of the things I saw in the neutron profile, doesn't actually hit the DB just generates the SQL:

    from sqlalchemy import Table, MetaData, create_engine, Column, Integer, \
        ForeignKey
    from sqlalchemy.orm import Session, relationship, mapper, joinedload_all
    import cProfile
    import StringIO
    import pstats
    import contextlib
    
    
    def make_some_columns():
        return [
            Column('c%d' % i, Integer)
            for i in range(10)
        ]
    
    metadata = MetaData()
    a = Table(
        'a',
        metadata,
        Column('id', Integer,
               primary_key=True),
        *make_some_columns()
    )
    b = Table(
        'b',
        metadata,
        Column('id', Integer,
               primary_key=True),
        Column('a_id', ForeignKey('a.id')),
        *make_some_columns()
    )
    c = Table(
        'c',
        metadata,
        Column('id', Integer,
               primary_key=True),
        Column('b_id', ForeignKey('b.id')),
        *make_some_columns()
    )
    d = Table(
        'd',
        metadata,
        Column('id', Integer,
               primary_key=True),
        Column('c_id', ForeignKey('c.id')),
        *make_some_columns()
    )
    e = Table(
        'e',
        metadata,
        Column('id', Integer,
               primary_key=True),
        Column('a_id', ForeignKey('a.id')),
        *make_some_columns()
    )
    f = Table(
        'f',
        metadata,
        Column('id', Integer,
               primary_key=True),
        Column('e_id', ForeignKey('e.id')),
        *make_some_columns()
    )
    g = Table(
        'g',
        metadata,
        Column('id', Integer,
               primary_key=True),
        Column('e_id', ForeignKey('e.id')),
        *make_some_columns()
    )
    
    
    class Basic(object):
        def __init__(self, **kw):
            for k in kw:
                setattr(self, k, kw[k])
    
    
    class A(Basic):
        pass
    
    
    class B(Basic):
        pass
    
    
    class C(Basic):
        pass
    
    
    class D(Basic):
        pass
    
    
    class E(Basic):
        pass
    
    
    class F(Basic):
        pass
    
    
    class G(Basic):
        pass
    
    
    mapper(A, a, properties={
        'bs': relationship(B),
        'es': relationship(E)
    })
    mapper(B, b, properties={
        'cs': relationship(C)
    })
    mapper(C, c, properties={
        'ds': relationship(D)
    })
    mapper(D, d)
    mapper(E, e, properties={
        'fs': relationship(F),
        'gs': relationship(G)
    })
    mapper(F, f)
    mapper(G, g)
    
    e = create_engine("sqlite://")
    metadata.create_all(e)
    s = Session(e)
    s.add(
        A(
            bs=[B(cs=[C(ds=[D()])]), B(cs=[C()])],
            es=[E(fs=[F()], gs=[G()])]
        )
    )
    s.commit()
    
    
    @contextlib.contextmanager
    def profiled():
        pr = cProfile.Profile()
        pr.enable()
        yield
        pr.disable()
        s = StringIO.StringIO()
        ps = pstats.Stats(pr, stream=s).sort_stats('cumulative')
        ps.print_stats()
        # uncomment this to see who's calling what
        # ps.print_callers()
        print(s.getvalue())
    
    sess = Session(e)
    
    
    with profiled():
        for i in range(100):
            q = sess.query(A).options(
                joinedload_all(A.bs, B.cs, C.ds),
                joinedload_all(A.es, E.fs),
                joinedload_all(A.es, E.gs),
            )
            q._compile_context()
    

    The internals of aliasing and eager loading have gained lots of fortification in their architectures to be as consistent as possible across an enormous range of scenarios. In this architecture we have various "getter" objects that tend to be expensive as they aren't doing enough caching and are not cached themselves in the eager load case because they are generated each time.

    We see performance degradation as follows:

    #!
    
    # 1.1: 1381820 function calls (1370906 primitive calls) in 1.925 seconds
    # 1.0: 1380001 function calls (1369086 primitive calls) in 1.905 seconds
    # 0.9: 1196726 function calls (1185612 primitive calls) in 1.648 seconds
    # 0.8: 1073223 function calls (1061809 primitive calls) in 1.333 seconds
    

    in particular there's a lot of time spent on column mechanics that can be cached. The 0.9/0.8 versions have this overhead too, but we'll see if trying to reduce that in 1.1 makes up for the additional structural work.

  4. Mike Bayer repo owner

    Gerrit at https://gerrit.sqlalchemy.org/#/c/311/, please try this out as you'll note the above test case goes from 1381820 function calls to:

    499604 function calls (489104 primitive calls) in 0.658 seconds
    

    e.g. about 270% fewer calls. However, this is very specific to the usage of joinedload() and lazy="joined"; to the degree that the slowness is due to regular query.join() or subqueryload(), that's not yet addressed. But if we can see if just hitting joinedload() makes an impact, that would be very helpful.

  5. Mike Bayer repo owner

    for "subquery" loading, it would be much more difficult to cut down on the time spent with aliased objects as they are made on the fly against the source Query object, which is produced on the outside. There doesn't seem to be much of a quick win in the subquery eager loading department.

  6. Mike Bayer repo owner
    • changed milestone to 1.1.x

    neutron also tells me that as of a week ago they've modified those queries a lot to no longer use joinedload, instead are using more subqueryload :(

  7. Xiaojun Lin reporter

    Hi, sorry for my late response, busy doing tests :P

    We tried the code you provided at https://gerrit.sqlalchemy.org/#/c/311/ . Preliminary tests show that the time spent really goes down. That's encouraging!

    We ran a 100 times iteration of neutron port DB query and recorded the time spent, here's the result:

    First and second column of time is from openstack mitaka before and after the optimization, the third column is from openstack juno.

    after before Juno

    2738ms 4799ms 1162ms

    2729ms 4682ms 1168ms

    2671ms 4688ms 1190ms

    2675ms 4715ms 1129ms

    2655ms 4936ms 1125ms

    We can see a nearly 50% of time optimization, that's an exiting result. However, compared to openstack juno, there's still some gap. But maybe that's because of the hardware difference or maybe it's introduced by neutron itself, for the number of tables to join when querying a port increases from 10 to 16 (not so certain about the number). Still need further analysis.

    We'll keep on looking into this. I'll update for any further progress.

    Thank you and have a nice weekend :)

  8. Mike Bayer repo owner

    the 1.0 / 1.1 series are overall vastly better performing than the 0.8 series, and this is measurable in terms of lots of different tests. Without specifics I can't offer any insight on the specific performance issue you are seeing - e.g. MCVE or at the very least cProfile files. The joinedloader() issue here is one that was demonstrated via cProfile file, and as a result I was able to take the time spent for the above test case to less than half the number of function calls that 0.8, 0.9 or any other version spends.

  9. Mike Bayer repo owner

    Memoize AliasedClass, ClauseAdapter objects in joined eager load

    Addressed some long unattended performance concerns within the joined eager loader query construction system. The use of ad-hoc :class:.AliasedClass objects per query, which produces lots of column lookup overhead each time, has been replaced with a cached approach that makes use of a small pool of :class:.AliasedClass objects that are reused between invocations of joined eager loading. Callcount reduction of SQL query generation for worst-case joined loader scenarios (lots of joins, lots of columns) is reduced by approximately 270%.

    Change-Id: Ie43beecf37945b2bb7fff0aaa597a597293daa18 Fixes: #3915

    → <<cset 1a7e8d77bae2>>

  10. Mike Bayer repo owner

    given the above test, localize to just the fetch:

    q = sess.query(A).options(
        joinedload_all(A.bs, B.cs, C.ds),
        joinedload_all(A.es, E.fs),
        joinedload_all(A.es, E.gs),
    )
    
    context = q._compile_context()
    
    with profiled():
        for i in range(100):
            s = Session(e)
            obj = q.with_session(s)._execute_and_instances(context)
            list(obj)
    

    with cexts still observing:

    644237 function calls (611362 primitive calls) in 0.934 seconds

    on master w/ patch and

    502992 function calls (478209 primitive calls) in 0.779 seconds

    0.8.6. The overhead is certainly not in resultproxy and compilation is the same on both so the difference is within _instances().

  11. Mike Bayer repo owner

    a key part of this is the newer path mechanics which are much more comprehensive, but as always because we have lots of joins that are not default we are spending a lot of time building new paths and calling the memoized props off of them.

  12. Mike Bayer repo owner

    for this run, here's most of it:

    diff --git a/lib/sqlalchemy/orm/interfaces.py b/lib/sqlalchemy/orm/interfaces.py
    index 3fad83e..de03daf 100644
    --- a/lib/sqlalchemy/orm/interfaces.py
    +++ b/lib/sqlalchemy/orm/interfaces.py
    @@ -472,14 +472,17 @@ class StrategizedProperty(MapperProperty):
    
             # search among: exact match, "attr.*", "default" strategy
             # if any.
    -        for path_key in (
    -            search_path._loader_key,
    -            search_path._wildcard_path_loader_key,
    -            search_path._default_path_loader_key
    -        ):
    -            if path_key in context.attributes:
    -                load = context.attributes[path_key]
    -                break
    +        load = context.attributes.get(
    +            ("loader", search_path.path)
    +        )
    +        #for path_key in (
    +        #    search_path._loader_key,
    +        #    search_path._wildcard_path_loader_key,
    +        #    search_path._default_path_loader_key
    +        #):
    +        #    if path_key in context.attributes:
    +        #        load = context.attributes[path_key]
    +        #        break
    
             return load
    

    this brings it down to 553231 function calls (520356 primitive calls) in 0.799 seconds.

  13. Mike Bayer repo owner

    Performance within instances()

    Continuing from Ie43beecf37945b2bb7fff0aaa597a597293daa18, also observed is the overhead of PathRegsitry memoized token functions, as these paths are not cached in the case of long joinedloader chains. The memoizations here were made with short paths in mind, and have been replaced with an inlined straight create of these paths up front, producing callcounts very similar to 0.8. Combined with the previous optimizations, 1.1 now runs the "joined eager load of one row" worst case test in about 40% fewer calls than 0.8 and 60% fewer than 1.1.5.

    Change-Id: Ib5e1c1345a1dd8edfbdb3fed06eb717d4e164d31 Fixes: #3915

    → <<cset fe1cabb88a06>>

  14. Xiaojun Lin reporter

    Hi Michael,

    I merged the 2 patches back to sqlalchemy 1.0.11, it has been running well for the past month. But lately when we were testing getting token from keystone concurrently, some errors occurred randomly. The errors appear every 1 or 2 hours. The traceback is as below:

    File "/usr/lib/python2.7/site-packages/keystone/common/wsgi.py", line 249, in __call__     result = method(context, **params)   
    File "/usr/lib/python2.7/site-packages/keystone/auth/controllers.py", line 416, in authenticate_for_token     parent_audit_id=token_audit_id)   
    File "/usr/lib/python2.7/site-packages/keystone/common/manager.py", line 124, in wrapped     __ret_val = __f(*args, **kwargs)   
    File "/usr/lib/python2.7/site-packages/keystone/token/provider.py", line 389, in issue_v3_token     parent_audit_id)   
    File "/usr/lib/python2.7/site-packages/keystone/token/providers/common.py", line 621, in issue_v3_token     audit_info=parent_audit_id)   
    File "/usr/lib/python2.7/site-packages/keystone/token/providers/common.py", line 526, in get_token_data     project_id, trust)   
    File "/usr/lib/python2.7/site-packages/keystone/token/providers/common.py", line 469, in _populate_service_c     user_id, project_id)   
    File "/usr/lib/python2.7/site-packages/keystone/common/manager.py", line 124, in wrapped     __ret_val = __f(*args, **kwargs)   
    File "/usr/lib/python2.7/site-packages/dogpile/cache/region.py", line 1053, in decorate     should_cache_fn)   
    File "/usr/lib/python2.7/site-packages/dogpile/cache/region.py", line 657, in get_or_create     async_creator) as value:   
    File "/usr/lib/python2.7/site-packages/dogpile/core/dogpile.py", line 158, in __enter__     return self._enter()   
    File "/usr/lib/python2.7/site-packages/dogpile/core/dogpile.py", line 98, in _enter     generated = self._enter_create(createdtime)   
    File "/usr/lib/python2.7/site-packages/dogpile/core/dogpile.py", line 149, in _enter_create     created = self.creator()   
    File "/usr/lib/python2.7/site-packages/dogpile/cache/region.py", line 625, in gen_value     created_value = creator()   
    File "/usr/lib/python2.7/site-packages/dogpile/cache/region.py", line 1049, in creator     return fn(*arg, **kw)   
    File "/usr/lib/python2.7/site-packages/keystone/catalog/core.py", line 304, in get_v3_catalog     return self.driver.get_v3_catalog(user_id, tenant_id)   
    File "/usr/lib/python2.7/site-packages/keystone/catalog/backends/sql.py", line 344, in get_v3_catalog     sql.joinedload(Service.endpoints)).all())   
    File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 2588, in all     return list(self)   
    File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/loading.py", line 86, in instances     util.raise_from_cause(err)   
    File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 200, in raise_from_cause     reraise(type(exception), exception, tb=exc_tb)   
    File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/loading.py", line 53, in instances     for query_entity in query._entities   
    File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 3538, in row_processor     polymorphic_discriminator=self._polymorphic_discriminator   
    File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/loading.py", line 326, in _instance_processor     context, path, mapper, result, adapter, populators)   
    File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/interfaces.py", line 532, in create_row_processor     mapper, result, adapter, populators)   
    File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/strategies.py", line 1536, in create_row_processor     eager_adapter)   
    File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/loading.py", line 314, in _instance_processor     col = adapter.columns[col]   
    File "/usr/lib64/python2.7/site-packages/sqlalchemy/sql/util.py", line 563, in __getitem__     self.parent.include_fn and not self.parent.include_fn(key)   
    File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/util.py", line 307, in _include_fn     entity = elem._annotations.get('parentmapper', None) 
    AttributeError: 'NoneType' object has no attribute '_annotations' 
    

    Then we removed the 2 patches from the code and ran the test again, the error didn't show up again for almost a half day. So I assume it may have something to do with the patches. Maybe it's because that I merged them directly to 1.0.11, while the patches are for 1.1.x ?

    Could you kindly take a look at this problem and find out a possible cause?

    A piece of information maybe useful: The keystone service is running on apache using mod_wsgi under the worker mode, meaning that the service is running in a multi processing way and the child processes are restarted after processing some number of requests.

  15. Log in to comment