baked get() implementation needs to consider mapper._get_clause as part of key

Issue #3597 resolved
Taha Jahangir created an issue

Envrionment: CentOS 7 / Python 3.4.2 / SQL Alchemy 1.0.9 App details: uwsgi web app running in multi-process and multi-thread mode (5 thread per process)

It seems it's a race condition in baked query loading (both in baked relations and explicit baked queries) causes these StatementError errors:

Traceback (most recent call last):
  File "site-packages/sqlalchemy/engine/base.py", line 1073, in _execute_context
    context = constructor(dialect, self, conn, *args)
  File "site-packages/sqlalchemy/engine/default.py", line 552, in _init_compiled
    grp, m in enumerate(parameters)]
  File "site-packages/sqlalchemy/engine/default.py", line 552, in <listcomp>
    grp, m in enumerate(parameters)]
  File "site-packages/sqlalchemy/sql/compiler.py", line 476, in construct_params
    % bindparam.key)
sqlalchemy.exc.InvalidRequestError: A value is required for bind parameter '%(140719693927088 param)s'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "query.py", line 146, in baked_get
    return baked_query.get(ident)
  File "site-packages/sqlalchemy/ext/baked.py", line 322, in get
    return query._get_impl(ident, self._load_on_ident)
  File "site-packages/sqlalchemy/orm/query.py", line 862, in _get_impl
    return fallback_fn(self, key)
  File "site-packages/sqlalchemy/ext/baked.py", line 365, in _load_on_ident
    result = list(bq.for_session(self.session).params(**params))
  File "site-packages/sqlalchemy/ext/baked.py", line 254, in __iter__
    with_session(self.session)._execute_and_instances(context)
  File "site-packages/sqlalchemy/orm/query.py", line 2586, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)
  File "site-packages/sqlalchemy/engine/base.py", line 914, in execute
    return meth(self, multiparams, params)
  File "site-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "site-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement
    compiled_sql, distilled_params
  File "site-packages/sqlalchemy/engine/base.py", line 1078, in _execute_context
    None, None)
  File "site-packages/sqlalchemy/engine/base.py", line 1341, in _handle_dbapi_exception
    exc_info
  File "site-packages/sqlalchemy/util/compat.py", line 188, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=exc_value)
  File "site-packages/sqlalchemy/util/compat.py", line 181, in reraise
    raise value.with_traceback(tb)
  File "site-packages/sqlalchemy/engine/base.py", line 1073, in _execute_context
    context = constructor(dialect, self, conn, *args)
  File "site-packages/sqlalchemy/engine/default.py", line 552, in _init_compiled
    grp, m in enumerate(parameters)]
  File "site-packages/sqlalchemy/engine/default.py", line 552, in <listcomp>
    grp, m in enumerate(parameters)]
  File "site-packages/sqlalchemy/sql/compiler.py", line 476, in construct_params
    % bindparam.key)
sqlalchemy.exc.StatementError: (sqlalchemy.exc.InvalidRequestError) A value is required for bind parameter '%(140719693927088 param)s' [SQL: 'SELECT table.col1 AS table_col1, ..., table.col7 AS table_col7 \nFROM table \nWHERE table.col1 = %(param_1)s AND table.col2 = %(param_2)s'] [parameters: [{'%(140719693926920 param)s': 79228, '%(140719693926640 param)s': 90033}]]

And similarly in:

Traceback (most recent call last):
  File "site-packages/sqlalchemy/engine/base.py", line 1073, in _execute_context
    context = constructor(dialect, self, conn, *args)
  File "site-packages/sqlalchemy/engine/default.py", line 552, in _init_compiled
    grp, m in enumerate(parameters)]
  File "site-packages/sqlalchemy/engine/default.py", line 552, in <listcomp>
    grp, m in enumerate(parameters)]
  File "site-packages/sqlalchemy/sql/compiler.py", line 476, in construct_params
    % bindparam.key)
sqlalchemy.exc.InvalidRequestError: A value is required for bind parameter '%(139926623193184 param)s'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "template.py", line 202, in template
    tabl1_obj = obj.tabl1_relation
  File "site-packages/sqlalchemy/orm/attributes.py", line 237, in __get__
    return self.impl.get(instance_state(instance), dict_)
  File "site-packages/sqlalchemy/orm/attributes.py", line 583, in get
    value = self.callable_(state, passive)
  File "site-packages/sqlalchemy/orm/strategies.py", line 532, in _load_for_state
    return self._emit_lazyload(session, state, ident_key, passive)
  File "site-packages/sqlalchemy/ext/baked.py", line 450, in _emit_lazyload
    session.query(self.mapper), ident_key)
  File "site-packages/sqlalchemy/ext/baked.py", line 365, in _load_on_ident
    result = list(bq.for_session(self.session).params(**params))
  File "site-packages/sqlalchemy/ext/baked.py", line 254, in __iter__
    with_session(self.session)._execute_and_instances(context)
  File "site-packages/sqlalchemy/orm/query.py", line 2586, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)
  File "site-packages/sqlalchemy/engine/base.py", line 914, in execute
    return meth(self, multiparams, params)
  File "site-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "site-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement
    compiled_sql, distilled_params
  File "site-packages/sqlalchemy/engine/base.py", line 1078, in _execute_context
    None, None)
  File "site-packages/sqlalchemy/engine/base.py", line 1341, in _handle_dbapi_exception
    exc_info
  File "site-packages/sqlalchemy/util/compat.py", line 188, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=exc_value)
  File "site-packages/sqlalchemy/util/compat.py", line 181, in reraise
    raise value.with_traceback(tb)
  File "site-packages/sqlalchemy/engine/base.py", line 1073, in _execute_context
    context = constructor(dialect, self, conn, *args)
  File "site-packages/sqlalchemy/engine/default.py", line 552, in _init_compiled
    grp, m in enumerate(parameters)]
  File "site-packages/sqlalchemy/engine/default.py", line 552, in <listcomp>
    grp, m in enumerate(parameters)]
  File "site-packages/sqlalchemy/sql/compiler.py", line 476, in construct_params
    % bindparam.key)
sqlalchemy.exc.StatementError: (sqlalchemy.exc.InvalidRequestError) A value is required for bind parameter '%(139926623193184 param)s' [SQL: 'SELECT table1.col1 AS table1_col1, ..., table1.col20 AS table1_col20 \nFROM table1 \nWHERE table1.id = %(param_1)s'] [parameters: [{'%(139926623194696 param)s': 86317}]]

Comments (9)

  1. Mike Bayer repo owner

    hello -

    it would be extremely helpful if you could please try to provide a test case; there's no information here how the baked system or the session itself is being used which might reveal mistakes in usage. In the absence of that, trying basic subtractive things like using single-thread to see if issues resolve, disabling baked lazy loaders but using baked elsewhere, etc. would also serve to add details about the issue.

    Please use the complete test case below as reference, I've tried several variants of this and am not able to reproduce any issues:

    from sqlalchemy import *
    from sqlalchemy.orm import *
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.ext import baked
    
    Base = declarative_base()
    
    
    class A(Base):
        __tablename__ = 'a'
        id = Column(Integer, primary_key=True)
        bs = relationship("B", lazy="baked_select")
    
    
    class B(Base):
        __tablename__ = 'b'
        id = Column(Integer, primary_key=True)
        a_id = Column(ForeignKey('a.id'))
    
    e = create_engine("postgresql://scott:tiger@localhost/test") #, echo=True)
    Base.metadata.drop_all(e)
    Base.metadata.create_all(e)
    
    s = Session(e)
    s.add_all([
        A(id=i, bs=[B() for j in range(5)]) for i in range(1, 100)
    ])
    s.commit()
    
    bakery = baked.bakery()
    
    import threading
    import random
    import time
    
    # works fine up here too, sharing same baked_query among 20 threads
    # baked_query = bakery(lambda session: session.query(A))
    
    
    def run():
        while True:
            baked_query = bakery(lambda session: session.query(A))
            s = Session(e)
            for i in range(5):
                id_ = random.randint(1, 99)
                obj = baked_query(s).get(id_)
                obj.bs
                time.sleep(.0001)
    
            s.close()
            print "Success! loaded object %s and b collection %s" % (
                obj.id, ", ".join(str(b.id) for b in obj.bs)
            )
            time.sleep(.01)
    
    threads = [threading.Thread(target=run) for i in range(20)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    
  2. Mike Bayer repo owner

    what can cause this is that mapper._get_clause is being hit under "dogpile" conditions when a process first begins to access this attribute. If one baked query hits it, caches the query, while at the same time another thread calls _get_clause at the same time, you'd get two different values for the bound parameter keys. _get_clause does not seem to be part of the configure process.

    not sure how to test this normally but if we slow down the function then the script above reproduces:

    diff --git a/lib/sqlalchemy/orm/mapper.py b/lib/sqlalchemy/orm/mapper.py
    index 5ade4b9..a95b0d5 100644
    --- a/lib/sqlalchemy/orm/mapper.py
    +++ b/lib/sqlalchemy/orm/mapper.py
    @@ -2199,6 +2199,9 @@ class Mapper(InspectionAttr):
             """
             params = [(primary_key, sql.bindparam(None, type_=primary_key.type))
                       for primary_key in self.primary_key]
    +        import time
    +        import random
    +        time.sleep(random.random())
             return sql.and_(*[k == v for (k, v) in params]), \
                 util.column_dict(params)
    
  3. Mike Bayer repo owner

    so that said this is the test case:

    from sqlalchemy import *
    from sqlalchemy.orm import *
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.ext import baked
    
    Base = declarative_base()
    
    
    class A(Base):
        __tablename__ = 'a'
        id = Column(Integer, primary_key=True)
    
    e = create_engine("sqlite://")
    Base.metadata.create_all(e)
    
    s = Session(e)
    s.add(A(id=1))
    s.commit()
    
    bakery = baked.bakery()
    
    baked_query = bakery(lambda session: session.query(A))
    s = Session(e)
    obj = baked_query(s).get(1)
    
    del A.__mapper__.__dict__['_get_clause']
    
    
    for i in range(12):
        s = Session(e)
        obj = baked_query(s).get(1)
    
    # ideally would be "2", assuming we re-cache on new _get_clause
    print len(baked_query._bakery)
    
  4. Mike Bayer repo owner
    • Fixed an issue in baked queries where the .get() method, used either directly or within lazy loads, didn't consider the mapper's "get clause" as part of the cache key, causing bound parameter mismatches if the clause got re-generated. This clause is cached by mappers on the fly but in highly concurrent scenarios may be generated more than once when first accessed. fixes #3597

    → <<cset b0308a7b3af9>>

  5. Mike Bayer repo owner
    • Fixed an issue in baked queries where the .get() method, used either directly or within lazy loads, didn't consider the mapper's "get clause" as part of the cache key, causing bound parameter mismatches if the clause got re-generated. This clause is cached by mappers on the fly but in highly concurrent scenarios may be generated more than once when first accessed. fixes #3597

    (cherry picked from commit b0308a7b3af91cc61fbe3347376024ad8b7be019)

    → <<cset 670fd506fe4c>>

  6. Taha Jahangir reporter

    Thanks a lot, ....

    The bug was not occurred always, but only some times.

    uwsgi has a max_request limit, when the limit is reached, uwsgi kills the worker and starts a new fresh one (to avoid possible memory leaks).

    We noticed that sometimes, the new born worker is ill and will produce the exception on every method access. We temporarily removed usage of baked queries to avoid exceptions.

    There is another race condition that (in very rare cases, about 1 in 500,000) results in two different python objects for a single db row. Is it related to this bug or not?

  7. Mike Bayer repo owner

    There is another race condition that (in very rare cases, about 1 in 500,000) results in two different python objects for a single db row. Is it related to this bug or not?

    well, I'm not aware of this issue so I hardly can comment on that without any information either :).

  8. Log in to comment