Delayed configuration of polymorphic subclasses leads to duplicate backref event handler registration

Issue #3710 resolved
Chris Wilson created an issue

We have a complex system with mapped classes defined in many different files, used by different applications. Therefore we use derived-map polymorphic loading to load only those classes which are needed, sometimes when processing query results.

We have been running into problems with SQLAlchemy generating SQL that tries to violate database constraints, for example trying to insert the same row twice in a many-to-many association secondary table:

IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique constraint "cons_m2m_primary_key"
DETAIL:  Key (id_foo, id_bar)=(182145, 29586) already exists.
 [SQL: 'INSERT INTO m2m (id_foo, id_bar) VALUES (%(id_foo)s, %(id_bar)s)'] [parameters: ({'id_foo': 182145, 'id_bar': 29586}, {'id_foo': 182145, 'id_bar': 29586})]

We tracked this down to delayed initialisation of polymorphic classes and their subclasses resulting in duplicate event listeners being added for the backref relationship. This results in A.bs == [b], but B.as == [a, a]. Depending on the order in which the unitofwork flushes these out to the database, 50% of the time it will try to flush B.as first and generate SQL which violates the database constraints, as above.

Please consider the following abstract example, where the base class Animal is registered first, along with its relationship to Home, and then the polymorphic subclasses Mammal and Dog are loaded and configured in a separate step:

from sqlalchemy import Column, ForeignKey, Integer, Table, Text, create_engine, event
from sqlalchemy.ext.declarative import declarative_base, declared_attr
from sqlalchemy.orm import Mapper, configure_mappers, relationship, sessionmaker
from sqlalchemy.orm.mapper import _mapper_registry

# This event listener may be useful for debugging:
if False:
    @event.listens_for(Mapper, 'mapper_configured')
    def receive_mapper_configured(mapper, class_):
        # assets_allowed = mapper._props.get('assets_allowed')
        homes = getattr(class_, 'homes', None)

        if homes is not None:
            dispatch = list(homes.dispatch.append)

            if len(dispatch) > 2:
                print "{} failed: {} ({})".format(class_, mapper, dispatch)

            assert len(dispatch) <= 2, "{}: {}".format(class_, dispatch)

        print "{} configured: {} ({})".format(class_, mapper, repr(homes))

_mapper_registry.clear()

Base = declarative_base()

# Register these classes first, including the concrete polymorphic base class with the M2M association:

class Home(Base):
    __tablename__ = 'home'
    id = Column(Integer, primary_key=True)
    name = Column(Text)

class Animal(Base):
    __tablename__ = 'mammal'

    id = Column(Integer, primary_key=True)
    name = Column(Text)
    homes_table =  Table("mammal_home",
                         Base.metadata,
                         Column('id_mammal', Integer, ForeignKey(id), primary_key=True),
                         Column('id_home', Integer, ForeignKey(Home.id), primary_key=True)
                         )

    homes = relationship(Home, secondary=homes_table, backref='animals')
    species = Column(Text)

    @declared_attr
    def __mapper_args__(cls):
        return {
            'polymorphic_on':       cls.species,
            'polymorphic_identity': cls.__name__,
        }

# Register the first set of classes and create their Mappers
configure_mappers()

# Simulate dynamic loading of additional mapped classes
class Mammal(Animal):
    pass

class Dog(Mammal):
    pass

# These new classes should not be configured at this point:
unconfigured = [m for m in _mapper_registry if not m.configured]
assert len(unconfigured) == 2, str(unconfigured)

# Now register them in a separate pass.
#
# If we take the first branch, this test will fail randomly 50% of the time, depending on the order of
# Mammal and Dog in _mapper_registry. The second branch manipulates the registry and calls configure_mappers()
# twice to force the Mappers to register in the wrong order, causing the test to fail every time.
if True:
    configure_mappers()
else:
    old_mapper_registry = dict(_mapper_registry)
    del _mapper_registry[Mammal.__mapper__]
    Mapper._new_mappers = True
    configure_mappers()

    _mapper_registry[Mammal.__mapper__] = old_mapper_registry[Mammal.__mapper__]
    Mapper._new_mappers = True
    configure_mappers()

    # In this case, we know that we have hit the error: the event listeners are registered twice:
    assert len(Dog.homes.dispatch.append) == 4, list(Dog.homes.dispatch.append)

assert len(Home.animals.dispatch.append) == 2, list(Home.animals.dispatch.append)
assert len(Mammal.homes.dispatch.append) == 2, list(Mammal.homes.dispatch.append)
# This assertion will fail 50% of the time if the first branch is taken, so I've commented it out,
# and copied it under the second branch above where it should always "succeed" (indicating that the
# event listeners are wrongly double-configured):
# assert len(Dog.homes.dispatch.append) == 4, list(Dog.homes.dispatch.append)

engine = create_engine('sqlite:///sqlalchemy_example.db')

Base.metadata.create_all(engine)

DBSession = sessionmaker(bind=engine)
session = DBSession(autocommit=True)

session.begin()
home = Home(name="localhost")
dog = Dog(name="fido", homes=[home])
assert len(dog.homes) == 1
assert len(home.animals) == 1, "This fails: Dog is listed twice in home.animals: {}".format(home.animals)
print "Test passed: this will happen 50% of the time if the first branch is taken"

When configure_mappers() is called to configure the subclasses, it loops over the _mapper_registry in random order. Thus, it may configure Dog first and then Mammal, or vice versa.

When it configures Dog and Mammal, it will add two event listeners to their homes.dispatch.append list: append and emit_backref_from_collection_append_event. The latter adds the Dog/Mammal to the backref relationship, unconditionally (even if already present), so it must only be called once.

However, if it configures Dog first and Mammal second, then when it configures Mammal it loops over mapper.self_and_descendants (in _register_attribute) and thus registers the same event listeners on Dog.homes.dispatch.append again. So whenever a Dog is given some homes, it will be added twice to the animals collection on each of those Homes. This does not happen if Mammal is registered first and Dog second, because when post-configuring Dog.homes, mapper.class_manager._attr_has_impl(self.key) returns True and it is not initialised again.

I think the problem is related to the comment on _post_configure_properties which says "This is a deferred configuration step which is intended to execute once all mappers have been constructed." However this function is called whenever configure_mappers() finds unconfigured mappers, not only once. Possibly this changed at some point and violated some assumptions?

One fix would be to check mapper.class_manager._attr_has_impl(key) on each mapper when looping over mapper.self_and_descendants, to avoid registering it twice. However that might be a layering violation.

Another fix might be, when looping over mappers in configure_mappers() looking for unconfigured ones, to walk up the ancestor chain of each unconfigured mapper until we find the most ancient unconfigured one, and configure that first, followed by its children, and their children, and so on.

Comments (8)

  1. Chris Wilson reporter

    Yes I agree, this happens regardless of any user event registration and affects database writes (association flushes). Thanks for looking into it so quickly!

  2. Mike Bayer repo owner

    okey doke if you do a simple test like this:

    from sqlalchemy import *
    from sqlalchemy.orm import *
    from sqlalchemy.ext.declarative import declarative_base
    
    Base = declarative_base()
    
    
    class A(Base):
        __tablename__ = 'a'
        id = Column(Integer, primary_key=True)
        bs = relationship("B")
    
    
    class B(Base):
        __tablename__ = 'b'
        id = Column(Integer, primary_key=True)
        a_id = Column(ForeignKey('a.id'))
    
    configure_mappers()
    
    
    class ASub(A):
        pass
    
    
    class ASubSub(ASub):
        pass
    
    configure_mappers()
    

    and do a print like this:

    diff --git a/lib/sqlalchemy/orm/strategies.py b/lib/sqlalchemy/orm/strategies.py
    index 3c03a68..32288be 100644
    --- a/lib/sqlalchemy/orm/strategies.py
    +++ b/lib/sqlalchemy/orm/strategies.py
    @@ -74,6 +74,9 @@ def _register_attribute(
         for m in mapper.self_and_descendants:
             if prop is m._props.get(prop.key):
    
    +            if prop.key == 'bs':
    +                print "YES! %s" % m
    +
                 desc = attributes.register_attribute_impl(
                     m.class_,
                     prop.key,
    

    you get:

    #!
    
    YES! Mapper|A|a
    YES! Mapper|ASubSub|a
    YES! Mapper|ASub|a
    YES! Mapper|ASubSub|a
    

    the way this function gets called is a little chaotic already, so while ordering the mappers would help, that is yet another "thing on the outside" and I think it's better to make the function a litle more robust so I think the attr_has_impl() check is best:

    diff --git a/lib/sqlalchemy/orm/strategies.py b/lib/sqlalchemy/orm/strategies.py
    index 3c03a68..3e08929 100644
    --- a/lib/sqlalchemy/orm/strategies.py
    +++ b/lib/sqlalchemy/orm/strategies.py
    @@ -72,7 +72,11 @@ def _register_attribute(
             )
    
         for m in mapper.self_and_descendants:
    -        if prop is m._props.get(prop.key):
    +        if prop is m._props.get(prop.key) and \
    +                not m.class_manager._attr_has_impl(prop.key):
    +
    +            if prop.key == 'bs':
    +                print "YES! %s" % m
    
                 desc = attributes.register_attribute_impl(
                     m.class_,
    

    then we only get:

    #!
    
    YES! Mapper|A|a
    YES! Mapper|ASubSub|a
    YES! Mapper|ASub|a
    

    the other question is, is it even OK that the sub-most mapper is set up before the superclass, as I don't think this scenario was anticipated here. I think that's OK as well but I want to add a comment here to note this.

  3. Chris Wilson reporter

    That looks like a good solution :) Thanks!

    I have implemented a workaround while we wait for this to get into a release. I've added a listener to the before_configured event that sorts the mappers into order and configures them itself. This appears to work and should keep us going for the time being.

  4. Mike Bayer repo owner

    Check for duplicate calls to register_attribute_impl

    Fixed bug whereby the event listeners used for backrefs could be inadvertently applied multiple times, when using a deep class inheritance hierarchy in conjunction with mutiple mapper configuration steps.

    Change-Id: I712beaf4674e2323bf5b282922658020a6d00b53 Fixes: #3710

    → <<cset 743e9d458994>>

  5. Mike Bayer repo owner

    Check for duplicate calls to register_attribute_impl

    Fixed bug whereby the event listeners used for backrefs could be inadvertently applied multiple times, when using a deep class inheritance hierarchy in conjunction with mutiple mapper configuration steps.

    Change-Id: I712beaf4674e2323bf5b282922658020a6d00b53 Fixes: #3710 (cherry picked from commit 743e9d4589946f1a29cdec7f2f1a2e4ec0853db7)

    → <<cset a9f74400b179>>

  6. Log in to comment