topological.organize_as_tree may create an overly deep tree

Issue #1081 resolved
Mike Bayer repo owner created an issue

this script illustrates topological.organize_as_tree creating a very deep tree as a result of many dependencies that are not related to each other; this occurs within unitofwork's _sort_circular_dependencies. The script can cause a recursion overflow during the flush process, but it's very heisenbuggy; running it on different versions/platforms will have the bug appear, then not. In particular it seems to require both dont_generate_error() and generate_error() to run fully in order to trip it.

But the main idea is that organize_as_tree needs to be more intelligent about this kind of graph so that the tree is not very deep.
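For illustration, here is a minimal sketch of what "more intelligent" could mean (this is not SQLAlchemy's actual code; `topological_levels` and the string node names are hypothetical): a Kahn-style sort that emits dependency *levels* instead of a tree. Unrelated nodes land in the same level, so nothing is nested deeper than the longest real dependency chain.

```python
from collections import defaultdict, deque

def topological_levels(tuples, allitems):
    # edges run parent -> child, matching the (parent, child) dependency tuples
    children = defaultdict(list)
    indegree = dict.fromkeys(allitems, 0)
    for parent, child in tuples:
        children[parent].append(child)
        indegree[child] += 1

    # Kahn's algorithm, batched: everything currently dependency-free
    # forms one level, regardless of how many unrelated nodes there are
    queue = deque(n for n in allitems if indegree[n] == 0)
    levels = []
    while queue:
        level, queue = list(queue), deque()
        levels.append(level)
        for n in level:
            for c in children[n]:
                indegree[c] -= 1
                if indegree[c] == 0:
                    queue.append(c)
    return levels

# one Q, ten WCs, ten As; each A depends on the Q and on its own WC
deps = [("Q", "A%d" % i) for i in range(10)] + \
       [("WC%d" % i, "A%d" % i) for i in range(10)]
items = ["Q"] + ["WC%d" % i for i in range(10)] + ["A%d" % i for i in range(10)]
print(topological_levels(deps, items))
# two levels only: the Q plus all the WCs first, then every A
```

The depth here is 2 no matter how many As are added, which is the shape the dependency graph in this ticket actually has.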

import sqlalchemy as sa
from sqlalchemy import create_engine, MetaData, orm
from sqlalchemy import Column, ForeignKey
from sqlalchemy import Integer, String
from sqlalchemy.orm import mapper

class Object(object):
    pass

class Q(Object):
    pass

class A(Object):
    pass

class C(Object):
    pass

class WC(C):
    pass

engine = create_engine('sqlite:///:memory:', echo=True)

sm = orm.sessionmaker(autoflush=True, transactional=True, bind=engine)

SA_Session = orm.scoped_session(sm)

SA_Metadata = MetaData()

object_table = sa.Table('Object',
                          SA_Metadata,
                          Column('ObjectID', Integer,primary_key=True),
                          Column('Type', String(1), nullable=False))

q_table = sa.Table('Q',
                   SA_Metadata,
                   Column('QID', Integer, ForeignKey('Object.ObjectID'),primary_key=True))

c_table = sa.Table('C',
                   SA_Metadata,
                   Column('CID', Integer, ForeignKey('Object.ObjectID'),primary_key=True))

wc_table = sa.Table('WC',
                    SA_Metadata,
                    Column('WCID', Integer, ForeignKey('C.CID'), primary_key=True))

a_table = sa.Table('A',
                   SA_Metadata,
                   Column('AID', Integer, ForeignKey('Object.ObjectID'),primary_key=True),
                   Column('QID', Integer, ForeignKey('Q.QID')),
                   Column('CID', Integer, ForeignKey('C.CID')))

mapper(Object, object_table, polymorphic_on=object_table.c.Type, polymorphic_identity='O')

mapper(Q, q_table, inherits=Object, polymorphic_identity='Q')
mapper(C, c_table, inherits=Object, polymorphic_identity='C')
mapper(WC, wc_table, inherits=C, polymorphic_identity='W')

mapper(A, a_table, inherits=Object, polymorphic_identity='A',
       properties = {
                     'Q' : orm.relation(Q,primaryjoin=a_table.c.QID==q_table.c.QID,
                                        backref='As'
                                        ),
                     'C' : orm.relation(C,primaryjoin=a_table.c.CID==c_table.c.CID,
                                        backref='A',
                                        uselist=False)
                     }
       )

SA_Metadata.create_all(engine)

def generate_error():
    q = Q()
    for j in range(400): # at 306 iterations the error does not pop out (depending on recursion depth)
        a = A()
        a.Q = q
        a.C = WC()

    SA_Session.save(q)
    SA_Session.commit() # here the error pops out

def dont_generate_error():
    q = Q()
    SA_Session.save(q)
    for j in range(600):
        a = A()
        a.Q = q
        a.C = WC()
        SA_Session.commit()

dont_generate_error()

generate_error()

If you comment out dont_generate_error(), change the number of iterations in generate_error() from 400 to 10, and then place a pdb.set_trace() on line 734 of unitofwork.py, you get results like these for tuples and head:

(Pdb) [(x.class_, y.class_) for x, y in tuples]
[(<class '__main__.Q'>, <class '__main__.A'>), (<class '__main__.Q'>, <class '__main__.A'>), (<class '__main__.Q'>, <class '__main__.A'>), (<class '__main__.Q'>, <class '__main__.A'>), (<class '__main__.Q'>, <class '__main__.A'>), (<class '__main__.WC'>, <class '__main__.A'>), (<class '__main__.WC'>, <class '__main__.A'>), (<class '__main__.WC'>, <class '__main__.A'>), (<class '__main__.WC'>, <class '__main__.A'>), (<class '__main__.WC'>, <class '__main__.A'>), (<class '__main__.Q'>, <class '__main__.A'>), (<class '__main__.WC'>, <class '__main__.A'>), (<class '__main__.Q'>, <class '__main__.A'>), (<class '__main__.WC'>, <class '__main__.A'>), (<class '__main__.Q'>, <class '__main__.A'>), (<class '__main__.WC'>, <class '__main__.A'>), (<class '__main__.Q'>, <class '__main__.A'>), (<class '__main__.WC'>, <class '__main__.A'>), (<class '__main__.Q'>, <class '__main__.A'>), (<class '__main__.WC'>, <class '__main__.A'>)]

Above you can see that each A is dependent on the Q as well as on its WC. But that's it; a tree of these dependencies should place all the As as a straight-down list after the WCs and the single Q. But looking at head, which is the tree:

(Pdb) head
(<sqlalchemy.orm.attributes.InstanceState object at 0x2ac75ca9b550>, [], [(<sqlalchemy.orm.attributes.InstanceState object at 0x2ac75ca9b690>, [], [(<sqlalchemy.orm.attributes.InstanceState object at 0x2ac75ca95150>, [], [(<sqlalchemy.orm.attributes.InstanceState object at 0x2ac75ca9b390>, [], [(<sqlalchemy.orm.attributes.InstanceState object at 0x2ac75ca95350>, [], [])]), (<sqlalchemy.orm.attributes.InstanceState object at 0x2ac75ca9b7d0>, [], [(<sqlalchemy.orm.attributes.InstanceState object at 0x2ac75ca9b810>, [], [])]), (<sqlalchemy.orm.attributes.InstanceState object at 0x2ac75ca9b410>, [], [(<sqlalchemy.orm.attributes.InstanceState object at 0x2ac75ca9b450>, [], [])]), (<sqlalchemy.orm.attributes.InstanceState object at 0x2ac75ca9b590>, [], []), (<sqlalchemy.orm.attributes.InstanceState object at 0x2ac75ca9b6d0>, [], [])])])])

(Pdb) head[2][0][2][0][2][0][2]
[(<sqlalchemy.orm.attributes.InstanceState object at 0x2ac75ca95350>, [], [])]
(Pdb)

the structure is deeply nested. When the number of objects is upped to the hundreds we get recursion overflows when this tree is acted upon.
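The overflow itself is easy to reproduce in isolation. Below is a minimal sketch (plain tuples standing in for InstanceState nodes; `make_chain` and the visit functions are hypothetical names, written for modern Python): a recursive visit of a deep (item, cycles, children) chain blows the interpreter's recursion limit, while an explicit-stack walk of the same tree does not.

```python
import sys

def make_chain(depth):
    # build a (item, cycles, children) chain nested `depth` levels deep
    node = ("leaf", [], [])
    for i in range(depth):
        node = ("node%d" % i, [], [node])
    return node

def visit_recursive(node):
    # one Python stack frame per tree level: overflows on deep chains
    item, cycles, children = node
    return 1 + sum(visit_recursive(c) for c in children)

def visit_iterative(node):
    # explicit stack instead of the call stack: depth no longer matters
    count, stack = 0, [node]
    while stack:
        item, cycles, children = stack.pop()
        count += 1
        stack.extend(children)
    return count

deep = make_chain(sys.getrecursionlimit() * 2)
try:
    visit_recursive(deep)
except RecursionError:
    print("recursive visit overflowed")
print(visit_iterative(deep))  # fine at any depth
```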

Comments (12)

  1. Mike Bayer reporter

    an alternative here is to do away with the whole "tree" thing and modify the way UOWExecutor works so that it is not using nested calls to handle "cyclical" dependencies. Or just inlining UOWExecutor. But it would be nicer to do away with the "tree" concept completely.

  2. Mike Bayer reporter

    at this point the error is in UOWExecutor. A little trickier to make that non-recursive.
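    A minimal sketch of that kind of refactor in the abstract (hypothetical names, not the actual UOWExecutor code): the recursive "execute node, then its children" pattern replaced by an explicit stack, preserving parent-before-children order without consuming the Python call stack.

```python
def execute_tree(head, execute):
    # pre-order walk of a (item, cycles, children) tree without recursion
    stack = [head]
    while stack:
        item, cycles, children = stack.pop()
        execute(item)                     # parent first...
        stack.extend(reversed(children))  # ...then children, left to right

order = []
tree = ("a", [], [("b", [], [("d", [], [])]), ("c", [], [])])
execute_tree(tree, order.append)
print(order)  # pre-order: ['a', 'b', 'd', 'c']
```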

  3. Mike Bayer reporter
    • changed milestone to 0.7.0

    the entire tree notion could be taken out based on the idea at #1742. If we're able to proceed with that then this ticket is implicitly fixed.

  4. jek

    Trying the branch with my codebase I get a ton of this:

      File "uow_rethink/lib/sqlalchemy/orm/query.py", line 1343, in all
        return list(self)
      File "uow_rethink/lib/sqlalchemy/orm/query.py", line 1450, in __iter__
        self.session._autoflush()
      File "uow_rethink/lib/sqlalchemy/orm/session.py", line 879, in _autoflush
        self.flush()
      File "uow_rethink/lib/sqlalchemy/orm/session.py", line 1347, in flush
        self._flush(objects)
      File "uow_rethink/lib/sqlalchemy/orm/session.py", line 1425, in _flush
        flush_context.execute()
      File "uow_rethink/lib/sqlalchemy/orm/unitofwork.py", line 198, in execute
        for rec in cycles
      File "uow_rethink/lib/sqlalchemy/orm/unitofwork.py", line 198, in <genexpr>
        for rec in cycles
      File "uow_rethink/lib/sqlalchemy/orm/unitofwork.py", line 371, in per_state_flush_actions
        False):
      File "uow_rethink/lib/sqlalchemy/orm/mapper.py", line 1312, in _per_state_flush_actions
        for rec in prop.per_state_flush_actions(uow, states_for_prop, isdelete):
      File "uow_rethink/lib/sqlalchemy/orm/properties.py", line 1222, in per_state_flush_actions
        for rec in self._dependency_processor.per_state_flush_actions(uow, state, isdelete):
    TypeError: 'NoneType' object is not iterable
    
  5. Mike Bayer reporter

    apparently you're using mappings that aren't covered in any tests. The issue you're having is most likely solved in the latest tip of that branch. The new uow is more deliberate about various contingencies and requires more test coverage than the old version.
