topological sort relies on hefty recursion at the end

Issue #423 resolved

Mike Bayer repo owner created an issue 2007-01-16

we have to rewrite the _create_batched_tree method to not use recursion the way its doing - very easy for it to run out and its inefficient.

        tuples = [       for i in range(0,1500, 2):
            tuples.append((i, i+1))
        head = DependencySorter(tuples, [](]
)).sort()

== kaboom

Comments (19)

Former user Account Deleted
- attached nonrecursive.treebuilding.v2.patch
Patch making batched treebuilding nonrecursive
- 2007-01-26T02:35:21+00:00
Former user Account Deleted
As this is apparently a source of heisenbugs I tested this very thoroughly. Ensured by hand that the code is isomorphic to original, ran some manually engineered testcases and lots of randomly constructed tests. Other than some valid variations of tree structure when dictionary ordering changed in topological sort, the output is identical.
- 2007-01-26T02:45:38+00:00
Former user Account Deleted
- attached nonrecursive.treebuilding.v3.patch
Improved patch
- 2007-01-26T09:28:47+00:00
Former user Account Deleted
Added an early out for nodes without dependencies, which should be a pretty common case. For lists of independent nodes the asymptotic complexity is now O(n) instead of O(n^2^). Worst case, which is the case in the bug description, is still O(n^2^), although about 7 times faster than before this patch. I don't see a good way to make that any faster either.
- 2007-01-26T09:42:54+00:00
Mike Bayer reporter
im sure its great, the issue was mostly the recursion overflow....im a little short on time this weekend so ill try to review this when i get a chance.

can you post your test cases ? have you been able to run through all SA unit test ? (in particular test/orm/alltests.py , with a good strict database like postgres ?)
- 2007-01-26T14:41:25+00:00

Former user Account Deleted

I have two unit tests failures, but they are exactly the same as before the patch. Namely:

======================================================================
FAIL: test_recycle (engine.pool.PoolTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ants/work/sqlalchemy/trunk/test/engine/pool.py", line 167, in test_recycle
    assert id(c3.connection) != c_id
AssertionError

======================================================================
FAIL: testmanytoonepolymorphic (orm.inheritance5.RelationTest4)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ants/work/sqlalchemy/trunk/test/orm/inheritance5.py", line 296, in testmanytoonepolymorphic
    assert str(usingProperty) == "Engineer E4, status X"
AssertionError

----------------------------------------------------------------------

My own tests were hand crafted nonautomatic tests that I used to verify correctness. The random test generated random graphs and compared output of old algorithm and new algorithm and alerted when they were not exactly the same. I verified for all exceptions that the differences were trivial reorderings due to different topological sort order. The difficult part isn't generating the testcases, but verifying the results. I suppose I could make a testcase processor that can verify that the produced tree is valid by validating that all dependency constraints hold. If you deem it necessary I will create it, but it's going to take a bit as I have to do a nasty database migration in the next week.

2007-01-28T17:44:42+00:00

Former user Account Deleted
Forgot to mention, those results are on Python 2.5, Postgres 8.1.5, x86_64, SVN revision 2255. Just tried with 2267, same failures + 3 errors from orm.inheritance5.
- 2007-01-28T17:49:30+00:00
Mike Bayer reporter
what OS are you on, linux ? (its frequent that tests which pass on OSX fail on linux, usually due to differences in dictionary ordering). im curious if the latest trunk (2271) fixes those particular test failures....even though I dont believe they are related to the topological sorting (although its remotely possible).
- 2007-01-28T18:46:50+00:00
Former user Account Deleted
It doesn't fix those particular issues but there are lots more now. I'll attach the test log. Anyway, this has got nothing to do with this patch as test log is identical with or without the patch.
- 2007-01-28T21:06:28+00:00
Former user Account Deleted
- attached unittests.log
Lots of failing unittests here
- 2007-01-28T21:06:59+00:00
Mike Bayer reporter
i know, i apologize for the tangent. as far as the topological patch, im pretty sure its OK, i just want to make sure I understand it.

that log output looks very much like the tests are being run off the wrong version of the code...particularly the quoting test, i cant see how that one would screw up unless youre running against the older version of the code. ill try running all tests on a linux box later today.
- 2007-01-29T14:29:47+00:00
Former user Account Deleted
Ok, this is embarrasing. I for some reason figured that the tests would augment the path with ../lib automatically. So the previous runs were against my SQLAlchemy install, not against trunk. Reran the unittests with correct path and found a bug in my code. it was throwing an exception on empty sequences. If I fixed that everything but the test_recycle in engine.pool.PoolTest ran OK. Sorry for the confusion.
- 2007-01-30T11:40:01+00:00
Former user Account Deleted
- attached nonrecursive.treebuilding.v4.patch
Fixed empt list behaviour to return None instead of throwing an Exception
- 2007-01-30T11:40:53+00:00
Mike Bayer reporter
the unit tests do prepend './lib/' to the sys.path, its at the top of test/testbase.py, which is generally needed since easy_install favors installed .eggs over paths in PYTHONPATH (except on my system, where i added an init file to override that).
- 2007-01-30T12:15:28+00:00
Former user Account Deleted
- attached nonrecursive.treebuilding.v5.patch
Added comments for the algorithm
- 2007-01-30T12:31:20+00:00
Former user Account Deleted
Is there anything I can do to help get this patch in?
- 2007-02-08T15:44:03+00:00
Mike Bayer reporter
i think you are good for now, this is a patch where I need to find the time to really pore through the algorithm and make sure i understand it (since I will usually be the one supporting it if new errors arise). i have some time constraints including traveling currently but this is on my high priority list of items and hopefully will be in the next release.
- 2007-02-09T14:02:58+00:00
Former user Account Deleted
Maybe I can making understanding the algorithm simpler. It's actually exactly the same algorithm as before, only transformed into iterative form, so there exists a 1:1 mapping between the old version and the new version. Here are some of the mappings: The ''sort(index + 1, l2)'' recursion is transformed into ''for node in reversed(nodes):'' loop. The ''l'' and ''l2'' lists are mapped to ''independets'' list. ''for n in l2:'' loop is transformed into the ''for index in xrange(len(independents)-1,-1,-1):'' loop. Because we don't rebuild the independents list on each recursion/iteration, the ''if previous_dependent_on_node: add_previous_to_nodes_children, else: add_previous_to_new_independents_list'' part changes to ''if previous_dependent_on_node: add_previous_to_nodes_children, remove_previous_from_independents''.

The search_dep function underwent a slightly larger transformation. Previously each recursive dependency check meant checking recursively for each node in the tree, 1) if that node is in the potential parents dependency list or 2) in any of the potential parents cycles dependency lists or if 3) any of that nodes cycles is in the potential parents dependency list. This is replaced by for each independent subtree collecting a set of all nodes in that subtree. Then checking 1) and 2) maps to checking the intersection of that set and the set of all the dependencies of the parent node and its cycles (the new all_deps function). For checking 3) we also collect a set of all the cycles in a subtree and check whether it interesects with the parent nodes dependencies. Note that there is a slight unorthogonality here as the childs cycles aren't checked against parents cycles dependencies. I'm not sure whether it should be so or not, but it was so in the old algorithm. If it's a bug, it would actually simplify the algorithm a bit as the subtree and cycle sets could be merged.

I hope that this makes understanding the algorithm simpler. My suggestion is to merge the patch as soon as you make the decision to put it in. Leaves more time for testing to find potential bugs.
- 2007-02-11T12:14:52+00:00
Mike Bayer reporter
- changed status to resolved
youre being pushy, but youre expending tremendous effort to do so so thats OK :). so fine, its in, changeset:2312, it works extremely well as I ran the unit tests with session logging on into a text file and did a diff of the same run against both versions, and the log outputs are more or less identical. and of course the test in this ticket passes.

also i didnt really understand the code that I had already written for that section anyway, like i understood the general idea but i found it just as hard to follow as the new version (im not a deep algorithm guy), so i have an equal chance of understanding either one at this point. thanks for all the comments.
- 2007-02-12T20:28:38+00:00
Log in to comment

Assignee: Mike Bayer

Type: bug

Priority: critical

Status: resolved

Component: orm

Milestone: –

Votes: 0

Watchers: 0