Commits

Klaas van Schelven committed afaeac9

Performance: Use Dynamic Programming for dependency analysis.

Rationale: with large numbers of migrations (> 100) and some explicit
dependencies (via depends_on), the dependency resolver's performance became
pathological.

It was observed that the Depth First Search algorithm used to establish the
dependencies had large overlapping subproblems, and that therefore Dynamic
Programming was a viable strategy:

http://en.wikipedia.org/wiki/Dynamic_programming#Dynamic_programming_in_computer_programming
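
To illustrate the approach, here is a minimal, self-contained sketch of a
memoized depth-first traversal (not South's code verbatim; the names 'walk'
and '_cache', the toy error, and the structure of the example are made up
for illustration):

    _cache = {}

    def walk(node, get_children, path):
        key = (node, get_children)
        if key in _cache:
            # Overlapping subproblem: this subtree was already traversed.
            return _cache[key]
        if node in path:
            raise RuntimeError("circular dependency at %r" % node)
        path.append(node)
        results = [node]
        for child in sorted(get_children(node), key=str):
            results.extend(walk(child, get_children, path))
        path.pop()
        _cache[key] = results
        return results

With the cache in place, each distinct subtree is traversed at most once per
'get_children' callable.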

Note that the 'path' parameter is not part of the cache key. It is not needed
there: if a given subtree has been proven to have no circular dependencies, it
cannot have circular dependencies at a later moment, by the very definition of
circular.

The function _dfs() is rewritten to return lists rather than yield individual
elements, to facilitate storing the results in a global cache.
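
With that change a caller gets a flat list directly. A hypothetical usage
sketch (the toy graph and 'get_children' below stand in for real Migration
objects; 'get_children' must be the same hashable callable across calls,
since it is part of the cache key):

    from south.migration.utils import dfs

    graph = {"0003_c": ["0002_b"], "0002_b": ["0001_a"], "0001_a": []}

    def get_children(name):
        return graph[name]

    dfs("0003_c", get_children)   # ['0003_c', '0002_b', '0001_a']

A later traversal that reaches any of these nodes is answered from
dependency_cache instead of re-walking the subtree.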

The performance gain is not backed by tests. However, on my local project it
is visible to the naked eye: dependency analysis dropped from > 2 minutes
to < 1 second.

  • Parent commit: e7067d3


Files changed (1)

south/migration/utils.py

         else:
             yield x
 
+dependency_cache = {}
+
 def _dfs(start, get_children, path):
+    if (start, get_children) in dependency_cache:
+        return dependency_cache[(start, get_children)]
+
+    results = []
     if start in path:
         raise exceptions.CircularDependency(path[path.index(start):] + [start])
     path.append(start)
-    yield start
+    results.append(start)
     children = sorted(get_children(start), key=lambda x: str(x))
-    if children:
-        # We need to apply all the migrations this one depends on
-            yield (_dfs(n, get_children, path) for n in children)
+    
+    # We need to apply all the migrations this one depends on
+    for n in children:
+        for result in _dfs(n, get_children, path):
+            results.append(result)
+
     path.pop()
 
+    dependency_cache[(start, get_children)] = results
+    return results
+
 def dfs(start, get_children):
-    return flatten(_dfs(start, get_children, []))
+    return _dfs(start, get_children, [])
 
 def depends(start, get_children):
     result = SortedSet(reversed(list(dfs(start, get_children))))