make sort_tables() stable

Issue #3084 resolved
Sebastian Bank created an issue

I use the mock-engine recipe from the FAQ to get and compare the schema of different instances of a database on different platforms. As the table creation order is not completely fixed, I need to monkey-patch sqlalchemy.util.topological.sort to make diff work on the resulting SQL dumps.

I propose to change sqlalchemy.sql.ddl.sort_tables to first totally order the tables and then use a stable topological sort roughly like the following:

def sort_as_subsets_stable(tuples, allitems):

    edges = collections.defaultdict(set)
    for parent, child in tuples:
        edges[child].add(parent)

    todo = set(allitems)
    queue = list(allitems)

    while queue:
        output = []
        later = []
        for node in queue:
            if not todo.intersection(edges[node]):
                output.append(node)
            else:
                later.append(node)

        if not output:
            raise RuntimeError

        todo.difference_update(output)
        queue = later
        yield output

def sort_stable(tuples, allitems):
    for set_ in sort_as_subsets_stable(tuples, allitems):
        for s in set_:
            yield s

For the deterministic ordering prior to the topological sort, name or definition order would do (I prefer the latter).

I suppose the performance impact does not matter, if this is only used in database creation.

I have not tried, but I think that the hash randomization in Python 3.3+ might shuffle the table order with each invocation with the current implementation.

Comments (11)

  1. Mike Bayer repo owner

    just curious, would replacing todo = set(allitems) with todo = OrderedSet(allitems) have the same effect? might be simpler, and woudl allow "set / orderedset" to be dropped in as an implementation.

  2. Sebastian Bank reporter

    Hah, I did not realize that there is an OrderedSet implementation in sqlalchemy..

    If it also says output = OrderedSet(), I suppose, that should be fine.

    The list() call in line 25 is not necessary, right?

  3. Mike Bayer repo owner

    probably not.

    So i want the sig to be like this:

    def sort(tuples, allitems, deterministic_order=False):
    

    the UOW uses this function also and I dont want OrderedSet getting in the way on that.

  4. Sebastian Bank reporter

    Would adding (and incrementing) the value of a definition counter to registered table instances here be enough to represent the definition order for later use in sort_tables?

    One could also go the extra mile and make Metadata.tables an ordered mapping. In my experience, ordered datastructures for testing/debugging (quick eye-balling) become more desirable when hash randomization comes in.

    Name order would perhaps be simpler.

  5. Mike Bayer repo owner

    I had in mind something a lot simpler, just in metadata.sorted_tables() we pass in sorted(self.tables.values, key=lambda t:t.key), taking into account the other filtering and such that might be going on also.

  6. Log in to comment