make sort_tables() stable
I use the mock-engine recipe from the FAQ to get and compare the schema of different instances of a database on different platforms. As the table creation order is not completely fixed, I need to monkey-patch sqlalchemy.util.topological.sort
to make diff
work on the resulting SQL dumps.
I propose to change sqlalchemy.sql.ddl.sort_tables
to first totally order the tables and then use a stable topological sort roughly like the following:
def sort_as_subsets_stable(tuples, allitems):
edges = collections.defaultdict(set)
for parent, child in tuples:
edges[child].add(parent)
todo = set(allitems)
queue = list(allitems)
while queue:
output = []
later = []
for node in queue:
if not todo.intersection(edges[node]):
output.append(node)
else:
later.append(node)
if not output:
raise RuntimeError
todo.difference_update(output)
queue = later
yield output
def sort_stable(tuples, allitems):
for set_ in sort_as_subsets_stable(tuples, allitems):
for s in set_:
yield s
For the deterministic ordering prior to the topological sort, name or definition order would do (I prefer the latter).
I suppose the performance impact does not matter, if this is only used in database creation.
I have not tried, but I think that the hash randomization in Python 3.3+ might shuffle the table order with each invocation with the current implementation.
Comments (11)
-
repo owner -
reporter Hah, I did not realize that there is an OrderedSet implementation in sqlalchemy..
If it also says
output = OrderedSet()
, I suppose, that should be fine.The
list()
call in line 25 is not necessary, right? -
repo owner probably not.
So i want the sig to be like this:
def sort(tuples, allitems, deterministic_order=False):
the UOW uses this function also and I dont want OrderedSet getting in the way on that.
-
repo owner also how are we "totally ordering" the tables here? sort by name somewhere?
-
reporter Would adding (and incrementing) the value of a definition counter to registered table instances here be enough to represent the definition order for later use in
sort_tables
?One could also go the extra mile and make
Metadata.tables
an ordered mapping. In my experience, ordered datastructures for testing/debugging (quick eye-balling) become more desirable when hash randomization comes in.Name order would perhaps be simpler.
-
repo owner I had in mind something a lot simpler, just in metadata.sorted_tables() we pass in sorted(self.tables.values, key=lambda t:t.key), taking into account the other filtering and such that might be going on also.
-
repo owner feel free if you want to PR that...
-
repo owner - changed component to utils
-
assigned issue to
- marked as enhancement
- changed milestone to 0.9.xx
-
repo owner - changed milestone to 1.0.xx
-
reporter -
repo owner - changed status to resolved
→ <<cset ee40c7afaabe>>
- Log in to comment
just curious, would replacing
todo = set(allitems)
withtodo = OrderedSet(allitems)
have the same effect? might be simpler, and woudl allow "set / orderedset" to be dropped in as an implementation.