A table can support only one "clustered" index. That clustered index effectively determines how the table is actually laid out on disk.
I think there are only three avenues for clustering
adding a clustered index to an existing table (which we already cover)
adding a clustered primary key inline in a create table
handled in this patch by specifying mssql_clustered on the table or the PrimaryKeyConstraint
adding a clustered unique key (but not a primary key) inline in a create table e.g.
So probably we should support mssql_clustered on a UniqueConstraint as well.
Obviously clustering applies to the constraint object, so the nicest place for it to appear is only on PrimaryKeyConstraint or UniqueConstraint and not on Table … but that means the shortcut of being able to tag columns with primary_key=True disappears which is the only reason I also put it onto Table as well …
Perhaps if clustering you should be forced to do it via an explicit PrimaryKeyConstraint ??
it would certainly be easier to just support clustering on an explicit PrimaryKeyConstraint for now - if an app wants all the PK constraints to be this way, schema attachment events can be used. but is the use case for an explicit "clustered PK index" usually just a few tables or how will this normally be used ?
Clustering a table using a particular index (be it the primary key, some other unique constraint, or even a non unique index) effectively "makes the table the index". That is:
in a non clustered table, the table is some big amorphous blob of data on disk and an index is e.g. a B-tree that contains row pointers into this mass of data.
in a table that has a clustered index, that clustered index is the table. The index/table is stored as e.g. a B-tree where the leaf pages contain actual table data rather than row pointers. (And other non clustered indexes use a different sort of pointer into this B-tree)
This makes reads from a clustered index much faster than a non clustered index as there is no row lookup. e.g. SQLServer will ignore non clustered indexes often if the table is not large as a plain table scan may beat an index scan + lookup.
Conversely, writes are slower as page allocation overhead is higher.
So … in cases where:
there isn't a pressing constraint on write performance; and
there is a clear candidate for a unique key that will often be used (e.g. an opaque id field that is often joined to); and
where the table is of reasonable size so there will actually be a benefit …
then a clustered key is a good idea. This means that in many use cases, developers may cluster every table by default.
I think at the moment, if mssql_clustered also stays on Table then the only ambiguity becomes in a statement like
right I thought maybe something like: "mssql_clustered_on=('y',)" vs. "mssql_clustered_on=('x',)" would be a way this could be set up at the table kw level, but this seems to be well within "there's more than one way to do it" territory - since the argument essentially has to poke around the table columns looking for a UniqueConstraint or PrimaryKeyConstraint that has those columns (and then, a column can be part of the primary key and a unique constraint at the same time!) and figure out which constraint actually receives the "CLUSTERED" keyword.
There's other kinds of arguments one can put on PrimaryKeyConstraint, like a "name" for example, and we don't have table-level arguments for that either.
What I thought should work, but unfortunately does not, is this:
I think that if the user has tagged any columns individually, then the PrimaryKeyConstraint should raise an exception if it also contains any columns.
e.g. the new code should be something like
@@ -2528,10 +2523,12 @@ class PrimaryKeyConstraint(ColumnCollectionConstraint):
def _set_parent(self, table):
- if table.primary_key in table.constraints:- table.constraints.remove(table.primary_key)- table.primary_key = self- table.constraints.add(self)+ if table.primary_key is not self:+ table.constraints.discard(table.primary_key)+ table.primary_key = self+ table.constraints.add(self)++ if self.columns:+ if [c for c in table.c if c.primary_key]:+ raise SomeSortOfException+ else:+ self.columns.extend(c for c in table.c if c.primary_key)
probably generates a PKC of (c, a, b). Its more likely they wanted (a, b, c) if they meant anything at all. I think its probably an error if the PrimaryKeyConstraint columns don't match exactly those which have been tagged with primary_key=True.
The ordering for the second one? I think if they have tagged columns individually, then either they can specify a PrimaryKeyConstraint with no columns, or with all the columns and the latter gives them the opportunity to reorder them into exactly what they want.
(which is what you've said is more appropriate too!).
also patched an existing bug where an empty UniqueConstraint() (i.e. with no columns specified) would generate bad sql. Changed this in both the dialects.mssql.compiler and sql.compiler. Added test to test.sql.test_constraints