Change UNIQUE constraint to allow multiple NULL values for MS-SQL

Issue #2608 resolved
Former user created an issue

According to the ANSI standards SQL:92, SQL:1999, and SQL:2003, a UNIQUE constraint should disallow duplicate non-NULL values, but allow multiple NULL values. SQL Server has always implemented a crippled version of this, allowing a single NULL but disallowing multiple NULL values. From http://connect.microsoft.com/SQLServer/feedback/details/299229/change-unique-constraint-to-allow-multiple-null-values

To fix this issue SQL has added the filtered indexes feature. Using this feature it is possible to perform the logic of an index to a filter set of items in a table using a WHERE clause. i.e.

create unique nonclustered index idx on dbo.DimCustomer(emailAddress)
where EmailAddress is not null;

Attached is a patch which applies the above feature when SQL 2008 or newer is used as a server. This patch has been tested on Windows 7/SQL Server 2012 using pyodbc and the normal SQL Server driver.

Comments (3)

  1. Mike Bayer repo owner

    In general, SQLAlchemy does not apply major behavior-changing workarounds like this automatically. There are easy paths to producing this kind of DDL, including selectively for MSSQL only in multiple-backend environments. (see http://docs.sqlalchemy.org/en/latest/core/schema.html#metadata-ddl)

    Issues here include:

    • I don't understand why the compilation has been moved out of that of UniqueConstraint and into a post-create-table hook. The approach here is not compatible with the AddConstraint(), DropConstraint() constructs nor will it work with tools like Alembic or SQLAlchemy-migrate.

    • the hardcoded naming scheme especially means those scripts are hard to write, and there's no system here to control that naming scheme.

    • the performance behavior of these indexes may be a surprise to DBAs who expect the traditional UNIQUE DDL to be generated. Especially with more "enterprisey" databases like MSSQL its important that we allow custom strategies, but not push them out by default. Past a certain threshhold, this is perceived as surprise behavior.

    • I've hit this issue myself with SQL Server, attempted to use a similar solution here through simple usage of DDL(), but we had to abandon it as it caused some other issue that I can't recall at the moment; but it was not that we were attempting to store multiple nulls - the same application runs on Postgresql where the UNIQUE index works as expected. So I'm not very confident that invoking this SQL automatically is free of surprise side effects.

    • The post_create_table() step as implemented by the base DDL compiler is skipped entirely if server_version_info >= 10. this is a generic hook that can have any number of behaviors added in the future.

    • Applications that are actually trying to store multiple NULLs based on SQL Server's documented behavior of UNIQUE constraints will suddenly break on newly generated schemas

    • The usage of ";" to emit multiple statements within one execute() call is not guaranteed to work on all DBAPIs and is something we try to avoid, except in DBAPI-specific cases where it's unavoidable.

    Given how dramatically different this behavior is from what a lot of users would expect, I'd be more amenable to creating a simple DDL()-based recipe for this and adding it as a documentation note on the MSSQL documentation, but I'm pretty -1 on invoking this DDL automatically, especially with no way to override it.

  2. Log in to comment