mssql aliasing of schema-tables in update

Issue #3424 resolved
Andrey Zholos created an issue

When updating a table with a specified schema name, some parts of the generated statement refer to the table with an alias and some parts don't.

A subquery uses an alias that was not introduced (it could be introduced in an mssql "from" clause):

UPDATE [schema].sometable SET val=
(SELECT [#other].newval FROM [#other] WHERE sometable_1.sym = [#other].sym)

An alias is introduced but the "where" clause doesn't use it:

UPDATE [schema].sometable SET val=[#other].newval
FROM [schema].sometable AS sometable_1, [#other]
WHERE [schema].sometable.sym = [#other].sym

Both of these statements fail. Complete program attached.

Comments (14)

  1. Mike Bayer repo owner

    So there's a rule in the MSSQL dialect that requires all tables that are schema-qualified to be referred to as alias names. It's a really old rule so it takes place in a very crude way, but the core of it is that SQL Server, at least the 2005/2008 versions, cannot handle a query as follows:

    SELECT [schema].t2.a, [schema].t2.b, [schema].t2.c, (SELECT a1.a FROM [schema].t2 AS a1) AS anon_1 FROM [schema].t2
    

    the thing here that's pushing in that alias name is turning the above query into:

    SELECT t2_1.a, t2_1.b, t2_1.c, 
    (SELECT a1.a FROM [schema].t2 AS a1) 
    AS anon_1 FROM [schema].t2 AS t2_1
    

    the idea is that by making an alias "t2_1", the "[schema].t2" token isn't everywhere because it is observed that SQL Server disallows this syntax in many areas (or at least it did).

    in this case, if we turn the logic off entirely, just as a hack:

    diff --git a/lib/sqlalchemy/dialects/mssql/base.py b/lib/sqlalchemy/dialects/mssql/base.py
    index b073af6..c029fb7 100644
    --- a/lib/sqlalchemy/dialects/mssql/base.py
    +++ b/lib/sqlalchemy/dialects/mssql/base.py
    @@ -1055,6 +1055,7 @@ class MSSQLCompiler(compiler.SQLCompiler):
                 return compiler.SQLCompiler.visit_select(self, select, **kwargs)
    
         def _schema_aliased_table(self, table):
    +        raise NotImplementedError()
             if getattr(table, 'schema', None) is not None:
                 if table not in self.tablealiases:
                     self.tablealiases[table] = table.alias()
    @@ -1063,7 +1064,7 @@ class MSSQLCompiler(compiler.SQLCompiler):
                 return None
    
         def visit_table(self, table, mssql_aliased=False, iscrud=False, **kwargs):
    -        if mssql_aliased is table or iscrud:
    +        if True: #mssql_aliased is table or iscrud:
                 return super(MSSQLCompiler, self).visit_table(table, **kwargs)
    
             # alias schema-qualified tables
    @@ -1092,9 +1093,9 @@ class MSSQLCompiler(compiler.SQLCompiler):
                     % self.preparer.format_savepoint(savepoint_stmt))
    
         def visit_column(self, column, add_to_result_map=None, **kwargs):
    -        if column.table is not None and \
    -                (not self.isupdate and not self.isdelete) or \
    -                self.is_subquery():
    +        if False:# and column.table is not None and \
    +                #(not self.isupdate and not self.isdelete) or \
    +                #self.is_subquery():
                 # translate for schema-qualified table aliases
                 t = self._schema_aliased_table(column.table)
                 if t is not None:
    

    your statements come out like this:

    UPDATE [schema].sometable SET val=(SELECT [#other].newval 
    FROM [#other] 
    WHERE [schema].sometable.sym = [#other].sym)
    
    UPDATE [schema].sometable SET val=[#other].newval FROM [schema].sometable, [#other] WHERE [schema].sometable.sym = [#other].sym
    

    first thing you can do to help. Can you confirm that both SQL statements above are executable successfully on your database?

    Can I please know what version of SQL Server this is?

    the fix here would involve altering the rules such that they don't take place within the context of an UPDATE, or even just turning off the rules entirely.

    It would take a lot of testing to determine how far back we can take these rules; for SQLA 1.1 I'd want to target SQL Server 2008 and above.

  2. Mike Bayer repo owner
    • changed milestone to 1.1
    • changed component to mssql

    this is a tenative milestone as we need to determine what forms of SQL are acceptable on various SQL Server backends and it may involve changes that are not compatible with older versions.

  3. Andrey Zholos reporter

    Yes, those two statements work. The commented-out statements in my program that include a text() fragment or literal SQL also all work.

    I'm using SQL Server 2014 – it's on one of the standard EC2 images.

    There might be more complex cases where aliases are still needed, like a self-join in an update.

    I also just noticed that this works too:

    UPDATE [schema].sometable SET val=[#other].newval
    FROM [#other]
    WHERE [schema].sometable.sym = [#other].sym
    

    update_from_clause() always includes the updated table, but if were omitted as above, that would fix this statement.

    But forcing an update from clause with just the updated table and its alias would fix the other statement :)

    I.e., this works:

    UPDATE [schema].sometable SET val=
    (SELECT [#other].newval FROM [#other] WHERE sometable_1.sym = [#other].sym)
    FROM [schema].sometable as sometable_1
    
  4. Mike Bayer repo owner

    I tried to find the origin of this feature, and it is from the most ancient beginnings - the original mssql.py file had these features in it already, in version 0.2, with no information as to why these translations were needed.

    I've tried all the statements that we test for against SQL Server 2012 and there is no issue using the schema name normally. so I am leaning towards turning this aliasing logic into a legacy thing, enabled by a flag, and off by default in 1.1. I'd guess that at most it was appeasing some SQL Server 2005 type of caes.

    if you can come up with an example SQL statement that still fails if [schema].tablename is not aliased, please post it, that would change this plan significantly.

  5. Andrey Zholos reporter

    Perhaps the original idea was to make queries shorter to work around driver limits? Programmatically-generated queries can get quite long...

    The self-joins I mentioned before might look something like this:

    UPDATE [schema].sometable SET val=
    (SELECT sometable_1.val FROM [schema].sometable AS sometable_1
    WHERE [schema].sometable.sym = reverse(sometable_1.sym))
    WHERE [schema].sometable.val IS NULL
    
    UPDATE [schema].sometable SET val=sometable_1.val
    FROM [schema].sometable, [schema].sometable AS sometable_1
    WHERE [schema].sometable.sym = reverse(sometable_1.sym)
    AND [schema].sometable.val IS NULL
    

    These are working versions. The instance of the table being updated isn't aliased, but an alias is needed for the other instance. I'm not sure if this fits your plan or not.

  6. Mike Bayer repo owner

    Ok that's a normal use of aliases, because the table is referred to twice (e.g. has nothing to do with the schema being present). the reason we had this thing was because SQL server didn't like seeing "x.y.z" in certain places; e.g. "y.z." was fine, "x.y.z", boom. Clear bug in their parser, if only I had documented the case (unfortunately back in 2006 I wasn't as good at keeping track of these things). I'm going to definitely make the whole thing into a flag, "legacy_schema_aliasing", you can turn it off which will fix your issue, then in 1.1 it'll be defaulted to off. which is great, we've slowly been yanking out these ill-conceived "magic helper" features in the mssql dialect for some time.

  7. Andrey Zholos reporter

    Sounds good. Thanks a lot!

    There's an SQL Server 2005 image on EC2 under "Community AMIs". I tried out some queries like my update and the select you mentioned at the beginning – works fine without aliases. Maybe they fixed the bug in a service pack, or maybe the bug was in an even more ancient version.

  8. Mike Bayer repo owner

    per #3430 also add tests for hints:

    diff --git a/test/dialect/mssql/test_compiler.py b/test/dialect/mssql/test_compiler.py
    index 0eb369e..2f3538c 100644
    --- a/test/dialect/mssql/test_compiler.py
    +++ b/test/dialect/mssql/test_compiler.py
    @@ -31,6 +31,14 @@ class CompileTest(fixtures.TestBase, AssertsCompiledSQL):
             self.assert_compile(t.select().with_hint(t, 'WITH (NOLOCK)'),
                                 'SELECT sometable.somecolumn FROM sometable WITH (NOLOCK)')
    
    +    def test_select_with_nolock_schema(self):
    +        m = MetaData()
    +        t = Table(
    +            'sometable', m, Column('somecolumn', Integer), schema="foo")
    +        self.assert_compile(
    +            t.select().with_hint(t, 'WITH (NOLOCK)'),
    +            'SELECT sometable.somecolumn FROM foo.sometable WITH (NOLOCK)')
    +
         def test_join_with_hint(self):
             t1 = table('t1',
                 column('a', Integer),
    
  9. Mike Bayer repo owner
    • Added a new dialect flag to the MSSQL dialect legacy_schema_aliasing which when set to False will disable a very old and obsolete behavior, that of the compiler's attempt to turn all schema-qualified table names into alias names, to work around old and no longer locatable issues where SQL server could not parse a multi-part identifier name in all circumstances. The behavior prevented more sophisticated statements from working correctly, including those which use hints, as well as CRUD statements that embed correlated SELECT statements. Rather than continue to repair the feature to work with more complex statements, it's better to just disable it as it should no longer be needed for any modern SQL server version. The flag defaults to True for the 1.0.x series, leaving current behavior unchanged for this version series. In the 1.1 series, it will default to False. For the 1.0 series, when not set to either value explicitly, a warning is emitted when a schema-qualified table is first used in a statement, which suggests that the flag be set to False for all modern SQL Server versions. fixes #3424 fixes #3430

    → <<cset 915791101353>>

  10. Andrey Zholos reporter

    Thanks for fixing this! By the way, your documentation is awesome: really clear and easy to navigate.

  11. Log in to comment