Complex CTE expressions misorder bound parameters

Issue #2521 resolved
Former user created an issue

I have developed a slightly complex request, using three CTE expressions before the actual query. The development was done on an MS Sql 2012 server, but the bug that I found reproduces easily on SQLite, as the attached demo will show.

The context is Kubuntu 12.04, with SQL Alchemy 0.7.8-1 (imported by me from Debian sid, ubuntu is still at 0.7.4 and it does not have CTEs).

The problem is easily demonstrates by running the attached script; it causes a run-time failure, OperationalError. The error message also demonstrates the nature of the problem: SQL Alchemy obviously chooses to replace constant values, used in clauses in the CTEs, with substitution parameters - but when the expression is sufficiently complex, it forgets to provide the actual parameters when the complete request is sent to the database server.

I regret that I cannot deliver a simpler case, but every attempt to simplify the request has caused the failure to disappear. I think that for the failure to happen, you need to have a CTE with parameters, a CTE without parameters, and then a final request with parameters. And the case is a serious request to a production database - I try to locate intervals with zero production from a time series of production figures.

Of course, it is entirely conceivable that I have made a mistake somewhere myself - I am quite new to SQL Alchemy. If that is the case, accept my apology in advance. But I do not think so - the request generated by SQL Alchemy looks exactly as I wish it to look, only there are too few actual substitution parameters.

Comments (8)

  1. Mike Bayer repo owner

    so far this doesn't seem like a SQLAlchemy issue. The SQL log you've attached was run against SQLite, which does not support common table expressions, so that is the source of that particular OperationalError.

    When run on SQL server, the error is "Statement(s) could not be prepared. ", which is a catchall error SQL server invokes when there is a syntactical problem. Running in on SQL server directly I get some more specifics, such as that when you call upon "datediff" you need to send "hour" as a literal:

    func.datediff(literal_column('hour'), x, y)
    

    it has some other issues with the structure, but it could be just that this query is too complex for SQL server. I tried running on PG which has a superior parser, and it doesn't report any parsing errors, though it still fails since functions like datediff() aren't present on PG.

    The number of bound parameters is correct. There are ten question marks and ten values, you can see this even in the log you attached:

    2012-06-22 11:09:22,571 INFO sqlalchemy.engine.base.Engine ('hour', 1.0, 0, 1, 1.0, 0, 1, 'hour', 4, 0)
    

    You need to work with the SQL directly and get it to work with SQL Server via the SQL console. Structurally it seems correct.

    The full statement is below with bound parameters inline. Please alter this statement directly to determine why SQL server can't run it - I think it's just hitting limitations of the database.

    Reopen this ticket when you identify the specific syntax that SQLAlchemy isn't converting correctly, so far it seems to be doing exactly what it's told.

    WITH 
        [changeTrack](changeTrack) AS(
                SELECT 
                    [srcTableName](srcTableName).[EngineId](EngineId) AS [EngineId](EngineId), 
                    CASE WHEN ([srcTableName](srcTableName).[ProductionMean](ProductionMean) >= 'hour') THEN 1.0 ELSE 0 END AS is_producing, 
                    lag(CASE WHEN ([srcTableName](srcTableName).[ProductionMean](ProductionMean) >= 1) THEN 1.0 ELSE 0 END) 
                        OVER (PARTITION BY [srcTableName](srcTableName).[EngineId](EngineId) ORDER BY [srcTableName](srcTableName).[TimeStamp](TimeStamp) ASC) AS was_producing,
    
                    [srcTableName](srcTableName).[TimeStamp](TimeStamp) AS [TimeStamp](TimeStamp), 
                    [srcTableName](srcTableName).[ProductionMean](ProductionMean) AS [ProductionMean](ProductionMean)FROM [srcTableName](srcTableName)
                    ),
    
        changes AS(
                SELECT 
                    [changeTrack](changeTrack).[EngineId](EngineId) AS [EngineId](EngineId), 
                    [changeTrack](changeTrack).is_producing AS is_producing, 
                    [changeTrack](changeTrack).[TimeStamp](TimeStamp) AS [TimeStamp](TimeStamp)
                FROM [changeTrack](changeTrack)
                WHERE [changeTrack](changeTrack).is_producing != [changeTrack](changeTrack).was_producing
            ),
    
        interval AS(
                SELECT 
                    changes.[EngineId](EngineId) AS [EngineId](EngineId), 
                    changes.is_producing AS is_producing, 
                    changes.[TimeStamp](TimeStamp) AS interval_start, 
                    lead(changes.[TimeStamp](TimeStamp)) OVER (PARTITION BY changes.[EngineId](EngineId) ORDER BY changes.[TimeStamp](TimeStamp) ASC) AS interval_end 
    FROM changes)
    
        SELECT 
            interval.[EngineId](EngineId), 
            interval.is_producing, 
            interval.interval_start, 
            interval.interval_end, 
            datediff(1, interval.interval_start, interval.interval_end) AS interval_lengthFROM interval
        WHERE datediff('hour', interval.interval_start, interval.interval_end) > 4 
            AND interval.is_producing = 0 
        ORDER BY interval.[EngineId](EngineId), interval.interval_start
    
  2. Former user Account Deleted

    Hi Mike, thank you for taking your time with this report.

    First thing, your hint about using the "literal_column" on the "hour" argument for "datediff" solved my actual problem: the request is now working correctly on the MS SQL server. So on the face of it, it really was a beginner's problem.

    On the other hand, I think that there may still be an SQL Alchemy problem hidden somewhere. As you correctly point out, the request has 10 question marks and 10 actual parameters. But the sequence of those parameters are wrong, that is what I observed, and did not think of counting them. You can actually see it in the version of the request with with parameters inline that you have constructed - there are two calls of the "datediff" function, but only one of them gets its "hour" parameter - the other "hour" parameter ends up in a comparison with production mean which is bound to fail.

    So the title of this ticket actually ought to be something like "Complex CTE expressions reorder substitution parameters".

    I cannot figure out whether this reordering is a side effect of the fact that I use the "hour" parameter wrongly, or whether it is a real problem. Certainly I have not been able to produce a test case that provokes a reordering without having the faulty parameter to "datediff".

    So unless you feel like investigating further, I suggest we leave the ticket as closed.

  3. Former user Account Deleted

    Hi Mike - don't apologize. I am just happy that I can help in hunting down a fixable bug in an obviously recent piece of code (the support of CTEs). I like the SQL Alchemy a lot - I think about it as kind of like the Python world's answer to LINQ, which I like a lot. And I definitely could have phrased my original ticket somewhat more precisely. Just happy that it ended up with a real fix of a real problem.

    Thanks for the good work.

    best regards

  4. Log in to comment