1. Michael Bayer
  2. sqlalchemy

Issues

Issue #2551 resolved

apparently INSERTs (and UPDATE, DELETE) can have CTEs in pg 9.1. , plus UPDATE, SELECT, DELETE in CTEs.

Michael Bayer
repo owner created an issue

apparently its all possible now. might as well make a new syntax bonanza out of it:

WITH upsert AS (
  UPDATE metric k SET k.count = k.count + 5
  WHERE event = "foo" AND interval = "D" and date = "whatever"
  RETURNING k.*
)
INSERT INTO metric (event, interval, date, count) 
SELECT ("foo", "D", "whatever", 5)
WHERE NOT EXISTS (
  SELECT 1 FROM upsert
);

http://www.postgresql.org/docs/9.1/static/sql-insert.html

Comments (38)

  1. Anonymous

    (original author: ejo) This is currently attached to component 'sqlite'; should be changed to postgresql.

  2. Michael Bayer reporter

    it's a little odd for this to be a "blocker" considering that CTE's in DML as well as CTEs in general are only recently supported by most databases :), but anyway, the general steps are:

    1. add cte() method to dml.py -> UpdateBase

    2. modify visit_cte() if necessary, might not be

    3. add rendering steps for ctes into compiler.py -> visit_insert, visit_update, visit_delete

    4. add lots of tests to test/sql/test_cte.py

  3. Robin T

    I made a quick, incomplete attempt at an implementation. It no longer raises an exception, but it's not correct: the CTE isn't placed at the start of the SQL statement. The compiler's .isupdate and .isinsert state attributes assume there's only a single DML statement to compile, so in the provided test case with an UPDATE CTE used by an INSERT, the compiler gets confused about whether the statement under compilation is an insert or update. I think a separate class ModifyingCTE(Alias), in the .dml module, works better than complicating the existing CTE class. Modifying CTEs have extra restrictions (no RECURSIVE, only UpdateBase as contained element), and having it in .dml avoids an import cycle with .selectable.

    https://github.com/robin900/sqlalchemy/compare/rt-writable-cte?expand=1

  4. Michael Bayer reporter

    when we deal with CRUD that compiles to a nested CRUD, (e.g. Insert -> Update, etc), can we just branch off a new Compiler for that sub-element somehow ? or branch off a "thing that has .insert .isupdate on it" ? The isinstance() checks there will kill us.

    also can ModifyingCTE be a CTE subclass? (maybe DMLCTE?)

  5. Robin T

    I bet ModifyingCTE(CTE) would be workable, if CTE.__init__ and Alias.__init__ no longer assume a Selectable element to contain. I was wondering if nested compilers were a good idea or not; I'll give that a try. Are there any existing cases where a nested compiler is used?

  6. Michael Bayer reporter

    the only thing that resembles a "nested" compiler is when a DDLCompiler calls into a SQLCompiler, which happens a lot. But these are very different kinds of compilers. A nested compiler in this case would most importantly need to make sure bound parameter state and anonymized identifiers are tracked between the two. hence this is really more of an idea to add a little more of a "stacked" behavior to inserts/updates/deletes. We could perhaps even use the current stack as a place to track these flags within compilation.

  7. Robin T

    Made an attempt at using the "subquery stack" and avoiding a nested compiler. One interesting thing is that compiler.py on master pushes DELETE and UPDATE statements onto self.stack and pops at end of visit method, yet visit_insert does not push the insert stmt onto self.stack. When I changes visit_insert to push inserts onto the stack (and pop at end of visit method), I got very spooky test errors: SQLiteExecutionContext has no attribute inserted_primary_keys, etc. I'm going to defer to you to tell me how best I might proceed integrating the stack:

    https://github.com/robin900/sqlalchemy/compare/rt-stacktuple?expand=1

  8. Stefan Urbanek

    Just would like to throw in my vote for that having WITH CTEs in DELETE (my current issue) and UPDATE is very useful in ETLs for better optimised queries. I haven't gone that deep as Robin T did, but I support what he wrote here.

  9. Fred Cox

    Should my original code from 2012 (CTE.sqlalchemy) work with 0.9.9? I'm finally upgrading from 0.7.8, and was hoping I wouldn't need it any more.

    I suppose, the next best thing would be that I can use the old technique.

    Please advise.

  10. Fred Cox

    Regarding Robin T's comment on 2014-10-08 stating that recursive CTEs need not be allowed for DML, I would like to say that our application does use recursive WITHs with DML.

    We have a hierarchical org structure where we specify a root org where we expect all rows associated with the suborgs to be updated in a single statement.

  11. Robin T

    I had based my comments on the documentation for CTEs at http://www.postgresql.org/docs/current/static/queries-with.html

    "Recursive self-references in data-modifying statements are not allowed."

    "Data-modifying statements in WITH are executed exactly once, and always to completion."

    Perhaps I'm misunderstanding the documentation. Is there a usage of DML in a WITH clause, with RECURSIVE, that somehow avoids a self-reference? Or is the documentation merely wrong?

  12. Robin T

    Fred Cox perhaps your DML is outside the recursive CTE itself? The postgres doc page I referenced above does mention such a technique to avoid the prohibition of DML in a recursive CTE, something like:

    WITH RECURSIVE included_parts(sub_part, part) AS (
        SELECT sub_part, part FROM parts WHERE part = 'our_product'
      UNION ALL
        SELECT p.sub_part, p.part
        FROM included_parts pr, parts p
        WHERE p.part = pr.sub_part
      )
    INSERT INTO collected_parts
      SELECT part FROM included_parts;
    

    The above example does use DML, but not in the recursive CTE itself. This sqlalchemy issue proposes ModifyingCTE only for CTE definitions that include DML within; DML outside the WITH clause would be handled as a regular DML statement.

  13. Makers_F

    I would like to add an extension of the first case which you can use to upsert multiple rows. I'm not sure if currently we can generate the new_data table.

    WITH
      new_data (id, value) AS (
        VALUES (1, 2), (3, 4), ...
      ),
      updated AS (
        UPDATE table t set
          value = t.value + new_data.value
        FROM new_data
        WHERE t.id = new_data.id
        RETURNING t.*
      )
    INSERT INTO table (id, value)
      SELECT id, value
      FROM new_data
      WHERE NOT EXISTS (
        SELECT 1 FROM updated WHERE updated.id = new_data.id
      )
    
  14. Michael Bayer reporter

    the VALUES construct is available through the following recipe: https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/PGValues It subclasses FromClause so can be fed into a CTE object. Without changes it produces "WITH foo AS (VALUES (1, 'textA', 99), (2, 'textB', 88)) AS myvalues (mykey, mytext, myint)", which might already be acceptable to PG. Otherwise the recipe will need changes. The update/insert/delete modifications here are fairly straightforward, support for a built-in PG VALUES() construct is a TODO.

  15. Michael Bayer reporter

    OK I have a working INSERT/UPDATE/DELETE as-cte-or-has-cte with tests in the ticket_2551 branch. A full patch of master...ticket_2551 is attached. I'll be adding migrations notes and merging soon but if people want to test out the patch (against master) please do so!

  16. Michael Bayer reporter

    wow I can feel the excitement among all 11 watchers and 7 voters for this issue! guess nobody tried it yet. my time is scant today but I'll see if I can get it closer to merged.

  17. inklesspen

    Yeah, sorry; see, I don't really have a test case sitting around for this, because it's been kind of a while since I asked about this, and I had to find other ways of doing the things.

    I still think it's a valuable addition, but I don't know how soon I can find a non-contrived use for it.

  18. Makers_F

    I have a test case, but no tests are setup to check if it's correct :feelsashamed:

    I might patch and see if the generated SQL matches the one I'm outputting now. I should be able to do it before one week

  19. Robin T

    Two reasons I have no urgent need for CTEs containing DML:

    1) In Postgres, DML CTEs are not execute atomically with the DML outside the CTE in the same statement, so a CTE with DML can't be employed to implement a safe upsert. I still have to obtain an explicit share-row-exclusive lock. So it's acceptable for my applications to issue the two DML statements separately in the context of a share-row-exclusive lock.

    2) With Postgres 9.5, there is now true upsert with the ON CONFLICT clause. My applications rather have support for ON CONFLICT in INSERT statements than any enhancements to CTEs to support DML.

  20. Michael Bayer reporter
    • CTE functionality has been expanded to support all DML, allowing INSERT, UPDATE, and DELETE statements to both specify their own WITH clause, as well as for these statements themselves to be CTE expressions when they include a RETURNING clause. fixes #2551

    → <<cset e5f1a3fb7dc1>>

  21. Log in to comment