1.1.0b3 Core PostgreSQL INSERT CTE rollback bug with engine.execute and default pool_reset_on_return

Issue #3805 resolved
Christopher Wilson created an issue

I'm using the new insert CTEs to implement a select or insert in one statement using the engine.execute() approach. The default reset_on_return works fine for separate select() followed by an insert() as the insert transaction is COMMITted. However, using an insert CTE, the transaction is ROLLBACKed whenever the connection is returned to the pool.

MWE: using PostgreSQL 9.3.

import sqlalchemy as sa

metadata = sa.MetaData()

exTable = sa.Table('the_table_name',metadata,
        sa.Column('id', sa.Integer, primary_key = True),
        sa.Column('text', sa.TEXT, unique=True),
        )

theText = "some text here"
returningCols = [exTable.c.id]

connStr = 'postgresql+psycopg2://user:pass@server.example.com/the_example_db'

engine = sa.create_engine(connStr,
                         #pool_reset_on_return = 'commit', #uncomment this line for working example
                         )
lookup = exTable.select().with_only_columns(returningCols).where(exTable.c.text==theText)
lookupCTE = lookup.cte('selq')
insertQuery = exTable.insert().returning(*returningCols).from_select([exTable.c.yaml],sa.select([sa.literal(theText)]).where(~sa.exists(sa.select([sa.text('*')]).select_from(lookupCTE))))
insertCTE = insertQuery.cte('insq')

selParams = [sa.text('*')]
lookupSel = sa.select(selParams).select_from(lookupCTE)
lookupIns = sa.select(selParams).select_from(insertCTE)
query = lookupSel.union_all(lookupIns)
idres = engine.execute(query)
theId = None
for item in idres:
  print item
  theId = item['id']

idLookup = engine.execute(lookup)
print ("Should find ID: %d..." % (theId))
for item in idLookup:
  print "item found: %d" % item['id']

prints:

(1,)
Should find ID: 1...

Setting pool_reset_on_return = 'commit':

(2,)
Should find ID: 2...
item found: 2

This leads to confusing behavior: * if the data is present, the correct ID is returned, * if the data is not present, then the data is inserted, the corresponding ID returned, and then that ID is invalid as the transaction is rolled back at the end of the execute() call.

Yes, I can arbitrarily use the pool_reset_on_return='commit' option for the engine or manually wrap it in a transaction, but the expected behavior would be a COMMIT transaction on the use of INSERT, UPDATE, or DELETE CTEs, since that is the behavior of a separate INSERT command with the default pool_reset_on_return behavior:

idres = engine.execute(insertQuery)
theId = None
for item in idres:
  print item
  theId = item['id']

idLookup = engine.execute(lookup)
print ("Looking up row %d..." % (theId))
for item in idLookup:
  print "item found: %d" % item['id']

Gives:

(3,)
Looking up row 3...
item found: 3

Comments (13)

  1. Mike Bayer repo owner

    sure this is because the statement you're running is ultimately a SELECT, and a SELECT doesn't signal autocommit=True when a transaction is not otherwise being used. You can work around this using my_stmt = my_stmt.execution_options(autocommit=True).

    if you can let me know that resolves, we can work on possibly setting the flag automatically when a DML CTE is embedded into a SELECT.

  2. Christopher Wilson reporter

    Ooops, I was editing that description when you responded. My bad. Thanks for the work around and the fantastic product!

  3. Christopher Wilson reporter

    Thanks! query=query.execution_options(autocommit=True) does work. However, I do think the better option is an autocommit with a DML CTE in use.

  4. Christopher Wilson reporter

    Interesting behavior on the workaround. Specifying a query with execution_options(autocommit=True) as a cte ignores the execution_options. For example:

    insertQuery = exTable.insert().returning(*returningCols).from_select([exTable.c.text],sa.select([sa.literal(theText)]).where(~sa.exists(sa.select([sa.text('*')]).select_from(lookupCTE)))).execution_options(autocommit=True)
    insertCTE = insertQuery.cte('insq')
    

    At first inspection, I would expect the autocommit (if explicitly set) to be propagated to whatever query used the CTE. I realize there could be some interesting implementation edge cases if two CTEs were explicitly given different autocommit options.

    Of course, this could probably be mitigated by simply doing automatic inspection on the final query and setting the commit flag if the query contains a DML CTE.

  5. Mike Bayer repo owner

    IIUC we're talking here about autocommit on an inner query object propagating out to the outermost. So right now nothing does that. To fix this, we'd be looking to propagate autocommit=True to containing queries in all cases.

  6. Mike Bayer repo owner

    yeah and as far as if they conflict, shrugs, will probably just push towards True and not towards False. autocommit is not "harmful", theoretically everthing could just say COMMIT at the end rather than ROLLBACK.

  7. Christopher Wilson reporter

    Yes. You understand correctly. It could be an implementation to get the autocommit working for DML CTE's. Inserts/Updates/Deletes automatically set the autocommit as true in UpdateBase, allowing CTE to contain/propagate that seems a straightforward approach, depending on how easy it is to lift the execution_options from subqueries, and to prefer autocommit=True over autocommit = False.

    Are there any other cases besides autocommit where it would be useful to propagate options up from a subquery to the parent query unless explicitly overwritten on the parent query?

  8. Mike Bayer repo owner

    Propagate execution_options at compile stage

    Compiler can now set up execution options and additionally will propagate autocommit from embedded CTEs.

    Change-Id: I19db7b8fe4d84549ea95342e8d2040189fed1bbe Fixes: #3805

    → <<cset 71030d67bd7c>>

  9. Log in to comment