- changed milestone to blue sky
implement MERGE (and/or PG ON CONFLICT and/or MySQL REPLACE etc. etc.)
Implement generic MERGE, aka 'upsert'. In ANSI, it looks like:
```sql
MERGE INTO table_name1 USING table_name2 ON (condition)
WHEN MATCHED THEN UPDATE SET column1 = value1 [, column2 = value2 ...]
WHEN NOT MATCHED THEN INSERT (columns) VALUES (values)
```
Dialect support differs pretty widely. A quick & likely inaccurate poll:
- Oracle 9+ has a MERGE
- T-SQL 2008 has a MERGE; earlier versions can maybe do IF EXISTS(SELECT ...)
- MySQL is limited to a key violation condition, and can do either INSERT ... ON DUPLICATE KEY UPDATE or REPLACE INTO, INSERT being preferable
- SQLite is limited to a key violation condition, and has INSERT OR REPLACE (the REPLACE conflict resolution)
-
repo owner Here's a recipe I made for the SQLA tutorial:
```python
import re

from sqlalchemy import bindparam, literal
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import Insert


class replace(Insert):
    pass  # REPLACE is just like INSERT.


# Below is the easy road out: compile as an INSERT, then swap the keyword.
@compiles(replace, 'sqlite', 'mysql')
def compile_replace(replace, compiler, **kw):
    stmt = compiler.visit_insert(replace, **kw)
    return re.sub(r'^INSERT', 'REPLACE', stmt)


# Or, take the hard road (this registration supersedes the one above):
@compiles(replace, 'sqlite', 'mysql')
def compile_replace(replace, compiler, **kw):
    colspecs = {}
    for col in replace.table.c:  # table defaults (plain scalar defaults only)
        if col.default is not None and col.default.is_scalar:
            colspecs[col] = literal(col.default.arg)
    for k in compiler.column_keys or ():  # names sent to execute() override defaults
        colspecs[replace.table.c[k]] = bindparam(k)
    if replace.parameters:  # statement parameters (attribute name before SQLAlchemy 1.4)
        for key, value in replace.parameters.items():
            colspecs[replace.table.c[key]] = bindparam(key, value)
    return "REPLACE INTO %s (%s) VALUES (%s)" % (
        replace.table.name,
        ",".join(c.name for c in colspecs),
        ",".join(compiler.process(v) for v in colspecs.values()),
    )
```
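On SQLite alone, a lighter option may be enough: Insert.prefix_with() places its text between INSERT and INTO, so SQLite's INSERT OR REPLACE needs no custom compiler at all. The sketch below assumes SQLAlchemy 1.4+ (for the select() calling style) and a hypothetical users table; it does not cover MySQL, where REPLACE is a statement keyword of its own rather than a prefix.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, select

metadata = MetaData()
users = Table(
    "users",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50)),
)

# prefix_with() renders its text after INSERT; dialect="sqlite" limits it to SQLite.
replace_stmt = users.insert().prefix_with("OR REPLACE", dialect="sqlite")

engine = create_engine("sqlite://")  # in-memory database
metadata.create_all(engine)
with engine.begin() as conn:
    conn.execute(users.insert(), {"id": 1, "name": "old"})
    conn.execute(replace_stmt, {"id": 1, "name": "new"})  # same key: the row is replaced
    rows = conn.execute(select(users.c.id, users.c.name)).all()

sql = str(replace_stmt.compile(dialect=engine.dialect))
print(sql)
print(rows)
```

Passing dialect="sqlite" means the same statement compiled for another backend is emitted as a plain INSERT.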
-
repo owner - changed milestone to 0.9.0
possible ORM flow:
```python
obj = MyObject(id=1, data='data')
session.replace(obj)

# session.server_merge()?  session.upsert()?
# session.add(obj, merge=True)?  session.add(obj, upsert=True)?

# 1. object must contain a full primary key. error if not.
# 2. persistence.py treats these as INSERTs, since we have to assume an INSERT will occur.
```
-
repo owner: session.merge(obj, hard=True)?
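For contrast, the existing Session.merge() already reconciles by primary key, but on the client side: it looks the row up first and then flushes either an INSERT or an UPDATE, so two concurrent callers can still both attempt the INSERT. A minimal sketch, assuming SQLAlchemy 1.4+ and an in-memory SQLite database; MyObject is modeled on the hypothetical class from the ORM flow above.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class MyObject(Base):
    __tablename__ = "my_object"
    id = Column(Integer, primary_key=True)
    data = Column(String(50))


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    # No row with id=1 yet: merge() SELECTs, finds nothing, and the flush INSERTs.
    session.merge(MyObject(id=1, data="first"))
    session.commit()
    # The row exists now: merge() loads it and copies the new state onto it,
    # so the flush emits an UPDATE instead.
    session.merge(MyObject(id=1, data="second"))
    session.commit()
    stored = session.get(MyObject, 1).data

print(stored)  # second
```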
-
repo owner - changed milestone to 0.x.xx
-
repo owner - changed milestone to 1.x.xx
-
MySQL's REPLACE is not an appropriate equivalent for MERGE.
REPLACE does a DELETE+INSERT which has different semantics than MERGE.
It also breaks when foreign keys exist (if ON DELETE CASCADE is on it will have unwanted cascaded deletes in other tables and if ON DELETE RESTRICT is on, the REPLACE will fail.)
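SQLite's REPLACE has the same delete-then-insert semantics, so Python's built-in sqlite3 module can illustrate the difference without a MySQL server. A minimal sketch, assuming a hypothetical table t and SQLite 3.24 or later (needed for the ON CONFLICT ... DO UPDATE upsert syntax): a column not named in the REPLACE loses its value, while the upsert keeps it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a TEXT, b TEXT)")

# REPLACE: the conflicting row is deleted and a fresh row is inserted,
# so column b (not mentioned) falls back to its default, NULL.
conn.execute("INSERT INTO t (id, a, b) VALUES (1, 'x', 'keep me')")
conn.execute("REPLACE INTO t (id, a) VALUES (1, 'y')")
after_replace = conn.execute("SELECT id, a, b FROM t").fetchall()

# Upsert: the existing row is updated in place, so b survives.
conn.execute("DELETE FROM t")
conn.execute("INSERT INTO t (id, a, b) VALUES (1, 'x', 'keep me')")
conn.execute(
    "INSERT INTO t (id, a) VALUES (1, 'y') "
    "ON CONFLICT (id) DO UPDATE SET a = excluded.a"
)
after_upsert = conn.execute("SELECT id, a, b FROM t").fetchall()

print(after_replace)  # [(1, 'y', None)]
print(after_upsert)   # [(1, 'y', 'keep me')]
```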
-
repo owner the idea is that MERGE would be implemented directly but some kind of splitting-the-difference layer would need to be supported in order to provide a cross-database upsert.
-
I don't disagree with the approach, on the contrary.
Just mentioning that for MySQL there is no easy road - and REPLACE should probably not be used.
-
repo owner That's why this is probably the oldest issue still open :). I want nothing to do with it, really.
-
repo owner PG has added something for this, so we should attempt to support a limited UPSERT system.
PG has implemented upsert as an INSERT or UPDATE; for MySQL, for example, this is ON DUPLICATE KEY UPDATE. Unfortunately, this leaves SQLite as the only outlier, which only has REPLACE.
we should also try to get some wisdom from the HN discussion at http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=168d5805e4c08bed7b95d351bf097cff7c07dd65.
We will need to deliver the "default" expressions as well as the "onupdate" values simultaneously to such a statement.
I'm a little puzzled, though, as far as "upsert" vs. "merge" goes. The upserts we see in PG, MySQL and SQLite all work first and foremost with an INSERT that has brand-new data in the VALUES clause. But every MERGE I can find does not use a VALUES clause; it can only MERGE from another SELECT statement. Unless those backends support ad-hoc lists of values, I don't quite see at the moment how "merge" is even a superset of "upsert", considering that "upsert" supports ad-hoc values.
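One place the two shapes meet is when a backend lets a list of literal rows act as a derived table (SQL Server, for example, accepts MERGE ... USING (VALUES ...) AS src (...)). SQLite has no MERGE, but the same idea can be sketched with Python's sqlite3 module: an upsert whose rows come from a SELECT over a VALUES list, i.e. the "merge from a select" shape fed with ad-hoc values. This assumes SQLite 3.24+ and a hypothetical table t.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, data TEXT)")
conn.execute("INSERT INTO t (id, data) VALUES (1, 'old')")

# The ad-hoc rows are wrapped in a VALUES derived table and read through a SELECT;
# SQLite names the derived columns column1, column2, ...
# "WHERE true" avoids SQLite's parsing ambiguity between the SELECT and the upsert clause.
conn.execute(
    "INSERT INTO t (id, data) "
    "SELECT column1, column2 FROM (VALUES (1, 'new'), (2, 'fresh')) WHERE true "
    "ON CONFLICT (id) DO UPDATE SET data = excluded.data"
)
rows = conn.execute("SELECT id, data FROM t ORDER BY id").fetchall()
print(rows)  # [(1, 'new'), (2, 'fresh')]
```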
-
repo owner - changed title to implement MERGE (and/or PG ON CONFLICT and/or MySQL REPLACE etc. etc.)
- edited description
-
repo owner Issue #3529 was marked as a duplicate of this issue.
-
Postgres 9.5 is now out, with ON CONFLICT support. It would be good to be able to easily use this via SQLAlchemy.
-
repo owner sure, have an API? none of the various merge/upsert options act the same across databases. use cases are unknown. I personally never use upsert so I don't have good judgment on it.
-
This would be great. Working on a script now that could make use of this functionality.
-
I'm wondering what the best approach to implementing Postgres 9.5 ON CONFLICT would be for the expression layer (not the ORM). The specifics of ON CONFLICT make it seem to me that this is best kept specific to the postgresql dialect module, rather than attempting a dialect-independent addition to Insert. What would be the best way to attach a postgresql-dialect ON CONFLICT clause to a dialect-independent Insert?
Quick idea 1:
```python
from sqlalchemy.dialects.postgresql import on_conflict_do_update, on_conflict_do_nothing

myvals = {'col1': 1, 'col2': 'foo'}
myinsert = mytable.insert().values(**myvals)
myupsert = on_conflict_do_update(myinsert, target=mytable.c.primary_key_col).values(**myvals)
```
Quick idea 2:
```python
from sqlalchemy.dialects.postgresql import OnConflictDoUpdate, OnConflictDoNothing

myvals = {'col1': 1, 'col2': 'foo'}
on_conflict_clause = OnConflictDoUpdate(target=mytable.c.primary_key_col).values(**myvals)
myupsert = mytable.insert(postgresql_on_conflict=on_conflict_clause).values(**myvals)
```
http://www.postgresql.org/docs/9.5/static/sql-insert.html#SQL-ON-CONFLICT
Some notes:
- ON CONFLICT clause available only to INSERT statements, which aligns with other dialects' upserts such as MySQL ON DUPLICATE KEY, etc.
- ON CONFLICT clause supports many important expressions:
- conflict target: index, or constraint, or index expression, to use to detect conflicts; even an optional index predicate WHERE clause to target partial unique indexes is supported.
- conflict target even supports COLLATE clause and an opclass name for indexes where simple equality operator isn't appropriate.
- ON CONFLICT DO UPDATE SET (...) accepts the entire set of clauses that are accepted by a regular UPDATE foo SET (...). One can provide a WHERE clause to filter rows to update when a conflict is detected; one can perform scalar subqueries to produce values for assignment.
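For comparison with the two quick ideas, the construct that eventually shipped for #3529 (SQLAlchemy 1.1) hangs the clause off a PostgreSQL-specific insert(). A minimal sketch, assuming SQLAlchemy 1.4+ and a hypothetical mytable; it only compiles SQL strings, so no database or driver is needed. It touches several of the notes above: a conflict target, SET assignments that reference the proposed row, and a WHERE filter on the update.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import insert

mytable = Table(
    "mytable",
    MetaData(),
    Column("id", Integer, primary_key=True),
    Column("col1", Integer),
    Column("col2", String(50)),
)

stmt = insert(mytable).values(id=1, col1=1, col2="foo")

upsert = stmt.on_conflict_do_update(
    index_elements=[mytable.c.id],  # conflict target; index_where= would target a partial index
    set_={"col2": stmt.excluded.col2},  # "excluded" is the row proposed for insertion
    where=mytable.c.col1 < stmt.excluded.col1,  # only update when the new col1 is larger
)
ignore = stmt.on_conflict_do_nothing(index_elements=[mytable.c.id])

pg = postgresql.dialect()
upsert_sql = str(upsert.compile(dialect=pg))
ignore_sql = str(ignore.compile(dialect=pg))
print(upsert_sql)
print(ignore_sql)
```

Both calls are generative, so upsert and ignore are independent statements built from the same base insert.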
-
I created a working version of PG 9.5's INSERT ... ON CONFLICT on my fork:
https://bitbucket.org/robin900/sqlalchemy/branch/ticket_3529#diff
Since it's dialect-specific, I think maybe ticket #3529 should no longer be a duplicate of this ticket; let #3529 be for the PG-dialect-specific add-on for ON CONFLICT, and leave this ticket #960 for any dialect-agnostic MERGE/upsert feature.
-
repo owner the theme of this issue is to implement the backend equivalents of MERGE including MySQL's "REPLACE" and PG's new feature. It is a completely safe assumption that the moment an API is introduced to support that of one backend, the entire planet will be breaking down the door to write code that works identically on the other. This is a very hard problem to solve and should be solved at least for MySQL / Postgresql before we go too far down any one road. The SQLAlchemy SQL API is extensible so recipes / examples / 3rd party packages on pypi that provide PG or MySQL's feature are all fine, but for inclusion in SQLA we need to at least make an attempt to address both backends. Adding a new issue just for PG's specific syntax IMO doesn't really help towards the goal of getting a good feature implemented in SQLA core (and even ORM).
-
repo owner - edited description
- changed status to wontfix
we support PG's INSERT ... ON CONFLICT now in #3529; that's the one that had the most requests. I've never seen anyone request Oracle's MERGE, ever. For MySQL, the request here would be for "ON DUPLICATE KEY UPDATE", not "REPLACE" as mentioned previously. However, this would also be implemented as a MySQL-specific construct the way we did for PostgreSQL; that is, trying to make a generic MERGE out of this is not something I think we are doing. If folks still want ON DUPLICATE KEY UPDATE, then please open another bug for that.
-
repo owner Issue #3985 was marked as a duplicate of this issue.
-
repo owner - changed status to open
Current plan is to emulate the PG approach:
- create a MySQL ON DUPLICATE KEY UPDATE variant of insert(); this is #4009
- create a SQLite INSERT OR REPLACE variant of insert(); this is #4010
- if someone really cares, make a MERGE() statement that only works on SQL Server and Oracle (but is part of the main sql package, since it's SQL standard)
- people who want a platform-agnostic merge/upsert can combine these constructs into their own @compiles recipe:
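```python
from sqlalchemy.dialects import postgresql
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import UpdateBase


class MyMerge(UpdateBase):
    def __init__(self, whatever, *more):
        ...  # store whatever the construct needs


@compiles(MyMerge, "postgresql")
def _pg_merge(element, compiler, **kw):
    construct = postgresql.insert(element.whatever).on_conflict_do_update(...)  # fill in arguments
    return compiler.process(construct, **kw)
```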
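A fuller version of that recipe might look like the sketch below. Upsert and its helpers are hypothetical names, not SQLAlchemy API. It assumes SQLAlchemy 1.4+ (where sqlite.insert() also has on_conflict_do_update()), uses the table's primary key as the conflict target, and subclasses Insert (the approach discussed further down this thread) rather than UpdateBase. It only compiles the statement, and dialects other than the three registered here are not handled.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table
from sqlalchemy.dialects import mysql, postgresql, sqlite
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import Insert


class Upsert(Insert):
    """Hypothetical dialect-agnostic upsert: insert a row, update it on a key conflict."""

    inherit_cache = False  # opt this sketch out of SQLAlchemy's statement cache

    def __init__(self, table, values):
        super().__init__(table)
        self.upsert_values = dict(values)  # column name -> value


def _update_keys(element):
    # Columns to overwrite on conflict: everything supplied except the primary key.
    pk_names = {c.key for c in element.table.primary_key.columns}
    return [k for k in element.upsert_values if k not in pk_names]


def _on_conflict_upsert(dialect_insert, element, compiler, **kw):
    # PostgreSQL and SQLite share the on_conflict_do_update() builder.
    stmt = dialect_insert(element.table).values(**element.upsert_values)
    stmt = stmt.on_conflict_do_update(
        index_elements=list(element.table.primary_key.columns),
        set_={k: stmt.excluded[k] for k in _update_keys(element)},
    )
    return compiler.process(stmt, **kw)


@compiles(Upsert, "postgresql")
def _pg_upsert(element, compiler, **kw):
    return _on_conflict_upsert(postgresql.insert, element, compiler, **kw)


@compiles(Upsert, "sqlite")
def _sqlite_upsert(element, compiler, **kw):
    return _on_conflict_upsert(sqlite.insert, element, compiler, **kw)


@compiles(Upsert, "mysql")
def _mysql_upsert(element, compiler, **kw):
    stmt = mysql.insert(element.table).values(**element.upsert_values)
    # stmt.inserted refers to the value this INSERT attempted to write
    stmt = stmt.on_duplicate_key_update({k: stmt.inserted[k] for k in _update_keys(element)})
    return compiler.process(stmt, **kw)


foos = Table(
    "foos",
    MetaData(),
    Column("id", Integer, primary_key=True),
    Column("bar", String(10)),
)
upsert = Upsert(foos, {"id": 1, "bar": "ab"})
compiled = {
    name: str(upsert.compile(dialect=mod.dialect()))
    for name, mod in [("postgresql", postgresql), ("sqlite", sqlite), ("mysql", mysql)]
}
for name, sql in compiled.items():
    print(name, sql)
```

Each handler builds the real dialect-specific insert and hands it back to the compiler, so the dialect's own ON CONFLICT / ON DUPLICATE KEY UPDATE rendering does the work.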
-
@zzzeek If I inherit from UpdateBase I get the following error: AttributeError: 'Merge' object has no attribute '_returning'. I've tried to inherit from Insert instead and it works, but it feels like a hack.
```python
from sqlalchemy.dialects import postgresql
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import Insert


class Merge(Insert):
    def __init__(self, table, values, keys=None):
        super(Merge, self).__init__(table, values)
        self.keys = keys
        self.table = table
        self.values = values


@compiles(Merge, 'postgresql')
def postgres_merge(merge_stmt, compiler, **kwargs):
    stmt = postgresql.insert(merge_stmt.table, merge_stmt.values)
    column_names = next(iter(stmt.parameters)).keys()
    stmt = stmt.on_conflict_do_update(
        index_elements=merge_stmt.keys or stmt.table.primary_key,
        set_={name: getattr(stmt.excluded, name) for name in column_names},
    )
    return compiler.visit_insert(stmt)
```
Can you explain how to do that properly? In the docs there is an example of inheriting from UpdateBase, but there's nothing about _returning there.
-
repo owner @warrior2031 I suggest trying to emulate the Postgresql ON CONFLICT patch for general guidance on how to make this work (and yes they subclass Insert). The patch and lots of discussion about it as well as the history of how it was developed can be seen here: https://gerrit.sqlalchemy.org/#/c/54/
-
@zzzeek I've implemented ON DUPLICATE KEY UPDATE for mysql and tested it manually. I'm reading the tests for on_conflict_do_update; they seem to use some internal things like inheriting from fixture, a @classmethod that defines tables, etc. Do I have to use (and subsequently understand) all that, or can I use my own pytest tests? By the way, you can track progress and review the code in the fork https://github.com/purpleP/sqlalchemy/tree/on_duplicate_key.
-
repo owner @warrior2031 it would be best if you could try emulating some of the existing tests. For help on running them see the README.unittests.rst file.
-
@zzzeek I've added the test, but after thinking for a while I've come to the conclusion that it would be easier (and it's actually what really should be tested) to test that the statement is compiled correctly. I don't know about you and other contributors, but in the past, if I had to test that my SQLAlchemy queries work, I always tested them by running against artificially created data. And this always seemed wrong to me, because before writing an SQLAlchemy query I always write the SQL query first and test it on real data. So what I was testing is just that SQLAlchemy produces the same SQL. But because SQLAlchemy can produce more than one equivalent SQL string from its query, I couldn't test it properly (for example, VALUES and columns in a different order). This time I've decided to try an approach that seems more correct to me (we're testing the compiler in the end, right? So why not test just that?). https://github.com/purpleP/sqlalchemy/blob/on_duplicate_key/test/dialect/mysql/test_on_duplicate.py
This test relies a bit on the iteration order of the items of the dictionary passed to the update parameter, so it kind of depends on the implementation for now. But in case someone breaks that, one can always parse the expected_sql a bit and test that things like bar = VALUES(bar), baz = VALUES(baz) are the same as baz = VALUES(baz), bar = VALUES(bar). This would take less code than creating all this artificial data, running queries against it and testing the results. What do you think about it?
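The order-insensitive comparison described above could look something like this. It is a stdlib-only sketch with hypothetical helper names (update_assignments, assert_same_upsert) that treats the ON DUPLICATE KEY UPDATE list as a set. It splits naively on ", ", so it only suits simple assignments like the VALUES(...) ones in the comment.

```python
from __future__ import annotations

MARKER = "ON DUPLICATE KEY UPDATE"


def update_assignments(sql: str) -> set[str]:
    """Return the assignments after ON DUPLICATE KEY UPDATE as an unordered set.

    Naive on purpose: assumes no single assignment contains ", " itself
    (e.g. no multi-argument function calls on the right-hand side).
    """
    _, sep, tail = sql.partition(MARKER)
    if not sep:
        raise ValueError("statement has no ON DUPLICATE KEY UPDATE clause")
    return {part.strip() for part in tail.split(", ")}


def assert_same_upsert(actual: str, expected: str) -> None:
    # The INSERT ... VALUES part must match exactly...
    assert actual.partition(MARKER)[0] == expected.partition(MARKER)[0]
    # ...while the SET list may come out in any order.
    assert update_assignments(actual) == update_assignments(expected)


a = ("INSERT INTO foos (id, bar, baz) VALUES (%s, %s, %s) "
     "ON DUPLICATE KEY UPDATE bar = VALUES(bar), baz = VALUES(baz)")
b = ("INSERT INTO foos (id, bar, baz) VALUES (%s, %s, %s) "
     "ON DUPLICATE KEY UPDATE baz = VALUES(baz), bar = VALUES(bar)")
assert_same_upsert(a, b)  # passes despite the different order
```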
-
repo owner
> for a while I've came to conclusion that it would be easier (and it actually what really should be tested) to test that the statement is compiled correctly.
I referred to https://gerrit.sqlalchemy.org/#/c/54 which has all the testing styles you need if you look at the code. Testing the compiled SQL is the primary kind of testing we do, in my link above, see that here: https://gerrit.sqlalchemy.org/#/c/54/30/test/dialect/postgresql/test_compiler.py
The tests that do round trips are also important. They matter not because we're testing that the database works, but because they exercise all the statement execution mechanics around INSERT statements and show that nothing crashes or malfunctions: all parameters given are correctly passed and processed, features like inserted_primary_key don't crash, etc. For examples of round-trip tests that had to be added after the fact to test bugs that occurred in the PG ON CONFLICT feature within the realm of parameter and execution handling (and not just "the right SQL"), see https://gerrit.sqlalchemy.org/#/c/370/ and https://gerrit.sqlalchemy.org/#/c/285/. You don't need to write all these tests, as many of them won't apply, but rudimentary round-trip tests are necessary.
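A rudimentary round trip in that spirit might look like the following. It is sketched against SQLite's upsert (available in SQLAlchemy 1.4+, needing SQLite 3.24+) so it runs without a server, and it checks three things: parameters survive execution, inserted_primary_key is still populated, and the untouched column keeps its value. The table and test name are hypothetical, and it uses plain pytest conventions rather than SQLAlchemy's own fixtures.TablesTest suite.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, select
from sqlalchemy.dialects.sqlite import insert

metadata = MetaData()
foos = Table(
    "foos",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("bar", String(10)),
    Column("baz", String(10)),
)


def test_upsert_round_trip():
    engine = create_engine("sqlite://")  # fresh in-memory database per test
    metadata.create_all(engine)
    with engine.begin() as conn:
        conn.execute(foos.insert(), {"id": 1, "bar": "b", "baz": "keep"})

        stmt = insert(foos).values(id=1, bar="ab")
        stmt = stmt.on_conflict_do_update(
            index_elements=[foos.c.id], set_={"bar": stmt.excluded.bar}
        )
        result = conn.execute(stmt)

        # execution mechanics: the primary key we supplied is still reported
        assert tuple(result.inserted_primary_key) == (1,)

        rows = conn.execute(select(foos).order_by(foos.c.id)).all()

    # the conflicting row was updated in place; baz was not touched
    assert [tuple(row) for row in rows] == [(1, "ab", "keep")]


if __name__ == "__main__":
    test_upsert_round_trip()
```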
-
@zzzeek Seems like I can't use the mysql dialect insert inside @compiles to provide a custom Merge.
```python
from sqlalchemy import Column, Integer, String
from sqlalchemy.dialects import mysql
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.sql.expression import Insert

Base = declarative_base()


class Foo(Base):
    __tablename__ = 'foos'
    id = Column(Integer, primary_key=True)
    a = Column(String(10))
    b = Column(String(10))


class Merge(Insert):
    def __init__(self, table, values):
        super(Merge, self).__init__(table, values)
        self.table = table
        self.values = values


@compiles(Merge, 'mysql')
def mysql_merge(merge_stmt, compiler, **kwargs):
    stmt = mysql.insert(merge_stmt.table, merge_stmt.values)
    update = {name: getattr(stmt.vals, name) for name in stmt.parameters[0]}
    stmt = stmt.on_duplicate_key_update(update=update)
    return compiler.process(stmt, **kwargs)
```
This fails in visit_on_duplicate_key_update, where I try to get the columns of the table into which the values would be inserted with cols = self.statement.table.c, but for some reason self.statement is Merge instead of Insert. This is probably because the compiler started by compiling Merge, and it doesn't know that Merge is just a dummy statement.
-
@zzzeek I've tried to rewrite the tests using your framework.
```python
from sqlalchemy import Column, Integer, String, Table
from sqlalchemy import testing
from sqlalchemy.dialects.mysql import insert
from sqlalchemy.testing import fixtures
from sqlalchemy.testing.assertions import assert_raises, eq_


class OnDuplicateTest(fixtures.TablesTest):
    __only_on__ = 'mysql',
    __backend__ = True
    run_define_tables = 'each'

    @classmethod
    def define_tables(cls, metadata):
        Table(
            'foos', metadata,
            Column('id', Integer, primary_key=True),
            Column('bar', String(10)),
            Column('baz', String(10)),
        )

    def test_bad_args(self):
        assert_raises(
            ValueError,
            insert(self.tables.foos, values={}).on_duplicate_key_update
        )

    def test_on_duplicate_key_update(self):
        foos = self.tables.foos
        with testing.db.connect() as conn:
            conn.execute(insert(foos, dict(id=1, bar='b')))
            stmt = insert(foos, [dict(id=1, bar='ab'), dict(id=2, bar='b')])
            stmt = stmt.on_duplicate_key_update(bar=stmt.vals.bar)
            result = conn.execute(stmt)
            eq_(result.inserted_primary_key, [1])
            eq_(
                conn.execute(foos.select().where(foos.c.id == 1)).fetchall(),
                [(1, 'ab', None)]
            )
```
But they are skipped for some reason.
```
test/dialect/mysql/test_on_duplicate.py::OnDuplicateTest::test_bad_args SKIPPED
==== short test summary info ====
SKIP [1] /home/michael/code/my_projects/sqlalchemy/test/../lib/sqlalchemy/testing/config.py:96: 'OnDuplicateTest' unsupported for implementation '('mysql',)'
```
I can't spot a difference from the postgresql OnConflictTest and don't understand why they are skipped.
-
repo owner Please ensure that you've familiarized yourself with:
then please continue discussion re: writing tests and such on the SQLAlchemy mailing list at: https://groups.google.com/forum/#!forum/sqlalchemy
thanks!