1. Michael Bayer
  2. sqlalchemy
  3. Issues


Issue #960 wontfix

implement MERGE (and/or PG ON CONFLICT and/or MySQL REPLACE etc. etc.)

jason kirtland
created an issue

Implement generic MERGE, aka 'upsert'. In ANSI, it looks like:

MERGE INTO table_name1 USING table_name2 ON (condition)
WHEN MATCHED THEN UPDATE SET column1 = value1 [column2 = value2 ...](,)

Dialect support differs pretty widely. A quick & likely inaccurate poll:

  • Oracle 9+ has a MERGE
  • t-sql 2008 has a MERGE, earlier can maybe do IF EXISTS(SELECT ...)
  • MySQL is limited to a key violation condition, and can do either INSERT ... ON DUPLICATE KEY UPDATE or REPLACE INTO, INSERT being preferable
  • SQLite is limited to a key violation condition, and has INSERT .. ON CONFLICT REPLACE

Comments (20)

  1. Michael Bayer repo owner

    Here's a recipe I made for the SQLA tutorial:

    from sqlalchemy.sql.expression import Insert
    class replace(Insert):
    # REPLACE is just like INSERT.   Below is the easy road out:
    import re
    @compiles(replace, 'sqlite', 'mysql')
    def compile_replace(replace, compiler, **kw):
        stmt = compiler.sql_compiler.visit_insert(replace)
        return re.sub(r'^INSERT', 'REPLACE', stmt)
    # Or, take the hard road:
    from sqlalchemy.ext.compiler import compiles
    from sqlalchemy import bindparam
    @compiles(replace, 'sqlite', 'mysql')
    def compile_replace(replace, compiler, **kw):
        colspecs = {}
        for col in replace.table.c:  # table defaults
            if col.default is not None:
                colspecs[col](col) = col.default.arg
        if replace.parameters:   # statement parameters
            for key, value in replace.parameters.iteritems():
                col = replace.table.c[key](key)
                colspecs[col](col) = bindparam(key, value)
        for k in compiler.column_keys:  # names sent to execute()
            col = replace.table.c[k](k)
            colspecs.setdefault(col, bindparam(col.key))
        return "REPLACE INTO %s (%s) VALUES (%s)" % (
            ",".join(c.name for c in colspecs),
            ",".join(compiler.process(v) for v in colspecs.values())
  2. Michael Bayer repo owner
    • changed milestone to 0.9.0

    possible ORM flow:

    obj = MyObject(id=1, data='data')
    session.replace(obj)   # session.server_merge()?  session.upsert() ?  session.add(obj, merge=True) ? session.add(obj, upsert=True)?
    # 1. object must contain a full primary key.   error if not.
    # 2. persistence.py treats these as INSERTs, since we have to assume an INSERT will occur.
  3. Pantelis Theodosiou

    MySQL's REPLACE is not an appropriate equivalent for MERGE.

    REPLACE does a DELETE+INSERT which has different semantics than MERGE.

    It also breaks when foreign keys exist (if ON DELETE CASCADE is on it will have unwanted cascaded deletes in other tables and if ON DELETE RESTRICT is on, the REPLACE will fail.)

  4. Michael Bayer repo owner

    the idea is that MERGE would be implemented directly but some kind of splitting-the-difference layer would need to be supported in order to provide a cross-database upsert.

  5. Michael Bayer repo owner

    PG has added something for this, so we should attempt to support a limited UPSERT system.

    PG has implemented upsert as an INSERT or UPDATE - e.g. for MySQL this is ON DUPLICATE KEY UPDATE. Unfortunately this leaves out SQLite has the only outlier that only has REPLACE.


    we should also try to get some wisdom from the HN discussion at http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=168d5805e4c08bed7b95d351bf097cff7c07dd65.

    We will need to deliver the "default" expressions as well as the "onupdate" values simultaneously to such a statement.

    I'm a little puzzled though as far as "upsert" vs. "merge". The upserts we see in PG, MySQL and SQLite all work first and foremost with an INSERT that has brand new data in the VALUES clause. But everything I can find with MERGE does not use a VALUES clause, it can only MERGE from another select statement. Unless those backends support ad-hoc lists of values, I don't quite see at the moment how "merge" is even a superset of "upsert", considering "upsert" supports ad-hoc values.

  6. Michael Bayer repo owner

    sure, have an API? none of the various merge/upsert options act the same across databases. use cases are unknown. I personally never use upsert so I don't have good judgment on it.

  7. Robin T

    I'm wondering what the best approach to implementing Postgres 9.5 ON CONFLICT would be for the expression layer (not the ORM). The specifics of ON CONFLICT make it seem to me that this is best kept specific to the postgresql dialect module, rather than attempting a dialect-independent addition to Insert. What would be the best way to attach an postgresql dialect ON CONFLICT clause to an dialect-independent Insert?

    Quick idea 1:

    from sqlalchemy.dialects.postgresql import on_conflict_do_update, on_conflict_do_nothing
    myvals = {'col1': 1, 'col2': 'foo'}
    myinsert = mytable.insert().values(**myvals)
    myupsert = on_conflict_do_update(myinsert, target=mytable.c.primary_key_col).values(**myvals)

    Quick idea 2:

    from sqlalchemy.dialects.postgresql import OnConflictDoUpdate, OnConflictDoNothing
    myvals = {'col1': 1, 'col2': 'foo'}
    on_conflict_clause = OnConflictDoUpdate(target=mytable.c.primary_key_col).values(**myvals)
    myupsert = mytable.insert(postgresql_on_conflict=on_conflict_clause).values(**myvals)


    Some notes:

    • ON CONFLICT clause available only to INSERT statements, which aligns with other dialects' upserts such as MySQL ON DUPLICATE KEY, etc.
    • ON CONFLICT clause supports many important expressions:
      • conflict target: index, or constraint, or index expression, to use to detect conflicts; even an optional index predicate WHERE clause to target partial unique indexes is supported.
      • conflict target even supports COLLATE clause and an opclass name for indexes where simple equality operator isn't appropriate.
    • ON CONFLICT DO UPDATE SET (...) accepts the entire set of clauses that are accepted by a regular UPDATE foo SET (...). One can provide a WHERE clause to filter rows to update when a conflict is detected; one can perform scalar subqueries to produce values for assignment.
  8. Michael Bayer repo owner

    the theme of this issue is to implement the backend equivalents of MERGE including MySQL's "REPLACE" and PG's new feature. It is a completely safe assumption that the moment an API is introduced to support that of one backend, the entire planet will be breaking down the door to write code that works identically on the other. This is a very hard problem to solve and should be solved at least for MySQL / Postgresql before we go too far down any one road. The SQLAlchemy SQL API is extensible so recipes / examples / 3rd party packages on pypi that provide PG or MySQL's feature are all fine, but for inclusion in SQLA we need to at least make an attempt to address both backends. Adding a new issue just for PG's specific syntax IMO doesn't really help towards the goal of getting a good feature implemented in SQLA core (and even ORM).

  9. Michael Bayer repo owner
    • edited description
    • changed status to wontfix

    we support PG's ON CONFLICT REPLACE now in #3529, that's the one that had the most requests. I've never seen anyone request Oracle's MERGE ever. MySQL, the request here would be for "ON DUPLICATE KEY UPDATE", not "REPLACE" as mentioned previously. However, this would also be implemented as a MySQL-specific construct the way we did for Postgresql, that is, trying to make a generic MERGE out of this is not something I think we are doing. if folks still want ON DUPLICATE KEY UPDATE then please open another bug for that.

  10. Log in to comment