Issue #960 new

implement MERGE

jason kirtland
created an issue

Implement generic MERGE, aka 'upsert'. In ANSI, it looks like:

MERGE INTO table_name1 USING table_name2 ON (condition)
WHEN MATCHED THEN UPDATE SET column1 = value1 [, column2 = value2 ...]
WHEN NOT MATCHED THEN INSERT columns VALUES (values)

Dialect support differs pretty widely. A quick & likely inaccurate poll:

  • Oracle 9+ has a MERGE
  • T-SQL (SQL Server 2008) has a MERGE; earlier versions can maybe do IF EXISTS(SELECT ...)
  • MySQL is limited to a key violation condition, and can do either INSERT ... ON DUPLICATE KEY UPDATE or REPLACE INTO, INSERT being preferable
  • SQLite is limited to a key violation condition, and has INSERT OR REPLACE (the statement-level form of its ON CONFLICT REPLACE clause)

Comments (11)

  1. Mike Bayer repo owner

    Here's a recipe I made for the SQLA tutorial:

    from sqlalchemy.ext.compiler import compiles  # needed by both recipes below
    from sqlalchemy.sql.expression import Insert
    
    class replace(Insert):
        pass
    
    # REPLACE is just like INSERT.   Below is the easy road out:
    
    import re
    @compiles(replace, 'sqlite', 'mysql')
    def compile_replace(replace, compiler, **kw):
    
        stmt = compiler.sql_compiler.visit_insert(replace)
        return re.sub(r'^INSERT', 'REPLACE', stmt)
    
    # Or, take the hard road:
    
    from sqlalchemy.ext.compiler import compiles
    from sqlalchemy import bindparam
    
    @compiles(replace, 'sqlite', 'mysql')
    def compile_replace(replace, compiler, **kw):
        colspecs = {}
    
        for col in replace.table.c:  # table defaults
            if col.default is not None:
                colspecs[col] = col.default.arg
    
        if replace.parameters:   # statement parameters
            for key, value in replace.parameters.items():
                col = replace.table.c[key]
                colspecs[col] = bindparam(key, value)
    
        for k in compiler.column_keys:  # names sent to execute()
            col = replace.table.c[k]
            colspecs.setdefault(col, bindparam(col.key))
    
        return "REPLACE INTO %s (%s) VALUES (%s)" % (
            replace.table.name,
            ",".join(c.name for c in colspecs),
            ",".join(compiler.process(v) for v in colspecs.values())
        )
    
  2. Mike Bayer repo owner
    • changed milestone to 0.9.0

    possible ORM flow:

    obj = MyObject(id=1, data='data')
    session.replace(obj)   # session.server_merge()?  session.upsert() ?  session.add(obj, merge=True) ? session.add(obj, upsert=True)?
    
    # 1. object must contain a full primary key.   error if not.
    # 2. persistence.py treats these as INSERTs, since we have to assume an INSERT will occur.
    
  3. Pantelis Theodosiou

    MySQL's REPLACE is not an appropriate equivalent for MERGE.

    REPLACE does a DELETE+INSERT which has different semantics than MERGE.

    It also breaks when foreign keys exist: with ON DELETE CASCADE it causes unwanted cascaded deletes in other tables, and with ON DELETE RESTRICT the REPLACE fails.
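
The DELETE+INSERT behaviour described above can be illustrated with a small sketch. It uses SQLite through Python's standard `sqlite3` module as a stand-in for MySQL, on the assumption that its REPLACE behaves analogously for this purpose. The `parent`/`child` schema and the function name `replace_side_effects` are made up for the demonstration.

```python
import sqlite3


def replace_side_effects():
    """Run REPLACE against a row that has a child row and extra column data.

    Returns (note_after_replace, child_row_count_after_replace).
    Hypothetical demo schema; SQLite stands in for MySQL here.
    """
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE parent (id INTEGER PRIMARY KEY, data TEXT, note TEXT);
        CREATE TABLE child (
            id INTEGER PRIMARY KEY,
            parent_id INTEGER REFERENCES parent(id) ON DELETE CASCADE
        );
        INSERT INTO parent (id, data, note) VALUES (1, 'old', 'keep me');
        INSERT INTO child (id, parent_id) VALUES (10, 1);
        """
    )
    # Only id and data are supplied; the existing row is deleted and re-inserted.
    conn.execute("REPLACE INTO parent (id, data) VALUES (1, 'new')")
    note = conn.execute("SELECT note FROM parent WHERE id = 1").fetchone()[0]
    children = conn.execute("SELECT COUNT(*) FROM child").fetchone()[0]
    conn.close()
    return note, children


if __name__ == "__main__":
    print(replace_side_effects())
```

In this sketch the `note` column that was not named in the REPLACE is lost and the child row is removed by the cascade, neither of which would happen with an in-place UPDATE.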

  4. Mike Bayer repo owner

    the idea is that MERGE would be implemented directly but some kind of splitting-the-difference layer would need to be supported in order to provide a cross-database upsert.
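
One way to picture such a splitting-the-difference layer is a function that renders a key-violation upsert per dialect. The following is a rough sketch only: `render_upsert` and its parameters are hypothetical and not part of SQLAlchemy, it emits plain strings with `:name` placeholders rather than going through the compiler, and the `ON CONFLICT` branch assumes PostgreSQL 9.5+ or SQLite 3.24+.

```python
def render_upsert(dialect, table, key_cols, value_cols):
    """Render a key-violation upsert as a SQL string (hypothetical helper).

    key_cols are the columns whose uniqueness triggers the update branch;
    value_cols are the remaining columns, overwritten when the row exists.
    """
    if not value_cols:
        raise ValueError("at least one non-key column is required")
    cols = list(key_cols) + list(value_cols)
    insert = "INSERT INTO %s (%s) VALUES (%s)" % (
        table,
        ", ".join(cols),
        ", ".join(":" + c for c in cols),
    )
    if dialect == "mysql":
        # VALUES(col) refers to the value the INSERT would have written.
        sets = ", ".join("%s = VALUES(%s)" % (c, c) for c in value_cols)
        return "%s ON DUPLICATE KEY UPDATE %s" % (insert, sets)
    if dialect in ("postgresql", "sqlite"):
        # excluded.col refers to the row proposed for insertion.
        sets = ", ".join("%s = excluded.%s" % (c, c) for c in value_cols)
        return "%s ON CONFLICT (%s) DO UPDATE SET %s" % (
            insert,
            ", ".join(key_cols),
            sets,
        )
    # Backends with a real MERGE would need their own branch.
    raise NotImplementedError("no upsert rendering for %r" % dialect)


if __name__ == "__main__":
    for name in ("mysql", "postgresql"):
        print(render_upsert(name, "my_table", ["id"], ["data"]))
```

A real implementation would presumably live in each dialect's compiler and handle quoting and bind parameters properly; the point here is only that the conflict target has to be named explicitly for some backends and is implicit for others.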

  5. Mike Bayer repo owner

    PG has added something for this, so we should attempt to support a limited UPSERT system.

    PG has implemented upsert as an INSERT-or-UPDATE, i.e. the equivalent of MySQL's ON DUPLICATE KEY UPDATE. Unfortunately this leaves SQLite as the only outlier, since it only has REPLACE.

    http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=168d5805e4c08bed7b95d351bf097cff7c07dd65

    we should also try to get some wisdom from the HN discussion at http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=168d5805e4c08bed7b95d351bf097cff7c07dd65.

    We will need to deliver the "default" expressions as well as the "onupdate" values simultaneously to such a statement.

    I'm a little puzzled though as far as "upsert" vs. "merge". The upserts we see in PG, MySQL and SQLite all work first and foremost with an INSERT that has brand new data in the VALUES clause. But everything I can find with MERGE does not use a VALUES clause, it can only MERGE from another select statement. Unless those backends support ad-hoc lists of values, I don't quite see at the moment how "merge" is even a superset of "upsert", considering "upsert" supports ad-hoc values.
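
To make the "ad-hoc VALUES" form concrete, here is a minimal sketch of an upsert that carries both an insert-time default and an update-time value in the same statement. It runs on SQLite via Python's `sqlite3` module and assumes SQLite 3.24 or later, which added an `ON CONFLICT ... DO UPDATE` clause modelled on PostgreSQL's. The `item` table, its columns and the parameter names are invented for the example.

```python
import sqlite3

# One statement carries both sets of values: :default_user is used only when
# the row is inserted, :onupdate_user only when an existing row is updated.
UPSERT = """
INSERT INTO item (id, data, created_by, updated_by)
VALUES (:id, :data, :default_user, NULL)
ON CONFLICT (id) DO UPDATE
SET data = excluded.data, updated_by = :onupdate_user
"""

SELECT = "SELECT data, created_by, updated_by FROM item WHERE id = 1"


def run_upsert_twice():
    """Execute the same upsert twice and return the row after each call."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE item ("
        "id INTEGER PRIMARY KEY, data TEXT, created_by TEXT, updated_by TEXT)"
    )
    params = {
        "id": 1,
        "data": "first",
        "default_user": "creator",
        "onupdate_user": "editor",
    }
    conn.execute(UPSERT, params)  # no row yet: INSERT branch
    after_insert = conn.execute(SELECT).fetchone()
    conn.execute(UPSERT, dict(params, data="second"))  # row exists: UPDATE branch
    after_update = conn.execute(SELECT).fetchone()
    conn.close()
    return after_insert, after_update


if __name__ == "__main__":
    print(run_upsert_twice())
```

The caller does not know in advance which branch will run, so both the default and the onupdate value have to be supplied on every execution.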
