1. Michael Bayer
  2. sqlalchemy
  3. Issues

Issues

Issue #960 open

implement MERGE (and/or PG ON CONFLICT and/or MySQL REPLACE etc. etc.)

jek
created an issue

Implement generic MERGE, aka 'upsert'. In ANSI, it looks like:

MERGE INTO table_name1 USING table_name2 ON (condition)
WHEN MATCHED THEN UPDATE SET column1 = value1 [column2 = value2 ...](,)
WHEN NOT MATCHED THEN INSERT columns VALUES (values)

Dialect support differs pretty widely. A quick & likely inaccurate poll:

  • Oracle 9+ has a MERGE
  • t-sql 2008 has a MERGE, earlier can maybe do IF EXISTS(SELECT ...)
  • MySQL is limited to a key violation condition, and can do either INSERT ... ON DUPLICATE KEY UPDATE or REPLACE INTO, INSERT being preferable
  • SQLite is limited to a key violation condition, and has INSERT .. ON CONFLICT REPLACE

Official response

  • Michael Bayer repo owner
    • changed status to open

    Current plan is to emulate the PG approach:

    1. create a MySQL ON DUPLICATE KEY UPDATE variant of insert() - this is #4009

    2. create a SQLite INSERT OR REPLACE variant of insert() - this is #4010

    3. if someone really cares, make a MERGE() statement that only works on SQL Server, Oracle (but is part of main sql package since its SQL standard)

    4. people that want platform agnostic merge/upsert, they can combine these constructs into their own @compiles recipe:

    class MyMerge(UpdateBase):
       def __init__(self, whatever, whaevr...)
    
    @compiles(MyMerge, "postgresql")
    def _pg_merge(element, compiler, **kw):
       construct = postgresql.insert(element.whtever).on_confict_do_update(whatever, ...)
       return compiler.process(construct, **kw)
    

Comments (31)

  1. Michael Bayer repo owner

    Here's a recipe I made for the SQLA tutorial:

    from sqlalchemy.sql.expression import Insert
    
    class replace(Insert):
        pass
    
    # REPLACE is just like INSERT.   Below is the easy road out:
    
    import re
    @compiles(replace, 'sqlite', 'mysql')
    def compile_replace(replace, compiler, **kw):
    
        stmt = compiler.sql_compiler.visit_insert(replace)
        return re.sub(r'^INSERT', 'REPLACE', stmt)
    
    # Or, take the hard road:
    
    from sqlalchemy.ext.compiler import compiles
    from sqlalchemy import bindparam
    
    @compiles(replace, 'sqlite', 'mysql')
    def compile_replace(replace, compiler, **kw):
        colspecs = {}
    
        for col in replace.table.c:  # table defaults
            if col.default is not None:
                colspecs[col](col) = col.default.arg
    
        if replace.parameters:   # statement parameters
            for key, value in replace.parameters.iteritems():
                col = replace.table.c[key](key)
                colspecs[col](col) = bindparam(key, value)
    
        for k in compiler.column_keys:  # names sent to execute()
            col = replace.table.c[k](k)
            colspecs.setdefault(col, bindparam(col.key))
    
        return "REPLACE INTO %s (%s) VALUES (%s)" % (
            replace.table.name,
            ",".join(c.name for c in colspecs),
            ",".join(compiler.process(v) for v in colspecs.values())
        )
    
  2. Michael Bayer repo owner
    • changed milestone to 0.9.0

    possible ORM flow:

    obj = MyObject(id=1, data='data')
    session.replace(obj)   # session.server_merge()?  session.upsert() ?  session.add(obj, merge=True) ? session.add(obj, upsert=True)?
    
    # 1. object must contain a full primary key.   error if not.
    # 2. persistence.py treats these as INSERTs, since we have to assume an INSERT will occur.
    
  3. Pantelis Theodosiou

    MySQL's REPLACE is not an appropriate equivalent for MERGE.

    REPLACE does a DELETE+INSERT which has different semantics than MERGE.

    It also breaks when foreign keys exist (if ON DELETE CASCADE is on it will have unwanted cascaded deletes in other tables and if ON DELETE RESTRICT is on, the REPLACE will fail.)

  4. Michael Bayer repo owner

    the idea is that MERGE would be implemented directly but some kind of splitting-the-difference layer would need to be supported in order to provide a cross-database upsert.

  5. Michael Bayer repo owner

    PG has added something for this, so we should attempt to support a limited UPSERT system.

    PG has implemented upsert as an INSERT or UPDATE - e.g. for MySQL this is ON DUPLICATE KEY UPDATE. Unfortunately this leaves out SQLite has the only outlier that only has REPLACE.

    http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=168d5805e4c08bed7b95d351bf097cff7c07dd65

    we should also try to get some wisdom from the HN discussion at http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=168d5805e4c08bed7b95d351bf097cff7c07dd65.

    We will need to deliver the "default" expressions as well as the "onupdate" values simultaneously to such a statement.

    I'm a little puzzled though as far as "upsert" vs. "merge". The upserts we see in PG, MySQL and SQLite all work first and foremost with an INSERT that has brand new data in the VALUES clause. But everything I can find with MERGE does not use a VALUES clause, it can only MERGE from another select statement. Unless those backends support ad-hoc lists of values, I don't quite see at the moment how "merge" is even a superset of "upsert", considering "upsert" supports ad-hoc values.

  6. Michael Bayer repo owner

    sure, have an API? none of the various merge/upsert options act the same across databases. use cases are unknown. I personally never use upsert so I don't have good judgment on it.

  7. Robin T

    I'm wondering what the best approach to implementing Postgres 9.5 ON CONFLICT would be for the expression layer (not the ORM). The specifics of ON CONFLICT make it seem to me that this is best kept specific to the postgresql dialect module, rather than attempting a dialect-independent addition to Insert. What would be the best way to attach an postgresql dialect ON CONFLICT clause to an dialect-independent Insert?

    Quick idea 1:

    from sqlalchemy.dialects.postgresql import on_conflict_do_update, on_conflict_do_nothing
    
    myvals = {'col1': 1, 'col2': 'foo'}
    myinsert = mytable.insert().values(**myvals)
    myupsert = on_conflict_do_update(myinsert, target=mytable.c.primary_key_col).values(**myvals)
    

    Quick idea 2:

    from sqlalchemy.dialects.postgresql import OnConflictDoUpdate, OnConflictDoNothing
    
    myvals = {'col1': 1, 'col2': 'foo'}
    on_conflict_clause = OnConflictDoUpdate(target=mytable.c.primary_key_col).values(**myvals)
    myupsert = mytable.insert(postgresql_on_conflict=on_conflict_clause).values(**myvals)
    

    http://www.postgresql.org/docs/9.5/static/sql-insert.html#SQL-ON-CONFLICT

    Some notes:

    • ON CONFLICT clause available only to INSERT statements, which aligns with other dialects' upserts such as MySQL ON DUPLICATE KEY, etc.
    • ON CONFLICT clause supports many important expressions:
      • conflict target: index, or constraint, or index expression, to use to detect conflicts; even an optional index predicate WHERE clause to target partial unique indexes is supported.
      • conflict target even supports COLLATE clause and an opclass name for indexes where simple equality operator isn't appropriate.
    • ON CONFLICT DO UPDATE SET (...) accepts the entire set of clauses that are accepted by a regular UPDATE foo SET (...). One can provide a WHERE clause to filter rows to update when a conflict is detected; one can perform scalar subqueries to produce values for assignment.
  8. Michael Bayer repo owner

    the theme of this issue is to implement the backend equivalents of MERGE including MySQL's "REPLACE" and PG's new feature. It is a completely safe assumption that the moment an API is introduced to support that of one backend, the entire planet will be breaking down the door to write code that works identically on the other. This is a very hard problem to solve and should be solved at least for MySQL / Postgresql before we go too far down any one road. The SQLAlchemy SQL API is extensible so recipes / examples / 3rd party packages on pypi that provide PG or MySQL's feature are all fine, but for inclusion in SQLA we need to at least make an attempt to address both backends. Adding a new issue just for PG's specific syntax IMO doesn't really help towards the goal of getting a good feature implemented in SQLA core (and even ORM).

  9. Michael Bayer repo owner
    • edited description
    • changed status to wontfix

    we support PG's ON CONFLICT REPLACE now in #3529, that's the one that had the most requests. I've never seen anyone request Oracle's MERGE ever. MySQL, the request here would be for "ON DUPLICATE KEY UPDATE", not "REPLACE" as mentioned previously. However, this would also be implemented as a MySQL-specific construct the way we did for Postgresql, that is, trying to make a generic MERGE out of this is not something I think we are doing. if folks still want ON DUPLICATE KEY UPDATE then please open another bug for that.

  10. Michael Bayer repo owner
    • changed status to open

    Current plan is to emulate the PG approach:

    1. create a MySQL ON DUPLICATE KEY UPDATE variant of insert() - this is #4009

    2. create a SQLite INSERT OR REPLACE variant of insert() - this is #4010

    3. if someone really cares, make a MERGE() statement that only works on SQL Server, Oracle (but is part of main sql package since its SQL standard)

    4. people that want platform agnostic merge/upsert, they can combine these constructs into their own @compiles recipe:

    class MyMerge(UpdateBase):
       def __init__(self, whatever, whaevr...)
    
    @compiles(MyMerge, "postgresql")
    def _pg_merge(element, compiler, **kw):
       construct = postgresql.insert(element.whtever).on_confict_do_update(whatever, ...)
       return compiler.process(construct, **kw)
    
  11. Михаил Доронин

    Michael Bayer If I inherit from UpdateBase I get the following error AttributeError: 'Merge' object has no attribute '_returning'. I've tried to inherit from Insert instead and it works, but feels like a hack.

    class Merge(Insert):
        def __init__(self, table, values, keys=None):
            super(Merge, self).__init__(table, values)
            self.keys = keys
            self.table = table
            self.values = values
    
    @compiles(Merge, 'postgresql')
    def postgres_merge(merge_stmt, compiler, **kwargs):
        stmt = postgresql.insert(merge_stmt.table, merge_stmt.values)
        column_names = next(iter(stmt.parameters)).keys()
        stmt = stmt.on_conflict_do_update(
            index_elements=merge_stmt.keys or stmt.table.primary_key,
            set_={name: getattr(stmt.excluded, name) for name in column_names},
        )
        return compiler.visit_insert(stmt)
    

    Can you explain how to do that properly? In the docs there is example with inheriting from UpdateBase, but there's nothing about _returning there.

  12. Михаил Доронин

    Michael Bayer I've implemented ON DUPLICATE KEY UPDATE for mysql and tested it manually. I'm reading tests for on_conflict_do_update it seems to use some internal things like inheriting from fixture, @classmethod that defines tables etc. Do I have to use (and understand subsequently) all that or can I use my pytest to write tests? By the way - you can track progress and review the code in fork https://github.com/purpleP/sqlalchemy/tree/on_duplicate_key.

  13. Михаил Доронин

    Michael Bayer I've added the test, but after thinking for a while I've came to conclusion that it would be easier (and it actually what really should be tested) to test that the statement is compiled correctly. I don't know about you and other contributors, but in past if I had to test that my sqlalchemy queries work I always tested it by running against artificially created data. And this always seemed wrong to me, because before writing sqlalchemy query I'm always writing sql query first and testing it on real data. So what I was testing is just that sqlalchemy will produce the same sql. But because sqlalchemy can produce more than one isomorphic sql from it's query I couldn't test it properly (because for example VALUES and columns in different order)

    But this time I've decided to try approach that seems more correct to me (we're testing compiler in the end, right? So why not test just that?). https://github.com/purpleP/sqlalchemy/blob/on_duplicate_key/test/dialect/mysql/test_on_duplicate.py

    This test relies a bit on the order of iteration of items of dictionary passes to update parameter. So it kind of depends on implementation for now. But in case someone will broke that one can always parse the expected_sql a bit and test that things like bar = VALUES(bar), baz = VALUES(baz) is the same as baz = VALUES(baz), bar = VALUES(bar). This would take less code than creating all this artificial data and running queries against it and testing the results.

    What do you think about it?

  14. Michael Bayer repo owner

    for a while I've came to conclusion that it would be easier (and it actually what really should be tested) to test that the statement is compiled correctly.

    I referred to https://gerrit.sqlalchemy.org/#/c/54 which has all the testing styles you need if you look at the code. Testing the compiled SQL is the primary kind of testing we do, in my link above, see that here: https://gerrit.sqlalchemy.org/#/c/54/30/test/dialect/postgresql/test_compiler.py

    The tests that do round trips are also important. The round trip tests are important not because we're testing that the database works, but that all the statement execution mechanics around INSERT statements are exercised as not crashing or malfunctioning, including that all parameters given are correctly passed and processed, features like inserted_primary_keys don't crash, etc. For examples of round trip tests that had to be added after the fact to test bugs that occurred in the PG on conflict feature within the realm of parameter and execution handling (and not just "the right SQL"), see https://gerrit.sqlalchemy.org/#/c/370/ and https://gerrit.sqlalchemy.org/#/c/285/. you don't need to write all these tests as many of them won't apply but rudimental round trip tests are necessary.

  15. Михаил Доронин

    Michael Bayer Seems like I can't use mysql dialect insert inside of @compiles to provide custom Merge.

    !#python
    class Foo(Base):
        __tablename__ = 'foos'
        id = Column(Integer, primary_key=True)
        a = Column(String(10))
        b = Column(String(10))
    
    class Merge(Insert):
        def __init__(self, table, values):
            super(Merge, self).__init__(table, values)
            self.table = table
            self.values = values
    
    
    @compiles(Merge, 'mysql')
    def mysql_merge(merge_stmt, compiler, **kwargs):
        stmt = mysql.insert(merge_stmt.table, merge_stmt.values)
        update = {name: getattr(stmt.vals, name) for name in stmt.parameters[0]}
        stmt = stmt.on_duplicate_key_update(update=update)
        return compiler.process(stmt, **kwargs)
    

    And this falls in visit_on_duplicate_key_update where I try to get columns from table to which values would be inserted cols = self.statement.table.c but for some reason self.statement is Merge instead of Insert. Which is probably because at some point earlier compiler started compiling Merge and haven't done that yet, because it doesn't know that Merge is just a dummy statement.

  16. Михаил Доронин

    Michael Bayer I've tried to rewrite tests using your framework.

    from sqlalchemy.testing.assertions import eq_, assert_raises
    from sqlalchemy.testing import fixtures
    from sqlalchemy import testing
    from sqlalchemy.dialects.mysql import insert
    
    
    class OnDuplicateTest(fixtures.TablesTest):
        __only_on__ = 'mysql',
        __backend__ = True
        run_define_tables = 'each'
    
        @classmethod
        def define_tables(cls, metadata):
            Table(
                'foos', MetaData(),
                Column('id', Integer, primary_key=True),
                Column('bar', String(10)),
                Column('baz', String(10)),
            )
    
        def test_bad_args(self):
            assert_raises(
                ValueError,
                insert(self.tables.foos, values={}).on_duplicate_key_update
            )
    
        def test_on_duplicate_key_update(self):
            foos = self.tables.foos
            with testing.db.connect() as conn:
                conn.execute(insert(foos, dict(id=1, bar='b')))
                stmt = insert(foos, [dict(id=1, bar='ab'), dict(id=2, bar='b')])
                stmt = stmt.on_duplicate_key_update(bar=stmt.vals.bar)
                result = conn.execute(stmt)
                eq_(result.inserted_primary_key [1])
                eq_(
                    conn.execute(foos.select().where(foos.c.id == 1).fetchall()),
                    [(1, 'ab', None)]
                )
    

    But they skipped for some reason.

    est/dialect/mysql/test_on_duplicate.py::OnDuplicateTest::test_bad_args SKIPPED
    ============================================================================================================================ short test summary info =============================================================================================================================
    SKIP [1] /home/michael/code/my_projects/sqlalchemy/test/../lib/sqlalchemy/testing/config.py:96: 'OnDuplicateTest' unsupported for implementation '('mysql',)'
    

    I can't spot a difference from postgresql OnConflictTest and don't understand why are they skipped.

  17. Log in to comment