Bulk save doesn't executemany() when model kwargs are different

Issue #3863 wontfix
Antoine Catton created an issue

Let's consider this dialect in test.py:

import sqlalchemy as sa
from sqlalchemy.dialects import registry
from sqlalchemy.dialects.sqlite.pysqlite import SQLiteDialect_pysqlite
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker


class CustomSQLiteDialect(SQLiteDialect_pysqlite):
    def do_executemany(self, cursor, statement, parameters, context=None):
        print("executemany(): statement={s!r}, params={p!r}".format(s=statement, p=parameters))
        super(CustomSQLiteDialect, self).do_executemany(cursor, statement, parameters, context)

    def do_execute(self, cursor, statement, parameters, context=None):
        print("EXECUTE!!(): statement={s!r}, params={p!r}".format(s=statement, p=parameters))
        super(CustomSQLiteDialect, self).do_execute(cursor, statement, parameters, context)

And this model:

Base = declarative_base()


class Model(Base):
    __tablename__ = 'model'

    id = sa.Column(sa.Integer, primary_key=True)
    foo = sa.Column(sa.String, default='foo')
    bar = sa.Column(sa.String, default='bar')

And run the code:

registry.register("sqlite.custom", "test", "CustomSQLiteDialect")


if __name__ == '__main__':
    engine = sa.create_engine('sqlite+custom:///:memory:')

    Base.metadata.create_all(engine)

    Session = sessionmaker(bind=engine)
    session = Session()


    bulk1 = [Model(foo='a', bar='b'), Model(foo='c', bar='d')]
    print("== Saving first bulk:")
    session.bulk_save_objects(bulk1)


    bulk2 = [Model(foo='a'), Model(bar='b')]
    print("== Saving second bulk:")
    session.bulk_save_objects(bulk2)

Here's the output:

EXECUTE!!(): statement=u"SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1", params=()
EXECUTE!!(): statement=u"SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1", params=()
EXECUTE!!(): statement=u'PRAGMA table_info("model")', params=()
EXECUTE!!(): statement=u'\nCREATE TABLE model (\n\tid INTEGER NOT NULL, \n\tfoo VARCHAR, \n\tbar VARCHAR, \n\tPRIMARY KEY (id)\n)\n\n', params=()
== Saving first bulk:
executemany(): statement=u'INSERT INTO model (foo, bar) VALUES (?, ?)', params=(('a', 'b'), ('c', 'd'))
== Saving second bulk:
EXECUTE!!(): statement=u'INSERT INTO model (foo, bar) VALUES (?, ?)', params=('a', 'bar')
EXECUTE!!(): statement=u'INSERT INTO model (foo, bar) VALUES (?, ?)', params=('foo', 'b')

I don't see a reason why the second bulk should do many execute().

Comments (1)

  1. Mike Bayer repo owner

    Hi, thanks for posting this.

    The reason the executemany doesn't occur here is because the ORM and Core are separate components where an attempt is made to minimize the intermingling of concerns as much as possible. In this case, the "default" on a Column is handled by the Core as to how it will be handled; it may be a simple scalar value as in your example, or it may be a SQL expression that would alter the form of the INSERT statement, or it may be a Python function that even itself may need to access the current statement execution context, that is, it has to happen in a specific place. For the ORM to organize these statements to the Core as appropriate for an executemany(), it would need to pre-populate the row with the default Python value, meaning it has to look into the Column default and interpret the value in this way, which already is stepping over the bounds into the job of the Core.

    The contract of the bulk methods is that the user would provide a pre-organized set of homogeneous structures, and the ORM will bypass a large amount of its normal bookkeeping logic in order to insert / update them with much less in-Python overhead. To begin adding new rules to reduce the work on the outside works against this goal. Simple fix here is just to look at those Column defaults on your end and pre-populate the values into the objects.

  2. Log in to comment