Use column expressions with session.bulk_update_mappings, session.bulk_insert_mappings

Fabio Sussetto reporter

edited description

2015-10-27T16:57:09+00:00

Mike Bayer repo owner

the purpose of bulk_update_mappings is to get as much raw Python speed as possible while still having some degree of ORM features. This particular feature would require scanning through every value in every dictionary given and calling isinstance(value, ClauseElement) on it, which would really place a burden on that, unless we made some kind of rule that we only scan the first dictionary and assume the rest have that set up. If a subsequent dictionary doesn't have an expression there, it looks like it might send it out normally anyway just not in bulk.

But you'd still get an error if you had a SQL expression in the middle of the list.

Here's a patch

diff --git a/lib/sqlalchemy/orm/persistence.py b/lib/sqlalchemy/orm/persistence.py
index 1f9d0a2..0f9c7c9 100644
--- a/lib/sqlalchemy/orm/persistence.py
+++ b/lib/sqlalchemy/orm/persistence.py
@@ -429,6 +429,9 @@ def _collect_update_commands(

     """

+    if bulk:
+        bulk_value_keys = None
+
     for state, state_dict, mapper, connection, \
             update_version_id in states_to_update:

@@ -442,12 +445,27 @@ def _collect_update_commands(
         propkey_to_col = mapper._propkey_to_col[table]

         if bulk:
+            if bulk_value_keys is None:
+                bulk_value_keys = set([
+                    key for key in state_dict
+                    if isinstance(state_dict[key], sql.ClauseElement)
+                ])
+
             params = dict(
                 (propkey_to_col[propkey].key, state_dict[propkey])
                 for propkey in
                 set(propkey_to_col).intersection(state_dict).difference(
-                    mapper._pk_keys_by_table[table])
+                    mapper._pk_keys_by_table[table]).
+                difference(bulk_value_keys)
             )
+            if bulk_value_keys:
+                value_params.update(
+                    (propkey_to_col[propkey].key, state_dict[propkey])
+                    for propkey in
+                    set(propkey_to_col).intersection(state_dict).difference(
+                        mapper._pk_keys_by_table[table]).
+                    intersection(bulk_value_keys)
+                )
         else:
             params = {}
             for propkey in set(propkey_to_col).intersection(

with a test script as follows:

from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base


def eq_(x, y):
    assert x == y

Base = declarative_base()


class A(Base):
    __tablename__ = 'a'
    id = Column(Integer, primary_key=True)
    x = Column(Integer)
    y = Column(Integer)

e = create_engine("sqlite://", echo=True)
Base.metadata.create_all(e)

s = Session(e)
s.add_all([A(id=1, x=1, y=1), A(id=2, x=1, y=1), A(id=3, x=1, y=1)])
s.commit()

s.bulk_update_mappings(
    A,
    [
        {"id": 1, "x": 2, "y": A.y + 1},
        {"id": 2, "x": 2, "y": A.y + 1},
        {"id": 3, "x": 2, "y": A.y + 1},
    ]
)


eq_(
    s.query(A.id, A.x, A.y).order_by(A.id).all(),
    [(1, 2, 2), (2, 2, 2), (3, 2, 2)]
)


s.bulk_update_mappings(
    A,
    [
        {"id": 1, "x": 3, "y": A.y + 1},
        {"id": 2, "x": 3, "y": 10},
        {"id": 3, "x": 3, "y": 11},
    ]
)

eq_(
    s.query(A.id, A.x, A.y).order_by(A.id).all(),
    [(1, 3, 3), (2, 3, 10), (3, 3, 11)]
)


s.bulk_update_mappings(
    A,
    [
        {"id": 1, "x": 3, "y": 12},
        {"id": 2, "x": 3, "y": 13},
        {"id": 3, "x": 3, "y": 14},
    ]
)

we see three sets of updates. The first shows the feature working:

015-10-27 13:25:50,442 INFO sqlalchemy.engine.base.Engine UPDATE a SET x=?, y=(a.y + ?) WHERE a.id = ?
2015-10-27 13:25:50,443 INFO sqlalchemy.engine.base.Engine (2, 1, 1)
2015-10-27 13:25:50,443 INFO sqlalchemy.engine.base.Engine UPDATE a SET x=?, y=(a.y + ?) WHERE a.id = ?
2015-10-27 13:25:50,443 INFO sqlalchemy.engine.base.Engine (2, 1, 2)
2015-10-27 13:25:50,443 INFO sqlalchemy.engine.base.Engine UPDATE a SET x=?, y=(a.y + ?) WHERE a.id = ?
2015-10-27 13:25:50,443 INFO sqlalchemy.engine.base.Engine (2, 1, 3)

the second shows that if subsequent params aren't expressions, it still "works":

2015-10-27 13:25:50,445 INFO sqlalchemy.engine.base.Engine UPDATE a SET x=?, y=(a.y + ?) WHERE a.id = ?
2015-10-27 13:25:50,445 INFO sqlalchemy.engine.base.Engine (3, 1, 1)
2015-10-27 13:25:50,446 INFO sqlalchemy.engine.base.Engine UPDATE a SET x=?, y=? WHERE a.id = ?
2015-10-27 13:25:50,446 INFO sqlalchemy.engine.base.Engine (3, 10, 2)
2015-10-27 13:25:50,446 INFO sqlalchemy.engine.base.Engine UPDATE a SET x=?, y=? WHERE a.id = ?
2015-10-27 13:25:50,446 INFO sqlalchemy.engine.base.Engine (3, 11, 3)

for contrast, a pure mass update looks like:

2015-10-27 13:25:50,447 INFO sqlalchemy.engine.base.Engine UPDATE a SET x=?, y=? WHERE a.id = ?
2015-10-27 13:25:50,447 INFO sqlalchemy.engine.base.Engine ((3, 12, 1), (3, 13, 2), (3, 14, 3))

I guess though that bulk_update_mappings is still saving you the overhead of building/loading whole objects and all that so allowing SQL expressions may be worth it.

But then we also have to support them for inserts.

there's no plan to get into this anytime soon unless folks want to work on pull requests (with tests!!!)

2015-10-27T17:33:51+00:00

Mike Bayer repo owner

changed milestone to 1.2
changed title to Use column expressions with session.bulk_update_mappings, session.bulk_insert_mappings
marked as enhancement
changed component to orm

2015-10-27T17:34:28+00:00

Mike Bayer repo owner

this is in the docs too, not supported:

http://docs.sqlalchemy.org/en/rel_1_0/orm/session_api.html?highlight=bulk_update_mappings#sqlalchemy.orm.session.Session.bulk_update_mappings

Warning

The bulk update feature allows for a lower-latency UPDATE of rows at the expense of most other unit-of-work features. Features such as object management, relationship handling, and SQL clause support are silently omitted in favor of raw UPDATES of records.

2015-10-27T17:37:05+00:00

Mike Bayer repo owner

changed milestone to 1.x.xx

this was never intended as a feature, will consider pull requests though would have to evaluate the performance impact (it is probably fine).

2015-10-27T17:38:01+00:00

Fabio Sussetto reporter

Wow amazing reply, thanks. Will see if I can contribute to this shortly.

2015-10-27T17:53:22+00:00

Comments (6)