multirow inserts

Issue #2623 resolved
Mike Bayer repo owner created an issue

Comments (15)

  1. idank

    Replying to zzzeek:

    evaluate pullreq at https://bitbucket.org/sqlalchemy/sqlalchemy/pull-request/31/compiler-add-support-for-multirow-inserts/diff [BR][BR] this looks pretty good, but will need a lot of scrutiny. in particular id want to add some "round trip" tests for it, probably in test/sql/test_query.py, to make sure the bound parameter targeting is working. but this looks pretty straightforward.

    Not sure what you mean here exactly. If it's something I can pick up easily and do it myself, let me know so we can move forward with this.

  2. Mike Bayer reporter

    but round trips for this feature would be in test/sql/, not in the dialect test suite since it isn't testing a feature that individual dialects need to implement.

  3. idank

    Ok, I added a test and there seems to be a problem with the binding when the dialect doesn't support named params. I enabled multirow_insert in sqlite (seems to support since v3.7.11) and values are passed in the wrong order:

    diff --git a/lib/sqlalchemy/dialects/sqlite/base.py b/lib/sqlalchemy/dialects/sqlite/base.py
    --- a/lib/sqlalchemy/dialects/sqlite/base.py
    +++ b/lib/sqlalchemy/dialects/sqlite/base.py
    @@ -601,6 +601,7 @@
         supports_default_values = True
         supports_empty_insert = False
         supports_cast = True
    +    supports_multirow_insert = True
    
         default_paramstyle = 'qmark'
         execution_ctx_cls = SQLiteExecutionContext
    diff --git a/test/sql/test_query.py b/test/sql/test_query.py
    --- a/test/sql/test_query.py
    +++ b/test/sql/test_query.py
    @@ -48,6 +48,16 @@
         def teardown_class(cls):
             metadata.drop_all()
    
    +    def test_insert_multirow(self):
    +        users.insert(values=[           (7, 'jack'),
    +            (8, 'ed')](
    +)).execute()
    +
    +        # this produces the correct order
    +        users.insert(values=[           (7, 8),
    +            ('jack', 'ed')](
    +)).execute()
    +
         def test_insert_heterogeneous_params(self):
             """test that executemany parameters are asserted to match the
             parameter set of the first."""
    

    fails with:

    IntegrityError: (IntegrityError) datatype mismatch u'INSERT INTO query_users (user_id, user_name) VALUES (
    ?, ?), (?, ?)' (7, 8, 'jack', 'ed')
    

    It probably has something to do with this:

    diff --git a/lib/sqlalchemy/sql/compiler.py b/lib/sqlalchemy/sql/compiler.py
    --- a/lib/sqlalchemy/sql/compiler.py
    +++ b/lib/sqlalchemy/sql/compiler.py
    @@ -1565,8 +1578,18 @@
                 if c.key in parameters and c.key not in check_columns:
                     value = parameters.pop(c.key)
                     if sql._is_literal(value):
    -                    value = self._create_crud_bind_param(
    -                                    c, value, required=value is required)
    +                    if isinstance(value, list):
    +                        self.multirow = True
    +                        _values = []
    +                        for i, v in enumerate(value):
    +                            _values.append(self._create_crud_bind_param(
    +                                        c, v,
    +                                        required=value is required,
    +                                        name=c.key + str(i)))
    +                        value = _values
    +                    else:
    +                        value = self._create_crud_bind_param(
    +                                        c, value, required=value is required)
                     elif c.primary_key and implicit_returning:
                         self.returning.append(c)
                         value = self.process(value.self_group())
    

    which groups the values by column.

  4. Mike Bayer reporter

    positional params are figured out by looking at the order in which bindparam() objects were encountered. so you'd need to modify your routine to visit the parameters in the order in which they'd correspond to the list being passed in.

  5. idank

    Replying to zzzeek:

    positional params are figured out by looking at the order in which bindparam() objects were encountered. so you'd need to modify your routine to visit the parameters in the order in which they'd correspond to the list being passed in.

    I pushed a revised version here, in particular see this.

    It's still not perfect (errors out on missing values, etc.), but just to know that this is the right direction.

  6. Mike Bayer reporter

    OK, I've looked. here's how to do this. instead of flipping the per-VALUES parameters into per-column buckets in _process_colparams(), _process_colparams() should instead detect that it's being called in "multivalue" style, then take all the additional sets of parameters and stick them on an attribute right on the Insert object: "multi_parameters". The first set of parameters goes into the "parameters" dictionary just like it always does; multi_parameters is then always a list of all the additional dictionaries (the tuple-to-dict conversion happens for each of them as well inside of _process_colparams()).

    Then in the compiler, we just look for "if stmt.multi_parameters" down where you have the "if leftovers" thing. the isinstance() check and extra logic in the main table-column loop goes away - an isinstance() in that loop is more expense than I'd want to be adding. We can just test "if self.stmt.multi_parameters" wherever that logic is needed rather than adding a per-compiler "self.multirow" flag.

    Overall this is an impressive effort so far for a difficult feature add, keep up the good work !

  7. idank

    Alright, here's the revised series after your notes: #1, #2 and #3 (there's some profiling test failures that I haven't looked at yet).

    I have to say that the 'stmt.multi_parameters' thingy is a little funny. We could instead just return a list of rows and replace 'if stmt.multi_parameters' with 'if stmt.parameters > 1'.

  8. Mike Bayer reporter

    going to look now. "if stmt.parameters > 1", you mean, "if len(stmt.parameters) > 1", then that len() call shows up in profiling. those are those profiling tests that are already showing that performance overhead has been added.

  9. Mike Bayer reporter

    OK, I've reworked this a bit in 927b9859834096dd77182f935ff611351407f0dc and pushed to the main repo. Major aspects of this include that there's no function call overhead added to compiler.py, and also the functionality of Column-level defaults and server defaults is maintained when multi-valued inserts are used. lots of argument checks added to ValuesBase.values(), new documentation, and more tests to get values() covered completely.

    A diff of the full feature vs. prior to it is:

    diff:@f8caf05593a00a61d5ef6467c334c1e594fbae86:927b9859834096dd77182f935ff611351407f0dc

    Thanks for all the legwork on this one ! feature looks great.

  10. Anthony Tran

    I ran into an issue with multirow inserts today. Does this work correctly with sql function expressions? Doing something like the following: http://pastebin.com/2TGUc5fg, I noticed that in the error message, the generated query is a bit strange. The first set of values seems to be bound correctly (notice the GeomFromText and the "POINT(37.782551 -122.445368)" but the following values don't seem to have been set up correctly (notice that the GeomFromText is missing from the following values).

    EDIT: Also, is this the preferred way to communicate issues or is github the preferred way?

  11. Mike Bayer reporter

    new issues come onto bitbucket.

    However for an old issue like this, sometimes it's better just to post on the mailing list.

    Are you sure this isn't issue #3069? was fixed in 0.9.7. please confirm it still exists.

  12. Log in to comment