sqlalchemy does not correctly use executemany in oursql library

Issue #2513 resolved
Former user created an issue

The oursql library supports executemany, but sqlalchemy does not use it as the API requires. Oursql will only run an execute statement as an executemany *if* the input parameters are oursql wrapper types:

http://packages.python.org/oursql/tutorial.html#inserting-large-data-sets

Since sqlalchemy does not wrap up a list in an IterWrapper before calling execute, oursql will never run bulk inserts.
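
For illustration, a minimal sketch of the lazy pattern the linked tutorial describes; the connection parameters, table, and column names here are hypothetical:

    import oursql

    # Hypothetical connection; host, credentials, and database are placeholders.
    conn = oursql.connect(host='localhost', user='scott', passwd='tiger', db='test')
    curs = conn.cursor()

    def produce_rows():
        # Rows are generated lazily instead of being buffered in a list first.
        for i in range(1000000):
            yield (i, 'row %d' % i)

    # oursql uses the qmark ('?') paramstyle; per the tutorial, executemany()
    # can consume the generator lazily, pulling each row just before it is
    # sent to the database.
    curs.executemany('INSERT INTO some_table (id, name) VALUES (?, ?)',
                     produce_rows())
    conn.commit()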

Comments (2)

  1. Mike Bayer repo owner

    I think you're misreading the documentation. "Oursql will only run an execute statement as an executemany if the input parameters are oursql wrapper types" is not at all what it says. It says exactly:

    oursql also provides a number of ways to insert large amounts of data without buffering it all in memory.

    ...

    Additionally, executemany() is lazy; if passed a generator or any other iterator which produces values lazily, values will only be taken from the iterator immediately before they are sent to the database.

    That is, this area of the documentation only refers to the fact that executemany can optionally use an iterator, in which case it will not buffer the Python iterator into memory, thus allowing a dataset larger than memory to be streamed through, nothing more. In fact, this phrase regarding executemany() is not even connected to the previous mentions of the IterWrapper, FileWrapper, etc. classes - those classes are specifically meant to stream *values*, not rows. I think OurSQL's docs should probably be clarified here.
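
    To make the distinction concrete, here is a minimal sketch assuming the oursql cursor and the hypothetical some_table(id, name) from the sketch in the issue description above:

        # A plain list of parameter rows is perfectly valid for executemany();
        # no generator, iterator, or wrapper type is needed to get the batched
        # behavior.
        rows = [(i, 'row %d' % i) for i in range(1000)]
        curs.executemany('INSERT INTO some_table (id, name) VALUES (?, ?)', rows)

        # The wrapper classes mentioned in the tutorial (FileWrapper, IterWrapper,
        # etc.) stream a single value within one row, such as a large BLOB column;
        # they are unrelated to whether rows are batched.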

    There is no mention whatsoever of the notion that executemany "doesn't work" if not passed an iterator:

    http://packages.python.org/oursql/api.html#oursql.Cursor.executemany
    
    Execute the same query with different sets of parameters. This is more efficient than calling execute many times with the same query.
    

    It says plainly, "this is more efficient". There is no mention of the requirement to use an iterator.

    The SQLAlchemy OurSQL dialect was contributed by the author of OurSQL itself, so I'm pretty confident that there's no bypassing of batch behavior here.
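
    As a sanity check, here is a minimal sketch of the usage that exercises that path, assuming a mysql+oursql URL with hypothetical credentials and a hypothetical table:

        from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String

        engine = create_engine('mysql+oursql://scott:tiger@localhost/test')
        metadata = MetaData()
        some_table = Table('some_table', metadata,
                           Column('id', Integer, primary_key=True),
                           Column('name', String(50)))
        metadata.create_all(engine)

        conn = engine.connect()
        # Passing a list of parameter dicts makes SQLAlchemy invoke the DBAPI's
        # cursor.executemany() for the statement; no wrapper types are involved.
        conn.execute(some_table.insert(),
                     [{'id': i, 'name': 'row %d' % i} for i in range(1000)])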

    A separate enhancement might be for SQLAlchemy's execute() method to actually accept an iterator, which would be passed through to DBAPIs that support such a feature. This would be doable in theory, though most DBAPIs don't actually support this usage and it would be a fairly major change for SQLAlchemy itself. If someone really wants to insert more rows than fit into memory, it would be almost the same to just chunk 10000 rows at a time, so there's not much rationale for such an enhancement.
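
    For completeness, a rough sketch of that chunking approach, reusing the hypothetical some_table and connection from the previous snippet; the row source here is made up:

        def rows():
            # Hypothetical source of parameter dicts, potentially larger than memory.
            for i in range(10000000):
                yield {'id': i, 'name': 'row %d' % i}

        def chunked(iterator, size=10000):
            # Collect items from an arbitrary iterator into lists of at most `size`.
            batch = []
            for item in iterator:
                batch.append(item)
                if len(batch) >= size:
                    yield batch
                    batch = []
            if batch:
                yield batch

        # Each chunk is a plain list, so each conn.execute() call becomes one
        # cursor.executemany() on the DBAPI, while only `size` rows are held
        # in memory at a time.
        for batch in chunked(rows()):
            conn.execute(some_table.insert(), batch)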
