session.add_all() fails when working with large datasets

Issue #1383 resolved
Former user created an issue

I tried to write 736,475 rows to my SQLite DB and got this error. I'm running Python 2.5.1 on Mac OS X 10.5.6.

Justins-Macbook-Pro:python code jxxxxxx$
Python(4232,0xa0363720) malloc: mmap(size=2097152) failed (error code=12)
error: can't allocate region
set a breakpoint in malloc_error_break to debug
Python(4232,0xa0363720) malloc: mmap(size=2097152) failed (error code=12)
error: can't allocate region
set a breakpoint in malloc_error_break to debug
Traceback (most recent call last):
  File "insertJobs.py", line 60, in <module>
    main()
  File "insertJobs.py", line 57, in main
    session.commit()
  File "/Library/Python/2.5/site-packages/SQLAlchemy-0.5.3-py2.5.egg/sqlalchemy/orm/session.py", line 673, in commit
    self.transaction.commit()
  File "/Library/Python/2.5/site-packages/SQLAlchemy-0.5.3-py2.5.egg/sqlalchemy/orm/session.py", line 378, in commit
    self._prepare_impl()
  File "/Library/Python/2.5/site-packages/SQLAlchemy-0.5.3-py2.5.egg/sqlalchemy/orm/session.py", line 362, in _prepare_impl
    self.session.flush()
  File "/Library/Python/2.5/site-packages/SQLAlchemy-0.5.3-py2.5.egg/sqlalchemy/orm/session.py", line 1351, in flush
    self._flush(objects)
  File "/Library/Python/2.5/site-packages/SQLAlchemy-0.5.3-py2.5.egg/sqlalchemy/orm/session.py", line 1428, in _flush
    transaction.rollback()
MemoryError
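
The script itself is not attached, but the failure mode is visible in the traceback: every pending object from add_all(), plus the unit-of-work state built during flush(), is held in memory until the single commit(). A hypothetical minimal reproduction, written against a modern SQLAlchemy API rather than the 0.5.3 release above, with an assumed Job model standing in for whatever insertJobs.py actually defines:

from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class Job(Base):                         # assumed model, not from the report
    __tablename__ = "jobs"
    id = Column(Integer, primary_key=True)
    name = Column(String(100))

engine = create_engine("sqlite:///jobs.db")
Base.metadata.create_all(engine)

session = Session(engine)
# All ~736k pending objects live in the session at once; flush() then builds
# its bookkeeping for every one of them before a single row is committed.
session.add_all(Job(name="job %d" % i) for i in range(736475))
session.commit()                         # can exhaust memory on a constrained machine, as above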

Comments (2)

  1. Mike Bayer repo owner

    Your system ran out of memory. If you have thousands, let alone hundreds of thousands, of objects to persist, you should work in smaller chunks of a few thousand at a time. Using the ORM for hundreds of thousands of rows is also needlessly wasteful: it does a great deal of work to keep those objects intelligently tracked within the session, which is unnecessary for a bulk insert of nearly a million rows. Build your data as plain dictionaries and insert them with direct SQL expressions instead, which eliminates a tremendous amount of overhead. Either way, batching is still required: holding 700k rows at once demands a huge memory allocation that will tax the system regardless of approach. (A sketch of the batched approach follows the comments below.)

  2. Former user Account Deleted

    Replying to zzzeek:

    Thank you very much zzzeek. I appreciate your feedback and thoughtful suggestions. I will break up the problem to reduce the load. Have a nice night!

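A minimal sketch of the batched, Core-level insert recommended in comment 1. It is written against a modern SQLAlchemy API (the engine.begin() context manager) rather than the 0.5.3 release in the traceback, and the jobs table, chunk size, and generated rows are illustrative assumptions, not details from the original report.

from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String

engine = create_engine("sqlite:///jobs.db")
metadata = MetaData()

# Assumed table definition; the real schema lives in the reporter's insertJobs.py.
jobs = Table(
    "jobs", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(100)),
)
metadata.create_all(engine)

def chunks(iterable, size=5000):
    """Yield lists of at most `size` items so only one chunk exists in memory."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def bulk_insert(rows):
    """Insert an iterable of plain dicts as a Core executemany, one transaction
    per chunk, so memory use stays bounded no matter how many rows there are."""
    for batch in chunks(rows):
        with engine.begin() as conn:   # commits (or rolls back) each chunk
            conn.execute(jobs.insert(), batch)

# Usage: rows can be a generator (e.g. parsed from a file), so the full
# 736,475-row dataset never has to be materialized at once.
bulk_insert({"name": "job %d" % i} for i in range(736475))

The same chunks() helper also covers the ORM variant mentioned in comment 1: add_all() a few thousand objects, commit() (or flush() plus expunge_all()), and repeat, so the Session never tracks more than one chunk at a time.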