Performance problem when saving new objects

Issue #1176 resolved
Former user created an issue

I have a test case that demonstrates a peculiar performance issue that arose in 0.5beta and did not appear in earlier versions of SA. When I have several thousand objects in memory (fetched via prior user interaction with the application), saving a newly created object appears to take longer and longer, in proportion to the number of objects in memory. In the worst case (100,000 objects in memory), saving a newly created object takes nearly 20 seconds to commit! This is running against a very high-performance FreeBSD server hosting PostgreSQL v8.3. The application is a Cocoa app running on Leopard 10.5.5 (Intel) using the PyObjC bridge, but our test case shows that the problem exists irrespective of the PyObjC bridge.

In pseudo code, the unit test goes like this:

records = session.query(SomeTable).limit(1000).all()
new_record = SomeTable()
session.save(new_record)
session.commit()   #0.37s to save

records = session.query(SomeTable).limit(100000).all()
new_record = SomeTable()
session.save(new_record)
session.commit()   #19.72s to save

Any hints, clues or suggestions regarding how to ameliorate the effects of this bug are much appreciated!

Comments (6)

  1. Mike Bayer repo owner

    this behavior has changed very little across all versions of SA; I'd be curious to know exactly what versions and what exact test code was in use where this was not observed. Adding 100000 objects to a session and only flushing once at the very end will take progressively longer, since the session performs a topological sort of all dirty/new objects - the larger the list of objects to sort, the slower the flush operation. You always need to issue a flush() every 1000 objects or so.
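
    The "flush every 1000 objects or so" advice can be sketched as a simple batching loop. The `StubSession` below is a minimal stand-in (not a real SQLAlchemy session) so the sketch is self-contained and runnable; with a real session, the `add()`/`flush()` calls are used the same way.

    ```python
    class StubSession:
        """Minimal stand-in for a SQLAlchemy session (illustration only)."""
        def __init__(self):
            self.pending = []
            self.flush_sizes = []  # record how large each flush was

        def add(self, obj):
            self.pending.append(obj)

        def flush(self):
            self.flush_sizes.append(len(self.pending))
            self.pending = []

    def bulk_add(session, objects, batch_size=1000):
        """Add objects, flushing every batch_size so each flush stays small."""
        for i, obj in enumerate(objects, 1):
            session.add(obj)
            if i % batch_size == 0:
                session.flush()
        session.flush()  # flush the remainder

    session = StubSession()
    bulk_add(session, range(2500), batch_size=1000)
    print(session.flush_sizes)  # -> [1000, 1000, 500]
    ```

    Keeping each flush to a bounded batch keeps the per-flush sorting work bounded as well, instead of letting it grow with the total number of pending objects.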

    closing this ticket since this is more of a mailing list issue.

    If you can provide a full test case, version info, and a dramatic difference in results, you can reopen this ticket.

  2. Mike Bayer repo owner

    also 20 seconds for the session to commit 100K objects is pretty fast, actually...you really should use a single insert() with executemany syntax for large bulk insert operations (which would complete in a couple of seconds typically).
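
    The executemany-style bulk insert described here is what the test case later in this thread does with `engine.execute(stuff.insert(), list_of_dicts)`. The same idea, shown with the stdlib `sqlite3` driver so the example is self-contained (the table and column names are illustrative):

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stuff (id INTEGER PRIMARY KEY, data TEXT)")

    # One INSERT statement, executed once with many parameter sets,
    # instead of one ORM flush per object.
    rows = [("item #%d" % i,) for i in range(100000)]
    conn.executemany("INSERT INTO stuff (data) VALUES (?)", rows)
    conn.commit()

    count = conn.execute("SELECT COUNT(*) FROM stuff").fetchone()[0]
    print(count)  # -> 100000
    ```

    The speedup comes from skipping per-object unit-of-work bookkeeping entirely and handing the DBAPI a single statement with a list of parameter sets.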

  3. Former user Account Deleted

    Replying to zzzeek:

    also 20 seconds for the session to commit 100K objects is pretty fast, actually...you really should use a single insert() with executemany syntax for large bulk insert operations (which would complete in a couple of seconds typically).

    Awesome! Thanks for the rapid feedback. What is the mailing list you speak of? :-)

  4. Mike Bayer repo owner

    OK, I need to greatly correct what I said earlier since I misread your example. In fact you are only committing one object, and the previous 100K are just loaded from the database. So this is not at all related to the size of the flush() or anything like that.

    Reopening pending testing.

  5. Mike Bayer repo owner

    OK, turn off expire_on_commit. It's described here: http://www.sqlalchemy.org/docs/05/session.html#unitofwork_using_committing

    Test case:

    from sqlalchemy import *
    from sqlalchemy.orm import *
    import time
    
    
    m = MetaData()
    
    stuff = Table('stuff', m, 
        Column("id", Integer, primary_key=True),
        Column("data", String(20)),
    )
    
    engine = create_engine('postgres://scott:tiger@127.0.0.1/test')
    
    m.drop_all(engine)
    m.create_all(engine)
    
    print "INSERTING 100K OBJECTS"
    engine.execute(stuff.insert(), [{'data':'item #%d' % i} for i in xrange(100000)])
    
    print "LOADING 100K OBJECTs"
    class Stuff(object):
        def __init__(self, data):
            self.data = data
    
    mapper(Stuff, stuff)
    
    sess = sessionmaker(bind=engine, expire_on_commit=False)()
    obj = sess.query(Stuff).limit(100000).all()
    
    now = time.time()
    print "INSERTING ONE OBJECT"
    
    sess.add(Stuff(data='some more stuff'))
    sess.commit()
    
    print "total time for add + commit %f" % (time.time() - now)
    

    the flush/commit takes .27 seconds. With expire_on_commit enabled, it takes 2 seconds. I'm assuming your objects are more complex.
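
    A rough sketch of why the commit time scales with the number of loaded objects: with expire_on_commit enabled, commit() marks every object in the session's identity map as expired, which is linear in the number of objects loaded. The classes below are illustrative stubs, not SQLAlchemy internals:

    ```python
    class StubObject:
        """Stand-in for a mapped instance (illustration only)."""
        def __init__(self):
            self.expired = False

    class StubSession:
        """Stand-in showing the shape of the expire-on-commit cost."""
        def __init__(self, expire_on_commit=True):
            self.identity_map = []
            self.expire_on_commit = expire_on_commit

        def commit(self):
            # With expiry on, commit does O(len(identity_map)) extra work,
            # regardless of how many objects were actually flushed.
            if self.expire_on_commit:
                for obj in self.identity_map:
                    obj.expired = True

    objs = [StubObject() for _ in range(5)]
    sess = StubSession(expire_on_commit=True)
    sess.identity_map.extend(objs)
    sess.commit()
    print(sum(o.expired for o in objs))  # -> 5
    ```

    With expire_on_commit=False, that per-object pass is skipped, which is why the one-row commit in the test case above drops from 2 seconds back to .27 seconds.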
