strange relation caching

Issue #129 resolved
Former user created an issue

When accessing a relation through a mapper, then updating a value in a row of the related table directly (not through the mapper), and then accessing the relation again, the value is not updated.

Maybe that is expected behavior; I find it strange, however. See the code below. I also saw that no SQL query was generated for the second access to the foo property, so seemingly the Foo instance was cached somewhere.

from sqlalchemy import *

engine = create_engine('oracle://dsn=orcl&user=test&password=test')

foo = Table('foo', engine,
    Column('id', Integer, Sequence('foo_seq'), primary_key=True),
    Column('title', Unicode(255)))

bar = Table('bar', engine,
    Column('id', Integer, Sequence('bar_seq'), primary_key=True),
    Column('foo_id', Integer, ForeignKey('foo.id')))

all = bar, foo

for table in all:
    try: table.drop()
    except Exception, e: print e

for table in reversed(all):
    try: table.create()
    except Exception, e: print e

class Foo(object): pass
foos = mapper(Foo, foo)

class Bar(object): pass
bars = mapper(Bar, bar)
bars.add_property('foo', relation(foos))

foo.insert().execute(title='asfd')
bar.insert().execute(foo_id=1)

bars.selectfirst().foo                               # first access: issues a SELECT and caches the Foo
foo.update(foo.c.id==1).execute(title='sometitle')   # update the row directly, bypassing the mapper
assert bars.selectfirst().foo.title == 'sometitle'   # this fails, uh?

Comments (12)

  1. Mike Bayer repo owner

    try adding a gc.collect() right before your test. now it passes. the attribute manager creates some circular references on objects, causing this one to stay around in the weak-valued identity map even though it had been de-referenced publicly.
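
    For illustration, a minimal sketch of that placement against the script above (the gc import and the collect() call are the only additions):

        import gc

        bars.selectfirst().foo                               # loads and caches the related Foo
        foo.update(foo.c.id==1).execute(title='sometitle')   # raw SQL update, behind the mapper's back
        gc.collect()   # break the internal cycle so the stale Foo falls out of the weak-valued identity map
        assert bars.selectfirst().foo.title == 'sometitle'   # passes once the old instance is collected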

    To give you an option on this behavior, I added a function to the attributes package called "managed_attribute_dict()", which returns a blank dictionary. This dictionary is attached to every new object instance and is used very intensively for that object's attribute access. If you change this function to return a WeakValueDictionary, as in:

    import weakref
    import sqlalchemy.attributes
    sqlalchemy.attributes.managed_attribute_dict = lambda: weakref.WeakValueDictionary()
    

    then the circular reference goes away. However, I am very hesitant to make this the default behavior, as the attribute package has been shown to add a lot of overhead to the creation and manipulation of objects; for one user who wanted to load 26000 objects in one shot, the difference between attribute-managed and plain objects was extremely significant. putting an extra weakref layer in there would push it even further out there.

    if this is really killing you, you can't restructure your program to work around it, and others complain, then we can reopen this one.

  2. Former user Account Deleted

    Hm, this is a significant problem for me. I'll look into your proposed workarounds.

    However, from a usage point of view, when I ask the code to give me back something from the database, I would expect it to do so regardless of whether there are still old instances around.

    I'm aware that this may lead to resource-leak problems, but to my recollection circular references are no problem for the automatic GC; only objects that override __del__ are.

  3. Former user Account Deleted

    Actually, you can never rely on the garbage collector to do its work in time for your application code to behave correctly; it is a particularity of CPython that it tends to do so.

    I don't really understand why you would ever return cached data rather than fresh data from the database.

    And if you have to call gc.collect() to get unreferenced objects to go away, then in a long-running application this will simply blow up the RAM.

    I have never had to run gc.collect() regularly in any Python application that ran for ages...

  4. Former user Account Deleted

    If your library code relies on the garbage collector to work correctly, your library code should be the one operating the garbage collector...

  5. Mike Bayer repo owner

    i was not proposing explicit gc.collect() as a workaround; that was only to illustrate what the problem is. the workaround is the function I gave you, which will eliminate the built-in circular reference SA places on objects. also, there is another function I just added called "expunge", which can force the removal of any object from the current session:

       objectstore.expunge(obj)
    

    Anyway, if any of your objects have their own circular refs, you're going to have the same problem, even if you use the function I gave you to eliminate the implicit circular references.

    SA has no better option here since the usage of an identity map is essential.

    The reason people normally wouldn't have this problem is that they use session-based application development. everything that is loaded within a session is now "in the session". if you are sneaking behind the session's back and issuing insert/update statements, then you should clear out your session before loading data back in, or expunge/refresh/expire those objects that you need to be fresh.
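
    A sketch of that workflow against the original script, using the objectstore calls referenced in this thread (whether the parent Bar also needs to be expunged depends on what has already been loaded onto it):

        b = bars.selectfirst()
        b.foo                                                # lazy-loads Foo id=1 into the session
        foo.update(foo.c.id==1).execute(title='sometitle')   # raw SQL; the session knows nothing about it
        objectstore.expunge(b.foo)   # drop the stale Foo from the identity map
        objectstore.expunge(b)       # drop the Bar too, since its 'foo' attribute is already populated
        # (or simply empty the whole session with objectstore.clear())
        assert bars.selectfirst().foo.title == 'sometitle'   # the next select hits the database again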

    I'll try re-running my profiler tests with and without the weak-value adjustment to see how much more overhead it creates.

  6. Former user Account Deleted
    • changed status to open
    • removed status

    If I understand this correctly, you cache objects within a "session".

    • What about when you have two "sessions", for instance in two threads?
    • What about when you have two "sessions" on two different machines?
    • What about when you run data maintenance jobs that are not connected to SA in any way?
    • What if you prefer to use the mapper for some things, but not for others?
    • What if you run a script doing some work on your database...?
    • What if x?

    I like totally fail to understand that at all.

  7. Former user Account Deleted

    In my opinion SA is plain broken here, and there's a major headache rolling your way.

    When you automatically assume that the whole world will never sneak any data past SA's session/mapper feature, you're guaranteed to be 100% wrong. That's a childish and wrong assumption for anything remotely connected with reality.

  8. Former user Account Deleted

    Your data handling code must conform to ACID and, most importantly, it mustn't break when people decide not to use your data handling code for changing some data.

    When you break that, you've not understood the first thing about data handling.

  9. Mike Bayer repo owner

    if you "sneak" past the mapper with raw SQL, then you should be responsible for expiring whatever objects it is that were affected by that SQL. SQLAlchemy certainly cant guess.

    if you would like to propose an API that meets your requirements and basically shuts off all identity map functions, that is a lot more productive than name-calling. if you would like a preview of such behavior, apply this patch to lib/sqlalchemy/mapping/mapper.py:

    Index: lib/sqlalchemy/mapping/mapper.py
    ===================================================================
    --- lib/sqlalchemy/mapping/mapper.py    (revision 1213)
    +++ lib/sqlalchemy/mapping/mapper.py    (working copy)
    @@ -815,6 +815,7 @@
             # look in main identity map.  if its there, we dont do anything to it,
             # including modifying any of its related items lists, as its already
             # been exposed to being modified by the application.
    +        populate_existing = True
             identitykey = self._identity_key(row)
             if objectstore.get_session().has_key(identitykey):
                 instance = objectstore.get_session()._get(identitykey)
    

    this will cause all select operations to refresh all the properties of an object each time it is encountered in a result set, blowing away whatever changes you have made.

    Please don't reopen this ticket; this is an issue tracking system, not a bulletin board for flamewars.

  10. Former user Account Deleted

    Assuming by default that your data is only ever going to be modified by a single process with a specific set of APIs is so obviously flawed that I'm outright baffled.

    So I'm supposed to call objectstore.clear() everywhere then?

  11. Mike Bayer repo owner

    so flawed that it's the central architecture of Hibernate, the most dominant and popular ORM in the world, as well as a key concept in Martin Fowler's "Patterns of Enterprise Application Architecture", one of the most popular books on enterprise patterns available. is it possible that the architecture really is so fundamentally flawed for any realistic environment, or have you just made no effort whatsoever to understand this enormously popular pattern?

    seriously, nobody is forcing you to use SQLAlchemy. I fail to see your interest in a product you are convinced is so flawed, and am similarly baffled by your endless litany of complaints with nary a productive patch or contribution of any kind. Please stop writing on this ticket.
