fully weak referenced identity map/session

Issue #1398 resolved
Mike Bayer repo owner created an issue

in fact, the identity map weak referencing system is not weakly referenced in most cases, unless the session is rolled back or expired. InstanceState references the object's __dict__, which in turn references any objects loaded via relation(). It is of course a good thing that we store objects in dict as opposed to dynamically loading related objects each time, but this is a drawback to that.

the two solutions would be to not store related objects/collections directly in __dict__ but via weakref, which the AttributeImpl resolves for us, or for InstanceState to weakly reference __dict__.

storing related objects in __dict__ seems to have a lower performance impact until you consider that collections and related objects now fall out of scope and are continuously re-loaded, unless they have pending changes.

weak-referencing state.dict adds a tremendous call count overhead to virtually all operations.

Another option is then breaking out the idea of "instance_state(object)" into something more elaborate: "instance_dict(object)", where "instance_dict" is again an attrgetter by default, or if a custom dict routine has been established it becomes "instance_state(object).dict()", where "dict()" is a weak reference to the dictionary. This might not be that painful to implement, as the key places would be InstrumentedAttribute get/set/delete, mapper.populate_state, and strategies create_row_processor.

then again we could just say, hey the session will strongly reference some things until you expire/rollback/commit it, and not just things that are "dirty". But if we go that route I feel like a clearer policy of what is strongly referenced would be helpful, and at the very least this needs much more explicit documentation.

Comments (7)

  1. Mike Bayer reporter

    So here's a current method:

        def get_history(self, state, passive=PASSIVE_OFF):
            if self.key in state.dict:
                return History.from_attribute(self, state, state.dict[self.key](self.key))
    

    We can either go with something like this:

        def get_history(self, state, instance, passive=PASSIVE_OFF):
            dict = state.instance_dict(instance)
            if self.key in dict:
                return History.from_attribute(self, state, dict[self.key](self.key))
    

    where instance_dict defaults to attrgetter

    or just leave it the old way, and state.dict is just a weak reference.

    The issue comes down to, is attrgetter so much faster than weakref?

  2. Mike Bayer reporter

    we think that the issue only occurs when backrefs are at play (though I seem to recall seeing it happen without a backref being involved....). I think we might keep this tabled for now.

  3. Mike Bayer reporter

    This is still on my mind.

    The general method of no longer referencing __dict__ directly has the issue that when the object is dereferenced, we don't have access to __dict__ anymore and we can't resurrect the object, if any attributes have changed.

    The solution to that is to establish a strong reference upon the "modified" event. this would be great as we could throw away the whole resurrect idea completely.

    However, as usual mutable attributes make this method impossible. We can't get "modified" events for mutable attributes.

    Ways to work around would be:

    1. do get modified events for mutable attributes- this would require a wrapper that detects changes. This is pretty unfeasable as we make no assumptions about what people are storing in a "mutable" attribute and would be chock full of complexity and surprises.
    2. Leave the "resurrect" system in place for instance states that contain mutable attributes. This is a high degree of surprise as the behavior of an app changes as soon as a mutable attribute is added, and we have to keep all the complexity we had before.
    3. For mutable attributes, create a strong ref on access to the mutable attribute (i.e. __get__()). This still keeps the surprise of "my objects aren't getting auto-cleared" around but does limit it to a very specific and easy-to-identify scenario. the code remains simple as we don't need a resurrect phase.
    4. ignore "mutable attributes" as special. if you don't do a __set__(), and you change the contents of your mutable attribute, you have to maintain a reference to the object. this is probably worse than just removing support for "mutable attributes" entirely.
    5. remove "mutable attributes" entirely in 0.6. you have to do a __set__() or we don't even see it as changed. Again the advantage here is that its cut and dry, and works more like a relational database does anyway. Disadvantage is upgrade pain for users and the loss of a feature that is probably useful to some people.
    6. maintain a dict that only stores "mutable" values - so you do a __get__(), the value is added to the dict, then we still maintain a resurrect phase that resurrects only those attributes, and mapper._save_obj() knows how to handle an object like that. The only disadvantage here is that MapperExtension objects that implement before_update() or after_update() now get an instance with unexpected state (although, the state is not unlike the object's attributes had just been expired). We'd need to document this. Also, it's an elaborate system that almost never gets invoked, adding a potential surprise to mapperextension implementations and complexity to the codebase that almost never gets used. but this solution does solve the mutable/weakref problem completely.
  4. Mike Bayer reporter
    • changed milestone to 0.5.4

    more thoughts on this, I think this will be worth it in a few ways.

    the InstanceState and IdentityManagedState should be reorganized into InstanceState and MutableAttrInstanceState. The mapper's awareness of this and the "instance_state_factory" should be removed. The "mutable" version gets invoked directly by ClassManager when mutable_attributes is non empty. This allows us to then move all of the resurrection stuff to the "mutable" version of instancestate. None of it will occur for the regular instance state which will improve performance during synchronous GC events.

    Additionally, the check_modified() method goes away on InstanceState - direct attribute access will be used, and will be subclassed by a descriptor on the mutable version. This will enormously speed up flushes of large sets of objects since _dirty_states will usually be a single method call instead of N method calls. Other "mutable_attribute" conditional crap in InstanceState can be pushed out as needed to the subclass, like the extra work in unmodified for example.

    the two state classes should be in a new module sqlalchemy.orm.state, and a new set of unit tests specifically exercising everything about "mutable" state should be added.

  5. Mike Bayer reporter

    more comments.

    this is also good for 0.5.4 - no public APIs are changing and I'd like to see improved performance in the 0.5 series.

    I'd like to go for #6 as the approach to mutable attributes.

  6. Log in to comment