fully weak referenced identity map/session
in fact, the identity map weak referencing system is not weakly referenced in most cases, unless the session is rolled back or expired. InstanceState
references the object's __dict__
, which in turn references any objects loaded via relation()
. It is of course a good thing that we store objects in dict as opposed to dynamically loading related objects each time, but this is a drawback to that.
the two solutions would be to not store related objects/collections directly in __dict__
but via weakref, which the AttributeImpl resolves for us, or for InstanceState to weakly reference __dict__
.
storing related objects in __dict__
seems to have a lower performance impact until you consider that collections and related objects now fall out of scope and are continuously re-loaded, unless they have pending changes.
weak-referencing state.dict adds a tremendous call count overhead to virtually all operations.
Another option is then breaking out the idea of "instance_state(object)" into something more elaborate: "instance_dict(object)", where "instance_dict" is again an attrgetter by default, or if a custom dict routine has been established it becomes "instance_state(object).dict()", where "dict()" is a weak reference to the dictionary. This might not be that painful to implement, as the key places would be InstrumentedAttribute get/set/delete, mapper.populate_state, and strategies create_row_processor.
then again we could just say, hey the session will strongly reference some things until you expire/rollback/commit it, and not just things that are "dirty". But if we go that route I feel like a clearer policy of what is strongly referenced would be helpful, and at the very least this needs much more explicit documentation.
Comments (7)
-
reporter -
reporter - changed milestone to 0.6.xx
we think that the issue only occurs when backrefs are at play (though I seem to recall seeing it happen without a backref being involved....). I think we might keep this tabled for now.
-
reporter This is still on my mind.
The general method of no longer referencing
__dict__
directly has the issue that when the object is dereferenced, we don't have access to__dict__
anymore and we can't resurrect the object, if any attributes have changed.The solution to that is to establish a strong reference upon the "modified" event. this would be great as we could throw away the whole resurrect idea completely.
However, as usual mutable attributes make this method impossible. We can't get "modified" events for mutable attributes.
Ways to work around would be:
- do get modified events for mutable attributes- this would require a wrapper that detects changes. This is pretty unfeasable as we make no assumptions about what people are storing in a "mutable" attribute and would be chock full of complexity and surprises.
- Leave the "resurrect" system in place for instance states that contain mutable attributes. This is a high degree of surprise as the behavior of an app changes as soon as a mutable attribute is added, and we have to keep all the complexity we had before.
- For mutable attributes, create a strong ref on access to the mutable attribute (i.e.
__get__()
). This still keeps the surprise of "my objects aren't getting auto-cleared" around but does limit it to a very specific and easy-to-identify scenario. the code remains simple as we don't need a resurrect phase. - ignore "mutable attributes" as special. if you don't do a
__set__()
, and you change the contents of your mutable attribute, you have to maintain a reference to the object. this is probably worse than just removing support for "mutable attributes" entirely. - remove "mutable attributes" entirely in 0.6. you have to do a
__set__()
or we don't even see it as changed. Again the advantage here is that its cut and dry, and works more like a relational database does anyway. Disadvantage is upgrade pain for users and the loss of a feature that is probably useful to some people. - maintain a
dict
that only stores "mutable" values - so you do a__get__()
, the value is added to the dict, then we still maintain aresurrect
phase that resurrects only those attributes, andmapper._save_obj()
knows how to handle an object like that. The only disadvantage here is thatMapperExtension
objects that implementbefore_update()
orafter_update()
now get an instance with unexpected state (although, the state is not unlike the object's attributes had just been expired). We'd need to document this. Also, it's an elaborate system that almost never gets invoked, adding a potential surprise to mapperextension implementations and complexity to the codebase that almost never gets used. but this solution does solve the mutable/weakref problem completely.
-
reporter - changed milestone to 0.5.4
more thoughts on this, I think this will be worth it in a few ways.
the
InstanceState
andIdentityManagedState
should be reorganized intoInstanceState
andMutableAttrInstanceState
. The mapper's awareness of this and the "instance_state_factory" should be removed. The "mutable" version gets invoked directly byClassManager
whenmutable_attributes
is non empty. This allows us to then move all of the resurrection stuff to the "mutable" version of instancestate. None of it will occur for the regular instance state which will improve performance during synchronous GC events.Additionally, the
check_modified()
method goes away onInstanceState
- direct attribute access will be used, and will be subclassed by a descriptor on the mutable version. This will enormously speed up flushes of large sets of objects since_dirty_states
will usually be a single method call instead of N method calls. Other "mutable_attribute" conditional crap inInstanceState
can be pushed out as needed to the subclass, like the extra work inunmodified
for example.the two state classes should be in a new module sqlalchemy.orm.state, and a new set of unit tests specifically exercising everything about "mutable" state should be added.
-
reporter more comments.
this is also good for 0.5.4 - no public APIs are changing and I'd like to see improved performance in the 0.5 series.
I'd like to go for
#6as the approach to mutable attributes. -
reporter - changed status to resolved
-
reporter - removed milestone
Removing milestone: 0.5.4 (automated comment)
- Log in to comment
So here's a current method:
We can either go with something like this:
where
instance_dict
defaults toattrgetter
or just leave it the old way, and
state.dict
is just a weak reference.The issue comes down to, is
attrgetter
so much faster thanweakref
?