New objects do not have their properties populated automatically

Issue #137 resolved
Former user created an issue

Synopsis:

Creating a new instance and committing the unit of work requires a refresh(instance) call to populate mapped property objects.

Description:

Given a bound class with several mapped properties, create a new instance in the form:

obj = MyObject()
obj.fk = 1
...
objectstore.get_session().commit()

This yields a committed object against a committed row or set of rows in the database.

If {{{MyObject}}} has a property {{{'other'}}} that relates to {{{obj.fk}}}, then post-commit {{{obj.other == None}}}. If you call {{{objectstore.get_session().refresh(obj)}}} then the property is populated correctly and {{{obj.other == <valid object>}}}.

Sample code is attached.
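
For illustration only (the attached sample is the authoritative version), a minimal sketch of the behavior, assuming {{{MyObject}}} is mapped with an {{{'other'}}} relation keyed off {{{fk}}}:

obj = MyObject()
obj.fk = 1
objectstore.get_session().commit()

# the row is committed, but the mapped property is not populated:
assert obj.other is None

# an explicit refresh reloads the instance and populates it:
objectstore.get_session().refresh(obj)
assert obj.other is not None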

Comments (7)

  1. Former user Account Deleted
    • removed status
    • changed status to open

    Hey Mike -

    I think I understand that, and I definitely understand why things like backrefs and such aren't updated. However, since you are doing a SELECT to pull the primary key from the database anyway, why not .refresh() the object? It seems like a perfect opportunity to ensure the object is formally correct.

    My problem is that I'm creating an object, and then later when I reload the form I'm getting the same partially created object back. It doesn't make sense for me to keep my own cache of "new" objects when the UOW already generates a list of them, right? --G

  2. Mike Bayer repo owner

    gambit -

    Because SQLAlchemy tries to do as little work as possible on every operation, so that it does not introduce unwanted latency into an application. Having it refresh all the properties on every object it saves could potentially be an application-killing operation, if the objects were not intended to have all their properties loaded at the same time.

    The better operation would at least be to expire() the object, so that if it isn't touched again then no load occurs. But even that is a lot of overhead to add; for example, in all of the code examples I have done, this operation would be a total waste of resources, since I already set up all the properties exactly as they are meant to be and then saved. I had intended that this would be the primary way for it to be used: set up object relationships naturally, without any explicit awareness of identifiers....otherwise your application conflates domain objects with their underlying storage mechanism.

    So I would ask:

    1. Why do you have to manipulate IDs directly instead of attaching instances to each other?
    2. Why can't you identify when you are manipulating IDs and simply expire() the object instances? (See the sketch after this list.)
    3. If you are reloading a form, as in a reload on a web page, why are you not performing an objectstore.clear() at the start of each request? This is definitely recommended.
    4. Do you really want every single object that gets saved to be fully refreshed()? This could be yet another flag on the mapper, I guess; it would just be expire_on_save and do the expire() thing. But again, I am loath to support a whole new crop of side effects that come from such a flag, which is why I hate adding features that support people using this other than the way I had intended....there's enough bugs and missing functionality without having to worry about that.
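
    A minimal sketch of option 2 (assuming expire() can be called via the session the same way refresh() is, per the objectstore API used above):

    obj = MyObject()
    obj.fk = 1                              # manipulating the FK directly
    objectstore.get_session().commit()

    objectstore.get_session().expire(obj)   # mark the instance stale; no query is issued yet
    obj.other                               # the next attribute access triggers the reload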

    Anyway, this is at the very least an enhancement. Also, at the moment it's "wontfix". If you convince me to add #4, I'll adjust the status.

  3. Former user Account Deleted

    I think you have some very good points here, especially on how you /expect/ the system to be used (namely through object manipulation, as opposed to my natural db-related instinct for passing around primary keys).

    First of all, I was manipulating IDs directly for two reasons. The first was (as I said above) habit. The second was that I was using IDs as the userdata attached to combo boxes and other controls in the wxWidgets context. I'm changing this around right now to use the objects themselves, so it's not critical.

    I agree with you that doing a refresh on all saves would be... expensive. Instead of all saves, how about doing it explicitly and only for inserts? Creating a new instance is something of a special case, since some fields may not be populated (or may be expected to have default values). Additionally, some SA properties may be generated via a composite of several other fields, or are lists, etc. Automatically refreshing a newly created instance places it into a known-good state. This seems like a good thing to me.

    I'm not sure that expecting the developer to manually track and refresh new objects makes sense. The situation I was running into, which relates directly to the source code above, is like this:

    rel = Relationship()
    rel.pk = 1
    rel.info_fk_one = 1
    rel.info_fk_two = 3
    rel.rel_data = "1:1-3"
    
    objectstore.get_session().commit()
    
    rel = None
    rel = Relationship.mapper.get(1)
    # At this point, the following is true:
    # rel.info_one == None
    # rel.info_two == None
    

    At this point, the assertions that follow (namely, that .info_one and .info_two are still None) all pass. That is to say, the .get() retrieves the already existing instance from the cache.

    Should SA call .refresh on the instance automatically? Should the developer call it explicitly for each insertion? Is there a case where the developer would not want the refresh to occur (shouldn't he be using a lazy load in that case?), either for data integrity or for performance reasons?

    Back to you, Z. --G

  4. Mike Bayer repo owner

    The example you have there seems entirely natural to me. You've got this object "rel" in memory, you save a representation of it to the database, you look at it again and it's exactly the same as you left it. That seems to me to be the most predictable and unambiguous behavior; it does nothing! No magic; explicitness is better than implicitness.

    Also, it's not apparent to me at all why a refresh() on insert is any more significant than a refresh() on update. Both situations may involve new foreign keys being set and produce similar behavior. If I were to make refresh() the default on only inserts and not updates, there would be many dozens of new emails and tickets about how confusing that is.

    But don't fret, there are two things that may affect this in the future:

    1. You may have noticed I'll always add flags for people. So if I just give you the "refresh_on_save" flag, then you can just switch that on.

    2. I am beginning to think seriously about the list attribute used for mapper loads, and may start a project to majorly improve it; so maybe the "lazy loading" behavior will act more like what you're looking for here after I get through that.

    I gotta run now...will comment further...

  5. Former user Account Deleted

    I think you're looking at this in a little too close focus, actually. The behavior in question isn't "save rel, look at it again, hey, it's still the same!" because, in that case, sure, you wouldn't expect it to change. Instead, the behavior is more like this: "save rel, go do something else, ask for record (1), hey, it's incomplete, wtf?" Once I discard my reference to it, the next time I .get(1) I would expect to get a fully formed instance.

    The reason I'm amenable to refresh() default on inserts and not updates is that updating is, by nature, something that should start with a known-good instance, whereas inserting could definitely, as is shown here, leave an object in a half-formed state.

    Plus, as you've pointed out elsewhere, refreshing on update could potentially incur a substantial /unexpected/ performance penalty. I'm perfectly willing to require refreshes on updates to be performed by the developer.

    That brings up an interesting idea, of being able to 'mark' a particular instance to be updated on the next objectstore.commit() cycle. That way you can say "My dictionary object must be kept up-to-date for each commit" explicitly, without having to update the entire object cache. I would consider this a low-to-mid priority feature, though.

    Is it possible to detect, via timestamps or object version numbers or some such, whether an object could be considered 'dirty'? Is this what the 'hibernate' thread in the mailing list is discussing?

  6. Mike Bayer repo owner

    > I think you're looking at this in a little too close focus, actually. The behavior in question isn't "save rel, look at it again, hey, it's still the same!" because, in that case, sure, you wouldn't expect it to change. Instead, the behavior is more like this: "save rel, go do something else, ask for record (1), hey, it's incomplete, wtf?" Once I discard my reference to it, the next time I .get(1) I would expect to get a fully formed instance.

    OK, a few things. First of all, when you discard a reference, it will eventually be garbage collected; SA doesn't hold onto it with a strong reference. The reason it is still hanging around is because there are some circular refs in it, and there's a workaround for that which I guess I should just commit despite my concerns over extra performance overhead (you change the function "managed_attribute_dict" in attributes.py to return a weakvaluedict). But if your objects have their own circular refs you'd still have a problem.
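
    To illustrate the weak-valued dict idea in isolation (plain stdlib, not SA's actual attributes.py code): entries in a weakref.WeakValueDictionary go away once the last outside strong reference does, which is the property the workaround relies on.

    import weakref

    class Record(object):
        pass

    registry = weakref.WeakValueDictionary()
    rec = Record()
    registry[1] = rec        # the registry holds only a weak reference

    rec = None               # drop the last strong reference
    print(1 in registry)     # False -- the entry vanished with the object
                             # (circular refs would delay this until the cycle collector runs)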

    The next thing is, if your app is organized that way, i.e. save rel, go do other things, now I'm back and I expect everything to be fresh from the DB, it sounds like you're looking to start a new session anyway; but maybe not, just a thought.

    I still don't think SA should do any extra work by default. By setting FK values, you are tinkering; SA should not assume tinkering by default and do a whole lot of extra work, which may very likely be an unreasonable amount of work in some cases, based on the assumption of such tinkering. Similarly, conditionals which try to "detect" this condition are also extra work and complexity which is just going to break in unpredictable ways. The core of SA must try to be as lightweight and predictable as possible (it already adds a huge dollop of overhead).

    So, you can currently get the behavior you want as follows:

    class InsertRefresher(MapperExtension):
        def after_insert(self, mapper, instance):
            """Called after an object instance has been INSERTed."""
            objectstore.refresh(instance)   # or use expire() to defer the reload

    m = mapper(MyClass, table, ext=InsertRefresher())
    

    I really want to push MapperExtensions as the way for everyone asking for all kinds of behavior to get what they want, without adding complexity to the core or behaviors that might have unwanted side effects or performance overhead in many cases. While the above is a little inconvenient for the moment, I believe there will be more automated ways to "wire up" SA with all kinds of built-in extensions. I am beginning to formulate a list of plugins that some people seem to want.
