Flexible deferred interface

Issue #1418 resolved
Former user created an issue

The idea is simple. In a query you want to defer all the columns that of the mapped object, that are not in a certain list. You don't want to: 1. Do it manually, by deferring each column. Because then, you need to know all of your columns. 1. Write the code that examine the columns in the mapper and defer them if they are not on your "not to defer" list. Because, it's easier to ask the orm to do so.

The use-case is the following: You're writing a script that should be very efficient. In the script you need to load a certain mapped object, but you know that you only need some of its column attributes. Because it's a very efficient script, you want to defer all of those column attributes that aren't important for the script to work. You can query only the attributes that you're interested with (without querying for the entire object) - but you really would prefer to get your mapped object - with all it's relations and functions in place. The reason that the deferring of the columns is not done in the mapper itself, is that you're using the same mapper that is used by other parts of the applications - which are interested in different columns than your script does. With no other choise, you decide and defer all the other columns.

And I'm not sure about the name of the function.

Thanks, kobipe3@gmail.com

Comments (16)

  1. Mike Bayer repo owner

    Continuing from #2250 and #1359 there needs to be more flexibility available in deferreds.

    # existing:
    
    defer("some_attribute")
    defer(MyClass.some_attribute)
    defer("related.foobar.attr1")
    defer(MyClass.related, MyRelated.foobar, MyFooBar.attr1)
    
    # needed:
    
    # 1. multiple attributes on target
    defer_all("attr1", "attr2", "attr3")
    
    # 2. multiple attributes on indirect target
    defer_all("attr1", "attr2", "attr3", path="related.foobar")
    defer_all("attr1", "attr2", "attr3", path=(MyClass.related, MyRelated.foobar))
    
    # undefer_all() too
    
    # 3. deferral of groups
    defer_group("mygroup")  # e.g. ticket 1359
    undefer_group("mygroup")
    
    # 4. exclusionary defer
    defer_except("attr1", "attr2")
    defer_except("attr1", "attr2", path=(MyClass.related, MyRelated.foobar))
    
    # undefer_except() too
    

    we can start clean here and add a nice new test suite test/orm/test_deferred.py, move all the test_mapper stuff out to it.

  2. Mike Bayer repo owner
    • changed milestone to 0.9.0

    maybe if we enhance the path system a bit, we can work up compositions, so a "defer all but" would do this:

    sess.query(Entity).options(defer(Entity), undefer(Entity.foo), undefer(Entity.bar))
    

    that is we enhance the option system to somehow expand out given an entity class instead of an attribute. more syntax ideas:

    sess.query(Entity).options(defer(Entity).undefer("foo", "bar"))
    
    # can we figure out the damn *keys thing with defer()/undefer()?   separate strings
    # should really mean multiple keys...
    sess.query(Entity).options(undefer("foo", "bar"))
    
    # .. but then this doesn't work, sigh.  the problem is pathing.  
    # we've never worked out a coherent pathing system.  sucks
    sess.query(Entity).options(undefer(Entity.widget, Widget.bars))
    
    
    
    sess.query(as_entity(Entity.foo, Entity.bar))
    
    sess.query(defer(Entity).undefer("foo", "bar"))  # less typing, basically need to add some interface to Query
                                                     # that accepts an option and turns it into an entity + option
    
    # pathing has to keep working too
    sess.query(Entity).options(defer("widgets.bats").undefer("foo", "bar"))
    
  3. Mike Bayer repo owner

    this idea, replace the entire option functions entirely with a new kind of registry that's more explicit and stateful. the old system would express itself internally in terms of the new one. somehow get the Strategized system to be faster with this.

    options(load(Entity).defer("x", "y").joined("q", "p"))
    
    options(load(Entity).joined("q").of_type(SubQ))   # options(joinedload(Entity.q.of_type(SubQ))
    
    # but we still need property.of_type() for any(), has(), join(), so can we 
    # still use it here?  ok
    options(load(Entity).joined(Entity.q.of_type(SubQ))
    
    options(load(Entity).joined("subentity").joined("subsubentity"))
    
    # now attribute pathing is like this:
    options(load(Entity).joined(Entity.subentity).joined(SubEntity.subsubentity))
    
    options(load(Entity).joined("subentity").joined("subsubentity"))
    
    options(load(Entity).select("subentity").select("subsubentity").defer("x", "y"))
    
    options(load(Entity).only("x", "y"))
    
    options(load(Entity).from_query())   # contains eager
    
  4. Mike Bayer repo owner

    the challenge here is that when we create this load thing, there's some kind of object that the internals can query against, which is fast:

    q = query(Entity).options(load(Entity).only("x", "y"))
    
    # then internally, somehow this works:
    current_path = PathContext(Entity, "z")
    
    # is "deferred", without us needing to iterate through Entity four times.
    # maybe if only() and other methods could load up the full loader strategy 
    # internally - then we'd do away with the dictionary crap in StrategizedProperty.
    query.context.strategy(current_path)
    
  5. Mike Bayer repo owner

    by allowing a load() object to be used as an entity, we can push for this pattern:

    class MyEntity(Base):
       # ...
    
       @classproperty
       def index_cols(cls):
            return load(cls).only("fast_col_one", "fast_col_two")
    
    
    my_fast_query = sess.query(MyEntity.index_cols)
    
  6. Former user Account Deleted

    At one point we had hoped to keep this syntax consistent with relationships: ticket:2351#comment:26

    options(load(... strikes me as verbose and maybe confusing?

    I've often wondered if the .options() bit could be engineered out (as you hinted to, as well). For example:

    sess.query(Entity).options(lazyload('*')).all()

    I've often wondered if options couldn't be query methods:

    sess.query(Entity).lazyload('*').all()

    So, if you head that route, maybe you consider making load() a query method that is really a synonym for options(load())?

    I really think whatever the design is, it should be made syntactically equivalent for relationships for consistency. For example:
    If: sess.query(Entity).options(defer(Entity).undefer("foo", "bar")) Then also: sess.query(Entity).options(lazyloadall(Entity).joinedload("foo", "bar")) (I think you were agreeing, unless I didn't understand.)

    When I was a new sqlalchemy user I didn't understand why we'd need two sets of language in the first place: couldn't defer() and load() operate polymorphically equally on a relationship property ''or'' a column property? (I understand there are different loading strategies for relationships, etc. so maybe it isn't feasible, but I thought I'd bring it up because it was something I had thought of as a new user.)

    Sorry, I know that isn't much help for pathing problems; I haven't given that too much thought for how to better that (never really thought it was too broken).

    One other consideration: what about join points? If I have this:

    options(load(Entity).joined(Entity.subentity).joined(SubEntity.subsubentity))

    and if Entity.subentity and Entity.subentity.subsubentity refer to the same table, will you be able to add a filter and specify which joinpoint?

    As long as we are brainstorming, should filtering alternatively allow the full path? .filter(Entity.subentity.x==2) vs. .filter(Entity.subentity.subsubentity.x==2) or is that just too ugly to implement?

  7. Mike Bayer repo owner

    Replying to kentbower:

    options(load(... strikes me as verbose and maybe confusing?

    So, if you head that route, maybe you consider making load() a query method that is really a synonym for options(load())?

    the problem with this is that load() is itself method-chained and leads to a context that is specific to load(). putting it up on Query means that all of load()'s methods are now on Query, which makes it superfluous. this would be very confusing:

    query(Entity1).join(Entity2, Entity1.somerel).load(Entity1).joined(someotherrel).filter_by(q=5).defer("x", "y")...
    

    that is, mixing up load() methods and query() methods, like above where filter_by() is stuck in between joined() and defer() is a mess, which entity does filter_by() operate upon ? Entity1.somerel or Entity1.someotherrel, which context does it take place within?

    If query has a join() and a joinedload() method at the same level, that will be a constant source of confusion. Loader options are special. They are not specifically directives to compose a certain kind of SQL, they're behavioral directives consumed by the mechanics that link result sets to mapped objects.

    The remarkable thing about query.options() is that the proposal for load() actually justified it, as load() can have it's own method-chaining that is cleanly separated from that of the Query itself. It's like query.options() was made for a sub-chained object like load() all this time.

    I really think whatever the design is, it should be made syntactically equivalent for relationships for consistency. For example:
    If: sess.query(Entity).options(defer(Entity).undefer("foo", "bar")) Then also: sess.query(Entity).options(lazyloadall(Entity).joinedload("foo", "bar")) (I think you were agreeing, unless I didn't understand.)

    the idea is that you always start with load(), load(Entity). I guess your "star" thing would work out like "load(Entity).lazyload('*')". Which in itself is an argument to keep relationships and columns separate at least in most cases. Otherwise how do i specify lazy loading for all the relationships and not the columns? I can see that the only() method might want to work across both, if you said only("x", "y"), by default that would cancel out joined/subquery/immediate loads as well.

    When I was a new sqlalchemy user I didn't understand why we'd need two sets of language in the first place: couldn't defer() and load() operate polymorphically equally on a relationship property ''or'' a column property? (I understand there are different loading strategies for relationships, etc. so maybe it isn't feasible, but I thought I'd bring it up because it was something I had thought of as a new user.)

    this is kind of true for lazyload() and defer(). But for every other kind of loading it's not. if you're thinking in SQL-centric terms (which I recommend) I think they seem very different.

    One other consideration: what about join points? If I have this:

    options(load(Entity).joined(Entity.subentity).joined(SubEntity.subsubentity))

    and if Entity.subentity and Entity.subentity.subsubentity refer to the same table, will you be able to add a filter and specify which joinpoint?

    that kind of thing should already work now, joinedload() always makes an alias for things so if SubEntity.subsubentity were self-referential to SubEntity, the above would join twice to the table and populate results accordingly. I don't understand what we'd be filtering here, perhaps you're referring to the usage of query.join() to set up specific joins in conjunction with contains_eager() ?

    As long as we are brainstorming, should filtering alternatively allow the full path? .filter(Entity.subentity.x==2) vs. .filter(Entity.subentity.subsubentity.x==2) or is that just too ugly to implement?

    I talked about this in my email (ack you saw the email right? https://groups.google.com/forum/#!topic/sqlalchemy-devel/_Jgc5xC7tQY )

  8. Log in to comment