loader strategies at lazy load time + potential assoc proxy tie-in

Issue #2982 new
Ezra Epstein created an issue

I've encountered a use case where being able to override the default fetch strategy of the underlying relationships would come in handy.

In this case the underlying relationships are both lazy - a one-to-many (backref) to an intermediate join table, followed by a to-one to the target table for the association. There are good reasons for each to be lazy "joins" (separate select statement required).

However, when the association_proxy is accessed I would like to indicate that the second relationship should be fetched eagerly using an (inner) join. Without that the association_proxy ends up giving rise to the N+1 select problem.

I do have some work-arounds that make use of the org Query APIs. Would be very nice, however, to be able to do this directly via a mapped attribute.

Thanks for considering it. (Or clarifying what I may be doing wrong if this is already a feature!)

Comments (12)

  1. Mike Bayer repo owner

    there's no system that can specify loader options after the parent object already present.

    that is, if we have A.b -> B, B.c -> C, if you specify "lazy='joined'" on B.c, then you'll get joined loading of B.c when you access A.b. Or, if you specify joinedload(B.c) when you make a query for A, those particular A's will do joinedload on the B.c when you do A.b.

    But if you already have an A instance, and you're done with querying, there's no clean API that would say "joinedload B.c when I hit the .b attribute here".

    At the very least it would need to look like something ugly, like:

    a1 = sess.query(A).first()

    sess.query(A).options(immediateload(A.b).joinedload(B.c)).filter_by(id=a1.id).first()

    a.b

    The API that says "refresh the unloaded attributes of this A object" could be made slightly more succinct, somehow, but the options there are the options. There's no API that wraps around A.b, how could that look:

    from sqlalchemy.orm import magic_load

    magic_load(a, "b", joinedload(B.c))

    ?

    that's the most succinct it could possibly be.

    within the association proxy, you could maybe shove magic_load() in there somehow but this is all kind of vague and it's a big deal to add little things like this, not just because they are frustrating to work out all the edge cases but also because the presence of even more API confuses people a lot - it has to be really super-well thought out.

  2. Ezra Epstein reporter

    Hi Mike,

    Thanks for the prompt reply.

    As for the "something ugly" ... that's my current approach (though I fetch all() rather than first()). The other work-around I have is to just fetch from the target directly, reversing the query, filtering the intermediate table's rows by the source table's joined id.

    What I was imagining is a way to override the join strategy (or semantics) on a per-query basis. From that point of view the association_proxy is really a short-hand for how to construct a query. OF course if the objects are already in the sessions cache then things might look different, but currently, the first time the proxy is triggered it always goes to DB, and the substitute queries always go to DB as well. Borrowing for a different ORM... I thought I recollect something in Hibernate that allows one this type of per-query override (but I'm not sure).

    Thanks again for considering this.

  3. Mike Bayer repo owner

    SQLalchemy differs from a lot of ORMs (all the python ones, but actually not as much hibernate) in that the collection on an object is cached in memory, or more basically, if you have somea.bs, ".bs" is a plain Python list. With most other ORMs, saying "somea.bs" emits a query every single time. With SQLalchemy you can get that behavior if you use the "dynamic" loader strategy.

    Otherwise, somea.bs uses "lazy loading", meaning you get the SQL just once then its loaded.

    So it's already not that simple to just substitute "somea.bs" for "sess.query(B).with_parent(somea).all()". That will load you the same data, but it won't populate "somea.bs". this isn't just an "emit SQL" operation. it's, "conditionally emit SQL, populate a specific collection, and return the collection whether or not it was populated".

    hence the absolutely shortest explicit API for this, which retains the same end result, would have to name: a. the object in question, b. the name of the attribute, and c. the loader options.

  4. Ezra Epstein reporter

    I still think this is a useful feature and fits well in with what you share.

    I understand lazy-loading to be the feature where by some attribute - which, in this case represents associated entities (potential instances of a mapped class) - are not loaded until they are first accessed.

    The second property is that, once accessed, the fetched values are retained in memory.

    In Hibernate this is accomplished via their proxy objects (or that's what it was called back when I was using Hibernate.) And, to really date myself, these were known as "fault objects" in NeXTStep's EOF.

    In both cases, the fault/proxy holds the query to be performed on first access - which triggers the lazy loading of the associated records/entities. So, from that point of view, an association_proxy is both a) an indicator to create "fault/proxy" object (which may or may not be configured for lazy-loading), and b) indirect information about the form of the query - relevant only in the case when there is lazy-loading for that parameter. It's this latter part that I'm focusing on. In the case where accessing a lazily loaded parameter does trigger a SQL call, I'd like to be able to tell SQLAlchemy a bit more about the particular SQL to emit for that particular fetch. (Known as a "fault" or "fault action" in EOF.)

    That was the intention/concept behind the request.

    Thank you very much.

  5. Mike Bayer repo owner

    here's a quick recipe which does it, I'll see if it can be cleaned up to be a bit easier, assoc prox fights it a bit:

    from sqlalchemy import *
    from sqlalchemy.orm import *
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.ext.associationproxy import AssociationProxy
    from sqlalchemy.orm import attributes
    
    Base = declarative_base()
    
    class MyProx(AssociationProxy):
        def __get__(self, obj, class_):
            if obj is not None:
                state = attributes.instance_state(obj)
                mapper = getattr(class_, self.target_collection).property.mapper
                attr = getattr(mapper.class_, self.value_attr)
                state.load_options = state.load_options.union([joinedload(attr)])
            return super(MyProx, self).__get__(obj, class_)
    
    class A(Base):
        __tablename__ = 'a'
    
        id = Column(Integer, primary_key=True)
        bs = relationship("B")
    
        cs = MyProx("bs", "c")
    
    class B(Base):
        __tablename__ = 'b'
    
        id = Column(Integer, primary_key=True)
        a_id = Column(ForeignKey('a.id'))
        c_id = Column(ForeignKey('c.id'))
        c = relationship("C")
    
    class C(Base):
        __tablename__ = 'c'
    
        id = Column(Integer, primary_key=True)
    
    e = create_engine("sqlite://", echo=True)
    Base.metadata.create_all(e)
    
    sess = Session(e)
    c1, c2 = C(), C()
    
    sess.add_all([
        A(bs=[
            B(c=c1),
            B(c=c2),
            B(c=c1),
            B(c=c2)
        ]),
    ])
    sess.commit()
    
    a1 = sess.query(A).first()
    
    for c in a1.cs:
        print c
    
  6. Ezra Epstein reporter

    A further test reveals that it is close, but...

    If I first fetch: a1.bs, and then fetch a1.cs, SQLAlchmey will emit N Select statements. So under some conditions this approach still suffers from the N+1 query problem.

  7. Mike Bayer repo owner

    I don't see how that can be solved, given these two requirements:

    1. you dont want to just put "lazy=joined" on B.c. That is, a load of A.bs must not join B.c. In the vast majority of use cases, people just put lazy='joined' on B.c and they're done.

    2. The association proxy then uses the in-memory A.bs collection, and has no choice but to emit SQL for those B.c that link to an unloaded C.

    If you are to fetch a1.bs, and then a1.cs, exactly what SQL would you expect to see?

  8. Log in to comment