Eager load does not work as expected under special circumstances

Issue #3811 resolved
Dariusz Dzialak created an issue

My postgresql database keep huge tree thus I want to use eager load to make sure there is no separated query for parent. Unfortunately eager load does not work if object was previously returned as a parent.

Bug could be reproduce with attached script and with SQLAlchemy==1.1.0b3

I've used that version as I've though that issue is fixed by: https://bitbucket.org/zzzeek/sqlalchemy/issues/3431/object-eager-loaded-on-scalar-relationship

Currectly output is:

testing>  ./psql_eager.py 
delete from
create_all
SELECT parent_1.id AS parent_1_id, parent_1.name AS parent_1_name, parent_1.parent_id AS parent_1_parent_id, parent.id AS parent_id_1, parent.name AS parent_name, parent.parent_id AS parent_parent_id 
FROM parent LEFT OUTER JOIN parent AS parent_1 ON parent_1.id = parent.parent_id
Parent(1, p1, child of 2)  has parent:  Parent(2, p2, child of None)
Parent(2, p2, child of None)  has parent:   ------- ERR: 'Parent.parent' is not available due to lazy='raise'

Comments (5)

  1. Dariusz Dzialak reporter

    So my observation is that circumstances of reproduction are that some object X must:

    • not have parent itself
    • be returned as a parent of other object Y
  2. Mike Bayer repo owner
    • changed milestone to 1.1
    • changed component to orm
    • marked as major

    The issue is that the fix for #3431 only sets None for the attribute if the incoming row has never been seen before. If the row has been seen, then a value of None for the current row is still considered to be a skip. This is likely to keep the change as conservative as possible. The test cases in #3431 are favored towards the case where for one eager load, the columns that would load the entity are non-present, and for the other, they are. By setting the relationship attribute to None, that's now the official value of the attribute and it disallows any further load from setting a value. However, if the incoming columns really are dedicated towards this path then there is probably not a reason to not honor "None" as the official value of this attribute, and if in a subsequent row does have a value, that's a conflict. The only reason I can think of at the moment would be that existing applications are using contains_eager() incorrectly and NULL values are being delivered where they shouldn't, and their application works because lazy loads later on take care of the problem. At the moment I think it's OK to forego supporting that.

    The way that it works right now does not cause any additional SQL to be emitted, because the lazy loaders don't emit SQL when the source keys are None - which means your test only emits one SELECT if you do away with the lazy='raise'. However, because you're using the new "lazy='raise'" strategy, this does not distinguish between a lazy load that uses the Session but does not emit SQL vs. one that does. In this case, the lazy load will not emit SQL since the key given is already None, it just makes sure a Session is present. If you are only looking to not emit SQL, a new variant "raise_on_sql" would suffice, and a version of that is put up at https://gerrit.sqlalchemy.org/#/c/203/.

    The simpler change of just allowing None to be treated as a valid value coming back from the eager load is also likely feasible and that is at https://gerrit.sqlalchemy.org/204.

  3. Mike Bayer repo owner

    Honor additional row coming in with value of None

    The change in #3431 still checks that the instance() is non-None, deferring to other loading schemes if it is. These columns are dedicated towards the entity however, so if the value is None, we should set it. If it conflicts, we are detecting that in any case.

    Change-Id: I223768e2898e843f953e910da1f9564b137d95e4 Fixes: #3811

    → <<cset c3abfe50645a>>

  4. Dariusz Dzialak reporter

    Thanks for solving that issue.

    I've tested https://gerrit.sqlalchemy.org/#/c/203/ (raise-on-sql) and https://gerrit.sqlalchemy.org/204 (shorter issue - as I understand always update object even it already exists). I just wanted to mark that raise-on-sql is not enough because when I've added some attribute that is loaded as backref:

    class Attr(Base): tablename = 'attr'

    id = Column(Integer, primary_key=True)
    name = Column(String)
    parent_id = Column(ForeignKey(u'node.id'))
    parent = relationship(u'Node', back_populates='attr', lazy='raise')
    

    then even changes all strategies from 'raise' to 'raise-on-sql' then problem was still present. After introduced shorter fix (id 204) the problem has gone.

    Once again Thanks Regards Darek

  5. Log in to comment