session.query(table).select_by() fumbles constraint on dependent table column

Issue #367 resolved
Former user created an issue

SQLAlchemy 0.3.0

Oracle 10g

cx_Oracle 4.2

Red Hat Enterprise Linux WS release 3 (Taroon Update 7)

If you have autoloaded tables Spam and Egg related by Spam has-many Egg like so:

engine = create_engine(...)
meta = BoundMetaData(engine)
tspam = Table("spam", meta, autoload = True)
tegg  = Table("egg", meta, autoload = True)
class Egg(object):
    pass
mapper(Egg, tegg)
class Spam(object):
    pass
mapper(Spam, tspam, properties=dict(eggs=relation(Egg,lazy=False)))

then the unconstrained select

session.query(Spam).select()

works properly but

session.query(Spam).select_by(col=value)

fails, where "col" is the name of a column present in Egg but not in Spam. The second query returns Egg objects with all values of "col". The only difference in result from the first query is that Spam objects lacking any associated Eggs are omitted.

The unconstrained query produces SQL like this:

SELECT
    egg_74ee.spam_id    AS egg_74ee_spam_id,
    egg_74ee.version    AS egg_74ee_version,
    egg_74ee.id         AS egg_74ee_id,
    egg_74ee.name       AS egg_74ee_name,
    spam.id             AS spam_id,
    spam.name           AS spam_name 
FROM
    spam
    LEFT OUTER JOIN egg egg_74ee
        ON spam.id = egg_74ee.spam_id
ORDER BY
    spam.rowid,
    egg_74ee.rowid

The second query should have been just like the first with the addition of a WHERE clause. Instead a useless new join is added with a constraint on ''it'' instead of on the old join:

SELECT
    egg_3216.spam_id     AS egg_3216_spam_id,
    egg_3216.version     AS egg_3216_version,
    egg_3216.id          AS egg_3216_id,
    egg_3216.name        AS egg_3216_name,
    spam.id              AS spam_id,
    spam.name            AS spam_name 
FROM 
    egg,
    spam
    LEFT OUTER JOIN egg   egg_3216
        ON spam.id = egg_3216.spam_id 
WHERE
        egg.col     = value
    AND spam.id     = egg.spam_id
ORDER BY
    spam.rowid,
    egg_3216.rowid

I see a similar problem if I omit the lazy=False from the call to relation().

By the way, it isn't clear to me whether the select_by() query should return Spam objects with empty Egg lists or omit them. A simple equality constraint would omit them since the comparision with NULL would fail; you'd have to use (table.col = value OR table.col is null) to get them.

Comments (7)

  1. Mike Bayer repo owner
    • changed component to orm
    • marked as major

    whats the bug here exactly? that the child objects are not being constrained by the criterion ? thats not how select_by or eager loading works.

    read the FAQ entry:

    http://www.sqlalchemy.org/trac/wiki/FAQ#ImusinglazyFalsetocreateaJOIN/OUTERJOINandSQLAlchemyisnotconstructingthequerywhenItrytoaddaWHEREORDERBYLIMITetc.whichreliesupontheOUTERJOIN

    also ive recently clarified this in the docs:

    http://www.sqlalchemy.org/docs/datamapping.myt#datamapping_selectrelations_eagerload

    in your case, your looking for a constrained relationship to the child items. For this I would recommend feeding your own joined query into instances():

    http://www.sqlalchemy.org/docs/adv_datamapping.myt#advdatamapping_resultset

    because if you have a spam with only some of its eggs loaded, that instance is in an incomplete state compared to the database. if you then append another egg to the spam that is already in the DB and then flush, youll get a constraint violation.

  2. Former user Account Deleted

    Yes, I was expecting the child objects to be constrained by the criterion and the parents marked read-only. I haven't read your references yet but I think I understand what you're getting at. Instead of

    ''Return parents with only those children with col=value.''

    the result from the select_by() is actually

    ''Return those parents (with all children) which have a child with col=value.''

    Sorry for the misunderstanding!

  3. Log in to comment