joinedload not working as expected ?

Issue #3750 closed
Mehdi Gmira created an issue

Hey,

I've been experiencing some problems when using association tables :

Here is the code to set up the database. Basically I have a school table that has a one-to-many relationship to students and sports. While students and sports are linked together by a many-to-many relationship through an association table.

from sqlalchemy import Column, DateTime, String, Integer, ForeignKey, func, Table, create_engine
from sqlalchemy.orm import relationship, backref, sessionmaker, joinedload
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

ods_legs_association_table = Table('students_sports', Base.metadata,
                                   Column('sport_id', Integer, ForeignKey('sport.id', ondelete='CASCADE')),
                                   Column('student_id', Integer, ForeignKey('student.id', ondelete='CASCADE'))
                                   )


class School(Base):
    __tablename__ = 'school'
    id = Column(Integer, primary_key=True)
    name = Column(String)


class Sport(Base):
    __tablename__ = 'sport'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    school_id = Column(Integer, ForeignKey('school.id'))

    school = relationship('School', backref='sports')
    students = relationship("Student", secondary=ods_legs_association_table, backref="sports")


class Student(Base):
    __tablename__ = 'student'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    school_id = Column(Integer, ForeignKey('school.id'))

    school = relationship('School', backref='students')


engine = create_engine('postgresql://xxxx@localhost/test', echo=True)

Base.metadata.drop_all(engine)
Base.metadata.create_all(engine)
Session = sessionmaker(engine)
session = Session()

school = School(name="Foo School")
students = [Student(name=letter) for letter in 'ABCD']
sports = [Sport(name='football'), Sport(name='tennis')]
for student in students:
    student.sports = sports
school.students = students
school.sports = sports
session.add(school)
session.commit()

Then I just query the school table with: - a joinedload on students - a joinedload on sports and a joinedload on student.sports

It is my understanding that doing this insures that none of the following statements should emit an sql query: -school.students -school.sports -school.sports[0].students -school.students[0].sports

But it does not work:

school = session.query(School).options(joinedload('sports').joinedload('students'), joinedload('students')).one()

school.sports[0].students  # doesn't emit SQL statement
school.students[0].sports  # emits SQL statement
school.sports[0].students[0].sports # also emits SQL

Also, the SQL statement looks weird (i would expected a join instead on this cross join):

SELECT sport.id AS sport_id, sport.name AS sport_name, sport.school_id AS sport_school_id 
FROM sport, students_sports 
WHERE %(param_1)s = students_sports.student_id AND sport.id = students_sports.sport_id

I don't understand why this is happening. For me, joinedload('sports').joinedload('students') tells SQLAlchemy to load students related to each sport, and thus sports related to each students.

Comments (13)

  1. Mehdi Gmira reporter

    Actually, this is is not related to association tables:

    from sqlalchemy import Column, DateTime, String, Integer, ForeignKey, func, Table, create_engine
    from sqlalchemy.orm import relationship, backref, sessionmaker, joinedload
    from sqlalchemy.ext.declarative import declarative_base
    
    Base = declarative_base()
    
    
    class School(Base):
        __tablename__ = 'school'
        id = Column(Integer, primary_key=True)
        name = Column(String)
    
    
    class Sport(Base):
        __tablename__ = 'sport'
        id = Column(Integer, primary_key=True)
        name = Column(String)
        school_id = Column(Integer, ForeignKey('school.id'))
    
        school = relationship('School', backref='sports')
    
    
    class Student(Base):
        __tablename__ = 'student'
        id = Column(Integer, primary_key=True)
        name = Column(String)
        school_id = Column(Integer, ForeignKey('school.id'))
        sport_id = Column(Integer, ForeignKey('sport.id'))
    
        school = relationship('School', backref='students')
        sport = relationship('Sport', backref='students')
    
    
    engine = create_engine('postgresql://xxxx@localhost/test', echo=True)
    
    Base.metadata.drop_all(engine)
    Base.metadata.create_all(engine)
    Session = sessionmaker(engine)
    session = Session()
    
    school = School(name="Foo School")
    students = [Student(name=letter) for letter in 'ABCD']
    sports = [Sport(name='football'), Sport(name='tennis')]
    for i, student in enumerate(students):
        student.sport = sports[i % 2]
    school.students = students
    school.sports = sports
    session.add(school)
    session.commit()
    
    
    school = session.query(School).options(joinedload('students').joinedload('sport'), joinedload('sports')).one()
    
    school.students[0].sport  # doesn't emit SQL statement
    school.sports[0].students  # emits SQL statement
    
    school = session.query(School).options(joinedload('sports').joinedload('students'), joinedload('students')).one()
    
    # everything works great
    school.students[0].sport  
    school.sports[0].students 
    

    changing joinedload('students').joinedload('sport') to joinedload('sports').joinedload('students') fixes the problem. This isn't the expected behaviour, right ?

  2. Mike Bayer repo owner

    Thanks very much for posting complete test cases and details. So I'm only looking at your second example, since you said this illustrates the issue. I see:

    school = session.query(School).options(joinedload('students').joinedload('sport'), joinedload('sports')).one()
    
    school.sports[0].students  # emits SQL statement
    

    Why would the "students" collection of any Sport object not be empty? That would require an additional joinedload of "students" off of "sports" or "sport". Also, when eager loading like this where the same entity can appear twice in a row, it's not deterministic which "path" will be chosen to populate the collection, so it is additionally difficult to determine exactly what will happen when that's the case.

    Setting create engine as:

    engine = create_engine('sqlite://', echo='debug')
    

    The first query illustrates:

    SELECT school.id AS school_id, school.name AS school_name, sport_1.id AS sport_1_id, sport_1.name AS sport_1_name, sport_1.school_id AS sport_1_school_id, sport_2.id AS sport_2_id, sport_2.name AS sport_2_name, sport_2.school_id AS sport_2_school_id, student_1.id AS student_1_id, student_1.name AS student_1_name, student_1.school_id AS student_1_school_id, student_1.sport_id AS student_1_sport_id 
    FROM school LEFT OUTER JOIN sport AS sport_1 ON school.id = sport_1.school_id LEFT OUTER JOIN student AS student_1 ON school.id = student_1.school_id LEFT OUTER JOIN sport AS sport_2 ON sport_2.id = student_1.sport_id
    2016-07-20 10:42:49,640 INFO sqlalchemy.engine.base.Engine ()
    2016-07-20 10:42:49,641 DEBUG sqlalchemy.engine.base.Engine Col ('school_id', 'school_name', 'sport_1_id', 'sport_1_name', 'sport_1_school_id', 'sport_2_id', 'sport_2_name', 'sport_2_school_id', 'student_1_id', 'student_1_name', 'student_1_school_id', 'student_1_sport_id')
    2016-07-20 10:42:49,643 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 1, u'football', 1, 1, u'football', 1, 1, u'A', 1, 1)
    2016-07-20 10:42:49,644 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 1, u'football', 1, 2, u'tennis', 1, 3, u'B', 1, 2)
    2016-07-20 10:42:49,644 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 1, u'football', 1, 1, u'football', 1, 2, u'C', 1, 1)
    2016-07-20 10:42:49,644 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 1, u'football', 1, 2, u'tennis', 1, 4, u'D', 1, 2)
    2016-07-20 10:42:49,644 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 2, u'tennis', 1, 1, u'football', 1, 1, u'A', 1, 1)
    2016-07-20 10:42:49,644 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 2, u'tennis', 1, 2, u'tennis', 1, 3, u'B', 1, 2)
    2016-07-20 10:42:49,645 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 2, u'tennis', 1, 1, u'football', 1, 2, u'C', 1, 1)
    2016-07-20 10:42:49,645 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 2, u'tennis', 1, 2, u'tennis', 1, 4, u'D', 1, 2)
    

    the second illustrates:

    SELECT school.id AS school_id, school.name AS school_name, sport_1.id AS sport_1_id, sport_1.name AS sport_1_name, sport_1.school_id AS sport_1_school_id, student_1.id AS student_1_id, student_1.name AS student_1_name, student_1.school_id AS student_1_school_id, student_1.sport_id AS student_1_sport_id, student_2.id AS student_2_id, student_2.name AS student_2_name, student_2.school_id AS student_2_school_id, student_2.sport_id AS student_2_sport_id 
    FROM school LEFT OUTER JOIN sport AS sport_1 ON school.id = sport_1.school_id LEFT OUTER JOIN student AS student_1 ON sport_1.id = student_1.sport_id LEFT OUTER JOIN student AS student_2 ON school.id = student_2.school_id
    2016-07-20 10:48:49,805 INFO sqlalchemy.engine.base.Engine ()
    2016-07-20 10:48:49,806 DEBUG sqlalchemy.engine.base.Engine Col ('school_id', 'school_name', 'sport_1_id', 'sport_1_name', 'sport_1_school_id', 'student_1_id', 'student_1_name', 'student_1_school_id', 'student_1_sport_id', 'student_2_id', 'student_2_name', 'student_2_school_id', 'student_2_sport_id')
    2016-07-20 10:48:49,807 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 1, u'football', 1, 1, u'A', 1, 1, 1, u'A', 1, 1)
    2016-07-20 10:48:49,807 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 1, u'football', 1, 1, u'A', 1, 1, 3, u'B', 1, 2)
    2016-07-20 10:48:49,808 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 1, u'football', 1, 1, u'A', 1, 1, 2, u'C', 1, 1)
    2016-07-20 10:48:49,808 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 1, u'football', 1, 1, u'A', 1, 1, 4, u'D', 1, 2)
    2016-07-20 10:48:49,808 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 1, u'football', 1, 2, u'C', 1, 1, 1, u'A', 1, 1)
    2016-07-20 10:48:49,808 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 1, u'football', 1, 2, u'C', 1, 1, 3, u'B', 1, 2)
    2016-07-20 10:48:49,808 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 1, u'football', 1, 2, u'C', 1, 1, 2, u'C', 1, 1)
    2016-07-20 10:48:49,808 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 1, u'football', 1, 2, u'C', 1, 1, 4, u'D', 1, 2)
    2016-07-20 10:48:49,808 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 2, u'tennis', 1, 3, u'B', 1, 2, 1, u'A', 1, 1)
    2016-07-20 10:48:49,808 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 2, u'tennis', 1, 3, u'B', 1, 2, 3, u'B', 1, 2)
    2016-07-20 10:48:49,808 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 2, u'tennis', 1, 3, u'B', 1, 2, 2, u'C', 1, 1)
    2016-07-20 10:48:49,808 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 2, u'tennis', 1, 3, u'B', 1, 2, 4, u'D', 1, 2)
    2016-07-20 10:48:49,808 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 2, u'tennis', 1, 4, u'D', 1, 2, 1, u'A', 1, 1)
    2016-07-20 10:48:49,808 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 2, u'tennis', 1, 4, u'D', 1, 2, 3, u'B', 1, 2)
    2016-07-20 10:48:49,809 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 2, u'tennis', 1, 4, u'D', 1, 2, 2, u'C', 1, 1)
    2016-07-20 10:48:49,809 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 2, u'tennis', 1, 4, u'D', 1, 2, 4, u'D', 1, 2)
    

    the SQL is correct in each case given the order of joinedload() and in the second example, we can see that "sport.students" is loaded as instructed.

  3. Mehdi Gmira reporter

    Ok I think that I understand what you're saying. Indeed the ORM should send an extra query in the first case because nothing guarantees that every student in sport.students belongs to school.students. But what if i know that ? Is there a way to tell the ORM that he should use joinedload('students').joinedload('sport') to populate sport.students ?

  4. Mike Bayer repo owner

    because nothing guarantees that every student in sport.students belongs to school.students

    that's not why sport.students is not populated. It's not populated because there is no path "*.sports.students" indicated in the joinedload instructions. in the first case, you'd have to have said, joinedload("sports").joinedload("students"). collections don't get populated as an indirect effect of objects within those collection being loaded elsewhere; this will work for a simple many-to-one via the identity map but not for one-to-many etc.

  5. Mehdi Gmira reporter

    Ok, yes i understand what you're saying. But is there a way to still populate sport.students based on joinedload('students').joinedload('sport') ? Something like contains_eager but that works with joinedload instead of join ?

  6. Mehdi Gmira reporter

    Can you look at my first etst case with the association table ? I know that the right way to be sure that everything is eager_loaded is to do

    school = session.query(School).options(joinedload('sports').joinedload('students'), joinedload('students').joinedload('sports')).one()
    

    but that is inefficient because I'm joining twice. I would like to do something like

    school = session.query(School).options(joinedload('sports').joinedload('students'), joinedload('students').tell_orm_that_sports_have_already_been_loaded()).one()
    
  7. Mike Bayer repo owner

    as Student objects are loaded, you could peek inside of them yourself, look at the sport_id attribute, and line them up yourself, sure. SQLAlchemy will never do this directly because it's reinventing the SELECT that the database already does very well. For an example of how to populate collections directly see https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/DisjointEagerLoading. Here, you wouldn't even bother with a second query if you're sure that your original query has every object you want.

    from itertools import groupby
    
    students_by_sport = dict(
        (k, list(v)) for k, v in groupby(
            sorted(school.students, key=lambda student: student.sport_id),
            lambda student: student.sport_id
        )
    )
    
    # piece together !
    from sqlalchemy.orm import attributes
    
    for sport in school.sports:
        attributes.set_committed_value(
            sport, "students", students_by_sport.get(sport.id, ()))
    
  8. Mike Bayer repo owner

    if you're absolutely sure you have all the records needed in your results, if you use join() explicitly then these are easily targeted by contains_eager(). a joinedload() join is anonymous so that won't work w/ contains_eager().

    In the first example, this works, assuming the query itself returns all the student/sport combinations:

    from sqlalchemy.orm import contains_eager
    
    school = session.query(School).\
        join(School.sports).join(Sport.students).\
        options(
            contains_eager(School.sports).
            contains_eager(Sport.students).
            contains_eager(Student.sports),
            contains_eager(School.students)
        ).one()
    
    
    print "----------------"
    print school.sports[0].students
    print school.students[0].sports
    print school.sports[0].students[0].sports
    
  9. Mehdi Gmira reporter

    I thought of doing an explicit join, but in my real query I have a bunch of subquery_loads as well. And the join makes those subqueries less efficient. The first solution is great though. Anyway, thank you. You've been very helpful as always.

  10. Log in to comment