- edited description
joinedload not working as expected ?
Hey,
I've been experiencing some problems when using association tables :
Here is the code to set up the database. Basically I have a school table that has a one-to-many relationship to students and sports. While students and sports are linked together by a many-to-many relationship through an association table.
from sqlalchemy import Column, DateTime, String, Integer, ForeignKey, func, Table, create_engine
from sqlalchemy.orm import relationship, backref, sessionmaker, joinedload
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
ods_legs_association_table = Table('students_sports', Base.metadata,
Column('sport_id', Integer, ForeignKey('sport.id', ondelete='CASCADE')),
Column('student_id', Integer, ForeignKey('student.id', ondelete='CASCADE'))
)
class School(Base):
__tablename__ = 'school'
id = Column(Integer, primary_key=True)
name = Column(String)
class Sport(Base):
__tablename__ = 'sport'
id = Column(Integer, primary_key=True)
name = Column(String)
school_id = Column(Integer, ForeignKey('school.id'))
school = relationship('School', backref='sports')
students = relationship("Student", secondary=ods_legs_association_table, backref="sports")
class Student(Base):
__tablename__ = 'student'
id = Column(Integer, primary_key=True)
name = Column(String)
school_id = Column(Integer, ForeignKey('school.id'))
school = relationship('School', backref='students')
engine = create_engine('postgresql://xxxx@localhost/test', echo=True)
Base.metadata.drop_all(engine)
Base.metadata.create_all(engine)
Session = sessionmaker(engine)
session = Session()
school = School(name="Foo School")
students = [Student(name=letter) for letter in 'ABCD']
sports = [Sport(name='football'), Sport(name='tennis')]
for student in students:
student.sports = sports
school.students = students
school.sports = sports
session.add(school)
session.commit()
Then I just query the school table with: - a joinedload on students - a joinedload on sports and a joinedload on student.sports
It is my understanding that doing this insures that none of the following statements should emit an sql query: -school.students -school.sports -school.sports[0].students -school.students[0].sports
But it does not work:
school = session.query(School).options(joinedload('sports').joinedload('students'), joinedload('students')).one()
school.sports[0].students # doesn't emit SQL statement
school.students[0].sports # emits SQL statement
school.sports[0].students[0].sports # also emits SQL
Also, the SQL statement looks weird (i would expected a join instead on this cross join):
SELECT sport.id AS sport_id, sport.name AS sport_name, sport.school_id AS sport_school_id
FROM sport, students_sports
WHERE %(param_1)s = students_sports.student_id AND sport.id = students_sports.sport_id
I don't understand why this is happening. For me, joinedload('sports').joinedload('students') tells SQLAlchemy to load students related to each sport, and thus sports related to each students.
Comments (13)
-
reporter -
reporter - edited description
-
reporter - changed title to joinedload not working as expected ?
-
reporter Actually, this is is not related to association tables:
from sqlalchemy import Column, DateTime, String, Integer, ForeignKey, func, Table, create_engine from sqlalchemy.orm import relationship, backref, sessionmaker, joinedload from sqlalchemy.ext.declarative import declarative_base Base = declarative_base() class School(Base): __tablename__ = 'school' id = Column(Integer, primary_key=True) name = Column(String) class Sport(Base): __tablename__ = 'sport' id = Column(Integer, primary_key=True) name = Column(String) school_id = Column(Integer, ForeignKey('school.id')) school = relationship('School', backref='sports') class Student(Base): __tablename__ = 'student' id = Column(Integer, primary_key=True) name = Column(String) school_id = Column(Integer, ForeignKey('school.id')) sport_id = Column(Integer, ForeignKey('sport.id')) school = relationship('School', backref='students') sport = relationship('Sport', backref='students') engine = create_engine('postgresql://xxxx@localhost/test', echo=True) Base.metadata.drop_all(engine) Base.metadata.create_all(engine) Session = sessionmaker(engine) session = Session() school = School(name="Foo School") students = [Student(name=letter) for letter in 'ABCD'] sports = [Sport(name='football'), Sport(name='tennis')] for i, student in enumerate(students): student.sport = sports[i % 2] school.students = students school.sports = sports session.add(school) session.commit() school = session.query(School).options(joinedload('students').joinedload('sport'), joinedload('sports')).one() school.students[0].sport # doesn't emit SQL statement school.sports[0].students # emits SQL statement school = session.query(School).options(joinedload('sports').joinedload('students'), joinedload('students')).one() # everything works great school.students[0].sport school.sports[0].students
changing joinedload('students').joinedload('sport') to joinedload('sports').joinedload('students') fixes the problem. This isn't the expected behaviour, right ?
-
repo owner Thanks very much for posting complete test cases and details. So I'm only looking at your second example, since you said this illustrates the issue. I see:
school = session.query(School).options(joinedload('students').joinedload('sport'), joinedload('sports')).one() school.sports[0].students # emits SQL statement
Why would the "students" collection of any Sport object not be empty? That would require an additional joinedload of "students" off of "sports" or "sport". Also, when eager loading like this where the same entity can appear twice in a row, it's not deterministic which "path" will be chosen to populate the collection, so it is additionally difficult to determine exactly what will happen when that's the case.
Setting create engine as:
engine = create_engine('sqlite://', echo='debug')
The first query illustrates:
SELECT school.id AS school_id, school.name AS school_name, sport_1.id AS sport_1_id, sport_1.name AS sport_1_name, sport_1.school_id AS sport_1_school_id, sport_2.id AS sport_2_id, sport_2.name AS sport_2_name, sport_2.school_id AS sport_2_school_id, student_1.id AS student_1_id, student_1.name AS student_1_name, student_1.school_id AS student_1_school_id, student_1.sport_id AS student_1_sport_id FROM school LEFT OUTER JOIN sport AS sport_1 ON school.id = sport_1.school_id LEFT OUTER JOIN student AS student_1 ON school.id = student_1.school_id LEFT OUTER JOIN sport AS sport_2 ON sport_2.id = student_1.sport_id 2016-07-20 10:42:49,640 INFO sqlalchemy.engine.base.Engine () 2016-07-20 10:42:49,641 DEBUG sqlalchemy.engine.base.Engine Col ('school_id', 'school_name', 'sport_1_id', 'sport_1_name', 'sport_1_school_id', 'sport_2_id', 'sport_2_name', 'sport_2_school_id', 'student_1_id', 'student_1_name', 'student_1_school_id', 'student_1_sport_id') 2016-07-20 10:42:49,643 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 1, u'football', 1, 1, u'football', 1, 1, u'A', 1, 1) 2016-07-20 10:42:49,644 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 1, u'football', 1, 2, u'tennis', 1, 3, u'B', 1, 2) 2016-07-20 10:42:49,644 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 1, u'football', 1, 1, u'football', 1, 2, u'C', 1, 1) 2016-07-20 10:42:49,644 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 1, u'football', 1, 2, u'tennis', 1, 4, u'D', 1, 2) 2016-07-20 10:42:49,644 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 2, u'tennis', 1, 1, u'football', 1, 1, u'A', 1, 1) 2016-07-20 10:42:49,644 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 2, u'tennis', 1, 2, u'tennis', 1, 3, u'B', 1, 2) 2016-07-20 10:42:49,645 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 2, u'tennis', 1, 1, u'football', 1, 2, u'C', 1, 1) 2016-07-20 10:42:49,645 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 2, u'tennis', 1, 2, u'tennis', 1, 4, u'D', 1, 2)
the second illustrates:
SELECT school.id AS school_id, school.name AS school_name, sport_1.id AS sport_1_id, sport_1.name AS sport_1_name, sport_1.school_id AS sport_1_school_id, student_1.id AS student_1_id, student_1.name AS student_1_name, student_1.school_id AS student_1_school_id, student_1.sport_id AS student_1_sport_id, student_2.id AS student_2_id, student_2.name AS student_2_name, student_2.school_id AS student_2_school_id, student_2.sport_id AS student_2_sport_id FROM school LEFT OUTER JOIN sport AS sport_1 ON school.id = sport_1.school_id LEFT OUTER JOIN student AS student_1 ON sport_1.id = student_1.sport_id LEFT OUTER JOIN student AS student_2 ON school.id = student_2.school_id 2016-07-20 10:48:49,805 INFO sqlalchemy.engine.base.Engine () 2016-07-20 10:48:49,806 DEBUG sqlalchemy.engine.base.Engine Col ('school_id', 'school_name', 'sport_1_id', 'sport_1_name', 'sport_1_school_id', 'student_1_id', 'student_1_name', 'student_1_school_id', 'student_1_sport_id', 'student_2_id', 'student_2_name', 'student_2_school_id', 'student_2_sport_id') 2016-07-20 10:48:49,807 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 1, u'football', 1, 1, u'A', 1, 1, 1, u'A', 1, 1) 2016-07-20 10:48:49,807 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 1, u'football', 1, 1, u'A', 1, 1, 3, u'B', 1, 2) 2016-07-20 10:48:49,808 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 1, u'football', 1, 1, u'A', 1, 1, 2, u'C', 1, 1) 2016-07-20 10:48:49,808 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 1, u'football', 1, 1, u'A', 1, 1, 4, u'D', 1, 2) 2016-07-20 10:48:49,808 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 1, u'football', 1, 2, u'C', 1, 1, 1, u'A', 1, 1) 2016-07-20 10:48:49,808 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 1, u'football', 1, 2, u'C', 1, 1, 3, u'B', 1, 2) 2016-07-20 10:48:49,808 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 1, u'football', 1, 2, u'C', 1, 1, 2, u'C', 1, 1) 2016-07-20 10:48:49,808 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 1, u'football', 1, 2, u'C', 1, 1, 4, u'D', 1, 2) 2016-07-20 10:48:49,808 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 2, u'tennis', 1, 3, u'B', 1, 2, 1, u'A', 1, 1) 2016-07-20 10:48:49,808 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 2, u'tennis', 1, 3, u'B', 1, 2, 3, u'B', 1, 2) 2016-07-20 10:48:49,808 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 2, u'tennis', 1, 3, u'B', 1, 2, 2, u'C', 1, 1) 2016-07-20 10:48:49,808 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 2, u'tennis', 1, 3, u'B', 1, 2, 4, u'D', 1, 2) 2016-07-20 10:48:49,808 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 2, u'tennis', 1, 4, u'D', 1, 2, 1, u'A', 1, 1) 2016-07-20 10:48:49,808 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 2, u'tennis', 1, 4, u'D', 1, 2, 3, u'B', 1, 2) 2016-07-20 10:48:49,809 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 2, u'tennis', 1, 4, u'D', 1, 2, 2, u'C', 1, 1) 2016-07-20 10:48:49,809 DEBUG sqlalchemy.engine.base.Engine Row (1, u'Foo School', 2, u'tennis', 1, 4, u'D', 1, 2, 4, u'D', 1, 2)
the SQL is correct in each case given the order of joinedload() and in the second example, we can see that "sport.students" is loaded as instructed.
-
reporter Ok I think that I understand what you're saying. Indeed the ORM should send an extra query in the first case because nothing guarantees that every student in sport.students belongs to school.students. But what if i know that ? Is there a way to tell the ORM that he should use joinedload('students').joinedload('sport') to populate sport.students ?
-
repo owner because nothing guarantees that every student in sport.students belongs to school.students
that's not why sport.students is not populated. It's not populated because there is no path "*.sports.students" indicated in the joinedload instructions. in the first case, you'd have to have said, joinedload("sports").joinedload("students"). collections don't get populated as an indirect effect of objects within those collection being loaded elsewhere; this will work for a simple many-to-one via the identity map but not for one-to-many etc.
-
reporter Ok, yes i understand what you're saying. But is there a way to still populate sport.students based on joinedload('students').joinedload('sport') ? Something like contains_eager but that works with joinedload instead of join ?
-
reporter Can you look at my first etst case with the association table ? I know that the right way to be sure that everything is eager_loaded is to do
school = session.query(School).options(joinedload('sports').joinedload('students'), joinedload('students').joinedload('sports')).one()
but that is inefficient because I'm joining twice. I would like to do something like
school = session.query(School).options(joinedload('sports').joinedload('students'), joinedload('students').tell_orm_that_sports_have_already_been_loaded()).one()
-
repo owner as Student objects are loaded, you could peek inside of them yourself, look at the sport_id attribute, and line them up yourself, sure. SQLAlchemy will never do this directly because it's reinventing the SELECT that the database already does very well. For an example of how to populate collections directly see https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/DisjointEagerLoading. Here, you wouldn't even bother with a second query if you're sure that your original query has every object you want.
from itertools import groupby students_by_sport = dict( (k, list(v)) for k, v in groupby( sorted(school.students, key=lambda student: student.sport_id), lambda student: student.sport_id ) ) # piece together ! from sqlalchemy.orm import attributes for sport in school.sports: attributes.set_committed_value( sport, "students", students_by_sport.get(sport.id, ()))
-
repo owner if you're absolutely sure you have all the records needed in your results, if you use join() explicitly then these are easily targeted by contains_eager(). a joinedload() join is anonymous so that won't work w/ contains_eager().
In the first example, this works, assuming the query itself returns all the student/sport combinations:
from sqlalchemy.orm import contains_eager school = session.query(School).\ join(School.sports).join(Sport.students).\ options( contains_eager(School.sports). contains_eager(Sport.students). contains_eager(Student.sports), contains_eager(School.students) ).one() print "----------------" print school.sports[0].students print school.students[0].sports print school.sports[0].students[0].sports
-
repo owner - changed status to closed
we're well into question mark territory so closing this.
-
reporter I thought of doing an explicit join, but in my real query I have a bunch of subquery_loads as well. And the join makes those subqueries less efficient. The first solution is great though. Anyway, thank you. You've been very helpful as always.
- Log in to comment