Subquery loading off of an "already seen" row
Given the following relations:
Company
has manyemployees
and aparent
companyEmployee
has anemployer
And the following instances:
- Apple w/ employees Tim Cook & Jony Ive
- Beats w/ employees Dr. Dre & Jimmy Iovine
The following query does not load everything:
employees = session.query(Employee).options(
joinedload('employer').joinedload('parent').subqueryload('employees')
).filter().all()
It appears that it finds all employees and parents properly, but since all the companies get loaded into the session via the joined load, it seems to stop processing at this point and doesn't actually perform the subqueryload of employees
. This example may seem contrived since it's circling back around through the relationships, but it's a simplified version of something that w/ filters applied is a very real query in our system.
Below is some code to run to show the problem. Just change the connection string to be able to connect to your database, the code will populate a few records for you.
import logging
from contextlib import contextmanager
from sqlalchemy import (
Column,
ForeignKey,
Integer,
MetaData,
String,
Table,
create_engine,
inspect,
)
from sqlalchemy.orm import (
mapper,
relationship,
sessionmaker,
subqueryload,
joinedload,
)
class Model(object):
def __init__(self, **kwargs):
for key, value in kwargs.items():
setattr(self, key, value)
class Company(Model): pass
class Employee(Model): pass
engine = create_engine(
'mysql+pymysql://root:password@localhost/test?charset=utf8mb4',
convert_unicode=True,
)
metadata = MetaData(bind=engine)
Session = sessionmaker(bind=engine)
company_table = Table(
'companies', metadata,
Column('id', Integer, primary_key=True, autoincrement=True),
Column('name', String(255)),
Column('parent_id', Integer, ForeignKey('companies.id')),
)
employee_table = Table(
'employees', metadata,
Column('id', Integer, primary_key=True, autoincrement=True),
Column('name', String(255)),
Column('employer_id', Integer, ForeignKey('companies.id')),
)
company_mapper = mapper(Company, company_table, properties={
'parent': relationship(
Company,
uselist=False,
remote_side=[company_table.columns.id],
),
'employees': relationship(
Employee,
uselist=True,
back_populates='employer',
),
})
employee_mapper = mapper(Employee, employee_table, properties={
'employer': relationship(
Company,
uselist=False,
back_populates='employees',
)
})
connection = engine.connect()
metadata.create_all(connection)
connection.close()
@contextmanager
def session_manager():
session = Session()
try: yield session; session.commit()
except: session.rollback(); raise
finally: session.close()
def _ensure_loaded(obj, key_path):
components = key_path.split('.')
for i, key in enumerate(components):
if key in inspect(obj).unloaded:
raise AttributeError('Attribute {} not loaded during query'.format(
'.'.join(components[:i+1])
))
obj = getattr(obj, key)
if obj is None:
break
models = [
Company(id=1, name='Apple'),
Company(id=2, name='Beats', parent_id=1),
Employee(id=1, name='Tim Cook', employer_id=1),
Employee(id=2, name='Jony Ive', employer_id=1),
Employee(id=3, name='Dr. Dre', employer_id=2),
Employee(id=4, name='Jimmy Iovine', employer_id=2),
]
for model in models:
with session_manager() as session:
try:
session.add(model)
session.commit()
except Exception as e: session.close()
logging.basicConfig()
logging.getLogger('sqlalchemy.engine').setLevel(logging.INFO)
with session_manager() as session:
employees = session.query(Employee).options(
joinedload('employer').joinedload('parent').subqueryload('employees')
).filter(Employee.id == 3).all()
for employee in employees:
_ensure_loaded(employee, 'employer.parent.employees')
# this block of code fails presumably because the parent employer is loaded into the session during
# the first joinedload of the employer which doesn't trigger the subqueryload for parent.employees
# for the company that actually does have a parent company.
with session_manager() as session:
employees = session.query(Employee).options(
joinedload('employer').joinedload('parent').subqueryload('employees')
).filter().all()
for employee in employees:
_ensure_loaded(employee, 'employer.parent.employees')
Comments (10)
-
repo owner -
repo owner it's not totally clear this should be changed (though probably), but in any case your analysis is correct, so it would follow that to get the load you want you'd need to do this:
with session_manager() as session: employees = session.query(Employee).options( joinedload('employer').joinedload('parent').subqueryload('employees'), defaultload('employer').subqueryload('employees') ).order_by(Employee.id).all()
otherwise, it's ambiguous when we come across a "Company" object, if it is to be loaded from two different contexts in the same result set, only one wins out. If one loader path set up joinedload for employer.employees and another subqueryload, it depends on the ORDER BY how that would work out.
-
repo owner - changed title to Subquery loading off of an "already seen" row
-
repo owner -
Account Deleted reporter - edited description
-
Account Deleted reporter @zzzeek yes, sorry for the mixup with the
employer_id
, I've fixed the example for clarity.And your suggestion for the load is actually how I've worked around the problem, though repeated the
joinedload
, so this, which is effectively the same:with session_manager() as session: employees = session.query(Employee).options( joinedload('employer').joinedload('parent').subqueryload('employees'), joinedload('employer').subqueryload('employees') ).order_by(Employee.id).all()
The issue with doing this is that you're loading data unnecessarily just to ensure the
subqueryload
is satisfied.The sequence
joinedload('employer').joinedload('parent').subqueryload('employees')
says to me, load all employers and and their parent company, and I need the employees of all parent companies. In our example, that's the employees of Apple when looking up employees Dr. Dre or Jimmy Iovine (since their employer has a parent company of Apple).Adding the sequence
joinedload('employer').subqueryload('employees')
says to load the employer's employees as well. That means loading all of Beats employees even though that relation may never be used.Again, I know this example feels a little contrived, but with our schema & the models/relations that we have set up, it's very real.
--
You mentioned
ORDER BY
as something that affects this query, but I'm not certain that seems relevant here (though in our code base, we have applied that properly). Thejoinedload
will always load the employer and the parent as part of the initial query, so that will always load those values into the result set first. So it's just a matter of how & what data's being returned in the result set whether that value will "seen" before reaching thesubqueryload
. But now re-reading your comment, I think you may have been suggesting something different. I'm going to leave this little comment in case it furthers the conversation in any way, though. -
Account Deleted reporter Also, thanks for the quick turnaround getting back to me!
-
repo owner - changed milestone to 1.2
-
repo owner this was reported separately in
#3854and a fix is moving through today. -
repo owner - changed status to duplicate
Duplicate of
#3854. - Log in to comment
I think you meant to map "company_id" and not "employer_id" (or vice versa), as it stands the program has None for all .employer attributes because the setup refers to a non-mapped name "company_id".
otherwise this is kind of a similar thing to
#3431, though more generally since it occurs for collections also.