- edited description
1.0.0b4: Eager-loading options result in wrong query
Hi,
The attached example runs fine with 0.9.7, but generates a wrong sub-select with 1.0.0b4:
(SELECT dns_record_1.id AS dns_record_1_id, branch_1.id AS branch_1_id, branch_1.name AS branch_1_name
FROM dns_record AS dns_record_1 JOIN branch AS branch_1 ON branch_1.id = host.branch_id, host)
Comments (14)
-
Account Deleted reporter -
repo owner this is part of
#3008. Behavior can be disabled:class Host(Base): __tablename__ = 'host' id = Column(Integer, ForeignKey(Machine.id), primary_key=True) branch_id = Column(Integer, ForeignKey(Branch.id), nullable=False) machine = relation(Machine, innerjoin=True) branch = relation(Branch, innerjoin="unnested")
-
repo owner -
repo owner So I have a way to make this "work" for now, though it requires that it degrades into "unnested" behavior in this kind of case.
We've built up an eager join from service_server to host, machine, and dns_record:
service_server LEFT OUTER JOIN ( host AS host_1 JOIN machine AS machine_1 ON machine_1.id = host_1.id ) ON host_1.id = service_server.host_id LEFT OUTER JOIN dns_record AS dns_record_1 ON dns_record_1.id = machine_1.primary_name_id
now we need to also join out to "branch". I guess the ideal with "nested" would be that we do this:
service_server LEFT OUTER JOIN ( host AS host_1 JOIN machine AS machine_1 ON machine_1.id = host_1.id JOIN branch ON branch_1.id = host_1.branch_id ) ON host_1.id = service_server.host_id LEFT OUTER JOIN dns_record AS dns_record_1 ON dns_record_1.id = machine_1.primary_name_id
unfortunately I don't know that digging into the clause like that is very simple, I'm not sure that the joined loading strategy is tracking enough information that we can poke around and figure out exactly where the join should hang off from without error-prone heuristics. So the easier way for now is just to break out of "nested innerjoin" more quickly, by adding a more pessimistic check, and we instead get this:
service_server LEFT OUTER JOIN ( host AS host_1 JOIN machine AS machine_1 ON machine_1.id = host_1.id ) ON host_1.id = service_server.host_id LEFT OUTER JOIN dns_record AS dns_record_1 ON dns_record_1.id = machine_1.primary_name_id LEFT OUTER JOIN branch AS branch_1 ON branch_1.id = host_1.branch_id
which at least is still maintaining one of the inner joins, so still fewer outer joins than not using "nested" at all. This issue really affects 0.9 as well, if someone were using all "nested" innerjoins which is only an option there (here it's the default).
the diff is:
diff --git a/lib/sqlalchemy/orm/strategies.py b/lib/sqlalchemy/orm/strategies.py index 0b2672d..c5158ca 100644 --- a/lib/sqlalchemy/orm/strategies.py +++ b/lib/sqlalchemy/orm/strategies.py @@ -1333,7 +1333,8 @@ class JoinedLoader(AbstractRelationshipLoader): assert clauses.aliased_class is not None join_to_outer = innerjoin and isinstance(towrap, sql.Join) and \ - towrap.isouter + towrap.isouter and \ + towrap.right.is_derived_from(self.parent.mapped_table) if chained_from_outerjoin and \ join_to_outer and innerjoin != 'unnested':
probably needs a note in strategy_options -> def joinedload() that the "nested" feature will pessimistically degrade in the case of multiple paths from the same start point.
i may try to still make it figure out how to find the best spot within the inner join.
-
Account Deleted reporter Is there a way to control the behavior of innerjoin on a per-backend basis? SQLite is used mostly for testing purposes, so I don't really care about efficiency there; however, it would be nice to get all the benefits on more capable backends.
-
repo owner this isn't local to the sqlite join rewriting feature. the nesting is wrong on all backends, it just looks much uglier on sqlite.
-
repo owner - attached 3347.patch
well, this is a pretty impressive patch that generalizes the joined loader "splicing" a lot more, and could also have individual tests. However, I have to decide if I want to be conservative with the "beta4" thing and just roll back
#3008entirely for 1.0 and push it to 1.1, or if I can spend some time with this and feel confident it's good for 1.0. -
repo owner your original script using SQLite with the patch will give this query for the subqueryload:
SELECT service_server.id AS service_server_id, service_server.service_id AS service_server_service_id, service_server.host_id AS service_server_host_id, anon_1.service_id AS anon_1_service_id, dns_record_1.id AS dns_record_1_id, anon_2.machine_1_id AS machine_1_id, anon_2.machine_1_primary_name_id AS machine_1_primary_name_id, anon_2.branch_1_id AS branch_1_id, anon_2.branch_1_name AS branch_1_name, anon_2.host_1_id AS host_1_id, anon_2.host_1_branch_id AS host_1_branch_id FROM (SELECT service.id AS service_id FROM service) AS anon_1 JOIN service_server ON anon_1.service_id = service_server.service_id LEFT OUTER JOIN ( SELECT host_1.id AS host_1_id, host_1.branch_id AS host_1_branch_id, machine_1.id AS machine_1_id, machine_1.primary_name_id AS machine_1_primary_name_id, branch_1.id AS branch_1_id, branch_1.name AS branch_1_name FROM (host AS host_1 JOIN machine AS machine_1 ON machine_1.id = host_1.id) JOIN branch AS branch_1 ON branch_1.id = host_1.branch_id ) AS anon_2 ON anon_2.host_1_id = service_server.host_id LEFT OUTER JOIN dns_record AS dns_record_1 ON dns_record_1.id = anon_2.machine_1_primary_name_id ORDER BY anon_1.service_id
all your "innerjoins" are honored even with the sqlite rewriting, I think that's pretty cool.
-
repo owner - changed status to resolved
- Fixed a bug related to "nested" inner join eager loading, which
exists in 0.9 as well but is more of a regression in 1.0 due to
3008
which turns on "nested" by default, such that a joined eager load that travels across sibling paths from a common ancestor using innerjoin=True will correctly splice each "innerjoin" sibling into the appropriate part of the join, when a series of inner/outer joins are mixed together. fixes#3347
→ <<cset 1a56177c21cd>>
-
Account Deleted reporter Running our testsuite, I get a different failure with this change applied:
Traceback: <type 'exceptions.AssertionError'>: assertion failed attempting to produce joined eager loads sqlalchemy/orm/query.py:2482:one sqlalchemy/orm/loading.py:84:instances sqlalchemy/util/compat.py:199:raise_from_cause sqlalchemy/orm/loading.py:69:instances sqlalchemy/orm/loading.py:595:polymorphic_instance sqlalchemy/orm/loading.py:426:_instance sqlalchemy/orm/loading.py:495:_populate_full sqlalchemy/orm/strategies.py:1037:load_collection_from_subq sqlalchemy/orm/strategies.py:974:get sqlalchemy/orm/strategies.py:982:_load sqlalchemy/orm/query.py:2521:__iter__ sqlalchemy/orm/query.py:2965:_compile_context sqlalchemy/orm/strategies.py:1352:_create_eager_join sqlalchemy/orm/strategies.py:1413:_splice_nested_inner_join
I didn't have time to look into what kind of queries are throwing these exceptions yet.
-
Account Deleted reporter - attached test2.py
The assertion failure seems to be caused by the use of a non-primary mapper. Test case attached - it may be simplified more, but this version is reasonably close to the "real thing" we're using.
-
repo owner - changed status to open
that's actually kind of great you have a test that hits that assertion! out of however many hundreds of tests on my end none of them do.
-
repo owner - changed status to resolved
- further fixes for
#3347; track the mappers we're joining between fully and match on those, rather than trying to compare selectables; fixes#3347
→ <<cset dedc85ba9d8e>>
-
repo owner OK! try again! thanks
- Log in to comment