- edited description
allow for func.xyz() to be treated as a "binary" in terms of ORM relationships
I'm trying to follow the documentation as described here: http://docs.sqlalchemy.org/en/latest/orm/join_conditions.html#non-relational-comparisons-materialized-path
This works as expected, however when I try to use (any) func
function this fails. I noticed this feature is experimental, but is it possible to fix it for this use case?
Error:
ArgumentError: Relationship Venue.parents could not determine any unambiguous local/remote column pairs based on join condition and remote_side arguments. Consider using the remote() annotation to accurately mark those elements of the join condition that are on the remote side of the relationship.
The issue is also documented here: http://stackoverflow.com/questions/38006116/how-can-i-create-a-many-to-many-relationship-with-sqlalchemy-using-a-sql-functio
Minimal Test case:
import unittest
from sqlalchemy import (Column, Integer, String, func)
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import (relationship, joinedload)
from flask import Flask
from flask.ext.sqlalchemy import SQLAlchemy
from sqlalchemy.orm import foreign
from sqlalchemy.orm import remote
app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = ("sqlite://")
db = SQLAlchemy(app)
Base = declarative_base()
class Venue(db.Model):
__tablename__ = 'venue'
id = Column(Integer, primary_key=True, nullable=False)
name = Column(String(254))
parents = relationship(
"Venue",
# doesn't work
primaryjoin=func.substring(remote(foreign(name)), name),
# works
# primaryjoin=remote(foreign(name)).like('%' + name + '%'),
viewonly=True,
order_by=remote(foreign(name))
)
db.drop_all()
db.create_all()
class TestSelfJoin(unittest.TestCase):
def test_self_join(self):
query = Venue.query.options(joinedload(Venue.parents))
import sqlalchemy.dialects.postgresql as postgresql
print query.statement.compile(dialect=postgresql.dialect())
Official response
Comments (31)
-
reporter -
reporter - edited description
-
repo owner you can use a function in a primaryjoin, but the primaryjoin needs to have a comparison operator between the two columns to be compared to each other (or the same col in the self-referential case). The example on the SO issue at least uses a comparison function, but within SQLAlchemy this is still not of the form "operand operator operand". Currently the primaryjoin has to have an expression including that basic geometry. The enhancement here would require that either the func.foo() can be converted into a "binary" expression or it otherwise gives some clue that it has a "left" and "right" side which would then be consumable by the relationship mechanics.
This can be made to work right now using a custom construct that provides for "binary" as the visitable name but still renders the SQL function at the compile level using @compiles. This would be hacky though.
I can think of a few ways this would be made to work internally, one is to replace all the visit_binary() throughout the ORM with some more generalized "visit_left_and_right_comparison" type of thing that would be also added to the sql package, that would be a very big deal. Another might be to create some weird operator that embeds into BinaryExpression but generates as a function instead - there are some similar behaviors like that right now. In either case, there would need to be some method on a Function that produces this construct, e.g. func.my_whatever(x, y, z).compares(left, right).
-
repo owner - changed milestone to 1.2
- marked as enhancement
- changed component to orm
- changed title to allow for func.xyz() to be treated as a "binary" in terms of ORM relationships
1.2 is only tentative milestone, lots of issues will need to be moved out into later releases at some point.
-
repo owner here's a proof of concept. If you can run this through the paces that might make this easier to turn into a patch.
from sqlalchemy import * from sqlalchemy.orm import * from sqlalchemy.ext.declarative import declarative_base from sqlalchemy.sql import elements from sqlalchemy.ext.compiler import compiles from sqlalchemy.sql import operators class FunctionAsBinary(elements.BinaryExpression): def __init__(self, fn, left, right): super(FunctionAsBinary, self).__init__( left, right, operators.custom_op("fake", is_comparison=True)) self.fn = fn self.sql_function = fn(left, right) def _copy_internals(self, **kw): super(FunctionAsBinary, self)._copy_internals(**kw) self.sql_function = self.fn(self.left, self.right) @compiles(FunctionAsBinary) def _func_as_binary(elem, compiler, **kw): return compiler.process(elem.sql_function, **kw) Base = declarative_base() class Venue(Base): __tablename__ = 'venue' id = Column(Integer, primary_key=True, nullable=False) name = Column(String(254)) parents = relationship( "Venue", primaryjoin=FunctionAsBinary( func.substring, remote(foreign(name)), name ), viewonly=True, order_by=name ) s = Session() query = s.query(Venue).options(joinedload(Venue.parents)) print query
-
reporter Thank you very much for your quick reply @zzzeek!
Real world use case: We have
Region
objects with columnarea
(PostGIS polygon geometry) and we want to obtain allRegion
objects in relationship with containingarea
usingST_Contains
.I'm not sure why
remote(foreign(name)).like('%' + name + '%')
works butremote(foreign(area)).ST_Contains(area)
doesn't. According to what you wrote they should be treated similar? Or is there a specific exception for thelike
operator? I noticed thatilike
also doesn't work.I've currently implemented this for our use case using
column_property
, nested selects and json serialization provided by PostgreSQL itself. However this isn't great since we rely on relationships for generating queries automatically against object hierarchies and for dynamic json serialization. Hence it would be desirable to get this to work as a relationship.Which direction do you think would be most promising to investigate for getting this to work (hacky is fine for now)?
-
reporter And you were quicker than I was here. I'll take a look at the code you posted now!
-
repo owner POC
#2start with allowing custom_op to work with
@compiles
, which I'd like to do anyway:diff --git a/lib/sqlalchemy/sql/compiler.py b/lib/sqlalchemy/sql/compiler.py index b8c897c..a9ac704 100644 --- a/lib/sqlalchemy/sql/compiler.py +++ b/lib/sqlalchemy/sql/compiler.py @@ -957,6 +957,12 @@ class SQLCompiler(Compiled): return text def _get_operator_dispatch(self, operator_, qualifier1, qualifier2): + if hasattr(operator_, '_compiler_dispatch'): + def disp(binary, operator_, **kw): + return operator_._compiler_dispatch( + self, binary=binary, **kw) + return disp + attrname = "visit_%s_%s%s" % ( operator_.__name__, qualifier1, "_" + qualifier2 if qualifier2 else "")
then do it all inside of an operator:
class FunctionOp(operators.custom_op): def __init__(self, sql_function): super(FunctionOp, self).__init__(None, is_comparison=True) self.sql_function = sql_function @compiles(FunctionOp) def _func_as_binary(elem, compiler, binary, **kw): return compiler.process(elem.sql_function(binary.left, binary.right), **kw) Base = declarative_base() class Venue(Base): __tablename__ = 'venue' id = Column(Integer, primary_key=True, nullable=False) name = Column(String(254)) parents = relationship( "Venue", primaryjoin=FunctionOp(func.substring)(remote(foreign(name)), name), viewonly=True, order_by=name )
-
reporter This works. However there is an issue when the column is not actually returned from the object. Let me get a short example together here.
-
reporter In your proof of concept replace the line
query = s.query(Venue).options(joinedload(Venue.parents))
with
query = s.query(Venue).options(load_only("id")).options(joinedload(Venue.parents)).from_self()
this then results in the sql
SELECT anon_1.venue_id AS anon_1_venue_id, venue.name AS venue_name, venue_1.id AS venue_1_id, venue_1.name AS venue_1_name FROM venue, (SELECT venue.id AS venue_id FROM venue) AS anon_1 LEFT OUTER JOIN venue AS venue_1 ON substring(venue_1.name, venue.name) ORDER BY venue_1.name
which is incorrect since a different
venue
table is introduced. -
reporter Alternative solution for reference only:
For reference I'm posting a solution to this using column_property. For me this wasn't a great solution, but it might help others. The downside is that returned columns can not be defined dynamically:
@event.listens_for(mapper, "mapper_configured") def mapper_listener(mapper, cls): # reference: https://groups.google.com/forum/#!topic/sqlalchemy/BYEj42-vEO8 if not issubclass(cls, Market): return other = aliased(cls) order_by = func.ST_Area(other.area) parents = select([ other.id.label("id"), other.name.label("name") ]).where(and_( # only include areas that contain this area func.ST_Contains(other.area, cls.area), # do not select the market itself cls.id != other.id )).correlate_except(other).alias("parents") cls.parents = column_property( select([ func.array_agg( func.row_to_json(text("parents")), order_by=order_by ) ]).select_from( # Sub-select is needed to display the column names correctly parents ), deferred=True )
-
repo owner @lukas-gitl try the second recipe I have with the custom operator. that is much more localized to just the operator and your second query seems to work. For the first example, there's some more methods that would need to be on FunctionAsBinary that would probably allow it to be interpreted correctly in that case.
As I'm looking at the internals, for this feature to be worth it, it would have to not assume the SQL function is a simple fn(a, b). A SQL function might take any number of arguments but still be a comparison between any two arbitrary elements within its argument list. This sort of makes a real "feature" point more towards the custom BinaryExpression / FunctionElement subclass all over again. it would ideally look like:
parents = relationship( "Venue", primaryjoin=func.substring( remote(foreign(name)), name).compares_between(0, 1), viewonly=True, order_by=name )
the returned object would need to trip "visit_binary" within the context of relationship setup and "visit_function" within the context of SQL compilation. A new architectural feature to Visitable might be most appropriate here (new optional attribute compiler_visit_name most likely).
-
reporter We are still running SQLAlchemy 1.0.X, so the patch will probably look slightly different. Looking into how much work it would be to upgrade to 1.1.X (probably a lot). Would a similar patch be possible for 1.0.X?
-
repo owner @lukas-gitl this is a new feature so targeted @ 1.2 at the earliest.
-
reporter I was referring to the fact that
def _get_operator_dispatch(self, operator_, qualifier1, qualifier2):
is not present in Version 1.0.X. So I have no idea what I would have to change to get your second recipe working. -
repo owner I'm trying the completely standalone recipe and not seeing the "anon" query you're referring towards. I can get that query if I add say a limit(1), but it doesn't have the extra "venue" token in it.
-
reporter Weird that you don't get anon. I get the same query for both SQLAlchemy versions 1.0.12 and 1.1.2.
Here is the full example:
Query 1:
SELECT anon_1.venue_id AS anon_1_venue_id, venue.name AS venue_name, venue_1.id AS venue_1_id, venue_1.name AS venue_1_name FROM -- using venue in the on clause will return all existing venues in the relationship venue, (SELECT venue.id AS venue_id FROM venue) AS anon_1 LEFT OUTER JOIN venue AS venue_1 ON substring(venue_1.name, venue.name) ORDER BY venue_1.name
Query 2:
SELECT anon_1.venue_id AS anon_1_venue_id, anon_1.venue_name AS anon_1_venue_name, venue_1.id AS venue_1_id, venue_1.name AS venue_1_name FROM (SELECT venue.id AS venue_id, venue.name AS venue_name FROM venue) AS anon_1 LEFT OUTER JOIN venue AS venue_1 ON substring(venue_1.name, anon_1.venue_name) ORDER BY venue_1.name
Python Code:
from sqlalchemy import * from sqlalchemy.orm import * from sqlalchemy.ext.declarative import declarative_base from sqlalchemy.sql import elements from sqlalchemy.ext.compiler import compiles from sqlalchemy.sql import operators class FunctionAsBinary(elements.BinaryExpression): def __init__(self, fn, left, right): super(FunctionAsBinary, self).__init__( left, right, operators.custom_op("fake", is_comparison=True)) self.fn = fn self.sql_function = fn(left, right) def _copy_internals(self, **kw): super(FunctionAsBinary, self)._copy_internals(**kw) self.sql_function = self.fn(self.left, self.right) @compiles(FunctionAsBinary) def _func_as_binary(elem, compiler, **kw): return compiler.process(elem.sql_function, **kw) Base = declarative_base() class Venue(Base): __tablename__ = 'venue' id = Column(Integer, primary_key=True, nullable=False) name = Column(String(254)) parents = relationship( "Venue", primaryjoin=FunctionAsBinary( func.substring, remote(foreign(name)), name ), viewonly=True, order_by=name ) s = Session() print "Query 1:" query = s.query(Venue).options(load_only("id")).options(joinedload(Venue.parents)).from_self() print query print "Query 2:" query = s.query(Venue).options(joinedload(Venue.parents)).from_self() print query
-
repo owner oh, "from_self()", that's important :)
-
repo owner That condition is not related here, I get that same mis-SQL not using any special features at all:
class Venue(Base): __tablename__ = 'venue' id = Column(Integer, primary_key=True, nullable=False) parent_id = Column(ForeignKey('venue.id')) parents = relationship( "Venue", remote_side=parent_id, ) s = Session() query = s.query(Venue).options(load_only("id")).options(joinedload(Venue.parents)).from_self() print query
and there's necessarily a "bug" there, if you want to do joinedload you can't limit the inner query to not include the columns you need:
s = Session() query = s.query(Venue).options(load_only("id", "parent_id")).options(joinedload(Venue.parents)).from_self() print query
In theory the load_only() would be "smart" like the joinedload() in some way but this is all totally outside of the func feature here.
-
reporter Ah, that makes so much sense now. I already ran into this before, but didn't connect the dots. The way we handle it currently is by force returning primary and foreign keys on the inner select. I obviously have to extend it to include any columns used in queried relationships.
Another positive side effect to this ticket is that I just finished the migration to 1.1.2. And nothing big broke. Very impressed!
Thank you again so much @zzzeek! Always extremely happy with your responses here!
-
reporter SQLAlchemy has some smarts here already:
The first query adds "other_id" in the inner select event though only the id is required. I'm assuming this is b/c it's marked as foreign key? Venue is added as a select table but then never used.
Not critical, but I'd love some insight what is happening here and if that could be generified. Or is it as simple as "foreign keys are always returned"?
SELECT anon_1.venue_id AS anon_1_venue_id, anon_1.venue_other_id AS anon_1_venue_other_id, venue.parent_id AS venue_parent_id, venue_1.id AS venue_1_id, venue_1.other_id AS venue_1_other_id FROM venue, (SELECT venue.id AS venue_id, venue.other_id AS venue_other_id FROM venue) AS anon_1 LEFT OUTER JOIN venue AS venue_1 ON anon_1.venue_other_id = venue_1.parent_id ----------- SELECT anon_1.venue_id AS anon_1_venue_id, anon_1.venue_other_id AS anon_1_venue_other_id, anon_1.venue_parent_id AS anon_1_venue_parent_id, venue_1.id AS venue_1_id, venue_1.other_id AS venue_1_other_id, venue_1.parent_id AS venue_1_parent_id FROM (SELECT venue.id AS venue_id, venue.other_id AS venue_other_id, venue.parent_id AS venue_parent_id FROM venue) AS anon_1 LEFT OUTER JOIN venue AS venue_1 ON anon_1.venue_other_id = venue_1.parent_id
Python Code
class Venue(Base): __tablename__ = 'venue' id = Column(Integer, primary_key=True, nullable=False) other_id = Column(Integer, primary_key=True, nullable=False) parent_id = Column(ForeignKey('venue.other_id')) parents = relationship("Venue", remote_side=parent_id) s = Session() query = s.query(Venue).options(load_only("id")).options(joinedload(Venue.parents).load_only("id")).from_self() print query print "-----------" query = s.query(Venue).options(joinedload(Venue.parents)).from_self() print query
-
reporter Could this be extended to force loading of
property.local_remote_pairs
for queried relationships? -
reporter I'm doing this now to force fetching of columns used in relationships and it works perfectly (so far):
@classmethod def expand_relationship_lookups(cls, fields): """ Expand references so that fields are fetched that are used for relationship lookups (e.g. http://tiny.cc/ccg6fy)""" result = [] for field in fields: if "." in field: class_ = cls chain = [] path = field.split(".") for attr in path[:-1]: prop = getattr(class_, attr).property for p in prop.local_remote_pairs: # force fetching of "both sides" if p[0].table == class_.__table__: result.append(".".join(chain + [p[0].name])) if p[1].table == prop.mapper.class_.__table__: result.append(".".join(chain + [attr, p[1].name])) chain.append(attr) class_ = prop.mapper.class_ result.append(field) return result
-
reporter I'm trying to use your second recipe now, and it works ok. However I can no longer filter by false to create an empty query. Any idea what is going on?
Error:
KeyError: <function istrue at 0x7febb8d4e2a8>
Code:
from sqlalchemy import * from sqlalchemy.orm import * from sqlalchemy.ext.declarative import declarative_base from sqlalchemy.sql.compiler import SQLCompiler from sqlalchemy.ext.compiler import compiles from sqlalchemy.sql import operators, false # Patch compiler to allow for binary operators: http://tiny.cc/5ff6fy def patched_get_operator_dispatch(self, operator_, qualifier1, qualifier2): if hasattr(operator_, '_compiler_dispatch'): def disp(binary, operator_, **kw): return operator_._compiler_dispatch(self, binary=binary, **kw) return disp original_get_operator_dispatch(self, operator_, qualifier1, qualifier2) # noinspection PyProtectedMember original_get_operator_dispatch = SQLCompiler._get_operator_dispatch SQLCompiler._get_operator_dispatch = patched_get_operator_dispatch class FunctionOp(operators.custom_op): def __init__(self, sql_function): super(FunctionOp, self).__init__(None, is_comparison=True) self.sql_function = sql_function @compiles(FunctionOp) def _func_as_binary(elem, compiler, binary, **kw): return compiler.process(elem.sql_function(binary.left, binary.right), **kw) Base = declarative_base() class Venue(Base): __tablename__ = 'venue' id = Column(Integer, primary_key=True, nullable=False) name = Column(String(254)) parents = relationship( "Venue", primaryjoin=FunctionOp(func.substring)(remote(foreign(name)), name), viewonly=True, order_by=name ) s = Session() query = s.query(Venue).options(joinedload(Venue.parents)) print query print s.query(Venue).filter(false())
-
repo owner local_remote_pairs - Just add a column_property() with what you need?
2nd recipe - 1st recipe not working ? as Ive thought about this more the 1st one is closer to what I think I might do here. as far as how your monkeypatch might be failing the patch I made probably is missing something, running the test suite against my patch directly might show something.
-
reporter What I mean is this: Currently SQLAlchemy is "smart" enough to always load foreign keys and primary keys. I was wondering if we could make it even smarter to always load fields used in loaded relationships (the required information is available in local_remote_pairs).
Ah, from your response it sounded like the 2nd recipe was the desired way of doing this. The first recipe seems to be working fine now! Thank you again!
-
repo owner only the values in "local" for "local / remote pairs" would make any sense to be loaded locally and for the vast majority of relationships, those are just columns that are already loaded. As I said, for extremely rare / special circumstances where the "local" side is within some kind of function, use a column_property().
-
Account Deleted sqlalchemy.orm import * from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.sql.compiler import SQLCompiler from sqlalchemy.ext.compiler import compiles from sqlalchemy.sql import operators, false
Patch compiler to allow for binary operators: http://tiny.cc/5ff6fy
def patched_get_operator_dispatch(self, operator_, qualifier1, qualifier2): if hasattr(operator_, 'compiler_dispatch'): def disp(binary, operator, kw): return operator_._compiler_dispatch(self, binary=binary, kw) return disp original_get_operator_dispatch(self, operator_, qualifier1, qualifier2)
noinspection PyProtectedMember
original_get_operator_dispatch = SQLCompiler._get_operator_dispatch SQLCompiler._get_operator_dispatch = patched_get_operator_dispatch
class FunctionOp(operators.custom_op): def init(self, sql_function): super(FunctionOp, self).init(None, is_comparison=True) self.sql_function = sql_function
-
reporter I see, that makes sense.
IMPORTANT: For anyone reading this at a later point, consider using patch 1, not patch 2 as there are known issues with patch 2.
-
repo owner - changed milestone to 1.3
I'm assuming you have thing working w/ the recipes for now, I'm trying to expedite 1.2 along and this looks like a complicated feature add (but manageable as a recipe).
-
reporter Sounds good! We are using this successfully in production since my last comment.
- Log in to comment
here's a proof of concept. If you can run this through the paces that might make this easier to turn into a patch.