allow for func.xyz() to be treated as a "binary" in terms of ORM relationships

Issue #3831 new
Lukas Siemon created an issue

I'm trying to follow the documentation as described here: http://docs.sqlalchemy.org/en/latest/orm/join_conditions.html#non-relational-comparisons-materialized-path

This works as expected, however when I try to use (any) func function this fails. I noticed this feature is experimental, but is it possible to fix it for this use case?

Error:

ArgumentError: Relationship Venue.parents could not determine any unambiguous local/remote column pairs based on join condition and remote_side arguments. Consider using the remote() annotation to accurately mark those elements of the join condition that are on the remote side of the relationship.

The issue is also documented here: http://stackoverflow.com/questions/38006116/how-can-i-create-a-many-to-many-relationship-with-sqlalchemy-using-a-sql-functio

Minimal Test case:

import unittest
from sqlalchemy import (Column, Integer, String, func)
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import (relationship, joinedload)
from flask import Flask
from flask.ext.sqlalchemy import SQLAlchemy
from sqlalchemy.orm import foreign
from sqlalchemy.orm import remote

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = ("sqlite://")
db = SQLAlchemy(app)

Base = declarative_base()


class Venue(db.Model):
    __tablename__ = 'venue'
    id = Column(Integer, primary_key=True, nullable=False)
    name = Column(String(254))

    parents = relationship(
        "Venue",
        # doesn't work
        primaryjoin=func.substring(remote(foreign(name)), name),
        # works
        # primaryjoin=remote(foreign(name)).like('%' + name + '%'),
        viewonly=True,
        order_by=remote(foreign(name))
    )

db.drop_all()
db.create_all()


class TestSelfJoin(unittest.TestCase):

    def test_self_join(self):
        query = Venue.query.options(joinedload(Venue.parents))

        import sqlalchemy.dialects.postgresql as postgresql
        print query.statement.compile(dialect=postgresql.dialect())

Official response

  • Mike Bayer repo owner

    here's a proof of concept. If you can run this through the paces that might make this easier to turn into a patch.

    from sqlalchemy import *
    from sqlalchemy.orm import *
    from sqlalchemy.ext.declarative import declarative_base
    
    from sqlalchemy.sql import elements
    from sqlalchemy.ext.compiler import compiles
    from sqlalchemy.sql import operators
    
    
    class FunctionAsBinary(elements.BinaryExpression):
        def __init__(self, fn, left, right):
            super(FunctionAsBinary, self).__init__(
                left, right, operators.custom_op("fake", is_comparison=True))
            self.fn = fn
            self.sql_function = fn(left, right)
    
        def _copy_internals(self, **kw):
            super(FunctionAsBinary, self)._copy_internals(**kw)
            self.sql_function = self.fn(self.left, self.right)
    
    
    @compiles(FunctionAsBinary)
    def _func_as_binary(elem, compiler, **kw):
        return compiler.process(elem.sql_function, **kw)
    
    Base = declarative_base()
    
    
    class Venue(Base):
        __tablename__ = 'venue'
        id = Column(Integer, primary_key=True, nullable=False)
        name = Column(String(254))
    
        parents = relationship(
            "Venue",
            primaryjoin=FunctionAsBinary(
                func.substring,
                remote(foreign(name)), name
            ),
            viewonly=True,
            order_by=name
        )
    
    s = Session()
    query = s.query(Venue).options(joinedload(Venue.parents))
    print query
    

Comments (31)

  1. Mike Bayer repo owner

    you can use a function in a primaryjoin, but the primaryjoin needs to have a comparison operator between the two columns to be compared to each other (or the same col in the self-referential case). The example on the SO issue at least uses a comparison function, but within SQLAlchemy this is still not of the form "operand operator operand". Currently the primaryjoin has to have an expression including that basic geometry. The enhancement here would require that either the func.foo() can be converted into a "binary" expression or it otherwise gives some clue that it has a "left" and "right" side which would then be consumable by the relationship mechanics.

    This can be made to work right now using a custom construct that provides for "binary" as the visitable name but still renders the SQL function at the compile level using @compiles. This would be hacky though.

    I can think of a few ways this would be made to work internally, one is to replace all the visit_binary() throughout the ORM with some more generalized "visit_left_and_right_comparison" type of thing that would be also added to the sql package, that would be a very big deal. Another might be to create some weird operator that embeds into BinaryExpression but generates as a function instead - there are some similar behaviors like that right now. In either case, there would need to be some method on a Function that produces this construct, e.g. func.my_whatever(x, y, z).compares(left, right).

  2. Mike Bayer repo owner

    here's a proof of concept. If you can run this through the paces that might make this easier to turn into a patch.

    from sqlalchemy import *
    from sqlalchemy.orm import *
    from sqlalchemy.ext.declarative import declarative_base
    
    from sqlalchemy.sql import elements
    from sqlalchemy.ext.compiler import compiles
    from sqlalchemy.sql import operators
    
    
    class FunctionAsBinary(elements.BinaryExpression):
        def __init__(self, fn, left, right):
            super(FunctionAsBinary, self).__init__(
                left, right, operators.custom_op("fake", is_comparison=True))
            self.fn = fn
            self.sql_function = fn(left, right)
    
        def _copy_internals(self, **kw):
            super(FunctionAsBinary, self)._copy_internals(**kw)
            self.sql_function = self.fn(self.left, self.right)
    
    
    @compiles(FunctionAsBinary)
    def _func_as_binary(elem, compiler, **kw):
        return compiler.process(elem.sql_function, **kw)
    
    Base = declarative_base()
    
    
    class Venue(Base):
        __tablename__ = 'venue'
        id = Column(Integer, primary_key=True, nullable=False)
        name = Column(String(254))
    
        parents = relationship(
            "Venue",
            primaryjoin=FunctionAsBinary(
                func.substring,
                remote(foreign(name)), name
            ),
            viewonly=True,
            order_by=name
        )
    
    s = Session()
    query = s.query(Venue).options(joinedload(Venue.parents))
    print query
    
  3. Lukas Siemon reporter

    Thank you very much for your quick reply @zzzeek!

    Real world use case: We have Region objects with column area (PostGIS polygon geometry) and we want to obtain all Region objects in relationship with containing area using ST_Contains.

    I'm not sure why remote(foreign(name)).like('%' + name + '%') works but remote(foreign(area)).ST_Contains(area) doesn't. According to what you wrote they should be treated similar? Or is there a specific exception for the like operator? I noticed that ilike also doesn't work.

    I've currently implemented this for our use case using column_property, nested selects and json serialization provided by PostgreSQL itself. However this isn't great since we rely on relationships for generating queries automatically against object hierarchies and for dynamic json serialization. Hence it would be desirable to get this to work as a relationship.

    Which direction do you think would be most promising to investigate for getting this to work (hacky is fine for now)?

  4. Mike Bayer repo owner

    POC #2

    start with allowing custom_op to work with @compiles, which I'd like to do anyway:

    diff --git a/lib/sqlalchemy/sql/compiler.py b/lib/sqlalchemy/sql/compiler.py
    index b8c897c..a9ac704 100644
    --- a/lib/sqlalchemy/sql/compiler.py
    +++ b/lib/sqlalchemy/sql/compiler.py
    @@ -957,6 +957,12 @@ class SQLCompiler(Compiled):
                 return text
    
         def _get_operator_dispatch(self, operator_, qualifier1, qualifier2):
    +        if hasattr(operator_, '_compiler_dispatch'):
    +            def disp(binary, operator_, **kw):
    +                return operator_._compiler_dispatch(
    +                    self, binary=binary, **kw)
    +            return disp
    +
             attrname = "visit_%s_%s%s" % (
                 operator_.__name__, qualifier1,
                 "_" + qualifier2 if qualifier2 else "")
    

    then do it all inside of an operator:

    class FunctionOp(operators.custom_op):
        def __init__(self, sql_function):
            super(FunctionOp, self).__init__(None, is_comparison=True)
            self.sql_function = sql_function
    
    
    @compiles(FunctionOp)
    def _func_as_binary(elem, compiler, binary, **kw):
        return compiler.process(elem.sql_function(binary.left, binary.right), **kw)
    
    Base = declarative_base()
    
    
    class Venue(Base):
        __tablename__ = 'venue'
        id = Column(Integer, primary_key=True, nullable=False)
        name = Column(String(254))
    
        parents = relationship(
            "Venue",
            primaryjoin=FunctionOp(func.substring)(remote(foreign(name)), name),
            viewonly=True,
            order_by=name
        )
    
  5. Lukas Siemon reporter

    This works. However there is an issue when the column is not actually returned from the object. Let me get a short example together here.

  6. Lukas Siemon reporter

    In your proof of concept replace the line

    query = s.query(Venue).options(joinedload(Venue.parents))

    with

    query = s.query(Venue).options(load_only("id")).options(joinedload(Venue.parents)).from_self()

    this then results in the sql

    SELECT
       anon_1.venue_id AS anon_1_venue_id,
       venue.name AS venue_name,
       venue_1.id AS venue_1_id,
       venue_1.name AS venue_1_name   
    FROM
       venue,
       (SELECT venue.id AS venue_id FROM venue) AS anon_1 
    LEFT OUTER JOIN
       venue AS venue_1 
       ON substring(venue_1.name, venue.name) 
    ORDER BY
       venue_1.name
    

    which is incorrect since a different venue table is introduced.

  7. Lukas Siemon reporter

    Alternative solution for reference only:

    For reference I'm posting a solution to this using column_property. For me this wasn't a great solution, but it might help others. The downside is that returned columns can not be defined dynamically:

    @event.listens_for(mapper, "mapper_configured")
    def mapper_listener(mapper, cls):
        # reference: https://groups.google.com/forum/#!topic/sqlalchemy/BYEj42-vEO8
        if not issubclass(cls, Market):
            return
    
        other = aliased(cls)
        order_by = func.ST_Area(other.area)
    
        parents = select([
            other.id.label("id"),
            other.name.label("name")
        ]).where(and_(
            # only include areas that contain this area
            func.ST_Contains(other.area, cls.area),
            # do not select the market itself
            cls.id != other.id
        )).correlate_except(other).alias("parents")
    
        cls.parents = column_property(
            select([
                func.array_agg(
                    func.row_to_json(text("parents")),
                    order_by=order_by
                )
            ]).select_from(
                # Sub-select is needed to display the column names correctly
                parents
            ),
            deferred=True
        )
    
  8. Mike Bayer repo owner

    @lukas-gitl try the second recipe I have with the custom operator. that is much more localized to just the operator and your second query seems to work. For the first example, there's some more methods that would need to be on FunctionAsBinary that would probably allow it to be interpreted correctly in that case.

    As I'm looking at the internals, for this feature to be worth it, it would have to not assume the SQL function is a simple fn(a, b). A SQL function might take any number of arguments but still be a comparison between any two arbitrary elements within its argument list. This sort of makes a real "feature" point more towards the custom BinaryExpression / FunctionElement subclass all over again. it would ideally look like:

        parents = relationship(
            "Venue",
            primaryjoin=func.substring(
                remote(foreign(name)), name).compares_between(0, 1),
            viewonly=True,
            order_by=name
        )
    

    the returned object would need to trip "visit_binary" within the context of relationship setup and "visit_function" within the context of SQL compilation. A new architectural feature to Visitable might be most appropriate here (new optional attribute compiler_visit_name most likely).

  9. Lukas Siemon reporter

    We are still running SQLAlchemy 1.0.X, so the patch will probably look slightly different. Looking into how much work it would be to upgrade to 1.1.X (probably a lot). Would a similar patch be possible for 1.0.X?

  10. Lukas Siemon reporter

    I was referring to the fact that def _get_operator_dispatch(self, operator_, qualifier1, qualifier2): is not present in Version 1.0.X. So I have no idea what I would have to change to get your second recipe working.

  11. Mike Bayer repo owner

    I'm trying the completely standalone recipe and not seeing the "anon" query you're referring towards. I can get that query if I add say a limit(1), but it doesn't have the extra "venue" token in it.

  12. Lukas Siemon reporter

    Weird that you don't get anon. I get the same query for both SQLAlchemy versions 1.0.12 and 1.1.2.

    Here is the full example:

    Query 1:

    SELECT
       anon_1.venue_id AS anon_1_venue_id,
       venue.name AS venue_name,
       venue_1.id AS venue_1_id,
       venue_1.name AS venue_1_name   
    FROM
       -- using venue in the on clause will return all existing venues in the relationship
       venue,
       (SELECT venue.id AS venue_id FROM venue) AS anon_1 
    LEFT OUTER JOIN
       venue AS venue_1 
       ON substring(venue_1.name, venue.name) 
    ORDER BY
       venue_1.name
    

    Query 2:

    SELECT
       anon_1.venue_id AS anon_1_venue_id,
       anon_1.venue_name AS anon_1_venue_name,
       venue_1.id AS venue_1_id,
       venue_1.name AS venue_1_name   
    FROM
       (SELECT venue.id AS venue_id, venue.name AS venue_name FROM venue) AS anon_1 
    LEFT OUTER JOIN
       venue AS venue_1 
       ON substring(venue_1.name, anon_1.venue_name) 
    ORDER BY
       venue_1.name
    

    Python Code:

    from sqlalchemy import *
    from sqlalchemy.orm import *
    from sqlalchemy.ext.declarative import declarative_base
    
    from sqlalchemy.sql import elements
    from sqlalchemy.ext.compiler import compiles
    from sqlalchemy.sql import operators
    
    
    class FunctionAsBinary(elements.BinaryExpression):
        def __init__(self, fn, left, right):
            super(FunctionAsBinary, self).__init__(
                left, right, operators.custom_op("fake", is_comparison=True))
            self.fn = fn
            self.sql_function = fn(left, right)
    
        def _copy_internals(self, **kw):
            super(FunctionAsBinary, self)._copy_internals(**kw)
            self.sql_function = self.fn(self.left, self.right)
    
    
    @compiles(FunctionAsBinary)
    def _func_as_binary(elem, compiler, **kw):
        return compiler.process(elem.sql_function, **kw)
    
    Base = declarative_base()
    
    
    class Venue(Base):
        __tablename__ = 'venue'
        id = Column(Integer, primary_key=True, nullable=False)
        name = Column(String(254))
    
        parents = relationship(
            "Venue",
            primaryjoin=FunctionAsBinary(
                func.substring,
                remote(foreign(name)), name
            ),
            viewonly=True,
            order_by=name
        )
    
    s = Session()
    print "Query 1:"
    query = s.query(Venue).options(load_only("id")).options(joinedload(Venue.parents)).from_self()
    print query
    print "Query 2:"
    query = s.query(Venue).options(joinedload(Venue.parents)).from_self()
    print query
    
  13. Mike Bayer repo owner

    That condition is not related here, I get that same mis-SQL not using any special features at all:

    class Venue(Base):
        __tablename__ = 'venue'
        id = Column(Integer, primary_key=True, nullable=False)
        parent_id = Column(ForeignKey('venue.id'))
    
        parents = relationship(
            "Venue",
            remote_side=parent_id,
        )
    
    s = Session()
    query = s.query(Venue).options(load_only("id")).options(joinedload(Venue.parents)).from_self()
    print query
    

    and there's necessarily a "bug" there, if you want to do joinedload you can't limit the inner query to not include the columns you need:

    s = Session()
    query = s.query(Venue).options(load_only("id", "parent_id")).options(joinedload(Venue.parents)).from_self()
    print query
    

    In theory the load_only() would be "smart" like the joinedload() in some way but this is all totally outside of the func feature here.

  14. Lukas Siemon reporter

    Ah, that makes so much sense now. I already ran into this before, but didn't connect the dots. The way we handle it currently is by force returning primary and foreign keys on the inner select. I obviously have to extend it to include any columns used in queried relationships.

    Another positive side effect to this ticket is that I just finished the migration to 1.1.2. And nothing big broke. Very impressed!

    Thank you again so much @zzzeek! Always extremely happy with your responses here!

  15. Lukas Siemon reporter

    SQLAlchemy has some smarts here already:

    The first query adds "other_id" in the inner select event though only the id is required. I'm assuming this is b/c it's marked as foreign key? Venue is added as a select table but then never used.

    Not critical, but I'd love some insight what is happening here and if that could be generified. Or is it as simple as "foreign keys are always returned"?

    SELECT anon_1.venue_id AS anon_1_venue_id, anon_1.venue_other_id AS anon_1_venue_other_id, venue.parent_id AS venue_parent_id, venue_1.id AS venue_1_id, venue_1.other_id AS venue_1_other_id 
    FROM venue, (SELECT venue.id AS venue_id, venue.other_id AS venue_other_id 
    FROM venue) AS anon_1 LEFT OUTER JOIN venue AS venue_1 ON anon_1.venue_other_id = venue_1.parent_id
    -----------
    SELECT anon_1.venue_id AS anon_1_venue_id, anon_1.venue_other_id AS anon_1_venue_other_id, anon_1.venue_parent_id AS anon_1_venue_parent_id, venue_1.id AS venue_1_id, venue_1.other_id AS venue_1_other_id, venue_1.parent_id AS venue_1_parent_id 
    FROM (SELECT venue.id AS venue_id, venue.other_id AS venue_other_id, venue.parent_id AS venue_parent_id 
    FROM venue) AS anon_1 LEFT OUTER JOIN venue AS venue_1 ON anon_1.venue_other_id = venue_1.parent_id
    

    Python Code

    class Venue(Base):
        __tablename__ = 'venue'
        id = Column(Integer, primary_key=True, nullable=False)
        other_id = Column(Integer, primary_key=True, nullable=False)
        parent_id = Column(ForeignKey('venue.other_id'))
    
        parents = relationship("Venue", remote_side=parent_id)
    
    s = Session()
    query = s.query(Venue).options(load_only("id")).options(joinedload(Venue.parents).load_only("id")).from_self()
    print query
    print "-----------"
    query = s.query(Venue).options(joinedload(Venue.parents)).from_self()
    print query
    
  16. Lukas Siemon reporter

    Could this be extended to force loading of property.local_remote_pairs for queried relationships?

  17. Lukas Siemon reporter

    I'm doing this now to force fetching of columns used in relationships and it works perfectly (so far):

        @classmethod
        def expand_relationship_lookups(cls, fields):
            """ Expand references so that fields are fetched that are used for
            relationship lookups (e.g. http://tiny.cc/ccg6fy)"""
            result = []
            for field in fields:
                if "." in field:
                    class_ = cls
                    chain = []
                    path = field.split(".")
                    for attr in path[:-1]:
                        prop = getattr(class_, attr).property
                        for p in prop.local_remote_pairs:
                            # force fetching of "both sides"
                            if p[0].table == class_.__table__:
                                result.append(".".join(chain + [p[0].name]))
                            if p[1].table == prop.mapper.class_.__table__:
                                result.append(".".join(chain + [attr, p[1].name]))
                        chain.append(attr)
                        class_ = prop.mapper.class_
                result.append(field)
            return result
    
  18. Lukas Siemon reporter

    I'm trying to use your second recipe now, and it works ok. However I can no longer filter by false to create an empty query. Any idea what is going on?

    Error:

    KeyError: <function istrue at 0x7febb8d4e2a8>
    

    Code:

    from sqlalchemy import *
    from sqlalchemy.orm import *
    from sqlalchemy.ext.declarative import declarative_base
    
    from sqlalchemy.sql.compiler import SQLCompiler
    from sqlalchemy.ext.compiler import compiles
    from sqlalchemy.sql import operators, false
    
    
    # Patch compiler to allow for binary operators: http://tiny.cc/5ff6fy
    def patched_get_operator_dispatch(self, operator_, qualifier1, qualifier2):
        if hasattr(operator_, '_compiler_dispatch'):
            def disp(binary, operator_, **kw):
                return operator_._compiler_dispatch(self, binary=binary, **kw)
            return disp
        original_get_operator_dispatch(self, operator_, qualifier1, qualifier2)
    # noinspection PyProtectedMember
    original_get_operator_dispatch = SQLCompiler._get_operator_dispatch
    SQLCompiler._get_operator_dispatch = patched_get_operator_dispatch
    
    
    class FunctionOp(operators.custom_op):
        def __init__(self, sql_function):
            super(FunctionOp, self).__init__(None, is_comparison=True)
            self.sql_function = sql_function
    
    
    @compiles(FunctionOp)
    def _func_as_binary(elem, compiler, binary, **kw):
        return compiler.process(elem.sql_function(binary.left, binary.right), **kw)
    
    Base = declarative_base()
    
    
    class Venue(Base):
        __tablename__ = 'venue'
        id = Column(Integer, primary_key=True, nullable=False)
        name = Column(String(254))
    
        parents = relationship(
            "Venue",
            primaryjoin=FunctionOp(func.substring)(remote(foreign(name)), name),
            viewonly=True,
            order_by=name
        )
    
    
    s = Session()
    query = s.query(Venue).options(joinedload(Venue.parents))
    print query
    
    print s.query(Venue).filter(false())
    
  19. Mike Bayer repo owner

    local_remote_pairs - Just add a column_property() with what you need?

    2nd recipe - 1st recipe not working ? as Ive thought about this more the 1st one is closer to what I think I might do here. as far as how your monkeypatch might be failing the patch I made probably is missing something, running the test suite against my patch directly might show something.

  20. Lukas Siemon reporter

    What I mean is this: Currently SQLAlchemy is "smart" enough to always load foreign keys and primary keys. I was wondering if we could make it even smarter to always load fields used in loaded relationships (the required information is available in local_remote_pairs).

    Ah, from your response it sounded like the 2nd recipe was the desired way of doing this. The first recipe seems to be working fine now! Thank you again!

  21. Mike Bayer repo owner

    only the values in "local" for "local / remote pairs" would make any sense to be loaded locally and for the vast majority of relationships, those are just columns that are already loaded. As I said, for extremely rare / special circumstances where the "local" side is within some kind of function, use a column_property().

  22. Former user Account Deleted

    sqlalchemy.orm import * from sqlalchemy.ext.declarative import declarative_base

    from sqlalchemy.sql.compiler import SQLCompiler from sqlalchemy.ext.compiler import compiles from sqlalchemy.sql import operators, false

    Patch compiler to allow for binary operators: http://tiny.cc/5ff6fy

    def patched_get_operator_dispatch(self, operator_, qualifier1, qualifier2): if hasattr(operator_, 'compiler_dispatch'): def disp(binary, operator, kw): return operator_._compiler_dispatch(self, binary=binary, kw) return disp original_get_operator_dispatch(self, operator_, qualifier1, qualifier2)

    noinspection PyProtectedMember

    original_get_operator_dispatch = SQLCompiler._get_operator_dispatch SQLCompiler._get_operator_dispatch = patched_get_operator_dispatch

    class FunctionOp(operators.custom_op): def init(self, sql_function): super(FunctionOp, self).init(None, is_comparison=True) self.sql_function = sql_function

  23. Lukas Siemon reporter

    I see, that makes sense.

    IMPORTANT: For anyone reading this at a later point, consider using patch 1, not patch 2 as there are known issues with patch 2.

  24. Mike Bayer repo owner
    • changed milestone to 1.3

    I'm assuming you have thing working w/ the recipes for now, I'm trying to expedite 1.2 along and this looks like a complicated feature add (but manageable as a recipe).

  25. Log in to comment