fully self equated join condition?

Issue #3029 resolved
Mike Bayer repo owner created an issue
from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Category(Base):
    __tablename__ = 'categories'

    id = Column(SmallInteger, primary_key=True)
    path = Column(String, unique=True, nullable=False)

    all_subcats = relationship('Category', viewonly=True,
                               primaryjoin=path == remote(foreign(path)).concat('%')
                               )

print Session().query(Category).options(joinedload("all_subcats"))

patch:

diff --git a/lib/sqlalchemy/orm/relationships.py b/lib/sqlalchemy/orm/relationships.py
index 311fba4..ed0a9a5 100644
--- a/lib/sqlalchemy/orm/relationships.py
+++ b/lib/sqlalchemy/orm/relationships.py
@@ -2405,6 +2405,10 @@ class JoinCondition(object):
                     self.direction = ONETOMANY
                 elif manytoone_local and not onetomany_local:
                     self.direction = MANYTOONE
+                elif self_equated.intersection(self.foreign_key_columns):
+                    self.direction = ONETOMANY
+                elif self_equated and not self_equated.intersection(self.foreign_key_columns):
+                    self.direction = MANYTOONE
                 else:
                     raise sa_exc.ArgumentError(
                         "Can't determine relationship"

Comments (7)

  1. Mike Bayer reporter
    • The "primaryjoin" model has been stretched a bit further to allow a join condition that is strictly from a single column to itself, translated through some kind of SQL function or expression. This is kind of experimental, but the first proof of concept is a "materialized path" join condition where a path string is compared to itself using "like". The :meth:.Operators.like operator has also been added to the list of valid operators to use in a primaryjoin condition. fixes #3029

    → <<cset 7303b59b00ef>>

  2. Tony Narlock

    @zzzeek : I'm excited to see this changeset. I've been waiting for this area to get attention for a while!

    If you haven't seen it yet https://sqlalchemy-orm-tree.readthedocs.org/ is something I've been playing with lately, it incorporates sqlamp and django treebeard's approach. (materialized path, nested sets)

    From a conversation I had on this, maaku (the creator of sqlalchemy-orm-tree) https://github.com/monetizeio/sqlalchemy-orm-tree/issues/4#issuecomment-38209589, he says the approach used in it is the most flexible/performant around at the moment, so there may be something of use in it.

  3. Mike Bayer reporter

    the materialized path approach here troubles me because we can only get whole sets of anscestors or descendant nodes. I don't see a simple way of getting at the immediate parent or child nodes unless we do a string split on the "/" on the SQL side and discard the last token. Is that something that's been worked out ?

  4. Former user Account Deleted

    Mike, materialized paths often use parts of fixed length (for example 2 characters/level) and maybe binary strings for more efficient encoding.

    Selecting children or parents to any given depth becomes a simple matter of constraining LENGTH(path). Path slicing (SUBSTR) on fixed item boundaries is also trivial.

    For human-readable variable item paths, a separate integer depth field can be stored.

  5. Tony Narlock

    @kvesteri: Good catch!

    @zzzeek : sqlalchemy-orm-tree covers it all because it implements elements of nested sets and adjacency lists.

    Nested sets are superior at solving the issue of retrieving only a limited number of ancestors and descendants.

    sqlalchemy-orm-tree's approach is kind of novel, it blends in a bit of nested sets, adjacent list and materialized path.

    For handling the ancestor, descendant issue, it uses nested sets ('left' and 'right')

    The 'id' represents a top-level node, this simplifies the query to find the root of a tree, and also allows for multiple trees to be kept in the same table.

    The 'depth' represents how deep in the path it exists.

    The issue I have with sqlalchemy-orm-tree is the code is a bit wordy to me, the upside is it handles everything. https://github.com/monetizeio/sqlalchemy-orm-tree/blob/master/sqlalchemy_tree/manager/class_.py

    Edit: corrections, sqlalchemy-orm-tree uses nested sets.

  6. Tony Narlock

    I made some edits to the above.

    Why not have the materialized path, but for performance and query simplicity, also have a parent_id and tree_id column?

    A path of 1/5/19 has a parent_id of 5 and a tree_id of 1. That offers a quicker way to do faster reads that would pave the way for joins / counts on parents and the root of the tree more easily, and be compatible across db's.

    If a depth was added, it would also be trivial to find lists of siblings, because WHERE tree_id and depth are the same level, sibling rows live.

    Materialized paths are 99% read-centric, the write and updates to nested data are going to be expensive and relatively complicated, but it's not meant to be done often.

  7. Log in to comment