Column property labels with classes using with_polymorphic

Issue #3148 resolved
Konsta Vesterinen created an issue

It seems labeling column_properties with classes using with_polymorphic='*' doesn't work. The following test case outlines the problem:

import sqlalchemy as sa
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base


engine = sa.create_engine('sqlite:///:memory:')
Base = declarative_base()
Session = sessionmaker(bind=engine)
session = Session()


class TextItem(Base):
    __tablename__ = 'text_item'
    id = sa.Column(sa.Integer, primary_key=True)

    type = sa.Column(sa.Unicode(255))

    __mapper_args__ = {
        'polymorphic_on': type,
        'with_polymorphic': '*'
    }


class Article(TextItem):
    __tablename__ = 'article'
    id = sa.Column(
        sa.Integer, sa.ForeignKey(TextItem.id), primary_key=True
    )
    __mapper_args__ = {
        'polymorphic_identity': u'article'
    }


class Tag(Base):
    __tablename__ = 'tag'
    id = sa.Column(sa.Integer, primary_key=True)
    text_item_id = sa.Column(sa.Integer, sa.ForeignKey(TextItem.id))


TextItem.tag_count = sa.orm.column_property(
    sa.select(
        [
            sa.func.count('1')
        ],
    )
    .select_from(Tag.__table__)
    .where(Tag.__table__.c.text_item_id == Tag.__table__.c.id)
    .correlate(TextItem.__table__)
    .label('tag_count')
)


print session.query(TextItem) 

This gives me (notice the anon_1 alias for the labeled column property):

SELECT text_item.id AS text_item_id, text_item.type AS text_item_type, (SELECT count(:param_1) AS count_1
FROM tag
WHERE tag.text_item_id = tag.id) AS anon_1, article.id AS article_id
FROM text_item LEFT OUTER JOIN article ON text_item.id = article.id

The expected query would be

SELECT text_item.id AS text_item_id, text_item.type AS text_item_type, (SELECT count(:param_1) AS count_1
FROM tag
WHERE tag.text_item_id = tag.id) AS tag_count, article.id AS article_id
FROM text_item LEFT OUTER JOIN article ON text_item.id = article.id

Comments (18)

  1. Mike Bayer repo owner

    there's not really a reason to put a label on a column_property(). the column names that the ORM selects when you select an "entity" are kind of what they're going to be; after all what if you selected from TextItem twice, from a table and an alias? we can't just fall back on "tablename_columnname" because a column property doesnt really have a "tablename".

  2. Konsta Vesterinen reporter

    For my use case I'm labeling column_properties to be able to order the query by them. This works fine without with_polymorphic='*'.

    import sqlalchemy as sa
    from sqlalchemy.orm import sessionmaker
    from sqlalchemy.ext.declarative import declarative_base
    
    
    engine = sa.create_engine('sqlite:///:memory:')
    Base = declarative_base()
    Session = sessionmaker(bind=engine)
    session = Session()
    
    
    class Article(Base):
        __tablename__ = 'article'
        id = sa.Column(
            sa.Integer, primary_key=True
        )
    
    class Tag(Base):
        __tablename__ = 'tag'
        id = sa.Column(sa.Integer, primary_key=True)
        text_item_id = sa.Column(sa.Integer, sa.ForeignKey(Article.id))
    
    
    Article.tag_count = sa.orm.column_property(
        sa.select(
            [
                sa.func.count('1')
            ],
        )
        .select_from(Article.__table__)
        .where(Article.__table__.c.text_item_id == Tag.__table__.c.id)
        .correlate(Article.__table__)
        .label('tag_count')
    )
    
    
    print session.query(Article).order_by('tag_count') 
    

    This gives me the following SQL (works as expected):

    SELECT article.id AS article_id, (SELECT count(:param_1) AS count_1
    FROM tag
    WHERE article.id = tag.article_id) AS tag_count
    FROM article ORDER BY tag_count
    
  3. Mike Bayer repo owner

    there's some plans to improve this to some extent over at https://bitbucket.org/zzzeek/sqlalchemy/issue/2992/deprecate-direct-text-conversion-in but that applies more towards ad-hoc expressions with labels, which don't otherwise have an in-python name, e.g. select([expr.label('foo')]).order_by('foo'). But you don't need that here, you have order_by(TextItem.tag_count). It's much better to use that. Again, because it is unambiguous.

    if the .label() on the column_property is misleading, I'd rather have that emit a warning. It would be much more complicated to get the label() given there to be honored as a first class identifier name, but even then I can't see how it would work b.c. if two classes or one class + alias have the same label name, it'll break. The ORM never has naming collisions between entities right now because it always uses a generated name (just a name that happens to be predictable in the case of plain columns).

  4. Konsta Vesterinen reporter

    Thanks for the explanation. I agree its better to use TextItem.tag_count. However using it gives the following query:

    SELECT text_item.id AS text_item_id, text_item.type AS text_item_type, (SELECT count(:param_1) AS count_1
    FROM tag
    WHERE tag.text_item_id = tag.id) AS anon_1, article.id AS article_id
    FROM text_item LEFT OUTER JOIN article ON text_item.id = article.id ORDER BY (SELECT count(:param_2) AS count_2
    FROM tag
    WHERE tag.text_item_id = tag.id)
    

    The fact that it repeats the stuff in ORDER BY feels just wrong and awkward (I don't know if it's a performance issue though).

    Using the same syntax in the latter example gives

    SELECT article.id AS article_id, (SELECT count(:param_1) AS count_1
    FROM tag
    WHERE article.id = tag.article_id) AS tag_count
    FROM article ORDER BY tag_count
    
  5. Mike Bayer repo owner

    there was a long-ago issue where we tried improving on that, e.g. same expression in both places, use the label in order_by(), and it basically just failed. broken code went out, features had to be rolled back and new releases to undo the new feature, total crapshow, but we were a much less mature and capable system back then. But I think a huge part of the problem is that different databases are very picky about what expressions they will accept as labels in ORDER BY. Many will fail, especially Oracle, SQL Server. That's the real problem with making it automatic is that you never know what expression is suddenly going to fail for someone and now they have no way to undo it.

    I think if it were today we might do better, but due to the "works on X but not on Y" case I'm not in a hurry to go there :). But yes, the better way for it to work would be for TextItem.tag_count to still be the public API. The approach I used in #2992 could also provide clues for how to do it here.

    Maybe we can allow the user to produce a cue for us:

    expr = select(..).where(..).label('x', order_by=True)
    

    then an order_by(expr) or group_by(expr) will look it up as we are trying to do in #2992 and use that cue in the label() to do the right thing.

  6. Konsta Vesterinen reporter

    Thanks for clarifying this Mike! I fully agree with you. I marked the issue as minor for now, I can live with the produced double expression :)

  7. Mike Bayer repo owner

    Now that #2992 is done I can actually make your original use case work here without much difficulty, so we'll just go with that!

  8. Mike Bayer repo owner

    more complex than I hoped. wip in branch ticket_3148.

    things we need to test. there seems to be a bit of a regression in a case like below due to 0.9's "order by expression" logic - we can't actually order the below select by a1.b. we need to look into _order_by_clauselist some more and try to integrate it with the new order by system in #2992.

    from sqlalchemy import *
    from sqlalchemy.orm import *
    from sqlalchemy.ext.declarative import declarative_base
    
    Base = declarative_base()
    
    class A(Base):
        __tablename__ = 'a'
    
        id = Column(Integer, primary_key=True)
        b = column_property(func.foo().label('bar'))
    
    a1 = aliased(A)
    e = create_engine("sqlite://", echo=True)
    
    sess = Session(e)
    
    print sess.query(A, a1).order_by(A.b)
    print sess.query(A, a1).order_by(a1.b)
    
  9. Mike Bayer repo owner

    e80c7cc5c103788a4c7e1c479af2c has the initial rev, the top test case here works with it:

    print session.query(TextItem).order_by("tag_count")
    

    however this does not:

    print session.query(TextItem).order_by(TextItem.tag_count)
    

    if we screw with _order_by_clauselist we can make it work, but this shouldn't be resorting to strings for matching.

  10. Mike Bayer repo owner

    Well, here's a better case. the func.foo() is getting all screwy when the prop is not labeled, and this needs to be nailed down:

    from sqlalchemy import *
    from sqlalchemy.orm import *
    from sqlalchemy.ext.declarative import declarative_base
    
    Base = declarative_base()
    
    class A(Base):
        __tablename__ = 'a'
    
        id = Column(Integer, primary_key=True)
        b = column_property(func.foo().label('bar'))
    
    class B(Base):
        __tablename__ = 'b'
    
        id = Column(Integer, primary_key=True)
        b = column_property(func.foo())
    
    a1 = aliased(A)
    b1 = aliased(B)
    
    e = create_engine("sqlite://", echo=True)
    
    sess = Session(e)
    
    print sess.query(A, a1).order_by(A.b)
    print sess.query(A, a1).order_by(a1.b)
    
    print sess.query(B, b1).order_by(B.b)
    print sess.query(B, b1).order_by(b1.b)
    
  11. Mike Bayer repo owner

    Given:

    TextItem.tag_count = sa.orm.column_property(
        sa.select([sa.func.count('1')],)
        .select_from(Tag)
        .where(Tag.text_item_id == TextItem.id)
        .label('tag_count')
    )
    

    with and without the .label()

    some tests:

    ta = aliased(TextItem)
    
    # emits warning
    print session.query(ta).order_by("tag_count")
    
    # emits warning
    print session.query(ta.tag_count).order_by("tag_count")
    
    # resolves
    print session.query(TextItem).order_by("tag_count")
    
    # resolves
    print session.query(TextItem.tag_count).order_by("tag_count")
    
    # resolves
    print session.query(TextItem).order_by(TextItem.tag_count)
    
    
    # resolves
    print session.query(ta).order_by(ta.tag_count)
    
    # resolves
    print session.query(TextItem.tag_count, ta.tag_count).order_by(TextItem.tag_count, ta.tag_count)
    
  12. Mike Bayer repo owner
    • enhance ClauseAdapter / ColumnAdapter to have new behaviors with labels. The "anonymize label" logic is now generalized to ClauseAdapter, and takes place when the anonymize_labels flag is sent, taking effect for all .columns lookups as well as within traverse() calls against the label directly.
    • traverse() will also memoize what it gets in columns, so that calling upon traverse() / .columns against the same Label will produce the same anonymized label. This is so that AliasedClass produces the same anonymized label when it is accessed per-column (e.g. SomeAlias.some_column) as well as when it is applied to a Query, and within column loader strategies (e.g. query(SomeAlias)); the former uses traverse() while the latter uses .columns
    • AliasedClass now calls onto ColumnAdapter
    • Query also makes sure to use that same ColumnAdapter from the AliasedClass in all cases
    • update the logic from 0.9 in #1068 to make use of the same _label_resolve_dict we use for #2992, simplifying how that works and adding support for new scenarios that were pretty broken (see #3148, #3188)

    → <<cset 7950270cf2b1>>

  13. Mike Bayer repo owner
    • rework the previous "order by" system in terms of the new one, unify everything.
    • create a new layer of separation between the "from order bys" and "column order bys", so that an OVER doesn't ORDER BY a label in the same columns clause
    • identify another issue with polymorphic for ref #3148, match on label keys rather than the objects

    → <<cset 7904ebc62e0a>>

  14. Log in to comment