Sqlalchemy 1.1 doesn't remove duplicate rows in some cases while 1.0 does

Issue #3958 resolved
Alexander Koval created an issue

After the commit next test case begun to fail:

class FilterUniqueJoinTest(fixtures.MappedTest, AssertsCompiledSQL):
    run_setup_mappers = 'once'

    @classmethod
    def define_tables(cls, metadata):
        Table('users', metadata,
              Column('id', Integer, primary_key=True,
                     test_needs_autoincrement=True),
              Column('name', String(50)),
              Column('date_created', DateTime, default=datetime.datetime.now))

        Table('orders', metadata,
              Column('id', Integer, primary_key=True,
                     test_needs_autoincrement=True),
              Column('user_id', Integer, ForeignKey('users.id')))

    @classmethod
    def setup_classes(cls):
        users = cls.tables.users
        orders = cls.tables.orders

        class User(cls.Comparable):
            pass

        class Order(cls.Comparable):
            pass

        mapper(User, users, properties={
            'orders': relationship(Order, order_by=users.c.id.desc())
        })

        mapper(Order, orders)

    @classmethod
    def insert_data(cls):
        User = cls.classes.User
        Order = cls.classes.Order

        sess = create_session()
        o1 = Order()
        o2 = Order()
        o3 = Order()

        u1 = User(name='Mike')
        u1.orders = [o1, o2]
        u2 = User(name='John')
        u2.orders = [o3]

        sess.add(u1)
        sess.add(u2)
        sess.flush()
        sess.close()

    def test_filter_unique_column(self):
        User = self.classes.User

        sess = create_session()

        users = (
            sess.query(
                User.date_created.label('dt'),
                User,
            )
            .options(joinedload(User.orders))
            .all()
        )
        assert len(users) == 2

    def test_filter_unique_func_column(self):
        User = self.classes.User

        sess = create_session()

        users = (
            sess.query(
                func.date(User.date_created, 'start of month').label('month'),
                User,
            )
            .options(joinedload(User.orders))
            .all()
        )
        assert len(users) == 2

Sqlalchemy 1.0:

test/orm/test_joins.py::FilterUniqueJoinTest::test_filter_unique_column PASSED
test/orm/test_joins.py::FilterUniqueJoinTest::test_filter_unique_func_column PASSED

Sqlalchemy 1.1:

test/orm/test_joins.py::FilterUniqueJoinTest::test_filter_unique_column PASSED
test/orm/test_joins.py::FilterUniqueJoinTest::test_filter_unique_func_column FAILED

Is it expected behaviour?

Comments (4)

  1. Mike Bayer repo owner

    this is expected, however the change notes could use some more examples.

    First off, while I appreciate the clear test case and the effort put into it, I can't run it because I don't have time to find what specific .py file that test happens to work in, and also func.date is not a standard SQL function (and also I can't easily switch branches when the test is given as a patch). So here's the case distilled:

    from sqlalchemy import *
    from sqlalchemy.orm import *
    from sqlalchemy.ext.declarative import declarative_base
    import datetime
    
    Base = declarative_base()
    
    
    class A(Base):
        __tablename__ = 'a'
        id = Column(Integer, primary_key=True)
        some_thing = Column(String)
        bs = relationship("B")
    
    
    class B(Base):
        __tablename__ = 'b'
        id = Column(Integer, primary_key=True)
        a_id = Column(ForeignKey('a.id'))
    
    e = create_engine("sqlite://", echo=True)
    Base.metadata.create_all(e)
    
    s = Session(e)
    
    s.add_all([
        A(some_thing="ABC_XYZ", bs=[B(), B(), B()]),
        A(some_thing="ABC_PQR", bs=[B(), B(), B()])
    ])
    
    s.commit()
    
    for row in s.query(func.substr(A.some_thing, 0, 4), A).options(joinedload(A.bs)):
        print row
    

    So here's the important part, which you'll see for yours also:

    >>> from sqlalchemy import String, DateTime, Column, func
    >>> c1 = Column('c1', String)
    >>> c2 = Column('c2', DateTime)
    >>> c1.type.hashable
    True
    >>> c2.type.hashable
    True
    >>> func.substr(c1, 0, 3).type
    NullType()
    >>> func.substr(c1, 0, 3).type.hashable
    False
    >>> func.date(c2, 'foobar').type
    NullType()
    >>> func.date(c2, 'foobar').type.hashable
    False
    

    What are we seeing? 1. that func.xyz() by default returns NullType, because func. doesn't know what substr() or date() is. This is not new. But also 2. NullType is not hashable. This is the change described at http://docs.sqlalchemy.org/en/rel_1_1/changelog/migration_11.html#changes-regarding-unhashable-types.

    If you give a type to the expression, then you're OK:

    for row in s.query(func.substr(A.some_thing, 0, 4, type_=String), A).options(joinedload(A.bs)):
        print row
    

    I'll try to add an example to the migration notes.

  2. Log in to comment