different results from sqlalchemy generated sql compared to result from orm - sqlalchemy version 1.0.6 and below

Issue #3477 closed
Michał Szczepański created an issue

When I'm trying to join two tables using String values that are duplicating in each column I get less rows then expected

The query from database returns 10 rows but filtered is set to True and after

        if filtered:
            rows = util.unique_list(rows, filter_fn)

in sqlalchemy.orm.loading.py one row is left.

I think it's a problem with hashing.

            def filter_fn(row):
                return tuple(fn(x) for x, fn in zip(row, filter_fns))

Exist with 1.0.6 and below.

Script with full example in attachment with workaround. Script create tables then execute bugged query and workaround query. You can vary result number from bugged query by changing result_rows to number from 1 to 10

s = Scaffold(db, result_rows=2)

Test script deletes the database file after run so it's always fresh start.

I tried it on 1.0.6 on debian and 0.9.8 on windows same results

Comments (4)

  1. Mike Bayer repo owner

    hi, thanks for reporting this! However, this behavior is works as expected. If we turn on echo=True we can see the result sets coming back. In the first case we have:

    #!
    
     Col ('a_id', 'a_join_1', 'a_date', 'i1_info', 'i2_info')
     Row (100, u'test19', u'2015-07-08 12:11:49.060064', u'aaa', u'bbb')
     Row (100, u'test19', u'2015-07-08 12:11:49.060064', u'aaa', u'bbb')
     Row (100, u'test19', u'2015-07-08 12:11:49.060064', u'aaa', u'bbb')
     Row (100, u'test19', u'2015-07-08 12:11:49.060064', u'aaa', u'bbb')
     Row (100, u'test19', u'2015-07-08 12:11:49.060064', u'aaa', u'bbb')
     Row (99, u'test18', u'2015-07-08 12:11:49.055378', u'aaa', u'bbb')
     Row (99, u'test18', u'2015-07-08 12:11:49.055378', u'aaa', u'bbb')
     Row (99, u'test18', u'2015-07-08 12:11:49.055378', u'aaa', u'bbb')
     Row (99, u'test18', u'2015-07-08 12:11:49.055378', u'aaa', u'bbb')
     Row (99, u'test18', u'2015-07-08 12:11:49.055378', u'aaa', u'bbb')
    

    where we are returning a tuple of A objects as well as two columns. The ORM query will always de-duplicate on entities (e.g. full A objects here) so that joined eager loading strategies work correctly. There are only two A identiies here, 100 and 99, and the additional columns being returned are identical, so we get two ORM rows back.

    In the second case we have:

    #!
    
     Col ('a_id', 'a_join_1', 'a_date', 'b_id', 'i1_info', 'i2_info')
     Row (100, u'test19', u'2015-07-08 12:12:35.768234', 20, u'aaa', u'bbb')
     Row (100, u'test19', u'2015-07-08 12:12:35.768234', 40, u'aaa', u'bbb')
     Row (100, u'test19', u'2015-07-08 12:12:35.768234', 60, u'aaa', u'bbb')
     Row (100, u'test19', u'2015-07-08 12:12:35.768234', 80, u'aaa', u'bbb')
     Row (100, u'test19', u'2015-07-08 12:12:35.768234', 100, u'aaa', u'bbb')
     Row (99, u'test18', u'2015-07-08 12:12:35.762430', 19, u'aaa', u'bbb')
     Row (99, u'test18', u'2015-07-08 12:12:35.762430', 39, u'aaa', u'bbb')
     Row (99, u'test18', u'2015-07-08 12:12:35.762430', 59, u'aaa', u'bbb')
     Row (99, u'test18', u'2015-07-08 12:12:35.762430', 79, u'aaa', u'bbb')
     Row (99, u'test18', u'2015-07-08 12:12:35.762430', 99, u'aaa', u'bbb')
    

    Here, we add "b_id" to the result where we have a unique value for every tuple. Therefore each tuple is unique even though there are ultimately only two A objects present (you get the same A object back in each tuple), so therefore ten rows back.

    this is all works as expected and SQLAlchemy has worked this way since its very first release (though we didn't support tuple results until around 0.4 or 0.5).

  2. Michał Szczepański reporter

    Thanks for this information. There is also one catch I want to mention. When You count() the query You get 10 instead of number of unique values returned by SQLAlchemy. So you will get ex 1 row in result and count() will return 10.

  3. Mike Bayer repo owner

    yup, count() is just doing "SELECT COUNT" and isn't loading the actual PK identities so that it would know such a thing. you should try to write SQL that isn't returning duplicate rows.

  4. Log in to comment