mysql dialect discards cast to float, should probably emit a warning

Issue #3237 resolved
‮rekcäH nitraM‮ created an issue

While debugging an error stemming from the incompatibility of decimal.Decimal with certain numerical operations in python, we discovered that sqlalchemy actually returns decimal when computing the average of an integer column in mysql, even if we wrap the query in a cast to float.

Please see this example:

import sqlalchemy
print "sqlalchemy.__version__", sqlalchemy.__version__

engine = sqlalchemy.create_engine('mysql://yeepa_demo:yeepa_demo@localhost/yeepa_demo?charset=utf8', echo=True)
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
from sqlalchemy import Column, Integer, String
class Track(Base):
    __tablename__ = 'track'
    id = Column('idtrack', Integer, primary_key=True)
    score = Column('score', Integer, server_default="0", nullable=False)
    user_id = Column('userid', Integer, nullable=False)

from sqlalchemy.orm import sessionmaker
Session = sessionmaker(bind=engine)
session = Session()
from sqlalchemy.sql.expression import cast
from sqlalchemy.types import Integer, Float
from sqlalchemy import func, desc, not_
# print session.query(cast(func.avg(func.coalesce(Track.score, 0)), Float).label('average_game_score'))
print session.query(Track.score, Track.user_id).limit(20).all()
print session.query(cast(func.avg(func.coalesce(Track.score, 0)), Float).label('average_game_score')).group_by(Track.user_id).all()

Which creates this output:

% ./sqlalchemy_test.py
sqlalchemy.__version__ 0.9.8
2014-10-28 22:44:36,051 INFO sqlalchemy.engine.base.Engine SHOW VARIABLES LIKE 'sql_mode'
2014-10-28 22:44:36,051 INFO sqlalchemy.engine.base.Engine ()
2014-10-28 22:44:36,053 INFO sqlalchemy.engine.base.Engine SELECT DATABASE()
2014-10-28 22:44:36,053 INFO sqlalchemy.engine.base.Engine ()
2014-10-28 22:44:36,053 INFO sqlalchemy.engine.base.Engine show collation where `Charset` = 'utf8' and `Collation` = 'utf8_bin'
2014-10-28 22:44:36,053 INFO sqlalchemy.engine.base.Engine ()
2014-10-28 22:44:36,055 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS CHAR(60)) AS anon_1
2014-10-28 22:44:36,056 INFO sqlalchemy.engine.base.Engine ()
2014-10-28 22:44:36,056 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS CHAR(60)) AS anon_1
2014-10-28 22:44:36,056 INFO sqlalchemy.engine.base.Engine ()
2014-10-28 22:44:36,057 INFO sqlalchemy.engine.base.Engine SELECT CAST('test collated returns' AS CHAR CHARACTER SET utf8) COLLATE utf8_bin AS anon_1
2014-10-28 22:44:36,057 INFO sqlalchemy.engine.base.Engine ()
2014-10-28 22:44:36,059 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2014-10-28 22:44:36,059 INFO sqlalchemy.engine.base.Engine SELECT track.score AS track_score, track.userid AS track_userid 
FROM track 
 LIMIT %s
2014-10-28 22:44:36,060 INFO sqlalchemy.engine.base.Engine (20,)
[(30L, 11L), (40L, 12L), (50L, 13L), (60L, 14L), (70L, 15L), (60L, 16L), (70L, 17L), (80L, 18L), (90L, 19L), (50L, 20L), (50L, 21L), (40L, 22L), (40L, 23L), (30L, 11L), (40L, 12L), (50L, 13L), (60L, 14L), (70L, 15L), (60L, 16L), (70L, 17L)]
2014-10-28 22:44:36,062 INFO sqlalchemy.engine.base.Engine SELECT avg(coalesce(track.score, %s)) AS average_game_score 
FROM track GROUP BY track.userid
2014-10-28 22:44:36,062 INFO sqlalchemy.engine.base.Engine (0,)
[(Decimal('222.4444'),), (Decimal('215.1481'),), (Decimal('23.6667'),), (Decimal('60.0000'),), (Decimal('70.0000'),), (Decimal('60.0000'),), (Decimal('70.0000'),), (Decimal('80.0000'),), (Decimal('90.0000'),), (Decimal('50.0000'),), (Decimal('50.0000'),), (Decimal('52.5000'),), (Decimal('45.0000'),), (Decimal('14.5000'),), (Decimal('0.0000'),), (Decimal('121.5000'),), (Decimal('42.0000'),), (Decimal('550.0000'),)]

This shows that the cast is actually discarded in sql (which makes some sense as mysql can't cast to float, but only to decimal - which doesn't make sense, but seems to be the case). But more importantly, there seems to be no better way to express what format the returned column should have. In normal columns I can tell SQLAlchemy that we want the type to be Float, even though the underlying type might be NUMERIC, and still sqlalchemy will convert the result to float for us. To my understanding because Float implies Float(asdecimal=false).

As a workaround we added this custom type:

import sqlalchemy.types as types
class MyFloat(types.TypeDecorator):

    impl = types.Float

    def process_bind_param(self, value, dialect):
        return value

    def process_result_value(self, value, dialect):
        return float(value)

    def copy(self):
        return MyFloat()

print session.query(cast(func.avg(func.coalesce(Track.score, 0)), MyFloat).label('average_game_score')).group_by(Track.user_id).all()

which creates the output

2014-10-28 22:44:36,067 INFO sqlalchemy.engine.base.Engine SELECT avg(coalesce(track.score, %s)) AS average_game_score 
FROM track GROUP BY track.userid
2014-10-28 22:44:36,067 INFO sqlalchemy.engine.base.Engine (0,)
[(222.4444,), (215.1481,), (23.6667,), (60.0,), (70.0,), (60.0,), (70.0,), (80.0,), (90.0,), (50.0,), (50.0,), (52.5,), (45.0,), (14.5,), (0.0,), (121.5,), (42.0,), (550.0,)]

Is it the intentional and expected behavior for a cast to float for the result of a column in a query that it is just discarded? If so, how do I annotate a query to tell sqlalchemy that I would please like to get the result in a specific type?

I think there is a good argument that the type Float(asdecimal=False) would behave more consistently if it would would allow to convert a column from a query via the cast operator as well as the column from a model. Or perhaps a different operator than cast should/could be chosen? Right now, the current behavior certainly surprised me.

Would you change sqlalchemys Float implementation to force the conversion to float in def process_result_value(...) or is there some other better way to achieve this?

Comments (17)

  1. Mike Bayer repo owner

    this is likely a MySQLdb issue - SQLAlchemy relies on the driver to handle that types are returned as expected unless the driver has no/terrible support. It would incur significant performance impact if we were to unconditionally apply float() to a value that in the vast majority of cases is already a float().

    Can you please add metadata.create_all() to your example as well as test data? I will then alter your test to illustrate that it is likely MySQLdb being inconsistent in this regard; consider trying out mysql-connector-python which is MySQL's official driver and may do a better job here (though it also may not, it is much newer and has had a lot of issues of its own).

    Note we have full round trip tests for floats which pass on all MySQL drivers: https://bitbucket.org/zzzeek/sqlalchemy/src/e1d1d999c9a688f4c8dbbe885438c63d6ef494c6/lib/sqlalchemy/testing/suite/test_types.py?at=master#cl-426

  2. ‮rekcäH nitraM‮ reporter

    I'm not sure I understand you correctly.

    Do you mean that this actually should work, i.e. sqlalchemy is giving the information that the result should be float to the mysql driver, even if it's not generating the cast to float as part of the generated sql?

    As for the test case, do you want me to just expand the test script so it can be run independently (against a local mysql instance)?

  3. Mike Bayer repo owner

    the MySQL driver, MySQLdb, is entirely responsible for receiving information about result value types over the socket. This will work if you use a raw MySQLdb connection. SQLAlchemy has nothing to do with it.

    for the test case, if I run this test, it fails - the tables don't exist, and if I add a create_all(), there's no data. So just something that I can type "python test.py" and it runs.

  4. ‮rekcäH nitraM‮ reporter

    Here's the requested full running script:

    #!/usr/bin/env python
    
    import sqlalchemy
    print "sqlalchemy.__version__:", sqlalchemy.__version__
    
    engine = sqlalchemy.create_engine('mysql://test:test@localhost/test?charset=utf8', echo=True)
    from sqlalchemy.ext.declarative import declarative_base
    Base = declarative_base()
    from sqlalchemy import Column, Integer, String
    class Track(Base):
        __tablename__ = 'track'
        id = Column(Integer, primary_key=True)
        score = Column(Integer, server_default="0", nullable=False)
        user_id = Column(Integer, nullable=False)
    
    from sqlalchemy.orm import sessionmaker
    Session = sessionmaker(bind=engine)
    session = Session()
    
    
    Base.metadata.bind = engine
    Base.metadata.create_all()
    
    Track.__table__.insert().execute([
        dict(score=23, user_id=23),
        dict(score=24, user_id=24),
        dict(score=25, user_id=25),
        dict(score=26, user_id=26),
        dict(score=27, user_id=27),
        dict(score=28, user_id=28),
    ])
    
    from sqlalchemy.sql.expression import cast
    from sqlalchemy.types import Integer, Float
    from sqlalchemy import func, desc, not_
    # print session.query(cast(func.avg(func.coalesce(Track.score, 0)), Float).label('average_game_score'))
    print session.query(Track.score, Track.user_id).limit(20).all()
    print session.query(cast(func.avg(func.coalesce(Track.score, 0)), Float).label('average_game_score')).group_by(Track.user_id).all()
    
    
    import sqlalchemy.types as types
    class MyFloat(types.TypeDecorator):
    
        impl = types.Float
    
        def process_bind_param(self, value, dialect):
            return value
    
        def process_result_value(self, value, dialect):
            return float(value)
    
        def copy(self):
            return MyFloat()
    
    print session.query(cast(func.avg(func.coalesce(Track.score, 0)), MyFloat).label('average_game_score')).group_by(Track.user_id).all()
    
  5. ‮rekcäH nitraM‮ reporter

    This outputs the following, where the last lines are the most interesting, as they show that without the custom float decorator the database returns decimals, not floats.

    sqlalchemy.__version__: 0.9.8
    2014-11-11 09:04:44,213 INFO sqlalchemy.engine.base.Engine SHOW VARIABLES LIKE 'sql_mode'
    2014-11-11 09:04:44,213 INFO sqlalchemy.engine.base.Engine ()
    2014-11-11 09:04:44,215 INFO sqlalchemy.engine.base.Engine SELECT DATABASE()
    2014-11-11 09:04:44,215 INFO sqlalchemy.engine.base.Engine ()
    2014-11-11 09:04:44,216 INFO sqlalchemy.engine.base.Engine show collation where `Charset` = 'utf8' and `Collation` = 'utf8_bin'
    2014-11-11 09:04:44,216 INFO sqlalchemy.engine.base.Engine ()
    2014-11-11 09:04:44,218 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS CHAR(60)) AS anon_1
    2014-11-11 09:04:44,218 INFO sqlalchemy.engine.base.Engine ()
    2014-11-11 09:04:44,219 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS CHAR(60)) AS anon_1
    2014-11-11 09:04:44,219 INFO sqlalchemy.engine.base.Engine ()
    2014-11-11 09:04:44,220 INFO sqlalchemy.engine.base.Engine SELECT CAST('test collated returns' AS CHAR CHARACTER SET utf8) COLLATE utf8_bin AS anon_1
    2014-11-11 09:04:44,220 INFO sqlalchemy.engine.base.Engine ()
    2014-11-11 09:04:44,221 INFO sqlalchemy.engine.base.Engine DESCRIBE `track`
    2014-11-11 09:04:44,221 INFO sqlalchemy.engine.base.Engine ()
    2014-11-11 09:04:44,224 INFO sqlalchemy.engine.base.Engine ROLLBACK
    2014-11-11 09:04:44,224 INFO sqlalchemy.engine.base.Engine 
    CREATE TABLE track (
        id INTEGER NOT NULL AUTO_INCREMENT, 
        score INTEGER DEFAULT '0' NOT NULL, 
        user_id INTEGER NOT NULL, 
        PRIMARY KEY (id)
    )
    
    
    2014-11-11 09:04:44,225 INFO sqlalchemy.engine.base.Engine ()
    2014-11-11 09:04:44,243 INFO sqlalchemy.engine.base.Engine COMMIT
    2014-11-11 09:04:44,245 INFO sqlalchemy.engine.base.Engine INSERT INTO track (score, user_id) VALUES (%s, %s)
    2014-11-11 09:04:44,245 INFO sqlalchemy.engine.base.Engine ((23, 23), (24, 24), (25, 25), (26, 26), (27, 27), (28, 28))
    2014-11-11 09:04:44,246 INFO sqlalchemy.engine.base.Engine COMMIT
    2014-11-11 09:04:44,253 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
    2014-11-11 09:04:44,264 INFO sqlalchemy.engine.base.Engine SELECT track.score AS track_score, track.user_id AS track_user_id 
    FROM track 
     LIMIT %s
    2014-11-11 09:04:44,264 INFO sqlalchemy.engine.base.Engine (20,)
    [(23L, 23L), (24L, 24L), (25L, 25L), (26L, 26L), (27L, 27L), (28L, 28L)]
    2014-11-11 09:04:44,267 INFO sqlalchemy.engine.base.Engine SELECT avg(coalesce(track.score, %s)) AS average_game_score 
    FROM track GROUP BY track.user_id
    2014-11-11 09:04:44,267 INFO sqlalchemy.engine.base.Engine (0,)
    [(Decimal('23.0000'),), (Decimal('24.0000'),), (Decimal('25.0000'),), (Decimal('26.0000'),), (Decimal('27.0000'),), (Decimal('28.0000'),)]
    2014-11-11 09:04:44,269 INFO sqlalchemy.engine.base.Engine SELECT avg(coalesce(track.score, %s)) AS average_game_score 
    FROM track GROUP BY track.user_id
    2014-11-11 09:04:44,269 INFO sqlalchemy.engine.base.Engine (0,)
    [(23.0,), (24.0,), (25.0,), (26.0,), (27.0,), (28.0,)]
    
  6. Mike Bayer repo owner

    this is simple. MySQL does not support CAST to float. If you look at your SQL, you'll see SQLAlchemy is omitting the CAST:

        SELECT avg(coalesce(track.score, %s)) AS average_game_score 
        FROM track GROUP BY track.user_id
    

    if I enable the MySQL dialect to render a traditional cast here, we get an error:

    (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'None) AS average_game_score \nFROM track GROUP BY track.user_id' at line 1") [SQL: 'SELECT CAST(avg(coalesce(track.score, %s)) AS None) AS average_game_score \nFROM track GROUP BY track.user_id'] 
    

    therefore instead of CAST here we want to just use a python-side type_coerce with asdecimal=False, sending Numeric which will convert:

    print session.query(type_coerce(func.avg(func.coalesce(Track.score, 0)), Numeric(asdecimal=False)).label('average_game_score')).group_by(Track.user_id).all()
    
  7. ‮rekcäH nitraM‮ reporter

    Hm, well I do understand that you do not want sqlalchemy to be a database abstraction layer and therefore having a separate cast expression for python side casting is available that has to be added for mysql, while another statement has to be added for other databases.

    Which means that if I want to have optimal performance on all dbs, I will have to construct my statements differently depending on which db I conned to. While I do not like it, I can understand that.

    At least though, sqlalchemy should emit a warning when dropping the cast statement directing the user to resolve to type_coerce instead as cast to the requested type is not available on this database, as this type of bug in an application is really a debugging nightmare.

  8. Mike Bayer repo owner

    I agree the CAST thing here is pretty un-ideal, and that was something we did years ago, today I would never have done that, I'd have it raise an error. I can live with making it warn perhaps. in your case though, I'd definitely at least do a custom Float type that does the float() coercion that you're looking for. However, if you're looking for "performance", unfortunately the Python Decimal type is a poor performer, and if your MySQL driver is creating these Decimals at first, your performance is suffering. So you'd need to wrestle with your driver and/or SQL to make it not create the Decimal at all, really.

    we'll do a warning, it's fine.

  9. ‮rekcäH nitraM‮ reporter

    Well, I would definitely love to get an error here - if you would consider that, that would make this kind of problem much more obvious!

  10. Mike Bayer repo owner

    unfortunately you can't really take a behavior that has passed without issue for ten years and suddenly make it an error condition, that would cause a considerable amount of upgrade pain and resentment.

  11. ‮rekcäH nitraM‮ reporter

    Is that something that you use a kind of deprecation cycle for, i.e. have this be a warning for a year / some releases and then turn it into an error?

  12. Mike Bayer repo owner

    if something is truly harmful we turn it into an error within the subsequent cycle. this one is not in that category. Users will see the warning, they'll get it. A lot of users may be OK with this behavior (seeing as it's never been brought up for 10 years this seems likely).

  13. Mike Bayer repo owner
    • A warning is emitted when :func:.cast is used with the MySQL dialect on a type where MySQL does not support CAST; MySQL only supports CAST on a subset of datatypes. SQLAlchemy has for a long time just omitted the CAST for unsupported types in the case of MySQL. While we don't want to change this now, we emit a warning to show that it's taken place. A warning is also emitted when a CAST is used with an older MySQL version (< 4) that doesn't support CAST at all, it's skipped in this case as well. fixes #3237

    → <<cset 1ad236127c06>>

  14. Log in to comment