numpy int64 garbled upon insert

Issue #3586 resolved
Former user created an issue

numpy.int64 is garbled upon insert; this may apply to other numpy types as well. The workaround is to cast with int() before inserting.

Problem observed on latest commit (0847097c29ab48a5f111518e2c6ee324d5242057) and on v1.0.8.

Test case:

    from sqlalchemy import create_engine, MetaData, Column, Integer, Table
    import numpy as np
    engine = create_engine('sqlite:///test.db', echo=True)
    metadata = MetaData()
    test_table = Table('test', metadata,
        Column('id', Integer, primary_key=True),
        Column('number', Integer)
    )
    metadata.drop_all(engine)  # clean
    metadata.create_all(engine)
    engine.execute(test_table.insert(), {'number': np.int64(123)})
    result = engine.execute(test_table.select()).first()
    assert result[1] == 123, result  # fails: the value comes back garbled

Comments (5)

  1. Former user (Account Deleted), reporter

    If only Python int is supported, why not add an assert isinstance(value, int)?

    So, what you're suggesting is to implement a custom MyInteger (with e.g. an assert) by subclassing TypeDecorator and use that instead of Integer in my tables, correct?

  2. Mike Bayer, repo owner

    Well, the value is passed to the underlying DBAPI, which interprets it any way it wants, so we usually don't get too in the way of that with the base types. More importantly, an isinstance() check is yet another performance expense: if someone wants to insert 50K rows of 30 ints each, that's 1.5M more Python function calls, and people really get annoyed at that (see http://stackoverflow.com/questions/11769366/why-is-sqlalchemy-insert-with-sqlite-25-times-slower-than-using-sqlite3-directly).

    So yes, the TypeDecorator approach is most straightforward here. Also, I'm surprised there aren't already tools for this job; I know that Pandas has a lot of SQLAlchemy integration, so maybe look into what they're doing.
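    A minimal sketch of the TypeDecorator approach suggested here (the class name is hypothetical, not from the thread): coerce numpy integer scalars to plain int in process_bind_param so the DBAPI only ever sees a type it understands.

```python
import numpy as np
from sqlalchemy.types import Integer, TypeDecorator

class NumpySafeInteger(TypeDecorator):
    """Integer column that coerces numpy integer scalars to plain int
    before the value reaches the DBAPI."""
    impl = Integer
    cache_ok = True  # expected by SQLAlchemy 1.4+; harmless on older versions

    def process_bind_param(self, value, dialect):
        # np.integer covers int8 through int64 and the unsigned variants
        if isinstance(value, np.integer):
            return int(value)
        return value
```

    Using NumpySafeInteger in place of Integer in the table definition would presumably make the original test case pass without an explicit cast.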

  3. Former user (Account Deleted), reporter

    Thanks, I had forgotten pandas had those features. In fact, this problem originated from inserting data more or less manually from a DataFrame (into 3 tables). So I'll have a look.

    The performance argument is a good one; it's not as if we have luxuries like static type checking in Python.
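    The pandas route mentioned in comment 2 could look like the following (a sketch, assuming a throwaway in-memory database): to_sql handles the numpy-dtype-to-DBAPI conversion itself, so no explicit cast is needed.

```python
import numpy as np
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('sqlite://')  # in-memory database for illustration
df = pd.DataFrame({'number': np.array([123], dtype=np.int64)})

# to_sql creates the table and inserts the rows, converting numpy
# dtypes to values the DBAPI accepts.
df.to_sql('test', engine, index=False)

out = pd.read_sql('SELECT number FROM test', engine)
```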
