PickleType does not work with PostgreSQL

Issue #2016 resolved
Former user created an issue

Retrieving data from a PickleType column produces a cPickle.UnpicklingError with PostgreSQL

Instrumenting the sqlalchemy.types.PickleType class to print raw binary pickle data shows the data is corrupted somehow when it is read back out. Probably due to the implementation of the LargeBinary type for PostgreSQL.

Modifying PickleType by changing imp=sqlalchemy.types.Text and protocol=0 works fine, but is slower than a LargeBinary column with protocol=2.

Minimal test case is attached. It uses elixir, but is simple enough to get the point across. Even simple objects do not unpickle.

PostgreSQL: 9.0.0 SQLAlchemy: 0.6.5

Comments (7)

  1. Mike Bayer repo owner
    • assigned issue to
    • changed component to postgres

    This is possibly an issue with PG 9 and/or psycopg2's interaction with it, as we have a full series of unit tests for pickle with all backends and have never had any reported issues like this for our full six years of existence. Here is a non-elixir version of your test and the output I get using Postgresql 8.4.4:

    from sqlalchemy import *
    from sqlalchemy.orm import *
    from sqlalchemy.types import PickleType
    
    from sqlalchemy.ext.declarative import declarative_base
    
    Base = declarative_base()
    
    LOCAL_SERVER="postgresql://scott:tiger@localhost/test"
    
    class FooPickle(Base):
        __tablename__ = 'foo'
        id = Column(Integer, primary_key=True)
        f = Column(PickleType())
    
    e = create_engine("postgresql://scott:tiger@localhost/test", echo=True)
    Base.metadata.drop_all(e)
    Base.metadata.create_all(e)
    
    session = Session(e)
    data="foobar"
    foo=FooPickle(f=data)
    session.add(foo)
    session.commit()
    session.expunge_all()
    foo2=session.query(FooPickle).one()
    print("pickled value: ", foo2.f) #Raises unpickling error
    

    output:

    2011-01-06 21:09:41,638 INFO sqlalchemy.engine.base.Engine select version()
    2011-01-06 21:09:41,638 INFO sqlalchemy.engine.base.Engine {}
    2011-01-06 21:09:41,640 INFO sqlalchemy.engine.base.Engine select current_schema()
    2011-01-06 21:09:41,640 INFO sqlalchemy.engine.base.Engine {}
    2011-01-06 21:09:41,643 INFO sqlalchemy.engine.base.Engine select relname from pg_class c join pg_namespace n on n.oid=c.relnamespace where n.nspname=current_schema() and lower(relname)=%(name)s
    2011-01-06 21:09:41,643 INFO sqlalchemy.engine.base.Engine {'name': u'foo'}
    2011-01-06 21:09:41,688 INFO sqlalchemy.engine.base.Engine 
    DROP TABLE foo
    2011-01-06 21:09:41,689 INFO sqlalchemy.engine.base.Engine {}
    2011-01-06 21:09:41,745 INFO sqlalchemy.engine.base.Engine COMMIT
    2011-01-06 21:09:41,748 INFO sqlalchemy.engine.base.Engine select relname from pg_class c join pg_namespace n on n.oid=c.relnamespace where n.nspname=current_schema() and lower(relname)=%(name)s
    2011-01-06 21:09:41,748 INFO sqlalchemy.engine.base.Engine {'name': u'foo'}
    2011-01-06 21:09:41,790 INFO sqlalchemy.engine.base.Engine 
    CREATE TABLE foo (
        id SERIAL NOT NULL, 
        f BYTEA, 
        PRIMARY KEY (id)
    )
    
    
    2011-01-06 21:09:41,790 INFO sqlalchemy.engine.base.Engine {}
    2011-01-06 21:09:41,905 INFO sqlalchemy.engine.base.Engine COMMIT
    2011-01-06 21:09:41,908 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
    2011-01-06 21:09:41,908 INFO sqlalchemy.engine.base.Engine INSERT INTO foo (f) VALUES (%(f)s) RETURNING foo.id
    2011-01-06 21:09:41,908 INFO sqlalchemy.engine.base.Engine {'f': <psycopg2._psycopg.Binary object at 0x144ecd8>}
    2011-01-06 21:09:41,910 INFO sqlalchemy.engine.base.Engine COMMIT
    2011-01-06 21:09:41,911 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
    2011-01-06 21:09:41,912 INFO sqlalchemy.engine.base.Engine SELECT foo.id AS foo_id, foo.f AS foo_f 
    FROM foo
    2011-01-06 21:09:41,912 INFO sqlalchemy.engine.base.Engine {}
    ('pickled value: ', 'foobar')
    

    My suggestion would be to create a psycopg2-only test that serializes and deserializes a binary stream from pickle. Assuming that fails, this should be reported to psycopg2.

  2. jek

    No problems here with PG 9, might be a psycopg2 version issue.

    $ python -i test.py
    [...](...)
    2011-01-06 21:17:58,997 INFO sqlalchemy.engine.base.Engine.0x...ac90 {}
    ('pickled value: ', 'foobar')
    >>> e.dialect.server_version_info
    (9, 0)
    >>> import psycopg2
    >>> psycopg2.__version__
    '2.2.2 (dt dec ext pq3)'
    >>> import sqlalchemy
    >>> sqlalchemy.__version__
    '0.6.5'
    
  3. Former user Account Deleted

    Replying to jek:

    No problems here with PG 9, might be a psycopg2 version issue.

    Thanks jek and zzzeek for pointing me in the right direction. It was indeed a psycopg2 issue.

    A short explanation in case anyone else runs into this. PostgreSQL-9.0 has a new option for bytea types. They can be returned as 'hex' or 'escape'. The default in 9.0 is hex, but if your psycopg is built against an older libpq it will not be able to translate the 'hex' output. Solutions: (1) set: "bytea_output = 'escape'" in postgresql.conf or better yet (2) make sure your psycopg2 has been built against the libpq from 9.0

    You will know this is your problem if you get the following python exception: cPickle.UnpicklingError: invalid load key, 'x'.

  4. Log in to comment