Postgres JSON Array access fails on postgresql+pg8000

Issue #3995 wontfix
Patrick Hayes created an issue

I've observed that the [] operator used to access array elements of JSON/JSONB columns returns incorrect values with postgresql+pg8000.

Here is a minimal example that reproduces the behaviour. It should be executable as long as you have a Postgres server running on port 5432 with username username, password password, and a database testdb:

from sqlalchemy import BigInteger, Column, create_engine
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.engine.url import URL
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()
class Model(Base):
  __tablename__ = 'test'
  id = Column(BigInteger, primary_key=True)
  data = Column(JSONB, name='data_json', nullable=False)

def run_with_scheme(scheme):
  db_url = URL(
    scheme,
    username='username',
    password='password',
    host='localhost',
    port=5432,
    database='testdb',
  )
  engine = create_engine(db_url)
  Base.metadata.drop_all(engine)
  Base.metadata.create_all(engine)
  session_factory = sessionmaker(bind=engine)
  session = session_factory()
  session.add(Model(data=[1]))
  return session.query(Model.data[0]).all()

print(run_with_scheme('postgresql'))         # Returns [(1,)]
print(run_with_scheme('postgresql+pg8000'))  # Returns [(None,)]

Is this a sqlalchemy issue or a pg8000 issue? If this operator is not supported on pg8000 then perhaps it should be forbidden in the sqlalchemy layer, as opposed to silently returning the wrong results?

Comments (15)

  1. Patrick Hayes reporter

    Some notes:

    • This is the simplest reproducing case; as far as I can tell, the error occurs any time an array element is accessed anywhere in the query
    • Using the [] operator for object key access works fine
    • Occurs on both JSON and JSONB column types
  2. Mike Bayer repo owner

    it's a known limitation that pg8000 supports neither the ARRAY datatype nor JSONB. I haven't followed pg8000's development for many years, so I'm not sure whether the project has tried to address these issues, or whether there are workarounds that would allow these to work. Overall, pg8000 is not nearly as performant as psycopg2, nor does it have widespread use, so while I've spent a lot of time in the past helping pg8000 get up to speed, it's not worth the resources right now.

    We tend not to artificially "forbid" things from working, since users may be running newer versions of the software, or applying workarounds, that allow these operations to succeed. For example, some DBAPIs allow type handlers to be injected that would operate underneath the SQLAlchemy layer.

    Some notes under the pg8000 docs that ARRAYs aren't supported right now are the most I can do myself here; the rest would be "wontfix" for me unless someone wants to experiment and possibly interact with pg8000 devs to get these cases working.

  3. Patrick Hayes reporter

    The issue may be limited to integer literals - if I use an expression of integer type as the index instead, it works as expected. If I change the query to

      return session.query(Model.data[Model.id.cast(sqlalchemy.types.Integer)]).all()
    

    then both drivers return the correct result. Here is the complete executable example

    import sqlalchemy
    from sqlalchemy import BigInteger, Column, create_engine
    from sqlalchemy.dialects.postgresql import JSON
    from sqlalchemy.engine.url import URL
    from sqlalchemy.orm import sessionmaker
    from sqlalchemy.ext.declarative import declarative_base
    
    Base = declarative_base()
    class Model(Base):
      __tablename__ = 'test'
      id = Column(BigInteger, primary_key=True)
      data = Column(JSON, name='data_json', nullable=False)
    
    def run_with_scheme(scheme):
      db_url = URL(
        scheme,
        username='produser',
        password='produser_password_for_development',
        host='localhost',
        port=5432,
        database='testdb',
      )
      engine = create_engine(db_url)
      Base.metadata.drop_all(engine)
      Base.metadata.create_all(engine)
      session_factory = sessionmaker(bind=engine)
      session = session_factory()
      session.add(Model(id=0, data=[1]))
      return session.query(Model.data[Model.id.cast(sqlalchemy.types.Integer)]).all()
    
    print(run_with_scheme('postgresql'))         # Returns [(1,)]
    print(run_with_scheme('postgresql+pg8000'))  # Returns [(1,)]
    
  4. Patrick Hayes reporter

    Upon further investigation, just changing the 0 to a cast(0, sqlalchemy.types.Integer) fixes the issue. Can we update the pg8000 driver to do this automatically? I'm happy to put together a PR for this.
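    The difference is visible in the compiled SQL even without a database connection. Here is a minimal sketch (the table and column names are illustrative, mirroring the Model above) comparing the bare integer index against the cast form, compiled with SQLAlchemy's postgresql dialect:

```python
from sqlalchemy import Column, Integer, MetaData, Table, cast
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import JSONB

# illustrative table, mirroring the Model above
metadata = MetaData()
t = Table('test', metadata, Column('data_json', JSONB))

compile_kw = dict(dialect=postgresql.dialect(),
                  compile_kwargs={'literal_binds': True})

# bare integer index -- this is the form pg8000 mishandles
plain_sql = str(t.c.data_json[0].compile(**compile_kw))

# explicitly cast index -- this form works on both drivers
casted_sql = str(t.c.data_json[cast(0, Integer)].compile(**compile_kw))

print(plain_sql)   # bare integer index, e.g. "test.data_json -> 0"
print(casted_sql)  # the index is wrapped in CAST(... AS INTEGER)
```

    Only the second form renders an explicit CAST, which is the difference the workaround relies on.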

  5. Mike Bayer repo owner

    PRs welcome but you'd need to get the whole expanse of ARRAY tests working for pg8000 for this to be worth it.

    Here's the start:

    diff --git a/lib/sqlalchemy/ext/baked.py b/lib/sqlalchemy/ext/baked.py
    index ba3c2ae..1833629 100644
    --- a/lib/sqlalchemy/ext/baked.py
    +++ b/lib/sqlalchemy/ext/baked.py
    @@ -175,6 +175,7 @@ class BakedQuery(object):
                     cache_key = opt._generate_cache_key(effective_path)
                     if cache_key is False:
                         self.spoil()
    +                    break
                     elif cache_key is not None:
                         key += cache_key
    
    diff --git a/test/dialect/postgresql/test_types.py b/test/dialect/postgresql/test_types.py
    index d2e19a0..f241bfa 100644
    --- a/test/dialect/postgresql/test_types.py
    +++ b/test/dialect/postgresql/test_types.py
    @@ -1011,7 +1011,7 @@ class ArrayRoundTripTest(object):
    
         __only_on__ = 'postgresql'
         __backend__ = True
    -    __unsupported_on__ = 'postgresql+pg8000', 'postgresql+zxjdbc'
    +    __unsupported_on__ = 'postgresql+zxjdbc'
    
         ARRAY = postgresql.ARRAY
    
    diff --git a/test/requirements.py b/test/requirements.py
    index 0f854d2..9d51e68 100644
    --- a/test/requirements.py
    +++ b/test/requirements.py
    @@ -599,7 +599,7 @@ class DefaultRequirements(SuiteRequirements):
         def array_type(self):
             return only_on([
                 lambda config: against(config, "postgresql") and
    -            not against(config, "+pg8000") and not against(config, "+zxjdbc")
    +            not against(config, "+zxjdbc")
             ])
    
         @property
    @@ -612,7 +612,7 @@ class DefaultRequirements(SuiteRequirements):
    
         @property
         def json_array_indexes(self):
    -        return self.json_type + fails_if("+pg8000")
    +        return self.json_type
    
         @property
         def datetime_literals(self):
    @@ -822,11 +822,7 @@ class DefaultRequirements(SuiteRequirements):
    
         @property
         def postgresql_jsonb(self):
    -        return only_on("postgresql >= 9.4") + skip_if(
    -            lambda config:
    -            config.db.dialect.driver == "pg8000" and
    -            config.db.dialect._dbapi_version <= (1, 10, 1)
    -        )
    +        return only_on("postgresql >= 9.4")
    
         @property
         def psycopg2_native_json(self):
    

    then run:

    py.test test/dialect/postgresql/ --dburi "postgresql+pg8000://user:pass@host/dbname"

    major problem here is that dialect-level datatypes have limited ability to wrap SQL functions around the SQL rendered for a datatype; #3981 would need to be implemented for this to be reasonably possible.

  6. Patrick Hayes reporter

    Thanks for your reply. I understand that pg8000 is not a priority; however, in this case my query was silently returning incorrect results (nulls), and that is IMO a pretty bad failure scenario that may be silently impacting others.

    Is it straightforward to add an error/warning inside lib/sqlalchemy/dialects/postgresql/pg8000.py when the __getitem__ key is an integer literal? That way the developer would know to add the cast themselves. I'm happy to add this, but only if the project is open to such a change.

  7. Patrick Hayes reporter

    I see - I was composing my reply in parallel with yours. I understand that the integer is just a subset of the problems. If there is any way to inform developers that their queries may be silently returning invalid results whenever they use arrays with pg8000, I think that would help others (it certainly would have helped me).

  8. Mike Bayer repo owner

    at the level of the getitem() operator, you can override that yourself to provide a cast() as an immediate workaround (http://docs.sqlalchemy.org/en/latest/core/custom_types.html#redefining-and-creating-new-operators).

    at the dialect level, it looks like there are "getitem" hooks in the compiler right now, search for "visit_getitem_binary". The JSON type has a special wrapper for an "index" called JSONIndex, so in theory there'd be an ARRAYIndex type to act as a type-level hook, but that's the approach that would need the #3981 enhancement.

    I'd vastly prefer if pg8000 could just fix their parsing here, however.
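    The per-type workaround described above can be sketched as a JSONB subclass whose comparator wraps integer indexes in an explicit cast before delegating to the normal getitem logic. This is a hypothetical sketch (the class name CastingJSONB is made up), shown here compiled to SQL rather than run against a database:

```python
from sqlalchemy import Column, Integer, cast
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import JSONB

class CastingJSONB(JSONB):
    # hypothetical subclass: force an explicit CAST on integer indexes
    # so the driver sees "-> CAST(n AS INTEGER)" instead of a bare value
    class comparator_factory(JSONB.Comparator):
        def __getitem__(self, index):
            if isinstance(index, int):
                index = cast(index, Integer)
            return super(CastingJSONB.comparator_factory,
                         self).__getitem__(index)

# usage: indexing a column of this type now renders the cast automatically
expr = Column('data_json', CastingJSONB())[1]
compiled = str(expr.compile(dialect=postgresql.dialect(),
                            compile_kwargs={'literal_binds': True}))
print(compiled)  # the integer index now appears inside a CAST(...)
```

    This keeps application code unchanged (plain integer subscripts) while emitting the cast form that both drivers handle correctly.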

  9. Mike Bayer repo owner

    pg8000 is the library that's silently returning bad results. Why not post a bug there?

  10. Mike Bayer repo owner

    so here, while there are probably ways to work around these issues on the SQLAlchemy end, I don't have intention to put time into it for these reasons:

    • pg8000 is not a widely used driver and has a relatively low level of support compared to psycopg2
    • psycopg2 is by far the de-facto standard in the Python PostgreSQL space and supports all these use cases without issue

    "wontfix" here is not to say that I wouldn't consider pull requests to get ARRAY support working on pg8000 without corresponding upstream changes in the driver, but these would be relatively low priority. IMO production-quality support is available w/ the psycopg2 driver.
