invalid input syntax with on_conflict_do_update and JSONB

Issue #3888 resolved
Samuel Colvin created an issue

Using postgres 9.5, python 3.6 and aiopg with SQLAlchemy==1.1.4

I've looked and couldn't find another issue pertaining to this.

I have a table with a JSONB field, I'm trying to run

from sqlalchemy.dialects.postgresql import insert as pg_insert

...
id = data.pop('id')
await conn.execute(
    pg_insert(sa_contractors)
    .values(id=id, **data)
    .on_conflict_do_update(
        index_elements=[sa_contractors.c.id],
        set_=data
    )
)

and getting the exception:

...
  File "/home/samuel/code/socket-server/env/lib/python3.6/site-packages/aiopg/connection.py", line 134, in _ready
    state = self._conn.poll()
psycopg2.DataError: invalid input syntax for integer: "Bio"
LINE 1: ...ARRAY['name', 'sort_index', 'value', 'id'], ARRAY['Bio', 0, ...

Is it possible that sqlalchemy is forming the query wrongly for the update of the json field like this SO question?

The project is open source so I can give links to code if that helps.

Comments (11)

  1. Mike Bayer repo owner

    Hi there -

    Thanks for reporting this issue. However, it is non-actionable for two reasons. The first, more minor reason is: "The project is open source so I can give links to code if that helps." Unfortunately we cannot act on bug reports that don't supply an mcve as we don't have the resources to debug other people's projects. But much more critically is that the aiopg library entirely replaces the majority of SQLAlchemy statement execution internals with its own homegrown approaches, which we obviously cannot support. The handling of special datatypes is deeply linked into the statement execution internals.

    This is not all to say there is not a bug in SQLAlchemy, however this would need to be demonstated against SQLAlchemy directly without replacing the entire Connection/Engine execution system with that of another library (e.g. a single standalone script that does not import anything outside of sqlalchemy modules itself).

  2. Samuel Colvin reporter

    Hi Michael, I'd suggest it's rather abrupt of you to mark this as "invalid" without giving me the opportunity to provide the extra proof you needed. Surely you want people to report bugs?

    Complete working example with just sqlalachemy:

    #!/usr/bin/env python3.6
    import os
    
    import psycopg2
    from sqlalchemy import Column, Integer, String, create_engine, select
    from sqlalchemy.dialects.postgresql import JSONB
    from sqlalchemy.dialects.postgresql import insert as pg_insert
    from sqlalchemy.engine.url import URL
    from sqlalchemy.ext.declarative import declarative_base
    
    db = dict(
        password=os.getenv('POSTGRESQL_PASSWORD'),
        host='localhost',
        port='5432',
        user='postgres',
    )
    
    conn = psycopg2.connect(**db)
    
    DB_NAME = 'jsonb_upsert_test'
    conn.autocommit = True
    cur = conn.cursor()
    cur.execute('DROP DATABASE IF EXISTS {}'.format(DB_NAME))
    cur.execute('CREATE DATABASE {}'.format(DB_NAME))
    cur.close()
    conn.close()
    
    Base = declarative_base()
    
    class Person(Base):
        __tablename__ = 'people'
        id = Column(Integer, primary_key=True, nullable=False)
        name = Column(String(63))
        extra = Column(JSONB)
    
    people = Person.__table__
    
    db['username'] = db.pop('user')
    engine = create_engine(URL(database=DB_NAME, drivername='postgres', **db))
    Base.metadata.create_all(engine)
    
    extra = [1, 2, 3]
    with engine.connect() as conn:
        data1 = dict(id=123, name='joe', extra=extra)
        print('trying upsert without JSON obj in "update" data...')
        id1 = data1.pop('id')
        q = (
            pg_insert(people)
            .values(id=id1, **data1)
            .on_conflict_do_update(
                index_elements=[people.c.id],
                set_={k: v for k, v in data1.items() if k != 'extra'}
            )
        )
        conn.execute(q)
        print('successful, trying again so we do an update...')
        conn.execute(q)
        print('successful, current db:')
        for row in conn.execute(select([people.c.id, people.c.name, people.c.extra])):
            print(row)
    
        print('\n####################################\n')
        print('trying upsert **with** JSON obj in "update" data...')
        data2 = dict(id=321, name='joe', extra=extra)
        id2 = data2.pop('id')
        conn.execute(
            pg_insert(people)
            .values(id=id2, **data2)
            .on_conflict_do_update(
                index_elements=[people.c.id],
                set_={k: v for k, v in data2.items()}
            )
        )
        print('successful')
    
    engine.dispose()
    

    output:

    (env) /path/to   POSTGRESQL_PASSWORD=<password> ./demo.py 
    trying upsert without JSON obj in "update" data...
    successful, trying again so we do an update...
    successful, current db:
    (123, 'joe', [1, 2, 3])
    
    ####################################
    
    trying upsert **with** JSON obj in "update" data...
    Traceback (most recent call last):
      File "/path/to/env/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
        context)
      File "/path/to/env/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 469, in do_execute
        cursor.execute(statement, parameters)
    psycopg2.ProgrammingError: column "extra" is of type jsonb but expression is of type integer[]
    LINE 1: ...CONFLICT (id) DO UPDATE SET name = 'joe', extra = ARRAY[1, 2...
                                                                 ^
    HINT:  You will need to rewrite or cast the expression.
    
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "./demo.py", line 73, in <module>
        set_={k: v for k, v in data1.items()}
      File "/path/to/env/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 945, in execute
        return meth(self, multiparams, params)
      File "/path/to/env/lib/python3.6/site-packages/sqlalchemy/sql/elements.py", line 263, in _execute_on_connection
        return connection._execute_clauseelement(self, multiparams, params)
      File "/path/to/env/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1053, in _execute_clauseelement
        compiled_sql, distilled_params
      File "/path/to/env/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1189, in _execute_context
        context)
      File "/path/to/env/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1393, in _handle_dbapi_exception
        exc_info
      File "/path/to/env/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 202, in raise_from_cause
        reraise(type(exception), exception, tb=exc_tb, cause=cause)
      File "/path/to/env/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 185, in reraise
        raise value.with_traceback(tb)
      File "/path/to/env/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
        context)
      File "/path/to/env/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 469, in do_execute
        cursor.execute(statement, parameters)
    sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) column "extra" is of type jsonb but expression is of type integer[]
    LINE 1: ...CONFLICT (id) DO UPDATE SET name = 'joe', extra = ARRAY[1, 2...
                                                                 ^
    HINT:  You will need to rewrite or cast the expression.
     [SQL: 'INSERT INTO people (id, name, extra) VALUES (%(id)s, %(name)s, %(extra)s) ON CONFLICT (id) DO UPDATE SET name = %(param_1)s, extra = %(param_2)s'] [parameters: {'id': 321, 'name': 'joe', 'extra': '[1, 2, 3]', 'param_1': 'joe', 'param_2': [1, 2, 3]}]
    

    I've used a list for extra so the error matches the original one, but you get another error if you use a dict.

    If this doesn't prepresent sufficient information, please let me know what you need.

  3. Mike Bayer repo owner

    Hi Samuel -

    re: "I'd suggest it's rather abrupt of you to mark this as "invalid"", I apologize for this, and I certainly don't do this for every case. However in this case I'd note that I have very prominent messaging on the "post a new bug report" screen which emphatically requests that the poster provide a complete test case, and additionally links to the SQLAlchemy Community Guide which features very detailed instructions on what we're looking for in a bug report. TBH,the suggestion that I would need to go look at your own project, figure out how to install/run/use it and then guess my way into reproducing your test case is diametrically opposed to the spirit of these guidelines.

    Combined with the code example that uses aiopg, a library that seems it may be trying to ride on SQLAlchemy's success without doing the real work of collaborating with us on a true async solution and instead invents their own execution system, I felt the need to make the case that my time is very valuable and that bug posters should at least make an attempt to follow the guidelines.

    In the vast majority of cases where people aren't able to provide a test case initially, they are unable or unwilling to provide a test case at all and these posters usually disappear. That you've come back with a test case in half an hour makes this a very unique case.

    Thank you for posting a proper test case and I will look into your test case now.

  4. Samuel Colvin reporter

    makes sense, thanks for looking into it. And thank you for sqlalchemy, it's great (even when used with aiopg :-)).

  5. Samuel Colvin reporter

    That's great, thank you get much. literal solution is working fine. Not sure if you want me to close this now or wait until it's merged/deployed?

  6. Mike Bayer repo owner

    Use full column->type processing for ON CONFLICT SET clause

    Fixed bug in new "ON CONFLICT DO UPDATE" feature where the "set" values for the UPDATE clause would not be subject to type-level processing, as normally takes effect to handle both user-defined type level conversions as well as dialect-required conversions, such as those required for JSON datatypes. Additionally, clarified that the keys in the set_ dictionary should match the "key" of the column, if distinct from the column name. A warning is emitted for remaining column names that don't match column keys; for compatibility reasons, these are emitted as they were previously.

    Fixes: #3888 Change-Id: I67a04c67aa5f65e6d29f27bf3ef2f8257088d073

    → <<cset afd78a37dafe>>

  7. Log in to comment