NULL characters cause string comparison weirdness in Postgres

Issue #3685 wontfix
Ben Kuhn
created an issue

When filtering for table.c.column == foo_string, if foo_string contains a NUL (\x00) character, the string comparison stops just before the first NULL byte. That is, .where(table.c.column == 'foo') and .where(table.c.column == 'foo\x00bar') return identical result sets. This happens with Postgres and the postgresql dialect.

This is surprising behavior because it means that the returned rows do not satisfy the invariant I would expect, that row.column == foo_string. It also has the potential to cause security vulnerabilities if the programmer assumes that foo_string has passed some kind of validation because it matched a database row.

Attached is a MCVE, and here is its output on my system:

% createdb nullbytes; python sqla_null.py
createdb: database creation failed: ERROR:  database "nullbytes" already exists
Python: 3.5.1 (default, Dec  7 2015, 21:59:08)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.1.76)]
SQLAlchemy: 1.0.12
Postgres: PostgreSQL 9.4.4 on x86_64-apple-darwin14.3.0, compiled by Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn), 64-bit
'ben.s.kuhn@gmail.com' == 'ben.s.kuhn@gmail.com\x00 huh??'

I would expect the output not to include the last line (because the result set of the query should be empty).

Comments (4)

  1. Michael Bayer repo owner

    great test. However, the treatment of bound parameters is a function of the database driver. I'm not sure I consider this a bug as NUL is not a legal character for Postgresql TEXT/CHAR/VARCHAR datatypes (which are all synonymous for Postgresql); per http://www.postgresql.org/docs/9.2/static/functions-string.html, regarding the chr() function:

    The NULL (0) character is not allowed because text data types cannot store such bytes.

    Also see http://stackoverflow.com/a/28816294/34549, http://www.postgresql.org/message-id/alpine.BSO.2.00.0906031639270.2432@leary.csoft.net, and probably others which all address the inability of Postgresql to consider the NUL byte in a textual value (the binary datatype BYTEA can handle this just fine).

    Not surprisingly, the test here has the same results using psycopg2 alone. If you want to pursue this as an upstream bug further, you should report it to psycopg2:

    import psycopg2
    
    conn = psycopg2.connect(user='scott', password='tiger', host='localhost', dbname='test')
    cursor = conn.cursor()
    
    try:
        cursor.execute("""drop table "user" """)
    except:
        pass
    
    cursor.execute("""
    CREATE TABLE "user" (
        email_address VARCHAR(60)
    )
    """)
    
    cursor.execute(
        """INSERT INTO "user" (email_address) VALUES (%(email_address)s)""",
        {'email_address': 'ben.s.kuhn@gmail.com'})
    
    cursor.execute("""
    SELECT "user".email_address
    FROM "user"
    WHERE "user".email_address = %(email_address_1)s
    """, {'email_address_1': 'ben.s.kuhn@gmail.com\x00 huh??'})
    
    
    print cursor.fetchall()
    

    If we use a datatype appropriate for NUL, e.g. BYTEA, it works fine:

    import psycopg2
    
    conn = psycopg2.connect(user='scott', password='tiger', host='localhost', dbname='test')
    cursor = conn.cursor()
    
    try:
        cursor.execute("""drop table "user" """)
    except:
        pass
    
    cursor.execute("""
    CREATE TABLE "user" (
        email_address BYTEA
    )
    """)
    
    cursor.execute(
        """INSERT INTO "user" (email_address) VALUES (%(email_address)s)""",
        {'email_address': 'ben.s.kuhn@gmail.com'})
    
    cursor.execute("""
    SELECT "user".email_address
    FROM "user"
    WHERE "user".email_address = %(email_address_1)s
    """, {'email_address_1': psycopg2.Binary('ben.s.kuhn@gmail.com\x00 huh??')})
    
    
    print(cursor.fetchall())
    

    SQLAlchemy's binary datatypes intercept the comparison to a binary datatype and will place the appropriate dbapi.Binary wrapper, so your test can work with this change::

    #!python
    import sqlalchemy as sa
    
    engine = sa.create_engine('postgres://scott:tiger@localhost/test', echo=True)
    
    metadata = sa.MetaData()
    table = sa.Table('user', metadata,
        sa.Column('email_address', sa.LargeBinary()))
    metadata.drop_all(engine)
    metadata.create_all(engine)
    
    engine.execute(table.delete())
    engine.execute(table.insert().values(['ben.s.kuhn@gmail.com']))
    value = 'ben.s.kuhn@gmail.com\x00 huh??'
    for row in engine.execute(table.select().where(
            table.c.email_address == value)):
        print(repr(row[0]), '==', repr(value))
    
  2. Ben Kuhn reporter

    OK, I've raised the issue with psycopg2. Sorry, didn't realize it was a DBAPI issue. Thanks so much for your work reducing the test case further!

  3. Log in to comment