- changed title to NULL characters cause string comparison weirdness in Postgres
NULL characters cause string comparison weirdness in Postgres
When filtering for table.c.column == foo_string
, if foo_string
contains a NUL (\x00
) character, the string comparison stops just before the first NULL byte. That is, .where(table.c.column == 'foo')
and .where(table.c.column == 'foo\x00bar')
return identical result sets. This happens with Postgres and the postgresql dialect.
This is surprising behavior because it means that the returned rows do not satisfy the invariant I would expect, that row.column == foo_string
. It also has the potential to cause security vulnerabilities if the programmer assumes that foo_string
has passed some kind of validation because it matched a database row.
Attached is a MCVE, and here is its output on my system:
% createdb nullbytes; python sqla_null.py
createdb: database creation failed: ERROR: database "nullbytes" already exists
Python: 3.5.1 (default, Dec 7 2015, 21:59:08)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.1.76)]
SQLAlchemy: 1.0.12
Postgres: PostgreSQL 9.4.4 on x86_64-apple-darwin14.3.0, compiled by Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn), 64-bit
'ben.s.kuhn@gmail.com' == 'ben.s.kuhn@gmail.com\x00 huh??'
I would expect the output not to include the last line (because the result set of the query should be empty).
Comments (4)
-
reporter -
repo owner great test. However, the treatment of bound parameters is a function of the database driver. I'm not sure I consider this a bug as NUL is not a legal character for Postgresql TEXT/CHAR/VARCHAR datatypes (which are all synonymous for Postgresql); per http://www.postgresql.org/docs/9.2/static/functions-string.html, regarding the
chr()
function:The NULL (0) character is not allowed because text data types cannot store such bytes.
Also see http://stackoverflow.com/a/28816294/34549, http://www.postgresql.org/message-id/alpine.BSO.2.00.0906031639270.2432@leary.csoft.net, and probably others which all address the inability of Postgresql to consider the NUL byte in a textual value (the binary datatype BYTEA can handle this just fine).
Not surprisingly, the test here has the same results using psycopg2 alone. If you want to pursue this as an upstream bug further, you should report it to psycopg2:
import psycopg2 conn = psycopg2.connect(user='scott', password='tiger', host='localhost', dbname='test') cursor = conn.cursor() try: cursor.execute("""drop table "user" """) except: pass cursor.execute(""" CREATE TABLE "user" ( email_address VARCHAR(60) ) """) cursor.execute( """INSERT INTO "user" (email_address) VALUES (%(email_address)s)""", {'email_address': 'ben.s.kuhn@gmail.com'}) cursor.execute(""" SELECT "user".email_address FROM "user" WHERE "user".email_address = %(email_address_1)s """, {'email_address_1': 'ben.s.kuhn@gmail.com\x00 huh??'}) print cursor.fetchall()
If we use a datatype appropriate for NUL, e.g. BYTEA, it works fine:
import psycopg2 conn = psycopg2.connect(user='scott', password='tiger', host='localhost', dbname='test') cursor = conn.cursor() try: cursor.execute("""drop table "user" """) except: pass cursor.execute(""" CREATE TABLE "user" ( email_address BYTEA ) """) cursor.execute( """INSERT INTO "user" (email_address) VALUES (%(email_address)s)""", {'email_address': 'ben.s.kuhn@gmail.com'}) cursor.execute(""" SELECT "user".email_address FROM "user" WHERE "user".email_address = %(email_address_1)s """, {'email_address_1': psycopg2.Binary('ben.s.kuhn@gmail.com\x00 huh??')}) print(cursor.fetchall())
SQLAlchemy's binary datatypes intercept the comparison to a binary datatype and will place the appropriate
dbapi.Binary
wrapper, so your test can work with this change::#!python import sqlalchemy as sa engine = sa.create_engine('postgres://scott:tiger@localhost/test', echo=True) metadata = sa.MetaData() table = sa.Table('user', metadata, sa.Column('email_address', sa.LargeBinary())) metadata.drop_all(engine) metadata.create_all(engine) engine.execute(table.delete()) engine.execute(table.insert().values(['ben.s.kuhn@gmail.com'])) value = 'ben.s.kuhn@gmail.com\x00 huh??' for row in engine.execute(table.select().where( table.c.email_address == value)): print(repr(row[0]), '==', repr(value))
-
repo owner - changed status to wontfix
this is a known behavior of upstream DBAPI and database itself.
-
reporter OK, I've raised the issue with psycopg2. Sorry, didn't realize it was a DBAPI issue. Thanks so much for your work reducing the test case further!
- Log in to comment