mysql case insensitive matching

Issue #418 resolved
Former user created an issue

mysql seems to be case insensitive on String fields by default. so a query for a lowercase name that is present in uppercase returns rows. Postgres, sqlite and oracle do not show that behavior.

The test programm:

#!/usr/bin/env python2.5

from sqlalchemy import *

oracle = 'oracle://user:pass@some_instance'
sqlite = 'sqlite:///somedb.sqlite'
postgres = 'postgres://user:pass@host/some_db'
mysql   = 'mysql://user:pass@host/some_db'

def do_test(uri, dbname):
    table = Table('testtable', BoundMetaData(uri),
        Column('testtable_id', Integer, Sequence('testable_seq'), primary_key=True),
        Column('name', String(20)),
        )

    table.drop(checkfirst=True)
    table.create(checkfirst=True)
    table.insert().execute(name='NAME')
    if table.select().execute(name='name').fetchone():
        print '%10s [FAIL](FAIL)' % dbname
    else:
        print '%10s [PASS](PASS)' % dbname
    table.drop(checkfirst=True)

do_test(postgres, 'postgres')
do_test(mysql, 'mysql')
do_test(sqlite, 'sqlite')
do_test(oracle, 'oracle')

The output of the programm for me is

  postgres [PASS](PASS)
     mysql [FAIL](FAIL)
    sqlite [PASS](PASS)
    oracle [PASS](PASS)

I think sqlalchemy should glue over this differences in db defaults by setting a reasonable (case sensitive) default to all databases it talks to.

Comments (7)

  1. jek

    The columns above are picking up their case sensivity (collation) from the schema ('some_db'), falling back to the server-wide default. If you want case sensitivity on all columns by default, setting it at database creation is the simplest way to go. It seems tricky for SA to programatically pick a collation, as they are tied to the character set and there can be more than one choice per set.

  2. Mike Bayer repo owner

    it would be a little late to change that now considering how many applications are probably running with the assumption of this behavior. While I dont particularly know how to change MySQL's behavior in this area, I would favor a boolean flag on the dialect that you'd pass to create_engine() which then makes this happen, and thats probably as far as I'd want to go.

  3. Former user Account Deleted

    The mysql COLLATE keyword can be used in table creation statements, selects, updates and deletes to define what "flavor" of encoding_cs/ci should be used.

    See also: * http://dev.mysql.com/doc/refman/5.0/en/create-table.html * http://dev.mysql.com/doc/refman/5.0/en/charset-collate.html * http://dev.mysql.com/doc/refman/5.0/en/case-sensitivity.html

    I understand that you worry about backwards compatibilitiy, however, have you considered that knowing all those switches intimately in order to make them work together in a db agnostic way is by far and large undocumented, unexampled and a pain in the butt. That would merit "flavor" classes or whatever, wheras one "flavor" implementation would be DBAgnosticFlavor() and the other might be MySQLFanaticFlavor() or BackwardsCompatibleFlavor(), whatever, to me it's inconsistencies barring multi db development for people who simply want to use sqlalchemy and do not want to worry too much about the finer details.

  4. Mike Bayer repo owner

    an argument as usual....

    i dont think SA is trying to make everything work completely DB agnostically, i think thats largely impossible. I think most people using SA with MySQL are assuming MySQL's default behavior, and if i made a new release with a change like this across the board youre going to see people's production systems crashing. additionally ive no idea what performance and other side effects are introduced by globally forcing the word COLLATE everywhere, id imagine having no option to not use this keyword in all queries is going to upset people. I use mysql with hibernate every day, and hibernate is not issuing any COLLATE keywords anywhere.

    plus, we arent even getting into what if you use an Oracle database which is set to be case insensitive ? databases that are created for utf-8 vs. ascii ? theres so many issues like these and its just not SA's job to try to smooth them all out into one big virtual "uni-database"..many of these conditions are undetectable from the client side.

    also note that when you create tables in mysql, they also are ISAM tables by default and dont even have any foreign keys on them - so reflection in MySQL is largely broken by default as well, yet nobody's had a problem with that - they know its MySQL, they know to add "InnoDB" to their tables to get foreign keys, and if I add options to add COLLATE on table creates, then theyll know to use those as well. since this is an option that can be specified at table create, thats the best place for this since we already have a way to feed in extra options to the Table object.

    if youre writing an app that youd like to work on lots of DBs, all you have to do is read the DatabaseNotes page to see what incompatibilities exist, test your app on all the platforms youd like to support and assign proper settings to each. its not unreasonable. and we have a "flavor" class, its called Dialect. but i think trying to create some system of multiple dialects that will automatically smooth all issues is impossible, being how many are undetectable and/or may occur at stages that some apps run and others do not (some apps create tables, others reflect them, etc), so its better not to try to "hide" the existence of these quirks.

  5. Former user Account Deleted

    Yeah, I see what you mean. It would still be nice if one could be db agnostic :)

    Anyway, I think you need a better tactics to handle the case that something would be sorta nice, but can't be done because you need to remain backwards compatible. I do know that this is a definite rabbit hole, but my strategy is to throw away every once in a while and start with a clean plate. Of course that's the application point of view.

  6. Mike Bayer repo owner

    im assuming table-level options are good enough here, currently Column doesnt support dialect-specific arguments. I have modified MySQL to allow all Table arguments like mysql_<something> to just pass through now, such as ENGINE, COLLATE, CHARSET, etc. so you can send through wahtever you want. of course you could always have just used a distinct ALTER statement to set up collation as well (and no problem with straight text since its a mysql-specific option anyway).

  7. Mike Bayer repo owner

    a new ticket to address this issue from the PG side has been introduced, specifically to implement case insensitive matching on that end (as opposed to case-sensitive matching for mysql). it might be good to have that ticket implement "specifically case-insensitive", "specifically case-sensitive", and "no preference" options. see #487

  8. Log in to comment