Possible case sensitivity bug with 'in' operator

Issue #6 resolved
Tom Wittbrodt created an issue

I've started using the nunicode sqlite extension for the unicode case folding features.

I'm not sure if this is a bug but the 'in' operator works differently if using the NU800_COLLATE NOCASE collation versus using COLLATE NOCASE.

Here's an example

sqlite3 test_in_collation.db
SQLite version 3.8.2 2013-12-06 14:53:30
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> .load libnusqlite3.so sqlite3_nunicode_init
sqlite> CREATE TABLE ascii_value_index (value TEXT COLLATE NOCASE, PRIMARY KEY(value));
sqlite> insert into ascii_value_index values ('ct');
sqlite> insert into ascii_value_index values ('ma');
sqlite> insert into ascii_value_index values ('me');
sqlite> insert into ascii_value_index values ('nh');
sqlite> insert into ascii_value_index values ('ri');
sqlite> insert into ascii_value_index values ('vt');
sqlite> select * from ascii_value_index where value in ('nh');
nh
sqlite> select * from ascii_value_index where value in ('NH');
nh
sqlite> -- when using COLLATE NOCASE, selecting in uppercase 'NH' does return a value
sqlite> 
sqlite> CREATE TABLE unicode_value_index (value TEXT NU800_COLLATE NOCASE, PRIMARY KEY(value));
sqlite> insert into unicode_value_index values ('ct');
sqlite> insert into unicode_value_index values ('ma');
sqlite> insert into unicode_value_index values ('me');
sqlite> insert into unicode_value_index values ('nh');
sqlite> insert into unicode_value_index values ('ri');
sqlite> insert into unicode_value_index values ('vt');
sqlite> select * from unicode_value_index where value in ('nh');
nh
sqlite> select * from unicode_value_index where value in ('NH');
sqlite> -- when using NU800_COLLATE NOCASE, selecting in uppercase 'NH' doesn't return a value

Can case insensitive comparison only be done with the 'like' operator?

Comments (5)

  1. Aleksey Tulinov repo owner

    Hello,

    For nocase collation you could use NU800_NOCASE colaltion:

    sqlite> CREATE TABLE ascii_value_index (value TEXT COLLATE NU800_NOCASE, PRIMARY KEY(value));
    sqlite> insert into ascii_value_index values ('nh');
    sqlite> select * from ascii_value_index where value in ('NH');
    nh
    

    Note that i use "COLLATE NU800_NOCASE" instead of "NU800_COLLATE NOCASE". Does that work?

    Please also note that embedding specific Unicode collaltion (8.0 in this case) into table might render indexes unusable in future if you would want to switch to Unicode 9.0. I.e. indexes built on NU80* collation won't be compatible with NU90* collations.

    Alternatively you could possibly use SELECT ... COLLATE NU800_NOCASE in some cases, this would likely imply some performance penalty however.

  2. Tom Wittbrodt reporter

    Success!!

    "COLLATE NU800_NOCASE" is working exactly as I'd hoped.

    Thank you so much!

  3. Log in to comment