No nvarchar support in mssql.py

Issue #298 resolved
Former user created an issue

mssql.py doesn't support nvarchar fields. I've attached a patch that adds an MSUnicode type and maps it to nvarchar. The patch also corrects an indentation error (a tab instead of spaces).

Comments (27)

  1. paj
    • changed status to open
    • removed status

    Unicode columns are still created as VARCHAR. The problem is that Unicode is a TypeDecorator around a String type, and that TypeDecorator.dialect_impl gets the database type for the type that it wraps (i.e. String, not Unicode).

    Fixing this could be interesting! One option is to make Unicode a completely separate type from String. This could raise issues for cross-database portability. MS-SQL has different database types for string and unicode fields. Postgres has a single string type, and it is a database option whether your strings are unicode.

    It has been noted that NVARCHAR fields are UCS-2 only. Still, this is preferable to using VARCHAR with utf-8 encoding, as other applications (not using SQLAlchemy) may access the data. In any case, most Python builds are UCS-2 only.
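    As a side note (not part of the patch), the kind of build you are on can be checked with `sys.maxunicode`, which distinguishes narrow (UCS-2) from wide (UCS-4) Python builds:

```python
import sys

# On a "narrow" (UCS-2) build sys.maxunicode is 0xFFFF; on a "wide"
# (UCS-4) build, and on all modern Pythons, it is 0x10FFFF.
build = "narrow (UCS-2)" if sys.maxunicode == 0xFFFF else "wide (UCS-4)"
print(build)
```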

  2. Mike Bayer repo owner

    Why not have MSUnicode override the "impl" class member to point to an NVARCHAR type defined in MS-SQL? Then MSUnicode doesn't wrap String anymore (not directly, at least).

  3. paj

    I should just clarify: the patch uses the approach Michael suggested on the mailing list: "another thing we might do if you want Unicode to act database-neutral is add a step at line 85 of types.py to ask the dialect first for a type_descriptor of 'Unicode', then if not found to ask for the 'impl' version which is normally String. The MSSQL dialect would return NVARCHAR for the 'Unicode' type and 'String' would not be involved. All other dialects would return None when given 'Unicode'."
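    A rough standalone sketch of that lookup order (the class and method names here are illustrative, not SQLAlchemy's actual API): ask the dialect for a "Unicode" descriptor first, and fall back to the impl type's descriptor only if the dialect returns None.

```python
class Dialect:
    # maps generic type names to dialect-specific column types;
    # a plain dialect knows nothing special about Unicode
    colspecs = {'String': 'VARCHAR'}

    def type_descriptor(self, name):
        return self.colspecs.get(name)

class MSSQLDialect(Dialect):
    # MS-SQL answers for "Unicode" directly, so String is never consulted
    colspecs = {'String': 'VARCHAR', 'Unicode': 'NVARCHAR'}

def resolve_unicode(dialect):
    # ask for "Unicode" first; fall back to the impl type, String
    return (dialect.type_descriptor('Unicode')
            or dialect.type_descriptor('String'))

print(resolve_unicode(MSSQLDialect()))  # NVARCHAR
print(resolve_unicode(Dialect()))       # VARCHAR
```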

  4. paj

    OK, this patch also makes it actually store unicode as unicode, rather than as a utf-8 encoded string. If you want this as a separate patch, just ask. MSSQL now passes the Unicode unit tests!

  5. Mike Bayer repo owner

    Is there a condition where encoding would be sent to the dialect as None? I see that we are checking for that now; DefaultDialect has encoding set to "utf-8" by default.

  6. Mike Bayer repo owner

    OK, the current version of the patch, which is completely different from what I proposed, is in changeset:2199.

    • The MissingTypeError is not the way to go; the adapt_type() method does need to return the type that was given in the case that no adapted type is located. This covers a completely custom user-defined type which has no dialect-specific version (unit test: testtypes/OverrideTest).
    • The TypeDecorator instead just checks whether the returned type object is the same object as itself (meaning it was not overridden). Otherwise, it runs the adapt step on the returned type and returns that type.
    • mssql has separate MSNVarchar and MSUnicode types. MSNVarchar is analogous to using MSNVarchar without unicode conversion; MSUnicode is analogous to using MSNVarchar with unicode conversion.
    • Reflecting an mssql table with NVARCHAR will give you MSNVarchar as the return type, i.e. it won't do any unicode conversion unless the dialect has "convert_unicode=True" turned on. This is the same behavior as all the other dialects (unicode is never converted unless you either specified Unicode types specifically for those columns you want, or you have convert_unicode turned on).
    • Since MS-SQL isn't going to do unicode conversion by default, I removed the "encoding=None" default setting, particularly because I don't like the latency introduced into the Unicode type by having to check whether the encoding is None. If there's a real reason to have encoding=None on the dialect, I'll reconsider (but I would think convert_unicode=False, and not using Unicode, would be enough if unicode conversion isn't desired).
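    The first two bullets can be sketched in miniature; the class and method names mirror the description above, but the code is illustrative, not the actual SQLAlchemy internals:

```python
class Dialect:
    # dialect-specific overrides, keyed by generic type class
    overrides = {}

    def adapt_type(self, typeobj):
        # return the given type unchanged when no adapted type is
        # located (this covers custom user-defined types)
        return self.overrides.get(type(typeobj), typeobj)

class TypeDecorator:
    impl = None

    def dialect_impl(self, dialect):
        adapted = dialect.adapt_type(self)
        if adapted is self:
            # the dialect did not override this decorator; adapt the
            # wrapped impl type instead
            return dialect.adapt_type(self.impl)
        # otherwise run the adapt step on the returned type as well
        return dialect.adapt_type(adapted)

class String: pass

class Unicode(TypeDecorator):
    impl = String()

class MSNVarchar: pass

class MSSQLDialect(Dialect):
    overrides = {Unicode: MSNVarchar()}

u = Unicode()
print(type(u.dialect_impl(MSSQLDialect())).__name__)  # MSNVarchar
print(type(u.dialect_impl(Dialect())).__name__)       # String
```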

    I'm going to mark this fixed, but I haven't tested on an actual MS-SQL database, so reopen if I screwed something up.

  7. paj
    • changed status to open
    • removed status

    It is creating the column as NVARCHAR, but the data inserted is utf-8 encoded, as opposed to being actual unicode. When you select the data, you get a unicode object that contains utf-8 encoded data. The reason for the "encoding = None" option was to avoid this problem.

    Here's an example of it happening:

        users_table.insert().execute(user_id=u'hello\u1234')
        print users_table.select().execute().fetchone()
        # (u'hello\xe1\u02c6\xb4',)
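    The garbled result is consistent with utf-8 bytes being written and then decoded with a single-byte codec on the way back. A minimal sketch of that round trip (the cp1252 codec here is an assumption, chosen because it reproduces exactly the output shown):

```python
original = u'hello\u1234'
utf8_bytes = original.encode('utf-8')  # b'hello\xe1\x88\xb4'
# decoding those utf-8 bytes as if they were cp1252 text yields
# the mojibake seen in the fetched row
mojibake = utf8_bytes.decode('cp1252')
print(repr(mojibake))                  # 'hello\xe1\u02c6\xb4'
```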

  8. Mike Bayer repo owner

    by "actual unicode", i assume you mean a python "unicode" object. that would imply that the DBAPI youre working with handles python "unicode" objects directly, and does not want an encoded string (and also returns python "unicode" objects). therefore SA should not be doing the encode() step.

    If that is the case, the MS-SQL code would be changed to this:

    class MSNVarchar(MSString):
        def get_col_spec(self):
            return "NVARCHAR(%(length)s)" % {'length': self.length}
        def convert_bind_param(self, value, dialect):
            # pass the value through unchanged; the DBAPI handles
            # unicode objects directly
            return value
        def convert_result_value(self, value, dialect):
            return value
    
    class MSUnicode(sqltypes.Unicode):
        impl = MSNVarchar
        def convert_bind_param(self, value, dialect):
            # no encode()/decode() step; rely on the DBAPI
            return value
        def convert_result_value(self, value, dialect):
            return value
    

    If that's so, I also need to know whether both DBAPIs in use behave this way.

    If you can show me completely straight DBAPI code that does the right thing, that would be informative.

  9. paj

    Absolutely, what works for me is passing adodbapi a Python Unicode object as a bind parameter. And it does return a Python Unicode object from queries when appropriate. The code you suggested works perfectly.

    Offhand, I don't know how pymssql works. I will investigate this and get back to you.

    Many thanks for looking into this.

  10. paj

    OK, it appears pymssql does need the encoding; at least, passing it unicode bind parameters fails. But then the values end up in the database encoded, rather than as proper unicode. Ooooh, getting reliable MSSQL unicode support is going to be fun!

    I will keep you posted.

  11. Mike Bayer repo owner

    So, does the current checked-in code work with pymssql? If not, we need to have these types check which dialect is in use (or, maybe better, provide different versions based on the dialect in use).

  12. paj

    The current code is, IMHO, broken for pymssql, but I think it's fine for most people's needs, and probably as good as we can easily get, bearing in mind the comment at the top of mssql.py: "pymssql has problems with binary and unicode data that this module does NOT work around". In any case, this patch doesn't make pymssql any worse :-)

  13. paj

    I've just realised one of these changes introduced an unintended side effect: it made the default encoding None. Here is a patch to revert that part.
