Issue #739 new

Session storage to Postgres fails for unicode strings when using a UTF-8 encoded database

created an issue

I'm using CherryPy 2.2.1 with a PostgreSQL database that uses UNICODE encoding. Storing sessions there breaks as soon as the stored content contains non-ASCII characters: Python's pickle serializes these strings with the raw-unicode-escape codec, which inserts fine into an ISO-LATIN encoded database but is rejected by a UNICODE-encoded one.
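The failure can be reproduced without a database at all. A minimal sketch (assuming a modern Python 3, where pickle protocol 0 still encodes strings with raw-unicode-escape; the non-ASCII name is an illustrative value):

```python
import pickle

# Protocol 0 (the text protocol) writes unicode strings using the
# raw-unicode-escape codec, so "ö" becomes the single byte 0xF6.
raw = pickle.dumps({"name": "Pöllönen"}, protocol=0)
assert b"\xf6" in raw

# That byte is not valid UTF-8, which is why inserting the pickle
# into a UNICODE-encoded (UTF-8) text column is rejected.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
assert not valid_utf8
```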

A simple workaround for me is {{{pickled_data = pickled_data.decode('raw-unicode-escape').encode('utf-8')}}} when saving and {{{pickled_data = pickled_data.decode('utf-8').encode('raw-unicode-escape')}}} when loading, but of course a proper fix should detect the encoding used by the database and use that.
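The transcoding workaround above can be sketched end to end (again assuming Python 3; the session dict is a made-up example). The round trip is lossless for protocol-0 pickles because the pickler escapes literal backslashes itself:

```python
import pickle

session = {"user": "Pekka", "city": "Jyväskylä"}
raw = pickle.dumps(session, protocol=0)

# On save: re-encode the latin-1-style pickle bytes as UTF-8 so a
# UNICODE-encoded database will accept them in a text column.
utf8_safe = raw.decode("raw-unicode-escape").encode("utf-8")

# On load: reverse the transcoding before unpickling.
restored = utf8_safe.decode("utf-8").encode("raw-unicode-escape")
assert restored == raw
assert pickle.loads(restored) == session
```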

Alternatively, the documentation should state that the database where sessions are stored must use ISO-LATIN encoding (which often means a separate database just for sessions, and that may be a problem in itself).

Comments (1)

  1. guest reporter

    Another option would be to change the session table schema from field type "text" to "bytea" (from clob to blob), which avoids string-encoding problems entirely. The load method would then need a small change: reading from a bytea column returns a buffer instead of a string, so the line becomes data = pickle.loads(str(pickled_data)) (adding the str call). This would not break existing installations that already use a text field, since calling str on a string is a no-op.
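    The bytea variant from this comment could look like the following sketch. Here `memoryview` stands in for the buffer-like object a bytea column comes back as through the driver, and the coercion before `pickle.loads` mirrors the suggested str() call:

    ```python
    import pickle

    session = {"user": "Pekka", "city": "Jyväskylä"}
    pickled = pickle.dumps(session, protocol=0)

    # Reading a bytea column yields a buffer-like object rather than
    # a plain (byte) string; memoryview plays that role here.
    from_db = memoryview(pickled)

    # Coerce to a plain byte string before unpickling (the str() call
    # in the Python 2 code the comment describes). Coercing a value
    # that is already a plain byte string changes nothing.
    data = pickle.loads(bytes(from_db))
    assert data == session
    assert bytes(pickled) == pickled
    ```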
