unistring.py incompatible with Jython (contains isolated surrogate unicode chars)

Issue #358 resolved
Anonymous created an issue

The latest Jython has switched to backing its unicode type with UTF-16 (see [http://zyasoft.com/pythoneering/2008/06/utf-16-jython-2.html Jim Baker's blog post] for more information).

Unfortunately, this change means that Jython can no longer represent isolated surrogates in a unicode string.
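
To make "isolated surrogate" concrete: in UTF-16, a high surrogate (U+D800 to U+DBFF) is only valid when immediately followed by a low surrogate (U+DC00 to U+DFFF). Here is a minimal sketch of that check. The helper name `has_isolated_surrogate` is hypothetical and is not part of Pygments or Jython.

```python
def has_isolated_surrogate(text):
    """Return True if text contains a surrogate that is not part of a
    well-formed high/low pair (hypothetical helper, for illustration)."""
    i = 0
    n = len(text)
    while i < n:
        cp = ord(text[i])
        if 0xD800 <= cp <= 0xDBFF:
            # A high surrogate must be followed directly by a low surrogate.
            if i + 1 < n and 0xDC00 <= ord(text[i + 1]) <= 0xDFFF:
                i += 2  # skip the well-formed pair
                continue
            return True
        if 0xDC00 <= cp <= 0xDFFF:
            # A low surrogate with no preceding high surrogate.
            return True
        i += 1
    return False
```

A string holding a lone `\ud800` fails this check, while a high surrogate followed directly by a low one passes. A string that lists every surrogate one after another cannot be well-formed, since runs of high surrogates are never paired.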

Pygments' unistring.Cs variable contains every unicode surrogate in isolation. Jython can't handle this, so importing the module raises the following exception (using the asm branch). Almost any typical use of Pygments triggers it:

{{{
Jython 2.5a1+ (asm:5198M, Aug 18 2008, 13:31:57)
[Java HotSpot(TM) Client VM (Apple Inc.)] on java1.5.0_13
Type "help", "copyright", "credits" or "license" for more information.
>>> import unistring
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 2-8: illegal Unicode character
}}}

Reported by guest