cboos  committed 604b2dd

Wiki headings containing certain non-ASCII characters could
trigger a `UnicodeDecodeError: ... unexpected end of data`.

For example, the character `LATIN SMALL LETTER A WITH GRAVE`
is encoded in UTF-8 as the two-byte sequence `'\xc3\xa0'`.
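
The encoding can be checked with a short sketch. It is written for
Python 3 (this branch targets Python 2), so it is only illustrative:

```python
import unicodedata

# Look the character up by its Unicode name rather than typing it literally.
a_grave = unicodedata.lookup("LATIN SMALL LETTER A WITH GRAVE")

# UTF-8 needs two bytes for this code point (U+00E0).
encoded = a_grave.encode("utf-8")
print(repr(encoded))  # b'\xc3\xa0'
```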

Depending on the current locale, the `splitlines` function
used in `wiki_to_oneliner` might interpret the single byte
`'\xa0'` as a `NO-BREAK SPACE` (as it is in most of the
''iso-8859'' and many of the ''cp'' encodings).
This splits the UTF-8 encoded string right in the middle of
a two-byte sequence, which later triggers the above exception.
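
The failure mode can be reproduced in isolation. The sketch below is
written for Python 3 rather than the Python 2 of this branch, and it
only emulates the locale-dependent behaviour by splitting explicitly on
the byte `0xa0`; `split_on_nbsp_byte` is a hypothetical helper, not a
Trac or standard library function.

```python
import unicodedata

def split_on_nbsp_byte(data):
    """Hypothetical stand-in for a split that treats byte 0xa0 as a separator."""
    return data.split(b"\xa0")

heading = "Indice di Priorit\u00e0".encode("utf-8")

# In ISO-8859-1 the lone byte 0xa0 decodes to NO-BREAK SPACE.
nbsp_name = unicodedata.name(b"\xa0".decode("iso-8859-1"))

# Splitting the raw bytes cuts the two-byte sequence 0xc3 0xa0 in half,
# so the first piece ends with the orphaned lead byte 0xc3.
pieces = split_on_nbsp_byte(heading)

try:
    pieces[0].decode("utf-8")
    error_reason = None
except UnicodeDecodeError as exc:
    error_reason = exc.reason  # "unexpected end of data"
```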

Therefore we temporarily convert the text to a `unicode` object
before calling `splitlines`, and encode the result back to UTF-8
afterwards.
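
A minimal sketch of this decode, split, re-encode pattern, again in
Python 3 (where `str` plays the role of Python 2's `unicode`);
`safe_lines` is an illustrative name and not part of Trac:

```python
def safe_lines(raw):
    """Decode UTF-8 bytes first, split into lines, then re-encode each line."""
    # Splitting happens on decoded text, so no multi-byte sequence can be cut.
    text = raw.decode("utf-8", "replace")
    return [line.encode("utf-8") for line in text.strip().splitlines()]

raw = "== Indice di Priorit\u00e0 ==\n== Second heading ==\n".encode("utf-8")
lines = safe_lines(raw)
```

Every returned line is again valid UTF-8, so decoding it later cannot
hit a truncated sequence.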

Of course, this is only relevant for [milestone:0.9],
since `unicode` objects are used as of [milestone:0.10].

Fixes #3058.

  • Parent commits 04b0062
  • Branches 0.9-stable

Files changed (2)

File trac/wiki/

 import re
 import os
 import urllib
+import StringIO as pyStringIO
     from cStringIO import StringIO
         # Simplify code blocks
         in_code_block = 0
         processor = None
-        buf = StringIO()
+        buf = pyStringIO.StringIO()
+        text = unicode(text, 'utf-8', 'replace')
         for line in text.strip().splitlines():
             if line.strip() == '{{{':
                 in_code_block += 1
                 print>>buf, line
         result = buf.getvalue()[:-1]
+        result = result.encode('utf-8')
         if shorten:
             result = util.shorten_line(result)

File trac/wiki/tests/wiki-tests.txt

+== Indice di Priorità ==
+<h2 id="IndicediPriorità">Indice di Priorità</h2>
+== Indice di Priorità ==
 == Heading with trailing white-space == 
 <h2 id="Headingwithtrailingwhitespace">Heading with trailing white-space</h2>