Non-ASCII strings in DICOM cause extractor routines to fail
As reported by Samuli Hel: Google Group discussion
A protocol name containing ä causes the error: You must not use 8-bit bytestring.
Comments (8)
-
reporter From Eivind https://groups.google.com/forum/#!topic/openrem/rh55ulcQPl8
- Adding from __future__ import unicode_literals in rdsr.py somehow makes the if dataset.SOPClassUID .. check (near the end of rdsr.py) fail, so extraction fails silently.
  - Solution: adding [:] after SOPClassUID, giving SOPClassUID[:] (both instances), makes the script run, but it eventually fails because of the "strange" characters.
- get_value_kw(tag,dataset) in get_values.py must be modified to handle both strings and bytes, and to encode characters properly.
  - Solution: after def get_value_kw(tag,dataset), add
        # guarantee byte string in UTF8 encoding
        _u8 = lambda t: t.encode('UTF-8', 'replace') if isinstance(t, unicode) else t
    and after the if value != '': line, add
        value=format(_u8(value))
        return value.decode('latin-1', 'replace')
    (I guess replacing "latin-1" with something else would also work.) I tried this using PostgreSQL (default settings), and it seems to work quite well.
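The idea of accepting both byte strings and text can be sketched as below. This is only an illustration, not the code in get_values.py: the helper name to_text is hypothetical, and it is written for Python 3, where bytes and str are distinct types (the suggestion above targets Python 2, where the text type is unicode).

```python
def to_text(value, encoding="latin-1"):
    """Return value as text, decoding byte strings if necessary.

    Hypothetical helper for illustration; not part of OpenREM.
    """
    if isinstance(value, bytes):
        # "replace" turns undecodable bytes into U+FFFD instead of raising.
        return value.decode(encoding, "replace")
    # Already text (or a non-string value): return its text form.
    return str(value)


protocol_from_file = b"P\xe4\xe4 rutiini"  # latin-1 bytes as read from a file
print(to_text(protocol_from_file))
print(to_text("Pää rutiini"))  # text passes through unchanged
```

Decoding once at the point where the value is read keeps every later step working with text only, which avoids mixing the two types further down.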
-
reporter Added unicode encoding to a couple of strings instead of using the str function, which can't handle non-ASCII letters. Refs #256 - possibly fixed. Needs more testing. → <<cset 1216374ba525>>
-
reporter Added latin-1 decode, which refs #256 and looks to fix it, but wouldn't work for other character sets. → <<cset 26e8564ad802>>
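One way to cover other character sets would be to pick the codec from the file's Specific Character Set attribute (0008,0005) rather than hard-coding latin-1. The sketch below is an assumption about how that could look, not what the changeset does: the names DICOM_TO_PYTHON_CODEC and decode_dicom_text are hypothetical, the mapping is deliberately partial, and it is written for Python 3.

```python
# Partial mapping from DICOM Specific Character Set defined terms to Python
# codec names. Only a few single-byte sets and UTF-8 are listed here.
DICOM_TO_PYTHON_CODEC = {
    "ISO_IR 100": "latin-1",    # Latin alphabet No. 1
    "ISO_IR 101": "iso8859_2",  # Latin alphabet No. 2
    "ISO_IR 144": "iso8859_5",  # Cyrillic
    "ISO_IR 192": "utf-8",      # Unicode in UTF-8
}


def decode_dicom_text(raw, specific_character_set="", fallback="latin-1"):
    """Decode raw bytes using the codec matching the given character set.

    Hypothetical helper: unknown or empty character sets use the fallback.
    """
    codec = DICOM_TO_PYTHON_CODEC.get(specific_character_set.strip(), fallback)
    return raw.decode(codec, "replace")


print(decode_dicom_text(b"Sch\xe4del", "ISO_IR 100"))
print(decode_dicom_text(b"Sch\xc3\xa4del", "ISO_IR 192"))
```

Multi-valued character sets that use escape-sequence extensions are not handled by this sketch.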
-
reporter - changed status to open
-
reporter Corrected mistake that meant changes made in 26e8564ad802 would never work! Refs #256 and hopefully fixes it. → <<cset ae724b09b510>>
-
reporter - changed status to resolved
Confirmed working by Eivind 10th September by email.
-
reporter Made use of get_value_kw to cover the case of non-ASCII characters in series description. Refs #256. Also ensures that we don't store '' to an integer field. Refs #316 → <<cset 3af97675ec37>>
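The integer-field guard mentioned above could look roughly like this. It is a minimal sketch, not the code from the changeset: the name int_or_none is hypothetical, and it assumes the database column accepts NULL.

```python
def int_or_none(value):
    """Convert a value to int, mapping '' and None to None.

    Hypothetical helper: an empty string cannot be stored in an integer
    column, so it is replaced by None (NULL) here.
    """
    if value is None or value == "":
        return None
    return int(value)


print(int_or_none("12"))
print(int_or_none(""))
```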
Adding from __future__ import unicode_literals removes the error, but the import then fails silently (from the Google Groups discussion).