
Issue #256 resolved

Non-ASCII strings in DICOM cause extractor routines to fail

Ed McDonagh
created an issue

As reported by Samuli Hel: Google Group discussion

A protocol name containing ä causes the error: You must not use 8-bit bytestring.

Comments (8)

  1. Ed McDonagh reporter

    From Eivind https://groups.google.com/forum/#!topic/openrem/rh55ulcQPl8

    1. Adding from __future__ import unicode_literals to rdsr.py somehow makes the if dataset.SOPClassUID .. check (near the end of rdsr.py) fail, so extraction fails silently.

      • Solution: appending [:] to SOPClassUID, i.e. SOPClassUID[:] (both instances), lets the script run, but it still eventually fails (due to the "strange" characters).
    2. get_value_kw(tag,dataset) in get_values.py must be modified to handle both strings and bytes, and to encode characters properly.

      • Solution: after def get_value_kw(tag,dataset), add
    # guarantee byte string in UTF-8 encoding
    _u8 = lambda t: t.encode('UTF-8', 'replace') if isinstance(t, unicode) else t


    and after the if value != '': line, add

    value = format(_u8(value))
    return value.decode('latin-1', 'replace')
    

    (I guess replacing "latin-1" with another encoding would also work.) I tried this using PostgreSQL (default settings), and it seems to work quite well.
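
The idea in point 2, accepting either byte strings or text and always handing text back, can be sketched roughly as below. This is only an illustration written for Python 3 (the snippet above is Python 2). The helper name to_text and the default "latin-1" encoding are assumptions and not part of OpenREM. In a real extractor the encoding would more properly come from the DICOM SpecificCharacterSet attribute.

```python
def to_text(value, encoding="latin-1"):
    """Return value as a text string.

    Hypothetical helper, not part of OpenREM. Byte strings are decoded
    with `encoding` (an assumed default), and bytes that cannot be
    decoded are replaced rather than raising an error.
    """
    if isinstance(value, bytes):
        return value.decode(encoding, "replace")
    # Already text (or a non-string value such as a number).
    return str(value)


# A protocol name containing "ä", once as Latin-1 bytes and once as text.
name_bytes = b"P\xe4\xe4 rutiini"
name_text = "P\u00e4\u00e4 rutiini"
assert to_text(name_bytes) == to_text(name_text)
```

Decoding with the encoding the bytes were actually written in matters: Latin-1 bytes decoded as UTF-8 (or the reverse) would yield replacement characters or garbled text rather than the original name.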
