Non-ASCII strings in DICOM cause extractor routines to fail (again)

Issue #385 resolved
Eivind L created an issue

Import fails (for fluoroscopy) when the RDSR contains non-ASCII characters; I have confirmed that extraction works once these characters are removed. I think it may be due to the handling of strings in "Acquisition Protocol". This is quite similar to issue #256, so I guess the solution could be the same?

"remapp.extractors.rdsr.rdsr[] raised unexpected: ProgrammingError('You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings"

@edmcdonagh, I can email you the log and a sample RDSR if necessary.

(0.7.0b12)
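
For illustration, here is a minimal sketch of the kind of defensive decoding that avoids handing raw 8-bit byte strings to the database layer. It is not the actual OpenREM fix: the helper name `to_text` and the fallback list of encodings are assumptions made for this example. A real extractor would normally take the encoding from the DICOM Specific Character Set (0008,0005) attribute rather than guessing. The sketch uses Python 3 syntax.

```python
def to_text(value, encodings=("utf-8", "latin-1")):
    """Return `value` as a text string, decoding bytes if necessary.

    Hypothetical helper: tries each candidate encoding in order and
    falls back to ASCII with replacement characters if none succeed.
    """
    if isinstance(value, str):
        # Already text, nothing to do.
        return value
    if isinstance(value, bytes):
        for encoding in encodings:
            try:
                return value.decode(encoding)
            except UnicodeDecodeError:
                continue
        # Last resort: keep the ASCII part, mark everything else.
        return value.decode("ascii", errors="replace")
    # Non-string values (numbers, None, ...) are converted with str().
    return str(value)


# Example: a protocol name stored as Latin-1 bytes becomes text
# before it is written to the database.
acquisition_protocol = to_text(b"Thorax \xe5pen")
```

Applying a conversion like this at the point where strings are read from the dataset means the rest of the import code only ever sees text strings.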

Comments (9)

  1. Ed McDonagh

    Thanks for reporting this @leivind

    I've had a quick look to remind myself, and it seems that the fix last time was to modify get_value_keyword to handle non-ASCII strings. However, this time the strings aren't extracted using that function, so I'll have to look for the best way of fixing it.

    If you could forward me an RDSR that would be great!

  2. Ed McDonagh

    This should be fixed now, as the other extraction routines don't deal with strings from sequences.

    Just need to test to make sure I haven't broken anything.

  3. Ed McDonagh

    Tested by importing 50 CT, 9 fluoro and 2 mammo RDSRs in both this branch and the dev branch. The import times were not significantly different, and the web interface displays and the exports were identical.

    This won't have tested every field that is in the database, but should suffice for merging back into dev.
