Non-ASCII strings in DICOM cause extractor routines to fail (again)
Import fails (for fluoroscopy) with non-ASCII characters; I have confirmed that extraction works with these characters removed. I think it may be due to the handling of strings in "Acquisition Protocol". It looks quite similar to issue #256, so I guess the solution could be the same?
"remapp.extractors.rdsr.rdsr[] raised unexpected: ProgrammingError('You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings"
@edmcdonagh, I can email you the log and a sample RDSR if necessary.
(0.7.0b12)
Comments (9)
-
-
New safe_strings method for strings from sequences. Made get_seq_code_meaning non-ASCII safe too. Refs #385 → <<cset 70acaa3fb672>>
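For anyone following along, here is a rough sketch of the kind of helper described above. It is not the code from the changeset: the name `safe_strings_sketch`, the `char_set` parameter and the Latin-1 default are all assumptions made for illustration, and it is written for Python 3. The idea is simply to turn whatever comes out of a sequence item into text before it goes anywhere near the database, substituting a replacement character rather than raising.

```python
def safe_strings_sketch(value, char_set="latin-1"):
    """Hypothetical helper: return `value` as text without raising on odd bytes.

    `char_set` is an assumed parameter naming a Python codec; Latin-1 is only
    an illustrative default, not something taken from the actual fix.
    """
    if value is None:
        # Missing values become an empty string
        return ""
    if isinstance(value, bytes):
        try:
            return value.decode(char_set)
        except (UnicodeDecodeError, LookupError):
            # Undecodable bytes or an unknown codec name: fall back to UTF-8
            # and substitute U+FFFD for anything that cannot be decoded
            return value.decode("utf-8", errors="replace")
    # Already text (or some other type): coerce to str
    return str(value)


# Example: a byte string holding a Latin-1 encoded protocol name
protocol_name = safe_strings_sketch(b"Abdomen \xe5")
```

A value that is already text passes through unchanged, so a helper like this could be applied to every TextValue without special-casing the ASCII ones.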
-
Put all the TextValues in rdsr.py through safe_strings. Also put all the lists of if statements into elif's where appropriate. Refs #385 → <<cset d69f780622e3>>
-
This should be fixed now, as the other extraction routines don't deal with strings from sequences.
Just need to test to make sure I haven't broken anything.
-
Tested by importing 50 CT, 9 fluoro and 2 mammo RDSRs in both this branch and the dev branch. The import times were not significantly different, the web interface displays were identical and the exports were identical.
This won't have tested every field that is in the database, but should suffice for merging back into dev.
-
Added ref #385 to changes/CHANGES → <<cset 214a2757e255>>
-
Added ref #385 to release notes → <<cset 979c179ba080>>
-
- changed status to resolved
Merging fixes for non-ASCII strings in TextValue RDSR sequences into develop. Fixes #385 → <<cset be1b9e15ebd1>>
-
- changed milestone to 0.7.0
-
assigned issue to
Thanks for reporting this @leivind
I've had a quick look to remind myself, and it seems that the fix last time was to modify get_value_keyword to deal with non-ASCII. However, this time the strings aren't extracted using that function, so I'll have to look to see the best way of fixing it.
If you could forward me an RDSR that would be great!