Switch decoding fixes to use .decode()
Non-ASCII encoding problems have been tackled in various ways across versions, in issues #256, #385, #400, #403 and #476.
It turns out I should have just called ds.decode() as soon as the file was imported. I think.
This will mainly replace the decoding function added to get_value_kw() and related functions. It should be better because it is done once at the start, and it can cope with sequences that use multiple encodings.
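A minimal sketch of the "decode once, up front" idea, for illustration only. In pydicom the real work is done by Dataset.decode(); the toy below just mimics the behaviour with plain dicts so it runs without any DICOM files. The decode_all helper, the dict layout, the sample values and the small CODECS table are all assumptions, not code from this project.

```python
# Toy stand-in for a DICOM dataset: a dict whose values are raw bytes,
# lists of nested dicts (sequences), or values that need no decoding.
# DICOM Specific Character Set terms mapped to Python codec names
# (a small illustrative subset only).
CODECS = {
    "ISO_IR 6": "ascii",
    "ISO_IR 100": "latin_1",
    "ISO_IR 192": "utf_8",
}


def decode_all(dataset, inherited="ISO_IR 6"):
    """Decode every bytes value once, honouring per-sequence charsets."""
    charset = dataset.get("SpecificCharacterSet", inherited)
    codec = CODECS[charset]
    decoded = {}
    for key, value in dataset.items():
        if isinstance(value, bytes):
            decoded[key] = value.decode(codec)
        elif isinstance(value, list):
            # A sequence: each item may declare its own character set,
            # otherwise it inherits the enclosing one.
            decoded[key] = [decode_all(item, charset) for item in value]
        else:
            decoded[key] = value
    return decoded


raw = {
    "SpecificCharacterSet": "ISO_IR 100",
    "InstitutionName": "H\xf4pital".encode("latin_1"),
    "RequestAttributesSequence": [
        {
            "SpecificCharacterSet": "ISO_IR 192",
            "RequestedProcedureDescription": "Thorax \u00fcbersicht".encode("utf_8"),
        }
    ],
}
ds = decode_all(raw)
```

After this single pass, everything downstream can treat the values as text, so the getter functions no longer need a character set argument.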
Comments (37)
-
reporter Removed the decoding function from get_value_kw. Refs
#503→ <<cset ac67bcc80d93>>
-
reporter Adding in comments - not functioning yet. Refs
#503→ <<cset 9acb67960372>>
-
reporter Now works for mammo, committing to test with pipelines and postgres. Lots of comments to be removed. It may be impossible to test unicode values in the tests, as everything there is unicode. Could insert concatenation in the extractor to test instead... Refs
#503→ <<cset e809b6018a15>>
-
reporter Strangely, at e809b6018a15 I get two export failures with my PC setup with SQLite3, seven with pipelines and none on my laptop with Postgres...
-
reporter Following advice from QuantifiedCode to use a tuple for the isinstance comparison rather than several OR statements. Refs
#503→ <<cset 3e38027a79d4>>
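The tuple form of isinstance mentioned in this commit looks roughly like the sketch below. The types actually checked in the changeset are not shown here, so str and bytes (Python 3) and the function names are assumptions chosen for illustration.

```python
def is_text(value):
    """Return True for str or bytes values."""
    # One isinstance call with a tuple of candidate types...
    return isinstance(value, (str, bytes))


def is_text_verbose(value):
    """Same check written as separate calls joined with `or`."""
    # ...is equivalent to chaining individual isinstance checks.
    return isinstance(value, str) or isinstance(value, bytes)
```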
-
reporter Removed the comments, added decode to each extract routine. Fails with incorrect use of float for Carestream. Refs
#503→ <<cset 6a32a2b46386>>
-
reporter Moved Kodak float fix to new function carried out at start of import if decode fails. Removed original test, replaced with file import based test of the same. Refs
#503→ <<cset 9a2a6aa2419e>>
-
reporter Refactored get_value_num, get_seq_code_meaning to remove char_set arg, +PEP8 changes. Refs
#503→ <<cset 7fa13a7eebc3>>
-
reporter Unicoded ct_philips strings. minor changes to get_values. Refs
#503→ <<cset b950c01dd40e>>
-
reporter Unicoded dx.py. Refs
#503→ <<cset 3b72c3ebd1ab>>
-
reporter Unicoded rdsr.py. Refs
#503→ <<cset 890d275622c4>>
-
reporter Started unicoding mam.py. Added some "this file is utf-8" declarations. Refs
#503→ <<cset 256752e3e2cf>>
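For reference, a sketch of what a utf-8 declaration and a "unicoded" string typically look like. The variable name and the text are made up for illustration and are not taken from mam.py.

```python
# -*- coding: utf-8 -*-
# The declaration above (PEP 263) tells the interpreter how the source file
# is encoded. It must sit on the first or second line of the file. Python 2
# assumed ASCII without it; Python 3 assumes UTF-8 by default.

# The u prefix marks a unicode literal. It was needed on Python 2 and is
# still accepted on Python 3.3 and later, where it changes nothing.
label = u"Dose glandulaire moyenne, été"
```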
-
reporter Unicoded all the strings in dx_export.py. Refs
#503→ <<cset 74083c9957fc>>
-
Amended path creation for test files post merge with ref
#503. Refs #508→ <<cset 496a521a55ae>>
-
reporter Unicoded all the strings in export_csv.py. Refs
#503→ <<cset 1b41b4f179d8>>
-
reporter Unicoded all the strings in exportviews.py. Refs
#503→ <<cset 0a708b0f4549>>
-
reporter Added utf-8 declaration Refs
#503→ <<cset 384b098f75c2>>
-
reporter Added utf-8 declaration Refs
#503→ <<cset dabe3f8c68e6>>
-
reporter Unicoded rf_export, some PEP8 changes. Refs
#503→ <<cset 180a4564e824>>
-
reporter A little more unicoding of rf_export, some PEP8 changes. Refs
#503→ <<cset b5654d177c82>>
-
reporter Unicoded xlsx.py, plus some PEP8. Refs
#503→ <<cset d8a052a04117>>
-
reporter Unicoded strings in mod_filters.py. Refs
#503→ <<cset c9e344f27c8a>>
-
reporter Unicoded strings in dicomviews.py, tiny bit of PEP8. Refs
#503→ <<cset 993d40b22af2>>
-
reporter Unicoded strings in keepalive.py. Refs
#503→ <<cset 4ca6c13be492>>
-
reporter Unicoded strings in qrscu.py, added utf-8 line to keepalive.py. Refs
#503→ <<cset b4123d1300f1>>
-
reporter Unicoded strings in storescp.py. Refs
#503→ <<cset 5c03b651a50b>>
-
reporter Unicoded strings in tools.py. Refs
#503→ <<cset a7d30768ca3d>>
-
reporter Added utf8 statement to check_uid and dcmdatetime. Refs
#503. Export_safe from get_values needs to be removed I think.→ <<cset 8eb342f7c347>>
-
reporter Established utf-8 encoding is essential for csv export. Renamed export_safe to export_csv_prep and added docstrings accordingly. Refs
#503→ <<cset 4685eb8e9411>>
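As a rough illustration of why the encoding matters at the CSV boundary, here is a Python 3 standard-library sketch that writes text rows out as UTF-8 bytes. It is not the project's export_csv_prep, whose body is not shown here; write_utf8_csv and the sample rows are hypothetical.

```python
import csv
import io


def write_utf8_csv(rows, binary_stream):
    """Write rows of text to a binary stream as UTF-8 encoded CSV."""
    # newline="" lets the csv module control line endings itself.
    text_stream = io.TextIOWrapper(binary_stream, encoding="utf-8", newline="")
    writer = csv.writer(text_stream)
    writer.writerows(rows)
    text_stream.flush()
    text_stream.detach()  # leave binary_stream open for the caller


buffer = io.BytesIO()
write_utf8_csv([["Institution", "Operator"], ["H\u00f4pital", "M\u00fcller"]], buffer)
csv_bytes = buffer.getvalue()
```

The text stays as unicode inside the application and is only turned into bytes at the point of export, which keeps the encoding decision in one place.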
-
reporter Removing erroneous 'u'. Refs
#503→ <<cset cc067228eb5e>>
-
reporter Added utf-8 statement to hash_id.py, deleted some commented code. Refs
#503→ <<cset 40869770e18f>>
-
reporter utf-8 statement plus mix of unicoding and PEP8 for make_skin_map.py. Refs
#503→ <<cset 40ee8ed79cd7>>
-
reporter utf-8 statement, unicoding for not_patient_indicators.py. Refs
#503. Moved indicators to settings; needs to be moved to the database as per #510.→ <<cset 95abf35d2031>>
-
reporter utf-8 statement, unicoding for launcher scripts. Refs
#503.→ <<cset 34fe046fc0e0>>
-
reporter - changed status to resolved
-
reporter Mainly file encoding ref
#503→ <<cset 4c5ed7823242>>
-
Refactored get_value_kw to remove the char_set part of the signature. Refs
#503→ <<cset cabeebc8ce59>>