Switch decoding fixes to use .decode()

Issue #503 resolved

Ed McDonagh created an issue 2017-06-04

Non-ASCII encoding issues have been tackled in various ways across the versions, addressing issues ~~#256~~, ~~#385~~, ~~#400~~, ~~#403~~, ~~#476~~.

Turns out I should have just added a simple ds.decode() as soon as the file was imported. I think.

This will mainly replace the decoding function added to get_value_kw() and related functions, and it should be better because the decoding is done once, at the start, and can cope with sequences that contain multiple encodings.
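
To illustrate the "decode once at import" idea, here is a minimal sketch in Python 3. It is not the project's code and not how pydicom implements `Dataset.decode()`: the dataset is modelled as a plain dict of raw bytes, and `DICOM_TO_PYTHON` and `decode_dataset` are hypothetical names made up for this example. It only shows why decoding up front lets each sequence item use its own Specific Character Set.

```python
# Hypothetical stand-in for a DICOM dataset: a dict whose text values are raw bytes.
# With a real file, pydicom's Dataset.decode() would be called on the loaded dataset.

# Small, illustrative subset of DICOM Specific Character Set terms and Python codecs.
DICOM_TO_PYTHON = {
    "": "ascii",             # no character set declared: default repertoire
    "ISO_IR 6": "ascii",
    "ISO_IR 100": "latin_1",
    "ISO_IR 144": "iso8859_5",
    "ISO_IR 192": "utf_8",
}


def decode_dataset(raw, inherited=""):
    """Return a copy of `raw` with every bytes value decoded to text.

    A sequence is modelled as a list of nested dicts. Each item may declare
    its own SpecificCharacterSet; otherwise it inherits the parent's.
    """
    charset = raw.get("SpecificCharacterSet", inherited)
    codec = DICOM_TO_PYTHON[charset]
    decoded = {}
    for key, value in raw.items():
        if isinstance(value, bytes):
            decoded[key] = value.decode(codec)
        elif isinstance(value, list):
            decoded[key] = [decode_dataset(item, charset) for item in value]
        else:
            decoded[key] = value
    return decoded


raw = {
    "SpecificCharacterSet": "ISO_IR 100",
    "PatientName": b"M\xfcller^J\xfcrgen",
    "ProcedureCodeSequence": [
        {
            "SpecificCharacterSet": "ISO_IR 192",
            "CodeMeaning": "Thorax \u00fcbersicht".encode("utf-8"),
        },
    ],
}

ds = decode_dataset(raw)
```

After this single pass every value is text, so later helpers such as get_value_kw() would not need a char_set argument.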

Comments (37)

Ed McDonagh reporter
Refactored get_value_kw to remove the char_set part of the signature. Refs ~~#503~~

→ <<cset cabeebc8ce59>>
- 2017-06-05T11:48:42+00:00
Ed McDonagh reporter
Removed the decoding function from get_value_kw. Refs ~~#503~~

→ <<cset ac67bcc80d93>>
- 2017-06-05T11:48:42+00:00
Ed McDonagh reporter
Adding in comments - not functioning yet. Refs ~~#503~~

→ <<cset 9acb67960372>>
- 2017-06-05T11:48:42+00:00
Ed McDonagh reporter
Now works for mammo, committing to test with pipelines and postgres. Lots of comments to be removed. Possibly impossible to test unicode values in tests, as everything is unicode. Could insert concatenation in extractor to test instead... Refs ~~#503~~

→ <<cset e809b6018a15>>
- 2017-06-05T13:39:27+00:00
Ed McDonagh reporter
Strangely, at e809b6018a15 I get two export failures with my PC setup with SQLite3, seven with pipelines and none on my laptop with Postgres...
- 2017-06-05T13:54:01+00:00
Ed McDonagh reporter
Following advice from QuantifiedCode to use a tuple for the isinstance comparison rather than several OR statements. Refs ~~#503~~

→ <<cset 3e38027a79d4>>
- 2017-06-05T16:02:10+00:00
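
The isinstance change mentioned above can be sketched as follows. The function names and the set of types checked are assumptions chosen for illustration (written for Python 3); the actual types tested in the commit are not shown in this thread.

```python
def is_text_with_or(value):
    # Original style: one isinstance call per type, chained with "or".
    return (
        isinstance(value, str)
        or isinstance(value, bytes)
        or isinstance(value, bytearray)
    )


def is_text_with_tuple(value):
    # Suggested style: isinstance accepts a tuple of types and
    # returns True if the value is an instance of any of them.
    return isinstance(value, (str, bytes, bytearray))


samples = ["abc", b"abc", bytearray(b"abc"), 42, None, 1.5]
results = [(is_text_with_or(s), is_text_with_tuple(s)) for s in samples]
```

Both functions give the same answer for every sample; the tuple form is simply shorter and easier to extend.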
Ed McDonagh reporter
Removed the comments, added decode to each extract routine. Fails with incorrect use of float for Carestream. Refs ~~#503~~

→ <<cset 6a32a2b46386>>
- 2017-06-05T17:10:20+00:00
Ed McDonagh reporter
Moved Kodak float fix to new function carried out at start of import if decode fails. Removed original test, replaced with file import based test of the same. Refs ~~#503~~

→ <<cset 9a2a6aa2419e>>
- 2017-06-05T21:32:12+00:00
Ed McDonagh reporter
Refactored get_value_num, get_seq_code_meaning to remove char_set arg, +PEP8 changes. Refs ~~#503~~

→ <<cset 7fa13a7eebc3>>
- 2017-06-06T17:24:48+00:00
Ed McDonagh reporter
Unicoded ct_philips strings. minor changes to get_values. Refs ~~#503~~

→ <<cset b950c01dd40e>>
- 2017-06-07T07:46:00+00:00
Ed McDonagh reporter
Unicoded dx.py. Refs ~~#503~~

→ <<cset 3b72c3ebd1ab>>
- 2017-06-07T22:06:33+00:00
Ed McDonagh reporter
Unicoded rdsr.py. Refs ~~#503~~

→ <<cset 890d275622c4>>
- 2017-06-07T22:13:23+00:00
Ed McDonagh reporter
Started unicoding mam.py. Added some "this file is utf-8" strings. Refs ~~#503~~

→ <<cset 256752e3e2cf>>
- 2017-06-07T22:17:43+00:00
Ed McDonagh reporter
Unicoded all the strings in dx_export.py. Refs ~~#503~~

→ <<cset 74083c9957fc>>
- 2017-06-10T22:09:42+00:00
Jamie Dormand
Amended path creation for test files post merge with ref ~~#503~~. Refs ~~#508~~

→ <<cset 496a521a55ae>>
- 2017-06-12T16:17:32+00:00
Ed McDonagh reporter
Unicoded all the strings in export_csv.py. Refs ~~#503~~

→ <<cset 1b41b4f179d8>>
- 2017-06-12T17:17:53+00:00
Ed McDonagh reporter
Unicoded all the strings in exportviews.py. Refs ~~#503~~

→ <<cset 0a708b0f4549>>
- 2017-06-12T17:30:27+00:00
Ed McDonagh reporter
Added utf-8 declaration Refs ~~#503~~

→ <<cset 384b098f75c2>>
- 2017-06-12T17:30:27+00:00
Ed McDonagh reporter
Added utf-8 declaration Refs ~~#503~~

→ <<cset dabe3f8c68e6>>
- 2017-06-13T17:07:26+00:00
Ed McDonagh reporter
Unicoded rf_export, some PEP8 changes. Refs ~~#503~~

→ <<cset 180a4564e824>>
- 2017-06-13T17:07:26+00:00
Ed McDonagh reporter
a little more unicoded rf_export, some PEP8 changes. Refs ~~#503~~

→ <<cset b5654d177c82>>
- 2017-06-13T17:16:23+00:00
Ed McDonagh reporter
Unicoded xlsx.py, plus some PEP8. Refs ~~#503~~

→ <<cset d8a052a04117>>
- 2017-06-14T08:41:03+00:00
Ed McDonagh reporter
Unicoded strings in mod_filters.py. Refs ~~#503~~

→ <<cset c9e344f27c8a>>
- 2017-06-14T11:51:32+00:00
Ed McDonagh reporter
Unicoded strings in dicomviews.py, tiny bit of PEP8. Refs ~~#503~~

→ <<cset 993d40b22af2>>
- 2017-06-14T17:03:23+00:00
Ed McDonagh reporter
Unicoded strings in keepalive.py. Refs ~~#503~~

→ <<cset 4ca6c13be492>>
- 2017-06-14T17:03:23+00:00
Ed McDonagh reporter
Unicoded strings in qrscu.py, added utf-8 line to keepalive.py. Refs ~~#503~~

→ <<cset b4123d1300f1>>
- 2017-06-14T21:12:56+00:00
Ed McDonagh reporter
Unicoded strings in storescp.py. Refs ~~#503~~

→ <<cset 5c03b651a50b>>
- 2017-06-15T07:27:32+00:00
Ed McDonagh reporter
Unicoded strings in tools.py. Refs ~~#503~~

→ <<cset a7d30768ca3d>>
- 2017-06-15T07:27:32+00:00
Ed McDonagh reporter
Added utf8 statement to check_uid and dcmdatetime. Refs ~~#503~~. Export_safe from get_values needs to be removed I think.

→ <<cset 8eb342f7c347>>
- 2017-06-15T08:22:16+00:00
Ed McDonagh reporter
Established that utf-8 encoding is essential for csv export. Renamed export_safe to export_csv_prep and added docstrings accordingly. Refs ~~#503~~

→ <<cset 4685eb8e9411>>
- 2017-06-15T13:42:48+00:00
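
As a rough illustration of why the CSV export needs an explicit utf-8 step, here is a Python 3 sketch. It is not the project's export_csv_prep, whose body is not shown in this thread; `rows_to_utf8_csv` is a hypothetical helper. The assumption is that the original code ran on Python 2, where the csv module works on byte strings and text must be encoded before writing; in Python 3 the encoding happens when the text is turned into bytes for the file.

```python
import csv
import io


def rows_to_utf8_csv(rows):
    """Serialise rows of text values to CSV and return UTF-8 encoded bytes."""
    buffer = io.StringIO(newline="")  # let the csv module control line endings
    writer = csv.writer(buffer)
    writer.writerows(rows)
    # Encode once, at the point where text becomes bytes for the export file.
    return buffer.getvalue().encode("utf-8")


rows = [
    ["Patient name", "Protocol"],
    ["M\u00fcller^J\u00fcrgen", "Thorax"],
]
payload = rows_to_utf8_csv(rows)
```

Decoding `payload` with utf-8 gives back the original names, whereas decoding it with an ASCII codec would fail on the non-ASCII characters.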
Ed McDonagh reporter
Removing erroneous 'u'. Refs ~~#503~~

→ <<cset cc067228eb5e>>
- 2017-06-15T13:45:59+00:00
Ed McDonagh reporter
Added utf-8 statement to hash_id.py, deleted some commented code. Refs ~~#503~~

→ <<cset 40869770e18f>>
- 2017-06-15T21:24:48+00:00
Ed McDonagh reporter
utf-8 statement plus mix of unicoding and PEP8 for make_skin_map.py. Refs ~~#503~~

→ <<cset 40ee8ed79cd7>>
- 2017-06-15T21:24:48+00:00
Ed McDonagh reporter
utf-8 statement, unicoding for not_patient_indicators.py. Refs ~~#503~~. Moved indicators to settings, needs to be moved to the database as per ~~#510~~.

→ <<cset 95abf35d2031>>
- 2017-06-15T21:24:48+00:00
Ed McDonagh reporter
utf-8 statement, unicoding for launcher scripts. Refs ~~#503~~.

→ <<cset 34fe046fc0e0>>
- 2017-06-19T07:52:58+00:00
Ed McDonagh reporter
- changed status to resolved
Updating changes. Refs or fixes ~~#464~~, ~~#476~~, ~~#498~~, ~~#503~~, ~~#504~~, ~~#505~~, ~~#508~~, ~~#509~~, ~~#511~~

→ <<cset 87ef1ac19991>>
- 2017-06-22T21:45:24+00:00
Ed McDonagh reporter
Mainly file encoding ref ~~#503~~

→ <<cset 4c5ed7823242>>
- 2017-07-11T14:32:47+00:00

Assignee: Ed McDonagh

Type: proposal

Priority: major

Status: resolved

Component: Import: All

Milestone: 0.8.0

Votes: 0

Watchers: 1