Radiographic csv export is full of extra line breaks and each field is surrounded by b''
This may also affect other modalities.
-
reporter -
reporter If it is an xlsx file then use binary mode; otherwise use text mode. This removes the spurious b characters from the csv exports. However, there remains an extra line break after every row. References issue #797 → <<cset 1acb526b8437>>
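A likely explanation for the remaining extra line break, consistent with the later suspicion that this is a Windows thing but not confirmed in this thread: csv.writer ends every row with '\r\n' (the excel dialect's line terminator), and a text-mode file on Windows translates each '\n' into '\r\n' unless it is opened with newline='', so rows end in '\r\r\n', which spreadsheet programs show as a blank row. The sketch below simulates the Windows translation on any platform by passing newline='\r\n' explicitly; the helper name write_rows and the sample rows are illustrative only.

```python
import csv
import os
import tempfile


def write_rows(path, rows, newline):
    """Write rows with csv.writer and return the raw bytes that reached disk."""
    with open(path, "w", encoding="utf-8", newline=newline) as csv_file:
        # The default 'excel' dialect terminates each row with '\r\n'.
        csv.writer(csv_file).writerows(rows)
    with open(path, "rb") as raw_file:
        return raw_file.read()


rows = [["Institution", "Manufacturer"], ["LINCOLN COUNTY HOSPITAL", "TOSHIBA"]]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "export.csv")
    # newline="\r\n" reproduces Windows text-mode translation on any OS:
    # the '\n' in each '\r\n' row ending becomes '\r\n', so rows end in '\r\r\n'.
    windows_like = write_rows(path, rows, newline="\r\n")
    # newline="" turns translation off, as the csv module documentation advises.
    fixed = write_rows(path, rows, newline="")

print(windows_like)
print(fixed)
```

The same newline='' argument is what the csv module documentation recommends for any file object handed to csv.writer.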
-
reporter This now works correctly for mammo csv exports, but other modalities have the b characters again... References issue #797 → <<cset a901426af080>>
-
Unless you are really keen, @David Platten, I’ll take this on.
-
reporter @Ed McDonagh: very happy for you to take this one on. I suspect it may be a Windows thing, so I’ll happily test any changes that you make.
-
Removing explicit encoding to utf-8 (everything is utf-8 now). Removing legacy imports. Needs thorough testing. Refs #797 → <<cset bfeb4b07fe48>>
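The b'...' wrapping seen in the exports is what Python 3 produces when str() is applied to a bytes object: it returns the bytes repr instead of decoding it, and csv.writer calls str() on any non-str field. A minimal sketch, assuming the legacy Python 2 code encoded each value to UTF-8 before the str(data_string) call that appears in the traceback further down; the exact legacy code is not shown in this thread.

```python
import csv
import io

value = "LINCOLN COUNTY HOSPITAL"

# Python 2 style: encode to UTF-8 first. In Python 3 this yields bytes, and
# str() of a bytes object is its repr, including the b'' wrapper.
legacy_cell = str(value.encode("utf-8"))
# Python 3 style: the value is already a Unicode str, so leave it alone.
fixed_cell = str(value)

buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
# csv.writer also stringifies raw bytes with str(), giving the same b'' result.
writer.writerow([legacy_cell, "TOSHIBA".encode("utf-8")])
writer.writerow([fixed_cell, "TOSHIBA"])
print(buffer.getvalue())
```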
-
Added excel dialect - makes no difference for LibreOffice/Linux, might help with non-ASCII on Excel/Windows. Refs #797 → <<cset 6904fca96819>>
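dialect='excel' is already what csv.writer uses when no dialect is given, which would explain why it made no difference on LibreOffice/Linux. The dialect controls the delimiter, quoting and line terminator, not the character encoding, so on its own it probably won't change how Excel reads non-ASCII text; that depends on how the file is encoded. A quick check, where render is a throwaway helper for this illustration and the sample row is made up:

```python
import csv
import io


def render(rows, **writer_kwargs):
    """Return the CSV text csv.writer produces for rows with the given options."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer, **writer_kwargs).writerows(rows)
    return buffer.getvalue()


rows = [["Institution", "Study date"], ["Hôpital Nord, Site 2", "2019-05-09"]]

implicit = render(rows)
explicit = render(rows, dialect="excel")

print(implicit == explicit)            # True: 'excel' is the default dialect
print(repr(csv.excel.lineterminator))  # '\r\n'
print(csv.excel.delimiter, csv.excel.quotechar)
```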
-
Fixing import error. Also have version error with django-qsstats-magic on local testing, not sure why. Refs #797 → <<cset 42882b60c771>>
-
Correcting spacing error, adding import-outside-toplevel to excludes on pylint. Refs #797 → <<cset 91c94af87faa>>
-
reporter I’ve just tried exporting some radiographic data to a csv file. There are no b characters surrounding the data, but there are blank rows between each row of data.
I also tried exporting one of the mammography test studies that includes unicode characters. This resulted in an error in my default.log:
[2019-12-06 14:44:57,589: WARNING/MainProcess] --- Logging error ---
[2019-12-06 14:44:57,590: WARNING/MainProcess] Traceback (most recent call last):
[2019-12-06 14:44:57,590: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\app\trace.py", line 382, in trace_task
R = retval = fun(*args, **kwargs)
[2019-12-06 14:44:57,591: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\app\trace.py", line 641, in __protected_call__
return self.run(*args, **kwargs)
[2019-12-06 14:44:57,591: WARNING/MainProcess] File "D:\code\python\openrem\openrem\remapp\exports\mg_export.py", line 303, in exportMG2excel
writer.writerow([str(data_string) for data_string in series_data])
[2019-12-06 14:44:57,591: WARNING/MainProcess] File "C:\Python37\lib\tempfile.py", line 481, in func_wrapper
return func(*args, **kwargs)
[2019-12-06 14:44:57,592: WARNING/MainProcess] File "C:\Python37\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
[2019-12-06 14:44:57,592: WARNING/MainProcess] UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-3: character maps to <undefined>
[2019-12-06 14:44:57,592: WARNING/MainProcess] During handling of the above exception, another exception occurred:
[2019-12-06 14:44:57,593: WARNING/MainProcess] Traceback (most recent call last):
[2019-12-06 14:44:57,593: WARNING/MainProcess] File "C:\Python37\lib\logging\__init__.py", line 1028, in emit
stream.write(msg + self.terminator)
[2019-12-06 14:44:57,593: WARNING/MainProcess] File "C:\Python37\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
[2019-12-06 14:44:57,593: WARNING/MainProcess] UnicodeEncodeError: 'charmap' codec can't encode characters in position 178-181: character maps to <undefined>
-
Thanks David. That’s useful feedback - Windows Python is obviously trying to stick to stupid code page 1252, which is mostly ASCII plus Latin-1. That won’t work very well for Chinese characters!
I’ll see if I can work out where we need to specify utf-8 without turning the strings into bytecode.
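The 'charmap' errors in the log can be reproduced without OpenREM: cp1252 has no code points for CJK characters, so encoding them fails with the same "character maps to <undefined>" reason, while UTF-8 handles them. The helper try_encode and the sample text are made up for this illustration.

```python
def try_encode(text, encoding):
    """Return (bytes, None) on success, or (None, error) if the codec cannot encode text."""
    try:
        return text.encode(encoding), None
    except UnicodeEncodeError as error:
        return None, error


sample = "北京 Hospital"  # mixed CJK and ASCII sample text

cp1252_bytes, cp1252_error = try_encode(sample, "cp1252")
utf8_bytes, utf8_error = try_encode(sample, "utf-8")

print(cp1252_error.reason)                    # character maps to <undefined>
print(utf8_bytes.decode("utf-8") == sample)   # True
```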
-
Specifying encoding of temporary file. On Ubuntu, defaults to UTF-8, maybe not on Windows. Refs #797
Left print statement in, should confirm format. With or without explicit encoding, prints temp_file is <_io.TextIOWrapper name=35 mode='w+' encoding='utf-8'> on Ubuntu. → <<cset d5a708160617>>
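A sketch of the temporary-file change described in this commit, assuming the export writes CSV text into a tempfile.TemporaryFile: opening it in text mode with an explicit encoding='utf-8' (and newline='' for the csv module) avoids relying on the platform default, which on Windows is typically cp1252. As the next test on Windows shows, this alone moves the failure further along rather than fixing it. The variable names and sample rows are illustrative, not OpenREM's.

```python
import csv
import tempfile

rows = [["Institution", "Display name"], ["北京", "Lincoln RT sim"]]

# Text mode, explicit UTF-8, and no newline translation for the csv module.
with tempfile.TemporaryFile(mode="w+", encoding="utf-8", newline="") as temp_file:
    writer = csv.writer(temp_file)
    writer.writerows(rows)

    temp_file.seek(0)
    contents = temp_file.read()
    print(temp_file.encoding)  # utf-8

print(contents.splitlines()[0])  # Institution,Display name
```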
-
@David Platten - doing a minimal test in Python 3.8 on Windows suggests that adding the encoding to the TemporaryFile fixes that error, though I didn’t see it all the way through to writing to disk.
Are you in a position to test this version of the code on a Windows machine?
-
reporter @Ed McDonagh I’ve tested this version of the code; it still breaks. Full log below:
[2019-12-10 08:26:06,351: WARNING/MainProcess] temp_file is <tempfile._TemporaryFileWrapper object at 0x00000289B3463CC8>
[2019-12-10 08:26:06,437: WARNING/MainProcess] --- Logging error ---
[2019-12-10 08:26:06,439: WARNING/MainProcess] Traceback (most recent call last):
[2019-12-10 08:26:06,439: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\app\trace.py", line 382, in trace_task
R = retval = fun(*args, **kwargs)
[2019-12-10 08:26:06,439: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\app\trace.py", line 641, in __protected_call__
return self.run(*args, **kwargs)
[2019-12-10 08:26:06,439: WARNING/MainProcess] File "D:\code\python\openrem\openrem\remapp\exports\mg_export.py", line 347, in exportMG2excel
write_export(tsk, export_filename, tmpfile, datestamp)
[2019-12-10 08:26:06,439: WARNING/MainProcess] File "D:\code\python\openrem\openrem\remapp\exports\export_common.py", line 549, in write_export
task.filename.save(filename, File(temp_file))
[2019-12-10 08:26:06,440: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\django\db\models\fields\files.py", line 87, in save
self.name = self.storage.save(name, content, max_length=self.field.max_length)
[2019-12-10 08:26:06,440: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\django\core\files\storage.py", line 52, in save
return self._save(name, content)
[2019-12-10 08:26:06,440: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\django\core\files\storage.py", line 274, in _save
_file.write(chunk)
[2019-12-10 08:26:06,440: WARNING/MainProcess] File "C:\Python37\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
[2019-12-10 08:26:06,440: WARNING/MainProcess] UnicodeEncodeError: 'charmap' codec can't encode characters in position 399-402: character maps to <undefined>
[2019-12-10 08:26:06,440: WARNING/MainProcess] During handling of the above exception, another exception occurred:
[2019-12-10 08:26:06,441: WARNING/MainProcess] Traceback (most recent call last):
[2019-12-10 08:26:06,441: WARNING/MainProcess] File "C:\Python37\lib\logging\__init__.py", line 1028, in emit
stream.write(msg + self.terminator)
[2019-12-10 08:26:06,441: WARNING/MainProcess] File "C:\Python37\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
[2019-12-10 08:26:06,441: WARNING/MainProcess] UnicodeEncodeError: 'charmap' codec can't encode characters in position 581-584: character maps to <undefined>
[2019-12-10 08:26:06,441: WARNING/MainProcess] Call stack:
[2019-12-10 08:26:06,443: WARNING/MainProcess] File "C:\Python37\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
[2019-12-10 08:26:06,443: WARNING/MainProcess] File "C:\Python37\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
[2019-12-10 08:26:06,443: WARNING/MainProcess] File "c:\pythonVirtualEnvs\openrem-37\Scripts\celery.exe\__main__.py", line 7, in <module>
sys.exit(main())
[2019-12-10 08:26:06,443: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\__main__.py", line 16, in main
_main()
[2019-12-10 08:26:06,444: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\bin\celery.py", line 322, in main
cmd.execute_from_commandline(argv)
[2019-12-10 08:26:06,444: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\bin\celery.py", line 496, in execute_from_commandline
super(CeleryCommand, self).execute_from_commandline(argv)))
[2019-12-10 08:26:06,444: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\bin\base.py", line 275, in execute_from_commandline
return self.handle_argv(self.prog_name, argv[1:])
[2019-12-10 08:26:06,444: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\bin\celery.py", line 488, in handle_argv
return self.execute(command, argv)
[2019-12-10 08:26:06,444: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\bin\celery.py", line 420, in execute
).run_from_argv(self.prog_name, argv[1:], command=argv[0])
[2019-12-10 08:26:06,444: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\bin\worker.py", line 223, in run_from_argv
return self(*args, **options)
[2019-12-10 08:26:06,444: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\bin\base.py", line 238, in __call__
ret = self.run(*args, **kwargs)
[2019-12-10 08:26:06,445: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\bin\worker.py", line 258, in run
worker.start()
[2019-12-10 08:26:06,445: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\worker\worker.py", line 205, in start
self.blueprint.start(self)
[2019-12-10 08:26:06,445: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\bootsteps.py", line 119, in start
step.start(parent)
[2019-12-10 08:26:06,445: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\bootsteps.py", line 369, in start
return self.obj.start()
[2019-12-10 08:26:06,445: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\worker\consumer\consumer.py", line 317, in start
blueprint.start(self)
[2019-12-10 08:26:06,446: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\bootsteps.py", line 119, in start
step.start(parent)
[2019-12-10 08:26:06,446: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\worker\consumer\consumer.py", line 593, in start
c.loop(*c.loop_args())
[2019-12-10 08:26:06,446: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\worker\loops.py", line 121, in synloop
connection.drain_events(timeout=2.0)
[2019-12-10 08:26:06,446: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\kombu\connection.py", line 315, in drain_events
return self.transport.drain_events(self.connection, **kwargs)
[2019-12-10 08:26:06,446: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\kombu\transport\pyamqp.py", line 103, in drain_events
return connection.drain_events(**kwargs)
[2019-12-10 08:26:06,446: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\amqp\connection.py", line 505, in drain_events
while not self.blocking_read(timeout):
[2019-12-10 08:26:06,446: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\amqp\connection.py", line 511, in blocking_read
return self.on_inbound_frame(frame)
[2019-12-10 08:26:06,447: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\amqp\method_framing.py", line 79, in on_frame
callback(channel, msg.frame_method, msg.frame_args, msg)
[2019-12-10 08:26:06,447: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\amqp\connection.py", line 518, in on_inbound_method
method_sig, payload, content,
[2019-12-10 08:26:06,447: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\amqp\abstract_channel.py", line 145, in dispatch_method
listener(*args)
[2019-12-10 08:26:06,447: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\amqp\channel.py", line 1615, in _on_basic_deliver
fun(msg)
[2019-12-10 08:26:06,447: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\kombu\messaging.py", line 624, in _receive_callback
return on_m(message) if on_m else self.receive(decoded, message)
[2019-12-10 08:26:06,447: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\worker\consumer\consumer.py", line 567, in on_task_received
callbacks,
[2019-12-10 08:26:06,447: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\worker\strategy.py", line 200, in task_message_handler
handle(req)
[2019-12-10 08:26:06,447: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\worker\worker.py", line 228, in _process_task
req.execute_using_pool(self.pool)
[2019-12-10 08:26:06,448: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\worker\request.py", line 532, in execute_using_pool
correlation_id=task_id,
[2019-12-10 08:26:06,448: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\concurrency\base.py", line 155, in apply_async
**options)
[2019-12-10 08:26:06,448: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\concurrency\base.py", line 31, in apply_target
ret = target(*args, **kwargs)
[2019-12-10 08:26:06,448: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\app\trace.py", line 549, in _fast_trace_task
uuid, args, kwargs, request,
[2019-12-10 08:26:06,448: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\app\trace.py", line 396, in trace_task
I, R, state, retval = on_error(task_request, exc, uuid)
[2019-12-10 08:26:06,448: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\app\trace.py", line 338, in on_error
task, request, eager=eager, call_errbacks=call_errbacks,
[2019-12-10 08:26:06,448: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\app\trace.py", line 172, in handle_error_state
call_errbacks=call_errbacks)
[2019-12-10 08:26:06,449: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\app\trace.py", line 226, in handle_failure
self._log_error(task, req, einfo)
[2019-12-10 08:26:06,449: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\app\trace.py", line 256, in _log_error
extra={'data': context})
-
Messy hack for now, needs testing on Windows server before formalising. Has the benefit of Excel knowing to open it as UTF-8. Refs #797 → <<cset eacc5fd77f4e>>
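The Windows traceback above ends in Django's storage _save calling _file.write(chunk) through the cp1252 codec. As far as I can tell from the Django 2.x source, FileSystemStorage opens the destination in text mode with the platform default encoding when the chunks it receives are str, and in binary mode when they are bytes. One way round that, sketched here as an assumption rather than what this commit actually does, is to build the CSV in a binary temporary file through an io.TextIOWrapper using the utf-8-sig codec, which also writes the UTF-8 byte order mark that helps Excel recognise the encoding. build_csv_file is a hypothetical helper.

```python
import csv
import io
import tempfile


def build_csv_file(rows):
    """Write rows as UTF-8 CSV (with BOM) into a binary temp file, rewound to the start."""
    binary_file = tempfile.TemporaryFile(mode="w+b")
    # The wrapper encodes text on the way in; utf-8-sig writes EF BB BF once, at the start.
    text_view = io.TextIOWrapper(binary_file, encoding="utf-8-sig", newline="")
    csv.writer(text_view).writerows(rows)
    text_view.flush()
    # detach() hands back the underlying binary file without closing it.
    binary_file = text_view.detach()
    binary_file.seek(0)
    return binary_file


csv_file = build_csv_file([["Institution", "Manufacturer"], ["北京", "TOSHIBA"]])
payload = csv_file.read()
csv_file.close()

print(payload[:3])  # b'\xef\xbb\xbf'
```

Reading such a file yields bytes, so a storage backend that picks its mode from the chunk type would write it in binary; that last step is untested here.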
-
Hi @David Platten - could you try this out again please on a Windows server. I’ve worked out it works on Windows without the Django component - I’m trying to implement it within the Django FileField. Pipeline failure is something to do with the coverage file and UTF-8, not sure what. But the tests pass!
-
reporter Thanks Ed. I’ve tested a mammography csv export on my Windows system with the current code.
The export succeeds, and there are no blank rows, nor are the values surrounded by b''. However, the first cell contains some stray characters in front of “Institution” rather than just “Institution”.
-
That is presumably the BOM.
Can you send me a copy of the file? I want to see how it is formatted at the byte level…
Also, how did you open the file - just double click, or importing to Excel somehow?
-
Thanks David.
When downloaded from the interface, the hex bytes ef bb bf get replaced by c3 af c2 bb c2 bf. The former is the byte order mark that tells Excel the file is in UTF-8. I’ll take a look at the download options.
Have you tried an export with the Chinese characters or other non-ASCII characters?
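The replacement bytes are exactly what you get if the BOM is read as cp1252 text and then re-encoded as UTF-8: EF, BB and BF are ï, » and ¿ in cp1252, and those three characters are C3 AF, C2 BB and C2 BF in UTF-8. That points at the download view reading the file in text mode with the Windows default encoding; this is an inference from the bytes alone.

```python
bom = b"\xef\xbb\xbf"

# Read the BOM as if it were cp1252 text, then send that text out as UTF-8.
as_text = bom.decode("cp1252")
downloaded = as_text.encode("utf-8")

print(downloaded.hex())                 # c3afc2bbc2bf
print(as_text == "\u00ef\u00bb\u00bf")  # True: the three characters ï»¿
```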
-
reporter Just e-mailed you re the Chinese character export - the export file in my media\exports folder is perfect; the one downloaded from OpenREM via the URL on the exports page is mashed.
-
reporter @Ed McDonagh changing line 472 in exportviews to make the file open in binary mode fixes the problem:
file_wrapper = FileWrapper(open(file_path, mode='rb'))
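Opening in 'rb' means the bytes on disk, BOM included, reach the response untouched. A standalone check using the standard library's wsgiref.util.FileWrapper, assuming that is the FileWrapper exportviews.py imports (the import is not shown in this thread); the file name is illustrative.

```python
import os
import tempfile
from wsgiref.util import FileWrapper

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "export.csv")
    # Write an export the way Excel likes it: UTF-8 with a byte order mark.
    with open(file_path, "w", encoding="utf-8-sig", newline="") as export_file:
        export_file.write("Institution,Manufacturer\r\n")

    # Binary mode: FileWrapper yields the raw bytes, so nothing is decoded or re-encoded.
    with open(file_path, mode="rb") as handle:
        served = b"".join(FileWrapper(handle))

print(served[:3])  # b'\xef\xbb\xbf'
```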
-
Excellent. I’ll see about factoring out the changes and implementing it for all the CSV exports later.
Thank you!
-
Moved csv creation back into export_common.py. Added @dplatten change to download as binary in exportviews.py and moved imports to top. Refs #797 → <<cset 76513a76a200>>
-
Made csv export change for Fluoro and OpenSkin. Refs #797 → <<cset c45a1fda8989>>
-
Made csv export change for CT and DX. Refs #797 → <<cset 894af0236f72>>
-
Codacy changes, tidying up. Refs #797 → <<cset cde9355394f2>>
-
Fixing reason for coverage dropping to 13%, not sure it will pass. Updated requirements for docs. Refs #797 → <<cset c6b424f95f30>>
-
Moving the coverage xml command earlier. Refs #797 → <<cset 7de5a0776bae>>
-
Hi @David Platten - I’ve made all the csv changes, I’m just trying to sort out the coverage and pipeline errors.
Can you check the other modalities on Windows so I can merge when I’ve sorted the problems?
-
reporter I can confirm that csv export works as expected from each modality.
-
Thanks David. Still struggling with coverage and failed pipelines! Without using parallel, tests take several minutes with 59% coverage. With parallel, they take a minute or so and have 13% coverage! Using the --parallel-mode option doesn’t in itself fix it, but it’s part of the solution! Either way, tox is failing in the pipeline. I’ll get there!
-
- changed status to resolved
Merged in issue797csvExportsHaveBsInThem (pull request #343)
Fixes #797 and also gets pipelines and coverage working again, hopefully. → <<cset edb32dd1e0a5>>
This affects all modalities when the “Export to CSV” option is chosen. The exports look like the example below. The headings are OK, but all of the data rows have the values in single quotes, with a b in front:
Institution,Manufacturer,Model name,Station name,Display name,Accession number,Operator,Study date,Study time,Age,Sex,Height,Mass (kg),Test patient?,Study description,Requested procedure,Study Comments,No. events,DLP total (mGy.cm),E1 Protocol,E1 Type,E1 Exposure time,E1 Scanning length,E1 Slice thickness,E1 Total collimation,E1 Pitch,E1 No. sources,E1 CTDIvol,E1 Phantom,E1 DLP,E1 S1 name,E1 S1 kVp,E1 S1 max mA,E1 S1 mA,E1 S1 Exposure time/rotation,E1 S2 name,E1 S2 kVp,E1 S2 max mA,E1 S2 mA,E1 S2 Exposure time/rotation,E1 mA Modulation type,E1 Dose check details,E1 Comments,E2 Protocol,E2 Type,E2 Exposure time,E2 Scanning length,E2 Slice thickness,E2 Total collimation,E2 Pitch,E2 No. sources,E2 CTDIvol,E2 Phantom,E2 DLP,E2 S1 name,E2 S1 kVp,E2 S1 max mA,E2 S1 mA,E2 S1 Exposure time/rotation,E2 S2 name,E2 S2 kVp,E2 S2 max mA,E2 S2 mA,E2 S2 Exposure time/rotation,E2 mA Modulation type,E2 Dose check details,E2 Comments,E3 Protocol,E3 Type,E3 Exposure time,E3 Scanning length,E3 Slice thickness,E3 Total collimation,E3 Pitch,E3 No. sources,E3 CTDIvol,E3 Phantom,E3 DLP,E3 S1 name,E3 S1 kVp,E3 S1 max mA,E3 S1 mA,E3 S1 Exposure time/rotation,E3 S2 name,E3 S2 kVp,E3 S2 max mA,E3 S2 mA,E3 S2 Exposure time/rotation,E3 mA Modulation type,E3 Dose check details,E3 Comments,E4 Protocol,E4 Type,E4 Exposure time,E4 Scanning length,E4 Slice thickness,E4 Total collimation,E4 Pitch,E4 No. sources,E4 CTDIvol,E4 Phantom,E4 DLP,E4 S1 name,E4 S1 kVp,E4 S1 max mA,E4 S1 mA,E4 S1 Exposure time/rotation,E4 S2 name,E4 S2 kVp,E4 S2 max mA,E4 S2 mA,E4 S2 Exposure time/rotation,E4 mA Modulation type,E4 Dose check details,E4 Comments
b'LINCOLN COUNTY HOSPITAL',b'TOSHIBA',b'Aquilion/LB',b'AQ16LB_SCAN',b'Lincoln RT sim',b'RWDBLAH',b'',b'2019-05-09',b'13:26:37',b'65.200',b'F',b'',b'',b'',b'CT Planning Scan Breast 2 F',b'',b'',b'3',b'97.90000000',b'RTP Breast',b'Constant Angle Acquisition',b'3.97000000',b'2.00000000',b'2.00000000',b'2.00000000',b'',b'1',b'',b'',b'',b'1',b'120.00000000',b'100.00000000',b'100.00000000',b'',b'',b'',b'',b'',b'',b'',b'',b'',b'RTP Breast',b'Constant Angle Acquisition',b'3.96000000',b'2.00000000',b'2.00000000',b'2.00000000',b'',b'1',b'',b'',b'',b'1',b'120.00000000',b'100.00000000',b'100.00000000',b'',b'',b'',b'',b'',b'',b'',b'',b'',b'RTP Breast',b'Spiral Acquisition',b'5.90000000',b'193.00000000',b'3.00000000',b'16.00000000',b'0.93800000',b'1',b'5.50000000',b'32 cm',b'97.90000000',b'1',b'120.00000000',b'111.00000000',b'91.00000000',b'0.50000000',b'',b'',b'',b'',b'',b'3D',b'',b''
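A small regression check for the three symptoms in this issue (b'' wrapped values, blank rows between records, and a missing or mangled BOM) could look like the sketch below. check_csv_export is a hypothetical helper, not part of OpenREM, and it assumes exports are meant to be UTF-8 with a BOM, as after the fix.

```python
import csv
import io


def check_csv_export(raw: bytes) -> list:
    """Return a list of problems found in the raw bytes of a CSV export."""
    problems = []
    if not raw.startswith(b"\xef\xbb\xbf"):
        problems.append("missing UTF-8 byte order mark")
    if b"\r\r\n" in raw:
        problems.append("doubled carriage return (blank rows in Excel)")
    text = raw.decode("utf-8-sig", errors="replace")
    for row_number, row in enumerate(csv.reader(io.StringIO(text, newline="")), start=1):
        if not row:
            problems.append(f"row {row_number} is blank")
        elif any(cell.startswith("b'") and cell.endswith("'") for cell in row):
            problems.append(f"row {row_number} has b'' wrapped values")
    return problems


broken = b"Institution,Manufacturer\r\r\nb'LINCOLN COUNTY HOSPITAL',b'TOSHIBA'\r\r\n"
good = b"\xef\xbb\xbfInstitution,Manufacturer\r\nLINCOLN COUNTY HOSPITAL,TOSHIBA\r\n"

print(check_csv_export(broken))
print(check_csv_export(good))  # []
```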