Radiographic csv export is full of extra line breaks and each field is surrounded by b''

Issue #797 resolved
David Platten created an issue

This may also affect other modalities.

Comments (32)

  1. David Platten reporter

    This affects all modalities when the “Export to CSV” option is chosen. The exports look like the sample below. The headings are OK, but every value in the data rows is wrapped in single quotes with a b in front:

    Institution,Manufacturer,Model name,Station name,Display name,Accession number,Operator,Study date,Study time,Age,Sex,Height,Mass (kg),Test patient?,Study description,Requested procedure,Study Comments,No. events,DLP total (mGy.cm),E1 Protocol,E1 Type,E1 Exposure time,E1 Scanning length,E1 Slice thickness,E1 Total collimation,E1 Pitch,E1 No. sources,E1 CTDIvol,E1 Phantom,E1 DLP,E1 S1 name,E1 S1 kVp,E1 S1 max mA,E1 S1 mA,E1 S1 Exposure time/rotation,E1 S2 name,E1 S2 kVp,E1 S2 max mA,E1 S2 mA,E1 S2 Exposure time/rotation,E1 mA Modulation type,E1 Dose check details,E1 Comments,E2 Protocol,E2 Type,E2 Exposure time,E2 Scanning length,E2 Slice thickness,E2 Total collimation,E2 Pitch,E2 No. sources,E2 CTDIvol,E2 Phantom,E2 DLP,E2 S1 name,E2 S1 kVp,E2 S1 max mA,E2 S1 mA,E2 S1 Exposure time/rotation,E2 S2 name,E2 S2 kVp,E2 S2 max mA,E2 S2 mA,E2 S2 Exposure time/rotation,E2 mA Modulation type,E2 Dose check details,E2 Comments,E3 Protocol,E3 Type,E3 Exposure time,E3 Scanning length,E3 Slice thickness,E3 Total collimation,E3 Pitch,E3 No. sources,E3 CTDIvol,E3 Phantom,E3 DLP,E3 S1 name,E3 S1 kVp,E3 S1 max mA,E3 S1 mA,E3 S1 Exposure time/rotation,E3 S2 name,E3 S2 kVp,E3 S2 max mA,E3 S2 mA,E3 S2 Exposure time/rotation,E3 mA Modulation type,E3 Dose check details,E3 Comments,E4 Protocol,E4 Type,E4 Exposure time,E4 Scanning length,E4 Slice thickness,E4 Total collimation,E4 Pitch,E4 No. sources,E4 CTDIvol,E4 Phantom,E4 DLP,E4 S1 name,E4 S1 kVp,E4 S1 max mA,E4 S1 mA,E4 S1 Exposure time/rotation,E4 S2 name,E4 S2 kVp,E4 S2 max mA,E4 S2 mA,E4 S2 Exposure time/rotation,E4 mA Modulation type,E4 Dose check details,E4 Comments

    b'LINCOLN COUNTY HOSPITAL',b'TOSHIBA',b'Aquilion/LB',b'AQ16LB_SCAN',b'Lincoln RT sim',b'RWDBLAH',b'',b'2019-05-09',b'13:26:37',b'65.200',b'F',b'',b'',b'',b'CT Planning Scan Breast 2 F',b'',b'',b'3',b'97.90000000',b'RTP Breast',b'Constant Angle Acquisition',b'3.97000000',b'2.00000000',b'2.00000000',b'2.00000000',b'',b'1',b'',b'',b'',b'1',b'120.00000000',b'100.00000000',b'100.00000000',b'',b'',b'',b'',b'',b'',b'',b'',b'',b'RTP Breast',b'Constant Angle Acquisition',b'3.96000000',b'2.00000000',b'2.00000000',b'2.00000000',b'',b'1',b'',b'',b'',b'1',b'120.00000000',b'100.00000000',b'100.00000000',b'',b'',b'',b'',b'',b'',b'',b'',b'',b'RTP Breast',b'Spiral Acquisition',b'5.90000000',b'193.00000000',b'3.00000000',b'16.00000000',b'0.93800000',b'1',b'5.50000000',b'32 cm',b'97.90000000',b'1',b'120.00000000',b'111.00000000',b'91.00000000',b'0.50000000',b'',b'',b'',b'',b'',b'3D',b'',b''
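The b'...' wrapping is what Python 3 produces when a bytes object is turned into text with str(): the result is the object's repr, quotes and b prefix included. A minimal sketch, assuming (not confirmed in this thread) that the values were being encoded to bytes before reaching csv.writer, as Python 2-era code often did; the sample values come from the row above and the variable names are illustrative:

```python
import csv
import io

values = ["TOSHIBA", "Aquilion/LB", "2019-05-09"]

# Encoding each value to bytes (a Python 2-style habit) before writing...
as_bytes = [value.encode("utf-8") for value in values]

buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
# ...means csv.writer stringifies each bytes object with str(), i.e. its repr
writer.writerow(as_bytes)
# Passing the str values unchanged writes them as expected
writer.writerow(values)

broken_row, fixed_row = buffer.getvalue().splitlines()
print(broken_row)  # b'TOSHIBA',b'Aquilion/LB',b'2019-05-09'
print(fixed_row)   # TOSHIBA,Aquilion/LB,2019-05-09
```

The single quotes are not special to csv.writer, so they appear in the output unquoted, exactly as in the export.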

  2. David Platten reporter

    If it is an xlsx file then use binary mode; otherwise use text mode. This removes the spurious b characters from the csv exports. However, there remains an extra line break after every row. References issue #797

    → <<cset 1acb526b8437>>
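The remaining extra line break matches a documented csv pitfall: csv.writer ends each row with \r\n, and a file opened in text mode on Windows translates every \n it writes into \r\n, so each row ends in \r\r\n, which Excel shows as a blank row. The Python csv documentation says to open the file with newline=''. A minimal sketch that imitates the Windows translation on any platform by passing newline='\r\n' to an io.TextIOWrapper; the helper name csv_bytes is hypothetical:

```python
import csv
import io


def csv_bytes(rows, newline):
    """Write rows through a text layer with the given newline setting; return the raw bytes."""
    raw = io.BytesIO()
    # newline="\r\n" translates every "\n" written into "\r\n", as Windows text mode does;
    # newline="" disables translation, as the csv documentation recommends
    text = io.TextIOWrapper(raw, encoding="utf-8", newline=newline)
    writer = csv.writer(text)  # default lineterminator is "\r\n"
    writer.writerows(rows)
    text.flush()
    return raw.getvalue()


rows = [["Institution", "Manufacturer"], ["LINCOLN COUNTY HOSPITAL", "TOSHIBA"]]
print(csv_bytes(rows, newline="\r\n"))  # each row ends in \r\r\n
print(csv_bytes(rows, newline=""))      # each row ends in \r\n
```

For a real file the equivalent is open(path, "w", encoding="utf-8", newline=""), and tempfile.TemporaryFile accepts the same mode, encoding and newline arguments.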

  3. David Platten reporter

    @Ed McDonagh: very happy for you to take this one on. I suspect it may be a Windows thing, so I’ll happily test any changes that you make.

  4. David Platten reporter

    I’ve just tried exporting some radiographic data to a csv file. There are no b characters surrounding the data, but there are blank rows between each row of data.

    I also tried exporting one of the mammography test studies that includes Unicode characters. This resulted in an error in my default.log:

    [2019-12-06 14:44:57,589: WARNING/MainProcess] --- Logging error ---
    [2019-12-06 14:44:57,590: WARNING/MainProcess] Traceback (most recent call last):
    [2019-12-06 14:44:57,590: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\app\trace.py", line 382, in trace_task
        R = retval = fun(*args, **kwargs)
    [2019-12-06 14:44:57,591: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\app\trace.py", line 641, in __protected_call__
        return self.run(*args, **kwargs)
    [2019-12-06 14:44:57,591: WARNING/MainProcess] File "D:\code\python\openrem\openrem\remapp\exports\mg_export.py", line 303, in exportMG2excel
        writer.writerow([str(data_string) for data_string in series_data])
    [2019-12-06 14:44:57,591: WARNING/MainProcess] File "C:\Python37\lib\tempfile.py", line 481, in func_wrapper
        return func(*args, **kwargs)
    [2019-12-06 14:44:57,592: WARNING/MainProcess] File "C:\Python37\lib\encodings\cp1252.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]
    [2019-12-06 14:44:57,592: WARNING/MainProcess] UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-3: character maps to <undefined>
    [2019-12-06 14:44:57,592: WARNING/MainProcess] During handling of the above exception, another exception occurred:
    [2019-12-06 14:44:57,593: WARNING/MainProcess] Traceback (most recent call last):
    [2019-12-06 14:44:57,593: WARNING/MainProcess] File "C:\Python37\lib\logging\__init__.py", line 1028, in emit
        stream.write(msg + self.terminator)
    [2019-12-06 14:44:57,593: WARNING/MainProcess] File "C:\Python37\lib\encodings\cp1252.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]
    [2019-12-06 14:44:57,593: WARNING/MainProcess] UnicodeEncodeError: 'charmap' codec can't encode characters in position 178-181: character maps to <undefined>
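The second traceback comes from the logging module itself: the message being logged contains the same characters, and the handler's stream appears to use the cp1252 default as well, so writing the log line fails too. logging.FileHandler accepts an encoding argument; a minimal sketch, assuming a plain file handler (the file name mirrors default.log above, the logger name is hypothetical):

```python
import logging
import os
import tempfile

with tempfile.TemporaryDirectory() as log_dir:
    log_path = os.path.join(log_dir, "default.log")
    # Without encoding=..., FileHandler opens the file with the platform default encoding
    handler = logging.FileHandler(log_path, encoding="utf-8")
    logger = logging.getLogger("export_demo")  # hypothetical logger name
    logger.addHandler(handler)
    logger.warning("Institution: %s", "医院")
    # Close the handler before the temporary directory is removed (required on Windows)
    logger.removeHandler(handler)
    handler.close()
    with open(log_path, encoding="utf-8") as log_file:
        logged = log_file.read()
```

In a Django LOGGING dictionary, extra keys in a handler entry are passed to the handler class constructor, so "encoding": "utf-8" can sit alongside "filename".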

  5. Ed McDonagh

    Thanks David. That’s useful feedback - Windows Python is obviously trying to stick to stupid code page 1252, which is mostly ASCII plus Latin-1 and won’t work very well for Chinese characters!

    I’ll see if I can work out where we need to specify UTF-8 without turning the strings into bytes.
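In text mode without an explicit encoding, open() in Python 3.7 uses locale.getpreferredencoding(False), which on a typical Western-European or US Windows install is cp1252. A small check of which encodings can represent a sample string; the helper name can_encode and the sample text are illustrative:

```python
import locale


def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


sample = "医院"  # illustrative non-Latin text
print("default text encoding:", locale.getpreferredencoding(False))
for encoding in ("cp1252", "latin-1", "utf-8"):
    print(encoding, can_encode(sample, encoding))
```

Only UTF-8 (or another Unicode encoding) can hold arbitrary patient or institution names, which is why the encoding has to be given explicitly rather than left to the platform.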

  6. Ed McDonagh

    Specifying encoding of temporary file. On Ubuntu this defaults to UTF-8, but maybe not on Windows. Refs #797. Left a print statement in to confirm the format: with or without the explicit encoding, it prints temp_file is <_io.TextIOWrapper name=35 mode='w+' encoding='utf-8'> on Ubuntu.

    → <<cset d5a708160617>>
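Roughly what specifying the encoding of the temporary file amounts to, as a sketch rather than the actual commit: tempfile.TemporaryFile passes mode, encoding and newline through to open(), so the file can be given an explicit UTF-8 text layer (with newline='' as the csv documentation recommends). The rows and the helper name are illustrative:

```python
import csv
import tempfile


def write_rows_to_temp_file(rows):
    """Write rows as UTF-8 CSV to a temporary file and return what was written."""
    with tempfile.TemporaryFile(mode="w+", encoding="utf-8", newline="") as temp_file:
        writer = csv.writer(temp_file)
        writer.writerows(rows)
        temp_file.seek(0)  # rewind so the content can be read back
        return temp_file.read()


content = write_rows_to_temp_file([["Institution", "Manufacturer"], ["医院", "TOSHIBA"]])
```

On POSIX systems TemporaryFile returns the file object directly, whereas on Windows it returns a tempfile._TemporaryFileWrapper, which explains why the print output differs between platforms. This only controls how the rows are written; anything that later reads or copies the file as text without an encoding still falls back to the platform default.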

  7. Ed McDonagh

    @David Platten - doing a minimal test in Python 3.8 on Windows suggests that adding the encoding to the TemporaryFile fixes that error, though I didn’t see it all the way through to writing to disk.

    Are you in a position to test this version of the code on a Windows machine?

  8. David Platten reporter

    @Ed McDonagh I’ve tested this version of the code; it still breaks. Full log below:

    [2019-12-10 08:26:06,351: WARNING/MainProcess] temp_file is <tempfile._TemporaryFileWrapper object at 0x00000289B3463CC8>
    [2019-12-10 08:26:06,437: WARNING/MainProcess] --- Logging error ---
    [2019-12-10 08:26:06,439: WARNING/MainProcess] Traceback (most recent call last):
    [2019-12-10 08:26:06,439: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\app\trace.py", line 382, in trace_task
        R = retval = fun(*args, **kwargs)
    [2019-12-10 08:26:06,439: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\app\trace.py", line 641, in __protected_call__
        return self.run(*args, **kwargs)
    [2019-12-10 08:26:06,439: WARNING/MainProcess] File "D:\code\python\openrem\openrem\remapp\exports\mg_export.py", line 347, in exportMG2excel
        write_export(tsk, export_filename, tmpfile, datestamp)
    [2019-12-10 08:26:06,439: WARNING/MainProcess] File "D:\code\python\openrem\openrem\remapp\exports\export_common.py", line 549, in write_export
        task.filename.save(filename, File(temp_file))
    [2019-12-10 08:26:06,440: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\django\db\models\fields\files.py", line 87, in save
        self.name = self.storage.save(name, content, max_length=self.field.max_length)
    [2019-12-10 08:26:06,440: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\django\core\files\storage.py", line 52, in save
        return self._save(name, content)
    [2019-12-10 08:26:06,440: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\django\core\files\storage.py", line 274, in _save
        _file.write(chunk)
    [2019-12-10 08:26:06,440: WARNING/MainProcess] File "C:\Python37\lib\encodings\cp1252.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]
    [2019-12-10 08:26:06,440: WARNING/MainProcess] UnicodeEncodeError: 'charmap' codec can't encode characters in position 399-402: character maps to <undefined>
    [2019-12-10 08:26:06,440: WARNING/MainProcess] During handling of the above exception, another exception occurred:
    [2019-12-10 08:26:06,441: WARNING/MainProcess] Traceback (most recent call last):
    [2019-12-10 08:26:06,441: WARNING/MainProcess] File "C:\Python37\lib\logging\__init__.py", line 1028, in emit
        stream.write(msg + self.terminator)
    [2019-12-10 08:26:06,441: WARNING/MainProcess] File "C:\Python37\lib\encodings\cp1252.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]
    [2019-12-10 08:26:06,441: WARNING/MainProcess] UnicodeEncodeError: 'charmap' codec can't encode characters in position 581-584: character maps to <undefined>
    [2019-12-10 08:26:06,441: WARNING/MainProcess] Call stack:
    [2019-12-10 08:26:06,443: WARNING/MainProcess] File "C:\Python37\lib\runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
    [2019-12-10 08:26:06,443: WARNING/MainProcess] File "C:\Python37\lib\runpy.py", line 85, in _run_code
        exec(code, run_globals)
    [2019-12-10 08:26:06,443: WARNING/MainProcess] File "c:\pythonVirtualEnvs\openrem-37\Scripts\celery.exe\__main__.py", line 7, in <module>
        sys.exit(main())
    [2019-12-10 08:26:06,443: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\__main__.py", line 16, in main
        _main()
    [2019-12-10 08:26:06,444: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\bin\celery.py", line 322, in main
        cmd.execute_from_commandline(argv)
    [2019-12-10 08:26:06,444: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\bin\celery.py", line 496, in execute_from_commandline
        super(CeleryCommand, self).execute_from_commandline(argv)))
    [2019-12-10 08:26:06,444: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\bin\base.py", line 275, in execute_from_commandline
        return self.handle_argv(self.prog_name, argv[1:])
    [2019-12-10 08:26:06,444: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\bin\celery.py", line 488, in handle_argv
        return self.execute(command, argv)
    [2019-12-10 08:26:06,444: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\bin\celery.py", line 420, in execute
        ).run_from_argv(self.prog_name, argv[1:], command=argv[0])
    [2019-12-10 08:26:06,444: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\bin\worker.py", line 223, in run_from_argv
        return self(*args, **options)
    [2019-12-10 08:26:06,444: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\bin\base.py", line 238, in __call__
        ret = self.run(*args, **kwargs)
    [2019-12-10 08:26:06,445: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\bin\worker.py", line 258, in run
        worker.start()
    [2019-12-10 08:26:06,445: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\worker\worker.py", line 205, in start
        self.blueprint.start(self)
    [2019-12-10 08:26:06,445: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\bootsteps.py", line 119, in start
        step.start(parent)
    [2019-12-10 08:26:06,445: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\bootsteps.py", line 369, in start
        return self.obj.start()
    [2019-12-10 08:26:06,445: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\worker\consumer\consumer.py", line 317, in start
        blueprint.start(self)
    [2019-12-10 08:26:06,446: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\bootsteps.py", line 119, in start
        step.start(parent)
    [2019-12-10 08:26:06,446: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\worker\consumer\consumer.py", line 593, in start
        c.loop(*c.loop_args())
    [2019-12-10 08:26:06,446: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\worker\loops.py", line 121, in synloop
        connection.drain_events(timeout=2.0)
    [2019-12-10 08:26:06,446: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\kombu\connection.py", line 315, in drain_events
        return self.transport.drain_events(self.connection, **kwargs)
    [2019-12-10 08:26:06,446: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\kombu\transport\pyamqp.py", line 103, in drain_events
        return connection.drain_events(**kwargs)
    [2019-12-10 08:26:06,446: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\amqp\connection.py", line 505, in drain_events
        while not self.blocking_read(timeout):
    [2019-12-10 08:26:06,446: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\amqp\connection.py", line 511, in blocking_read
        return self.on_inbound_frame(frame)
    [2019-12-10 08:26:06,447: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\amqp\method_framing.py", line 79, in on_frame
        callback(channel, msg.frame_method, msg.frame_args, msg)
    [2019-12-10 08:26:06,447: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\amqp\connection.py", line 518, in on_inbound_method
        method_sig, payload, content,
    [2019-12-10 08:26:06,447: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\amqp\abstract_channel.py", line 145, in dispatch_method
        listener(*args)
    [2019-12-10 08:26:06,447: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\amqp\channel.py", line 1615, in _on_basic_deliver
        fun(msg)
    [2019-12-10 08:26:06,447: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\kombu\messaging.py", line 624, in _receive_callback
        return on_m(message) if on_m else self.receive(decoded, message)
    [2019-12-10 08:26:06,447: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\worker\consumer\consumer.py", line 567, in on_task_received
        callbacks,
    [2019-12-10 08:26:06,447: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\worker\strategy.py", line 200, in task_message_handler
        handle(req)
    [2019-12-10 08:26:06,447: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\worker\worker.py", line 228, in _process_task
        req.execute_using_pool(self.pool)
    [2019-12-10 08:26:06,448: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\worker\request.py", line 532, in execute_using_pool
        correlation_id=task_id,
    [2019-12-10 08:26:06,448: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\concurrency\base.py", line 155, in apply_async
        **options)
    [2019-12-10 08:26:06,448: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\concurrency\base.py", line 31, in apply_target
        ret = target(*args, **kwargs)
    [2019-12-10 08:26:06,448: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\app\trace.py", line 549, in _fast_trace_task
        uuid, args, kwargs, request,
    [2019-12-10 08:26:06,448: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\app\trace.py", line 396, in trace_task
        I, R, state, retval = on_error(task_request, exc, uuid)
    [2019-12-10 08:26:06,448: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\app\trace.py", line 338, in on_error
        task, request, eager=eager, call_errbacks=call_errbacks,
    [2019-12-10 08:26:06,448: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\app\trace.py", line 172, in handle_error_state
        call_errbacks=call_errbacks)
    [2019-12-10 08:26:06,449: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\app\trace.py", line 226, in handle_failure
        self._log_error(task, req, einfo)
    [2019-12-10 08:26:06,449: WARNING/MainProcess] File "c:\pythonvirtualenvs\openrem-37\lib\site-packages\celery\app\trace.py", line 256, in _log_error
        extra={'data': context})
    

  9. Ed McDonagh

    Hi @David Platten - could you try this out again on a Windows server, please? I’ve established that it works on Windows without the Django component; I’m now trying to implement it within the Django FileField. The pipeline failure is something to do with the coverage file and UTF-8, not sure what. But the tests pass!
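The traceback in comment 8 ends in Django's FileSystemStorage._save (the _file.write(chunk) line), which in the Django version shown opens the destination in binary mode if the first chunk is bytes and in text mode, with the platform default encoding, if it is str. One way round that, sketched here as an assumption rather than the route actually taken: render the CSV in memory and encode it once, so the storage layer only ever sees bytes. utf-8-sig prepends the BOM that Excel uses to detect UTF-8; the helper name is hypothetical:

```python
import csv
import io


def build_csv_bytes(rows, encoding="utf-8-sig"):
    """Render rows to CSV in memory and encode once, so callers only handle bytes."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    # "utf-8-sig" prefixes the UTF-8 byte order mark (ef bb bf)
    return buffer.getvalue().encode(encoding)


data = build_csv_bytes([["Institution", "Manufacturer"], ["医院", "TOSHIBA"]])
```

Inside the export task the bytes could then be wrapped in django.core.files.base.ContentFile and passed to task.filename.save(filename, ContentFile(data)), mirroring the call in write_export; that part is untested here. For very large exports, writing to a binary temporary file would avoid holding the whole export in memory.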

  10. David Platten reporter

    Thanks Ed. I’ve tested a mammography csv export on my Windows system with the current code.

    The export succeeds, there are no blank rows, and the values are no longer surrounded by b''. However, the first cell contains some extra characters before “Institution” rather than just “Institution”.

  11. Ed McDonagh

    That is presumably the BOM (byte order mark).

    Can you send me a copy of the file? I want to see how it is formatted at the byte level…

    Also, how did you open the file - just double-click it, or import it into Excel somehow?

  12. Ed McDonagh

    Thanks David.

    When downloaded from the interface, hex ef bb bf gets replaced by hex c3 af c2 bb c2 bf. The former is the UTF-8 byte order mark that tells Excel the file is in UTF-8.

    I’ll take a look at the download options.

    Have you tried an export with the Chinese characters or other non-ASCII characters?
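The replacement bytes are exactly what a UTF-8 BOM becomes after being decoded as cp1252 (identical to Latin-1 for these three bytes) and re-encoded as UTF-8: each BOM byte turns into one character, and each of those characters needs two bytes in UTF-8. A worked check:

```python
import codecs

bom = codecs.BOM_UTF8  # b'\xef\xbb\xbf', i.e. U+FEFF encoded as UTF-8
# Decoding with cp1252 maps each byte to one character: U+00EF, U+00BB, U+00BF
as_text = bom.decode("cp1252")
# Re-encoding those three characters as UTF-8 takes two bytes each
mangled = as_text.encode("utf-8")
print(mangled.hex())  # c3afc2bbc2bf
```

The same decode/re-encode round trip also corrupts any other non-ASCII bytes in the file, which fits the mangled Chinese-character download.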

  13. David Platten reporter

    Just e-mailed you re the Chinese character export - the export file in my media\exports folder is perfect; the one downloaded from OpenREM via the URL on the exports page is mashed.

  14. David Platten reporter

    @Ed McDonagh changing line 472 in exportviews to make the file open in binary mode fixes the problem:

    file_wrapper = FileWrapper(open(file_path, mode='rb'))
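Why binary mode fixes the download: in text mode the file is decoded with the platform default (cp1252 here) and the resulting str chunks are encoded again as UTF-8 when the response is sent, whereas in binary mode the bytes pass through untouched. A self-contained sketch using the standard-library wsgiref.util.FileWrapper (assumed to be the FileWrapper used in exportviews), assuming the response re-encodes str chunks with Django's default UTF-8 charset; cp1252 is forced so the demo behaves the same on any platform, and the helper name is hypothetical:

```python
import codecs
import os
import tempfile
from wsgiref.util import FileWrapper


def download_bytes(file_path, binary):
    """Return the bytes a client would receive for file_path (hypothetical helper)."""
    if binary:
        with open(file_path, mode="rb") as f:
            return b"".join(FileWrapper(f))
    # Text mode decodes with cp1252; the str chunks are then re-encoded as UTF-8,
    # standing in for the response encoding them with its UTF-8 charset
    with open(file_path, mode="r", encoding="cp1252", newline="") as f:
        return "".join(FileWrapper(f)).encode("utf-8")


original = codecs.BOM_UTF8 + b"Institution,Manufacturer\r\n"
with tempfile.TemporaryDirectory() as export_dir:
    path = os.path.join(export_dir, "export.csv")
    with open(path, mode="wb") as f:
        f.write(original)
    served_binary = download_bytes(path, binary=True)
    served_text = download_bytes(path, binary=False)
```

served_binary is byte-for-byte identical to the file on disk, while served_text starts with c3 af c2 bb c2 bf instead of the BOM.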
    

  15. Ed McDonagh

    Excellent. I’ll see about factoring out the changes and implementing it for all the CSV exports later.

    Thank you!

  16. Ed McDonagh

    Hi @David Platten - I’ve made all the csv changes, I’m just trying to sort out the coverage and pipeline errors.

    Can you check the other modalities on Windows so I can merge when I’ve sorted the problems?

  17. Ed McDonagh

    Thanks David. Still struggling with coverage and failed pipelines! Without parallel, the tests take several minutes and give 59% coverage. With parallel, they take a minute or so and give 13% coverage! Using --parallel-mode doesn’t fix it by itself, but it’s part of the solution! Either way, tox is failing in the pipeline.

    I’ll get there!
