Failing parsing of comment field for Siemens RF

Issue #809 wontfix
Erlend Andersen created an issue

Parsing of the comment field for Siemens RF fails when the string contains non-ASCII characters, such as øæå.


Traceback (most recent call last):
File "C:\Users\erlend\source\repos\openrem\", line 31, in <module>
File "C:\Users\erlend\WinPython-32bit-\python-2.7.10\lib\site-packages\celery\", line 188, in __call__
return self._get_current_object()(*a, **kw)
File "C:\Users\erlend\WinPython-32bit-\python-2.7.10\lib\site-packages\celery\app\", line 428, in __call__
return*args, **kwargs)
File "C:\Users\erlend\source\repos\openrem\openrem\remapp\extractors\", line 1661, in rdsr
File "C:\Users\erlend\source\repos\openrem\openrem\remapp\extractors\", line 1516, in _rdsr2db
_generalstudymoduleattributes(dataset, g, ch)
File "C:\Users\erlend\source\repos\openrem\openrem\remapp\extractors\", line 1352, in _generalstudymoduleattributes
_projectionxrayradiationdose(dataset, g, 'projection', ch)
File "C:\Users\erlend\source\repos\openrem\openrem\remapp\extractors\", line 1114, in _projectionxrayradiationdose
_irradiationeventxraydata(cont, proj, ch, dataset)
File "C:\Users\erlend\source\repos\openrem\openrem\remapp\extractors\", line 643, in _irradiationeventxraydata
_irradiationeventxraysourcedata(dataset, event, ch)
File "C:\Users\erlend\source\repos\openrem\openrem\remapp\extractors\", line 453, in _irradiationeventxraysourcedata
source.ii_field_size = fromstring(source.irradiation_event_xray_data.comment).find('iiDiameter').get('SRData')
File "C:\Users\erlend\WinPython-32bit-\python-2.7.10\lib\site-packages\defusedxml\", line 131, in fromstring
File "C:\Users\erlend\WinPython-32bit-\python-2.7.10\lib\xml\etree\", line 1640, in feed
self._parser.Parse(data, 0)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 120: ordinal not in range(128)

Comments (10)

  1. Erlend Andersen reporter


    This was on the release-0.10.0b2 branch; the same error also happens on our 0.9.1 server.

  2. Ed McDonagh

    Thanks. I’ll try to replicate it and then look at your fix. At first glance it doesn’t look nice to me, as we should be able to handle non-ASCII everywhere, but perhaps the problem lies with ElementTree?

  3. Erlend Andersen reporter

    It appears OpenREM handles non-ASCII characters except when parsing a string with ElementTree. This only occurs for Siemens RF, when parsing the comment field.

    The string triggering the error:

    <FluoroData><SceneCounter SRData=" "/><ExtendedAcqMode SRData=""/><PeriDynaStepCount SRData=" "/><SceneName SRData="FL låg Angio 1"/><AngulationStep SRData=" "/><Dose SRData=" "/><CurrentTimeProduct SRData="0.956080"/><TubeFocalSpot SRData="small"/><iiDiameter SRData="480"/><Time SRData="02-Nov-18 12:39:11"/><IsPuck SRData="False"/><SceneTime SRData="1"/><FrameRate SRData="4.000000"/><NumOfFrames SRData="4"/><MaxSkinEntranceDose SRData=" "/></FluoroData>

    Note the “FL låg Angio 1”.

    The proposed fix is a bit crude, and perhaps there’s a better way. I’ll try to anonymize the RDSR file if you are interested in it?

  4. Erlend Andersen reporter

    Thanks David, but I could not get the suggested solution in the accepted answer to work.

    The comment (in source.irradiation_event_xray_data.comment) is of type unicode and is somehow converted (encoded) to a Python str inside the fromstring method. I suspect the XMLParser used by ElementTree expects the encoding to be declared in the XML string; in this case it is not, so the default ASCII encoding is used.

    BTW, in our experience OpenREM handles unicode everywhere except in the XML parsing bits.

  5. Ed McDonagh

    I wonder if maybe we leave it as a local fix for 0.10 for anyone it affects; I’m not sure it warrants a point release on its own? The problem will just go away with the next release, v1.0.

    Thoughts anyone?

  6. Erlend Andersen reporter

    If OpenREM migrates to Python 3 in v1.0, the fix is not needed. I suspect this affects very few people, so there is no need for an extra release. I suggest marking this issue as resolved and rejecting the pull request.
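
For anyone applying a local fix on 0.10, the diagnosis in comment 4 (unicode text being implicitly encoded as ASCII inside fromstring on Python 2) suggests a workaround along the following lines. This is only a minimal sketch and not necessarily the fix from the pull request, which is not shown in this thread. The helper name parse_comment is hypothetical, and the standard library's xml.etree.ElementTree stands in for defusedxml here, on the assumption that defusedxml's fromstring accepts the same input. The idea is to encode the text to UTF-8 bytes explicitly before parsing, since an XML document without an encoding declaration is read as UTF-8 by default.

```python
# -*- coding: utf-8 -*-
import xml.etree.ElementTree as ET


def parse_comment(comment):
    """Hypothetical helper: parse an XML comment that may hold non-ASCII text.

    Text input (unicode on Python 2, str on Python 3) is encoded to UTF-8
    bytes first, so the parser never falls back to an implicit ASCII encode.
    """
    if not isinstance(comment, bytes):
        comment = comment.encode("utf-8")
    return ET.fromstring(comment)


# Shortened version of the string quoted in comment 3.
comment = (
    u'<FluoroData><SceneName SRData="FL l\xe5g Angio 1"/>'
    u'<iiDiameter SRData="480"/></FluoroData>'
)

root = parse_comment(comment)
ii_field_size = root.find("iiDiameter").get("SRData")
scene_name = root.find("SceneName").get("SRData")
```

On Python 3, fromstring accepts str directly, so the explicit encode is harmless there and only matters on Python 2.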
