UnicodeDecodeError

Issue #13 resolved
F P created an issue

Hi Jared,

First of all, thanks a lot for the hard work on this converter. I've tried it with a few exported SAS datasets, and they all get to 99.X% and then abort with the error below, right at the end, so the problem is probably in the file's 'footer':

    Traceback (most recent call last):
      File "/usr/lib64/python2.6/logging/__init__.py", line 799, in emit
        stream.write(fs % msg.encode("UTF-8"))
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 40: ordinal not in range(128)
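Background on the traceback, for reference: in Python 2, calling `.encode("UTF-8")` on a byte string (`str`) first decodes it implicitly with the default ASCII codec, which is why an encode call fails with a `UnicodeDecodeError`. Byte 0xe2 is the lead byte of the UTF-8 encoding of characters such as the curly apostrophe (’, bytes e2 80 99). That suggests, though it doesn't prove, that the offending field holds UTF-8 text rather than random garbage. This minimal Python 3 sketch reproduces the same message; the sample value and the `reproduce` helper are hypothetical, not taken from the converter or the actual files.

```python
# Hypothetical sample value: "it’s" encoded as UTF-8 (the apostrophe is b"\xe2\x80\x99").
raw = "it\u2019s".encode("utf-8")


def reproduce(raw_bytes):
    """Decode as ASCII, like Python 2's implicit str -> unicode step; return the error text or None."""
    try:
        raw_bytes.decode("ascii")
    except UnicodeDecodeError as err:
        return str(err)
    return None


print(reproduce(raw))
# 'ascii' codec can't decode byte 0xe2 in position 2: ordinal not in range(128)
print(raw.decode("utf-8"))  # decoding as UTF-8 instead succeeds: it’s
```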

This happens because SAS exports can sometimes contain garbage (non-printable) characters, and the usual convention in the industry is to replace any character whose source can't be recognised with an inverted question mark (“¿”). It would be great if the converter did this replacement instead of aborting: the output could then be checked for these markers, and at worst one line or column would be affected rather than the whole file failing.

Would it be possible to catch the error above, replace the offending character with “¿”, and resume the conversion?
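Catching the error and resuming could be done per field rather than per file, recording where replacements happened so the output can be checked afterwards. A minimal sketch, assuming rows arrive as lists of raw byte strings; `convert_rows` is hypothetical, and where such a loop would hook into the converter isn't known from this issue.

```python
def convert_rows(rows, encoding="utf-8", placeholder="\u00bf"):
    """Decode every field of every row; on failure substitute a placeholder and keep going.

    Returns the decoded rows plus a list of (row, column) positions that needed
    replacement, so the output can be checked afterwards.
    """
    converted, problems = [], []
    for row_no, row in enumerate(rows):
        out = []
        for col_no, raw in enumerate(row):
            try:
                out.append(raw.decode(encoding))
            except UnicodeDecodeError:
                # Remember where the bad data was instead of aborting the whole file.
                problems.append((row_no, col_no))
                # errors="replace" inserts U+FFFD; swap it for the placeholder.
                # (This would also rewrite any genuine U+FFFD already in the field.)
                out.append(raw.decode(encoding, errors="replace").replace("\ufffd", placeholder))
        converted.append(out)
    return converted, problems


rows = [[b"id1", b"caf\xe9"], [b"id2", b"ok"]]
converted, problems = convert_rows(rows)
print(converted)  # [['id1', 'caf¿'], ['id2', 'ok']]
print(problems)   # [(0, 1)]
```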

I would love to attach the files, but they contain sensitive information, so I'd rather not if that's OK. If you can point me to where this error handling should take place, I'm happy to change it myself and possibly branch it, test it, and check it back in as an alternative.

Thanks
