dataDictionary "invalid start byte" on Ubuntu/64 bit

I'm running into a decode error when using UTF-8 mode on Linux and hope you can help out.

I'm trying to run the code below on the attached data file:

import savReaderWriter as srw

h = srw.SavHeaderReader("/home/user/Desktop/July 2004 Selective Exposure.sav", ioUtf8=True)

d = h.dataDictionary()

On Windows/64 bit, it works perfectly. On Ubuntu/64 bit, it returns the following:

Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    d = h.dataDictionary()
  File "/home/user/src/savreaderwriter/savReaderWriter/savHeaderReader.py", line 105, in dataDictionary
    metadata = dict([(item, getattr(self, item)) for item in items])
  File "/home/user/src/savreaderwriter/savReaderWriter/header.py", line 62, in wrapper
    uresult[uS(k)][uS(i)] = uS(uL(j))
  File "/home/user/src/savreaderwriter/savReaderWriter/header.py", line 48, in <lambda>
    uS = lambda x: x.decode("utf-8") if isinstance(x, bytes_) else x
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 3: invalid start byte

I think the culprit is an apostrophe character in the values of variables like "iraq8a". I've tried both the bleeding-edge repository version and the pip-installed version of savReaderWriter, with the same result.
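
To illustrate the suspicion, here is a minimal sketch. It assumes the text was stored as Windows-1252 (cp1252) rather than UTF-8, which I have not confirmed for this file. The sample bytes are made up for the example, not taken from the dataset. In cp1252, byte 0x92 is the curly apostrophe (U+2019), while in UTF-8 it can only appear as a continuation byte, which would explain "invalid start byte".

```python
# Hypothetical label bytes (not from the real file): 0x92 sits at
# position 3, matching the position reported in the traceback.
raw = b"don\x92t"

try:
    raw.decode("utf-8")
    utf8_ok = True
    failed_at = None
except UnicodeDecodeError as exc:
    # 0x92 is a UTF-8 continuation byte, so it cannot start a character.
    utf8_ok = False
    failed_at = exc.start

# Assumption: the file's text is really cp1252, where 0x92 maps to U+2019.
as_cp1252 = raw.decode("cp1252")

print(utf8_ok, failed_at, repr(as_cp1252))
```

If this is what is happening, the bytes themselves are fine and only the encoding assumed while reading is wrong.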

Any suggestions that don't involve modifying the original dataset? Re-saving in SPSS seems to make the problem go away, but I have hundreds of datasets to manage, so I'm looking for something more programmatic.

Love the tool, thanks for your help!
