UnicodeDecodeError when loading .sav into pandas

Issue #75 new
Oli Dernie created an issue

I have a large SPSS file which I am attempting to load into a pandas DataFrame. Several of the columns contain special characters, including Chinese and accented characters.

The documentation suggests the following:

with SavReader('greetings.sav', ioUtf8=True) as reader:
    for record in reader:
        print(record[-1])

My code looks like this:

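import pandas as pd
from savReaderWriter import SavReader

rawdata = []
with SavReader('largefile.sav', ioUtf8=True) as reader:
    for record in reader:
        try:
            rawdata.append(record)
        except UnicodeDecodeError:
            # attempted fallback: re-decode the record as latin-1
            r = record.decode('latin-1')
            rawdata.append(r.encode('utf-8'))
# the first record is used as the column names
data = pd.DataFrame(rawdata)
data = data.rename(columns=data.loc[0]).iloc[1:]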

Running the above triggers a UnicodeDecodeError on the line "for record in reader:". How can I get around this?
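A possible explanation, sketched with stand-in data rather than the real file: if the reader decodes each record while producing it, the exception is raised by the iteration itself, so a try block inside the loop body never sees it. The sketch below uses only the standard library. The names rows_from_bytes and decode_with_fallback are hypothetical helpers, not part of savReaderWriter, and the sketch assumes the raw cell values can be obtained as undecoded bytes.

```python
def decode_with_fallback(value, encodings=("utf-8", "latin-1")):
    """Decode bytes by trying each encoding in turn; pass other values through."""
    if not isinstance(value, bytes):
        return value  # numbers, None and already-decoded str are left alone
    for encoding in encodings:
        try:
            return value.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Only reached if every listed encoding failed (latin-1 never fails).
    return value.decode("utf-8", errors="replace")


def rows_from_bytes(raw_rows):
    """Hypothetical stand-in for a reader that decodes as UTF-8 while iterating."""
    for raw in raw_rows:
        yield [c.decode("utf-8") if isinstance(c, bytes) else c for c in raw]


raw_rows = [
    [1.0, "你好".encode("utf-8")],
    [2.0, "café".encode("latin-1")],  # these bytes are not valid UTF-8
]

# The decode happens while fetching the next record, so only a try block
# wrapped around the whole loop catches the error.
caught_in_body = False
caught_outside = False
try:
    for record in rows_from_bytes(raw_rows):
        try:
            _ = record  # stands in for rawdata.append(record)
        except UnicodeDecodeError:
            caught_in_body = True
except UnicodeDecodeError:
    caught_outside = True

# Decoding cell by cell from the raw bytes lets each value fall back to latin-1.
decoded = [[decode_with_fallback(cell) for cell in row] for row in raw_rows]
print(caught_in_body, caught_outside)
print(decoded)
```

Because latin-1 maps every byte to some character, the fallback never fails, but it can silently produce the wrong characters if the column actually uses a different encoding.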
