Issue with SAS7BDAT class

Issue #42 resolved
Former user created an issue

Hello,

I have an issue with the sas7bdat module (I use the SAS7BDAT class to read SAS datasets: “from sas7bdat import SAS7BDAT”). Some datasets appear to be decoded incorrectly (I think so, but I may be wrong). This does not happen every time I open a sas7bdat dataset; I found that 2 datasets from my library are tricky for SAS7BDAT. The module opens the first tricky dataset but prints many lines like these: attachment - attachment 1

When I print the data (in a cmd window) it looks like this (the grey rectangle covers sensitive data; those cells look good): attachment - attachment 2

This is a sample of the data. The capture shows only the first 5 columns and 6 rows. I can see an unnatural “gap” between the values of the 3rd and 4th columns. The first row has the gap before “GP MDI 14.4 …”, and the third and sixth rows have the same gap. It is alarming.

Let’s focus on the second dataset. When I try to open it using the same code as in the previous case, I get this exception: attachment - attachment 3

In this case it is hard to find what is responsible for the exception.

I decided to read those tricky datasets using the pandas module. I used this code for both tricky datasets: sas = pd.read_sas("ABCD.sas7bdat", encoding='iso-8859-1'). Pandas works well with these 2 datasets. It takes more time, but it works. Pandas does not raise exceptions, and all observations from the decoded SAS datasets look correct. There are no mismatches or other issues when pandas reads them. I thought these 2 datasets were broken, but pandas decoded them well and SAS7BDAT does not. My question is simple: could you find a solution for my issue with SAS7BDAT? I cannot share these tricky datasets with you, so this may make the case harder to solve. I guess there may be a problem with DECOMPRESSORS (I am not sure about that, I am only guessing).
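For anyone who wants to reproduce the workaround described above, here is a minimal sketch that tries SAS7BDAT first and falls back to pandas when it raises. The helper names (read_with_sas7bdat, read_with_pandas, read_with_fallback) are hypothetical and not part of either library, and passing encoding to the SAS7BDAT constructor assumes a sas7bdat version that accepts that argument. Note that the fallback only triggers on an exception; it would not catch the silent "gap" problem seen in the first dataset.

```python
def read_with_sas7bdat(path, encoding="iso-8859-1"):
    # Lazy import so the sketch loads even if sas7bdat is not installed.
    from sas7bdat import SAS7BDAT
    # Assumption: this sas7bdat version accepts an `encoding` keyword.
    with SAS7BDAT(path, encoding=encoding) as reader:
        return reader.to_data_frame()


def read_with_pandas(path, encoding="iso-8859-1"):
    # Same call as in the report above.
    import pandas as pd
    return pd.read_sas(path, encoding=encoding)


DEFAULT_READERS = (
    ("sas7bdat", read_with_sas7bdat),
    ("pandas", read_with_pandas),
)


def read_with_fallback(path, readers=DEFAULT_READERS):
    """Try each (name, reader) pair in order and return (name, data)
    from the first reader that does not raise."""
    errors = []
    for name, reader in readers:
        try:
            return name, reader(path)
        except Exception as exc:  # broad on purpose: the failures are unknown
            errors.append(f"{name}: {type(exc).__name__}: {exc}")
    raise RuntimeError("all readers failed: " + "; ".join(errors))
```

Usage would be something like: used, df = read_with_fallback("ABCD.sas7bdat"), where used tells you which reader actually produced the data.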

Are you able to help me? Have you seen a similar issue? I tried various encodings and none of them worked.

Thanks in advance!

Comments (2)

  1. Mateusz Majtczak

    Thanks a lot for that! The module decompresses the data correctly. I have found another issue, and maybe it is related to the issue described above. During execution the program suddenly stops and raises a 'TypeError'. This happens every time on exactly the same dataset. Below I have attached a screenshot of the cmd window.

    01022019_error_screen.JPG

    Do you have a solution for that? Could you please fix this issue too?

    Thanks in advance!
