Only first 13,567 records extracted, then everything's zero/null/empty
I have a *.sas7bdat file from a customer that's pretty big: 29 GB in its original form, containing 2,528,692 records. When I process it, I get good data for the first 13,567 rows, but all values in the remaining 2,515,125 rows are zeros / nulls / empty strings. The column types do look correct, though. My processing script is as follows:
from sas7bdat import SAS7BDAT
import sys
import csv

# Stream every row of the SAS file to stdout as CSV
csvwriter = csv.writer(sys.stdout)
with SAS7BDAT(sys.argv[1]) as f:
    for row in f:
        csvwriter.writerow(row)
The only warning/error I get when processing is a single instance of this:
[bmsdata_2015_06_25.sas7bdat] column count mismatch
Does this ring any bells? Anything significant about 13,567? It's not close to a power of 2 or anything I can think of. I asked the customer to re-confirm there's data all the way through the file and they say it looks fine.
Comments (5)
-
Account Deleted -
Old issue, but if you're still there, is the file compressed? What is the value of f.properties.compression?
We just fixed a bug that affected RLE compressed files (compression = SASYZCRL), and there are known issues with RDC compression (compression = SASYZCR2) that I think I can fix in a few weeks.
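To make the commenter's suggestion concrete, here is a minimal sketch of that check, using the filename from the warning above and the f.properties.compression attribute the commenter names; the import and path are guarded so the snippet can be pasted anywhere without failing:

```python
import os

try:
    from sas7bdat import SAS7BDAT
except ImportError:
    SAS7BDAT = None  # sas7bdat package not installed

# Filename taken from the warning in the issue; adjust to your path.
path = "bmsdata_2015_06_25.sas7bdat"

if SAS7BDAT is not None and os.path.exists(path):
    with SAS7BDAT(path) as f:
        # SASYZCRL indicates RLE compression; SASYZCR2 indicates RDC.
        print(f.properties.compression)
```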
-
Hi @kshedden - did you ever get around to looking at the issues with RDC compression? I'm using version 2.0.7, and the decompress_row method is raising errors on my file about unknown markers (6, 7, 9, 10, 11, etc.).
-
I ported this over to pandas ... have you tried the pandas version (pandas.read_sas)? I added a few RDC codes there, but I think more are still missing.
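For reference, a hedged sketch of what the pandas route might look like for the original script, assuming the filename from the warning above; chunksize keeps memory bounded so the 29 GB file never has to fit in RAM at once:

```python
import os
import sys

try:
    import pandas as pd
except ImportError:
    pd = None  # pandas not installed

# Filename taken from the warning in the issue; adjust to your path.
path = "bmsdata_2015_06_25.sas7bdat"

if pd is not None and os.path.exists(path):
    # read_sas with chunksize yields DataFrames of at most 100000 rows each
    for chunk in pd.read_sas(path, format="sas7bdat", chunksize=100000):
        chunk.to_csv(sys.stdout, header=False, index=False)
```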
Can you share the data?
Other options are wizard and https://github.com/kshedden/datareader
Kerby
-
Kerby - thanks much!! pandas.read_sas worked well.
The data was here:
Just realized that https://pyhacker.com/pages/sas7bdat.html says Python 2.7+ is required, but https://bitbucket.org/jaredhobbs/sas7bdat says only 2.6+ is required. I'm using 2.6. Might that explain it?