Only first 13,567 records extracted, then everything's zero/null/empty

Issue #12 (new)
Former user created an issue

I have a *.sas7bdat file from a customer that's pretty big: 29 GB in its original form, containing 2,528,692 records. However, when I process it I get good data for the first 13,567 rows, then all values are zeros/nulls/empty strings for the remaining 2,515,125 rows. The types do look correct, though. My processing script is as follows:

from sas7bdat import SAS7BDAT
import sys
import csv

csvwriter = csv.writer(sys.stdout)  # stream rows to stdout as CSV

with SAS7BDAT(sys.argv[1]) as f:
    for row in f:  # the reader yields the column names first, then one list per data row
        csvwriter.writerow(row)

The only warning/error I get when processing is a single instance of this:

[bmsdata_2015_06_25.sas7bdat] column count mismatch

Does this ring any bells? Anything significant about 13,567? It's not close to a power of 2 or anything I can think of. I asked the customer to re-confirm there's data all the way through the file and they say it looks fine.
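
To pin down exactly where it goes bad, something like this reports the first row whose values are all blank (the blank test is ad hoc, and it assumes the reader yields the column names as its first row):

from sas7bdat import SAS7BDAT
import sys

def is_blank(value):
    # treat the symptoms described above as "blank": zero, null, empty string
    return value in (None, '', 0, 0.0)

with SAS7BDAT(sys.argv[1]) as f:
    rows = iter(f)
    next(rows)  # skip the column-name row
    for i, row in enumerate(rows, start=1):
        if all(is_blank(v) for v in row):
            print('First all-blank row: %d' % i)
            break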

Comments (5)

  1. Kerby A Shedden

    Old issue, but if you're still there, is the file compressed? What is the value of f.properties.compression?

    We just fixed a bug that affected RLE compressed files (compression = SASYZCRL), and there are known issues with RDC compression (compression = SASYZCR2) that I think I can fix in a few weeks.
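
    Something like this, run against the file from your log line, will print it ('SASYZCRL' means RLE, 'SASYZCR2' means RDC, and an empty value means uncompressed):

    from sas7bdat import SAS7BDAT

    with SAS7BDAT('bmsdata_2015_06_25.sas7bdat') as f:
        print(f.properties.compression)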

  2. Eric An

    Hi @kshedden - did you ever get around to looking at the issues with RDC compression? I'm using version 2.0.7 and the decompress_row method is raising an error on my file for unknown markers: 6, 7, 9, 10, 11, etc.
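
    In case it helps with triage, here's a rough sketch of how far iteration gets before decompress_row raises (the filename is a placeholder):

    from sas7bdat import SAS7BDAT

    count = 0
    try:
        with SAS7BDAT('myfile.sas7bdat') as f:
            for row in f:
                count += 1
    except Exception as exc:
        # the decompress_row error surfaces here during iteration
        print('Failed after %d rows: %s' % (count, exc))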
