Adding support for numpy.nan as sysmis
I think it would be great to support numpy.nan as a missing value, not only None.
For now I have to replace numpy.nan with None in every record that I write with SavWriter.
Test case:
from savReaderWriter import SavWriter
import numpy as np

def main():
    test_array = np.array([1, 2, 3, 4, 5, 6, np.nan])
    with SavWriter(savFileName='/tmp/test_base.sav',
                   varNames=['a'],
                   varTypes={'a': 0},
                   ioUtf8=True) as writer:
        for record in test_array:
            writer.writerow([record])
    return 'done'

if __name__ == '__main__':
    main()
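The workaround mentioned above (replacing NaN with None in every record before writing) can be sketched as a small helper. This is only an illustration: the name nan_to_none is hypothetical and not part of savReaderWriter, and it assumes the values are plain Python floats (np.nan itself is one).

```python
import math

def nan_to_none(record):
    """Return a copy of record with float NaN values replaced by None."""
    return [
        None if isinstance(value, float) and math.isnan(value) else value
        for value in record
    ]

# Example records; in the test case above each record holds a single value.
records = [[1.0], [2.0], [float("nan")]]
cleaned = [nan_to_none(record) for record in records]
print(cleaned)  # [[1.0], [2.0], [None]]
```

Each cleaned record could then be passed to writer.writerow as in the test case.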
Comments (8)
-
repo owner -
reporter Hi Albert-Jan,
>>> import sys, savReaderWriter as rw
>>> sys.version_info, rw.__version__
(sys.version_info(major=2, minor=7, micro=6, releaselevel='final', serial=0), '3.3.0')
>>> import platform
>>> platform.platform()
'Linux-3.13.0-32-generic-x86_64-with-Ubuntu-14.04-trusty'
Could you please show the descriptive statistics (frequencies) of variable 'a'? I also see np.nan displayed as sysmis when I look at the data editor (win-x64, SPSS v22; I run the Python code on the platform described above, but I only have SPSS installed on Windows, so I have to copy test_base.sav to Windows and open it there). In fact it is not $sysmis, though, because
fre a.
in the syntax editor outputs
but I expect:
As far as I can see, there is a difference between how None and np.nan are processed.
-
repo owner Hi,
Ah, now I see what you mean. That's annoying indeed. It might be nice to have a parameter similar to recodeSysmisTo in SavReader. It is simple, but quite expensive to convert np.nan to sysmis, because you would need to check every value. I will keep this issue open. Meanwhile (you probably are doing this already) you could try something like:
In [1]: import numpy as np, savReaderWriter as rw
In [2]: arr = np.array([np.nan, 1, np.nan, 666]).reshape(4, 1)
In [3]: with rw.SavWriter("somefile.sav", ["v1"], {"v1": 0}) as writer:
   .....:     arr[:] = np.where(np.isnan(arr), writer.sysmis, arr)
   .....:     writer.writerows(arr.tolist())
   .....:
Best wishes, Albert-Jan
-
repo owner Hmmm, come to think of it: it would be a very small effort to add a method writearray which (yes) writes an array, with nan values converted into SPSS $sysmis.
def writearray(self, array):
    """Write a numpy array to a .sav"""
    # Replace NaN with sysmis first, then write the converted rows.
    array = np.where(np.isnan(array), self.sysmis, array)
    for record in array.tolist():
        self._pyWriterow(record)
-
reporter Hi @fomcl I think a new method for one particular data type makes the user API more complicated. What if writerows were smarter and type-aware?
def writerows(self, records):
    """This function writes all records."""
    # np.ndarray is the array type; np.array is a factory function and cannot be used with isinstance.
    if not isinstance(records, (tuple, list, np.ndarray)):
        raise TypeError('records instance type must be one of list, tuple, numpy.ndarray but got %s' % (type(records), ))
    if isinstance(records, np.ndarray):
        # Convert NaN to sysmis before writing.
        records = np.where(np.isnan(records), self.sysmis, records).tolist()
    for record in records:
        self.writerow(record)
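A standalone sketch of the type-aware idea, independent of savReaderWriter, might look like the following. The function name iter_records and the SYSMIS constant are hypothetical; the real sentinel should come from writer.sysmis. Array-like input is detected by duck typing (a tolist method) rather than by importing numpy, which is a design assumption of this sketch, not the library's behaviour.

```python
import math
import sys

# Placeholder sentinel for illustration only; in practice use writer.sysmis.
SYSMIS = -sys.float_info.max

def iter_records(records, sysmis=SYSMIS):
    """Yield each record as a list, with float NaN replaced by sysmis."""
    # Array-like inputs (e.g. numpy arrays) expose tolist(); lists and tuples do not.
    if hasattr(records, "tolist"):
        records = records.tolist()
    for record in records:
        yield [
            sysmis if isinstance(value, float) and math.isnan(value) else value
            for value in record
        ]

rows = list(iter_records([(1.0, float("nan")), (2.0, 3.0)]))
print(rows)
```

Because the NaN check happens per value while iterating, the cost concern raised earlier in the thread still applies: every value is inspected once.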
-
repo owner - changed status to resolved
Fixed issue #25 and added some unit tests for this → <<cset 932499c09ac7>>
-
reporter @fomcl Nice commit!
I've also noticed another bug (or maybe it is a feature?).
This test will pass:
import savReaderWriter as srw

args = (["v1", "v2"], dict(v1=0, v2=0))
desired = [[1.0, 1.0], [1.0, 1.0]]

def test_writerows_str():
    records = ['11', '11']
    savFileName = "output_regular.sav"
    with srw.SavWriter(savFileName, *args) as writer:
        writer.writerows(records)
    with srw.SavReader(savFileName) as reader:
        actual = reader.all()
    assert actual == desired, actual
Maybe it's better to test in writerows whether isinstance(records, collections.Iterable), and to require that the first record is iterable too?
-
repo owner Thanks :) I followed your advice and changed the code. A TypeError was already raised elsewhere in the code for the (very plausible) scenario that you described. It's in 732c4e90bfd80b6d6d81685868a8277a78e0dcb5
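The kind of check discussed in this exchange could be sketched as follows. This is not the code from the commit referenced above; check_records is a hypothetical name. On Python 3 the Iterable ABC lives in collections.abc, and strings are themselves iterable, so they have to be rejected explicitly. Unlike the suggestion, which only mentions the first record, this sketch checks every record.

```python
from collections.abc import Iterable

def check_records(records):
    """Raise TypeError unless records is a non-string iterable of non-string iterables."""
    text_types = (str, bytes)
    if isinstance(records, text_types) or not isinstance(records, Iterable):
        raise TypeError("records must be an iterable of records, got %s" % type(records))
    for record in records:
        if isinstance(record, text_types) or not isinstance(record, Iterable):
            raise TypeError("each record must be an iterable of values, got %s" % type(record))

check_records([[1.0, 1.0], [1.0, 1.0]])  # valid input, no exception

try:
    check_records(["11", "11"])  # the input from the test above
except TypeError as exc:
    print("rejected:", exc)
```

Note that iterating to validate would consume a generator, so a real implementation might validate each record as it is written instead.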
Hi,
What version of savReaderWriter and Python are you using?
The code you gave runs without errors on Windows 7 32 bit. In SPSS, the np.nan shows up as $sysmis (blank) in the data editor.
Best wishes, Albert-Jan