# Source

In the documentation below, the associated SPSS commands are given in CAPS. See also the IBM SPSS Statistics Command Syntax Reference.pdf for info about SPSS syntax.

## Installation

### Platforms

As shown in Table 0 below, this program works on Linux (incl. z/Linux), Windows, Mac OS (32 and 64 bit), AIX-64, HP-UX and Solaris-64. However, it has only been tested on 32-bit Linux (Ubuntu and Mint), Windows (mostly Windows XP 32-bit, but also a few times on Windows 7 64-bit), and Mac OS (with an earlier version of savReaderWriter). The other platforms are entirely untested.

### Setup

The program can be installed by running:

python setup.py install


Or alternatively:

pip install savReaderWriter


To get the 'bleeding edge' version straight from the repository do:

pip install -U -e git+https://bitbucket.org/fomcl/savreaderwriter.git#egg=savreaderwriter

• The savReaderWriter program is now self-contained: the IBM SPSS I/O modules load by themselves, and changes to PATH, LD_LIBRARY_PATH and equivalents are no longer required. Extra .deb files no longer need to be installed either (i.e. there are no dependencies).
• savReaderWriter now uses version 21.0.0.1 (i.e., Fixpack 1) of the I/O module.

### Optional features

cWriterow. The cWriterow package is a faster Cython implementation of the pyWriterow method (66 % faster). To use it, install Cython and then run setup.py in the cWriterow folder:

easy_install cython
python setup.py build_ext --inplace


psyco. The psyco package may be installed to speed up reading (66 % faster).

numpy. The numpy package should be installed if you intend to use array slicing (e.g. data[:2, 2:4]).

## SavWriter -- Write SPSS system files

Typical use:

from savReaderWriter import SavWriter

savFileName = "someFile.sav"
records = [['Test1', 1, 1], ['Test2', 2, 1]]
varNames = ['var1', 'v2', 'v3']
# 0 marks a numeric variable; n > 0 marks a string variable of n bytes
varTypes = {'var1': 5, 'v2': 0, 'v3': 0}
with SavWriter(savFileName, varNames, varTypes) as writer:
    for record in records:
        writer.writerow(record)


## SavReader -- Read SPSS system files

Warning

Once a file is open, ioUtf8 and ioLocale cannot be changed. The same applies after a file could not be closed successfully. Always ensure a file is closed, either by calling __exit__() (i.e., by using a context manager) or by calling close() (in a try-finally suite).

Typical use:

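savFileName = "someFile.sav"
with SavReader(savFileName) as reader:
    for line in reader:
        process(line)  # process() stands for your own record handler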


Use of __getitem__ and other methods:

data = SavReader(savFileName, idVar="id")
with data:
    print "The file contains %d records" % len(data)
    print unicode(data)  # prints a file report
    print "The first six records look like this\n", data[:6]
    print "The first record looks like this\n", data[0]
    print "The last four records look like this\n", data.tail(4)
    print "The first five records look like this\n", data.head()
    allData = data.all()
    print "First column:\n", data[..., 0]  # requires numpy
    print "Row 4 & 5, first three cols\n", data[4:6, :3]  # requires numpy
    ## ... Do a binary search for records --> idVar


## SavHeaderReader -- Read SPSS file meta data

Warning

The program calls spssFree* C functions to free memory allocated to dynamic arrays. This previously sometimes caused segmentation faults. This problem now appears to be solved. However, if you do experience segmentation faults you can set segfaults=True in __init__.py. This will prevent the spssFree* functions from being called (and introduce a memory leak).

Typical use:

with SavHeaderReader(savFileName) as header:
    report = unicode(header)  # build a file report
    print report


### Formats

SPSS knows just two different data types: string and numerical data. These data types can be formatted (displayed) by SPSS in several different ways. Format names are followed by total width (w) and an optional number of decimal positions (d). Table 1 below shows a complete list of all the available formats.
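The name/width/decimals structure of a format can be illustrated with a small parser. This is only a sketch: the helper name parse_spss_format and its regular expression are assumptions made for illustration and are not part of savReaderWriter.

```python
import re

# Hypothetical helper, not part of savReaderWriter: splits an SPSS format
# string such as "F5.2" into (name, width, decimals).
_FORMAT_RE = re.compile(r"^([A-Za-z]+)(\d+)(?:\.(\d+))?$")


def parse_spss_format(fmt):
    match = _FORMAT_RE.match(fmt.strip())
    if match is None:
        raise ValueError("not a recognizable SPSS format: %r" % fmt)
    name, width, decimals = match.groups()
    # The decimals part (d) is optional; treat a missing one as 0.
    return name.upper(), int(width), (int(decimals) if decimals else 0)


numeric = parse_spss_format("F5.2")      # name F, width 5, two decimals
string = parse_spss_format("A20")        # string formats have no decimals
long_time = parse_spss_format("TIME40.16")
```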

String data can be alphanumeric characters (A format) or the hexadecimal representation of alphanumeric characters (AHEX format). The maximum size of a string value is 32767 bytes. String formats do not have any decimal positions (d). Currently, SavReader maps both of the string formats to a regular alphanumeric string format.

Numerical data formats include the default numeric format (F), scientific notation (E) and zero-padded (N). For example, a format of F5.2 represents a numeric value with a total width of 5, including two decimal positions and a decimal indicator. For all numeric formats, the maximum width (w) is 40. For numeric formats where decimals are allowed, the maximum number of decimals (d) is 16.

SavReader does not format numerical values, except for the N format and dates/times (see under Date formats). The N format is a zero-padded value (e.g. SPSS format N8 is formatted as Python format %08d, yielding e.g. '00001234'). Other numerical values are left unformatted for three reasons. First, for most numerical values formatting means loss of precision: formatting SPSS F5.3 to Python %5.3f, for instance, retains only the first three decimals. Second, formatting incurs additional processing time. Finally, some formats render a value less useful for calculations, e.g. appending a percent sign to a value (PCT format).
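The effects described above can be reproduced with plain Python string formatting. The snippet below is a minimal illustration that uses only built-in formatting; the variable names are made up for the example, and it does not call savReaderWriter.

```python
# SPSS N8 corresponds to zero-padding to a total width of 8.
padded = "%08d" % 1234            # '00001234'

# Formatting an F5.3-style value keeps three decimals; the rest is lost.
original = 3.14159265
formatted = "%5.3f" % original    # '3.142'
roundtrip = float(formatted)      # no longer equal to the original value

# Appending a percent sign (PCT-style) turns the number into text,
# so it can no longer be used directly in calculations.
as_pct = "%.1f%%" % 12.5          # '12.5%'
```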

Note. The User Programmable currency formats (CCA, CCB, CCC and CCD) cannot be defined or written by SavWriter and existing definitions cannot be read by SavReader.

### Date formats

Dates in SPSS. Date formats are a group of numerical formats. SPSS stores dates as the number of seconds since midnight, October 14, 1582 (the beginning of the Gregorian calendar). In SPSS, the user can make these seconds human-readable by giving them a print and/or write format (usually these are set at the same time using the FORMATS command). Examples of such display formats include ADATE (American date, mmddyyyy), EDATE (European date, ddmmyyyy), SDATE (Asian/sortable date, yyyymmdd) and JDATE (Julian date).
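The storage convention can be sketched with the standard datetime module. The helper below is hypothetical and only illustrates the arithmetic; it is not how savReaderWriter is implemented.

```python
from datetime import datetime, timedelta

# Start of the SPSS date scale: midnight, October 14, 1582.
SPSS_EPOCH = datetime(1582, 10, 14)


def spss_seconds_to_datetime(seconds):
    """Hypothetical helper: convert an SPSS date value to a datetime."""
    return SPSS_EPOCH + timedelta(seconds=seconds)


# 156335 days after the epoch, expressed in seconds.
moment = spss_seconds_to_datetime(156335 * 86400)
iso_date = moment.strftime("%Y-%m-%d")  # '2010-10-25'
```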

Reading dates. SavReader deliberately does not honor the different SPSS date display formats, but instead tries to convert them to the more practical (sortable) and less ambiguous ISO 8601 format (yyyy-mm-dd). You can easily change this behavior by modifying the supportedDates dictionary in __init__.py. Table 2 below shows how SavReader converts SPSS dates. Where applicable, the SPSS-to-Python conversion always results in the 'long' version of a date/time. For instance, TIME5 and TIME40.16 both result in a %H:%M:%S.%f-style format. If you do not want SavReader to automatically convert dates, you can set rawMode=True. If you use this setting, keep in mind that SavReader will also not convert system missing values ($SYSMIS) to an empty string; instead, sysmis values will appear as the lowest (most negative) value that can be represented on that system (-1 * sys.float_info.max).
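When working with rawMode=True, the sysmis sentinel may need to be handled by the caller. The following sketch shows one way to do that; clean_raw_value and the sample record are hypothetical and not part of savReaderWriter.

```python
import sys

# In raw mode, system-missing values show up as the most negative
# representable float instead of an empty string.
RAW_SYSMIS = -1 * sys.float_info.max


def clean_raw_value(value):
    """Hypothetical helper: map the raw sysmis sentinel to None."""
    return None if value == RAW_SYSMIS else value


# A made-up raw record: a number, a missing value, and an unconverted date.
raw_record = [1.0, RAW_SYSMIS, 13507344000.0]
cleaned = [clean_raw_value(value) for value in raw_record]
```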

Note.
• [1] IBM SPSS Statistics Command Syntax Reference.pdf
• [2] http://docs.python.org/2/library/datetime.html
• [3] ISO 8601 format dates are used wherever possible, e.g. the mmddyyyy (ADATE) and ddmmyyyy (EDATE) layouts are not maintained.
• [4] Months are converted to quarters using a simple lookup table.
• [5] Weekday and month names depend on the host locale (not on the ioLocale argument).

Writing dates. With SavWriter a Python date string value (e.g. "2010-10-25") can be converted to an SPSS Gregorian date (i.e., just a whole bunch of seconds) by using the spssDateTime method, e.g.:

from savReaderWriter import SavWriter

kwargs = dict(savFileName="/tmp/date.sav", varNames=['aDate'], varTypes={'aDate': 0}, formats={'aDate': 'EDATE40'})
with SavWriter(**kwargs) as writer:
    # convert the date string to the number of seconds SPSS stores
    spssDateValue = writer.spssDateTime("2010-10-25", "%Y-%m-%d")
    writer.writerow([spssDateValue])


The display format of the date (i.e., the way it looks in the SPSS data editor after opening the .sav file) may be set by specifying the formats dictionary (see also Table 1). This is one of the optional arguments of the SavWriter initializer. Without such a specification, the date will look like a large integer (the number of seconds since the beginning of the Gregorian calendar).
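To see what such a "large integer" looks like, the conversion can be approximated with the standard library. This is a sketch of the arithmetic only; to_spss_seconds is a hypothetical stand-in and not the code behind spssDateTime.

```python
from datetime import datetime

# Start of the SPSS date scale: midnight, October 14, 1582.
SPSS_EPOCH = datetime(1582, 10, 14)


def to_spss_seconds(date_string, fmt):
    """Hypothetical stand-in: seconds between the SPSS epoch and a date."""
    delta = datetime.strptime(date_string, fmt) - SPSS_EPOCH
    # Integer arithmetic avoids any floating point rounding.
    return delta.days * 86400 + delta.seconds


value = to_spss_seconds("2010-10-25", "%Y-%m-%d")  # 13507344000
```

Without a date format such as EDATE40, this number is what would be displayed in the SPSS data editor.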