# Source

In the documentation below, the associated SPSS commands are given in CAPS. See also the IBM SPSS Statistics Command Syntax Reference.pdf for information about SPSS syntax.

## Installation

### Platforms

As shown in the table below, this program works for Linux (incl. z/Linux), Windows, MacOS (32 and 64 bit), AIX-64, HP-UX and Solaris-64. However, it has only been tested on Linux 32 (Ubuntu and Mint), Windows (mostly on Windows XP 32, but also a few times on Windows 7 64), and MacOS (with an earlier version of savReaderWriter). The other OSs are entirely untested.

| Operating system | 32-bit architecture | 64-bit architecture |
|---|---|---|
| AIX | | X |
| HP-UX | | X |
| Linux | X | X |
| Mac OS | X | X? |
| Solaris | | X |
| Windows | X | X |
| zLinux | | X |
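When reading the table, the column that matters is the architecture of the Python interpreter. The I/O modules are native shared libraries loaded into the Python process, and a 32-bit Python process cannot load 64-bit libraries (or vice versa), even on a 64-bit OS. The minimal sketch below uses only the standard library to find your row and column. The function name `python_platform` is hypothetical, and note that `platform.system()` reports Mac OS as `Darwin`.

```python
import platform
import struct


def python_platform():
    """Return (operating system name, bitness of the running Python process)."""
    # struct.calcsize("P") is the size of a C pointer in bytes, so this gives
    # the bitness of the interpreter itself, not necessarily that of the OS.
    bits = struct.calcsize("P") * 8
    return platform.system(), bits


if __name__ == "__main__":
    system, bits = python_platform()
    print("%s, %d-bit Python" % (system, bits))
```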

### Setup

The program can be installed by running:

```
python setup.py install
```

Or alternatively:

To get the 'bleeding edge' version straight from the repository do:

The savReaderWriter program is now self-contained: the IBM SPSS I/O modules load by themselves, without any changes to PATH, LD_LIBRARY_PATH or their equivalents. No extra .deb files need to be installed either (i.e., there are no external dependencies). savReaderWriter now uses version 21.0.0.1 (i.e., Fixpack 1) of the I/O module.

### Optional

cWriterow. The cWriterow package is a faster Cython implementation of the pyWriterow method (66% faster). To install it, you need Cython; then run setup.py in the cWriterow folder:

```
easy_install cython
python setup.py build_ext --inplace
```

psyco. The psyco package may be installed to speed up reading (66% faster).

numpy. The numpy package should be installed if you intend to use array slicing (e.g., data[:2,2:4]).
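To make the slicing notation concrete, the plain-Python sketch below shows which values `data[:2, 2:4]` and `data[..., 0]` select: rows are records and columns are variables. It works on a hypothetical in-memory list of records and says nothing about the type of object savReaderWriter actually returns; the helper names are made up for illustration.

```python
# Hypothetical records: one inner list per case, one item per variable.
records = [
    ["Test1", 1, 1, 10],
    ["Test2", 2, 1, 20],
    ["Test3", 3, 0, 30],
]


def slice_rows_cols(rows, row_slice, col_slice):
    """Values selected by data[row_slice, col_slice]: some columns of some rows."""
    return [row[col_slice] for row in rows[row_slice]]


def column(rows, index):
    """Values selected by data[..., index]: one value per record."""
    return [row[index] for row in rows]


print(slice_rows_cols(records, slice(None, 2), slice(2, 4)))  # [[1, 10], [1, 20]]
print(column(records, 0))  # ['Test1', 'Test2', 'Test3']
```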

## SavWriter -- Write SPSS system files

Typical use:

```python
from savReaderWriter import SavWriter

savFileName = "someFile.sav"
records = [['Test1', 1, 1], ['Test2', 2, 1]]
varNames = ['var1', 'v2', 'v3']
varTypes = {'var1': 5, 'v2': 0, 'v3': 0}  # 0 = numeric, >0 = string of that width
with SavWriter(savFileName, varNames, varTypes) as writer:
    for record in records:
        writer.writerow(record)
```
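In the example above, `varTypes` maps each variable name to 0 for a numeric variable, or to a positive number giving the width of a string variable (so `var1` holds strings of up to 5 characters). The hypothetical helper below is not part of savReaderWriter. It is a sketch of how records could be checked against such a specification before writing, and it assumes plain ASCII strings, where one character equals one byte, and makes no attempt to model missing values.

```python
def check_record(record, varNames, varTypes):
    """Return a list of problems found in one record (empty if it looks valid).

    Hypothetical helper: varTypes[name] == 0 means numeric, a positive value
    is the maximum string width (ASCII assumed, so 1 character == 1 byte).
    Missing values are not modelled.
    """
    problems = []
    if len(record) != len(varNames):
        problems.append("expected %d values, got %d" % (len(varNames), len(record)))
        return problems
    for name, value in zip(varNames, record):
        width = varTypes[name]
        if width == 0:
            # Numeric variable; bool is rejected because it subclasses int.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append("%s: expected a number, got %r" % (name, value))
        elif not isinstance(value, str):
            problems.append("%s: expected a string, got %r" % (name, value))
        elif len(value) > width:
            problems.append("%s: %r is longer than %d" % (name, value, width))
    return problems


varNames = ['var1', 'v2', 'v3']
varTypes = {'var1': 5, 'v2': 0, 'v3': 0}
print(check_record(['Test1', 1, 1], varNames, varTypes))  # []
print(check_record(['TooLong', 'x', 1], varNames, varTypes))
```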

## SavReader -- Read SPSS system files

Warning

Once a file is open, `ioUtf8` and `ioLocale` cannot be changed. The same applies after a file could not be closed successfully. Always make sure a file is closed, either by calling `__exit__()` (i.e., by using a context manager) or by calling `close()` in a `try`/`finally` suite.
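The two closing patterns from the warning are sketched below with a hypothetical stand-in class, `DummyReader`, rather than a real reader, so that the example runs without an SPSS file. It only records whether `close()` was called, which shows that both patterns release the file even when the body raises an exception.

```python
class DummyReader(object):
    """Hypothetical stand-in for a reader object; it only tracks closing."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not swallow exceptions raised in the with block


# Pattern 1: context manager; __exit__() runs whenever the with block is left.
with DummyReader("someFile.sav") as reader:
    pass
print(reader.closed)  # True

# Pattern 2: explicit close() in a try/finally suite; finally always runs.
reader2 = DummyReader("someFile.sav")
try:
    raise ValueError("something went wrong while reading")
except ValueError:
    pass
finally:
    reader2.close()
print(reader2.closed)  # True
```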

Typical use:

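```python
from savReaderWriter import SavReader

savFileName = "someFile.sav"
with SavReader(savFileName) as sav:
    for line in sav:
        process(line)  # process() stands for your own record-handling function
```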

Use of `__getitem__` and other methods:

```python
from savReaderWriter import SavReader

data = SavReader("someFile.sav")
with data:
    print "The file contains %d records" % len(data)
    print unicode(data)  # prints a file report
    print "The first six records look like this\n", data[:6]
    print "The first record looks like this\n", data[0]
    print "The last four records look like this\n", data.tail(4)
    print "The first five records look like this\n", data.head()
    allData = data.all()
    print "First column:\n", data[..., 0]  # requires numpy
    print "Row 4 & 5, first three cols\n", data[4:6, :3]  # requires numpy
    ## ... Do a binary search for records --> idVar
```

## SavHeaderReader -- Read SPSS file metadata

Warning

The program calls the `spssFree*` C functions to free memory allocated to dynamic arrays. This used to cause occasional segmentation faults, a problem that now appears to be solved. If you do experience segmentation faults, however, you can set `segfaults=True` in `__init__.py`. This prevents the `spssFree*` functions from being called, at the cost of a memory leak.

Typical use:

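```python
from savReaderWriter import SavHeaderReader
with SavHeaderReader("someFile.sav") as spssDict:
    print unicode(spssDict)  # prints a report of the file's metadata
```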

### Formats

SPSS has just two data types: string and numeric. Each can be displayed in several ways. String data can be shown as alphanumeric characters (A format) or as the hexadecimal representation of those characters (AHEX format); savReaderWriter maps both of these formats to a regular alphanumeric string format. Numeric data formats include: default numeric (F), scientific notation (E), percent (PCT), dollar (DOLLAR), comma as thousands separator (COMMA), dot as thousands separator with a decimal comma (DOT) and zero-padded (N).

savReaderWriter.SavReader returns N-format values as zero-padded strings, but applies no formatting for the other formats. Formatting would cost a lot of extra processing time, and, e.g., appending a percent sign to a value (PCT format) would make it useless for calculations. Format names are followed by the total width (w) and an optional number of decimal positions (d). For example, a format of F5.2 represents a numeric value with a total width of 5, including two decimal positions and a decimal indicator. A complete list of all the available formats is shown below. Most of these are not formatted at all by SavReader.
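The w.d notation can be illustrated with Python's own string formatting. The `render` function below is a hypothetical helper, not savReaderWriter code, and it handles only two formats: Fw.d (right-aligned in w characters with d decimals, the decimal point counting towards w) and Nw (zero-padded to w digits, meant for non-negative integers). Unlike SPSS, it simply lets values that do not fit overflow the width.

```python
import re


def render(value, fmt):
    """Render a number as an SPSS Fw.d or Nw format would display it (sketch)."""
    match = re.match(r"^([FN])(\d+)(?:\.(\d+))?$", fmt)
    if match is None:
        raise ValueError("unsupported format: %r" % fmt)
    kind = match.group(1)
    width = int(match.group(2))
    decimals = int(match.group(3) or 0)
    if kind == "N":
        # Zero-padded integer, e.g. N5 turns 42 into "00042".
        return "%0*d" % (width, int(value))
    # Fixed-point, right-aligned; the decimal point counts towards the width.
    return "%*.*f" % (width, decimals, value)


print(repr(render(3.14159, "F5.2")))  # ' 3.14'
print(render(42, "N5"))  # 00042
```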

### Date formats

Date formats are another group of numeric formats. SPSS stores dates as the number of seconds since midnight, Oct 14, 1582 (the day before the Gregorian calendar took effect on Oct 15, 1582). These seconds can be made readable by giving them a print and/or write format (usually both are set at the same time using the FORMATS command). Examples of such display formats include ADATE (American date) and EDATE (European date), for mmddyyyy- and ddmmyyyy-style display in the SPSS Data Editor, respectively. savReaderWriter deliberately does not honour these different formats, but tries to convert them to the more practical (sortable) and less ambiguous ISO 8601 format (yyyymmdd). The table below shows how savReaderWriter converts SPSS dates.
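The arithmetic behind such a conversion is sketched below using only Python's `datetime` module. The function names are hypothetical and this is not savReaderWriter's own implementation. The sketch uses the hyphenated yyyy-mm-dd form of ISO 8601, ignores the time-of-day part when converting to a string, and relies on `datetime` using the proleptic Gregorian calendar, as SPSS does.

```python
from datetime import datetime, timedelta

# SPSS date values count seconds from midnight, 14 October 1582.
SPSS_EPOCH = datetime(1582, 10, 14)


def spss_to_iso(seconds):
    """Convert an SPSS date value (seconds since the SPSS epoch) to yyyy-mm-dd."""
    return (SPSS_EPOCH + timedelta(seconds=seconds)).date().isoformat()


def iso_to_spss(date_string):
    """Convert a yyyy-mm-dd string to an SPSS date value (seconds, as a float)."""
    moment = datetime.strptime(date_string, "%Y-%m-%d")
    return (moment - SPSS_EPOCH).total_seconds()


print(iso_to_spss("1970-01-01"))  # 12219379200.0
print(spss_to_iso(iso_to_spss("2010-10-25")))  # 2010-10-25
```

The 1970-01-01 value, 141,428 days times 86,400 seconds, is a handy sanity check when comparing against other tools.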

With savReaderWriter.SavWriter, a Python date string value (e.g., "2010-10-25") can be converted to an SPSS Gregorian date (i.e., a number of seconds) by using, e.g.: