
Welcome to savReaderWriter's documentation!

In the documentation below, the associated SPSS commands are given in CAPS. See also the IBM SPSS Statistics Command Syntax Reference.pdf for info about SPSS syntax.



As shown in the table below, this program works on Linux (incl. z/Linux), Windows, MacOS (32 and 64 bit), AIX-64, HP-UX and Solaris-64. However, it has only been tested on Linux 32 (Ubuntu and Mint), Windows (mostly Windows XP 32, but also a few times on Windows 7 64), and MacOS (with an earlier version of savReaderWriter). The other operating systems are entirely untested.

================  ======  ======
Operating System  32 bit  64 bit
================  ======  ======
Linux             X       X
Solaris                   X
Windows           X       X
zLinux                    X
================  ======  ======


The program can be installed by running:

python install

Or alternatively:

pip install savReaderWriter

To get the 'bleeding edge' version straight from the repository do:

pip install -U -e git+

The savReaderWriter program is now self-contained: the IBM SPSS I/O modules load by themselves, without any changes to PATH, LD_LIBRARY_PATH or equivalents. No extra .deb files need to be installed either (i.e., there are no dependencies). savReaderWriter now uses version (i.e., Fixpack 1) of the I/O module.


cWriterow. The cWriterow package is a faster Cython implementation of the pyWriterow method (66 % faster). To install it, you need Cython; then run the following in the cWriterow folder:

easy_install cython
python build_ext --inplace

psyco. The psyco package may be installed to speed up reading (66 % faster).

numpy. The numpy package should be installed if you intend to use array slicing (e.g. data[:2,2:4]).

:mod:`SavWriter` -- Write Spss system files

Typical use:

from savReaderWriter import SavWriter

savFileName = "someFile.sav"
records = [['Test1', 1, 1], ['Test2', 2, 1]]
varNames = ['var1', 'v2', 'v3']
varTypes = {'var1': 5, 'v2': 0, 'v3': 0}  # 0 = numeric, n > 0 = string of n bytes
with SavWriter(savFileName, varNames, varTypes) as writer:
    for record in records:
        writer.writerow(record)

:mod:`SavReader` -- Read Spss system files


Once a file is open, ioUtf8 and ioLocale cannot be changed. The same applies after a file could not be closed successfully. Always ensure a file is closed, either by calling __exit__() (i.e., by using a context manager) or by calling close() (in a try-finally suite).
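The two closing patterns mentioned above can be sketched as follows. FakeReader is a hypothetical stand-in, not part of savReaderWriter, so that the snippet runs without the SPSS I/O libraries; it only assumes that the real reader offers close() and the context-manager protocol, as the paragraph above states.

```python
class FakeReader(object):
    """Hypothetical stand-in for a reader; it only tracks open/closed state."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not swallow exceptions raised in the body


# Pattern 1: context manager; __exit__() runs even if the body raises.
with FakeReader("someFile.sav") as reader_a:
    pass  # read records here

# Pattern 2: explicit close() in a try-finally suite.
reader_b = FakeReader("someFile.sav")
try:
    pass  # read records here
finally:
    reader_b.close()
```

Both patterns guarantee that the file is closed; the context manager is shorter and harder to get wrong.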

Typical use:

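savFileName = "someFile.sav"
with SavReader(savFileName, returnHeader=True) as sav:
    header = next(sav)  # with returnHeader=True the first row holds the variable names
    for line in sav:
        pass  # process each record here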

Use of __getitem__ and other methods:

data = SavReader(savFileName, idVar="id")
with data:
    print "The file contains %d records" % len(data)
    print unicode(data)  # prints a file report
    print "The first six records look like this\n", data[:6]
    print "The first record looks like this\n", data[0]
    print "The last four records look like this\n", data.tail(4)
    print "The first five records look like this\n", data.head()
    allData = data.all()
    print "First column:\n", data[..., 0]  # requires numpy
    print "Row 4 & 5, first three cols\n", data[4:6, :3]  # requires numpy
    ## ... Do a binary search for records --> idVar
    print data.get(4, "not found")  # gets 1st record where id==4
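As an illustration of how a lookup such as data.get(4, "not found") could work, here is a minimal binary-search sketch using the standard bisect module. It is not taken from savReaderWriter's source: get_by_id is a hypothetical helper, and the sketch assumes that the id values are sorted in ascending order.

```python
from bisect import bisect_left


def get_by_id(records, ids, key, default=None):
    """Return the first record whose id equals key, else default.

    ids must be sorted ascending and aligned with records.
    """
    pos = bisect_left(ids, key)  # leftmost position where key could be inserted
    if pos < len(ids) and ids[pos] == key:
        return records[pos]
    return default


# Toy data; the first element of each record plays the role of idVar.
records = [(1, "a"), (2, "b"), (4, "c"), (4, "d"), (7, "e")]
ids = [rec[0] for rec in records]
first_match = get_by_id(records, ids, 4, "not found")
missing = get_by_id(records, ids, 5, "not found")
```

Using bisect_left rather than bisect_right is what makes the lookup return the first of several records that share the same id.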

:mod:`SavHeaderReader` -- Read Spss file meta data


The program calls spssFree* C functions to free memory allocated to dynamic arrays. This previously sometimes caused segmentation faults. This problem now appears to be solved. However, if you do experience segmentation faults you can set segfaults=True in This will prevent the spssFree* functions from being called (and introduce a memory leak).

Typical use:

with SavHeaderReader(savFileName) as spssDict:
    metadata = spssDict.dataDictionary()
    print unicode(spssDict)


SPSS knows just two different data types: string and numerical data. These datatypes can be displayed in several different ways. String data can be alphanumeric characters (A format) or the hexadecimal representation of alphanumeric characters (AHEX format). savReaderWriter maps both of these formats to a regular alphanumeric string format. Numerical data formats include: default numeric (F) format, scientific notation (E), percent (PCT), dollar (DOLLAR), decimal comma (COMMA), decimal dot (DOT), zero-padded (N).

savReaderWriter.SavReader renders N-format values as zero-padded strings, but does no formatting for the other formats. Formatting costs a lot of additional processing time, and appending, for example, a percent sign to a value (PCT format) renders it useless for calculations. Format names are followed by a total width (w) and an optional number of decimal positions (d). For example, a format of F5.2 represents a numeric value with a total width of 5, including two decimal positions and a decimal indicator. A complete list of all the available formats is shown below. Most of these will not be formatted at all by SavReader.
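The name/width/decimals structure described above can be made concrete with a small parser. This is only a sketch: parse_format and zero_pad are hypothetical helpers, not savReaderWriter functions, and the regular expression assumes that a format name consists of letters only.

```python
import re

# Format name (letters), total width w, optional ".d" decimal positions.
FORMAT_RE = re.compile(r"^([A-Z]+)(\d+)(?:\.(\d+))?$")


def parse_format(fmt):
    """Split e.g. 'F5.2' into ('F', 5, 2); decimals default to 0."""
    match = FORMAT_RE.match(fmt.strip().upper())
    if match is None:
        raise ValueError("not a valid format: %r" % fmt)
    name, width, decimals = match.groups()
    return name, int(width), int(decimals or 0)


def zero_pad(value, width):
    """Left-pad a non-negative integer with zeros up to the given width."""
    return "%0*d" % (width, value)
```

For instance, parse_format("F5.2") yields the name F, a width of 5 and 2 decimal positions, and zero_pad(42, 5) illustrates the kind of zero-padded string an N5 format stands for.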

Date formats

Date formats are another group of numerical formats. SPSS stores dates as the number of seconds since midnight, Oct 14, 1582 (the beginning of the Gregorian calendar). The user can make these seconds understandable by giving them a print and/or write format (usually these are set at the same time using the FORMATS command). Examples of such display formats include ADATE (American date) and EDATE (European date), for mmddyyyy- and ddmmyyyy-style display formats in the SPSS data editor, respectively. savReaderWriter deliberately does not honour these different formats, but tries to convert them to the more practical (sortable) and less ambiguous ISO 8601 format (yyyymmdd). The table below shows how savReaderWriter converts SPSS dates.

With savReaderWriter.SavWriter a Python date string value (e.g. "2010-10-25") can be converted to an SPSS Gregorian date (i.e., just a whole bunch of seconds) by using e.g.:

kwargs = dict(savFileName="/tmp/date.sav", varNames=['aDate'], varTypes={'aDate': 0}, formats={'aDate': 'EDATE40'})
with SavWriter(**kwargs) as writer:
    spssDateValue = writer.spssDateTime("2010-10-25", "%Y-%m-%d")

The display format of the date (i.e., the way it looks in the SPSS data editor after opening the .sav file) may be set by using the savReaderWriter.SavWriter.formats setter property. This is one of the optional arguments of the SavWriter initializer.
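The date arithmetic behind this section can be sketched with the standard library alone. The helper names spss_to_iso and iso_to_spss are hypothetical and are not part of savReaderWriter; the sketch only assumes the epoch stated above (midnight, Oct 14, 1582) and emits the hyphenated ISO 8601 form (yyyy-mm-dd).

```python
from datetime import datetime, timedelta

# Midnight at the start of the Gregorian calendar, the SPSS date epoch.
GREGORIAN_EPOCH = datetime(1582, 10, 14)


def spss_to_iso(seconds):
    """Convert seconds since the epoch to an ISO 8601 date string."""
    return (GREGORIAN_EPOCH + timedelta(seconds=seconds)).date().isoformat()


def iso_to_spss(date_string, fmt="%Y-%m-%d"):
    """Convert a date string to whole seconds since the epoch."""
    delta = datetime.strptime(date_string, fmt) - GREGORIAN_EPOCH
    return delta.days * 86400 + delta.seconds
```

A value produced by iso_to_spss("2010-10-25") is the kind of number that is stored in the file; the display format (e.g. EDATE or ADATE) only affects how SPSS shows it.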

Note. [1] ISO 8601 format dates are used wherever possible; e.g., the mmddyyyy (ADATE) and ddmmyyyy (EDATE) layouts are not maintained. [2] Months are converted to quarters using a simple lookup table. [3] [4]