
Welcome to savReaderWriter's documentation!

In the documentation below, the associated SPSS commands are given in CAPS. See the IBM SPSS Statistics Command Syntax Reference (PDF) for more information about SPSS syntax.

Installation

Platforms

As shown in the table below, this program should work on Linux (incl. z/Linux), Windows, Mac OS (32 and 64 bit), AIX-64, HP-UX and Solaris-64. However, it has only been tested on 32-bit Linux (Ubuntu and Mint), Windows (mostly on 32-bit Windows XP, but also a few times on 64-bit Windows 7), and Mac OS (with an earlier version of savReaderWriter). The other operating systems are entirely untested.

================  ======  ======
Operating system  32 bit  64 bit
================  ======  ======
AIX                       X
HP-UX                     X
Linux             X       X
Mac OS            X       X?
Solaris                   X
Windows           X       X
zLinux                    X
================  ======  ======

Setup

The program can be installed by running:

python setup.py install

Alternatively:

pip install savReaderWriter

To get the 'bleeding edge' version straight from the repository, run:

pip install -U -e git+https://bitbucket.org/fomcl/savreaderwriter.git#egg=savreaderwriter

  • The savReaderWriter program is now self-contained: the IBM SPSS I/O modules load by themselves, without requiring any changes to PATH, LD_LIBRARY_PATH or their equivalents. Also, no extra .deb files need to be installed anymore (i.e., there are no external dependencies).
  • savReaderWriter now uses version 21.0.0.1 (i.e., Fixpack 1) of the I/O module.

Optional features

cWriterow. The cWriterow package is a faster Cython implementation of the pyWriterow method (66% faster). To install it, first install Cython, then run setup.py in the cWriterow folder:

easy_install cython
python setup.py build_ext --inplace

psyco. The psyco package may be installed to speed up reading (66 % faster).

numpy. The numpy package should be installed if you intend to use array slicing (e.g. data[:2,2:4]).
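The two-dimensional slice data[:2,2:4] selects the first two records and, within those, the columns with index 2 and 3. The sketch below spells out that same selection in plain Python on a hypothetical list of records; `slice_2d` is an illustrative helper, not part of savReaderWriter.

```python
# Hypothetical example records: each inner list is one row of a .sav file.
records = [
    ["a", 1, 10, 100, 1000],
    ["b", 2, 20, 200, 2000],
    ["c", 3, 30, 300, 3000],
]

def slice_2d(rows, row_slice, col_slice):
    """Plain-Python equivalent of a numpy-style rows[row_slice, col_slice]."""
    return [row[col_slice] for row in rows[row_slice]]

# Same selection as data[:2, 2:4]: first two records, columns 2 and 3.
subset = slice_2d(records, slice(None, 2), slice(2, 4))
print(subset)  # [[10, 100], [20, 200]]
```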

:mod:`SavWriter` -- Write SPSS system files

Typical use:

from savReaderWriter import SavWriter

savFileName = "someFile.sav"
records = [['Test1', 1, 1], ['Test2', 2, 1]]
varNames = ['var1', 'v2', 'v3']
varTypes = {'var1': 5, 'v2': 0, 'v3': 0}  # 0 = numeric; n > 0 = string of width n
with SavWriter(savFileName, varNames, varTypes) as writer:
    for record in records:
        writer.writerow(record)
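In varTypes, 0 marks a numeric variable and a positive integer marks a string variable of that width. The sketch below checks records against that convention before they would be passed to writerow; `validate_record` is a hypothetical helper written for this illustration, not a savReaderWriter function.

```python
from numbers import Number

def validate_record(record, varNames, varTypes):
    """Check one record against a varTypes mapping (hypothetical helper).

    Convention assumed here: 0 marks a numeric variable; a positive
    integer n marks a string variable of at most n characters.
    """
    if len(record) != len(varNames):
        raise ValueError("expected %d values, got %d" % (len(varNames), len(record)))
    for name, value in zip(varNames, record):
        width = varTypes[name]
        if width == 0:
            if not isinstance(value, Number):
                raise ValueError("%s is numeric, got %r" % (name, value))
        elif not isinstance(value, str) or len(value) > width:
            raise ValueError("%s is a string of width %d, got %r" % (name, width, value))

varNames = ['var1', 'v2', 'v3']
varTypes = {'var1': 5, 'v2': 0, 'v3': 0}
for record in [['Test1', 1, 1], ['Test2', 2, 1]]:
    validate_record(record, varNames, varTypes)  # valid rows raise nothing
```

A value such as 'TooLong' for var1 would raise a ValueError, because it exceeds the declared width of 5.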

:mod:`SavReader` -- Read SPSS system files

Warning

Once a file is open, ioUtf8 and ioLocale cannot be changed. The same applies if a file could not be closed successfully. Always ensure that a file is closed, either by calling __exit__() (i.e., by using a context manager) or by calling close() (in a try/finally block).
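Both ways of guaranteeing that close() runs can be sketched with a stand-in object. `HypotheticalSavFile` is invented for this illustration and only records whether it was closed; it does not touch the SPSS I/O module.

```python
class HypotheticalSavFile:
    """Stand-in for a reader/writer object; NOT part of savReaderWriter."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not swallow exceptions

# Option 1: context manager; __exit__ runs even if the body raises.
f1 = HypotheticalSavFile()
try:
    with f1:
        raise RuntimeError("simulated processing error")
except RuntimeError:
    pass

# Option 2: explicit try/finally around close().
f2 = HypotheticalSavFile()
try:
    pass  # ... read or write records here ...
finally:
    f2.close()

print(f1.closed, f2.closed)  # True True
```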

Typical use:

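from savReaderWriter import SavReader

savFileName = "someFile.sav"
with SavReader(savFileName, returnHeader=True) as reader:
    header = next(reader)  # with returnHeader=True, the first row holds the variable names
    for line in reader:
        process(line)  # process() stands for your own record handler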

Use of __getitem__ and other methods:

from savReaderWriter import SavReader

data = SavReader(savFileName, idVar="id")
with data:
    print "The file contains %d records" % len(data)
    print unicode(data)  # prints a file report
    print "The first six records look like this\n", data[:6]
    print "The first record looks like this\n", data[0]
    print "The last four records look like this\n", data.tail(4)
    print "The first five records look like this\n", data.head()
    allData = data.all()
    print "First column:\n", data[..., 0]  # requires numpy
    print "Row 4 & 5, first three cols\n", data[4:6, :3]  # requires numpy
    # Look up records by their idVar value (binary search):
    print data.get(4, "not found")  # gets 1st record where id==4
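Conceptually, a lookup such as data.get(4, "not found") can be done with a binary search over records sorted on the id variable. The sketch below uses the standard bisect module to show the technique; it is not savReaderWriter's own implementation, and `get_by_id` is a hypothetical name.

```python
from bisect import bisect_left

def get_by_id(records, keys, wanted, default=None):
    """Return the first record whose id equals `wanted`, else `default`.

    `keys` holds the id of each record, in the same order as `records`,
    and must be sorted in ascending order for the binary search to work.
    """
    pos = bisect_left(keys, wanted)  # leftmost position where `wanted` fits
    if pos < len(keys) and keys[pos] == wanted:
        return records[pos]
    return default

# Hypothetical records, already sorted on their first column (the id).
records = [[1, "a"], [2, "b"], [4, "c"], [4, "d"], [7, "e"]]
keys = [record[0] for record in records]  # build the key list once

print(get_by_id(records, keys, 4, "not found"))  # [4, 'c']
print(get_by_id(records, keys, 5, "not found"))  # not found
```

Because bisect_left returns the leftmost matching position, duplicate ids resolve to the first record, matching the "1st record where id==4" behaviour described above.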

:mod:`SavHeaderReader` -- Read SPSS file metadata

Warning

The program calls spssFree* C functions to free memory allocated to dynamic arrays. This used to cause occasional segmentation faults, a problem that now appears to be solved. If you do still experience segmentation faults, you can set segfaults=True in __init__.py. This prevents the spssFree* functions from being called, at the cost of a memory leak.

Typical use:

from savReaderWriter import SavHeaderReader

with SavHeaderReader(savFileName) as header:
    metadata = header.dataDictionary()
    report = unicode(header)
    print report

Formats

SPSS has just two data types: string and numeric. SPSS can display each of these in several different formats. Format names are followed by the total width (w) and an optional number of decimal positions (d).

String data can be alphanumeric characters (A format) or the hexadecimal representation of alphanumeric characters (AHEX format). Currently, SavReader maps both of these formats to a regular alphanumeric string format. String formats do not have any decimal positions (d).
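In the hexadecimal representation used by AHEX, each character of a single-byte string becomes two hexadecimal digits. The sketch below shows the round trip with the standard binascii module; `to_ahex` and `from_ahex` are hypothetical helper names, and ASCII encoding is assumed.

```python
import binascii

def to_ahex(text, encoding="ascii"):
    """Hexadecimal representation of a string value (two digits per byte)."""
    return binascii.hexlify(text.encode(encoding)).decode("ascii")

def from_ahex(hex_text, encoding="ascii"):
    """Recover the alphanumeric value from its hexadecimal representation."""
    return binascii.unhexlify(hex_text).decode(encoding)

print(to_ahex("ABC"))       # 414243
print(from_ahex("414243"))  # ABC
```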

Numeric data formats include the default numeric format (F), scientific notation (E) and zero-padded integers (N). For example, a format of F5.2 represents a numeric value with a total width of 5, including two decimal positions and a decimal indicator (e.g. 12.34). SavReader does not format numeric values, except for the N format and dates/times (see below). The N format is a zero-padded value (e.g. SPSS format N8 is formatted as Python format %08d, e.g. '00001234'). For most numeric values, formatting means loss of precision. For instance, formatting SPSS F5.3 with Python %5.3f means that only three decimal places are retained. In addition, formatting costs a lot of additional processing time. Finally, some formats make a value less useful for calculations, e.g. the PCT format, which appends a percent sign. Table 1 below shows a complete list of all the available formats.
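The N and F mappings described above translate directly into Python's %-formatting. The helpers `format_n` and `format_f` below are hypothetical names for this sketch; they show the zero padding of N8 and the rounding that makes F5.3-style formatting lossy.

```python
def format_n(value, width):
    """Mimic SPSS Nw: zero-padded integer of total width w (Python %0wd)."""
    return "%0*d" % (width, value)

def format_f(value, width, decimals):
    """Mimic SPSS Fw.d with Python %w.df; rounds to d decimal places."""
    return "%*.*f" % (width, decimals, value)

print(format_n(1234, 8))        # 00001234
print(format_f(3.14159, 5, 3))  # 3.142

# The formatted string no longer carries the original precision.
print(float(format_f(3.14159, 5, 3)) == 3.14159)  # False
```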

Date formats

Dates in SPSS. Date formats are a group of numeric formats. SPSS stores dates as the number of seconds since midnight at the start of Oct 14, 1582, the day before the Gregorian calendar took effect. In SPSS, the user can make these seconds human-readable by giving them a print and/or write format (usually both are set at the same time using the FORMATS command). Examples of such display formats include ADATE (American date, mmddyyyy), EDATE (European date, ddmmyyyy), SDATE (Asian/sortable date, yyyymmdd) and JDATE (Julian date).
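Because an SPSS date is just a count of seconds from a fixed starting point, decoding it only requires adding that many seconds to the epoch. The sketch below does this with the standard datetime module, whose proleptic Gregorian calendar can represent Oct 14, 1582 directly; `SPSS_EPOCH` and `spss_to_datetime` are hypothetical names.

```python
from datetime import datetime, timedelta

# SPSS epoch: midnight at the start of October 14, 1582.
SPSS_EPOCH = datetime(1582, 10, 14)

def spss_to_datetime(seconds):
    """Convert an SPSS date value (seconds since the SPSS epoch) to a datetime."""
    return SPSS_EPOCH + timedelta(seconds=seconds)

print(spss_to_datetime(0))            # 1582-10-14 00:00:00
print(spss_to_datetime(86400))        # 1582-10-15 00:00:00
print(spss_to_datetime(12219379200))  # 1970-01-01 00:00:00
```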

Reading dates. SavReader deliberately does not honour the different SPSS date display formats, but instead tries to convert them to the more practical (sortable) and less ambiguous ISO 8601 format (yyyy-mm-dd). You can easily change this behavior by modifying the supportedDates dictionary in __init__.py. Table 2 below shows how SavReader converts SPSS dates. Where applicable, the SPSS-to-Python conversion always results in the 'long' version of a date/time. For instance, TIME5 and TIME10.40 both result in a %H:%M:%S.%f-style format.

Note.
[1] ISO 8601 format dates are used wherever possible; e.g., mmddyyyy (ADATE) and ddmmyyyy (EDATE) are not maintained.
[2] Months are converted to quarters using a simple lookup table.
[3] http://docs.python.org/2/library/datetime.html
[4] ftp://public.dhe.ibm.com/software/analytics/spss/documentation/statistics/20.0/en/client/Manuals/IBM_SPSS_Statistics_Command_Syntax_Reference.pdf
[5] Weekday and month names depend on the host locale (not on the ioLocale argument).
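The conversion strategy described under "Reading dates" can be sketched as a lookup from SPSS format name to a strftime pattern. `ISO_FORMATS` below is a small hypothetical mapping in the spirit of the supportedDates dictionary, not its actual contents; only the letters of the SPSS format are used, so the width and decimals do not change the result.

```python
from datetime import datetime, timedelta

SPSS_EPOCH = datetime(1582, 10, 14)

# Hypothetical subset of an SPSS-format -> strftime mapping (not the real
# supportedDates dictionary). Display formats collapse to ISO 8601.
ISO_FORMATS = {
    "ADATE": "%Y-%m-%d",            # SPSS shows mm/dd/yyyy
    "EDATE": "%Y-%m-%d",            # SPSS shows dd.mm.yyyy
    "SDATE": "%Y-%m-%d",            # SPSS shows yyyy/mm/dd
    "DATETIME": "%Y-%m-%d %H:%M:%S",
    "TIME": "%H:%M:%S.%f",          # always the 'long' version
}

def format_spss_value(seconds, spss_format):
    """Render an SPSS date/time value, e.g. spss_format="EDATE10" or "TIME5".

    Trailing width/decimal digits are stripped, so TIME5 and TIME10.4 give
    the same output. TIME values are treated as seconds since midnight,
    so this sketch only handles durations shorter than one day.
    """
    name = spss_format.rstrip("0123456789.")
    return (SPSS_EPOCH + timedelta(seconds=seconds)).strftime(ISO_FORMATS[name])

print(format_spss_value(12219379200, "ADATE10"))  # 1970-01-01
print(format_spss_value(3723.5, "TIME5"))         # 01:02:03.500000
```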

Writing dates. With SavWriter, a Python date string (e.g. "2010-10-25") can be converted to an SPSS Gregorian date (i.e., a number of seconds) using the spssDateTime method, e.g.:

from savReaderWriter import SavWriter

kwargs = dict(savFileName="/tmp/date.sav", varNames=['aDate'], varTypes={'aDate': 0},
              formats={'aDate': 'EDATE10'})
with SavWriter(**kwargs) as writer:
    # parse the string with a strptime-style format; returns SPSS seconds
    spssDateValue = writer.spssDateTime("2010-10-25", "%Y-%m-%d")
    writer.writerow([spssDateValue])

The display format of the date (i.e., the way it looks in the SPSS data editor after opening the .sav file) may be set by specifying the formats dictionary (see also Table 1). This is one of the optional arguments of the SavWriter initializer.
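What spssDateTime computes can be approximated in plain Python: parse the string with a strptime-style format and count the seconds since the SPSS epoch. `spss_date_time` below is a hypothetical stand-in that mirrors this idea; it is not the library's implementation.

```python
from datetime import datetime

# SPSS epoch: midnight at the start of October 14, 1582.
SPSS_EPOCH = datetime(1582, 10, 14)

def spss_date_time(date_string, fmt):
    """Parse `date_string` with strptime format `fmt`; return SPSS seconds."""
    parsed = datetime.strptime(date_string, fmt)
    return (parsed - SPSS_EPOCH).total_seconds()

print(spss_date_time("1970-01-01", "%Y-%m-%d"))  # 12219379200.0
value = spss_date_time("2010-10-25", "%Y-%m-%d")  # value to pass to writerow
```

The result is a float, which fits SPSS's numeric type; the EDATE10 format in the example above only controls how that number is displayed.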
