Squealer is a Jython library for unit testing Apache Pig scripts.

Starting with v0.8, Pig comes with a library known as PigUnit that can
be used for unit testing Pig scripts from Java (or any other
JVM-compatible language).  PigUnit served as an inspiration and starting
point for Squealer, but it falls short in several respects that
this library hopes to address:

1) All equality comparisons are done by converting data to strings and
checking for string equality.  This makes testing scripts that do floating
point arithmetic nearly impossible.  It also requires you to know the
ordering of records in a relation, which in practice can be difficult to
predict for a number of operations in Pig.

2) PigUnit keeps a static reference to the Pig runtime, which prevents
the implementation of isolated tests.  This can result in passing tests
that should be failing and vice versa.  It can also make tests that
should fail deterministically fail only intermittently, depending on the
order in which tests are executed.

3) PigUnit currently only works with Pig 0.8+ (though anyone is free
to backport it).  Due to the dynamic nature of Python, however, it
is trivial to write a single library which supports multiple versions
of Pig.

4) Language preference.  Also, this yak isn't going to shave itself.
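
To illustrate point 1 above, here is a minimal sketch of comparing two
relations without relying on string equality or record order.  It is plain
Python written for illustration only: the function names (values_match,
tuples_match, relations_match) and the default tolerance are assumptions,
not part of Squealer's or PigUnit's API.

```python
def values_match(expected, actual, tol=1e-6):
    """Compare two field values, using an absolute tolerance for floats."""
    if isinstance(expected, float) or isinstance(actual, float):
        try:
            return abs(float(expected) - float(actual)) <= tol
        except (TypeError, ValueError):
            # One side is not numeric (for example None or text).
            return False
    return expected == actual


def tuples_match(expected, actual, tol=1e-6):
    """Compare two records field by field."""
    if len(expected) != len(actual):
        return False
    for e, a in zip(expected, actual):
        if not values_match(e, a, tol):
            return False
    return True


def relations_match(expected, actual, tol=1e-6):
    """Compare two relations as unordered bags of records."""
    remaining = list(actual)
    for record in expected:
        for i, candidate in enumerate(remaining):
            if tuples_match(record, candidate, tol):
                # Consume the matched record so duplicates are counted.
                del remaining[i]
                break
        else:
            return False
    # Any leftover records mean the actual relation had extras.
    return not remaining


expected = [("a", 0.3), ("b", 1.0)]
actual = [("b", 1.0), ("a", 0.1 + 0.2)]
print(relations_match(expected, actual))
```

A string comparison would reject this pair because 0.1 + 0.2 does not print
as 0.3, and because the records arrive in a different order.  The matching
here is greedy, so with a large tolerance and nearly equal records it could
pair records differently than intended; treat it as a sketch only.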

Currently Squealer is tested against Pig 0.8.0 and Jython 2.2.1.  If
you can confirm or deny its feasibility on other versions, that information
would be highly appreciated.  A version matrix is planned for the future
to document which combinations of these libraries are supported.  Of note
is that these are the versions available in the Ubuntu 10.04 and
Cloudera apt repositories.

Once the dependencies are installed, installation of Squealer is as simple
as dropping it into any directory in your PYTHONPATH.  For a stock Ubuntu
install this can be accomplished by dropping the directory into 

In tests/test_examples.py is a test case that illustrates how to define
a unit test with Squealer and some common tasks you may want to perform
in one.

1) After performing parameter substitution and alias overriding, Squealer
writes out a temp file containing the actual Pig code to be executed.  In
the event of a test failure or error, the path to this temp file is
printed with the error report to make it easier to debug the actual
script that was executed.

2) Most of the output from the Pig runtime that is useful for
debugging your scripts is provided via Apache logging mechanisms.
Unfortunately, the majority of the information provided via logging is
not useful.  For this reason, all logging is turned off by default.
However, if you see a failure message you do not understand, you can use
the '-v' option when running the Python file with your tests (assuming
you are using Squealer's main() function).
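
The substitute-then-write step described in point 1 above can be sketched
roughly as follows.  This is not Squealer's actual implementation: the
helper names (substitute_params, write_temp_script), the example script and
the file name scores.tsv are all hypothetical, and the substitution handles
only simple $name placeholders, ignoring escaping and alias overriding.

```python
import os
import re
import tempfile


def substitute_params(script_text, params):
    """Replace $name placeholders with values from params.

    Unknown names are left untouched so they stay visible in the output.
    """
    def replace(match):
        name = match.group(1)
        if name in params:
            return str(params[name])
        return match.group(0)
    return re.sub(r"\$([A-Za-z_][A-Za-z0-9_]*)", replace, script_text)


def write_temp_script(script_text):
    """Write the final script to a temp file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".pig")
    handle = os.fdopen(fd, "w")
    try:
        handle.write(script_text)
    finally:
        handle.close()
    return path


script = "data = LOAD '$input' AS (name:chararray, score:double);"
path = write_temp_script(substitute_params(script, {"input": "scores.tsv"}))
# Reporting the path lets you open the exact script that was executed.
print("Generated Pig script: " + path)
```

Keeping the generated file on disk and reporting its path is what makes a
failed test debuggable: you inspect the script as Pig saw it, not the
template you wrote.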

Common issues
1) You see the following error message:
ImportError: no module named squealer

This indicates that the Jython runtime cannot find your copy of the Squealer
library.  A common mistake with Jython is to set the PYTHONPATH environment
variable and expect it to be used as it is with CPython.  To specify the path
to Jython, use the -Dpython.path=... command line argument.  Alternatively, a
common practice is to use a bash alias so that the PYTHONPATH variable is
used like it is in CPython:
alias jython="jython -Dpython.path=\$PYTHONPATH"

2) You see any of the following error messages:
ImportError: no module named org
ImportError: no module named apache
ImportError: no module named hadoop

The Jython runtime cannot find the jars containing the required Pig/Hadoop
dependencies.  You will need to set your class path so that they can be
found.  On a system with Ubuntu/Cloudera packages installed, the following
should get you going:
export CLASSPATH=/usr/lib/pig/*:/usr/share/java/*:/usr/lib/hadoop/*
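
If it is unclear which of the two issues above you are hitting, a small
diagnostic script along these lines may help.  It is a sketch, not part of
Squealer: the helper names (can_import, report) are made up, and it assumes
that the Java packages org.apache.pig and org.apache.hadoop become
importable by those names once the jars are on the class path.  Run it
with the same jython command you use for your tests.

```python
import sys


def can_import(module_name):
    """Return True if module_name can be imported by this interpreter."""
    try:
        __import__(module_name)
        return True
    except ImportError:
        return False


def report(module_names):
    """Build a one-line status for each module name."""
    lines = []
    for name in module_names:
        if can_import(name):
            status = "ok"
        else:
            status = "MISSING"
        lines.append("%-20s %s" % (name, status))
    return lines


for line in report(["squealer", "org.apache.pig", "org.apache.hadoop"]):
    print(line)

# The directories searched for Python modules; squealer must be in one.
print("Module search path:")
for entry in sys.path:
    print("  " + entry)
```

A MISSING squealer points at python.path, while a MISSING org.apache entry
points at CLASSPATH.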