PReach : A distributed model checker for Murphi

PReach was developed and is owned by a formal verification team at Intel
in collaboration with Brad Bingham and Flavio De Paula at 
the University of British Columbia.

Jesse Bingham
John Erickson

1.  Setup & Compiling Preach
2.  Running Preach
3.  Caveats
4.  Preach Command Line Arguments
5.  Can Get To feature (experimental)
6.  Utilities
7.  Contacts & Acknowledgements

Setup & Compiling Preach.............................................................

PReach requires Erlang, which can be downloaded here:

We are using version otp_src_R13B02-1.  We cannot guarantee
compatibility with other versions.

Or if you are within Intel, you might be able to use the path in the setup file.

Edit the defs of PREACH_ROOT and PREACH_TEMP in ./setup appropriately; these
should be the absolute paths to the directory where the preach sources live
and the directory where preach will create the trace file and diskq
during runtime, respectively.  Also ERLANG_PREFIX should point to an Erlang
installation; see the setup file for more details.  The contents of setup
should then be added to your ~/.cshrc.* file, and sourced:

> source setup

To build preach, simply do:  

> make

To run a test (single-threaded) do:

> cd ./test
> ./test

This runs a short model check on Peterson's protocol, which should take
less than 30 seconds.  The test was successful if you see something this
near the bottom of the output:

	Total of 35159 states visited (this only accurate if no error or in localmode)

Running Preach...............................................................

Now cd into whatever directory you want to run preach in, which we'll assume 
contains the murphi source file blah.m .  One can find plenty of example
Murphi models under examples/. Next do 

> make -f $PREACH_ROOT/Makefile

This should create the file.  To invoke preach in local mode, do

> lgo blah

You can run lgo with arguments that will get passed to preach; anything
after -- is passed.  

> lgo blah -- -nosym

For non-local mode first make sure the contents of setup are pasted
into your ~/.cshrc.<user> file.  Populate a "hosts" file with the
names of node@host in this form:


Here machine1 and machine2 are placeholders for real machine domain names, 
while n1, n2,.. are identifiers you are free to make up, with the only
restriction that they must be unique for the same machine (i.e. no
too lines of the host file should be the same).

To make sure you can ssh without a password prompt and also that
that the erlang virtual machine is automatically in your
path when ssh'ing to each <host>, try:

> ssh <host> erl -run init stop

Success here looks like:
Eshell V5.7.3  (abort with ^G)
(and then you are returned to your local unix shell)
Note that we've had PReach users get bitten by this problem, so it is worth 
double checking for a few of your hosts.

Then run 

erl  -rsh ssh -sname root_worker -pa $PREACH_ROOT  -run preach run -model blah

Running with Netbatch inside Intel.............................................

Make sure you source your setup file from within your ~/.cshrc.<user> 
file.  Also it is important that a "no suspend" (nosusp) netbatch class is 
used.  The netbatch class is defined in your setup file.  Once the 
class is set appropriately, you can do:

>nbrun 8 blah

to run blah on 8 nodes.  If you want to specify the name of the run directory, 

> nbrun 8 blah -r blah.rundir

which will create a subdir blah.rundir in the current directory wherein (once 
the job is run by netbatch) will contain (among other things) the main 
logfile blah.log.  Without specifying blah.rundir, you will get a run dir named 
something like run.123456789.

You can run nbrun with arguments that will get passed to preach.
You need to use -- to do this

> nbrun 8 blah -r blahdir -- -nosym -m 1000


* We have observed many issues that are best described as issues inheritted
  from the Murphi Front end and/or Murphi language defficiencies.  There are
  two classes. 
  - calling a function from an invariant can cause bad C code
    to be generated
  - cannot make assignments between non-simple data types (like arrays or 
    or records), even though this is well defined
  - cannot do equality comparison between non-simple data types (like arrays or 
    or records), even though this is well defined

* Specifying a command line arg multiple times currently causes Preach to 
  use the default value; this should be fixed at some point.  but beware
  when you are calling preach via layers of scripts that each arg
  should only be mentioned (at most) once!

* There is a bug that manifests on 32 bit machines if you use a hash table of 
  more than about 500MB; Preach will not work.  Unfortunately the default
  is 1G.

* Preach (or perhaps Erlang) has an outstanding bug where occasionally it will
  die immediately after being launched with the following message:

   {"init terminating in do_boot",{badarg,[{murphi_interface,call_port,1},

  this happens non-deterministically; if you see it simply rerunning preach
  will typically solve the problem.

* the lgo script currently deletes files matching $PREACH_TEMP/*$USER* before
  launching preach.  So this means it will destructively interfere with any
  running programs (including other preach instances) that use such files.
  we should probably fix this.

* Running with (exactly) 2 threads that are both on one machine seems to 
  sometimes cause problems

Preach Command Line Arguments.....................................................

(these args can be passed to lgo after the model)

-nosym      turns off symmetry reduction, which is on by default 
            (note this is murphi symmetry reduction algorithm #3)

-ndl        turns off deadlock detection, which is on by default

-bob N1     back-off kicks in whenever a node's erlang runtime message queue grows
            beyond N messages long.  By default N1 = 10,000.  Having back-off
            set too high (or off) can lead to nodes spacing out for large
            models.  Having it too low can lead to too unnecessary performance
            degradation.  irrelevant if in localmode.  to turn back-off off,
            use N1 = 0.

-unbob N2   un-back-off when in back-off mode and message queue shrinks to 
            N messages.  Defaults to N1/10.  irrelevant if in localmode.

-m N3       make the murphi hash table at each node use N3 Megabytes.
            If not given, this defaults to 1024 (1 GB).

-pr N4      each worker node prints profiling information every N4 seconds.  
            default is 5.

-lbr N5     Using load balancing, which means whenever a worker node's
            work queue is empty, it is sent N5 percent of the states from
            the peer worker with the longest work queue.  By default
            N5=0, which turns off the load balancing mechanism.  
            Typically we use 50 for this parameter.

-verbose N6 Controls verbosity level. Greater value gives more output. 
                        Range is {0,...,5}; default value is 1.

-seed S     By changing S, once can change the uniform state ownership
            function to a different (but still presumably random
            and uniform) function.  

-msu        Make Successors Unique.  If given, when successor states
            are computed, any redundancies are removed before
            sending to their owners.  This can have a performance
            speed-up for some models (i.e. those that for whatever
            reason have alot of different rules that often produce
            the same next state)

-pri        Print Rule Info.  Prints the number of times each rule was fired.
            Currently only works in localmode.  This flag is called -pr in 

-pas        Print All States.  This arg has been removed, for the same
            effect use -psr 1 .

-psr R      Print every Rth unique state visited (per worker).  Using
            -psr 1 will print every state, which of course will cause 
            a butt-load of output for non-small models.

Can Get To feature (experimental).........................................

Preach currently handles CTL properties of the form AGEFq, where q is a 
state assertion.  We can this "Can get to" q, or CGT(q).  There can be
at most one such property, and is specified in the murphi code in the
(admittedly clunky) syntax:

liveness "CGT"

Where q is a state assertion.
Since this is a not technically a liveness property, the word liveness is
misleading.  To verify CGT(q), Preach does a forward search from each 
reachable state.  This forward search fires "helpful" rules in some
arbitrary sequence and checks that a q-state is found.  If a q-state 
is indeed found ,we call the resulting sequence of states a CGT
suffix.  To reduce redundant work, some or all of the states in the CGT 
suffix are stored in an special CGT hash table, and subsequently CGT
suffix contstruction can safely stop whenever a q-state or a CGT hashed
state is found.  If CGT suffix creation fails, a counterexample trace 
(formed by constructing the path to the reached state concatenated with 
the failed CGT suffix) is printed.  CGT suffix failure can happen for 3 reasons
1. A loop is found in the CGT suffix
2. A deadlock occurs, in the sense that a state is reached where no helpful
   rule is enabled
3. The CGT suffix grows to length 500

There is a test that demos the can-get-to feature, see test/test_cgt.
Here are the CGT related flags Preach takes

-nhr r1 r2 ... rk 

   Specify the non helpful rules (there are typically more helpful rules
   than nonhelpful, so it is easier to specify the latter).  Here
   r1 - rk are strings, and rule whos name contains any of these strings
   as a substring are deemed nonhelpful.  For example for german's protocol
   we use -nhr SendReq Store . Without -nhr all rules will be considered

-cgthdt k

   CGT hash distance threshold.  All states on a CGT suffix except for the
   last k are stored in the CGT hash.  Using k=0 will result in all states
   being hashed, while k=500 will result in no states being hashed.
   The defailt is k=0. 

-cm j

   Allocate j Megabytes for the CGT hash table. Default is 1024.


   Don't check the CGT property even if there is one in the model.


There is a perl script Intel/plotlog that uses gnuplot to create plots
of work queue lengths, hash table state counts, expanded state counts,
etc., using info from Preach log files.  See the comment at the top of the 
script for usage.

Contacts & Acknowledgements..............................................

Primary contacts

* Jesse Bingham (Intel)
* John Erickson (Intel)

Preach was initially developed by Flavio de Paula and Brad Bingham
at UBC and they continue to be active.  Thanks to Flemming Andersen (Intel) 
and Mark Greenstreet (UBC) for supporting this project.   Thanks to 
Norris Ip (Jasper) for help understanding and solving a tricky symmetry 
reduction bug.