Armus X10: distributed deadlock-detection for X10

See also project armus. Armus-X10 automatically detects deadlocks caused by Clock, SPMDBarrier, and finish blocks.

Usage

Armus-X10 expects a compiled X10 program (a JAR or a directory with classes). It instruments the given program by inserting verification checks and outputs a checked program. The usual workflow is therefore:

  1. compile
  2. instrument
  3. run
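
The three steps can be chained in a small shell helper. This is only a sketch and not part of Armus-X10: the names step and armus_check, the DRY_RUN variable, and the output JAR names are made up for illustration. The commands themselves are the ones used in the example in this README, and the sketch assumes x10c, x10, java, and target/armusc-x10.jar are available.

```shell
# Hypothetical helper (not part of Armus-X10): compile, instrument, run.
# With DRY_RUN=1 each command is printed instead of executed.
step() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# usage: armus_check <source.x10> <main class> [extra x10 options...]
armus_check() {
    src=$1
    main=$2
    shift 2
    # 1. compile, 2. instrument, 3. run; stop at the first failing step.
    step x10c "$src" -o "$main.jar" &&
    step java -jar target/armusc-x10.jar "$main.jar" "$main-checked.jar" &&
    step x10 "$@" -cp "$main-checked.jar" "$main"
}
```

For instance, armus_check src/pos-samples/x10/DistClockClock.x10 DistClockClock would run all three steps for the sample program; setting DRY_RUN=1 first only prints the commands.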

Example

Here is our deadlocked example:

$ cat src/pos-samples/x10/DistClockClock.x10
class DistClockClock {
    public static def main(Rail[String]) {
        val c1 = Clock.make();
        val c2 = Clock.make();
        at(Place.FIRST_PLACE) async clocked(c1, c2) {
            c1.advance();
        }
        at(Place.FIRST_PLACE.next()) async clocked(c1, c2) {
            c2.advance();
        }
    }
}

Compile. The input is the file src/pos-samples/x10/DistClockClock.x10 and the output is the archive cc.jar. You can also run the compiled program at this point and observe that it deadlocks.

$ x10c src/pos-samples/x10/DistClockClock.x10 -o cc.jar

Instrument. The expected console output is verbose; warnings are OK. Armus-X10 lists where verification checks are introduced in the program.

$ java -jar target/armusc-x10.jar cc.jar cc-checked.jar
[warning] couldn't find aspectjrt.jar on classpath, checked: ...

Join point 'method-call(void x10.lang.Clock.advance())' in Type 'DistClockClock$$Closure$1' (DistClockClock.java:223) advised by around advice from 'pt.ul.armus.x10.inst.ClockObserver' (ClockObserver.class:17(from ClockObserver.aj))

Join point 'method-call(void x10.lang.Clock.advance())' in Type 'DistClockClock$$Closure$0' (DistClockClock.java:157) advised by around advice from 'pt.ul.armus.x10.inst.ClockObserver' (ClockObserver.class:17(from ClockObserver.aj))


1 warning
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

Run. Armus-X10 reports the deadlock and terminates the program.

$ x10 -cp cc-checked.jar DistClockClock
[P0,W3,T3034310919] deadlock found [pt.ul.armus.edgebuffer.SynchronizedTaskHandle@3d15ae94, pt.ul.armus.edgebuffer.SynchronizedTaskHandle@5b83aa96]
Deadlock detected: [pt.ul.armus.edgebuffer.SynchronizedTaskHandle@3d15ae94, pt.ul.armus.edgebuffer.SynchronizedTaskHandle@5b83aa96]

How do I enable distributed deadlock detection?

You need to install and configure a Redis server. For this example, we assume the server runs on localhost.

Check that the Redis server is running:

$ redis-cli PING
PONG

Create an armus.properties file to configure Armus-X10:

$ cat armus.properties 
print_config = true
detection.enabled = true
avoidance.enabled = false
; Activate the redis backend
buffer.class = pt.ul.armus.edgebuffer.redis.RedisEdgeBufferFactory
; Defaults:
; redis.host = localhost
; redis.port = 6379

Finally, run the program with more than one instance. The parameter -np 2 tells X10 to run two instances on localhost.

$ x10 -np 2 -cp cc-checked.jar DistClockClock
[P0,W1,T2870213923] Configuration loaded: MainConfiguration [detection=DetectionConfiguration [isEnabled=true, delay=200, period=100], edgeBuffer=EdgeBufferConfiguration [clearBufferAtTheEnd=true, factory=pt.ul.armus.edgebuffer.redis.RedisEdgeBufferFactory@50c02524], avoidanceEnabled=false, cycleDetector=pt.ul.armus.cycledetector.jgraph.JGraphTSolver@4fe5fda6, deadlockResolver=pt.ul.armus.x10.X10DeadlockResolver@7a23a3f6, graph=AUTO, printConfig=true]
[P1,W0,T2870214019] Configuration loaded: MainConfiguration [detection=DetectionConfiguration [isEnabled=true, delay=200, period=100], edgeBuffer=EdgeBufferConfiguration [clearBufferAtTheEnd=true, factory=pt.ul.armus.edgebuffer.redis.RedisEdgeBufferFactory@9ba59df], avoidanceEnabled=false, cycleDetector=pt.ul.armus.cycledetector.jgraph.JGraphTSolver@59ef16dc, deadlockResolver=pt.ul.armus.x10.X10DeadlockResolver@7981fac6, graph=AUTO, printConfig=true]
[P0,W4,T2870214200] deadlock found [pt.ul.armus.edgebuffer.SynchronizedTaskHandle@70da2eb9]
Deadlock detected: [pt.ul.armus.edgebuffer.SynchronizedTaskHandle@70da2eb9]
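
The setup for distributed detection can be sketched as two shell functions: one that checks Redis with the same PING used above, and one that writes the armus.properties file shown above. The function names and the REDIS_CLI variable are hypothetical. The sketch also assumes the properties file belongs in the directory the program is launched from, as in the listing above.

```shell
# Hypothetical preflight check: succeed only if the Redis server answers PING.
# REDIS_CLI (an assumption, for illustration) selects the client command.
redis_ready() {
    reply=$("${REDIS_CLI:-redis-cli}" PING 2>/dev/null) || return 1
    [ "$reply" = "PONG" ]
}

# Write the armus.properties from this section into the given directory
# (defaults to the current directory).
write_armus_properties() {
    cat > "${1:-.}/armus.properties" <<'EOF'
print_config = true
detection.enabled = true
avoidance.enabled = false
; Activate the redis backend
buffer.class = pt.ul.armus.edgebuffer.redis.RedisEdgeBufferFactory
; Defaults:
; redis.host = localhost
; redis.port = 6379
EOF
}
```

A possible use is redis_ready && write_armus_properties . followed by the x10 -np 2 command above, so that the program is only launched when the Redis server is reachable.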

How do I configure the graph model selection used for cycle detection?

By default, Armus-X10 automatically chooses the graph model that best suits the application it is checking. We can force Armus-X10 to use a fixed graph model with a command-line option. The available options are: auto (default), wfg, and sg.

We can rerun our example with the wfg model by setting -Darmus.graph=wfg:

$ x10 -Darmus.graph=wfg -np 2 -cp cc-checked.jar DistClockClock
[P0,W1,T2870412722] Configuration loaded: MainConfiguration [detection=DetectionConfiguration [isEnabled=true, delay=200, period=100], edgeBuffer=EdgeBufferConfiguration [clearBufferAtTheEnd=true, factory=pt.ul.armus.edgebuffer.redis.RedisEdgeBufferFactory@50c02524], avoidanceEnabled=false, cycleDetector=pt.ul.armus.cycledetector.jgraph.JGraphTSolver@4fe5fda6, deadlockResolver=pt.ul.armus.x10.X10DeadlockResolver@7a23a3f6, graph=WFG, printConfig=true]
[P1,W0,T2870412800] Configuration loaded: MainConfiguration [detection=DetectionConfiguration [isEnabled=true, delay=200, period=100], edgeBuffer=EdgeBufferConfiguration [clearBufferAtTheEnd=true, factory=pt.ul.armus.edgebuffer.redis.RedisEdgeBufferFactory@6851b509], avoidanceEnabled=false, cycleDetector=pt.ul.armus.cycledetector.jgraph.JGraphTSolver@583e08dd, deadlockResolver=pt.ul.armus.x10.X10DeadlockResolver@219b7ebf, graph=WFG, printConfig=true]
[P0,W4,T2870413006] deadlock found [pt.ul.armus.edgebuffer.SynchronizedTaskHandle@70da2eb9]
Deadlock detected: [pt.ul.armus.edgebuffer.SynchronizedTaskHandle@70da2eb9]
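
To compare the graph models, the same program can be run once per option. The loop below is a sketch: the function name and the DRY_RUN variable are hypothetical, and it assumes that auto and sg are passed through -Darmus.graph in the same way as wfg.

```shell
# Hypothetical helper: run the checked program once per graph model.
# With DRY_RUN=1 the commands are printed instead of executed.
run_all_graph_models() {
    for model in auto wfg sg; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo x10 "-Darmus.graph=$model" "$@"
        else
            x10 "-Darmus.graph=$model" "$@"
        fi
    done
}
```

For example, run_all_graph_models -np 2 -cp cc-checked.jar DistClockClock repeats the run above for each of the three options.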

Are there more examples?

Examples of deadlocks can be found in directory src/pos-samples/x10/.

Examples of deadlock-free programs can be found in src/neg-samples/x10/.
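
To try several samples in a row, it helps to list each source file together with its main class. The sketch below assumes that every sample's main class is named after its file, as is the case for DistClockClock.x10; the function name is hypothetical.

```shell
# Hypothetical helper: print "<source file> <main class>" for every .x10
# file in a directory, assuming the main class is named after the file.
list_samples() {
    for src in "$1"/*.x10; do
        [ -e "$src" ] || continue   # skip the unexpanded pattern when nothing matches
        echo "$src $(basename "$src" .x10)"
    done
}
```

The output of list_samples src/pos-samples/x10 can be piped into a while read -r src main loop that compiles, instruments, and runs each sample with the commands from the example above.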

Are there tests?

Yes. Running ant test runs more than 100 tests, including checks of all the deadlocked programs in src/pos-samples/x10/. Note that the startup and teardown of the X10 virtual machine add noticeable overhead, so the full test run takes a while (around 3 minutes on an i7-3770 CPU).

Run ant test -Doffline=true if you do not have an internet connection.