# knots

## 1. Project Description

The goal of knots is to provide a distributed system simulation environment that acts as a sandbox to implement and test the knot detection algorithm described in the paper Distributed Deadlock Detection [1] written by K. Mani Chandy and Jayadev Misra. knots implements their proposed algorithm for deadlock detection in the communication model.

 [1] (1, 2) The paper can be found here.

## 2. Compiling

In order to compile, you must have MPICH2 installed and configured on your path.

Compilation is as simple as running the usual:

./configure && make


From the root of the repository.

## 3. Running

knots provides a launch script in the root folder of the repository. This launch script makes sure that the program has been successfully compiled and wraps the call to mpiexec as well as predefines a number of required arguments.

Common issues:

• mpd must be running on all hosts on which knots will run.

During execution, the program outputs meaningful information on the state of the simulation, on the messages being passed around and on steps being taken. Since the output is centralized on a single display, the message formats is the following:

(process <id>) <msg>


The process id is the ID of the physical MPI process. Since each MPI process can simulate more than one processes, messages will reflect which simulated process they are talking about, while the id parameter helps the user tell on which physical process the activity is happening.

## 4. Implementation Details

The main components of the program are as follows:

Process

A Process simulates a worker node in a distributed system. They are not Unix processes and must not be confused as such. In any Linux process, a number of them can be running under a single thread of execution. They can send, receive and handle messages and can decide if it is time to attempt to detect a deadlocked state.

A Process has two main event handlers that the Simulation will interface with:

signal_recv()
This method is called whenever the Process is receiving any message from another process. The processes grabs the message and pushes it into its message buffers for later processing.
signal_send()

The Simulation calls this method when it is ready to send off a Process' messages. This method causes the process to attempt to retrieve any messages available for handling and changes its execution state depending on the success of this method. If any messages are retrieved here, the Process will simulate a time increment by using a Distribution to acquire an arbitrary number of computation time.

Once it is done handling all available messages, the Process returns a list of messages that it wishes to send so that the Simulation may dispatch them.

Process
+signal_recv()
+signal_send()
+should_query()
+init_query()
Distribution

A distribution is a very simple object with a single public facing method: get_int(). Its purpose is to make it simple to change the timestamp distribution algorithm. The object itself is abstract and must be subclasses to provide a valid implementation. In this project only one Distribution is available. It returns a constant timestamp to be added to messages.

Distribution
+get_int()
Message

Messages are passed back and forth between processes and can be one of three types: MSG, QUERY and REPLY. MSG type messages are regular message that must be processed as work units. The other two types of messages are used by the deadlock detection algorithm. For more information about how they are used, see the paper [1].

Message
+get_timestamp()
+get_src()
+get_dst()
+get_type()
+get_seq()
+get_initiator()
Simulation

The simulation is the body of knots. A simulation initializes a number of Process objects, populates their initial queues according to a configuration file and performs an arbitrary (user selectable) number of rounds of communication. Those rounds cause all Process objects to send each other messages, potentially causing deadlocks and potentially causing the deadlock detection algorithm to be launched.

Simulation
+start()
Config

Encapsulates a configuration setting for the simulation. This object is a singleton and parses a textual configuration file, extracting the connectivity graph and initial messages for each process. This class is used extensively by Simulation when it initializes its environment.

Config
+get_deps()
+get_init()

The following simplified UML diagram displays how the classes interact together in knots.

### 4.1. Multiple Destinations

A process can have more than one communication link to another process, allowing to create very complex topologies for simulation. When a process has multiple links, it will send a copy of its message to every single link. Take this simple example:

A ---> B
|
v
C


In this case, if A has a message to send, it will send identical copies of it to each of C and B. It is very important to note that A has no way of sending messages to a single of its links. This will create message duplication in the system and as more and more steps are taken, the number of the messages will grow considerably.

### 4.2. Query Initialization Check

The core part of the simulation is for processes to decide when to launch a query for deadlock detection. We make the different between executing processes and idle processes. An idle process is a process that cannot process any message from its input queues and is waiting on one or more processes in its dependent set. There are two possible transitions for a process:

Executing -> Idle
The process attempts to retrieve messages for handling but none are available: become idle.
Idle -> Executing
The process attempts to retrieve messages for handling and at least one is available: resume execution.

It is important to note that the actual Linux processes do not actually block or become idle: Only the internal Process objects mark themselves as being idle since this is a simulation. We now detail our proposed heuristic to detect if a query should be initiated.

First, we extend the message passing system to include a message of type DEADLOCK that is used by processes to report being deadlocked. We observe that a process P being deadlocked means that all the processes it sends to must also be deadlocked. From this observation, a process that detects it is in a deadlock reports the fact to all of its children processes. This causes each children process in turn to report to their own children processes until every process in the knot knows it is deadlocked. A process will not rereport being deadlocked if it is already deadlocked when it receives a report. This reduces the number of messages being sent in the system greatly.

Lastly, we need a heuristic to ensure that not all processes decide to query at the same time. Our heuristic is very straightforward. We decide on a base DETECTION_TRESHOLD and add a penality based on the process ID. The formula is as such:

if (idle_time > DETECTION_TRESHOLD + 4*my_pid)
initiate query


Since we can rely on one process in the system to report being deadlocked, all other processes reset their idle timer on every QUERY or REPLY message.

This combined with the penality on initiation ensures that at most one query will run at a time. Of course this assumes that the process querying does not crash while doing so.

## 5. Testing

A number of different node configurations have been tried in order to ensure that corner cases as well as regular execution paths work as expected. This section lists the different cases that we have investigated. For each case we provide a figure showing the topology and initial messages in the system and follow with a brief explanation of what makes the scenario interesting. This list tries to be as exhaustive as possible but it is entirely possible that some cases have been overlooked.

In this case, there is a single message in the system that bounces back and forth between two processes. No deadlock can occur since processes only wait on the other one, and one process always has the message. This case demonstrates basic functionality of the message passing framework.

In this case, there are no messages at all in the system. Thus the processes are in a deadlock at the very start of the simulation. This case aims to showcase the warm-up period for the deadlock detection algorithm, as well as the proper functioning of query and reply mechanisms. It is the simplest case of a deadlock possible.

This case aims to test the proper behaviour of multiple outbound edges from a single process. Processes 1 and 2 start with a message for process 0. Process 0 waits for both messages and forwards each back to its two children. This is the simplest form of multiple links and will cause messages to be generated in the system.