Clone wiki

oshajava / Specifications

OshaJava Specifications

For now, see our OOPSLA 2010 paper, Composable Specifications for Structured Shared-Memory Communication. However, several improvements to the tool and the annotation system have been made since publication, making it easier to write the same specification using fewer annotations.

Modules: Delimiting Layered Communication Abstractions

When one thread writes a value to a memory location and a second thread later reads that value from that location, we say that the two threads communicate. What this means in terms of methods communicating is a little more subtel. Consider this producer-consumer pipeline implementation backed by a bounded buffer, where many producer threads call produce and many consumer threads call consume.

pipeline/ItemProcessingPipeline.java:

package pipeline;
import buffer.BoundedBuffer;
class ItemProcessingPipeline {
	BoundedBuffer pipe = new BoundedBuffer();

	// Called by producer threads
	void produce() {
    	...;  pipe.enqueue(...);  ...
	}

	// Called by consumer threads
	void consume() {
    	...;  ... = pipe.dequeue();  ...
	}
}

buffer/BoundedBuffer.java:

package buffer;
public class BoundedBuffer {
	Item[] buffer = new Item[10];
	int size = 0;
	...

	public synchronized void enqueue(Item i) {
		while (size == buffer.length) wait();
		buffer[...] = i;
		size++;
		notifyAll();
	}

	public synchronized Item dequeue() {
		while (size == 0) wait();
		size--;
		notifyAll();
		return buffer[...];
	}
}

When writes by one thread in enqueue communicate to reads in another thread in dequeue, it makes sense to say that enqueue is communicating to dequeue. However, at the pipeline level it also makes sense to say that produce communicates to consume, because produce called enqueue, which communicated to dequeue, called by consume. We clearly think about this communication in terms of layers of abstraction, which are visible when we compare the call stacks of the two threads from the above example at the time of their communicating access:

Thread 1:     | Thread 2:
--------------|---------------
produce ->    | consume ->      }-- pipeline
  enqueue ->  |   dequeue ->    }-- buffer
    write     |      read       }-- memory
  • At the bottom level, a memory write operation communicates to a memory read operation in another thread.
  • At the next level, the enqueue method, which performs the write operation, communicates to the dequeue method, which performs the read operation.
  • At the highest level of abstraction, the produce method, which called the enqueue method, communicates to the consume method, which called the dequeue method.

We solidify this intuition for layered abstractions with modules. Communication modules divide a program up into sets of methods that communicate in a meaningful way. Communication between methods in a module forms a layer of communication abstraction. The pipeline example above has two obvious modules: the pipeline and the buffer. Memory itself can be considered a lowest-level module, containing communicating write and read operations. When a pair of call stacks of communicating operations can be broken down into these layers by module, we say they have equivalent segmentations. (See the paper for a precise definition.)

Module Annotations

By default, every Java package p has a module p.Default and every method in package p belongs to this module. A specification may assign invidividual methods or sets of methods to other modules arbitrarily using the @Module annotation.

To assign method foo to module bar.Qux, we annotate it as follows:

@Module("bar.Qux")
void foo() { ... }

Important things to remember about module annotations:

  • Module names are always fully qualified. Using @Module("Qux") refers to the module Qux in the top-level (unnamed) package even if we're using the annotation in the bar package. We have no equivalent of the import construct in Java proper.
  • Modules are created "on demand." There is no single declaration of a module. By virtue of foo being declared a member of module bar.Qux, bar.Qux will be created if it does not already exist. (This is largely to simplify the compilation process.)

Cascading Module Membership

Often, we want all (or most) of the methods in a given class or package to belong to the same non-default module. To avoid annotating every single method with @Module, both classes and packages can also take @Module annotations. When a class or package is annotated with a @Module annotation, all of its members (and their members, and so on) will belong to this module as well unless they are annotated otherwise. For example, the following class is annotated with @Module annotations and each method is commented with the module it belongs to as a result.

@Module("A")
class C {
	void f() { ... }  // belongs to module A
	
	@Module("B")
	void g() { ... }  // belongs to module B

	class D {
		void h() { ... }  // belongs to module A
	}

	@Module("B")
	class E {
		void i() { ... }  // belongs to module B
	}
}

Packages can also be annotated by placing the annotation in a file package-info.java in the package source location:

@Module("A")
package p;
import oshajava.annotation.Module;

When is Communication Allowed?

If the call stacks of the two threads at the times of their communicating operations do not have equivalent segmentations, then the communication is summarily disallowed. If they do have equivalent segmentations, then from the callee-most end of the two call stacks we check each layer: if every writer method in the layer is allowed to communicate to every reader method in the layer, then the communication is allowed in this layer. If communication is allowed in all layers, then it is legal, otherwise it is not.

The next section describes how module-internal communication specifications are declared. The following section describes a modification of the simple checking policy above to allow for encapsulated (hidden) communication.

Module-Internal Communication Specifications

Within a communication module, we define which pairs of methods are allowed to communicate.

Groups

Rather than specifiy every pair of methods that may communicate, we use communication groups. A communication group is a set of writer methods and a set of reader methods (all from the same module), allowing any method in the writer set to communicate to any method in the reader set. Groups (as opposed to lists of pairs) avoid worst-case quadratic size specifications for sets of methods many of which communicate with each other and attach an intuitive name to a set of communicating methods.

Unlike modules, groups must be declared using a @Group(id="GroupId"). Group names are not qualified; they belong to the module in which they are declared. In other words, a @Group annotation on an element declares a group that belongs to whatever module is attached to that element (by cascading module annotations...). Due to the limits of Java annotations, @Groups must be declared on some element, typically a class declaration.

Group Memberships

Group memberships are declared in a distributed fashion: each method is annotated with the groups in which it is a reader (@Reader({...})) and the groups in which it is a writer (@Writer({...})). For example, we might annotate the bounded buffer methods from the code above as follows:

package buffer;
@Group(id="BufferAndSize")
public class BoundedBuffer {
	@Writer("BufferAndSize")
	@Reader("BufferAndSize")
	public synchronized void enqueue(Item i) {
		while (size == buffer.length) wait();
		buffer[...] = i;
		size++;
		notifyAll();
	}

	@Writer("BufferAndSize")
	@Reader("BufferAndSize")
	public synchronized Item dequeue() {
		while (size == 0) wait();
		size--;
		notifyAll();
		return buffer[...];
	}
}

The BufferAndSize group belongs to the module buffer.Default (recall that by default each package is a module). Both enqueue and dequeue are made writers and readers in the group because enqueue communicates to both itself and dequeue, and dequeue communicates to both itself and enqueue (note the size field updates).

Convenience:

  • When these sets are singleton sets, the set delimiter braces ( {} ) may be elided, as in the example above.
  • Read-only methods and write-only methods may use just a @Reader or just a @Writer annotation. Behavior is as expected.

Inline Methods

Some methods perform communication only on behalf of their callers. (Consider System.arrayCopy(...).) Rather than complicating our simple layered model to specify all the methods such a method may communicate with (potentially scattered across many modules), we treat it as though it were inlined into its caller. (We use "inlined" in the sense of the common compiler transformation.) For the purposes of segmenting two communicating call stacks, we essentially remove inlined methods from the stacks.

Even methods at the root of a call stack may be inlined; they simply disappear form the stack. In addition, the empty call stack is always allowed to communicate to the empty call stack (they have equivalent segmentations), giving us the useful property that the specification in which every method is inlined allows all communication. This makes a good starting point for incremental development of a spec and makes it easy to integrate unannotated libraries by setting inlined as the default method specification.

To mark a method as inlined, use the @Inline annotation instead of @Reader or @Writer.

Non-Communicating Methods

Methods that should not communicate at all can be declared with empty reader and writer memberships: @Reader({}) @Writer({}). The @NonComm annotation is syntactic sugar for this. @NonComm is a useful default when trying to derive a precise specification as it avoids unintentionally inlining methods that should have their communication restricted.

Cascading Method Annotations

Like module membership annotations, method communication annotations may be applied to packages and classes, setting the default annotation for member methods, with one minor nuance. Both @Inline and @NonComm complete override a cascading annotation of any kind. However, if a @Reader("A") is cascading, note that annotating a method as just @Writer("B") results in the method having the effective annotation of @Writer("B") @Reader("A"). To override the cascading @Reader("A") and achieve the effective annotation @Writer("B") @Reader({}), the method must also be explicitly annotated @Reader({}).

Module Communication Interfaces

When describing the layering of communication abstractions in the producers-consumers program above, we only considered communication in the "logical" direction from enqueue to dequeue and from produce to consume. However, it turns out that, among the other permutations we specified with the group memberships described above, dequeue may communicate to enqueue via the the size field of the bounded buffer. This communication makes sense in the bounded buffer, but it is really just an internal implementation detail that is not part of the external buffer abstraction. As such, this kind of communication should be encapsulated by the bounded buffer, meaning that callers of dequeue and enqueue from a different layer should not appear to communicate because of it.

To provide communication encapsulation, we use module communication interfaces. A module's communication interface describes those pairs of methods whose communication is visible (and meaningful) to callers outside the module. When the top (caller-most) pair of methods in a layer is in its module's interface, the communication is visible to higher layers. Otherwise, it is encapsulated, and the callers of these methods need not be allowed to communicate.

For the bounded buffer, we only expose communication from enqueue to dequeue, meaning the following communication is encapsulated by the buffer layer since communication from dequeue to enqueue is not in the buffer module's interface:

Thread 1:     | Thread 2:
--------------|---------------
consume ->    | produce ->      
--------------|-------------------- encapsulation boundary
  dequeue ->  |   enqueue ->    }-- buffer
    write     |      read       }-- memory

Interface Annotations

Module interfaces are declared by a set of interface groups; methods declare membership in interface groups with their writer and reader membership annotations. (In @Writer and @Reader sets, both regular communication groups and interface groups may appear.) Here is the fully annotated producers-consumers program:

pipeline/ItemProcessingPipeline.java:

package pipeline;
import buffer.BoundedBuffer;
@Group(id="Pipe")
class ItemProcessingPipeline {
	BoundedBuffer pipe = new BoundedBuffer();

	// Called by producer threads
	@Writer("Pipe")
	void produce() {
    	...;  pipe.enqueue(...);  ...
	}

	// Called by consumer threads
	@Reader("Pipe")
	void consume() {
    	...;  ... = pipe.dequeue();  ...
	}
}

buffer/BoundedBuffer.java:

package buffer;
@Group(id="BufferAndSize")
@InterfaceGroup(id="BufferInterface")
public class BoundedBuffer {
	Item[] buffer = new Item[10];
	int size = 0;
	...

	@Writer({"BufferAndSize", "BufferInterface"})
	@Reader("BufferAndSize")
	public synchronized void enqueue(Item i) {
		while (size == buffer.length) wait();
		buffer[...] = i;
		size++;
		notifyAll();
	}

	@Writer("BufferAndSize")
	@Reader({"BufferAndSize", "BufferInterface"})
	public synchronized Item dequeue() {
		while (size == 0) wait();
		size--;
		notifyAll();
		return buffer[...];
	}
}

Specification Checking with Encapsulation

We modify the checking algorithm slightly to support encapsulation. Communication between two operations is allowed if it is allowed by our previous definition or if:

  1. The call stacks at the two communicating operations have equivalent segmentations up to some layer that encapsulates the communication; AND
  2. The internal communication within each layer up to (and including) the encapsulating layer is allowed.

Returning to the running example, the following communication is allowed by the specification above, even though the pipeline module does not allow consume to communicate to produce.

Thread 1:     | Thread 2:
--------------|---------------
consume ->    | produce ->      
--------------|-------------------- encapsulation boundary
  dequeue ->  |   enqueue ->    }-- buffer
    write     |      read       }-- memory

Note that in this example, it happens that the two stacks have fully equivalent segmentations, even beyond the ecnapsulating layer (consume and produce do belong to the same module), but in general this need not be the case, since checking stops at the encapsulation boundary.

Known Limitations and Annoyances

Anonymous Classes and Named Classes Declared in Methods

Annotations on inner classes declared within methods (both anonymous and named classes) will not be processed. This is a limitation of the Java annotation processing framework.

Multiple Group Declarations

Since only one annotation of a given type is allowed, it is necessary to use an extra annotation when declaring more than one group or more than one interface group on a given source element:

@Groups(comm={@Group("A"), @Group("B")}, intfc={@InterfaceGroup("C"), @InterfaceGroup("D")})
class E { ... }

Updated