Overview

<!-- -*- markdown -*- -->

Gproc
======

Gproc is a new system, written in Go, that combines features of the LANL version of bproc (http://sourceforge.net/projects/bproc/) and the xcpu software (xcpu.org). Unlike bproc, gproc requires no kernel patch. Like both of them, it provides a process startup mechanism for lightweight cluster nodes which have only a small ramdisk as the root file system, with no local disks or NFS root at all. Lightweight cluster nodes can be very easy to manage if configured correctly (see, e.g., http://portal.acm.org/citation.cfm?id=1132314).

Gproc currently is only used by us to run cluster processes as root. If this is a problem for you you might want to wait to use it; multi-user use is in the works (it is not hard but Go did not in the early days support some things we needed for it to work). 

Gproc provides a system for running programs across clusters. A command and a set of nodes upon which to run it are specified at the command line; the binary, any required libraries, and any additional files specified at the command line are packaged up and sent out to the selected nodes for execution. The libraries needed for the binary are determined by gproc programatically. The outputs are then forwarded back to the control node. Currently, we do not support stdin for the tree of processes as bproc did; it's coming but we have not needed it yet.

Gproc can be used from one architecture type to run commands on another. The library determination will work even on, e.g., OSX, for binaries to run on an ARM. One needs to have a reasonable copy of a root file system for the other architecture locally. For example, to run on an ARM CPU from my OSX machine, I just use the -r switch with the path to the root file system for the ARM tree on my OSX laptop. 

Files that are transferred to nodes are stored in a replicated root under a certain directory of the node (default /tmp/xproc). This keeps gproc's operations from interfering with the rest of the node. For example, to run /bin/date, the binary would be copied to /tmp/xproc/bin/date and any necessary libraries would go to /tmp/xproc/lib/. The current working directory is also created on the nodes and the remote process starts in it. The mount is a private mount and disappears when the remote process and all of its children exit. 

At its most basic level, gproc needs at least a single "master" node and a single "slave" node; for testing purposes, these can even be the same system. The master, started with the "gproc m" command (see below), accepts connections from slave nodes (started with "gproc s"). The user then runs a "gproc e <nodes> <program>" command on the master; this sends a command over a Unix Domain Socket to the listening "gproc m" process, which then instructs the appropriate slave nodes to execute the specified program.

Getting started with gproc
--------------------------

Every gproc tree begins with a master node. The gproc master delegates commands to the slave nodes, which in turn run the requested command. A gproc master is started with at least:
	
	gproc m

With the master node running, we can begin to start slaves by invoking gproc with the 's' parameter. Additionally, each slave must be told where to find its parent node (which isn't necessarily the master, as gproc can form a tree of nodes), what its ID is, and what its address is. These parameters are given via the -myParent, -myId, and -myAddress flags. 

Bootstrapping many nodes using these flags can become quite cumbersome. To facilitate easy setup, these flags can optionally take a small but powerful set of commands to programmatically determine the parent, id, and address. The command set uses a simple stack based, postfix grammar.

Node specification commands
---------------------------

Nodes are bootstrapped via a simple but powerful postfix command set used on the command line. The commads are:

*	- Multiple the top two stack elements
+	- Add the top two stack elements
-	- Subtract the top two stack elements (tos[1] - tos[0])
/	- Divide the top two stack elements (tos[1] - tos[0])
%	- Mod the top two stack elements (tos[1] % tos[0])
roundup	- Round up tos[1] to the next highest integer defined by tos[0]
strcat	- Concatenate the top two stack elements (tos[1],tos[0])
ifelse	- if tos[0] == 0, replace tos[0] with tos[1], else tos[2]
hostname- Push the os.Hostname field onto the stack
hostbase- Strip the leading characters in the range a-z and . from tos[0], leaving only numbers
dup	- Duplicate tos[0]
swap	- Swap tos[0] with tos[1]

Note: tos[0] is the top of the stack, tos[1] is the second elements, and so on. 

Command line options
--------------------

There are several important command line options for use with gproc, the most important of which are a set of one-letter options which set the mode for gproc. The basic mode options for gproc are:

	  gproc [switches] m
	  gproc [switches] s
	  gproc [switches] e <nodes> <command>

"gproc m" starts the master process and should be executed on the front-end node. "gproc s" starts the slave process and should be run on every node you wish to control. "gproc e" is used to actually run a command on the specified nodes.

There are a number of switches which can modify the behavior of gproc; some of the most important ones are described here. Some only make sense in certain modes; each switch's appropriate mode(s) can be found in parentheses after the description. The default value for the option is listed as well.

*	  -localbin=false # If set, programs will be run from each slave node's local directories, rather than copying binaries from the node where "gproc e" was executed. (e)
*	  -p=true # If set, binRoot is mounted privately during execution. This prevents unwanted binaries and other files from sticking around in binRoot. (s)
*	  -debug=0 # Specifies the debug level, from 0 to 10. (s, m, e)
*	  -f="" # Comma-separated list of files to copy to the slaves along with the program being executed. (e)
*	  -binRoot="/tmp/xproc" # The location under which the binaries, libraries, and other files will be placed. Use the same value for this when running the master, slaves, and exec modes or else gproc will get confused. (m, s, e)
*	  -defaultMasterUDS="/tmp/g" # The master process puts a Unix Domain Socket into the filesystem; the "exec" stage then connects to this socket to send commands. (m, e)
*	  -cmdport="6666" # Which port gproc will listen on for incoming commands. (m, s)
*	  -r="/"	# where to find binaries. To use an arm root, for example, one can say -r=/path-to-arm-root

Node specification syntax (BNF)
-------------------------------

This syntax is used to select the nodes on which to run a command.

		<nodes> ::= <nodeset> | <nodeset> "/" <nodeset> | <nodeset> "," <nodes>
		<nodeset> ::= <node> | <node> "-" <node>
		<node> ::= "." | <number>

Examples:

	1-80	# Specifies nodes 1 through 80, inclusive
	.		# Specifies all available nodes (flat network) or all first-level nodes (tree)
	1/.		# Specifies all second-level nodes under the first level-1 node
			# In the "kane" config, this would select the first 20 nodes
	./.		# All nodes, all levels. Note that this example is 2 levels, but there is no limit on depth. 

Example usage
-------------

In this example we have the following cluster:

A master node, with the hostname "master"
64 slave nodes, with hostnames slave0..slave63

The gproc tree will be set up as a 2-level tree, with the master as the root node, and slave nodes 0, 1, 2, and 3 as the first level internal nodes. The rest of the nodes will use their hostname % 4 to determine their parent.

First, we start the master node:

	./gproc m

Then we start the first level slave nodes. This command is the same for each of the internal nodes:

	./gproc -myParent="master" -myId="hostname hostbase" -myAddress="hostname"

For the leaf nodes, we use:

	./gproc -myParent="slave hostname hostbase 4 % strcat" -myId="hostname hostbase" -myAddress="hostname"

For large scale bootstrapping it may be useful to use partially bootstrapped tree to start lower levels of the tree, i.e. manually start the first level of nodes, and then invoke gproc to have the first level nodes start the second level, and so on.

Finally, we issue a command on the master node:

	./gproc_linux_amd64 e ./. /bin/date

That should result in 64 dates returning, from each slave node.

If you want to get fancy, we can have gproc send some files with the command. If gproc is setup as a tree, then the master node will transfer these files to the first level of nodes only, and the first level will in turn send the files to each of their children, and so on:

	# The -f flag specifies files that should be copied along with the command
	./gproc_linux_amd64 -f="/scratch/megawin2.img,/scratch/migrate" e ./. /bin/cp /tmp/xproc/scratch/* /tmp/ramdisk

When you're done, Ctrl-C the master process; all the slaves should then automatically exit.