This page was automatically generated from monitor.1

NAME

monitor - Run a process and monitor its health

SYNOPSIS

monitor [ options ] "process to run ..."

DESCRIPTION

monitor detaches itself from the spawning session, spawns the child process, and then goes into a monitoring state. Monitoring the child process consists of two main tasks:

fault monitoring is implemented by catching the SIGCHLD signal from the child process. A gimli-compatible child process uses libgimli to establish a signal handler that sends itself the SIGSTOP signal in the event of a fault. monitor receives SIGCHLD when the child stops, and then initiates a trace.

watchdog is implemented by setting up shared memory between the monitor process and the child. The shared memory segment is referred to as the heartbeat, and contains two pieces of state information. The first is the general stage of life for the child process, which is one of starting, running and stopping. The other state information is a counter to indicate activity in the child process. monitor inspects the state and counter and watches for change. If no change is detected after a certain amount of time has passed, then the child process is deemed to have become wedged in some fashion, and monitor will initiate a trace and then arrange to terminate the child process.
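
The watchdog check described above can be sketched roughly as follows. This is an illustration only, not the monitor implementation: the real heartbeat lives in shared memory and its layout is not described here, so the Heartbeat and Watchdog classes, their field names, and the strict "greater than" comparison are all assumptions. The default intervals are the ones listed under OPTIONS.

```python
from dataclasses import dataclass

# Hypothetical names for the three stages of life of the child process.
STARTING, RUNNING, STOPPING = "starting", "running", "stopping"

# Default intervals in seconds, taken from the OPTIONS section.
DEFAULT_INTERVALS = {STARTING: 200, RUNNING: 60, STOPPING: 60}


@dataclass
class Heartbeat:
    """Stand-in for the shared memory segment (assumed layout)."""
    state: str = STARTING
    counter: int = 0


class Watchdog:
    """Remembers the last observed heartbeat and when it last changed."""

    def __init__(self, now, intervals=None):
        self.intervals = dict(intervals or DEFAULT_INTERVALS)
        self.last_seen = None    # (state, counter) at the last change
        self.last_change = now   # timestamp of the last change

    def is_wedged(self, hb, now):
        """Return True if hb has not changed within the interval for its state."""
        snapshot = (hb.state, hb.counter)
        if snapshot != self.last_seen:
            # Any change in state or counter counts as activity.
            self.last_seen = snapshot
            self.last_change = now
            return False
        return now - self.last_change > self.intervals[hb.state]
```

A caller would poll is_wedged periodically and, when it returns True, initiate a trace and then terminate the child.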

CONFIGURATION

monitor takes its configuration from an optional configuration file, environmental variables and command line options. The following configuration options are possible and may be specified in the configuration file or as command line options. In addition, each one has a corresponding environmental variable equivalent.

monitor will read the configuration file specified by the GIMLI_CONFIG_FILE environmental variable or the --config-file command line option. Command line options always override environmental variables.

If a configuration file is specified, options are read from it and applied as though they were passed on the command line. The configuration syntax follows the Unix tradition of using the # character to denote that the rest of the line is a comment, and uses a simple name = value syntax for expressing options. Option names in the file are the names of the options listed below, but without a leading -- prefix.

After processing an optional configuration file, the environmental variable equivalent of each option is checked and applied as though its value were passed on the command line.

After processing environmental variables, the command line options are checked and applied. Command line options always trump any other form of option setting. Command line options may have any number of leading - characters.
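
The precedence order (configuration file, then environmental variables, then command line) can be illustrated with a small sketch. It is not the parser used by monitor: the function names are hypothetical, only a subset of the options is mapped, value-less toggles such as --debug are left out, and a # inside a value is treated as the start of a comment, which the real parser may handle differently.

```python
def parse_config_text(text):
    """Parse 'name = value' lines; '#' starts a comment."""
    options = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        options[name.strip()] = value.strip()
    return options


# Subset of the option-to-variable mapping listed under OPTIONS.
ENV_NAMES = {
    "watchdog-interval": "GIMLI_WATCHDOG_INTERVAL",
    "trace-dir": "GIMLI_TRACE_DIR",
    "pidfile": "GIMLI_PID_FILE",
    "debug": "GIMLI_DEBUG",
}


def parse_command_line(argv):
    """Parse name=value arguments with any number of leading '-' characters."""
    options = {}
    for arg in argv:
        if not arg.startswith("-"):
            break  # the first non-option argument is the process to run
        name, sep, value = arg.lstrip("-").partition("=")
        if sep:  # toggling (no '=value') is not modelled in this sketch
            options[name] = value
    return options


def resolve_options(config_text, environ, argv):
    """Apply the three sources in order; later sources override earlier ones."""
    merged = parse_config_text(config_text)
    for name, var in ENV_NAMES.items():
        if var in environ:
            merged[name] = environ[var]
    merged.update(parse_command_line(argv))
    return merged
```

For example, with watchdog-interval = 30 in the file, GIMLI_WATCHDOG_INTERVAL=45 in the environment and --watchdog-interval=90 on the command line, the resolved value is 90.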

OPTIONS

watchdog-interval=seconds
Configures the interval after which the child process will be deemed wedged if it is in the running state. The default value is 60 seconds. The corresponding environmental variable is GIMLI_WATCHDOG_INTERVAL
watchdog-start-interval=seconds
Configures the interval after which the child process will be deemed wedged if it is in the starting state. The default value is 200 seconds. The corresponding environmental variable is GIMLI_WATCHDOG_START_INTERVAL
watchdog-stop-interval=seconds
Configures the interval after which the child process will be deemed wedged if it is in the stopping state. The default value is 60 seconds. The corresponding environmental variable is GIMLI_WATCHDOG_STOP_INTERVAL
debug=1
Enables debugging output. If used without specifying a value (as in --debug), the debugging state will be toggled. The corresponding environmental variable is GIMLI_DEBUG
detach=0
Prevents monitor from detaching from the controlling process. If used without specifying a value, the detach state will be toggled. The corresponding environmental variable is GIMLI_DETACH
quiet=1
Enables quiet mode when detach is enabled. This will redirect the standard output and standard error to /dev/null. If used without specifying a value (as in --quiet), the quiet state will be toggled. The corresponding environmental variable is GIMLI_QUIET
setsid=0
Prevents monitor from establishing its own session. If used without specifying a value, the setsid state will be toggled. The default is to use the setsid(2) system call to establish its own session, but this may be incompatible with other process management frameworks. The corresponding environmental variable is GIMLI_SETSID
glider=/path/to/trace/program
When monitor needs to initiate a trace, it will invoke the glider(1) utility. The default value for the glider option is the installation path configured when the Gimli utilities were installed. If for some reason you need to specify an alternate path, or an alternate trace utility, you may do so via this option. The corresponding environmental variable is GIMLI_GLIDER_PATH
trace-dir=/path/to/store/traces
If monitor initiates a trace, it will create a trace file in the trace directory. The default location is /tmp and you are strongly encouraged to set this to a more appropriate location. The corresponding environmental variable is GIMLI_TRACE_DIR
pidfile=/path/to/file.pid
If specified, the monitor will record its process id in this file, assuming that it can successfully obtain an exclusive (advisory) lock. If it is unable to lock the file, it will exit. The corresponding environmental variable is GIMLI_PID_FILE
uid=uid
If specified, the monitor will attempt to setuid to the specified numeric user id. The corresponding environmental variable is GIMLI_UID
gid=gid
If specified, the monitor will attempt to setgid to the specified numeric group id. The corresponding environmental variable is GIMLI_GID
immortal=1
Will monitor the child, tracing it in the event of a fault, and will restart the child regardless of how the child is terminated; whether it was due to abnormal termination or due to the child process exiting. The corresponding environmental variable is GIMLI_IMMORTAL
respawn-frequency=seconds
In the event that monitor needs to respawn the process, it will not do so more than once every respawn-frequency seconds. This acts as a brake to avoid torturing your system in the event of a critical system resource shortage or in the case of a brown paper bag configuration change. The default value is 15 seconds. The corresponding environmental variable is GIMLI_RESPAWN_FREQUENCY
run-once=1
Will monitor the child, tracing it in the event of a fault, but will not restart the child once it terminates. This is useful primarily for fault capture in scenarios where some other machinery will ensure that work is resumed and completed. The corresponding environmental variable is GIMLI_RUN_ONCE
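
As an illustration of the respawn-frequency brake, the following sketch computes how long to wait before the next respawn is permitted. The function name is hypothetical, and measuring the interval from the time of the previous spawn is an assumption; only the 15 second default comes from the option description.

```python
def respawn_delay(last_spawn_time, now, respawn_frequency=15):
    """Seconds to wait before the next respawn is allowed (0 if none).

    Assumption: the interval is measured from the previous spawn.
    """
    earliest = last_spawn_time + respawn_frequency
    return max(0, earliest - now)
```

With the default setting, a child that was spawned at time 100 and died at time 103 would be respawned no earlier than time 115.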

TRACING

When monitor decides that it needs to trace a child process, it will create a trace file in the configured trace-dir. The file name is the basename(3) of the child process executable, concatenated with the process id of the child, with the suffix .trc appended.

The trace file will be created with a header describing the reason for the trace and the time of the incident. monitor will then spawn the configured glider utility to perform tracing. The glider process will be run with its standard output and standard error streams redirected to the trace file.

If a file with the same name as the intended trace file already exists, monitor will overwrite it.
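
A sketch of how such a trace file name might be assembled is shown below. The separator between the executable name and the process id is not stated here, so the "." used in this example is an assumption, as are the function name and the example paths.

```python
import os


def trace_file_path(trace_dir, executable, pid):
    """Build a trace file path from the executable basename and the child pid."""
    name = os.path.basename(executable)
    # Assumption: "." separates the name from the pid; the actual
    # separator used by monitor is not documented on this page.
    return os.path.join(trace_dir, f"{name}.{pid}.trc")
```

Because the name depends only on the executable and the pid, a reused pid produces the same path, which is the case in which an existing file is overwritten.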

RESPAWN

monitor will respawn the child process if it terminates abnormally. Abnormal termination is any situation where the child process terminates due to the receipt of any one of the SIGSEGV, SIGABRT, SIGBUS, SIGILL, SIGFPE or SIGKILL signals. If the child process terminates for any other reason, then monitor will exit and return the exit code from the child process.
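
The respawn decision, together with the immortal and run-once options, can be summarised in a short sketch. The function name is hypothetical, and giving run-once priority when both options are set is an assumption; the interaction between the two is not specified here.

```python
import signal

# Signals that count as abnormal termination, per the text above.
ABNORMAL_SIGNALS = frozenset({
    signal.SIGSEGV, signal.SIGABRT, signal.SIGBUS,
    signal.SIGILL, signal.SIGFPE, signal.SIGKILL,
})


def should_respawn(term_signal, immortal=False, run_once=False):
    """Decide whether to respawn the child.

    term_signal is the number of the terminating signal, or None if
    the child exited on its own.
    """
    if run_once:
        return False   # assumption: run-once takes priority over immortal
    if immortal:
        return True    # restart regardless of how the child terminated
    return term_signal in ABNORMAL_SIGNALS
```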

SIGNALS

SIGUSR1
If monitor is sent the SIGUSR1 signal, it will treat it as an alternative means of incrementing the counter in the heartbeat. It is provided to allow processes implemented as scripts to take advantage of the watchdog facility, without requiring the scripting environment to be extended.
SIGTERM SIGINT SIGQUIT
If monitor receives any of these signals, it will treat them as an indication that it should exit. Before exiting, monitor will relay the signal to the child process and wait for it to exit.
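
A scripted child could use the SIGUSR1 mechanism along the following lines. The helper name is hypothetical, and how the script learns the process id of monitor is an assumption: it might read the file named by the pidfile option, or use os.getppid() if monitor is its direct parent.

```python
import os
import signal


def send_heartbeat(monitor_pid):
    """Ask monitor to increment the heartbeat counter on our behalf."""
    os.kill(monitor_pid, signal.SIGUSR1)
```

The script would call send_heartbeat periodically from its main loop, more often than the configured watchdog interval.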

AUTHOR

Wez Furlong

SEE ALSO

glider(1), pstack(1), gstack(1)
