Batch computing/cluster management tool with Docker.

Manage user job (batch/interactive) submissions with Docker (like SUN Grid Engine, Slurm, ... but on Docker nodes). Users do not need Docker rights; everything goes through the application. Jobs that cannot be scheduled are kept in the queue until constraints/resources are available.
Linked to the system users (or LDAP), users will have their home directory mounted automatically in the container, as well as other configured directories, according to the required privileges.
They will be able to run jobs with their uid/gid or as root in the container.

The user specifies the job command line or script and its requirements (Docker image, CPU, RAM) via a CLI tool or the go-d-web Web UI.

Go-Docker supports a plugin system to manage authentication, scheduling and execution of containers.

Go-Docker does not compete against Kubernetes, Mesos frameworks, etc.; on the contrary.
Most of those tools efficiently dispatch containers, but they are not "user aware": all users are equal, including over time. They also do not usually address the need to connect (ssh) into a container for interactive tasks; they focus on executing a task based on what the user asks. Go-Docker acts as a layer on top of those tools and benefits from them. The plugin mechanism provides a way to create an executor for Swarm (provided, among others, with the tool), but also Kubernetes, etc.

They also usually focus on running "production" long-running tasks (such as a webapp) or regular tasks (Chronos). Go-Docker focuses on one computational task: if it fails, there is no automatic restart; the user decides after analysis whether to reschedule it.

1 task = 1 computation = 1 result

This application mainly targets systems where you know your users (not an open cloud platform), who have a home directory they usually access via SSH/SCP/..., working dirs, etc. With Go-Docker, they can ask to get their home directory mounted in the container (read-only when root access is required, for example, and read-write for basic access), as well as other pre-defined directories, to do... what they need to do. The authentication plugin defines what users can or cannot mount.
With the scheduler plugins, it is possible to reorder the pending jobs before running them, based on previous user usage for example, or to reject a job because the user reached some quota.
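As an illustration of such a scheduler plugin (function and field names below are hypothetical, not GoDocker's actual iSchedulerPlugin interface), a usage-based reordering could look like:

```python
# Sketch of a usage-aware reordering policy (hypothetical names, not
# GoDocker's actual plugin interface).

def reorder_pending(jobs, past_cpu_seconds, cpu_quota=None):
    """Order pending jobs so users with less past usage go first,
    and drop jobs of users who reached their quota."""
    accepted = [
        j for j in jobs
        if cpu_quota is None or past_cpu_seconds.get(j["user"], 0) < cpu_quota
    ]
    # Least past CPU usage first; submission order breaks ties.
    return sorted(accepted, key=lambda j: past_cpu_seconds.get(j["user"], 0))
```
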



Docker images used to submit jobs must have bash installed.
On Alpine Linux, if bash is not installed, GoDocker will try to install it, but this will delay command startup.

For interactive jobs, GoDocker expects an OpenSSH server to be installed. If it is not in the image, GoDocker will automatically install it, but this will delay session startup.


  • job submission
  • job watch (check if job is over)
  • kill job
  • suspend/resume job
  • reschedule a job
  • quotas: user or project (cpu, memory, disk)
  • use a private registry
  • rate limiting: limit number of pending tasks for a user or globally
  • dependency between tasks
  • process monitoring
  • temporary local storage on nodes
  • optional guest support, i.e. accepting users connecting with Google, GitHub, ... who are not in the system. Guests map to a system user for execution.
  • node reservation (mesos)
  • experimental resource reservation (GPU, etc.)
  • open some ports in container
  • FTP server for user data upload (with quota)
  • CNI Docker network plugins support
  • simultaneous resource usage limits


See COPYRIGHT file. Go-Docker is developed at IRISA.

Other components


go-docker and its components (go-docker-web, ...) are Python 2.7+ and Python 3 compatible.


apt-get install python-dev libldap2-dev gcc libsasl2-dev openssl libpython-dev libffi-dev libssl-dev protobuf-compiler (Debian/Ubuntu)

yum install python-devel libldap-devel gcc cyrus-sasl-devel openssl python-devel libffi-devel openssl-devel protobuf-compiler (Fedora/CentOS)

pip install -r requirements.txt

Prior to version 1.3.2, for Mesos, one needs to install the Python libraries (mesos-interface and mesos-native) corresponding to your Mesos installation. From 1.3.2, the plugin uses the Mesos HTTP API and no longer needs the Python libraries.


The application needs MongoDB and Redis. Set up Redis to save data on disk for persistence.
For time-based statistics, you can optionally install InfluxDB (if you do not
wish to use it, leave influxdb_host empty in the configuration file).
If InfluxDB is used, databases must be created first.
Graphics can be displayed with Grafana on the god_task_usage series.

InfluxDB can also be used with cAdvisor to archive usage statistics (see go-docker-web)

InfluxDB is optional. Grafana helps to chart the usage recorded in InfluxDB.


go-d-docker needs a shared directory between go-d-scheduler, go-d-watcher and
the container nodes.






The scheduler is in charge of scheduling pending jobs and running them on an executor.

There is only one scheduler running per executor. At startup, the scheduler checks whether the Redis
database is populated; if not, it syncs it with the Mongo database. This
automatic check protects against Redis flushes or loss of data.

While only one scheduler is active for a specific executor, it is possible to execute multiple instances for high availability.
If one instance is stopped, another will take leadership and become the new scheduler master.
In case of failure, one needs to wait for the previous keep-alive timeout before the new scheduler becomes master (config parameter master_timeout, defaults to 600s).
The timeout must be higher than the maximum time needed for a scheduling run.
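The keep-alive rule above can be sketched as follows (illustrative only, not the actual implementation):

```python
import time

# Sketch of the master_timeout rule: a standby scheduler may take
# leadership once the master's last keep-alive is older than the timeout.

def can_take_leadership(last_keepalive, master_timeout=600, now=None):
    """Return True when the previous master's keep-alive has expired."""
    now = time.time() if now is None else now
    return (now - last_keepalive) > master_timeout
```
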

It is possible to run multiple scheduler instances for different executors (2 Mesos instances for different roles, for example); in this case the GOD_EXECUTOR environment variable must be set to match the executor name defined in go-d.ini. If only one executor is defined in the config, then GOD_EXECUTOR is not needed (the first one will be used).

For multiple executors, with mesos for example, in go-d.ini:

# default scheduler policy
scheduler_policy: 'FairShare'

executors:
  - 'mesos1'
  - 'mesos2'
  - 'sge1'

mesos1:
  executor: 'mesos'

mesos2:
  executor: 'mesos'
  # Define a specific mesos role, overriding default mesos role
  role: 'god_r'
  # Define a whitelist of allowed projects to use this executor
  # If not defined all projects can use it
  projects:
    - 'test'

sge1:
  executor: 'sge'
  # Use a different scheduler policy for this executor
  scheduler_policy: 'FiFo'

  master: ...

Then, to start a scheduler for a specific executor:

GOD_EXECUTOR=mesos1 python3 go-d-scheduler start
GOD_EXECUTOR=mesos2 python3 go-d-scheduler start


The watcher checks job status and manages job kill/suspend/resume. There can be
multiple watchers running in parallel.


The archiver is in charge of the cleanup of old jobs. It can be run as a cron task to delete old jobs based on the configuration. Job identifiers can also be given as parameters to the script to force the archiving of those jobs, even if their expiration date is not reached. It sends archive requests to the go-d-archive daemon.

The archiver service is required to let users archive jobs themselves.


Configuration is in the file go-d.ini. One can define the scheduler and executor to use; it must be one of the classes defined in the plugins. One can easily add new ones following the godocker/iExecutorPlugin and iSchedulerPlugin interfaces.

The file is in YAML format.

One can set the environment variable GOD_CONFIG to specify the go-d.ini location for go-d-scheduler and go-d-watcher.

Once the application is configured, it is necessary, the first time, to initialize the database:

python3 init

Logging is defined in the log_config field. One can set up logging for logstash, graylog, file, etc.

Some configuration parameters can be overridden with environment variables:

  • GODOCKER_DOCKER_TLS: set to 1 to activate
  • GODOCKER_WATCHERS: comma separated list of watcher names
  • GODOCKER_EMAIL_SMTP_TLS: set to 1 to activate
  • GODOCKER_LIVE_EVENTS: set to 1 to activate


The FairShare plugin adds additional, optional configuration to manage the different weights applied to parameters:

fairshare_waiting_time_weight = 0.1  # weight applied to the waiting time of tasks
fairshare_user_time_weight = 1       # weight applied to cpu time spent in previous tasks over user_reset_usage_duration
fairshare_user_cpu_weight = 0.2      # weight applied to the number of cpus used in previous tasks over user_reset_usage_duration
fairshare_user_ram_weight = 0.2      # weight applied to the quantity of ram used in previous tasks over user_reset_usage_duration
fairshare_group_time_weight = 0.1    # same as user but for the user group
fairshare_group_cpu_weight = 0.01    # same as user but for the user group
fairshare_group_ram_weight = 0.01    # same as user but for the user group
fairshare_user_weight = 1            # weight applied to user priority
fairshare_group_weight = 1           # weight applied to group priority
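As an illustration, these weights could combine into a single priority score like this (the exact FairShare formula may differ; this sketch only mirrors the parameter semantics above):

```python
# Sketch: combine the fairshare_* weights into one score per pending job.
# Higher score = scheduled earlier. Waiting time and priorities raise the
# score; past user/group usage lowers it.

def fairshare_score(job, w):
    return (
        w["waiting_time"] * job["waiting_time"]
        - w["user_time"] * job["user_time"]
        - w["user_cpu"] * job["user_cpu"]
        - w["user_ram"] * job["user_ram"]
        - w["group_time"] * job["group_time"]
        - w["group_cpu"] * job["group_cpu"]
        - w["group_ram"] * job["group_ram"]
        + w["user"] * job["user_prio"]
        + w["group"] * job["group_prio"]
    )
```
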


To identify processes, each process must have a unique id. It is possible to set this id with the environment variable GOD_PROCID.
This variable is needed when multiple processes are executed on the same server, and to keep a process id (of one watcher, for example) identical after each restart.

Environment variable GOD_CONFIG can specify go-d.ini location.

export GOD_PROCID=1
python3 go-d-scheduler start

export GOD_PROCID=2
python3 go-d-watcher start

python3 go-d-archive start

Executables have start and stop commands to manage their execution (run in the background).

For debugging, it is possible to run processes in the foreground with the run option, or only once (no loop) with the option once.

The log level can be modified with the environment variable GOD_LOGLEVEL (DEBUG, INFO, ERROR).


Running in containers

See docker-files/prod/

Running on Amazon EC2

Contextualization scripts to deploy an EC2 GO-Docker cluster are available.

Chef install

You can install GoDocker master and slaves using Chef cookbooks.
The cookbooks can be used on EC2 or bare metal.


The tool supports plugins for authorization, scheduling and execution. A few are provided, but it is easy to create new ones following the existing ones, using Yapsy.

To create a new plugin, create an xx.yapsy-plugin file and a Python file following the Yapsy convention. In it, you should extend one of the available interfaces according to the plugin goal.

  • iScheduler plugins look at pending jobs and basically reorder them according to their scheduling strategy.
  • iExecutor plugins execute the job and check its status.
  • iAuth is used by the web interface to authenticate a user (an LDAP bind, for example), get some user information (uidnumber, home directory, ...) and ACLs (which volumes can be mounted for this user request, for example).
  • iWatcher checks, once a job is running, whether it should continue or be killed, etc.
  • iStatus registers processes and gets the status of processes from an external system (etcd, consul, ...); if none is set, no monitoring is done.
  • iNetwork plugins are used by executors to use Docker CNI plugins to assign an IP to containers, avoiding port mapping.

Base classes are documented here:

The Utils class also provides a few helpers.

Available plugins are:

  • Scheduler:
    • fifo: First In First Out basic strategy
    • fairshare: Fair share user tasks repartition (prio, previous usage, ...)
  • Executor:
    • swarm (Docker Swarm standalone)
    • dockerswarm (Docker native Swarm)
    • mesos (Apache Mesos, tested with mesos 1.0.0 and 1.2.0)
    • Kubernetes (tested with 1.2.0, using API v1)
    • SGE (tested with SGE 8.1.9)
    • Slurm (tested with slurm 18.08)
    • fake (not to be used; for tests only, simulates a job execution)
  • Auth:
    • goauth: LDAP based authentication.
    • fake: fake authentication for tests
    • local: matches db users with system users.
  • Watcher:
    • maxlifespan: Checks max life duration of a job
  • Status:
    • etcd: register processes to etcd
    • consul: register processes to Consul
  • Network:
    • weave: Weave network Mesos/Docker plugin support, only 1 public network is supported for the moment.
    • calico: Calico network Mesos/Docker plugin


WARNING: To use swarm or dockerswarm, you need to uninstall the docker-py Python package before installing the docker package.

To integrate with Swarm, the swarm process needs to listen on a TCP socket (tcp://ip_address:2376 for example). Swarm is in charge of dispatching tasks to the Docker instances on the slaves.
The swarm executor also allows, for testing, connecting to a plain Docker daemon, without the need to install Swarm (limited to a single, local node).

With dockerswarm (swarm integrated with Docker), CNI networks are not supported at this time, as the weave network is not in the scope of swarm
(Docker Engine running in swarm mode currently only supports the built-in overlay driver for networking; therefore existing networking plugins will not work in swarm mode).


The Mesos plugin integrates go-docker as a Mesos framework. The framework is managed via the scheduler process.

It does not manage, for the moment, Mesos master failover nor re-offers (offer rejected)/slave failure. See the bugs for remaining issues/missing features for Mesos.

Tasks are executed using the Mesos executor with Docker or unified containers. Port allocation (for interactive jobs) is managed via the port resources declared on the slaves.

Before godocker 1.3.2, slaves had to be started with an attribute "hostname" matching the slave hostname or ip, for example: --master= --containerizers=docker,mesos --attributes='hostname:'

Since v1.3.2 this is no longer needed; however, the mesos slave hostname must be routable, else the attribute should be kept.

GoDocker supports both the Docker and Mesos Unified containerizers; the choice is made in the config file, in the mesos section.

For interactive jobs, and when using some additional ports, the port mapping
information is available via environment variables named PORT_XX, XX being the
port requested in the container (for example PORT_22 for SSH).
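A script inside a container could resolve such a mapped port like this (a minimal sketch of the PORT_XX convention, shown with a fake environment):

```python
import os

# Sketch: look up the host-mapped port for a port requested in the
# container, following the PORT_XX naming convention.

def mapped_port(container_port, environ=os.environ):
    """Return the mapped host port as int, or None if not mapped."""
    value = environ.get("PORT_%d" % container_port)
    return int(value) if value is not None else None
```
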


GoDocker has experimental support for Kubernetes (see the Kubernetes section in the configuration).
Tasks that cannot be scheduled due to overcapacity are deleted from Kubernetes but remain in pending state in GoDocker.
It does not support GPU reservation, as Kubernetes does not yet support arbitrary resource counters (though it is on the roadmap).
Regarding temporary storage, the software makes use of the emptyDir volume capability of Kubernetes: a local directory is created on the node, but no capacity is guaranteed (per user or global). Other kinds of persistent volumes are not supported here.

User scripts

The user submits a bash script that will be executed as root or with the user id (according to the request) in the container.
Volumes are mounted in the container according to the requested volumes and the iAuth interface.

'home' is a reserved name for volumes, expected to match the user home directory (if mounted). An environment variable will point to the mounted directory.

Several environment variables are available in user script:

  • GODOCKER_JID: Job identifier
  • GODOCKER_PWD: Job directory, container will execute script in this directory
  • GODOCKER_HOME: mount path for user home directory.

For job arrays the following variables are also available:

  • GOGOCKER_TASK_ID: Job array task identifier
  • GOGOCKER_TASK_FIRST: First job array task identifier
  • GOGOCKER_TASK_LAST: Last job array task identifier
  • GOGOCKER_TASK_STEP: steps in job array identifiers
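A user script could enumerate its sibling task identifiers from these variables, for example (a sketch, shown with a fake environment rather than a real job):

```python
import os

# Sketch: derive the full list of job array task ids from the
# GOGOCKER_TASK_* environment variables described above.

def array_task_ids(environ=os.environ):
    first = int(environ["GOGOCKER_TASK_FIRST"])
    last = int(environ["GOGOCKER_TASK_LAST"])
    step = int(environ["GOGOCKER_TASK_STEP"])
    return list(range(first, last + 1, step))
```
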

If root access is not required, the script is executed with the user id/gid under the container user name godocker.
All files in GODOCKER_PWD are chowned to the user id/gid.

If root access is required, the user should take care not to write to other directories (if allowed), as files will have root ACLs.

If sudo is not installed in the image, go-docker will install it at container start; this requires network access.

Interactive sessions

If the user selects an interactive session, a port is opened and the sshd process started in the container.
For interactive sessions, the selected image SHOULD have the OpenSSH server package installed. If it is not installed, GoDocker will try to install the openssh-server package via apt-get or yum.

Mapped port will be available in task description.

The user's public key is copied into the container for access. If it is not set, the user can connect using their user id and their API key as password.

It is possible to get an SSH connection to the container via a Web SSH proxy such as GateOne, running on a public IP (https).
Node names should be known in the network (DNS) to access them through the proxy.

Job arrays

Job arrays allow executing multiple jobs with the same parameters. A
start:end:step definition generates X jobs in separate job directories with
additional environment variables (GOGOCKER_TASK_ID, ...).

The main task has an id and refers to the X sub tasks. The main task does not execute any
command and ends when all sub tasks are over. Killing a main task also kills
all its sub tasks.
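The completion rule can be sketched as follows (the state names are illustrative, not GoDocker's actual task states):

```python
# Sketch: a job array's main task is over only once every sub task
# has reached a terminal state (state names are hypothetical).

TERMINAL_STATES = ("over", "killed")

def main_task_over(sub_task_states):
    return all(state in TERMINAL_STATES for state in sub_task_states)
```
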


A user can submit a task on a project. If not specified, default is used.
The default project has no quota, and all users can submit to it. In this
case, only user quotas apply.

Only administrators can create projects and assign users to projects. Other
projects can get a global quota applying to all members (cumulative usage of
all members).

Users and projects can have a priority between 0 and 100. This value may be used by schedulers to order the pending tasks. The default project has a priority of 50.


Users and projects can have quotas. If a quota value is 0, then the quota is disabled for that resource.
The value is in resource-seconds, meaning for example that if a cpu quota is set to 3600, a user can use 1 CPU for one hour, or 2 CPUs for half an hour, etc.
If the quota is reached, new jobs are rejected. Running jobs reaching the quota are killed if the configuration option kill_on_limit is set; otherwise jobs are allowed to end (but new jobs will still be rejected).
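A minimal sketch of this resource-seconds accounting (illustrative only, not GoDocker's actual code):

```python
# Sketch: quota accounting in resource-seconds. A quota of 3600
# cpu-seconds allows 1 CPU for one hour, or 2 CPUs for half an hour.

def cpu_seconds(n_cpu, duration_s):
    return n_cpu * duration_s

def accepts_new_job(used_cpu_seconds, quota):
    """A quota of 0 disables the check; otherwise reject once reached."""
    return quota == 0 or used_cpu_seconds < quota
```
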

The quota time range, before reset, is defined in the user_reset_usage_duration configuration field, in days.


Projects (or the default project) can have limits on the number of simultaneous resources.
This is disabled by default but can be activated with the projects/limits/has_limit parameter in go-d.ini.
Per-resource limits are defined per project. Some defaults are set in the config but can be customized per project via the Web UI/API.

DRMAA support

A drmaa library is available. It is a C implementation of DRMAA v1. Not all
methods are implemented, but the basics needed to submit/check jobs are present.

Python DRMAA modules can work with this library.

Process monitoring

Process monitoring is optional; if you do not wish to use the etcd default plugin, simply set status_policy to None or an empty string.


Etcd must be running on a server. One can use Docker to do so (image name and listen URLs below are a reconstruction; adapt to your etcd version):

docker run -p 4001:4001 -p 2379:2379 --name etcd --rm \
    quay.io/coreos/etcd \
    -listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001 \
    -advertise-client-urls http://${HostIP}:2379,http://${HostIP}:4001


status_policy: 'etcd'
etcd_prefix: '/godocker'
etcd_host: 'a.b.c.d' # IP address where etcd is running
etcd_port: 4001


Installation/setup of Consul is out of scope here.

Setting up Consul as the status manager will register the web/scheduler/watcher processes in Consul with TTL or HTTP checks.
In addition, web services are registered in the Consul DNS with their IP and port for DNS queries.
The Consul DNS can be used for load balancing to the web servers, using HAProxy for example.

Temporary local storage

This feature needs the Docker volume plugin docker-plugin-zfs on each node (see Docker plugins).
A user can specify a local temporary storage size on tasks. The system will create and mount a ZFS storage volume for the container, which will be deleted at the end of the task. The volume has a quota according to the task requirements. This provides local disk storage for intermediate computation requiring fast disk access.

At the moment, this works only with the mesos plugin, not swarm, due to swarm management/scheduling with volume plugins.

Guest support

The guest feature is optional and enabled by the guest_XXX parameters in the go-d.ini file. Guest users connect with Google, GitHub, ... and are not in the system (ldap, ...). All guests are mapped to a system user for execution. They do not have direct access to their home directory, which will be created as a subdirectory of a specified, existing, directory.

An administrator, via the web interface, must activate the users after their first login. Guests can be disabled afterwards.

Care should be taken regarding disk space used in the guest home directory. Usage is visible in administrator panel of the web interface.

Node reservation

With mesos, a node can be reserved for a set of users or projects.

Simply add to mesos-slave the attribute:




Resource reservation

With mesos, one can ask for specific resources (GPU, etc.) according to resources available set in mesos-slave.

If native GPU is supported in your Mesos version (>= 1.x), configure Mesos for GPU resources, then simply use the "gpus" resource in task.requirements. This requires the native_gpu and unified parameters to be set in the config.

Warning: unified containerizers in Mesos do not support port mapping without CNI, i.e. no interactive jobs.
To support interactive jobs, one needs to set up Mesos with network/cni isolation and set up the bridge/port-mapper plugin or another supported external CNI plugin (weave, calico).

User will be able to select one or more GPU resources.

For lower Mesos versions, it is possible to define GPUs in constraints.
Resources are defined in go-docker as constraints, in the constraint named resource:

constraints: [
    { name: 'storage', value: 'ssd,disk' },
    { name: 'resource', value: 'gpu_10.5' }
]

One can define multiple resources with a comma (gpu_10.5,my_specific_resource).

On the mesos-slave, resources should be added as follows for a GPU use case.

admin:/etc/mesos-slave$ cat attributes
admin@ip-10-72-136-115:/etc/mesos-slave$ cat resources

In the resources file, we define the resource name and the available quantity for this resource. In the above example, we have 2 GPUs (a range with left and right bounds included).
For each resource, it is possible to define volumes to mount in the container. To do so, we add to the attributes file one attribute per resource, with the name scheme nameOfResource_resourceSlot. Volumes must be comma-separated if multiple ones are needed.

Still in the previous example, if the system selects slot 0 of resource gpu_10.5 among the available resources, then the volumes defined in gpu_10.5_0 will be mounted in the container.
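The lookup implied by this naming scheme can be sketched as follows (illustrative only, not the plugin's actual code):

```python
# Sketch: volumes for a selected resource slot are looked up in the
# attributes under the key nameOfResource_resourceSlot.

def slot_volumes(attributes, resource, slot):
    """Return the list of volumes to mount for a selected resource slot."""
    key = "%s_%d" % (resource, slot)
    value = attributes.get(key, "")
    return value.split(",") if value else []
```
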

It is possible to specify multiple resource requirements in a task.

Private registry/images


To access a Docker registry/image with authentication, the user needs to:

  • Create an archive in their home directory named docker.tar.gz containing the .docker directory (.docker/config.json, ...) after a successful login
  • Add the home volume to their task (to access the docker.tar.gz archive)

CNI integration

Network plugins allow assigning a job to a network (public/user/project) and assigning an IP per container, avoiding port mapping.
The choice of supported networks depends on the plugin. The provided network plugins only support the public network at this time.

In a public network, all containers are in the same sub-network and can communicate with each other.
In a user or project network, containers are placed in a network where only containers from the same user or project can talk to each other.

CNI is activated in the network section of the configuration file. If not activated, the basic Docker bridge and port mapping are used.

SGE integration

If docker for SGE is not enabled, native SGE jobs will be executed (no containers). GoDocker will simply add the mem and cpu requirements for the job and execute it under the user id (no root access possible, no mounts, as the job is directly executed on the node).
If docker is set, GoDocker will submit the job to SGE and execute a Docker container. To prevent users from using Docker directly, the web interface will act as an authorization plugin for docker. To do so, docker must be configured to:

  • listen on a tcp port (docker/url parameter of config)
  • define an authorization plugin

    cat /etc/docker/plugins/godocker-auth.spec

  • Add authorization plugin to docker daemon

    cat /etc/systemd/system/docker.service.d/docker.conf
    ExecStart=/usr/bin/docker daemon --host=fd:// --host=tcp:// --authorization-plugin=godocker-auth

  • Reload config and restart docker (godocker web must already be running)

    systemctl daemon-reload
    service docker restart

When a docker command is executed, docker will trigger godocker web to ask for authorization. A specific token is created at job creation only, and only docker commands using this token are accepted. GoDocker will check that volumes, scripts, etc. are the same as those requested at job creation time, to prevent reusing a token for a different job.
Without this token, users won't be able to use Docker directly on your cluster.
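The idea can be sketched as follows (the field names are hypothetical, not the actual authorization plugin protocol):

```python
# Sketch: accept a docker request only when its token is known and its
# volumes/script match what was recorded at job creation time.

def authorize(request, jobs_by_token):
    job = jobs_by_token.get(request["token"])
    if job is None:
        return False  # unknown token: direct docker use is refused
    return (sorted(request["volumes"]) == sorted(job["volumes"])
            and request["script"] == job["script"])
```
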

To allow the job kill operation, SGE must be configured to either send a SIGTERM on qdel instead of a SIGKILL, or support the -notify option; otherwise the container will continue to run and won't be deleted on job completion.

SGE docker jobs cannot be monitored (cAdvisor/prometheus) at this time.


Remove old containers

docker -H ps -a | awk 'NR > 1 {print $1}' | xargs docker -H rm

swarm binpack strategy:

bin/swarm manage --strategy binpack -H 127.0.0.1:2376 nodes://

Cleanup of counters in case of incoherency (due to some framework failure, for example):

redis-cli
> get god:jobs:queued
"-2"
> set god:jobs:queued 0
OK
> llen god:jobs:running
(integer) 2
> lpop god:jobs:running
"501"
> lpop god:jobs:running
"502"
> lpop god:jobs:running


Running cAdvisor on the nodes gives live monitoring of the containers (RAM/CPU/network) in the web interface.

sudo docker run --volume=/:/rootfs:ro --volume=/var/run:/var/run:rw --volume=/sys:/sys:ro --volume=/var/lib/docker/:/var/lib/docker:ro --publish=8080:8080 --detach=true --name=cadvisor google/cadvisor:latest

End user tutorials


Web interface

High availability

Databases (mongo and redis) can be in high-availability mode, switching master in case of failure.
For mongo, the mongo_url should be like mongodb://mongo1:27017,mongo2:27017,mongo3:27017/i?replicaSet=test. In case of a master switch, mongo raises an exception and manages the auto-reconnect to the newly elected master. During the switch, however, some writes may fail. To avoid this, GoDocker retries the mongo operation several times with an increasing delay to give time for the switch. A warning log message Mongo:AutoReconnect is added to inform about the event.

If a read/write operation still fails after several attempts, the status may be inconsistent and a restart of the processes should be done to reconcile tasks with the backend executor.
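The retry behavior can be sketched as follows (attempt count and delays are illustrative, not GoDocker's actual values):

```python
import time

# Sketch: retry an operation with an increasing delay between attempts,
# to ride out a replica-set master switch.

def retry_with_backoff(operation, attempts=5, base_delay=0.1, sleep=time.sleep):
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # still failing after all attempts
            sleep(base_delay * (attempt + 1))  # increasing delay
```
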


Go-Docker is developed at IRISA - France

Thanks to the other contributors:

  • Samsung R&D Institute Ukraine