go-docker / Home



Blog, wiki, tutorials, development

You will find the official documentation for GoDocker here.

API documentation


go-docker is a tool to submit batch jobs on a multi-node, multi-user architecture. It is comparable to tools such as GridEngine or Torque. It schedules and executes jobs on an available node and manages their life cycle. Jobs are executed in Docker containers.

Get more info at FeaturesDetails


End user






Logs from the web server or from the scheduler/watchers can be sent to a central log system such as Graylog or Logstash. This is configured in go-d.ini or production.ini (web), following the Python logging configuration format. Example configurations are available in go-d.ini.sample. You only need to update the host/port information and add the handler to the loggers.
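As a rough illustration of the "add the handler to the loggers" step, the sketch below uses Python's standard `logging.config.dictConfig` rather than the ini syntax of go-d.ini. The handler name `central`, the logger name `godocker`, and the host/port are placeholders, not values taken from go-d.ini.sample, and the stdlib `DatagramHandler` only stands in for whichever Graylog or Logstash handler the sample file declares.

```python
import logging
import logging.config


def build_logging_config(host, port):
    """Return a dictConfig mapping that also ships records to host:port over UDP."""
    return {
        "version": 1,
        "disable_existing_loggers": False,
        "handlers": {
            "console": {"class": "logging.StreamHandler"},
            # Placeholder: a stdlib UDP handler standing in for a
            # Graylog/Logstash handler. Only host and port need updating.
            "central": {
                "class": "logging.handlers.DatagramHandler",
                "host": host,
                "port": port,
            },
        },
        "loggers": {
            # Hypothetical logger name. Listing "central" here is the
            # "add the handler to the loggers" step.
            "godocker": {
                "handlers": ["console", "central"],
                "level": "INFO",
                "propagate": False,
            },
        },
    }


logging.config.dictConfig(build_logging_config("127.0.0.1", 12201))
logger = logging.getLogger("godocker")
```

The same two changes (host/port on the handler, handler name in the logger's handler list) are what the ini-based configuration asks for.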


GO-Docker can export several statistics to InfluxDB, which can be linked to Grafana to get dashboards and charts. It also provides a Prometheus endpoint (http://ip_add:6543/metrics). Exported statistics include the number of running jobs, the total number of jobs, and the scheduler timer. Both solutions can connect to cAdvisor and go-docker to get detailed usage statistics.

Security and network considerations

Go-Docker relies on Docker, which has implications for network and security. Running applications in containers does not mean full isolation. Learn more at SecurityNetwork.

Task life cycle


Rescheduling follows this workflow: the task is killed, then set back to pending.

Development tips


Start swarm with a list of nodes:

bin/swarm manage -H nodes://


Restart Docker so that it listens on TCP.

On Debian, use DOCKER_OPTS (/etc/default/docker):

DOCKER_OPTS="--dns a.b.c.d -H tcp://"

On Fedora, use OPTIONS (/etc/sysconfig/docker):

OPTIONS=" -H tcp://"

List running/stopped containers:

docker  -H  ps -a

Delete old stopped containers:

docker  -H  ps -a | awk 'NR > 1 {print $1}' | xargs docker  -H rm
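To make the awk step of the pipeline above explicit, here is a small sketch of the same column extraction in Python. The sample `docker ps -a` output is invented for illustration. As in the shell pipeline, every listed container is returned, including running ones, which `docker rm` refuses to delete without `-f`.

```python
def container_ids(ps_output):
    """Return the first column of every line after the header row."""
    ids = []
    for line in ps_output.splitlines()[1:]:  # NR > 1: skip the header line
        fields = line.split()
        if fields:                           # ignore blank lines
            ids.append(fields[0])            # $1: the CONTAINER ID column
    return ids


# Hypothetical `docker ps -a` output, for illustration only.
SAMPLE = """CONTAINER ID   IMAGE    COMMAND   CREATED       STATUS                   NAMES
0123456789ab   centos   "bash"    2 hours ago   Exited (0) 2 hours ago   job_1
ba9876543210   debian   "sleep"   3 hours ago   Up 3 hours               job_2
"""

print(container_ids(SAMPLE))
```

With the sample above this prints `['0123456789ab', 'ba9876543210']`.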


To clean the database, connect to the MongoDB database (with the 'mongo god' command) and execute:


To reset the database, connect to the Redis database (with the 'redis-cli' command) and execute:


Tech tips


Issue observed on the Ubuntu 14.04 image:

To install an SSH server in a Docker image, the /var/run/sshd directory must be created before running apt-get. In the Dockerfile:

RUN mkdir /var/run/sshd
RUN apt-get install ssh -y


Increase executor timeout for image pulls:

echo '5mins' > /etc/mesos-slave/executor_registration_timeout

Typical slave configuration for GoDocker:

[mesos-slave]# ls
attributes  containerizers  executor_registration_timeout

attributes => storage:disk;hostname:
containerizers => docker,mesos
executor_registration_timeout => 5mins
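The listing above follows a one-file-per-flag layout: the file name is the flag and the file content is its value. A minimal sketch of generating such a directory is shown below. The helper name `write_slave_flags` is hypothetical, the files are written to a temporary directory rather than /etc/mesos-slave, and the `hostname:` attribute is left out because its value is not given above.

```python
import os
import tempfile


def write_slave_flags(config_dir, flags):
    """Write one file per flag: the file name is the flag, the content its value."""
    os.makedirs(config_dir, exist_ok=True)
    for name, value in flags.items():
        with open(os.path.join(config_dir, name), "w") as handle:
            handle.write(value + "\n")


flags = {
    "attributes": "storage:disk",  # hostname attribute omitted: value not known here
    "containerizers": "docker,mesos",
    "executor_registration_timeout": "5mins",
}

# Temporary directory for the demo; on a slave the target would be /etc/mesos-slave.
demo_dir = tempfile.mkdtemp()
write_slave_flags(demo_dir, flags)
print(sorted(os.listdir(demo_dir)))
```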

Track Mesos logs with Graylog

Install fluentd and the gelf plugin, then add the following to the fluentd configuration:

<source>
  type tail
  path /var/log/mesos/mesos-master.ERROR
  pos_file /tmp/mesos-master.ERROR.pos
  tag graylog2.mesos
  format /^(?<code>[A-Z])\d+\s+(?<time>[0-9:]+).*\] (?<message>.*)/
</source>
<match graylog2.**>
  type copy
  <store>
    type gelf
    host localhost
    port 12201
    flush_interval 5s
  </store>
</match>
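To see what the `format` expression extracts, the sketch below applies the same pattern in Python, where named groups are spelled `(?P<name>...)`. The sample line is invented; it only assumes the usual glog layout of Mesos logs (severity letter, month and day, time, thread id, source location, message).

```python
import re

# Same pattern as the fluentd `format` line, with Python named-group syntax.
GLOG_LINE = re.compile(
    r"^(?P<code>[A-Z])\d+\s+(?P<time>[0-9:]+).*\] (?P<message>.*)"
)

# Hypothetical log line in glog layout, for illustration only.
sample = "E0312 17:35:02.123456 22419 slave.cpp:1234] Failed to kill executor"

match = GLOG_LINE.match(sample)
fields = match.groupdict() if match else {}
print(fields)
```

For this sample the groups are `code='E'`, `time='17:35:02'`, and `message='Failed to kill executor'`. Note that the pattern keeps neither the date digits nor the sub-second part of the timestamp.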

Tasks management

Sometimes Mesos fails to kill a job. The following steps help to kill the container:

  1. On the slave, execute:

    docker stop XXXXX (container id)

  2. Wait for container to stop and check if job has been killed in web interface (after refresh)

  3. If the container still appears in the Mesos interface although it is stopped, kill the mesos-executor process linked to the container:

#ps -ef|grep mesos-executor
root     23110 22419  0 17:35 ?        00:00:00 /usr/libexec/mesos mesos-executor --override /bin/sh -c exit `docker wait mesos-6a7f2dba-6368-42c6-b5a4-19012c9b0834`
#kill 23110
  4. If the container is killed in Mesos and no longer appears as a Mesos job, but still appears in the web interface (the framework did not receive the kill confirmation), connect to redis and run the following, where XXXX is your task id:

    set god:mesos:over:XXXX 7

To kill a mesos framework:

curl -d@/tmp/post.txt -X POST http://your_mesos:5050/master/shutdown
# /tmp/post.txt is a file with the following content:


cAdvisor can be run in a container (adapt the IP address and ports to your setup):

docker  -H run   --volume=/:/rootfs:ro   --volume=/var/run:/var/run:rw   --volume=/sys:/sys:ro   --volume=/var/lib/docker/:/var/lib/docker:ro   --publish=8080:8080   --detach=true   --name=cadvisor  google/cadvisor:latest -docker="tcp://local_ip:2375"


-storage_duration=X (in minutes)
For 10 minutes:


Logstash needs to listen on UDP, and its host must be set in go-d.ini:

bin/logstash -e 'input { udp { port => 59590 } } filter { json { source => "message" } } output { elasticsearch { } }'
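The Logstash pipeline above expects each UDP datagram to carry one JSON document, which the json filter then expands. The sketch below shows what a sender of that shape could look like using only the Python standard library. The class name `JsonUdpHandler`, the field names, and the logger name are illustrative assumptions, not the handler GoDocker ships. The demo sends to a throwaway local socket instead of a real Logstash instance.

```python
import json
import logging
import socket


class JsonUdpHandler(logging.Handler):
    """Send each log record as one JSON datagram (illustrative handler)."""

    def __init__(self, host, port):
        super().__init__()
        self.address = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def emit(self, record):
        payload = {
            "message": record.getMessage(),
            "level": record.levelname,
            "logger": record.name,
        }
        try:
            self.sock.sendto(json.dumps(payload).encode("utf-8"), self.address)
        except OSError:
            self.handleError(record)

    def close(self):
        self.sock.close()
        super().close()


def demo():
    """Send one record to a local UDP socket and return the decoded event."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # any free port, stands in for Logstash
    receiver.settimeout(2)
    host, port = receiver.getsockname()

    logger = logging.getLogger("godocker.demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = JsonUdpHandler(host, port)
    logger.addHandler(handler)
    try:
        logger.info("job %s started", 42)
        data, _ = receiver.recvfrom(4096)
    finally:
        logger.removeHandler(handler)
        handler.close()
        receiver.close()
    return json.loads(data.decode("utf-8"))


event = demo()
print(event)
```

Against the Logstash command above, the handler would be created with the Logstash host and port 59590.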


When Consul is used as the status manager, its DNS features can load-balance requests to the web servers in HA and scalable mode. More info:


To query Prometheus about a container, you need the container name (available in the job details). You can then use a query like:

rate(container_cpu_usage_seconds_total{name="mesos-05f4011f-faa9-4a3c-bbaf-128585555ce1"} [5m])
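Such a query can also be sent to the Prometheus HTTP API at `/api/v1/query`. The sketch below only builds the request URL and performs no network call. The base URL `http://localhost:9090` and the helper names are assumptions for illustration.

```python
from urllib.parse import urlencode


def cpu_rate_query(container_name, window="5m"):
    """Build the PromQL expression for a container's CPU usage rate."""
    return 'rate(container_cpu_usage_seconds_total{name="%s"}[%s])' % (
        container_name,
        window,
    )


def query_url(base_url, promql):
    """Return the instant-query URL of the Prometheus HTTP API for promql."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": promql})


# Container name taken from the example above; the base URL is an assumption.
name = "mesos-05f4011f-faa9-4a3c-bbaf-128585555ce1"
url = query_url("http://localhost:9090", cpu_rate_query(name))
print(url)
```

The resulting URL can be fetched with any HTTP client; the container name comes from the job details as described above.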