go-docker / Home
Presentation
http://fr.slideshare.net/OlivierSallou/godocker-presentation
Screencast
https://www.youtube.com/watch?v=juw_foi-Q0c
https://www.youtube.com/watch?v=3fu2aLocTbI
Blog, wiki, tutorials, development
The official documentation for GoDocker is available here:
https://godocker.atlassian.net/wiki/display/GOD/GODOCKER
API documentation
http://go-docker.readthedocs.io/en/latest/
Features
go-docker is a tool to submit batch jobs on a multi-node/multi-user architecture. It can be compared to other tools such as GridEngine or Torque. It schedules and executes jobs on an available node and manages their life cycle. Jobs are executed in Docker containers.
Get more info at FeaturesDetails
Tutorial
End user
Administrator
Ecosystem
Logging
Logs of the web server or the scheduler/watchers can be sent to a central log system such as Graylog or Logstash. This is configured in go-d.ini or production.ini (web), following the Python logging configuration format. Example configurations are available in go-d.ini.sample; you only need to update the host/port information and add the handler to the loggers.
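As a rough illustration of what "add the handler to the loggers" means in a Python logging configuration, the sketch below loads an ini-style configuration and attaches a network handler to the root logger. The section contents, the `central` handler name, the host/port and the choice of `DatagramHandler` are assumptions made for illustration only; go-d.ini.sample remains the reference, and a real Graylog or Logstash setup would normally use a handler that speaks their input format.

```python
import configparser
import logging
import logging.config
import logging.handlers

# Hypothetical excerpt written in the standard Python logging ini layout.
# Handler name, host and port are placeholders, not values from go-d.ini.sample.
LOGGING_INI = """
[loggers]
keys = root

[handlers]
keys = central

[formatters]
keys = plain

[logger_root]
level = INFO
handlers = central

[handler_central]
class = handlers.DatagramHandler
level = INFO
formatter = plain
args = ('localhost', 12201)

[formatter_plain]
format = %(asctime)s %(levelname)s %(name)s %(message)s
"""


def configure_logging(ini_text):
    """Load an ini-style logging configuration from a string."""
    # interpolation=None keeps the %(...)s format placeholders untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    logging.config.fileConfig(parser, disable_existing_loggers=False)
    return logging.getLogger()


root_logger = configure_logging(LOGGING_INI)
```

The two parts to adapt are the `args` line of the handler (host and port of the central log system) and the `handlers` line of each logger that should forward its records.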
Monitoring
GO-Docker can export several statistics to InfluxDB, which can be linked to Grafana to get dashboards and charts. It also provides a Prometheus (http://prometheus.io/) endpoint (http://ip_add:6543/metrics). Some statistics are: number of running jobs, total number of jobs, scheduler timer, ... Both solutions can connect to cAdvisor and go-docker to get detailed usage statistics.
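To check what the Prometheus endpoint exposes without setting up a full Prometheus server, a small script can download the text and turn it into a dictionary. This is a minimal sketch that assumes the endpoint returns the usual Prometheus text format without timestamps; the metric names in `SAMPLE` are placeholders, not the names go-docker actually exports.

```python
import urllib.request


def fetch_metrics(url):
    """Download the raw text served by a Prometheus-style /metrics endpoint."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def parse_metrics(text):
    """Return {sample name (with labels): value} from Prometheus text format.

    Comment lines (# HELP, # TYPE) and blank lines are skipped.
    Assumes each sample line is "name value" with no trailing timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


# Hypothetical sample text; the metric names are illustrative only.
SAMPLE = """\
# HELP example_jobs_running Number of running jobs
# TYPE example_jobs_running gauge
example_jobs_running 3
example_jobs_total 42
"""
```

Against a live server this would be called as `parse_metrics(fetch_metrics("http://ip_add:6543/metrics"))`, with `ip_add` replaced by the address of the web server.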
Security and network considerations
Go-Docker makes use of Docker, which has implications for network and security. Running applications in containers does not mean full isolation. Learn more: SecurityNetwork
Task life cycle
Rescheduling a task follows this workflow: kill -> set back to pending.
Development tips
Swarm
Start swarm with a list of nodes:
bin/swarm manage -H 127.0.0.1:2376 nodes://127.0.0.1:2375
Docker
Restart Docker so that it listens on TCP.
On Debian, use DOCKER_OPTS (/etc/default/docker):
DOCKER_OPTS="--dns a.b.c.d -H tcp://0.0.0.0:2375"
On Fedora, use OPTIONS (/etc/sysconfig/docker)
OPTIONS=" -H tcp://0.0.0.0:2375"
List running/stopped containers:
docker -H 127.0.0.1:2376 ps -a
Delete old stopped containers:
docker -H 127.0.0.1:2376 ps -a | awk 'NR > 1 {print $1}' | xargs docker -H 127.0.0.1:2376 rm
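The pipeline above hands every listed container ID to `docker rm`, including running ones (which `docker rm` refuses to delete unless forced). A slightly more targeted variant, sketched here in Python, asks Docker only for exited containers through `ps -a -q --filter status=exited`. The function names are illustrative, and it is an assumption that the endpoint at the given host honours the `status` filter.

```python
import subprocess


def list_exited_cmd(host):
    """Command line that prints only the IDs of exited containers."""
    return ["docker", "-H", host, "ps", "-a", "-q", "--filter", "status=exited"]


def remove_cmd(host, container_ids):
    """Command line that removes the given containers."""
    return ["docker", "-H", host, "rm"] + list(container_ids)


def remove_exited_containers(host):
    """Remove exited containers on `host` and return the IDs that were passed to rm."""
    listing = subprocess.run(
        list_exited_cmd(host), check=True, capture_output=True, text=True
    )
    ids = listing.stdout.split()
    if ids:  # docker rm fails when called without any container ID
        subprocess.run(remove_cmd(host, ids), check=True)
    return ids
```

For example, `remove_exited_containers("127.0.0.1:2376")` targets the same endpoint as the commands above.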
Database
To clean the database, connect to the MongoDB database (with the 'mongo god' command) and execute:
db.users.drop()
db.jobs.drop()
db.jobsover.drop()
To reset the database, connect to the Redis database (with the 'redis-cli' command) and execute:
flushdb
Tech tips
SSH
Issue observed on the Ubuntu 14.04 image: to install an SSH server in a Docker image, the /var/run/sshd directory must be created before running apt-get. In the Dockerfile:
RUN mkdir /var/run/sshd
RUN apt-get install ssh -y
Mesos
Increase executor timeout for image pulls:
echo '5mins' > /etc/mesos-slave/executor_registration_timeout
Typical slave configuration for GoDocker:
[mesos-slave]# ls
attributes containerizers executor_registration_timeout
attributes => storage:disk;hostname:192.168.1.37
containerizers => docker,mesos
executor_registration_timeout => 5mins
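The listing above shows one file per option, each holding the option value. When several slaves need the same settings, those files can be generated by a script. The following is only a sketch: the function names are hypothetical, the values are the ones from the listing, and it is safer to write to a scratch directory first and copy the result to /etc/mesos-slave once checked.

```python
import os

# Values taken from the listing above; adapt the hostname attribute per node.
SLAVE_SETTINGS = {
    "attributes": "storage:disk;hostname:192.168.1.37",
    "containerizers": "docker,mesos",
    "executor_registration_timeout": "5mins",
}


def write_slave_config(directory, settings):
    """Write one file per option, named after the option, containing its value."""
    os.makedirs(directory, exist_ok=True)
    for name, value in settings.items():
        with open(os.path.join(directory, name), "w") as handle:
            handle.write(value + "\n")


def read_slave_config(directory):
    """Read the option files back into a dictionary, for verification."""
    config = {}
    for name in sorted(os.listdir(directory)):
        with open(os.path.join(directory, name)) as handle:
            config[name] = handle.read().strip()
    return config
```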
Track mesos logs with Graylog (http://www.fluentd.org/guides/recipes/graylog2)
Install fluentd and the gelf plugin, then add the following to the configuration:
<source>
type tail
path /var/log/mesos/mesos-master.ERROR
pos_file /tmp/mesos-master.ERROR.pos
tag graylog2.mesos
format /^(?<code>[A-Z])\d+\s+(?<time>[0-9:]+).*\] (?<message>.*)/
</source>
<match graylog2.**>
type copy
<store>
type gelf
host localhost
port 12201
flush_interval 5s
</store>
</match>
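To see what the `format` expression in the source block extracts, the same pattern can be tried in Python, where the Ruby-style `(?<name>...)` groups are written `(?P<name>...)`. The sample line below is hypothetical; it only follows the usual glog layout (severity letter, month/day, time, thread id, file:line, message) that Mesos logs use.

```python
import re

# Same pattern as the fluentd `format` line, with named groups in Python syntax.
MESOS_LINE = re.compile(
    r"^(?P<code>[A-Z])\d+\s+(?P<time>[0-9:]+).*\] (?P<message>.*)"
)


def parse_mesos_line(line):
    """Return the code/time/message fields of a log line, or None if it does not match."""
    match = MESOS_LINE.match(line)
    return match.groupdict() if match else None


# Hypothetical log line, for illustration only.
SAMPLE_LINE = "E0605 17:35:01.123456 23110 master.cpp:123] Example failure message"
fields = parse_mesos_line(SAMPLE_LINE)
```

Note that `time` stops at the first character that is neither a digit nor a colon, so the fractional seconds are dropped.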
Tasks management
Sometimes Mesos fails to kill a job. The following steps help to kill the container:
- On the slave, execute:
docker stop XXXXX (container id)
- Wait for the container to stop and check in the web interface (after a refresh) whether the job has been killed.
- If the container still appears in the Mesos interface although it is stopped, kill the mesos-executor process linked to the container:
# ps -ef | grep mesos-executor
root 23110 22419 0 17:35 ? 00:00:00 /usr/libexec/mesos mesos-executor --override /bin/sh -c exit `docker wait mesos-6a7f2dba-6368-42c6-b5a4-19012c9b0834`
# kill 23110
- If the container is killed in Mesos and no longer appears as a Mesos job, but still appears in the web interface (the framework did not receive the kill confirmation), connect to redis and run:
set god:mesos:over:XXXX 7
where XXXX is your task id.
To kill a mesos framework:
curl -d@/tmp/post.txt -X POST http://your_mesos:5050/master/shutdown
# /tmp/post.txt is a file with the following content:
#frameworkId=23423-23423-234234-234234
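The same request can be sent from Python without the intermediate /tmp/post.txt file. This is a sketch of the curl call above using only the standard library; the function names are illustrative, and the master URL and framework id are the placeholders already used in this section.

```python
import urllib.parse
import urllib.request


def build_shutdown_request(master_url, framework_id):
    """Build the POST request that the curl command above sends."""
    body = urllib.parse.urlencode({"frameworkId": framework_id}).encode("ascii")
    return urllib.request.Request(
        master_url.rstrip("/") + "/master/shutdown",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def shutdown_framework(master_url, framework_id):
    """Send the request and return the HTTP status code."""
    request = build_shutdown_request(master_url, framework_id)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

For example: `shutdown_framework("http://your_mesos:5050", "23423-23423-234234-234234")`.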
CAdvisor
cAdvisor can be run in a container (adapt the IP and ports to your setup):
docker -H 127.0.0.1:2375 run --volume=/:/rootfs:ro --volume=/var/run:/var/run:rw --volume=/sys:/sys:ro --volume=/var/lib/docker/:/var/lib/docker:ro --publish=8080:8080 --detach=true --name=cadvisor google/cadvisor:latest -docker="tcp://local_ip:2375"
Optional:
-storage_duration=X (in minutes)
For 10 minutes:
-storage_duration=10m0s
Logstash
Logstash needs to listen on UDP, and its host must be set in go-d.ini:
bin/logstash -e 'input { udp { port => 59590 } } filter { json { source => "message" } } output { elasticsearch { } }'
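To check that the UDP input and the json filter work before pointing go-docker at them, a test record can be sent by hand. The sketch below sends one JSON document per datagram; the field names in the usage example are arbitrary placeholders, not the fields go-docker emits.

```python
import json
import socket


def send_json_log(host, port, record):
    """Send `record` as a single JSON-encoded UDP datagram and return the payload."""
    payload = json.dumps(record).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload
```

For example, `send_json_log("localhost", 59590, {"message": "test", "level": "INFO"})` targets the port used in the command above; the decoded fields should then show up in the output configured in Logstash.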
Consul
When Consul is used as the status manager, its DNS features can be used to load-balance requests to the web servers in HA and scalable mode. More info: https://bitbucket.org/osallou/go-docker-haproxy-consul
Prometheus
To query Prometheus about a container, you need the container name (available in the job details); you can then use a query like:
rate(container_cpu_usage_seconds_total{name="mesos-05f4011f-faa9-4a3c-bbaf-128585555ce1"} [5m])
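The query can also be sent through the Prometheus HTTP API (`/api/v1/query`) from a script. The sketch below only builds the query string and the request URL; the helper names are illustrative, and the Prometheus base URL passed in (for instance `http://localhost:9090`) is an assumption about your deployment.

```python
import urllib.parse


def cpu_rate_query(container_name, window="5m"):
    """PromQL expression for the CPU usage rate of one container."""
    return 'rate(container_cpu_usage_seconds_total{name="%s"}[%s])' % (
        container_name,
        window,
    )


def query_url(prometheus_url, query):
    """URL of an instant query against the Prometheus HTTP API."""
    encoded = urllib.parse.urlencode({"query": query})
    return prometheus_url.rstrip("/") + "/api/v1/query?" + encoded
```

For example, `query_url("http://localhost:9090", cpu_rate_query("mesos-05f4011f-faa9-4a3c-bbaf-128585555ce1"))` gives a URL that can be fetched with any HTTP client; the response is JSON.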