Go-Docker Airflow operator

Airflow is a platform to programmatically author, schedule and monitor data pipelines.

Because Go-Docker likes Airflow (http://pythonhosted.org/airflow/), we developed an Airflow operator that submits Bash commands to a Go-Docker infrastructure. With this operator, you can manage your workflow and submit remote tasks to a large cluster of nodes.

Installation

Put the operator in a directory that is on your Python path, and install go-docker-cli (https://bitbucket.org/osallou/go-docker-cli).

Example

The test.py file contains a sample workflow that makes use of the GoDockerBashOperator.

You must first log in to your GoDocker server with the gologin command (see go-docker-cli).

Constraints

Data is not automatically transferred to or from remote nodes. If a task needs access to local data, you must transfer that data yourself to or from a directory that the remote nodes can access (your home directory, for example), and mount this directory in your container.
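As an illustration of the staging step described above, here is a minimal sketch that copies a local file into a shared directory before the task is submitted. The helper name stage_input and the shared_dir argument are hypothetical and not part of the operator; the sketch assumes the shared directory is visible both locally and on the Go-Docker nodes.

```python
import os
import shutil


def stage_input(local_path, shared_dir):
    """Copy a local file into a directory that remote nodes can also see.

    shared_dir is an assumption: a directory (for example a network-mounted
    home) that exists both on this machine and on the Go-Docker nodes.
    """
    # Create the shared directory if it does not exist yet.
    os.makedirs(shared_dir, exist_ok=True)
    # copy2 keeps timestamps and returns the path of the copied file.
    return shutil.copy2(local_path, shared_dir)
```

The returned path is the one the remote Bash command would read from, provided the shared directory is also mounted in the container (for example through GOD_VOLUMES).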

Config

The following environment variables control some of the parameters of your go-docker task:

GOD_IMAGE: Docker image to use (default: centos)

GOD_VOLUMES: volumes to mount in the container, chosen from the available volumes and comma-separated, for example /home:rw,/db:ro (default: None)

GOD_CPU: requested number of CPUs (default: 1)

GOD_RAM: requested RAM (default: 1Go)

GOD_ROOT: request root access, if allowed (default: False)

The same settings are also available as operator parameters.
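To make the interplay between environment variables and defaults concrete, here is a minimal sketch of how such settings could be resolved. It is not the operator's actual code: the helpers resolve_settings and parse_volumes are hypothetical, and the rule that an explicit parameter takes precedence over the environment is an assumption. Values are kept as plain strings because the README does not specify their types.

```python
import os

# Defaults as listed in this README; values are kept as plain strings.
DEFAULTS = {
    "GOD_IMAGE": "centos",
    "GOD_VOLUMES": None,
    "GOD_CPU": "1",
    "GOD_RAM": "1Go",
    "GOD_ROOT": "False",
}


def resolve_settings(environ=None, **overrides):
    """Merge defaults, environment variables and explicit overrides.

    Assumption: explicit overrides win over the environment, which wins
    over the defaults. The real operator may order these differently.
    """
    environ = os.environ if environ is None else environ
    settings = {name: environ.get(name, default)
                for name, default in DEFAULTS.items()}
    # Ignore overrides left at None so they do not mask the environment.
    settings.update({k: v for k, v in overrides.items() if v is not None})
    return settings


def parse_volumes(spec):
    """Split a spec such as '/home:rw,/db:ro' into (path, mode) pairs."""
    if not spec:
        return []
    volumes = []
    for item in spec.split(","):
        path, _, mode = item.strip().partition(":")
        volumes.append((path, mode))
    return volumes
```

For example, calling resolve_settings(GOD_IMAGE="debian") with no GOD_* variables set would keep every default except the image.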