Circus clustering management

Date: 2012-09-16 15:28
tags:python, circus
category:Python
Author: Rémy Hubscher
lang:en
Status: draft

Introduction

During PyConFr 2012, Jonathan Dorival, Mathieu Agopian and I spent two days sprinting on Circus.

Circus is a process & socket manager. It can be used to monitor and control processes and sockets.

At Novapost, we usually launch processes on different (virtual) machines.

So we wanted Circus to manage processes launched on different servers.

Brainstorming

We had the chance to discuss this with Tarek Ziadé and Alexis Metaireau at PyConFr 2012. They have the same needs at Mozilla, so we seized the opportunity and brainstormed about our needs. We came to the following conclusions:

We want

  • A single interface, called circusmeta, to manage processes on different circusd instances
  • To manage a single circusd node or a pool of circusd nodes
  • To run a new circusd and automatically be able to manage it
  • To add a new worker on a specific circusd node
  • To add a new worker on a service and let circusmeta choose which node will start it
  • To have global statistics about the cluster and use them in plugins
  • To run a command on a specific node or on every node

We don't want

  • To start a new virtual machine
  • To register watchers on an empty circusd

Implementation

So after this brainstorming we ended up with this implementation roadmap:

  • Have a default name for the circusd server, but also be able to rename it through the configuration and with a circusctl command.
  • Modify the stats_endpoint protocol to prefix stats with the unique name of the circusd node.
  • Create a socket on circusmeta that will aggregate every circusd stats_endpoint into a single socket, based on the pool configuration.
  • Adapt existing circus tools (circus-top, circushttpd) to manage circusd nodes.
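
To make the second and third points more concrete, here is a minimal sketch of what prefixing and aggregating could look like. It is not Circus code: the topic layout (`stat.web.1234`), the `node` key in the payload and the function names `prefix_stat` and `aggregate` are all assumptions made for illustration, and plain Python iterables stand in for the real sockets.

```python
import json


def prefix_stat(node_name, topic, payload):
    """Tag one stat message with the name of the node that produced it.

    The "<node>.<topic>" layout is a hypothetical format, not the actual
    stats_endpoint protocol.
    """
    tagged = dict(payload, node=node_name)
    return "%s.%s" % (node_name, topic), json.dumps(tagged)


def aggregate(streams):
    """Merge per-node streams of (topic, payload) pairs into one stream.

    `streams` maps a node name to an iterable of messages; in a real
    circusmeta each iterable would be a subscription to one circusd.
    """
    for node_name, messages in streams.items():
        for topic, payload in messages:
            yield prefix_stat(node_name, topic, payload)


if __name__ == "__main__":
    pool = {
        "node-a": [("stat.web.1234", {"cpu": 1.5})],
        "node-b": [("stat.web.4321", {"cpu": 0.2})],
    }
    for topic, body in aggregate(pool):
        print(topic, body)
```

With the node name carried in every message, a tool reading the aggregated stream can tell the nodes apart without knowing how many of them sit behind it.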

A word about circusmeta

With that in mind, circusmeta doesn't need to be a server. It is just a tool that manages a pool of nodes by connecting to each node's sockets (stats, endpoint and pubsub).

So circusmeta only needs to be running while the pool is being accessed, that is, when we use a circus tool on the pool.

circusmeta will be configured with the list of servers and some information about the strategy it will use when adding watchers on the pool.
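
As a rough illustration of that configuration, the sketch below models a pool as a list of nodes plus a named placement strategy. Everything in it is hypothetical: the `Node` and `Pool` classes, the `least_loaded` strategy, the endpoint addresses and the idea of counting workers per node are assumptions, since the proposal does not specify a configuration format or which strategies would exist.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str          # unique circusd node name
    endpoint: str      # made-up address of the node's control socket
    workers: int = 0   # number of workers currently placed on the node


def least_loaded(nodes):
    """Hypothetical strategy: pick the node running the fewest workers."""
    return min(nodes, key=lambda node: node.workers)


STRATEGIES = {"least_loaded": least_loaded}


@dataclass
class Pool:
    nodes: list
    strategy: str = "least_loaded"

    def add_worker(self, node_name=None):
        """Place a worker on a named node, or let the strategy choose."""
        if node_name is not None:
            node = next(n for n in self.nodes if n.name == node_name)
        else:
            node = STRATEGIES[self.strategy](self.nodes)
        node.workers += 1
        return node.name


if __name__ == "__main__":
    pool = Pool(nodes=[
        Node("node-a", "tcp://10.0.0.1:5555", workers=2),
        Node("node-b", "tcp://10.0.0.2:5555"),
    ])
    print(pool.add_worker())          # the strategy decides
    print(pool.add_worker("node-a"))  # explicit placement
```

Keeping the strategy behind a name in the configuration would let the same pool definition be reused with a different placement policy.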

Conclusion

This proposal doesn't change the core of Circus: there is no master/slave setup or complex architecture to configure or understand.

The only change is that each stat message needs to be identified with the node name, so that the same command can be used for a single server or for a pool behind circusmeta.
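
On the consuming side, this could be as simple as the sketch below, which groups incoming messages by node. The `node.topic` layout and the names `split_node` and `group_by_node` are assumptions for illustration only; the point is that the same function handles one node or many.

```python
from collections import defaultdict


def split_node(topic):
    """Split "<node>.<rest>" into (node, rest).

    Assumes node names contain no dot, which is a simplification of
    this sketch rather than a rule from the proposal.
    """
    node, _, rest = topic.partition(".")
    return node, rest


def group_by_node(messages):
    """Group (topic, payload) pairs by the node name found in the topic."""
    grouped = defaultdict(list)
    for topic, payload in messages:
        node, rest = split_node(topic)
        grouped[node].append((rest, payload))
    return dict(grouped)


if __name__ == "__main__":
    messages = [
        ("node-a.stat.web.1", {"cpu": 1.0}),
        ("node-b.stat.web.7", {"cpu": 2.0}),
        ("node-a.stat.db.2", {"cpu": 0.5}),
    ]
    for node, stats in group_by_node(messages).items():
        print(node, stats)
```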

The codebase is also already there; we just need some code to take one step back and manage a list of nodes in the circus tools.