Circus clustering management
Circus is a process & socket manager. It can be used to monitor and control processes and sockets.
At Novapost, we usually launch processes on different (virtual) machines.
So we wanted Circus to manage processes launched on different servers.
We had the chance to discuss this with Tarek Ziadé and Alexis Metaireau at PyconFr 2012. They have the same needs at Mozilla, so we seized the opportunity and brainstormed about our needs. We arrived at this conclusion:
- A single interface, called circusmeta, to manage processes on different circusd instances
- To manage a single circusd node or a pool of circusd nodes
- To run a new circusd and automatically be able to manage it
- To add a new worker on a specific circusd node
- To add a new worker on a service and let circusmeta choose which node will start it
- To have global statistics about the cluster and use them in plugins
- To run a command on a specific node or on every node
We don't want:
- To start a new virtual machine
- To register watchers on an empty circusd
So after this brainstorming we ended up with this implementation roadmap:
- Give each circusd server a default name, and allow renaming it through the configuration or with a circusctl command.
- Modify the stats_endpoint protocol to prefix stats with the unique name of the circusd node.
- Create a socket on circusmeta that aggregates every circusd stats_endpoint into a single socket, based on the pool configuration.
- Adapt existing circus tools (circus-top, circushttpd) to manage circusd nodes.
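The two stats items of the roadmap can be sketched in a few lines. This is only an illustration, not the actual Circus protocol: the "<node>.<topic>" layout, the topic strings and the function names are assumptions, and plain Python iterables stand in for the stats sockets.

```python
import json


def prefix_stat(node_name, topic, payload):
    """Return a (topic, body) pair tagged with the node that produced it."""
    # Hypothetical layout: "<node name>.<original topic>".
    return f"{node_name}.{topic}", json.dumps(payload)


def aggregate(streams_by_node):
    """Merge per-node stat streams into one stream of prefixed messages.

    `streams_by_node` maps a node name to an iterable of (topic, payload)
    pairs. The streams are drained one after the other here; a real
    aggregator would poll all the sockets instead.
    """
    for node, stream in streams_by_node.items():
        for topic, payload in stream:
            yield prefix_stat(node, topic, payload)
```

A consumer such as circus-top would then read one stream and split the node name off the topic, whether the stream comes from one node or from a pool.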
A word about circusmeta
With that in mind, circusmeta doesn't need to be a server. It is just a tool that manages a pool of nodes by connecting to each node's sockets (stats, endpoint and pubsub).
So circusmeta only needs to be running while the pool is being accessed, that is, while a circus tool is used on the pool.
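Since circusmeta is a short-lived tool rather than a daemon, running a command on one node or on the whole pool could come down to a simple loop over the node endpoints. The sketch below assumes a hypothetical `send(endpoint, command)` callable supplied by the caller; neither it nor `run_on` is part of Circus.

```python
def run_on(nodes, command, send, target=None):
    """Send `command` to one named node, or to every node if target is None.

    `nodes` maps a node name to its endpoint. `send(endpoint, command)` is a
    hypothetical transport callable that returns the node's reply.
    """
    if target is not None:
        if target not in nodes:
            raise KeyError(f"unknown node: {target}")
        selected = {target: nodes[target]}
    else:
        selected = nodes
    # Replies are keyed by node name so the caller knows who answered what.
    return {name: send(endpoint, command) for name, endpoint in selected.items()}
```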
circusmeta will be configured with the list of servers and some information about the strategy it will use when adding watchers on the pool.
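To make this more concrete, here is a minimal sketch of what such a configuration and strategy could look like. The class name, the strategy names and the endpoints are all hypothetical; the proposal does not fix any of them yet.

```python
from dataclasses import dataclass


@dataclass
class PoolConfig:
    nodes: dict                       # node name -> endpoint (hypothetical layout)
    strategy: str = "least_watchers"  # hypothetical strategy name


def choose_node(config, watcher_counts):
    """Pick the node that should start the next watcher.

    `watcher_counts` maps a node name to its current number of watchers;
    nodes missing from it are treated as empty.
    """
    if config.strategy == "least_watchers":
        # Ties are broken by node name so the choice is deterministic.
        return min(config.nodes, key=lambda name: (watcher_counts.get(name, 0), name))
    if config.strategy == "first":
        return next(iter(config.nodes))
    raise ValueError(f"unknown strategy: {config.strategy}")
```

With two nodes configured and counts of `{"node-a": 3, "node-b": 1}`, the default strategy picks `node-b`.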
This proposal doesn't change the core of circus: there is no master/slave setup or complex architecture to configure or understand.
The only change is that each stat message needs to be identified with the node name, so that the same command can be used for a single server or for a pool behind circusmeta.
The codebase is also already there; we just need some code that takes one step back and manages a list of nodes in the circus tools.