Possible regression when upgrading from 5.14 -> 5.20 (question/issue)

Issue #524 new
sandy currier created an issue

We are upgrading from 5.14 to 5.20 and seem to have hit a potential regression.

First a short background:

We use monit to manage services within a cloud environment and not just on isolated nodes/machines. Our clouds (multiple) are comprised of many service stacks (for example zookeeper-hadoop-accumulo, or elasticsearch-logstash-kibana, etc). And these stacks are comprised of internode / inter-machine dependent services (databases on one set of machines, web services on another etc). And each stack has its own way of dealing with its inter-node/inter-machine dependencies.

Out of the box, monit is good at handling service management and the intra-node service dependencies (service stacks on the same box), but the inter-node dependencies are outside of monit's current realm as far as I can tell. Which is fine - we use orchestration tooling to handle the inter-node service dependencies.

Here is the problem and the regression. In 5.14, we could set mode==manual for these inter-node service stacks. The effect was that on a cluster coldstart, which is our term for provisioning and orchestrating a cluster for the very first time from scratch (where there is NO prior .monit.state file etc), monit would, as desired, NOT start any service. This is because of mode==manual - or at least we believe that is why it works that way.

However, once the service was correctly started by the orchestration layer in the correct inter-node dependent manner, monit would then correctly keep the service running and monitor it. And when a machine was rebooted, since the stack was already up, monit would correctly restart the service on reboot. Since the stack was up, this worked.

All good.

But, in 5.20 I cannot seem to replicate this behavior. Upon a coldstart of the cluster, starting monit will apparently now automatically start all the services. The mode==manual no longer seems to be supported (it is no longer documented). Since monit starts the services on each node regardless of the services that need to be running on the other nodes, the service stacks end up dying on coldstart.

And yes we have set onreboot==laststate. We do not necessarily want onreboot==nostart - if the service stacks are up when a system is rebooted, we want monit to restart them when the system comes back up. This is because this scenario happens most when the rest of the stacks are up on different nodes, so having monit restart them on a rebooting node is generally the correct thing.

From experimentation, if we set mode==passive, we get the correct coldstart behavior as in 5.14. But that requires us to then switch to mode==active AFTER coldstart and AFTER the services have been correctly started by the orchestration layer. That is not the 5.14 behavior, and it requires undesirable additional orchestration.

Am I missing something, perhaps?

If not, can we regain the ability to have monit NOT start services on a coldstart (our terminology) but still keep the onreboot==laststate functionality and the mode==active functionality?
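
To make the request concrete, here is a rough Python sketch of the start-up decision we are after. It only models the behavior we saw in 5.14 and would like back; it is not monit's actual logic. The function name and its argument are made up for illustration: `last_state_running` stands for whatever a prior .monit.state file recorded for the service, and `None` stands for "no state file exists", which is our coldstart case.

```python
from typing import Optional


def should_start_at_boot(last_state_running: Optional[bool]) -> bool:
    """Hypothetical model of the desired behavior, not monit's implementation.

    last_state_running:
        None  -> no prior state file exists (our "coldstart")
        True  -> the service was running before the reboot
        False -> the service was stopped before the reboot
    """
    if last_state_running is None:
        # Coldstart: leave the service alone so the orchestration layer
        # can bring the stack up in the correct inter-node order.
        return False
    # Reboot of an already provisioned node: restore the last known state.
    return last_state_running


if __name__ == "__main__":
    for label, state in [("coldstart", None),
                         ("reboot, was running", True),
                         ("reboot, was stopped", False)]:
        print(f"{label}: start={should_start_at_boot(state)}")
```

In other words, the only case where we see a difference between 5.14 and 5.20 is the first one: with no prior state, we want "do not start" rather than "start".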

Thanks!
