Does Monit manage the hardware timer/watchdog on Raspberry?

Issue #819 new
Barabba created an issue

Hi everyone, thank you very much for your kind support. I have a Raspberry with the latest updates and I'm suffering from some "hangs": after about 10 days the TCP stack stops working, and so do some other services such as log2ram. I'm sure I have a good power supply; maybe the SD card is not the best, but the log is in RAM. Anyway, I have to protect the system and ensure it keeps running. You have several interesting functions here. My question is:

Does it also support the Raspberry's internal Broadcom BCM2835 timer, which can trigger a hardware reset if the system becomes unrecoverable? I've read the manual but can't find any info. If not, could you please include it, as a last "chance"? It would be useful to command an unmount of the volumes (if that can still work) before letting the timer reset the system. Thank you

Comments (4)

  1. Avraham Shukron

    I don't think it supports refreshing the watchdog (WD) on its own, but you can always write a simple script that does that, and then run it every N seconds with the check program option.
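
A minimal sketch of the kind of script suggested above, assuming the watchdog driver exposes the usual Linux device node /dev/watchdog and that a write to it resets the countdown. The names `pet_watchdog` and `main` are hypothetical, and the device path may differ on a given system.

```python
"""Pet a Linux hardware watchdog once, then return."""
import os
import sys

# Assumption: the driver exposes the conventional character device.
DEFAULT_DEVICE = "/dev/watchdog"


def pet_watchdog(device_path=DEFAULT_DEVICE):
    """Open the device, write one byte to reset the countdown, close it."""
    fd = os.open(device_path, os.O_WRONLY)
    try:
        # A write counts as a keepalive. The magic "V" character is
        # deliberately not sent, so the timer keeps running after close.
        os.write(fd, b"\0")
    finally:
        os.close(fd)


def main(argv):
    """Return 0 on success, 1 on failure, so a caller can test the status."""
    device_path = argv[1] if len(argv) > 1 else DEFAULT_DEVICE
    try:
        pet_watchdog(device_path)
    except OSError as exc:
        print(f"watchdog keepalive failed: {exc}", file=sys.stderr)
        return 1
    return 0

# To use this as a standalone script, end the file with:
#     raise SystemExit(main(sys.argv))
```

Saved as an executable script, this could be run from a Monit check program entry, with the exit status deciding whether Monit raises an alert. Note that every run opens and closes the device, so the interval must stay well below the hardware timeout.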

  2. Massimo Sala

    I agree with Avraham.

    In the past I worked on embedded systems.

    Usually when you think about a hardware watchdog… it is the last and only safeguard: you cannot do anything at higher levels (software) to “recover” the corrupted system, only power down / reboot.

    The major problem: if the system is in a faulty state… who assures you that the running processes, such as Monit itself, are still running properly and can interact correctly with the operating system and other processes?

    IMHO, driving a hardware watchdog from software is feasible only if you work out

    • how to foresee the possible error conditions
    • and how to test for them, which is very critical: are you sure these tests will also run on a corrupted system?

    I strongly suggest you look on the Raspberry Pi Stack Exchange.

  3. Linus s.Gates

    I also agree with @Avraham Shukron and applied that solution on my embedded Linux board as well. However, it does have a big drawback: it spams the kernel log every N seconds with the message: watchdog: watchdog0: watchdog did not stop!

    It looks like watchdog devices aren’t meant to be closed, which is what happens every N seconds.

    I suggest introducing a new check in Monit that opens and closes the fd only once.
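
Until such a check exists, one way to open the fd only once is a small long-running helper that holds the device open and writes to it periodically. The following is a sketch under assumptions, not a Monit feature: `keep_alive` and its parameters are hypothetical names, /dev/watchdog is the assumed device node, and the "V" written before a deliberate stop relies on the driver supporting the kernel's magic close convention (it is ignored when the driver runs in nowayout mode).

```python
"""Hold a Linux watchdog device open and pet it at a fixed interval."""
import os
import time

# Assumption: the driver exposes the conventional character device.
DEFAULT_DEVICE = "/dev/watchdog"


def keep_alive(device_path=DEFAULT_DEVICE, interval=5.0, max_pets=None,
               disarm_on_exit=True):
    """Pet the watchdog every `interval` seconds using one file descriptor.

    `max_pets=None` loops until interrupted; a number bounds the loop,
    which is handy for a dry run against an ordinary file.
    Returns the number of keepalive writes performed.
    """
    fd = os.open(device_path, os.O_WRONLY)
    pets = 0
    try:
        try:
            while max_pets is None or pets < max_pets:
                os.write(fd, b"\0")  # a write resets the countdown
                pets += 1
                time.sleep(interval)
        except KeyboardInterrupt:
            pass  # treat Ctrl-C as a deliberate stop
        if disarm_on_exit:
            # Magic close: "V" just before close asks the driver to stop
            # the timer. Skipped on unexpected errors, so a crash of this
            # helper still ends in a hardware reset.
            os.write(fd, b"V")
    finally:
        os.close(fd)
    return pets
```

Calling `keep_alive(interval=5.0)` from a service that Monit supervises as an ordinary process would keep the device open for the lifetime of that service. The interval has to be shorter than the hardware timeout, and if the helper itself hangs the writes stop and the watchdog fires, which is the intended last resort.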
