Wrong memory usage reported in KVM

Issue #960 resolved
Rudy RudyBzh created an issue

Hi,

I’m having trouble with Monit 5.27.1 (backport) on Debian 10.
I installed this backported version to get the #843 patch (file change), included since 5.27.0.

That commit reads “MemAvailable” from /proc/meminfo, if it is present, to mitigate a bug caused by LXC/KVM not reporting accurate information about MemFree/Cached/Swap/SReclaimable.
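
For reference, a rough sketch of that approach as I understand it (simplified and hypothetical, not the actual Monit source; the real code handles more fields and error cases):

#include <stdio.h>

int main(void) {
    unsigned long long total = 0, available = 0, memfree = 0;
    unsigned long long buffers = 0, cached = 0, sreclaimable = 0;
    int has_available = 0;
    char line[256];

    FILE *f = fopen("/proc/meminfo", "r");
    if (!f) {
        perror("fopen /proc/meminfo");
        return 1;
    }
    while (fgets(line, sizeof(line), f)) {
        if (sscanf(line, "MemTotal: %llu kB", &total) == 1) continue;
        if (sscanf(line, "MemAvailable: %llu kB", &available) == 1) { has_available = 1; continue; }
        if (sscanf(line, "MemFree: %llu kB", &memfree) == 1) continue;
        if (sscanf(line, "Buffers: %llu kB", &buffers) == 1) continue;
        if (sscanf(line, "Cached: %llu kB", &cached) == 1) continue;
        sscanf(line, "SReclaimable: %llu kB", &sreclaimable);
    }
    fclose(f);

    /* Prefer MemAvailable when the kernel exposes it; otherwise fall back
     * to approximating "available" memory from the individual fields.
     * Both values come from the same /proc/meminfo snapshot here. */
    unsigned long long avail = has_available ? available
                                             : memfree + buffers + cached + sreclaimable;
    double usage = total ? 100.0 * (double)(total - avail) / (double)total : 0.0;
    printf("memory usage: %.1f%%\n", usage);
    return 0;
}
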
But unfortunately, I’m still getting lots of emails and log entries complaining that memory usage exceeds its limit (more KO statuses than OK):

[...]
Feb 10 17:14:25 omv monit[723]: '\xxxxxxxxxxxx' mem usage of 251500511232.0% matches resource limit [mem usage > 90.0%]
Feb 10 17:14:55 omv monit[723]: '\xxxxxxxxxxxx' mem usage check succeeded [current mem usage = 0.6%]
Feb 10 17:15:25 omv monit[723]: '\xxxxxxxxxxxx' mem usage of 251500511232.0% matches resource limit [mem usage > 90.0%]
[...]
Feb 10 17:16:55 omv monit[723]: '\xxxxxxxxxxxx' mem usage of 251500511232.0% matches resource limit [mem usage > 90.0%]
Feb 10 17:17:25 omv monit[723]: '\xxxxxxxxxxxx' mem usage check succeeded [current mem usage = 0.6%]
Feb 10 17:17:55 omv monit[723]: '\xxxxxxxxxxxx' mem usage of 251500511232.0% matches resource limit [mem usage > 90.0%]
[...]
Feb 10 17:43:56 omv monit[723]: '\xxxxxxxxxxxx' mem usage of 251500511232.0% matches resource limit [mem usage > 90.0%]
Feb 10 17:44:26 omv monit[723]: '\xxxxxxxxxxxx' mem usage check succeeded [current mem usage = 2.1%]
Feb 10 17:47:26 omv monit[723]: '\xxxxxxxxxxxx' mem usage of 251500511232.0% matches resource limit [mem usage > 90.0%]
[...]
Feb 10 21:52:04 omv monit[723]: '\xxxxxxxxxxxx' mem usage of 251500511232.0% matches resource limit [mem usage > 90.0%]
Feb 10 21:52:34 omv monit[723]: '\xxxxxxxxxxxx' mem usage check succeeded [current mem usage = 0.2%]
Feb 10 21:53:34 omv monit[723]: '\xxxxxxxxxxxx' mem usage of 251500511232.0% matches resource limit [mem usage > 90.0%]
[...]
Feb 10 22:10:05 omv monit[723]: '\xxxxxxxxxxxx' mem usage of 251500511232.0% matches resource limit [mem usage > 90.0%]
Feb 10 22:10:35 omv monit[723]: '\xxxxxxxxxxxx' mem usage check succeeded [current mem usage = 1.5%]
Feb 10 22:28:36 omv monit[723]: '\xxxxxxxxxxxx' mem usage of 251500511232.0% matches resource limit [mem usage > 90.0%]
[...]
Feb 10 22:41:37 omv monit[723]: '\xxxxxxxxxxxx' mem usage of 251500511232.0% matches resource limit [mem usage > 90.0%]
Feb 10 22:42:07 omv monit[723]: '\xxxxxxxxxxxx' mem usage check succeeded [current mem usage = 0.1%]
Feb 10 22:42:37 omv monit[723]: '\xxxxxxxxxxxx' mem usage of 251500511232.0% matches resource limit [mem usage > 90.0%]
[...]
Feb 10 23:03:08 omv monit[723]: '\xxxxxxxxxxxx' mem usage of 251500511232.0% matches resource limit [mem usage > 90.0%]
Feb 10 23:03:38 omv monit[723]: '\xxxxxxxxxxxx' mem usage check succeeded [current mem usage = 0.1%]
Feb 10 23:14:39 omv monit[723]: '\xxxxxxxxxxxx' mem usage of 251500511232.0% matches resource limit [mem usage > 90.0%]
[...]

I ran a watch over a long period to look for a relationship between Monit and /proc/meminfo:

sudo watch   "egrep 'MemTotal\:|MemAvailable\:|MemFree\:|Buffer\:|Cached\:|SReclaimable\:' /proc/meminfo && monit status '\xxxxxxxxxxx' && tail -3 /var/log/syslog"

but MemAvailable is always there, with a sensible value, and I don’t understand why Monit doesn’t seem to use it from time to time. The following screenshots show alternating OK/KO/OK/KO Monit statuses together with the content of /proc/meminfo at the same moment:

Notice in the previous screenshot that the reported “memory usage” value is not correct.

Do you have any idea, please?

Thank you.

Comments (11)

  1. Michael Hadorn

    Same here with v5.28.0.

    Before, with the latest 5.27, it was working. But it had already happened before that, too.

    Using: Ubuntu 20.04.2 LTS

  2. Tildeslash repo owner

    Thank you for the data. It seems that MemTotal is changing frequently on your system. MemTotal is initialized only on Monit start and used as systeminfo.memory.size.

    It seems that MemAvailable sporadically exceeds the MemTotal value that was read on Monit start, which triggers the problem (an unsigned integer overflow, which ends up as a huge value); see the illustration below the comments.

    The same error must have been present in <= 5.27.0 (MemTotal was likewise initialized only on Monit start), but due to the different formula, the overflow probably didn’t occur as often (the calculated memory usage would still be wrong whenever MemTotal is dynamic).

    We’ll fix it.

  3. Tildeslash repo owner

    Fixed: Issue #960: The memory usage could be reported with a wrong value if the system memory size changes after Monit start. The problem was frequent on KVM/LXC containers, where MemTotal is dynamically updated.

    → <<cset ba407bd1d4f3>>
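
A minimal illustration of the overflow described in comment 2, and of the direction of the fix from comment 3 (hypothetical numbers and simplified logic, not the actual Monit source):

#include <stdio.h>

int main(void) {
    unsigned long long total_at_start = 2 * 1024 * 1024;   /* kB, read once at Monit start */
    unsigned long long available_now  = 3 * 1024 * 1024;   /* kB, read on this polling cycle,
                                                               after the hypervisor grew the guest */

    /* available_now > total_at_start, so the unsigned subtraction wraps
     * around to a value near 2^64, yielding an absurd percentage. */
    unsigned long long used = total_at_start - available_now;
    printf("bogus usage: %.1f%%\n", 100.0 * (double)used / (double)total_at_start);

    /* The fix described in comment 3 amounts to using a memory size that is
     * kept up to date, so total and available come from the same snapshot. */
    unsigned long long total_now = 4 * 1024 * 1024;         /* kB, re-read each cycle */
    printf("sane usage: %.1f%%\n",
           100.0 * (double)(total_now - available_now) / (double)total_now);
    return 0;
}
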
