Wrong memory usage reported in KVM
Hi,
I’m having trouble with monit 5.27.1 (backport) on Debian 10.
I installed this backported version to get the #843 patch (file change), included since 5.27.0.
That commit makes Monit look at “MemAvailable” in /proc/meminfo, if it is present, to mitigate a bug caused by LXC/KVM not reporting accurate information about MemFree/Cached/Swap/SReclaimable.
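As I understand it (this is just my own minimal sketch of the idea, not Monit’s actual code), the MemAvailable-based calculation amounts to something like:

```c
#include <stdio.h>
#include <string.h>

int main(void) {
    unsigned long long mem_total = 0, mem_available = 0;
    char line[256];
    FILE *f = fopen("/proc/meminfo", "r");
    if (f == NULL)
        return 1;
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "MemTotal:", 9) == 0)
            sscanf(line + 9, "%llu", &mem_total);        /* kB */
        else if (strncmp(line, "MemAvailable:", 13) == 0)
            sscanf(line + 13, "%llu", &mem_available);   /* kB */
    }
    fclose(f);

    if (mem_total > 0 && mem_available > 0) {
        /* used = total - available, expressed as a percentage of total */
        double usage = 100.0 * (double)(mem_total - mem_available) / (double)mem_total;
        printf("mem usage = %.1f%%\n", usage);
    }
    return 0;
}
```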
Unfortunately, I’m still getting lots of mails and log entries complaining about high memory usage (more KO statuses than OK):
[...]
Feb 10 17:14:25 omv monit[723]: '\xxxxxxxxxxxx' mem usage of 251500511232.0% matches resource limit [mem usage > 90.0%]
Feb 10 17:14:55 omv monit[723]: '\xxxxxxxxxxxx' mem usage check succeeded [current mem usage = 0.6%]
Feb 10 17:15:25 omv monit[723]: '\xxxxxxxxxxxx' mem usage of 251500511232.0% matches resource limit [mem usage > 90.0%]
[...]
Feb 10 17:16:55 omv monit[723]: '\xxxxxxxxxxxx' mem usage of 251500511232.0% matches resource limit [mem usage > 90.0%]
Feb 10 17:17:25 omv monit[723]: '\xxxxxxxxxxxx' mem usage check succeeded [current mem usage = 0.6%]
Feb 10 17:17:55 omv monit[723]: '\xxxxxxxxxxxx' mem usage of 251500511232.0% matches resource limit [mem usage > 90.0%]
[...]
Feb 10 17:43:56 omv monit[723]: '\xxxxxxxxxxxx' mem usage of 251500511232.0% matches resource limit [mem usage > 90.0%]
Feb 10 17:44:26 omv monit[723]: '\xxxxxxxxxxxx' mem usage check succeeded [current mem usage = 2.1%]
Feb 10 17:47:26 omv monit[723]: '\xxxxxxxxxxxx' mem usage of 251500511232.0% matches resource limit [mem usage > 90.0%]
[...]
Feb 10 21:52:04 omv monit[723]: '\xxxxxxxxxxxx' mem usage of 251500511232.0% matches resource limit [mem usage > 90.0%]
Feb 10 21:52:34 omv monit[723]: '\xxxxxxxxxxxx' mem usage check succeeded [current mem usage = 0.2%]
Feb 10 21:53:34 omv monit[723]: '\xxxxxxxxxxxx' mem usage of 251500511232.0% matches resource limit [mem usage > 90.0%]
[...]
Feb 10 22:10:05 omv monit[723]: '\xxxxxxxxxxxx' mem usage of 251500511232.0% matches resource limit [mem usage > 90.0%]
Feb 10 22:10:35 omv monit[723]: '\xxxxxxxxxxxx' mem usage check succeeded [current mem usage = 1.5%]
Feb 10 22:28:36 omv monit[723]: '\xxxxxxxxxxxx' mem usage of 251500511232.0% matches resource limit [mem usage > 90.0%]
[...]
Feb 10 22:41:37 omv monit[723]: '\xxxxxxxxxxxx' mem usage of 251500511232.0% matches resource limit [mem usage > 90.0%]
Feb 10 22:42:07 omv monit[723]: '\xxxxxxxxxxxx' mem usage check succeeded [current mem usage = 0.1%]
Feb 10 22:42:37 omv monit[723]: '\xxxxxxxxxxxx' mem usage of 251500511232.0% matches resource limit [mem usage > 90.0%]
[...]
Feb 10 23:03:08 omv monit[723]: '\xxxxxxxxxxxx' mem usage of 251500511232.0% matches resource limit [mem usage > 90.0%]
Feb 10 23:03:38 omv monit[723]: '\xxxxxxxxxxxx' mem usage check succeeded [current mem usage = 0.1%]
Feb 10 23:14:39 omv monit[723]: '\xxxxxxxxxxxx' mem usage of 251500511232.0% matches resource limit [mem usage > 90.0%]
[...]
I ran a watch over a long period to find a relation between monit and /proc/meminfo:
sudo watch "egrep 'MemTotal\:|MemAvailable\:|MemFree\:|Buffer\:|Cached\:|SReclaimable\:' /proc/meminfo && monit status '\xxxxxxxxxxx' && tail -3 /var/log/syslog"
but MemAvailable is always there, with a sane value, and I don’t understand why Monit doesn’t seem to use it from time to time. The following screenshots show monit status alternating OK/KO/OK/KO alongside the content of /proc/meminfo at the same moment:
Notice in the previous screenshot that the value of “memory usage” is not correct…
Do you have any idea, please?
Thank you.
Comments (11)
-
Same here with v5.28.0.
With the latest 5.27 release it was still working, but it had already happened before that as well.
Using: Ubuntu 20.04.2 LTS
-
repo owner Thank you for the data. It seems that MemTotal is frequently changing on your system. MemTotal is read only on Monit start and used as systeminfo.memory.size. It seems that MemAvailable sporadically exceeds the MemTotal value that was read on Monit start, which triggers the problem (an unsigned integer overflow, which ends up as a huge value).
The same error must have been present in <= 5.27.0 (MemTotal was initialised on Monit start only there too), but due to the different formula, the overflow probably didn’t occur that often (the calculated memory usage would still have been wrong whenever MemTotal changes).
We’ll fix it.
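For illustration, the effect looks roughly like this (the values and the byte unit are made up, and this is not the actual Monit code):

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    /* MemTotal as read once at Monit start (kept in systeminfo.memory.size);
       ~7 GiB here, in bytes -- the unit is my assumption */
    uint64_t mem_total_at_start = 7516192768ULL;
    /* MemAvailable read later, after the hypervisor grew the guest,
       so it now exceeds the stale MemTotal */
    uint64_t mem_available_now  = 7550000000ULL;

    /* used = total - available underflows and wraps around to ~2^64 */
    uint64_t used = mem_total_at_start - mem_available_now;
    double usage = 100.0 * (double)used / (double)mem_total_at_start;

    /* prints a value in the 10^11 % range, the same order of magnitude
       as the numbers in the log above */
    printf("mem usage = %.1f%%\n", usage);
    return 0;
}
```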
-
repo owner - changed status to resolved
Fixed: Issue #960: The memory usage may report a wrong value if the system memory size changes after Monit start. The problem was frequent on KVM/LXC containers where MemTotal is dynamically updated. → <<cset ba407bd1d4f3>>
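A sketch of the idea behind such a fix (the real change is in the referenced cset, which I have not reproduced here; this illustration simply re-reads MemTotal on every cycle and clamps MemAvailable so the subtraction can never wrap):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one numeric field (in kB) from /proc/meminfo,
   returning 0 if the field is missing or the file cannot be read. */
static unsigned long long meminfo_field(const char *name) {
    unsigned long long value = 0;
    char line[256];
    FILE *f = fopen("/proc/meminfo", "r");
    if (f == NULL)
        return 0;
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, name, strlen(name)) == 0) {
            sscanf(line + strlen(name), "%llu", &value);
            break;
        }
    }
    fclose(f);
    return value;
}

int main(void) {
    /* Re-read MemTotal on every poll cycle instead of caching it at startup,
       so a KVM/LXC guest whose memory size changes stays consistent */
    unsigned long long total = meminfo_field("MemTotal:");
    unsigned long long available = meminfo_field("MemAvailable:");

    if (total == 0)
        return 1;
    if (available > total)   /* guard against a transient mismatch */
        available = total;

    printf("mem usage = %.1f%%\n",
           100.0 * (double)(total - available) / (double)total);
    return 0;
}
```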
-
repo owner Fixed: Issue #960 → <<cset fc590519f86f>>
-
Thanks! You’re awesome!
-
For my understanding: so version 5.28.1 will contain this fix?
-
repo owner yes, 5.28.1 will contain the fix