- changed status to invalid
"TOTAL CPU" usage calculation incorrect
This is related to the fix for Issue #230, commit 215da7aa86fd
Your calculation needs to sum up all the threads of the parent AND child processes, not just the threads of the parent process.
Say you have one parent process and 10 child processes (all single-threaded) on a 20-core machine. Even though you could be using 11 cores, your current calculation causes monit to report 100% utilization if each process uses 1/11th of a core.
Comments (14)
-
repo owner -
reporter I realize this, BUT the "total cpu" check and the "total cpu" shown on the main page are wrong and exhibit the behavior I describe. "total cpu" does NOT account for the child processes and threads, when it should.
-
reporter - changed status to open
-
reporter Example: A two-threaded process with 9 two-threaded child processes running on a 20-core machine will show 100% total cpu utilization and trigger a 100% total cpu check even if each process is using only 20% of the CPU. But total cpu should really show only 20% utilization (since you have 10 total processes, 20 total threads, on a 20-core machine, each running at 20% utilization).
-
repo owner - changed title to "TOTAL CPU" usage calculation incorrect
-
reporter I believe the root cause of this bug is in how you calculate cpu_total. https://bitbucket.org/tildeslash/monit/src/215da7aa86fdd33040980deaa1aff73189dd6e00/src/process.c?fileviewer=file-view-default
Line 108.
Here, you add up the percentage usages of all child processes into the parent. However, you don't adjust/compute the parent's divisor to account for number of child processes or threads.
So what happens is that each child can be operating at, say, 20% utilization, but if you have 5 or more (even if you have, say, 100 processors), it will always come up with 100%.
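The saturation described above can be sketched in a few lines. This is not monit's code: both function names are hypothetical, and the second function assumes one possible reading of the requested fix, namely that per-thread usage is measured as a fraction of one core and the total is expressed as a share of all cores.

```python
def summed_and_capped(own_percent, child_percents):
    # Behaviour described in the comment above: child percentages are added
    # to the parent's figure, and the result cannot exceed 100.
    return min(100.0, own_percent + sum(child_percents))


def share_of_machine(per_thread_core_fraction, num_threads, num_cores):
    # Hypothetical alternative (an assumption, not monit's formula):
    # express the whole tree's usage as a share of all cores.
    busy_cores = per_thread_core_fraction * num_threads
    return 100.0 * busy_cores / num_cores


# A parent plus four children at 20% each already saturates the capped sum,
# no matter how many processors the machine has.
print(summed_and_capped(20.0, [20.0] * 4))

# 20 threads, each at 20% of a core, on a 20-core machine.
print(share_of_machine(0.20, 20, 20))
```

With the capped sum, adding further children changes nothing once the figure reaches 100, which is why the number of processors never enters the result.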
-
reporter FWIW, if you wanted to solve the original issue of detecting pegged child processes as well, you'd want to introduce a new check, like:
"if per cpu..." that triggers if any one of the child processes or the parent process goes above the given threshold.
"if cpu..." would trigger if the parent process goes above threshold
"if total cpu..." would trigger if the aggregate goes above threshold (based on a corrected calculation)
"if per cpu..." would trigger if any of the processes went above the threshold
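A minimal sketch of how the three proposed checks would differ, written as plain predicates. Note that "if per cpu" is only the reporter's suggestion in this thread, not an existing monit test. The function names and sample figures are hypothetical, and the aggregate value is passed in as given, since the corrected calculation is not specified here.

```python
def cpu_check(parent_percent, threshold):
    # "if cpu ...": looks at the parent process only.
    return parent_percent > threshold


def total_cpu_check(aggregate_percent, threshold):
    # "if total cpu ...": looks at the aggregate for parent plus children,
    # however that aggregate ends up being calculated.
    return aggregate_percent > threshold


def per_cpu_check(parent_percent, child_percents, threshold):
    # Proposed "if per cpu ...": fires if any single process in the tree
    # is above the threshold.
    return any(p > threshold for p in [parent_percent, *child_percents])


# Hypothetical sample: one child is pegged while the tree as a whole is quiet.
parent = 5.0
children = [95.0, 3.0, 2.0]
aggregate = 10.0  # assumed value, for illustration only

print(cpu_check(parent, 90.0))                # False: parent is idle
print(total_cpu_check(aggregate, 90.0))       # False: aggregate is low
print(per_cpu_check(parent, children, 90.0))  # True: one child is pegged
```

Only the per-process variant catches the pegged child in this sample, which is the case the extra check is meant to cover.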
-
repo owner - changed status to resolved
Fixed: Issue #657: Fix the "total cpu usage" test for processes with children. → <<cset 4f861369b197>>
-
repo owner Hello, the problem is fixed.
If you want, you can test the development version:
    wget https://bitbucket.org/tildeslash/monit/get/master.tar.gz
    tar -xzf master.tar.gz
    cd tildeslash*
    ./bootstrap
    ./configure
    make
-
repo owner Issue #526 was marked as a duplicate of this issue.
-
reporter Hi, 5.24.0 appears to have CPU % calculation broken (for total CPU). It is showing 0% CPU utilization in the cases above (child processes with activity).
-
reporter - changed status to open
-
repo owner - changed status to resolved
Fixed: Issue #657: Fix the "total cpu usage" test for processes with children. → <<cset 9ab6a3a00505>>
-
repo owner I'm sorry, the problem is fixed now, you can test if you want:
    wget https://bitbucket.org/tildeslash/monit/get/master.tar.gz
    tar -xzf master.tar.gz
    cd tildeslash*
    ./bootstrap
    ./configure
    make
-
The current calculation is correct.
There are two CPU usage tests:
1.) "if cpu ..." - this test calculates only the process itself
2.) "if total cpu ..." - this test calculates the process AND all its children