Confusing behavior if timeout is present on front-end but not on compute node
The current logic behind make check (among other targets) attempts to use the timeout utility to detect hung processes. It tries to validate that the utility exists. However, that validation is performed on the host running make, while the eventual launch of a test will run timeout on the compute nodes.
Lacking any special-casing, the failure mode when timeout is missing on (only) the compute nodes will be a message like "FAILED (exitcode=1)", where the exit code may vary. I think this could be improved upon.
IF the command is run by bash (or other shell(s) we can test), then it may be sufficient to scan the output for ": command not found" as we already do for messages regarding fatal signals. However, it is also possible that the command is run directly by the batch system without an intervening shell. I need to look into that.
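The scan proposed above could look roughly like the following. This is only a sketch in Python, not the harness's actual code: the function name missing_commands and the sample output are hypothetical, and the assumption that the command name sits directly before ": command not found" holds for bash-style messages but is not guaranteed for other shells.

```python
import re
from typing import List

# Assumed marker, taken from the discussion above. The text preceding the
# command name varies between shells and shell versions.
_NOT_FOUND = re.compile(r"(\S+): command not found\s*$")


def missing_commands(output: str) -> List[str]:
    """Return the command names reported as missing, in order of appearance."""
    names: List[str] = []
    for line in output.splitlines():
        match = _NOT_FOUND.search(line)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


# Illustrative output only, not captured from a real test run.
sample = "bash: line 1: timeout: command not found\nFAILED (exitcode=127)\n"
print(missing_commands(sample))
```

Reporting the name of the missing command, rather than just echoing the line, would let the failure message point directly at timeout.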
Comments (4)
-
reporter -
I am not feeling as comfortable about special-casing ": No such file or directory" as I was with ": command not found".
I'm not worried about that, I'd say add both.
This is only a filter and only active when the test has already been determined to have failed, so at worst it prints irrelevant lines in an oddball failure mode.
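A minimal sketch of that reasoning, in Python for illustration: the filter is consulted only after a failure, so an overly broad marker can add noise to a failure report but cannot turn a pass into a failure. The names MARKERS and diagnostic_lines and the sample launcher line are hypothetical, and the real harness may decide "failed" by other means.

```python
from typing import List

# The two markers discussed in this issue; the real filter may match more.
MARKERS = (": command not found", ": No such file or directory")


def diagnostic_lines(failed: bool, output: str) -> List[str]:
    """Return output lines worth echoing, but only for a run already known to have failed."""
    if not failed:
        return []  # passing runs are never scanned, so markers cannot cause failures
    return [
        line
        for line in output.splitlines()
        if any(marker in line for marker in MARKERS)
    ]


# Illustrative launcher output only, not a real srun/jsrun/hydra message.
sample = "starting test\nlauncher: timeout: No such file or directory\n"
for line in diagnostic_lines(True, sample):
    print(line)
```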
-
reporter Argh. Chrome "ate my homework". So here is a short version of what I'd typed up (with complete outputs) last night:
At least srun, jsrun and hydra from mpich3 include "No such file or directory" in their error output for this case. OpenMPI has a long message w/o that string, but I don't currently care sufficiently to worry about recognizing it.
-
reporter - changed status to resolved
Resolve issue 584
This commit resolves issue #584 "Confusing behavior if timeout is present on front-end but not on compute node" by adding two new strings to those we grep for in the output of failing test runs.
→ <<cset bffd3fe2e419>>
Sigh. Even setting aside batch systems, treatment of "wrapper" elements in the command by our own ssh-spawner yields an error message other than "command not found" from bash:
I am not feeling as comfortable about special-casing ": No such file or directory" as I was with ": command not found".