The test system should treat a nonzero exit code from Cactus as a failure
The test system seems to ignore the fact that Cactus exits with a nonzero exit code. It displays
Cactus exited with error code 1
Please check the logfile...
No files created in test directory
Success: 0 files identical
In the summary at the end, it then treats this as a passing test. In this case there were no test reference files and no output files, because the test (by design) does not produce any data; it just aborts if the test fails.
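The behaviour asked for here can be sketched as a small decision rule. This is an illustration in Python, not the test system's actual Perl code; the function name `classify_test` and its arguments are hypothetical.

```python
def classify_test(exit_code, n_reference_files, n_identical_files):
    """Return "pass" or "fail" for a single test run (hypothetical helper)."""
    # A nonzero exit code is a failure regardless of the file comparison,
    # so "Success: 0 files identical" can no longer hide a crash.
    if exit_code != 0:
        return "fail"
    # Otherwise the test passes only if every reference file was matched.
    return "pass" if n_identical_files == n_reference_files else "fail"
```

With this rule, the run described above (exit code 1, no reference files, no output files) is classified as a failure, while a clean run of the same test still passes.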
Comments (14)
-
Ticket #1689 is already open about the issue of how to determine whether a Cactus run was successful or not. We should probably treat this bug here as dependent on #1689, as the solution to #1689 might implement something that could then be used by the test system to determine whether a run was indeed successful or not. This "something" might not need to be an exit code.
-
reporter - changed status to open
- marked as
Why is it difficult to obtain the exit code of Cactus? Is this because mpirun only returns the exit code of the root process, or because some mpirun implementations are buggy? I think if a nonzero exit code is returned, something has definitely gone wrong, though the converse may not be true.
Setting priority to major because a test will appear to have passed even if there was a fatal error when running Cactus, if that test does not have any output files.
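As a rough illustration of what "obtaining the exit code" involves, here is a minimal Python sketch that runs a command and interprets its return code. The helper names are hypothetical. Reading a value above 128 as "128 + signal number" follows the common shell convention and is only a guess when the code comes from an MPI launcher; whether the launcher propagates the code of a failing rank at all is exactly the open question above.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Give a human-readable guess at what a return code means."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # Python's subprocess reports a child killed by signal N as -N.
        signum = -returncode
    elif returncode > 128:
        # Shell convention: 128 + N usually means "killed by signal N".
        signum = returncode - 128
    else:
        return f"exited with error code {returncode}"
    try:
        name = signal.Signals(signum).name
    except ValueError:
        name = "unknown signal"
    return f"possibly killed by signal {signum} ({name})"


def run_and_describe(cmd):
    """Run cmd (a list of arguments) and return (returncode, description)."""
    returncode = subprocess.run(cmd).returncode
    return returncode, describe_exit(returncode)
```

In both cases the code is nonzero, which is all the test system would need in order to flag the run.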
-
It should be possible, with most of the MPI implementations at least, to get a non-zero exit code from a dying simulation. It might not be the one actually triggering the crash, but it would likely be non-zero, and mpirun is probably unlikely to return non-zero for a succeeding run. Detecting this would be good.
Still, in addition, I don't see the harm in generating a "sentinel" file that is only created after Cactus successfully reaches TERMINATION. We could (and, if this works, should) then test for both.
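Testing for both could look roughly like the following Python sketch. The sentinel file name `TERMINATED_OK` and the function `run_succeeded` are assumptions made for illustration; Cactus does not currently write such a file.

```python
import os


def run_succeeded(exit_code, run_dir, sentinel_name="TERMINATED_OK"):
    """Accept a run only if it exited with 0 AND left the sentinel file.

    sentinel_name is a hypothetical file that would be written only after
    Cactus successfully reaches TERMINATION.
    """
    sentinel = os.path.join(run_dir, sentinel_name)
    return exit_code == 0 and os.path.isfile(sentinel)
```

Requiring both conditions covers the two failure modes mentioned: a launcher that swallows the exit code (no sentinel is written) and a crash after the sentinel was written (nonzero exit code).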
-
reporter - removed comment
I would like Cactus to provide a little more information about its termination than just creating a file when it reaches "Done.". For example, I would like to know the reason for termination. Was it due to TerminationTrigger running out of walltime, in which case the simulation is not complete? Or was it due to reaching cctk_final_time, in which case it is? Alternatively, if termination is from CCTK_Error, it would be good to get the error message in a file which is easy to parse, so that it can be displayed in tables of simulation statuses.
Probably we want to write a termination file, as you suggest, but we would need to define the format for the content. We should brainstorm on what other things we might want in this file. Does Cactus already "know" the reason for a termination, or do we need to extend the flesh for this? I think this is outside the scope of this ticket, so I have created another one (#1720).
I would still like to detect a nonzero exit code, as described in the current ticket. For example, an error which occurred after the termination file was written might cause this.
-
This was independently rediscovered in #2113.
-
I just found a test that likely has been failing for 14 years and was not reported as such. See https://trac.einsteintoolkit.org/ticket/2186
-
This is a bit more complex. I just looked into RunTestUtils.pl which contains these lines:
    $retcode = &RunCactus($output,$test,$cmd);
    chdir $config_data->{"CCTK_DIR"};

    # Deal with the error code
    if($retcode != 0)
    {
      print "Cactus exited with error code $retcode\n";
      print "Please check the logfile $testdata->{\"$thorn $test TESTRUNDIR\"}$sep$test.log\n\n";
      $testdata->{"$thorn FAILED"} .= "$parfile ";
      $testdata->{"NFAILED"}++;
    }
and indeed I do see the output
    Cactus exited with error code 137
    Please check the logfile [...]/TEST/sim/TestArrays/arrays0.log
It turns out that these fields in $testdata are set, but it seems they have always been ignored (including in the commit "cbe7f3d1 - (HEAD) Fixed bug where par files which core dumped would pass (16 years ago)", which I have tried).
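To make the missing step concrete, here is a minimal Python sketch of a summary that actually consults the recorded failures. The dictionary layout (`"passed"` and `"failed"` lists) is a simplified, hypothetical stand-in for the `$testdata` fields in RunTestUtils.pl, not their real structure.

```python
def summarise(testdata):
    """Build a summary in which a recorded failure always wins.

    testdata is a hypothetical dict with "passed" and "failed" lists of
    test names, standing in for the FAILED/NFAILED fields set above.
    """
    failed = set(testdata.get("failed", []))
    # A test counts as passed only if it was not also recorded as failed.
    passed = [t for t in testdata.get("passed", []) if t not in failed]
    return {"passed": passed, "failed": sorted(failed)}
```

For example, if `arrays0` was recorded as failed because of its exit code but the file comparison also listed it as passed, the summary reports it only under `"failed"`.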
-
- changed status to open
-
Unless there are objections, I will commit after 2019-05-14.
-
- changed status to resolved
- edited description
-
- changed status to closed
-
- changed status to open
The pushed change fails to handle the case where a failing test has no output file (e.g. Carpet's 64k2.par test).
-
- changed status to resolved
It is difficult to obtain the exit code of Cactus. Instead, we should reject tests without output, e.g. treating them as failing all the time.