- attached run_test.sub
some tests fail on JFRS-1 Cray XC50 systems with Cray compiler
The following tests fail:
- asym_geo_fourier
- asym_geo_global
- job_manage (avail_cpu_time)
- gs2_gryfx_zonal
- nonlinear_terms
- asym_geo_miller
- gs2_diagnostics_new (all 4 cases)
- theta_grid
- le_grids
- gs2_optimization (2nd one)
- gs2_init
- asym_geo_genElong
- wstar_units
- cyclone_itg (3rd one)
I attach full output messages. If I change the compiler to Intel or GNU, everything works fine.
Comments (16)
-
reporter Also related issue #45
-
@rnumata is this resolved by PR #62?
-
reporter No. The test failures are runtime problems. PRs #59 and #62 fix compile-time errors.
I've looked at some of the tests and found that there certainly are problems which most compilers ignore. For example, in gs2_diagnostics, write_omega is called with istep=-1, causing an out-of-bounds error for omegahist_woutunits. These are probably not problems in the main code, just in the unit test drivers.
I think these should be fixed, but I'm getting tired of checking all of them as most users (and compilers) do not care...
-
I'll have a look at fixing these as I have access to a cray compiler -- I find this is often the main challenge of maintaining support for a large range of compilers!
-
Thanks for the detailed report.
-
reporter I've asked JFRS support to help investigate this issue. Some failures can be avoided by setting the stack size to unlimited and by changing the optimization options.
-
reporter With help from JFRS support, I’ve figured out all the problems and solutions for JFRS with the Cray compiler.
- increase stacksize: On JFRS, the stack size is limited to 8192 kB by default, so users must set it to unlimited by hand. (This is not a GS2 problem.)
# ulimit -s unlimited
- reduce optimization level: The Cray compiler optimizes aggressively, which may cause runtime errors. PR #25 of Makefiles reduces the default optimization level.
- Cray compiler or MPICH bug: Due to a bug, the cyclone_itg test fails. I will create a PR to sidestep this bug for the moment. JFRS support has reported the problem to Cray, so it will be fixed. See utils' Issue #9 and PR #22.
There’s another Cray compiler bug, which prevents the next branch from being compiled. With the Cray compiler, the module files (.mod) are placed in the object file location when the -J option is not given, and compilation then fails because the module files cannot be found. This behavior is inconsistent with the online manual. I will create another PR to sidestep this problem. (See Makefiles' PR #27)
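For the stack size point above, a quick check before running the tests might look like the sketch below. It only inspects the output of `ulimit -s`. The helper name `stack_ok` and the 1048576 kB threshold are hypothetical, chosen for illustration; they are not a GS2 or JFRS requirement.

```shell
#!/bin/bash
# Sketch: check whether the soft stack limit looks large enough.
# stack_ok LIMIT MIN_KB
#   LIMIT  : output of `ulimit -s` (a size in kB, or "unlimited")
#   MIN_KB : hypothetical minimum in kB (an assumption, not a GS2 value)
stack_ok() {
    local limit=$1
    local min_kb=$2
    # "unlimited" always passes
    [ "$limit" = "unlimited" ] && return 0
    # otherwise compare numerically against the minimum
    [ "$limit" -ge "$min_kb" ]
}

current=$(ulimit -s)
if stack_ok "$current" 1048576; then
    echo "stack limit OK: $current"
else
    echo "stack limit is $current kB; try: ulimit -s unlimited"
fi
```

On a system with the 8192 kB default this prints the hint to run `ulimit -s unlimited` in the same shell (or job script) that launches the tests.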
-
That’s brilliant, thanks for digging into all of these issues and finding solutions for them. We’ll try to make sure these fixes all get into 8.0.2.
-
Is this resolved now?
-
reporter Yes. All the problems have been resolved now.
-
- changed status to resolved
Fixed in release 8.0.2
-
reporter - changed status to open
It turns out that one of the problems remains unresolved.
The gs2_diagnostics_new test fails because an out-of-bounds error occurs in write_omega. This occurs when calling run_diagnostics with istep=-1. For unknown reasons, this out-of-bounds access is caught only by the Cray compiler. GNU and Intel do not catch it.
This problem looks harmless, but is clearly a bug, so should be resolved.
-
So it looks like during initialisation for builds with new diagnostics we call run_diagnostics twice, first with istep=-1 and then with istep = 0. Internally it seems new diagnostics uses istep == -1 to indicate that variables should be created but not read or written (see gnostics%create) so the first call is to ensure the variables are created and the second call is meant to populate these variables with the initial values. I think this ideally could do with a lot of redesign.
I think the simplest fix is to use the istep=-1 case to set the gnostics%create and gnostic%write flags as intended but to then replace istep with 0.
I’ve pushed a quick attempt at a fix to the branch bugfix/fix_istep_minus_one_issue_46
-
- changed status to resolved
Further issue fixed in 8.0.6