Some tests fail on JFRS-1 Cray XC50 systems with the Cray compiler

Issue #46 resolved

Ryusuke Numata created an issue 2019-01-22

The following tests fail.

- asym_geo_fourier
- asym_geo_global
- job_manage (avail_cpu_time)
- gs2_gryfx_zonal
- nonlinear_terms
- asym_geo_miller
- gs2_diagnostics_new (all 4 cases)
- theta_grid
- le_grids
- gs2_optimization (2nd one)
- gs2_init
- asym_geo_genElong
- wstar_units
- cyclone_itg (3rd one)

I attach the full output messages. If I change the compiler to Intel or GNU, everything works fine.

Comments (16)

Ryusuke Numata reporter
- attached run_test.sub
- 2019-01-22T12:23:25+00:00
David Dickinson
Also related issue ~~#45~~
- 2019-01-22T12:51:38+00:00
Joseph Parker
@rnumata is this resolved by PR #62?
- 2019-02-11T14:27:39+00:00
Ryusuke Numata reporter
No. The test failures are runtime problems. PR #59 and ~~#62~~ fix compile-time errors.

I've looked at some of the tests and found that there certainly are problems that most compilers ignore. For example, in gs2_diagnostics, write_omega is called with istep=-1, causing an out-of-bounds error for omegahist_woutunits. These are probably not problems in the main code, only in the unit test drivers.

I think these should be fixed, but I'm getting tired of checking all of them, as most users (and compilers) do not care...
- 2019-02-12T06:33:54+00:00
David Dickinson
I'll have a look at fixing these as I have access to a Cray compiler -- I find this is often the main challenge of maintaining support for a large range of compilers!
- 2019-02-12T08:42:39+00:00
David Dickinson
- assigned issue to David Dickinson
- 2019-02-12T08:42:55+00:00
David Dickinson
Thanks for the detailed report.
- 2019-02-12T08:43:14+00:00
Ryusuke Numata reporter
I've asked JFRS support to help investigate this issue. Some failures can be avoided by setting the stack size to unlimited and by changing the optimization options.
- 2019-04-25T04:41:43+00:00
Ryusuke Numata reporter
With the help of JFRS support, I’ve figured out all the problems and solutions for JFRS with the Cray compiler.
- Increase the stack size: on JFRS, the stack size is limited to 8192 kB by default, so users must set it to unlimited by hand. (This is not a GS2 problem.)
```
ulimit -s unlimited
```
- Reduce the optimization level: the Cray compilers optimize aggressively, which may cause runtime errors. PR #25 of Makefiles reduces the default optimization level.
- Cray compiler or MPICH bug: the cyclone_itg test fails because of a bug. I will create a PR to sidestep this bug for the moment. JFRS support has reported the problem to Cray, so it should be fixed. See utils' Issue #9 and PR #22.
There’s another Cray compiler bug, which prevents the next branch from compiling. With the Cray compiler, when the -J option is not given, the module files (.mod) are placed in the object file location, and compilation then fails because the module files cannot be found. This behavior is inconsistent with the online manual. I will create another PR to sidestep this problem. (See Makefiles' PR #27.)
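
The stack-size requirement above can be checked before launching the tests. The following is a minimal Python sketch, not part of GS2: it reads the soft stack limit with the standard `resource` module (Unix only), and the helper names `stack_is_unlimited` and `describe_stack_limit` are hypothetical.

```python
import resource


def stack_is_unlimited(soft_limit):
    """Return True if the soft stack limit means 'unlimited'."""
    return soft_limit == resource.RLIM_INFINITY


def describe_stack_limit(soft_limit):
    """Format a soft stack limit given in bytes as kB, like `ulimit -s`."""
    if stack_is_unlimited(soft_limit):
        return "unlimited"
    return f"{soft_limit // 1024} kB"


if __name__ == "__main__":
    # getrlimit returns (soft, hard) limits in bytes.
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    print("soft stack limit:", describe_stack_limit(soft))
    if not stack_is_unlimited(soft):
        print("run `ulimit -s unlimited` before starting the tests")
```

With the JFRS default described above, the script would report a limit of 8192 kB and print the reminder.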
- 2019-05-07T07:10:33+00:00
David Dickinson
That’s brilliant, thanks for digging into all of these issues and finding solutions for them. We’ll try to make sure these fixes all get into 8.0.2.
- 2019-05-07T07:48:09+00:00
David Dickinson
Is this resolved now?
- 2019-06-11T18:41:42+00:00
Ryusuke Numata reporter
Yes. All the problems have been resolved now.
- 2019-06-12T00:53:09+00:00
David Dickinson
- changed status to resolved
Fixed in release 8.0.2
- 2019-06-12T07:41:46+00:00
Ryusuke Numata reporter
- changed status to open
It turns out that one of the problems remains unresolved.

The gs2_diagnostics_new test fails because an out-of-bounds error occurs in write_omega. This happens when run_diagnostics is called with istep=-1. For unknown reasons, this out-of-bounds access is caught only by Cray; GNU and Intel do not catch it.

This problem looks harmless, but it is clearly a bug, so it should be resolved.
- 2020-12-02T01:24:38+00:00
David Dickinson
So it looks like, during initialisation for builds with new diagnostics, we call run_diagnostics twice: first with istep=-1 and then with istep=0. Internally, new diagnostics seems to use istep == -1 to indicate that variables should be created but not read or written (see gnostics%create). The first call therefore ensures the variables are created, and the second call is meant to populate them with the initial values. I think this ideally could do with a lot of redesign.

I think the simplest fix is to use the istep=-1 case to set the gnostics%create and gnostics%write flags as intended, but then to replace istep with 0.
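
The idea of that fix can be sketched as follows. This is a Python illustration of the control flow only, not the Fortran on the bugfix branch: the `Gnostics` class, the `normalise_istep` function, the toy `write_omega`, and the flag values chosen for ordinary (non-negative) steps are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Gnostics:
    """Hypothetical stand-in for the flags held in the gnostics object."""
    create: bool = False
    write: bool = False


def normalise_istep(gnostics, istep):
    """Set the create/write flags from istep and return a non-negative step."""
    if istep == -1:
        # Creation pass: define the variables, but do not write them.
        gnostics.create = True
        gnostics.write = False
        return 0  # replace the sentinel so it is never used as an array index
    # Assumed behaviour for ordinary steps: variables exist, so only write.
    gnostics.create = False
    gnostics.write = True
    return istep


def write_omega(omega_history, istep):
    """Toy stand-in for write_omega with an explicit bounds check.

    The check is explicit because a negative index in Python would wrap
    around silently instead of being reported as out of bounds.
    """
    if not 0 <= istep < len(omega_history):
        raise IndexError(f"istep={istep} is outside the history array")
    return omega_history[istep]
```

In this sketch, calling `write_omega(history, normalise_istep(g, -1))` reads entry 0 instead of raising, while passing -1 directly raises `IndexError`, which mirrors the out-of-bounds error reported by the Cray compiler.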

I’ve pushed a quick attempt at a fix to the branch bugfix/fix_istep_minus_one_issue_46
- 2020-12-02T08:56:37+00:00
David Dickinson
- changed status to resolved
Further issue fixed in 8.0.6
- 2021-08-19T10:35:58+00:00

Assignee: David Dickinson

Type: bug

Priority: major

Status: resolved

Component: –

Milestone: –

Version: –