Tests Fail with USE_PARALLEL_NETCDF
(1) Build with parallel_netcdf: $ make USE_PARALLEL_NETCDF=on
(2) Run tests: $ make tests USE_PARALLEL_NETCDF=on
(3) Unit tests then fail:
FAILED: gs2_diagnostics_new (mpirun -np 2 ./test_gs2_diagnostics_new test_gs2_diagnostics_new_append.in)
$ more tests/unit_tests/gs2_diagnostics_new/test_gs2_diagnostics_new_append.error
ERROR: No such file or directory in file: ./test_gs2_diagnostics_new_start.nc
ERROR: NetCDF: Not a valid ID in variable: vnm1
ERROR: NetCDF: Not a valid ID in variable: vnm2
(4) This is a knock-on error: in the preceding test (test_gs2_diagnostics_new_start), parallel_netcdf failed to write the restart file that the …_append test needs. (Because we are specifying USE_PARALLEL_NETCDF, this should be a single restart file.)
$ more tests/unit_tests/gs2_diagnostics_new/test_gs2_diagnostics_new_start.error
nf90_create error: NetCDF: Parallel operation on file opened for non-parallel access
I see this on Fedora 30. Does it affect other OS distributions? Does anyone else have the same issue?
Is USE_PARALLEL_NETCDF broken?
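Since the …_append failure is only a symptom, it can help to search every .error file for the underlying message first. The following is a minimal sketch, not part of GS2: the function name find_parallel_open_failures is hypothetical, and the default directory tests/unit_tests is assumed from the paths quoted above.

```shell
#!/bin/sh
# Sketch: list the *.error files that contain the root-cause message
# quoted in step (4), so knock-on failures can be told apart from it.

# Message copied from test_gs2_diagnostics_new_start.error above.
root_cause_msg='Parallel operation on file opened for non-parallel access'

# find_parallel_open_failures [DIR]
# Hypothetical helper. DIR defaults to tests/unit_tests (an assumption
# based on the paths in this report). Prints one matching file per line.
find_parallel_open_failures() {
    test_dir=${1:-tests/unit_tests}
    # grep -l prints only file names; -F treats the message as a fixed string.
    # "|| true" keeps the exit status zero when nothing matches.
    find "$test_dir" -type f -name '*.error' \
        -exec grep -lF -- "$root_cause_msg" {} + || true
}
```

Run from the top of the source tree as find_parallel_open_failures; any file it lists is a test where the parallel create itself failed, rather than a test that merely could not find the restart file.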
Comments (7)
-
-
I should note that the example at https://github.com/Unidata/netcdf-fortran/blob/master/examples/F90/simple_xy_par_wr.f90#L67 shows creating a parallel netcdf file but using nf90_create rather than nf90_create_par. I note also that they ior NF90_NETCDF4 with NF90_MPIIO rather than NF90_HDF5 with NF90_MPIIO; I don't know if this is important.
Did you delete all the existing restart files in the directory before running the test?
-
Which version/package did you install on Fedora 30?
-
@Colin Malcolm Roach I just tried to reproduce this on a Fedora 27 machine using the netcdf-fortran-openmpi package on master of GS2 (with GK_SYSTEM=fedora), but I'm afraid I couldn't reproduce the issue. The test correctly created a single restart file.
Could you report the output of ldd bin/gs2?
-
Offline conversation suggests this could be due to path order in LD_LIBRARY_PATH or equivalent, resulting in the executable picking up the serial netcdf library instead of the parallel one.
-
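The two checks discussed in this thread (inspecting ldd bin/gs2 and the order of LD_LIBRARY_PATH) can be sketched as below. This is an illustration under stated assumptions, not GS2 tooling: the function names filter_io_libs and first_netcdf_dir are hypothetical, and the path walk is a simplification, since the real dynamic loader also consults RPATH/RUNPATH and the ld.so cache.

```shell
#!/bin/sh
# Sketch: which NetCDF library is the executable likely to pick up?

# filter_io_libs: read `ldd` output on stdin and keep only the NetCDF,
# HDF5 and MPI lines. Hypothetical helper name.
filter_io_libs() {
    grep -E 'libnetcdf|libhdf5|libmpi' || true
}

# first_netcdf_dir [SEARCH_PATH]
# Print the first directory in a colon-separated search path (default:
# $LD_LIBRARY_PATH) that holds a libnetcdf shared object.
# Simplification: RPATH/RUNPATH and ld.so.cache are ignored here.
# Returns 1 if no directory in the path contains one.
first_netcdf_dir() {
    search_path=${1-${LD_LIBRARY_PATH-}}
    old_ifs=$IFS
    IFS=:
    for dir in $search_path; do      # split on ':' once, at loop entry
        IFS=$old_ifs
        [ -n "$dir" ] || continue    # skip empty path entries
        for lib in "$dir"/libnetcdf.so*; do
            # An unmatched glob stays literal, so test for existence.
            if [ -e "$lib" ]; then
                echo "$dir"
                return 0
            fi
        done
    done
    IFS=$old_ifs
    return 1
}
```

For example, ldd bin/gs2 | filter_io_libs shows which libraries the executable actually resolves, and first_netcdf_dir shows which LD_LIBRARY_PATH entry would supply libnetcdf first. If that directory belongs to a serial build, reordering the path so the parallel build comes first would be the thing to try.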
- changed status to resolved
Update Makefiles commit to fix issue #80
→ <<cset 2b1edb180bb5>>
-
Merged in bugfix/fix_issue_80_parallel_netcdf_for_fedora (pull request #190)
Update Makefiles commit to fix issue #80
Approved-by: Joseph Parker joseph.parker@stfc.ac.uk
Approved-by: Peter Hill peter.hill@york.ac.uk
→ <<cset 4a4e8fe6397f>>
-
Are you running this on a machine with a parallel file system?