ExternalLibraries/libjpeg thorn adds problematic system library path to Cactus
I have been trying to work with the new ET_2020_11 release. Compilation appears to succeed, but as soon as a run is submitted, MPI aborts with an error message connected to a failure of the MPI_INIT routine (output in "run_name".err):
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[cn076:4193] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[cn076:4192] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
  Process name: [[62070,1],1]
  Exit code:    1
--------------------------------------------------------------------------
This points to a clash with OpenMPI, which turned out to be connected to the ExternalLibraries/libjpeg thorn. The thorn finds a jpeg library in the directory "/usr/lib/x86_64-linux-gnu" and adds this path to the library search path of Cactus via the -L option. Unfortunately, this is a system library path. As a consequence, the linker picks up the MPI library in this directory instead of the one specified via MPI_DIR, and the resulting inconsistency causes the failure.
After excluding this thorn and the connected CactusIO/IOJpeg thorn and recompiling, the error no longer occurs and all libraries are picked up correctly. I do not use these thorns, so this is not a problem for me, but it would be great if someone could have a look at this.
(error encountered by me and source identified by @Erik Schnetter )
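One way to check a link line for this kind of problem is to list the -L directories that point at system locations. The following is only a minimal sketch: the function name find_system_libdirs and the list of directories treated as "system" are assumptions for illustration, not part of Cactus.

```sh
#!/bin/sh
# Hypothetical helper (not part of Cactus): print every -L directory on a
# link line that is a system library directory. The directory list below
# is an assumption and would need adjusting for a given machine.
find_system_libdirs() {
    for arg in "$@"; do
        case "$arg" in
            -L/lib|-L/lib64|-L/usr/lib|-L/usr/lib64|-L/usr/lib/x86_64-linux-gnu|-L/usr/local/lib)
                # Strip the leading -L and report the directory.
                printf '%s\n' "${arg#-L}"
                ;;
        esac
    done
}

# Example link flags; /opt/openmpi/lib is a made-up MPI_DIR location.
find_system_libdirs -L/opt/openmpi/lib -L/usr/lib/x86_64-linux-gnu -ljpeg -lmpi
# prints /usr/lib/x86_64-linux-gnu
```

Any directory reported this way is searched explicitly by the linker, so libraries in it (such as a system MPI) can be picked up in place of the intended ones.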
Comments (4)
- changed status to open
reporter I will look into this. Thanks for the explanation!
hackathon (Shell scripting)
This is basically the inverse of https://bitbucket.org/einsteintoolkit/tickets/issues/2428/possible-issue-between-externallibraries, and a partial fix was implemented in git hash b78b0a1f "Cactus: New scripts to strip standard directory from include and library paths" of Cactus 7 years ago.
A workaround without changing code is to set
LIBJPEG_DIR = BUILD
forcing a build from source and thus avoiding the system directory.
A proper (for some notion of proper, really more a hack) fix would be to ensure that in ExternalLibraries/libjpeg/configure.sh the LIBJPEG_LIB_DIRS and LIBJPEG_INC_DIRS variables are passed through ${CCTK_HOME}/lib/sbin/strip-libdirs.s and ${CCTK_HOME}/lib/sbin/strip-incdirs.s respectively, as is done e.g. in arrangements/ExternalLibraries/hwloc/src/detect.sh.
If you have time and could produce a pull request with those changes, that would be great. If you have more time and would also add the same code to arrangements/ExternalLibraries/HDF5/src/detect.sh and GSL/src/detect.sh, where it is also missing, that would be even better.
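The idea behind the proposed fix can be sketched as follows. This is not the code of the Cactus strip scripts and does not show how hwloc's detect.sh invokes them; strip_system_dirs is a hypothetical stand-in, and both the list of "standard" directories and the /opt/jpeg/lib path are assumptions made for illustration.

```sh
#!/bin/sh
# Hypothetical stand-in for the stripping step (NOT the actual Cactus
# script): drop standard system directories from a space-separated list.
# The set of directories treated as standard is an assumption.
strip_system_dirs() {
    result=
    for dir in "$@"; do
        case "$dir" in
            /lib|/lib64|/usr/lib|/usr/lib64|/usr/lib/x86_64-linux-gnu|/usr/local/lib|/usr/include|/usr/local/include)
                # System directory: the compiler and linker search it
                # anyway, so it is left out of the explicit list.
                ;;
            *)
                result="${result:+$result }$dir"
                ;;
        esac
    done
    printf '%s\n' "$result"
}

# Example: a detected library path list containing a system directory.
LIBJPEG_LIB_DIRS="/usr/lib/x86_64-linux-gnu /opt/jpeg/lib"
# Intentionally unquoted so the list is split into separate directories.
LIBJPEG_LIB_DIRS=$(strip_system_dirs $LIBJPEG_LIB_DIRS)
echo "$LIBJPEG_LIB_DIRS"   # prints /opt/jpeg/lib
```

With the system directory removed, no -L option for it ends up on the link line, so the MPI library from MPI_DIR is no longer shadowed, while the linker can still find the jpeg library through its default search path.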