ExternalLibraries/libjpeg thorn adds problematic system library path to Cactus

Issue #2528 open
Michael Müller created an issue

I have been trying to work with the new ET_2020_11 release. Compilation appears to succeed, but as soon as a run is submitted, MPI aborts with an error message reporting a failure during MPI initialization (output in "run_name".err):

*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[cn076:4193] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[cn076:4192] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[62070,1],1]
  Exit code:    1
--------------------------------------------------------------------------

This indicates a clash with OpenMPI, which turned out to be connected to the ExternalLibraries/libjpeg thorn. The thorn finds a jpeg library in the directory "/usr/lib/x86_64-linux-gnu" and adds this path via the -L option to the library search path of Cactus. Unfortunately, this is a system library path. As a consequence, the linker picks up the MPI library in this directory instead of the one specified via MPI_DIR, and the resulting mismatch causes the failure.
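The search-order effect described above can be sketched in a few lines of shell. This is only an illustration, not Cactus code: `first_match` is a hypothetical helper that mimics how a linker walks its -L directories from left to right, the directories are temporary stand-ins for "/usr/lib/x86_64-linux-gnu" and `$MPI_DIR/lib`, and it assumes the libjpeg path ends up ahead of the MPI path on the link line.

```shell
# Hypothetical helper (not part of Cactus): print the first directory, in
# search order, that contains lib<name>.so. This mimics the left-to-right
# way a linker resolves -l<name> against its -L directories.
first_match() {
    name=$1
    shift
    for dir in "$@"; do
        if [ -e "$dir/lib$name.so" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Simulated layout in a temporary sandbox (stand-ins, not real installations).
sandbox=$(mktemp -d)
mkdir -p "$sandbox/system" "$sandbox/mpi_dir/lib"
touch "$sandbox/system/libjpeg.so" "$sandbox/system/libmpi.so"  # system dir holds both libraries
touch "$sandbox/mpi_dir/lib/libmpi.so"                          # the MPI that MPI_DIR points to

# With the system directory added first (as libjpeg does), libmpi resolves
# to the system copy rather than the intended one.
picked_with_jpeg_path=$(first_match mpi "$sandbox/system" "$sandbox/mpi_dir/lib")

# Without the system directory on the list, the intended MPI is found.
picked_without_jpeg_path=$(first_match mpi "$sandbox/mpi_dir/lib")

echo "with system path:    $picked_with_jpeg_path"
echo "without system path: $picked_without_jpeg_path"

rm -rf "$sandbox"
```

The point of the sketch is that the libjpeg thorn never asks for MPI at all; merely adding a directory that happens to contain another MPI installation is enough to change which library the linker resolves.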

After excluding this thorn and the connected CactusIO/IOJpeg thorn and recompiling, the error no longer occurs and all libraries are picked up correctly. I do not use these thorns, so this is not a problem for me, but it would be great if someone could have a look at this.

(error encountered by me and source identified by @Erik Schnetter )

Comments (4)

  1. Roland Haas

    This is basically the inverse of https://bitbucket.org/einsteintoolkit/tickets/issues/2428/possible-issue-between-externallibraries, and a partial fix was implemented in git hash b78b0a1f "Cactus: New scripts to strip standard directory from include and library paths" of Cactus 7 years ago.

    A workaround that does not require changing code is to set LIBJPEG_DIR = BUILD, which forces a build from source and thus avoids the system directory.

    A proper fix (for some notion of proper, really more of a hack) would be to ensure that in ExternalLibraries/libjpeg/configure.sh the LIBJPEG_LIB_DIRS and LIBJPEG_INC_DIRS variables are passed through ${CCTK_HOME}/lib/sbin/strip-libdirs.sh and ${CCTK_HOME}/lib/sbin/strip-incdirs.sh respectively, as is done e.g. in arrangements/ExternalLibraries/hwloc/src/detect.sh.

    If you have time and could produce a pull request with those changes, that would be great. If you have more time and could also add the same code to arrangements/ExternalLibraries/HDF5/src/detect.sh and GSL/src/detect.sh, where it is also missing, that would be even better.
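A rough sketch of the filtering step proposed in the comment above, for anyone preparing the pull request. It does not reproduce the actual Cactus scripts: `strip_system_dirs` is a hypothetical stand-in for the strip scripts under ${CCTK_HOME}/lib/sbin, the list of "system" directories is an assumption made for illustration, and "/opt/jpeg" is a made-up install prefix.

```shell
# Hypothetical stand-in for the Cactus strip scripts: drop well-known system
# directories from a list of paths and print whatever remains.
# The set of directories treated as "system" here is an assumption.
strip_system_dirs() {
    kept=""
    for dir in "$@"; do
        case "$dir" in
            /lib|/lib64|/usr/lib|/usr/lib64|/usr/lib/x86_64-linux-gnu|/usr/include)
                ;;  # system directory: the compiler and linker search it anyway
            *)
                kept="${kept:+$kept }$dir"
                ;;
        esac
    done
    printf '%s\n' "$kept"
}

# Example values as a detection step might produce them (made up for this sketch).
LIBJPEG_LIB_DIRS="/usr/lib/x86_64-linux-gnu /opt/jpeg/lib"
LIBJPEG_INC_DIRS="/usr/include /opt/jpeg/include"

# Filter both variables before they are handed on to Cactus. The expansions are
# left unquoted on purpose so that each directory becomes its own argument.
# In the thorn itself, the calls would presumably go to the strip scripts
# under ${CCTK_HOME}/lib/sbin instead of this stand-in.
LIBJPEG_LIB_DIRS=$(strip_system_dirs $LIBJPEG_LIB_DIRS)
LIBJPEG_INC_DIRS=$(strip_system_dirs $LIBJPEG_INC_DIRS)

echo "LIBJPEG_LIB_DIRS=$LIBJPEG_LIB_DIRS"
echo "LIBJPEG_INC_DIRS=$LIBJPEG_INC_DIRS"
```

When the library lives only in a system directory, as in the reported case, the filtered variables come out empty. That is the desired outcome: no -L or -I option is emitted for libjpeg, and the linker still finds it through its default search path, after the directories given explicitly for MPI.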
