Without MPI installed, Cactus doesn't build
I attempted to test our automatic building of Cactus on a system without MPI installed. In principle, thorn MPI should build and the system should work. Instead, I get this:
MPI: Building...
Making all in config
Making all in contrib
Making all in opal
Making all in include
Making all in asm
CC asm.lo
ln -s "../../opal/asm/generated/atomic-amd64-linux.s" atomic-asm.S
CPPAS atomic-asm.lo
CCLD libasm.la
../../libtool: line 6000: cd: NO_BUILD/lib: No such file or directory
libtool: link: cannot determine absolute directory name of `NO_BUILD/lib'
Makefile:1584: recipe for target 'libasm.la' failed
make[6]: *** [libasm.la] Error 1
Makefile:2153: recipe for target 'all-recursive' failed
make[5]: *** [all-recursive] Error 1
Makefile:1702: recipe for target 'all-recursive' failed
make[4]: *** [all-recursive] Error 1
Died at /home/etuser/Cactus/arrangements/ExternalLibraries/MPI/src/build.pl line 74.
/home/etuser/Cactus/arrangements/ExternalLibraries/MPI/src/make.code.deps:9: recipe for target '/home/etuser/Cactus/configs/sim/scratch/done/MPI' failed
make[3]: *** [/home/etuser/Cactus/configs/sim/scratch/done/MPI] Error 17
/home/etuser/Cactus/lib/make/make.thornlib:112: recipe for target 'make.checked' failed
make[2]: *** [make.checked] Error 2
/home/etuser/Cactus/lib/make/make.configuration:181: recipe for target '/home/etuser/Cactus/configs/sim/lib/libthorn_MPI.a' failed
make[1]: *** [/home/etuser/Cactus/configs/sim/lib/libthorn_MPI.a] Error 2
Makefile:256: recipe for target 'sim' failed
make: *** [sim] Error 2
The command '/bin/sh -c ./simfactory/bin/sim build -j8 --thornlist ../einsteintoolkit.th' returned a non-zero code: 1
The test system is built with Docker from the following Dockerfile:
FROM ubuntu
RUN apt-get update
RUN apt-get install -y python3
RUN apt-get install -y libfftw3-dev libssl-dev libhdf5-dev subversion gcc curl libjpeg-turbo?-dev git make pkg-config g++ libpapi-dev patch libgsl-dev libhwloc-dev python liblapack-dev numactl gfortran
RUN adduser etuser
USER etuser
WORKDIR /home/etuser
ENV USER etuser
RUN curl -kLO https://raw.githubusercontent.com/gridaphobe/CRL/master/GetComponents
RUN chmod a+x GetComponents
RUN ./GetComponents --parallel https://bitbucket.org/einsteintoolkit/manifest/raw/master/einsteintoolkit.th
RUN echo testme > .hostname
WORKDIR /home/etuser/Cactus
RUN ./simfactory/bin/sim setup-silent
RUN ./simfactory/bin/sim build -j8 --thornlist ../einsteintoolkit.th
Keyword: None
Comments (10)
-
-
reporter - removed comment
Roland, this uses generic.cfg. I didn't think I needed the full log since the included docker file completely reproduces the problem. I can regen it, though.
-
- removed comment
Having instructions to reproduce is definitely a plus. Yet having log files would mean less of a burden I think for those who would like to help fix it since they can already guess what could be happening. Docker is quite a hurdle for me for example since I have to look up every single command for it on the internet.
Using generic.cfg and getting errors about NO_BUILD is very strange indeed.
-
reporter - changed status to open
- removed comment
It turns out the problem is fairly simple. The configuration of HWLOC was broken. This fixes it:
===================================================================
--- arrangements/ExternalLibraries/MPI/src/build.pl (revision 87)
+++ arrangements/ExternalLibraries/MPI/src/build.pl (working copy)
@@ -62,7 +62,7 @@
 print "MPI: Configuring...\n";
 chdir(${NAME});
 my $hwloc_opts = '';
-if ($ENV{HWLOC_DIR} ne '') {
+if ($ENV{HWLOC_DIR} ne '' and $ENV{HWLOC_DIR} ne 'NO_BUILD') {
     $hwloc_opts = "--with-hwloc='$ENV{HWLOC_DIR}'";
 }
 # Cannot have a memory manager with a static library on some systems
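For readers who do not read Perl, the guard in this patch can be sketched in shell roughly as follows. This is only an illustration: the function name hwloc_configure_opts is hypothetical and is not part of build.pl or of any thorn.

```shell
# Hypothetical helper (not part of build.pl): print the configure option
# for hwloc, or nothing when the directory is empty or the NO_BUILD
# placeholder.
hwloc_configure_opts() {
    dir="$1"
    case "$dir" in
        ''|NO_BUILD)
            # No usable prefix: pass nothing and let configure decide.
            ;;
        *)
            printf '%s\n' "--with-hwloc='$dir'"
            ;;
    esac
}
```

Called as hwloc_configure_opts "$HWLOC_DIR", it prints nothing for an empty value or for NO_BUILD, and prints --with-hwloc='/opt/hwloc' for /opt/hwloc.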
-
- removed comment
I see. This seems somewhat ugly, since we have to look for "magic" directory names. It seems workable but is required everywhere else where we may refer to XXX_DIR.
I would also try to improve hwloc's HWLOC_DIR setting logic? I.e. HWLOC_DIR="$(echo ${HWLOC_INC_DIRS} NO_BUILD | sed 's!/[^/]* *!!')" seems a bit strange to me. I kind of understand what it wants to do, namely use HWLOC_INC_DIRS, then remove the last part of the path (presumably "include") from the first one found, or use NO_BUILD if HWLOC_INC_DIRS is empty. Would it make sense to try and return only a single word in HWLOC_INC_DIRS? Right now it may be "/home/sw/ NO_BUILD".
I would try for the output of pkg-config hwloc --variable=prefix as well. This is not foolproof as the variable does not have to exist.
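A minimal sketch of the single-word behaviour suggested above, assuming HWLOC_INC_DIRS is a space-separated list of directories. The function names are hypothetical and this is not what the hwloc thorn currently does; the pkg-config fallback is used only when pkg-config is installed and reports a prefix.

```shell
# Hypothetical helper: take the first include directory, strip its last
# path component (presumably "include"), and print exactly one word.
# Prints NO_BUILD when the list is empty.
hwloc_prefix_from_inc_dirs() {
    # Deliberately unquoted so the list is split into words.
    set -- $1
    first="${1:-}"
    if [ -z "$first" ]; then
        echo NO_BUILD
        return 0
    fi
    first="${first%/}"       # drop one trailing slash, if present
    prefix="${first%/*}"     # drop the last path component
    echo "${prefix:-/}"      # "/include" would otherwise yield an empty string
}

# Hypothetical wrapper: fall back to the pkg-config prefix variable when
# no include directory is known.
hwloc_guess_dir() {
    dir="$(hwloc_prefix_from_inc_dirs "$1")"
    if [ "$dir" = NO_BUILD ] && command -v pkg-config >/dev/null 2>&1; then
        prefix="$(pkg-config --variable=prefix hwloc 2>/dev/null)" || prefix=""
        if [ -n "$prefix" ]; then
            dir="$prefix"
        fi
    fi
    echo "$dir"
}
```

With this sketch, an input of "/home/sw/include /opt/x/include" gives /home/sw, and an empty list gives NO_BUILD unless pkg-config supplies a prefix.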
-
reporter - removed comment
My recollection is that we decided some time ago that these packages would use the special string NO_BUILDDIR rather than the empty string to signify that the build dir was not set. I believe the hwloc thorn is doing the correct thing when it sets that option.
-
- removed comment
Looking at what HWLOC is doing, it seems that it sets HWLOC_DIR to "NO_BUILD" if it can find all the "-l" options from pkg-config but cannot extract a directory from the CFLAGS options that pkg-config reports. That is not the same as a variable not being set (by a user); it is an inability by hwloc to determine an installation prefix directory for itself. Admittedly, given that Ubuntu now uses /usr/lib/x86_64-linux-gnu/ instead of the traditional /usr/lib, a prefix (namely "/usr") is less useful, since one can no longer expect that, given a prefix, the include directory is prefix/include and the lib directory is prefix/lib (other than by using the compatibility directories like /usr/lib/x86_64-linux-gnu/hdf5/serial/ that are sometimes provided).
I am not sure what the current agreed upon convention is? Can one expect that HWLOC_DIR/lib and HWLOC_DIR/include exist and contain the libs and header files or is all one can rely on that HWLOC_DIR is some directory associated with HWLOC (eg HWLOC_DIR/bin contains utilities)?
Having HWLOC_DIR set to NO_BUILD does indeed allow other ExternalLibraries that require HWLOC to take steps in case HWLOC is in a system location (which is why there is no -I in CFLAGS) and at least abort the build if they, for whatever reason, do need a valid prefix directory.
Such a convention should ideally be documented in the minutes or wiki, if possible, to make sure that all thorns use the same magic value. HWLOC uses NO_BUILD right now (so a different magic value). The same logic (and magic value) should then be implemented in all other ExternalLibraries.
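To make the question about the convention concrete, here is a rough sketch of the stricter reading, in which a usable XXX_DIR must contain both include and lib. The function names prefix_is_usable and require_prefix are hypothetical, and nothing here is an agreed Einstein Toolkit convention.

```shell
# Hypothetical check: succeed only if the directory is neither empty nor
# the NO_BUILD placeholder and looks like a conventional install prefix.
prefix_is_usable() {
    dir="$1"
    [ -n "$dir" ] && [ "$dir" != NO_BUILD ] \
        && [ -d "$dir/include" ] && [ -d "$dir/lib" ]
}

# Hypothetical use in a build script that really needs a prefix: report
# the problem and return non-zero instead of failing later inside libtool.
require_prefix() {
    if ! prefix_is_usable "$1"; then
        echo "ERROR: '$1' is not a usable installation prefix" >&2
        return 1
    fi
}
```

A thorn that can work with a system installation would skip require_prefix entirely and simply omit its --with-... option when the value is NO_BUILD.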
-
- removed comment
Steve, since your patch seems to make things work and no better alternative has been proposed, would you mind applying it, please?
At the same time, it would be great if you could document the convention that this patch uses in the wiki: https://docs.einsteintoolkit.org/et-docs/Improving_the_treatment_of_external_libraries
-
reporter - changed status to resolved
- removed comment
Fixed in revision 88 of the svn repo.
-
- edited description
- changed status to closed
Unless this ticket is meant to serve only as a reminder for yourself, could you include the full build log obtained from:
as well as whatever option list, machine.ini ended up being used, please?
Otherwise this seems like poking in the dark for all of us. I would be particularly curious where the NO_BUILD is coming from (which does not eg appear in generic.cfg).