make utils and SimFactory ignore HWLOC_DIR=BUILD and try to copy hwloc executables from /usr/bin
Hello!
While installing the Einstein Toolkit on a new machine, we came across this seemingly wrong behaviour:
with HWLOC_DIR=BUILD set in the option list, the ET builds successfully with the bundled hwloc.
However, when building the utilities, it fails while trying to copy hwloc-ls from /usr/bin to /exe/sim.
The error results from hwloc-ls being a broken symlink on the system, but the odd thing is that it copies executables from /usr/bin at all, even though we built hwloc from the bundled version.
The same behaviour can also be seen here, where some hwloc executables are copied from /usr/bin, while others are copied from the hwloc bundle:
http://lists.einsteintoolkit.org/pipermail/test/2014-January/000047.html
Copying hwloc-assembler from https://build.barrywardell.net/job/EinsteinToolkitReleased/ws/configs/sim/scratch/external/hwloc/bin/hwloc-assembler to https://build.barrywardell.net/job/EinsteinToolkitReleased/ws/exe/sim
Copying hwloc-assembler-remote from https://build.barrywardell.net/job/EinsteinToolkitReleased/ws/configs/sim/scratch/external/hwloc/bin/hwloc-assembler-remote to https://build.barrywardell.net/job/EinsteinToolkitReleased/ws/exe/sim
Copying hwloc-bind from /usr/bin/hwloc-bind to https://build.barrywardell.net/job/EinsteinToolkitReleased/ws/exe/sim
Copying hwloc-calc from /usr/bin/hwloc-calc to https://build.barrywardell.net/job/EinsteinToolkitReleased/ws/exe/sim
Copying hwloc-distances from https://build.barrywardell.net/job/EinsteinToolkitReleased/ws/configs/sim/scratch/external/hwloc/bin/hwloc-distances to https://build.barrywardell.net/job/EinsteinToolkitReleased/ws/exe/sim
Copying hwloc-distrib from /usr/bin/hwloc-distrib to https://build.barrywardell.net/job/EinsteinToolkitReleased/ws/exe/sim
Copying hwloc-info from /usr/bin/hwloc-info to https://build.barrywardell.net/job/EinsteinToolkitReleased/ws/exe/sim
Copying hwloc-ls from /usr/bin/hwloc-ls to https://build.barrywardell.net/job/EinsteinToolkitReleased/ws/exe/sim
Copying hwloc-ps from /usr/bin/hwloc-ps to https://build.barrywardell.net/job/EinsteinToolkitReleased/ws/exe/sim
Copying lstopo from /usr/bin/lstopo to https://build.barrywardell.net/job/EinsteinToolkitReleased/ws/exe/sim
Copying lstopo-no-graphics from https://build.barrywardell.net/job/EinsteinToolkitReleased/ws/configs/sim/scratch/external/hwloc/bin/lstopo-no-graphics to https://build.barrywardell.net/job/EinsteinToolkitReleased/ws/exe/sim
Is this intended? I would expect that if one specifies HWLOC_DIR=BUILD, only the executables from the bundled hwloc are used, right?
I have attached the optionlist as well as the config-info and a tar of the config-data folder for the build.
Comments (11)
-
https://build.barrywardell.net/job/EinsteinToolkitReleased/101/consoleFull also mentions "hwloc selected, but HWLOC_DIR not set.". HWLOC_DIR=BUILD should not trigger this output. However, build 101 uses ubuntu.cfg, which does not define HWLOC_DIR, while build 25 (the one referenced above) probably did not use it (Jenkins switched option lists). On the other hand, even without HWLOC_DIR defined, build 101 built the bundled version because Jenkins did not find a system installation. Why it did not find the system version (which apparently is installed) is another matter.
-
Replying to [comment:1 knarf]:
Looking at MPI's configure script, it does seem to strip $ENV{MPI_LIB_DIRS}. However, I don't know why it does contain this line:
print "HWLOC_DIR = $ENV{HWLOC_DIR}\n";
What does MPI have to do with HWLOC? But then, this isn't HWLOC_LIB_DIRS, and shouldn't cause the problem mentioned in this ticket.
The reason for that seems to be that the MPI build script uses HWLOC_DIR to point the MPI build at that hwloc installation, so that MPI is built with hwloc support.
-
- attached hwloc_utils.patch
-
- attached MPI_utils.patch
-
The rule is a bit more involved (see https://stackoverflow.com/a/29108622); the two patches above demonstrate what needs to be done. Basically, one can restrict a pattern rule to apply only to certain files by listing those files before the rule, which turns it into what GNU make calls a static pattern rule.
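A minimal, self-contained sketch of such a static pattern rule, assuming GNU make 3.82 or newer (.RECIPEPREFIX is used only so that recipe lines need no literal tabs). The directories mpi/, hwloc/ and exe/ are hypothetical stand-ins for $(MPI_DIR), $(HWLOC_DIR) and $(UTIL_DIR); this is an illustration of the technique, not the content of the attached patches. Although the "system" MPI prefix also ships a hwloc-info, that file is copied from the bundled hwloc, because only the hwloc rule lists it as a target:

```shell
#!/bin/sh
# Sketch: static pattern rules restrict each copy rule to its own utilities.
# All paths and variable values are hypothetical stand-ins.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

mkdir -p mpi/bin hwloc/bin
echo "system mpirun"      > mpi/bin/mpirun
echo "system hwloc-info"  > mpi/bin/hwloc-info     # decoy: must NOT be used
echo "bundled hwloc-info" > hwloc/bin/hwloc-info

cat > Makefile <<'EOF'
.RECIPEPREFIX = >
UTIL_DIR    = exe
MPI_DIR     = mpi
HWLOC_DIR   = hwloc
MPI_UTILS   = mpirun
HWLOC_UTILS = hwloc-info

.PHONY: utils
utils: $(patsubst %,$(UTIL_DIR)/%,$(MPI_UTILS) $(HWLOC_UTILS))

# Each rule applies only to the targets listed before its first colon.
$(patsubst %,$(UTIL_DIR)/%,$(MPI_UTILS)): $(UTIL_DIR)/%: $(MPI_DIR)/bin/%
> @mkdir -p $(UTIL_DIR)
> @echo "Copying $* from $< to $(UTIL_DIR)"
> @cp $< $@

$(patsubst %,$(UTIL_DIR)/%,$(HWLOC_UTILS)): $(UTIL_DIR)/%: $(HWLOC_DIR)/bin/%
> @mkdir -p $(UTIL_DIR)
> @echo "Copying $* from $< to $(UTIL_DIR)"
> @cp $< $@
EOF

# -r disables make's built-in implicit rules to keep the demo minimal.
make -r -s utils
cat exe/mpirun exe/hwloc-info
```

Because these targets now have explicit recipes, make no longer performs an implicit-rule search for them, so the order of the rules in the makefile no longer matters.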
-
- edited description
- changed status to open
Please review.
-
Adding what I apparently forgot to post: the reason this happens is that the rules currently used by the ExternalLibraries look like this:
$(UTIL_DIR)/%: $(MPI_DIR)/bin/%
        @echo "Copying $* from $< to $(UTIL_DIR)"
        -$(MKDIR) $(MKDIRFLAGS) $(UTIL_DIR) 2> /dev/null
        cp $< $@
This is a pattern rule that says: if you need to build $(UTIL_DIR)/%, you can do so provided you have $(MPI_DIR)/bin/%. Now if $(MPI_DIR)/bin contains a file hwloc-info, make will happily run this recipe instead of, say, the one that has $(HWLOC_DIR)/bin/% as its prerequisite: when several pattern rules match a target equally well, make uses the first one (in makefile order) whose prerequisite exists or can be made. The fix (above) is to make the rules more specific.
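The mechanism can be reproduced in isolation. The script below is a hypothetical reduction, not the actual ExternalLibraries makefiles, again assuming GNU make 3.82+ for .RECIPEPREFIX: two unrestricted rules for $(UTIL_DIR)/%, with the MPI rule first and an "MPI prefix" (standing in for /usr) that contains only hwloc-info. It produces the same kind of mix as the Jenkins log above: hwloc-info is taken from the MPI prefix, hwloc-ls from the bundled build.

```shell
#!/bin/sh
# Sketch of the bug: two unrestricted pattern rules compete for exe/%.
# All paths and variable values are hypothetical stand-ins.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

mkdir -p mpi/bin hwloc/bin
echo "system hwloc-info"  > mpi/bin/hwloc-info    # like /usr/bin/hwloc-info
echo "bundled hwloc-info" > hwloc/bin/hwloc-info
echo "bundled hwloc-ls"   > hwloc/bin/hwloc-ls    # no system copy of this one

cat > Makefile <<'EOF'
.RECIPEPREFIX = >
UTIL_DIR    = exe
MPI_DIR     = mpi
HWLOC_DIR   = hwloc
HWLOC_UTILS = hwloc-info hwloc-ls

.PHONY: utils
utils: $(patsubst %,$(UTIL_DIR)/%,$(HWLOC_UTILS))

# Listed first, so it wins whenever its prerequisite exists.
$(UTIL_DIR)/%: $(MPI_DIR)/bin/%
> @mkdir -p $(UTIL_DIR)
> @echo "Copying $* from $< to $(UTIL_DIR)"
> @cp $< $@

$(UTIL_DIR)/%: $(HWLOC_DIR)/bin/%
> @mkdir -p $(UTIL_DIR)
> @echo "Copying $* from $< to $(UTIL_DIR)"
> @cp $< $@
EOF

# -r disables make's built-in implicit rules, so only the two rules above compete.
make -r -s utils
cat exe/hwloc-info exe/hwloc-ls
```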
-
Unless there are objections, I will push the changes in the patches above to hwloc and MPI, and to the other ExternalLibraries that copy executables, after 2019-11-20.
-
Applied as git hash 69abd31 "hwloc: limit rules for utils to ours only" of ExternalLibraries-hwloc
Applied as git hash 76e2cb8 "MPI: limit rules for utils to ours only" of ExternalLibraries-MPI
Applied as git hash d7929e5 "HDF5: limit rules for utils to ours only" of ExternalLibraries-HDF5
Applied as git hash a7eb905 "PAPI: limit rules for utils to ours only" of ExternalLibraries-PAPI
-
- changed status to resolved
I applied the fix to all ExternalLibraries that use "Copying" in their make.configuration.deps files. This should fix the issue.
However, the fix is fragile. A single ExternalLibrary that uses the incorrect pattern:
$(UTIL_DIR)/%: $(PAPI_DIR)/bin/%
instead of
$(patsubst %,$(UTIL_DIR)/%,$(PAPI_UTILS)): $(UTIL_DIR)/%: $(PAPI_DIR)/bin/%
will bring back the bug.
-
This mix is certainly not wanted. Some observations:
For reference: the files that are copied from /usr/bin are also present in the built version. So the question remains why some of the files are taken from /usr/bin and some from the built version. Could it be that by default they are taken from /usr/bin, and only if some of them are not present there is the built version used? To answer that, it would be interesting to know whether, e.g., /usr/bin/hwloc-assembler exists.
In the thorn itself I don't see why, e.g., hwloc-assembler and hwloc-ls should be treated differently. Both are listed as standard hwloc binaries.
Looking at https://build.barrywardell.net/job/EinsteinToolkitReleased/101/consoleFull I find "-L/usr/lib" in GENERAL_LIBRARIES, as well as -Wl,-rpath,/usr/lib. I wouldn't expect those there, but I am also not sure whether this is related. Both LAPACK and BLAS set their directory to /usr/lib, but it should be stripped by their configure.sh scripts (and from the output it looks like that happens).
GENERAL_LIBRARIES also looks odd because the paths for PETSc and hwloc appear multiple times (though this could be due to dependencies).
The option list adds LIBS=-L/usr/lib64, which does not look right, but since this is lib64 and not lib it might also be unrelated.
Also, /usr/lib appears in INC_DIRS_F, which is strange, and points to MPI (the only differences between INC_DIRS and INC_DIRS_F are MPI-related). The output of the PETSc build shows MPI_LIB_DIRS=/usr/lib at that point. That shouldn't happen.
Looking at MPI's configure script, it does seem to strip $ENV{MPI_LIB_DIRS}. However, I don't know why it does contain this line:
print "HWLOC_DIR = $ENV{HWLOC_DIR}\n";
What does MPI have to do with HWLOC? But then, this isn't HWLOC_LIB_DIRS, and shouldn't cause the problem mentioned in this ticket.
One problem, I believe, is found in detect.pl of MPI:
The regex does not trigger on /usr/lib paths, but should. Thus, MPI adds /usr/lib to the -L options.
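For illustration only: the actual regex in detect.pl is not reproduced in this thread, so the following is a hypothetical shell sketch of the kind of check meant, i.e. dropping directories the linker already searches by default before they end up in -L or -rpath options. The function names and the exact list of system directories are assumptions; multiarch directories such as /usr/lib/x86_64-linux-gnu are deliberately not treated as system directories here.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if DIR is a default system library directory
# (the list below is an assumption). One trailing slash is ignored.
is_system_libdir() {
    case "${1%/}" in
        /lib|/lib64|/usr/lib|/usr/lib64) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: print a whitespace-separated list of directories
# with the system directories removed, preserving the original order.
strip_system_libdirs() {
    kept=""
    for dir in $1; do
        if ! is_system_libdir "$dir"; then
            kept="${kept:+$kept }$dir"
        fi
    done
    printf '%s\n' "$kept"
}

strip_system_libdirs "/usr/lib /opt/openmpi/lib /usr/lib64/ /usr/lib/x86_64-linux-gnu"
```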
Now, whether and/or why this might trigger the hwloc-util problem I don't know. At this point I am probably at least as confused as you are reading this. Too many things look iffy: