- marked as critical
infinite make loop building on Blue Waters
Building on Blue Waters, I am currently ending up with make entering an infinite loop and using 100% CPU time when building in parallel. Running with “make --debug=vmj”, one can see that it enters an infinite loop, repeating the lines in the attached file make-780.log (which is from the 780th iteration and is exactly identical to the output of the 779th iteration).
This may be a bug in make v3.81, which is used on Blue Waters. It was encountered by others here: https://www.bountysource.com/issues/65269718-cockroach-2-1-0 and they found that this make commit http://git.savannah.gnu.org/cgit/make.git/commit/?id=b9f831b858761366e0db418e6f226a053ed550af fixes it (they did not find a workaround).
Compiling with ‘-j1’ works fine but is very slow on Blue Waters due to the slow file system.
I will try to see if I can somehow perturb the problem out of existence. Failing that, -j1 may be required on BW.
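As a rough illustration of the -j1 fallback, a makefile could force serial execution only when it detects the affected make version. This is a sketch under assumptions, not something taken from the Cactus build system: `MAKE_VERSION` and `.NOTPARALLEL` are standard GNU make features, but the warning text is made up here, and `.NOTPARALLEL` only affects the make invocation that reads it, so recursively invoked makes would need the same guard in their own makefiles.

```makefile
# Sketch only: fall back to a serial build under GNU make 3.81.
# MAKE_VERSION is set by GNU make itself; the message text is illustrative.
ifeq ($(MAKE_VERSION),3.81)
$(warning GNU make 3.81 detected, disabling parallel execution)
# Run this invocation serially even if -j was given on the command line.
.NOTPARALLEL:
endif
```

Newer make versions are left untouched by this guard, so machines with a fixed make would keep building in parallel.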
Comments (13)
-
reporter - marked as blocker
-
reporter I had hoped that reverting the Formaline commits ff8e96f "Formaline: mark tarballs as INTERMEDIATE rather than PRECIOUS", 780936d "Formaline: mark non-atomically created files SECONDARY", and fa3e623 "Formaline: remove temporary source files once no longer needed" would be sufficient perturbation, since they were among the more recent changes to the build system.
-
reporter The issue persists even if I wind back all repos to ET_2019_11 (except wvuthorns which contains new thorns in master), at least on mike (I have not tested the others).
I will try rewinding wvuthorns as well, to check whether this is triggered solely by the length of our thornlist.
-
reporter Even reverting everything to ET_2019_10 and using the ET_2019_10 thornlist did not fix this. Next will be a fresh ET_2019_10 checkout, though I suspect that we may have to live with this.
-
reporter It turns out that starting a clean compile from ET_2019_10 does not show the problem. However, if a compile on master shows the loop, then reverting everything to ET_2019_10 does not fix the issue unless one also removes configs/sim/scratch/Formaline, which triggers a rebuild of the tarballs. The Formaline changes altered how long tarballs are kept (they are now intermediate files only).
I will try to see if merging the two make rules in question helps, since they were only ever executed serially, one after the other, anyway. If that does not help, I will revert the Formaline changes. Reverting them will make a compiled tree a couple hundred MB larger, which is normally not an issue (except on mike, shelob, and qb, where $HOME is small).
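The idea of merging the two rules can be sketched as follows. All file names and commands here (`SRCS`, `sources.tar.gz`, `blob.c`, and the use of `tar` and `xxd -i`) are hypothetical stand-ins, not the actual Formaline rules; the point is only that the tarball stops being a target that make has to track as an intermediate file.

```makefile
# Hypothetical names throughout; not the actual Formaline rules.
SRCS := $(wildcard src/*.c)

# Previously (sketch): two chained rules with an intermediate tarball.
#   .INTERMEDIATE: sources.tar.gz
#   sources.tar.gz: $(SRCS) ; tar czf $@ $^
#   blob.c: sources.tar.gz  ; xxd -i $< > $@

# Combined (sketch): a single rule, so make never sees the tarball
# as a target and has no intermediate file to create or delete.
blob.c: $(SRCS)
	tar czf sources.tar.gz $^
	xxd -i sources.tar.gz > $@
	rm -f sources.tar.gz
```

The trade-off is that the tarball is always rebuilt together with the blob, which costs nothing if the two steps only ever ran back to back.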
-
reporter - changed status to resolved
A workaround was applied in git hash 93d691f "Formaline: combine rules to make tarball and blob" of cactusutils.
I added a note to #2316.
-
reporter - changed status to open
This happened once more on mike, most likely due to still-existing .SECONDARY targets.
-
reporter Since this is a bug in make and there seems to be no workaround, all we can do is try and make this less likely to happen.
Pull request:
tries to do that by declaring (one) fewer files as secondary.
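For context, GNU make treats `.SECONDARY` prerequisites like intermediate files, except that it never deletes them automatically. A minimal sketch of declaring one fewer file as secondary might look like this; the file names `generated.c` and `generated.stamp` are hypothetical and do not come from the Formaline makefiles.

```makefile
# Hypothetical file names; a sketch of reducing the set of .SECONDARY targets.

# Before (sketch): both the generated file and its sentinel were secondary.
#   .SECONDARY: generated.c generated.stamp

# After (sketch): only the generated file stays secondary; the sentinel
# becomes an ordinary target that make tracks in the usual way.
.SECONDARY: generated.c
```

This does not fix the underlying make bug; it only shrinks the number of targets that go through the intermediate-file handling where the loop appears to occur.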
-
reporter Please review.
-
reporter Unless there are objections, I will apply this after 2020-10-07.
-
reporter Applied as git hash 464de37 "Formaline: do not declare sentinel file for code change secondary" of cactusutils
-
reporter - changed status to resolved
Also affects: mike, shelob, and qb, all of which use the same make version.