infinite make loop building on Blue Waters

Issue #2345 resolved
Roland Haas created an issue

Building on Blue Waters, I currently end up with make entering an infinite loop and using 100% CPU time when building in parallel. Running with “make --debug=vmj” one can see that it enters an infinite loop and keeps repeating the lines in the attached file make-780.log (which is from the 780th iteration of the loop and is exactly identical to the output of the 779th).
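A quick way to confirm that a captured debug log really ends in a repeating block, rather than comparing iterations by eye, could look like the sketch below. It is only an illustration: the function name `find_trailing_cycle` is made up here, and it assumes that successive iterations are byte-for-byte identical (as reported above for iterations 779 and 780), so it would miss a loop whose output contains changing PIDs or timestamps.

```python
from typing import List, Optional, Tuple


def find_trailing_cycle(lines: List[str], min_repeats: int = 2) -> Optional[Tuple[int, int]]:
    """Return (block_length, repeats) if the log ends in a repeated block.

    Block lengths are tried from shortest to longest, so a tail such as
    A B A B A B is reported as a 2-line block repeated 3 times.
    Returns None if no block repeats at least `min_repeats` times.
    """
    n = len(lines)
    for block_len in range(1, n // min_repeats + 1):
        block = lines[n - block_len:]
        repeats = 1
        end = n - block_len
        # Walk backwards through the log while the preceding block matches.
        while end - block_len >= 0 and lines[end - block_len:end] == block:
            repeats += 1
            end -= block_len
        if repeats >= min_repeats:
            return block_len, repeats
    return None
```

To use it, read the captured debug output into a list of lines (for example with `open(path).read().splitlines()`, where `path` is wherever the log was saved) and pass that list to the function. The scan is quadratic in the log length in the worst case, so for very long logs it may be worth passing only the last few thousand lines.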

This may be a bug in make v3.81, the version used on Blue Waters. Others encountered it here: https://www.bountysource.com/issues/65269718-cockroach-2-1-0 and found that this make commit http://git.savannah.gnu.org/cgit/make.git/commit/?id=b9f831b858761366e0db418e6f226a053ed550af fixes it (they did not find a workaround).

Compiling with ‘-j1’ works fine but is very slow on Blue Waters due to the slow file system.

I will try and see if I can somehow perturb the problem out of existence. Failing that, -j1 may be required on BW.
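If -j1 does turn out to be necessary, the fallback could be restricted to the suspect make version instead of being hard-coded for the machine. The sketch below is hypothetical: the names `parse_make_version`, `choose_jobs`, and `AFFECTED_VERSIONS` are invented for illustration, and treating exactly v3.81 as affected is an assumption based only on the suspicion stated above, not on a confirmed list of affected releases. It relies on the first line of `make --version` having the form `GNU Make 3.81`.

```python
import re
from typing import Optional, Tuple


def parse_make_version(version_output: str) -> Optional[Tuple[int, ...]]:
    """Extract the version tuple from the first line of `make --version`."""
    lines = version_output.splitlines()
    first_line = lines[0] if lines else ""
    match = re.search(r"GNU Make (\d+(?:\.\d+)*)", first_line)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


# Assumption: only v3.81 is treated as affected, because that is the
# version suspected in this issue. Adjust if other versions show the loop.
AFFECTED_VERSIONS = {(3, 81)}


def choose_jobs(version_output: str, parallel_jobs: int) -> int:
    """Return 1 for a suspect make version, else the requested job count."""
    version = parse_make_version(version_output)
    if version in AFFECTED_VERSIONS:
        return 1
    return parallel_jobs
```

The text passed in would be the captured output of `make --version`; an unrecognized or non-GNU make leaves the requested job count unchanged, since there is no evidence either way for such a case.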

Comments (13)

  1. Roland Haas reporter

    I had hoped that reverting the Formaline commits ff8e96f "Formaline: mark tarballs as INTERMEDIATE rather than PRECIOUS", 780936d "Formaline: mark non-atomically created files SECONDARY", and fa3e623 "Formaline: remove temporary source files once no longer needed" would be sufficient perturbation, since they were among the more recent changes to the build system.

  2. Roland Haas reporter

    The issue persists even if I wind back all repos to ET_2019_11 (except wvuthorns which contains new thorns in master), at least on mike (I have not tested the others).

    I will see what happens if I rewind wvuthorns as well, to check whether this is triggered solely by the length of our thornlist.

  3. Roland Haas reporter

    Even reverting everything to ET_2019_10 and using the ET_2019_10 thornlist did not fix this. Next I will try a fresh ET_2019_10 checkout, though I suspect that one may have to live with this.

  4. Roland Haas reporter

    It turns out that starting a clean compile from ET_2019_10 does not show the problem. However, if a compile on master shows the loop, then reverting everything to ET_2019_10 does not fix the issue unless one also removes configs/sim/scratch/Formaline, which triggers a rebuild of the tarballs. The Formaline changes altered how long tarballs are kept (they are now intermediate files only).

    I will see whether merging the two make rules in question helps, since they were only ever executed serially, one after the other, anyway. If that does not help, I will revert the Formaline changes. Reverting them will make a compiled tree a couple hundred MB larger, which is normally not an issue (except on mike, shelob, and qb, where $HOME is small).
