Header deps are unreliable and break build tree

Issue #370 resolved
Dan Bonachea created an issue

If a header file disappears from the source tree, it currently breaks a build directory.

Steps to reproduce (on dirac, gcc 10.1, make 4.3):

  1. git checkout master @ upcxx-2020.3.0
  2. configure and make in a clean directory
  3. git checkout develop
  4. make

Result:

make[8]: 'gen_include/upcxx/upcxx_config.hpp' is up to date.
make[8]: *** No rule to make target '/home/pcp1/bonachea/UPC/bxx-deps/bld/upcxx.assert1.optlev0.dbgsym1.gasnet_seq.smp/include/upcxx/future/impl_mapped.hpp', needed by 'vis.d'.  Stop.
make[7]: *** [/home/pcp1/bonachea/UPC/upcxx/bld/Makefile:399: do-libupcxx] Error 2
make[6]: *** [/home/pcp1/bonachea/UPC/upcxx/bld/Makefile:408: do-upcxx-single] Error 2
make[5]: *** [/home/pcp1/bonachea/UPC/upcxx/bld/Makefile:414: upcxx-single] Error 2
make[4]: *** [/home/pcp1/bonachea/UPC/upcxx/bld/Makefile:428: upcxx.assert1.optlev0.dbgsym1.gasnet_seq.smp/libupcxx.a] Error 2
make[3]: *** [/home/pcp1/bonachea/UPC/upcxx/bld/Makefile:457: do-upcxx-all] Error 2
make[2]: *** [/home/pcp1/bonachea/UPC/upcxx/bld/Makefile:505: do-upcxx-all-debug] Error 2
make[1]: *** [/home/pcp1/bonachea/UPC/upcxx/bld/Makefile:548: all] Error 2

********
UPC++ build failed. Please report the ENTIRE log above to: upcxx@googlegroups.com
********

Comments (6)

  1. Paul Hargrove

    Discussion in Slack reveals that make clean is probably an effective work-around.
    If that is NOT the case, please respond to this comment to refute.

  2. Paul Hargrove

    The root problem is that the dependency information contained in the .d file is out-of-date, but the regeneration of the dependency info is blocked by its own stale dependency. Since that same dependency information has already been read in by make for the .o, there might be no graceful recovery.

    However, I think I can remove the .d and fail the build with a message such as:
    "Build aborted due to stale dependency information (removed). Please reissue your command."

    Ideally I could recover and complete the build, but I am not confident that can be done without significant work.

  3. Paul Hargrove
    • changed status to open

    I have begun work on this issue.

    It turns out the *** No rule to make target errors are from the include $(wildcard *.d) line (because GNU make ensures files are up-to-date before inclusion). This makes the problem a bit harder to resolve, but I've found a solution that not only resolves this problem, but decouples dependency tracking in a way that may allow other future improvements.
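
    The include-time failure can be reproduced in isolation. The following is a minimal sketch of a conventional layout that exhibits it, not the actual UPC++ Makefile: the file names (foo.cpp, a.hpp, b.hpp) and the compiler flags are assumptions chosen for illustration, using GCC's -MMD, -MT and -MF options.

    ```makefile
    # Hypothetical layout, for illustration only (not the UPC++ Makefile).
    SRCS := $(wildcard *.cpp)
    OBJS := $(SRCS:.cpp=.o)

    .PHONY: all
    all: $(OBJS)

    # Compiling foo.cpp also writes foo.d. The -MT argument makes the generated
    # rule name both the object and the dependency file as targets, e.g.
    #   foo.o foo.d: foo.cpp a.hpp b.hpp
    %.o: %.cpp
    	$(CXX) $(CPPFLAGS) $(CXXFLAGS) -MMD -MT '$@ $*.d' -MF $*.d -c $< -o $@

    # GNU make tries to bring every included file up to date before using it.
    # If a.hpp has since been deleted, foo.d has a prerequisite that nothing
    # can build, and make stops with an error of the form:
    #   make: *** No rule to make target 'a.hpp', needed by 'foo.d'.  Stop.
    include $(wildcard *.d)
    ```

    Because the stale foo.d is what names a.hpp, and foo.d cannot be regenerated until its prerequisites are satisfied, the build cannot recover on its own in this layout.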

    I also plan to take this opportunity to follow-up on at least one "TODO" recorded in comments on the existing dependency tracking logic.

  4. Paul Hargrove

    For the record: this issue also impacts the dependency tracking used by the exe and run make targets.

  5. Paul Hargrove

    Makefile: improve dependency tracking

    This commit introduces a level of indirection between the generation and inclusion of files used by GNU make to track dependencies, which resolves issue 370.

    The No rule to make target errors reported in issue 370 were generated at the include $(wildcard *.d) as it is GNU make's policy to ensure all includes are up-to-date before inclusion. By using a level of indirection, we introduce a "shadow" dependency file for which we can catch the case of broken dependencies, and simply force regeneration. A shadow file is copied to become the included file each time it is updated. Since the included files no longer have any explicit dependencies, they cannot lead to failures at include-time, even if they name removed files.

    Additionally, a shadow dependency file now expresses only itself (not the object or executable) as dependent upon the headers reached during preprocessing. The object or executable now depends on the dependency file, and thus only indirectly on its contents. However, since the dependency file is rebuilt any time the files it names change, this is sufficient. This change reduces the volume of dependency information fed to GNU make, eliminates a hack used for multiple-target dependencies from PGI compilers, and generally simplifies the compiler-specific dependency generation "bits".
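
    As a rough illustration of the indirection described above, here is a minimal sketch. It is not the code from this commit: the .dep suffix for the shadow file, the file names, and the use of GCC's -MM, -MP, -MT and -MF flags are all assumptions. In particular, the commit message does not say how broken dependencies are detected; this sketch substitutes GCC's -MP, which emits an empty rule for each header so that make treats a deleted header as changed and regenerates the shadow file.

    ```makefile
    # Hypothetical sketch of the "shadow file" idea (not the UPC++ Makefile).
    SRCS := $(wildcard *.cpp)
    OBJS := $(SRCS:.cpp=.o)
    DEPS := $(SRCS:.cpp=.dep)

    .PHONY: all
    all: $(OBJS)

    # Step 1: preprocess only, writing the shadow file foo.dep. -MT names the
    # shadow file itself (not foo.o) as the target of the generated rule, e.g.
    #   foo.dep: foo.cpp a.hpp b.hpp
    # -MP (an assumption of this sketch) adds an empty rule per header.
    # The fresh shadow is then copied to foo.d, the file make actually includes.
    $(DEPS): %.dep: %.cpp
    	$(CXX) $(CPPFLAGS) -MM -MP -MT $@ -MF $@ $<
    	cp $@ $*.d

    # Step 2: the object depends on the shadow file, and so only indirectly
    # on the headers the shadow file names.
    $(OBJS): %.o: %.cpp %.dep
    	$(CXX) $(CPPFLAGS) $(CXXFLAGS) -c $< -o $@

    # No rule names a .d file as a target, so make has nothing to bring up to
    # date at include time, even if a .d file mentions a removed header.
    include $(wildcard *.d)
    ```

    With this layout, deleting a.hpp (together with its #include) and rerunning make regenerates foo.dep and foo.d, then recompiles foo.o. Unlike the commit described here, the sketch orders the two steps only per source file, not as two global phases.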

    The net result is that moving between two branches which both have this commit as an ancestor will "do the right thing" with respect to dependency tracking when header files have been removed.

    For movement from older branches (such as the 2020.3.0 release) to one containing this work, it is not possible to entirely fix the problem reported as issue 370. Empirically it has been observed that non-parallel builds work, but this is more a "happy accident" than something designed into the solution.

    Where previously the generation of the initial dependency files and object files was concurrent, these are now two distinct steps (with the first completing before the second can begin). However, each step exposes its own full concurrency and no measurable slowdown was observed over NFS on Dirac. On a slower shared filesystem such as on Cori, the cost of additional stat calls is small, but measurable.

    This restructuring of the dependency tracking introduces an interesting possibility for the future. Currently the dependency information is generated at the first compile, and each subsequent compile ensures it is up-to-date. However, it is not strictly necessary to construct dependency files on the first compile, since the lack of any object files is sufficient to see them all built. With some additional work, it may be possible to build the dependency files only when the source and object both exist. This means that the end-user who compiles exactly once would see a measurable reduction in build time, while the developer would continue to have dependency tracking beginning with the second compilation.

    → <<cset a0679283b644>>
