Header compile failure with GCC 10.1 and -std=c++2a on non-trivial collectives

Issue #368 resolved
Dan Bonachea created an issue

Now that GCC 10.1 is officially released (and deployed in our dirac CI) we have a new compile error in our public headers when compiled with GCC 10.1 and -std=c++2a

  1. Example (test/collectives)
  2. Example (issue 234)
  3. Example (issue 267)

Representative compile errors:

 /home/data2/upcnightly/dirac/EX-dirac-ibv-gcc/work/dbg/upcxx-inst/bin/upcxx -o collectives-par /home/data2/upcnightly/dirac/EX-dirac-ibv-gcc/work/dbg/upcxx/test/collectives.cpp -std=c++2a
In file included from /home/data2/upcnightly/dirac/EX-dirac-ibv-gcc/work/dbg/upcxx-inst/upcxx.debug.gasnet_par.ibv/include/upcxx/backend.hpp:214,
                 from /home/data2/upcnightly/dirac/EX-dirac-ibv-gcc/work/dbg/upcxx/test/collectives.cpp:2:
/home/data2/upcnightly/dirac/EX-dirac-ibv-gcc/work/dbg/upcxx-inst/upcxx.debug.gasnet_par.ibv/include/upcxx/backend/gasnet/runtime.hpp: In instantiation of 'void upcxx::backend::bcast_am_master(const upcxx::team&, Fn&&) [with upcxx::progress_level level = upcxx::progress_level::user; Fn = upcxx::bound_function<upcxx::broadcast_nontrivial<short unsigned int, upcxx::completions<upcxx::future_cx<upcxx::operation_cx_event, upcxx::progress_level::user> >, short unsigned int>::<lambda(short unsigned int&&)>, short unsigned int>]':
/home/data2/upcnightly/dirac/EX-dirac-ibv-gcc/work/dbg/upcxx-inst/upcxx.debug.gasnet_par.ibv/include/upcxx/broadcast.hpp:78:53:   required from 'typename upcxx::detail::completions_returner<upcxx::detail::event_is_here, upcxx::detail::broadcast_scalar_event_values<T>, Cxs>::return_t upcxx::broadcast_nontrivial(T1&&, upcxx::intrank_t, const upcxx::team&, Cxs) [with T1 = short unsigned int; Cxs = upcxx::completions<upcxx::future_cx<upcxx::operation_cx_event, upcxx::progress_level::user> >; T = short unsigned int; typename upcxx::detail::completions_returner<upcxx::detail::event_is_here, upcxx::detail::broadcast_scalar_event_values<T>, Cxs>::return_t = upcxx::future1<upcxx::detail::future_kind_shref<upcxx::detail::future_header_ops_promise>, short unsigned int>; upcxx::intrank_t = int]'
/home/data2/upcnightly/dirac/EX-dirac-ibv-gcc/work/dbg/upcxx/test/collectives.cpp:37:72:   required from here
/home/data2/upcnightly/dirac/EX-dirac-ibv-gcc/work/dbg/upcxx-inst/upcxx.debug.gasnet_par.ibv/include/upcxx/backend/gasnet/runtime.hpp:563:37: error: use of deleted function 'upcxx::backend::gasnet::bcast_payload_header::bcast_payload_header()'
  563 |     bcast_payload_header *payload = new(am_buf.buffer) bcast_payload_header;
      |                                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /home/data2/upcnightly/dirac/EX-dirac-ibv-gcc/work/dbg/upcxx-inst/upcxx.debug.gasnet_par.ibv/include/upcxx/backend.hpp:214,
                 from /home/data2/upcnightly/dirac/EX-dirac-ibv-gcc/work/dbg/upcxx/test/collectives.cpp:2:
/home/data2/upcnightly/dirac/EX-dirac-ibv-gcc/work/dbg/upcxx-inst/upcxx.debug.gasnet_par.ibv/include/upcxx/backend/gasnet/runtime.hpp:172:10: note: 'upcxx::backend::gasnet::bcast_payload_header::bcast_payload_header()' is implicitly deleted because the default definition would be ill-formed:
  172 |   struct bcast_payload_header {
      |          ^~~~~~~~~~~~~~~~~~~~
/home/data2/upcnightly/dirac/EX-dirac-ibv-gcc/work/dbg/upcxx-inst/upcxx.debug.gasnet_par.ibv/include/upcxx/backend/gasnet/runtime.hpp:176:33: error: union member 'upcxx::backend::gasnet::bcast_payload_header::<unnamed union>::rdzv_refs' with non-trivial 'constexpr std::atomic<long int>::atomic()'
  176 |       std::atomic<std::int64_t> rdzv_refs;
      |                                 ^~~~~~~~~

In apparently all cases, the errors are being generated while instantiating calls to the unspecified entry points upcxx::broadcast_nontrivial or upcxx::reduce_all_nontrivial.

We should either repair the implementation to be C++2a compliant, or remove these unspecified entry points.

Comments (6)

  1. Paul Hargrove

    In my testing on Dirac, I find that upcxx::broadcast_nontrivial can be trivially removed from test/collective.cpp. Simply wrapping them in #if 0 yields a test which compiles and passes its own validation with default compiler flags.

    Removing use of upcxx::reduce_all_nontrivial (if we want to go that direction) looks to require more surgical edits, but I don't see evidence of producer/consumer relationships which would be broken by their removal. Trying this out is now a task in my personal "backfill queue".

  2. Paul Hargrove

    This attached simple patch (just #if 0 and #endif lines) is sufficient to resolve the problem in my testing on Dirac using GCC-10.1.0 (module load PrgEnv/gnu/10.1.0) and configured using --with-cxxflags=-std=c++2a. Of course the test continues to PASS at 4 ranks (tried smp, udp and ibv). I also checked RANKS=8, 12, 13 and 15 w/ ibv to be sure.

    It is interesting to note that this patch comments-out only 3 of the 4 uses of the "nontrivial" collectives. This fact might help John to narrow in on a solution by observing what differs in the "bad" and "ok" instances.

    To be clear, I've provided this as an attachment rather than a PR because I see this as a "fall back" solution for the upcoming patch release, with actually fixing the root cause being my strong preference.

  3. Dan Bonachea reporter

    Merging release upcxx-2020.3.2

    • upcxx-2020.3: ChangeLog: update release date 2020.3.2 package version bumps Make future and serialization tests "full" upc++ ChangeLog cleanups Update ChangeLog
    • Added test/regression/issue380.cpp - Fixed issue 380 where rput with source+operation cx failed to compile. Fix issue 368. Needed to add a default constructor for bcast_payload_header in runtime.hpp since it has a union containing a non-trivially constructible member (std::atomic). Add a Serialization example with an abstract base class ChangeLog.md bugfix entry Bugfix issue 369. <completion>::as_future() was typically leaking a small 56 byte heap struct per completion notification due to a refcount bungling in detail::persona_tls::fulfill_during_user_of_active(). configure: search PATH for python3 and python2 configure: implement check_tool_path() Update ChangeLog Add issue371 test Fix issue 371: team_id's are not "universal" as documented Update ChangeLog Add issue343 test fix issue 343: ensure that default-construced team_id provide a unique invalid team_id Update ChangeLog Improve bad_shared_alloc to output shared heap status Add a test to demonstrate shared heap allocation failures Tweak shared heap allocation exceptions test/memberof.cpp: silence a harmless warning with -std=c++2a docs: revise Intel/libstdc++ recommendation bench/misc_perf: avoid C++20 deprecation warnings INSTALL.md: Clarify recommendations for using parallel make ChangeLog.md: list fix #356 Fully qualified std::foo to ::std::foo in UPCXX_SERIALIZED_FOO macros. For members injected into user classes by UPCXX_SERIALIZED_* macros: This fixes issue 356 where non-public constructors were inaccessible to UCPXX_SERIALIZED_XXX macros. The fix entails smuggling a public static function into the class that then calls the constructor. Add issue356 regression test Update ChangeLog ChangeLog: document CROSS to UPCXX_CROSS rename configure: automatically cross-compile on Cray XC Start a ChangeLog for hotfix release

    → <<cset e35f012d9670>>

  4. Log in to comment