Regression: upcxx::copy(remote_cx::as_rpc), PR 289 fix breaks PGI optimizer

Issue #421 resolved
Dan Bonachea created an issue

Test program:

#include <upcxx/upcxx.hpp>
#include <iostream>
#include <assert.h>

using namespace upcxx;

int main() {
  upcxx::init();

  global_ptr<int> gp1 = new_<int>();
  global_ptr<int> gp2 = new_<int>();
  *gp1.local() = 42;
  static bool done = false;
  copy(gp1, gp2, 1, remote_cx::as_rpc([]() { done = true; }));
  while (!done) upcxx::progress();
  assert(*gp2.local() == 42);

  barrier();
  delete_(gp1);
  delete_(gp2);
  if (!rank_me()) std::cout << "SUCCESS" << std::endl;
  upcxx::finalize();
  return 0;
}

This compiles and works as expected in all of 2019.9.0, 2020.3.2 and 2020.3.8. However on our pending release branch (effectively equal to develop) this fails to compile as follows:

In file included from /usr/local/pkg/upcxx-dirac/gcc-10.2.0/nightly-2020.10.14/upcxx.debug.gasnet_seq.smp/include/upcxx/team_fwd.hpp:4,
                 from /usr/local/pkg/upcxx-dirac/gcc-10.2.0/nightly-2020.10.14/upcxx.debug.gasnet_seq.smp/include/upcxx/team.hpp:4,
                 from /usr/local/pkg/upcxx-dirac/gcc-10.2.0/nightly-2020.10.14/upcxx.debug.gasnet_seq.smp/include/upcxx/backend.hpp:7,
                 from /usr/local/pkg/upcxx-dirac/gcc-10.2.0/nightly-2020.10.14/upcxx.debug.gasnet_seq.smp/include/upcxx/allocate.hpp:8,
                 from /usr/local/pkg/upcxx-dirac/gcc-10.2.0/nightly-2020.10.14/upcxx.debug.gasnet_seq.smp/include/upcxx/upcxx.hpp:5,
                 from copy-remote_cx.cpp:1:
/usr/local/pkg/upcxx-dirac/gcc-10.2.0/nightly-2020.10.14/upcxx.debug.gasnet_seq.smp/include/upcxx/bind.hpp: In instantiation of 'struct upcxx::detail::deserialized_bound_function_base<upcxx::detail::copy<upcxx::completions<upcxx::rpc_cx<upcxx::remote_cx_event, upcxx::bound_function<main()::<lambda()> > > > >::<lambda(void*)>::<lambda()>::<lambda(cxs_remote_t&&)>, std::tuple<upcxx::detail::completions_state<upcxx::detail::event_is_remote, upcxx::detail::rput_event_values, upcxx::completions<upcxx::rpc_cx<upcxx::remote_cx_event, upcxx::bound_function<main()::<lambda()> > > >, 0> >, upcxx::detail::index_sequence<0>, true>':
/usr/local/pkg/upcxx-dirac/gcc-10.2.0/nightly-2020.10.14/upcxx.debug.gasnet_seq.smp/include/upcxx/bind.hpp:281:12:   required from 'struct upcxx::detail::deserialized_bound_function<upcxx::detail::copy<upcxx::completions<upcxx::rpc_cx<upcxx::remote_cx_event, upcxx::bound_function<main()::<lambda()> > > > >::<lambda(void*)>::<lambda()>::<lambda(cxs_remote_t&&)>, upcxx::detail::completions_state<upcxx::detail::event_is_remote, upcxx::detail::rput_event_values, upcxx::completions<upcxx::rpc_cx<upcxx::remote_cx_event, upcxx::bound_function<main()::<lambda()> > > >, 0> >'
/usr/local/pkg/upcxx-dirac/gcc-10.2.0/nightly-2020.10.14/upcxx.debug.gasnet_seq.smp/include/upcxx/bind.hpp:301:70:   required from 'struct upcxx::bound_function<upcxx::detail::copy<upcxx::completions<upcxx::rpc_cx<upcxx::remote_cx_event, upcxx::bound_function<main()::<lambda()> > > > >::<lambda(void*)>::<lambda()>::<lambda(cxs_remote_t&&)>, upcxx::detail::completions_state<upcxx::detail::event_is_remote, upcxx::detail::rput_event_values, upcxx::completions<upcxx::rpc_cx<upcxx::remote_cx_event, upcxx::bound_function<main()::<lambda()> > > >, 0> >'
/usr/local/pkg/upcxx-dirac/gcc-10.2.0/nightly-2020.10.14/upcxx.debug.gasnet_seq.smp/include/upcxx/copy.hpp:238:24:   required from 'typename upcxx::detail::completions_returner<upcxx::detail::event_is_here, upcxx::detail::rput_event_values, typename std::decay<_Tp>::type>::return_t upcxx::detail::copy(int, upcxx::intrank_t, void*, int, upcxx::intrank_t, void*, std::size_t, Cxs&&) [with Cxs = upcxx::completions<upcxx::rpc_cx<upcxx::remote_cx_event, upcxx::bound_function<main()::<lambda()> > > >; typename upcxx::detail::completions_returner<upcxx::detail::event_is_here, upcxx::detail::rput_event_values, typename std::decay<_Tp>::type>::return_t = void; upcxx::intrank_t = int; std::size_t = long unsigned int]'
/usr/local/pkg/upcxx-dirac/gcc-10.2.0/nightly-2020.10.14/upcxx.debug.gasnet_seq.smp/include/upcxx/copy.hpp:87:24:   required from 'typename upcxx::detail::completions_returner<upcxx::detail::event_is_here, upcxx::detail::rput_event_values, typename std::decay<Cxs>::type>::return_t upcxx::copy(upcxx::global_ptr<const T, KindSet>, upcxx::global_ptr<T, K>, std::size_t, Cxs&&) [with T = int; upcxx::memory_kind Ks = upcxx::memory_kind::host; upcxx::memory_kind Kd = upcxx::memory_kind::host; Cxs = upcxx::completions<upcxx::rpc_cx<upcxx::remote_cx_event, upcxx::bound_function<main()::<lambda()> > > >; typename upcxx::detail::completions_returner<upcxx::detail::event_is_here, upcxx::detail::rput_event_values, typename std::decay<Cxs>::type>::return_t = void; std::size_t = long unsigned int]'
copy-remote_cx.cpp:14:61:   required from here
/usr/local/pkg/upcxx-dirac/gcc-10.2.0/nightly-2020.10.14/upcxx.debug.gasnet_seq.smp/include/upcxx/bind.hpp:213:7: error: no type named 'type' in 'class std::result_of<upcxx::detail::copy<upcxx::completions<upcxx::rpc_cx<upcxx::remote_cx_event, upcxx::bound_function<main()::<lambda()> > > > >::<lambda(void*)>::<lambda()>::<lambda(cxs_remote_t&&)>&&(upcxx::detail::completions_state<upcxx::detail::event_is_remote, upcxx::detail::rput_event_values, upcxx::completions<upcxx::rpc_cx<upcxx::remote_cx_event, upcxx::detail::deserialized_bound_function<main()::<lambda()> > > >, 0>&&)>'
  213 |       operator()() && {
      |       ^~~~~~~~
/usr/local/pkg/upcxx-dirac/gcc-10.2.0/nightly-2020.10.14/upcxx.debug.gasnet_seq.smp/include/upcxx/bind.hpp: In instantiation of 'struct upcxx::bound_function<upcxx::detail::copy<upcxx::completions<upcxx::rpc_cx<upcxx::remote_cx_event, upcxx::bound_function<main()::<lambda()> > > > >::<lambda(void*)>::<lambda()>::<lambda(cxs_remote_t&&)>, upcxx::detail::completions_state<upcxx::detail::event_is_remote, upcxx::detail::rput_event_values, upcxx::completions<upcxx::rpc_cx<upcxx::remote_cx_event, upcxx::bound_function<main()::<lambda()> > > >, 0> >':
/usr/local/pkg/upcxx-dirac/gcc-10.2.0/nightly-2020.10.14/upcxx.debug.gasnet_seq.smp/include/upcxx/copy.hpp:238:24:   required from 'typename upcxx::detail::completions_returner<upcxx::detail::event_is_here, upcxx::detail::rput_event_values, typename std::decay<_Tp>::type>::return_t upcxx::detail::copy(int, upcxx::intrank_t, void*, int, upcxx::intrank_t, void*, std::size_t, Cxs&&) [with Cxs = upcxx::completions<upcxx::rpc_cx<upcxx::remote_cx_event, upcxx::bound_function<main()::<lambda()> > > >; typename upcxx::detail::completions_returner<upcxx::detail::event_is_here, upcxx::detail::rput_event_values, typename std::decay<_Tp>::type>::return_t = void; upcxx::intrank_t = int; std::size_t = long unsigned int]'
/usr/local/pkg/upcxx-dirac/gcc-10.2.0/nightly-2020.10.14/upcxx.debug.gasnet_seq.smp/include/upcxx/copy.hpp:87:24:   required from 'typename upcxx::detail::completions_returner<upcxx::detail::event_is_here, upcxx::detail::rput_event_values, typename std::decay<Cxs>::type>::return_t upcxx::copy(upcxx::global_ptr<const T, KindSet>, upcxx::global_ptr<T, K>, std::size_t, Cxs&&) [with T = int; upcxx::memory_kind Ks = upcxx::memory_kind::host; upcxx::memory_kind Kd = upcxx::memory_kind::host; Cxs = upcxx::completions<upcxx::rpc_cx<upcxx::remote_cx_event, upcxx::bound_function<main()::<lambda()> > > >; typename upcxx::detail::completions_returner<upcxx::detail::event_is_here, upcxx::detail::rput_event_values, typename std::decay<Cxs>::type>::return_t = void; std::size_t = long unsigned int]'
copy-remote_cx.cpp:14:61:   required from here
/usr/local/pkg/upcxx-dirac/gcc-10.2.0/nightly-2020.10.14/upcxx.debug.gasnet_seq.smp/include/upcxx/bind.hpp:301:70: error: no match for call to '(upcxx::detail::deserialized_bound_function<upcxx::detail::copy<upcxx::completions<upcxx::rpc_cx<upcxx::remote_cx_event, upcxx::bound_function<main()::<lambda()> > > > >::<lambda(void*)>::<lambda()>::<lambda(cxs_remote_t&&)>, upcxx::detail::completions_state<upcxx::detail::event_is_remote, upcxx::detail::rput_event_values, upcxx::completions<upcxx::rpc_cx<upcxx::remote_cx_event, upcxx::bound_function<main()::<lambda()> > > >, 0> >) ()'
  301 |       std::declval<detail::deserialized_bound_function<Fn, B...>&&>()()
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
/usr/local/pkg/upcxx-dirac/gcc-10.2.0/nightly-2020.10.14/upcxx.debug.gasnet_seq.smp/include/upcxx/bind.hpp:303:5: warning: 'decltype (declval<upcxx::detail::deserialized_bound_function<Fn, B ...>&&>()()) upcxx::bound_function<Fn, B>::operator()() && [with Fn = main()::<lambda()>; B = {}]' used but never defined
  303 |     operator()() &&;
      |     ^~~~~~~~

This was not noticed until now because we have no automated coverage of remote_cx applied to copy() (yet).

We need to quickly triage this problem and make a decision to either fix it or determine what to rollback for the release. Based on git history, pull request #279 seems the very likely root cause - it's the only modification to copy.hpp during the interval in question and it changed the type of the cxs_remote variable which appears to be what broke things.

Comments (18)

  1. Amir Kamil

    This compilation failure was introduced by PR #268.

    However, if my understanding is correct, the root issue is that remote-completion handling in copy() is broken in several ways, making assumptions that are invalid:

    • In the first case (upcxx::rank_me() != rank_d && upcxx::rank_me() != rank_s), it captures cxs_remote by value in a lambda, then sends that lambda out over the wire. This only works for TriviallySerializable remote completions.
    • In the second case and third cases (upcxx::rank_me() == rank_d), it directly invokes the bound function inside of cxs_remote. As of PR #268, this does not link, but more fundamentally, it violates the semantics of RPC completion which require the arguments to be serialized an deserialized.
    • In the fourth case (upcxx::rank_me() == rank_s && upcxx::rank_me() != rank_d), it makes copies (by lambda capture) of the deserialized remote completion. (Currently, the type is cxs_remote_t, but it should be deserialized_type_t<cxs_remote_t>.) This is not valid in general, as there is no requirement that the deserialized arguments to as_rpc() by copyable.

    Here’s what I think is needed to fix these cases:

    • For the first case, use upcxx::bind() to bind cxs_remote, as is currently done in the fourth case.
    • For the second and third cases, use serialization_traits<cxs_remote_t>::deserialized_value([cxs_remote,*cxs_remote_heaped]).
    • For the first and fourth cases, heap the deserialized cxs_remote and capture the pointer instead.

    Here’s the diff:

    diff --git a/src/copy.hpp b/src/copy.hpp
    index e795e827..7d5e1045 100644
    --- a/src/copy.hpp
    +++ b/src/copy.hpp
    @@ -128,14 +128,17 @@ namespace upcxx {
    
           backend::send_am_master<progress_level::internal>(
             upcxx::world(), rank_d,
    -        [=]() {
    +        upcxx::bind([=](deserialized_type_t<cxs_remote_t> &&cxs_remote) {
               auto operation_cx_as_internal_future = upcxx::completions<upcxx::future_cx<upcxx::operation_cx_event, progress_level::internal>>{{}};
    +          deserialized_type_t<cxs_remote_t> *cxs_remote_heaped =
    +            new deserialized_type_t<cxs_remote_t>(std::move(cxs_remote));
    
               detail::copy( dev_s, rank_s, buf_s,
                             dev_d, rank_d, buf_d,
                             size, operation_cx_as_internal_future )
               .then([=]() {
    -            const_cast<cxs_remote_t&>(cxs_remote).template operator()<remote_cx_event>();
    +            cxs_remote_heaped->template operator()<remote_cx_event>();
    +            delete cxs_remote_heaped;
    
                 backend::send_am_persona<progress_level::internal>(
                   upcxx::world(), initiator, initiator_per,
    @@ -146,7 +149,7 @@ namespace upcxx {
                   }
                 );
               });
    -        }
    +        }, std::move(cxs_remote))
           );
         }
         else if(rank_d == rank_s) {
    @@ -154,7 +157,7 @@ namespace upcxx {
             cuda::make_event_cb([=]() {
               cxs_here->template operator()<source_cx_event>();
               cxs_here->template operator()<operation_cx_event>();
    -          const_cast<cxs_remote_t&>(cxs_remote).template operator()<remote_cx_event>();
    +          serialization_traits<cxs_remote_t>::deserialized_value(cxs_remote).template operator()<remote_cx_event>();
               delete cxs_here;
             })
           );
    @@ -191,7 +194,7 @@ namespace upcxx {
                             if(dev_d != host_device)
                               backend::gasnet::deallocate(bounce_d, &backend::gasnet::sheap_footprint_rdzv);
    
    -                        cxs_remote_heaped->template operator()<remote_cx_event>();
    +                        serialization_traits<cxs_remote_t>::deserialized_value(*cxs_remote_heaped).template operator()<remote_cx_event>();
                             cxs_here->template operator()<operation_cx_event>();
                             delete cxs_remote_heaped;
                             delete cxs_here;
    @@ -236,8 +239,10 @@ namespace upcxx {
               backend::send_am_master<progress_level::internal>(
                 upcxx::world(), rank_d,
                 upcxx::bind(
    -              [=](cxs_remote_t &&cxs_remote) {
    +              [=](deserialized_type_t<cxs_remote_t> &&cxs_remote) {
                     void *bounce_d = dev_d == host_device ? buf_d : backend::gasnet::allocate(size, 64, &backend::gasnet::sheap_footprint_rdzv);
    +                deserialized_type_t<cxs_remote_t> *cxs_remote_heaped =
    +                  new deserialized_type_t<cxs_remote_t>(std::move(cxs_remote));
    
                     detail::rma_copy_get(bounce_d, rank_s, bounce_s, size,
                       backend::gasnet::make_handle_cb([=]() {
    @@ -245,7 +250,8 @@ namespace upcxx {
                           if(dev_d != host_device)
                             backend::gasnet::deallocate(bounce_d, &backend::gasnet::sheap_footprint_rdzv);
    
    -                      const_cast<cxs_remote_t&>(cxs_remote).template operator()<remote_cx_event>();
    +                      cxs_remote_heaped->template operator()<remote_cx_event>();
    +                      delete cxs_remote_heaped;
    
                           backend::send_am_persona<progress_level::internal>(
                             upcxx::world(), rank_s, initiator_per,
    

  2. Dan Bonachea reporter

    @Amir Kamil - I agree the current code is "doing it wrong". Clearly we should always be serializing and deserializing the as_rpc argument before invocation, even for the case that it happens to run locally - so I fully agree with my understanding of your changes, and that they probably solve a second problem I had not noticed.

    However I think the new test code introduced in pull request #289 to evaluate/trigger this secondary issue is flawed. Let's iterate on that in the PR, but regardless the copy.hpp fixes appear to be what we need.

  3. Dan Bonachea reporter

    PR 289 was merged to the memory_kinds branch, this defect remains on develop (and in forthcoming 2020.10.0).

    As documented and triaged in pull request #289, the currently best-known solution had the unpleasant side effect of breaking most/all use of upcxx::copy() with the PGI optimizer for reasons we don't yet fully understand.

    We hope to find a better resolution to this problem post-release. In the meantime, GPU users are advised to favor the forthcoming memory_kinds prototype (for this and other reasons).

  4. Dan Bonachea reporter

    Last night's results on EX-summit-ibv-pgi-opt for the memory_kinds branch @ 5cec52e post-merge of PR 289 show the expected opt-mode regression on the following tests:

    • cannon_cuda
    • copy
    • cuda_microbenchmark
    • cuda_vecadd
    • h-d
    • jac3d

    Some of these failures had not been seen before (likely because they are not part of dev-check), but they all invoke upcxx::copy() and are thus subject to the known problem. The merge of PR 289 is the only significant change on this branch since these tests all passed on 10-15 @ 322ffe7.

  5. Paul Hargrove

    I see what I will assume are the same opt-only crashes with the 20.11 release of the PGI/NVIDIA compilers on at least copy-cover.cpp, issue405.cpp and issue421.cpp in a run of make dev-tests.

  6. Dan Bonachea reporter

    pull request 311 "restores" PGI support by banning the use of upcxx::copy() when using PGI.

    NOT a solution to this issue, just an imperfect compromise in the interests of expediency to avoid stalling unrelated development.

  7. Dan Bonachea reporter

    Merge branch 'mk-develop' into develop

    All conflicts are resolved in favor of mk-develop, making the result textually identical to the last commit on mk-develop.

    conflicts:

    • .gitlab-ci.yml - replaced with new out-of-tree link
    • ChangeLog - removed a duplicate addition of 2020.3.2, otherwise trivial
    • bench/cuda_microbenchmark.cpp - minor conflict due to a bugfix on develop and a rewrite on mk-develop (kept the latter).
    • docs/spec.pdf - Revision 2020.11.0-draft is kept as latest
    • src/backend/gasnet/runtime_internal.hpp - trivial
    • src/version.hpp - trivial
    • utils/upcxx-run - trivial

    • mk-develop: src/cuda.cpp: Fix a harmless warning from PGI 20.4 bld/tests.mak: Ban upcxx::copy tests on PGI, avoiding issue 421 Update ChangeLog issue 421: Disable use of upcxx::copy with PGI Revert "INSTALL.md: Prohibit PGI compilers for memory_kinds branch" Revert "Prohibit PGI on the memory_kinds branch" re-enable test/regression/issue336.cpp for dev-check test/shared-seg-query: fix a harmless pedantic sign-compare warning Bump patch number to 2020.11.5 Update ChangeLog Add shared segment query test issue #382: Expose shared heap usage at runtime build-devel.md: update ALCF server URL issue440.cpp: Fix a harmless -pedantic warning Update ChangeLog issue #440: Deploy a temporary solution for 'Invalid GASNet call' in deserialization strengthen the issue440.cpp test Add issue440 test Add GitLab CI information to developer's docs issue #438: Report degenerate local_team in upcxx-run -vv verbose output Bump copyright year mpi-hybrid.md: Update udp-conduit recommended CSPAWN settings fixup: CI testing: add probe for OpenMP support Move GitLab CI and dev-ci scripts to a new repo Update ChangeLog configure: default to 8kb max-medium on aries-conduit dev-check: work-arounds for known issues CI testing: add probe for OpenMP support GitLab CI: renames to get junit/xml reporting GitLab CI: enable junit/xml report generation Update ChangeLog Add issue #432 test issue #432: Some upcxx::copy cases do not discharge() properly aries: cap default UPCXX_RPC_EAGER_THRESHOLD to 8kb Bump patch number to 2020.11.3 Update ChangeLog runtime.cpp: ensure max_medium calc accounts for GEX_FLAG_AM_PREPARE_LEAST_ALLOC rma_put_then_am_master_protocol(): Fix gex_AM_MaxRequestMedium arg count UPCXX_RPC_EAGER_THRESHOLD tweaks Update spelling for GASNET_NATIVE_NP_ALLOC_REQ_MEDIUM bench/misc_perf: expand rpc tests to cover the internal code paths Statically optimize am_send_buffer for NPAM Repair am_send_buffer tiny buffer optimization Add NPAM support for Eager RPC injection Add post-configure check for conduit native NPAM Add UPCXX_RPC_EAGER_THRESHOLD_LOCAL to control local threshold Upgrade prepare_am with local threshold and NPAM prereqs issue #261: tweak gasnet::rma_put_then_am_master runtime.hpp: Remove dead code that is no longer maintained RPC: Hoist TM/rank translation overheads rpc.hpp: "Invert" tm/no-tm overloads of rpc/rpc_ff bench/misc_perf: Add test of rput-then-rpc Change UPCXX_RPC_EAGER_THRESHOLD default to the maximum value issue #164: Add UPCXX_RPC_EAGER_THRESHOLD knob Add an RPC microbenchmark Update ChangeLog move remaining upcxx code in to separate header file Move UPC++ dl_malloc defines into separate header file Complete dlmalloc function list Name shift UPC++'s dlmalloc symbols Fix dev-check when configured using --without-cuda tests.mak: Add bench directory to dev-check Add issue428 test Update ChangeLog issue #428: Regression in rpc(team,rank,..,view) overload resolution leads to confusing type errors ChangeLog: spell-check issue #420: Suppress harmless GASNet debug-mode configure warning GitLab CI / CI scripts: de-version macOS bits Remove the deprecated top-level install script GitLab CI: Resume testing of Theta PrgEnv-cray Resolve issue #425 copy-cover: Fix harmless sign-compare warnings from clang 11 cuda.hpp: Fix a harmless warning from clang 11 -Wall Revert "Set GASNet 2020.11.0 tarball URL" Bump version to 2020.11.1 for mk-develop Bump package version for 2020.11.0 release Set GASNet 2020.11.0 tarball URL Merge 2020.3.2 section into ChangeLog Add PDFs for 2020.11.0 spec and 2020.10.0 guide INSTALL.md: Expand instructions for GDR memory kinds INSTALL.md: Prohibit PGI compilers for memory_kinds branch Add prototype branch notices to major documents Update ChangeLog syncx3 updated examples from guide re-sync updated examples from guide sync updated examples from guide Change configuration of private segment tests Add support for using private segment for host array Fix order of arguments being passed Add host to host transfers Use promises rather than futures in flood benchmark Local GPU and remote host tests Prohibit PGI on the memory_kinds branch Makefile: Fix Git version stamp detection copy-cover: Exercise remote-only completion events copy-cover: Add source completion validation copy-cover: sweep buf sizes fixupx2: Add new copy-cover test fixup: Implement UPCXX_BUG4148_WORKAROUND fixup: Add new copy-cover test Implement UPCXX_BUG4148_WORKAROUND and RC-only opt for puts copy.hpp: factor initiator variable upcxx::copy: Tune use_mk for no RC fixup: issue #221: Fix upcxx::copy() mishandling of private memory arguments GitLab CI: PGI testers opt-in only (issue #421) Upgrade global_ptr::check() to bounds-check device pointers Change device_allocator to throw upcxx::bad_segment_alloc device_allocator construct: prohibit empty active segments Add bad-segment-alloc test to validate device_allocator exn behavior device_allocator: fix undocumented device memory allocation Fix delete of array in view-accumulate.cpp to delete[] example/prog-guide: apply updates from guide repo cuda_microbenchmark: correct windowing logic Less specific GitLab tags for Xcode runners. correct an assert typo appearing in multiple functions Increase max_heaps in the reference implementation to 33 Avoid issue 423 for upcxx::copy() Deploy GPUDirect transfers in upcxx::copy Construct GEX EP/MK/Segment objects Implement heap_idx recycling Add negative compilation tests copy: assert additional preconditions test/regression/issue405: fix a violated copy() precondition upcxx::copy()/detail::rma_copy_local() use memcpy copy-cover: exploit fix to issue421 Add new copy-cover test issue #221: Fix upcxx::copy() mishandling of private memory arguments Change default --with-gasnet URL to memory_kinds nightly snapshot Split issue 421 test into two, one with lambdas and another with an asymmetric fuction object. Fix issue #421. Add issue421 Ban PGI 20.7+ due to known bugs fixup: Bump required GEX spec version to 0.10 and remove cruft test/copy: update for new collective requirement on device_allocator construction Add collective device_allocator creation Add a high-level coverage test for memory kinds fix an unreported bug in cuda_device move constructor fix unreported bug in device_allocator template inheritance Improve heap_state safety Probe for GASNET_MAXEPS Rework internal memory kinds objects global_ptr: Rename device_ field to heap_idx_

    → <<cset 240d2d069d40>>

  8. Dan Bonachea reporter
    • changed status to open

    This was accidentally/automatically resolved - this problem remains, breaking the PGI optimizer

  9. Dan Bonachea reporter

    Fix issue #421: PGI optimizer problem with upcxx::copy()

    This commit converts a lambda capture of a remote completion on the upcxx::copy() loopback path into explicit passing through a heap location (as done along other paths).

    This change is observed to resolve the stack corruption issues seen on dirac/x86_64/linux/{smp,udp} with PGI 20.4/opt.

    → <<cset ee6311365ce8>>

  10. Log in to comment