valgrind reports UPC++ runtime sends uninitialized bytes over AM

Issue #377 new
Dan Bonachea created an issue

I'm working on GASNet PR #355 that makes GEX more valgrind-friendly, at least for smp and udp conduits.

In the course of doing so, I've been running valgrind on UPC++ programs as a test client (and also as work on pull request #221). As part of this work, I've eliminated most of the obvious/major memory leaks within our UPC++ tests, which were sometimes not careful about freeing heap resources.

However I find that when running valgrind on distributed-memory (udp) programs, the UPC++ runtime is often triggerring the valgrind warning shown below during AMs. These warnings indicate the runtime is sometimes sending uninitialized bytes as AM arguments or AM payload. These uninitialized bytes are reported when they reach the socket system call level for udp-conduit. Note that PSHM-bypass AMs (including all of smp-conduit) will never report this type of error, because those are handled entirely in shared memory without system calls so valgrind doesn't "see" the uninitialized data leaving the process.

This type of error might represent something benign like sending internal struct padding, or something more serious like sending trailing garbage bytes (potentially wasting bandwidth for no reason). The --track-origins=yes option can be used to attribute the uninitialized values to exact stack frames in the runtime, but I think only @john bachan can decipher whether any of these are a cause for concern.

Here are some examples from dirac/udp/gcc-10/debug/valgrind:

barrier.cpp:

==31998== Syscall param socketcall.sendto(msg) points to uninitialised byte(s)
==31998==    at 0x5A25993: __sendto_nocancel (syscall-template.S:81)
==31998==    by 0x720219: sendPacket(amudp_ep*, amudp_msg_t*, unsigned long, sockaddr_in, packet_type) (amudp_reqrep.cpp:91)
==31998==    by 0x726BF9: AMUDP_RequestGeneric(amudp_category_t, amudp_ep*, unsigned int, unsigned char, void*, unsigned long, unsigned long, int, __va_list_tag*, unsigned char, unsigned char) (amudp_reqrep.cpp:1040)
==31998==    by 0x727DA3: AMUDP_RequestIVA (amudp_reqrep.cpp:1218)
==31998==    by 0x492E11: gasnetc_AMRequestMedium (gasnet_core.c:843)
==31998==    by 0x493768: gasnetc_AMRequestMediumM (gasnet_core.c:875)
==31998==    by 0x41DABA: upcxx::backend::gasnet::send_am_eager_master(upcxx::progress_level, upcxx::team const&, int, void*, unsigned long, unsigned long) (runtime.cpp:1259)
==31998==    by 0x40B980: void upcxx::backend::send_prepared_am_master<upcxx::backend::gasnet::am_send_buffer<upcxx::storage_size<40ul, 8ul>, true> >(upcxx::progress_level, upcxx::team const&, int, upcxx::backend::gasnet::am_send_buffer<upcxx::storage_size<40ul, 8ul>, true>&&) (runtime.hpp:421)
==31998==    by 0x40621F: _ZN5upcxx7backend14send_am_masterILNS_14progress_levelE1ENS_14bound_functionIZNS_6detail3rpcINS_11completionsIJNS_9future_cxINS_18operation_cx_eventELS2_1EEEEEEOZ4mainEUlvE_JEEENS4_10rpc_returnIFT0_DpT1_ET_NS4_18rpc_remote_resultsISH_vE4typeEE4typeERKNS_4teamEiSI_OSE_DpOSF_EUlONS3_ISB_JEEEE_JSU_EEEEEvSQ_iSR_ (runtime.hpp:455)
==31998==    by 0x405F55: upcxx::detail::rpc_return<main::{lambda()#1}&& (), upcxx::completions<upcxx::future_cx<upcxx::operation_cx_event, (upcxx::progress_level)1> >, upcxx::detail::rpc_remote_results<, void>::type>::type upcxx::detail::rpc<upcxx::completions<upcxx::future_cx<upcxx::operation_cx_event, (upcxx::progress_level)1> >, main::{lambda()#1}&&>(upcxx::team const&, int, main::{lambda()#1}&& (), upcxx::detail::rpc_return&&, (main::{lambda()#1}&&)...) (rpc.hpp:230)
==31998==    by 0x405D8F: upcxx::detail::rpc_return<main::{lambda()#1} (), upcxx::completions<upcxx::future_cx<upcxx::operation_cx_event, (upcxx::progress_level)1> >, main::{lambda()#1}::rpc_remote_results<, void>::type>::type upcxx::rpc<main::{lambda()#1}>(int, upcxx::detail::rpc_return&&, (main::{lambda()#1}&&)...) (rpc.hpp:298)
==31998==    by 0x405970: main (barrier.cpp:30)
==31998==  Address 0x5faf5ec is 140 bytes inside a block of size 192 alloc'd
==31998==    at 0x4C29F33: malloc (vg_replace_malloc.c:307)
==31998==    by 0x6A4D15: _gasneti_malloc_inner (gasnet_internal.c:2122)
==31998==    by 0x6A5298: _gasneti_malloc (gasnet_internal.c:2185)
==31998==    by 0x6A5B4F: _gasneti_extern_malloc (gasnet_internal.c:2298)
==31998==    by 0x71962A: AMUDP_AcquireBuffer (amudp_ep.cpp:110)
==31998==    by 0x720706: AMUDP_DrainNetwork(amudp_ep*) (amudp_reqrep.cpp:214)
==31998==    by 0x725B6F: AMUDP_ServiceIncomingMessages(amudp_ep*) (amudp_reqrep.cpp:829)
==31998==    by 0x725F97: AM_Poll (amudp_reqrep.cpp:886)
==31998==    by 0x491DB7: gasnetc_AMPoll (gasnet_core.c:771)
==31998==    by 0x49A36F: _gasneti_AMPoll (gasnet_help.h:1012)
==31998==    by 0x4B85F7: gasnete_amdbarrier_wait (gasnet_extended_refbarrier.c:996)
==31998==    by 0x4BC18D: gasnete_barrier_default (gasnet_extended_refbarrier.c:2160)
==31998==  Uninitialised value was created by a stack allocation
==31998==    at 0x405E80: upcxx::detail::rpc_return<main::{lambda()#1}&& (), upcxx::completions<upcxx::future_cx<upcxx::operation_cx_event, (upcxx::progress_level)1> >, upcxx::detail::rpc_remote_results<, void>::type>::type upcxx::detail::rpc<upcxx::completions<upcxx::future_cx<upcxx::operation_cx_event, (upcxx::progress_level)1> >, main::{lambda()#1}&&>(upcxx::team const&, int, main::{lambda()#1}&& (), upcxx::detail::rpc_return&&, (main::{lambda()#1}&&)...) (rpc.hpp:199)

rput.cpp: (excerpt)

==655== Syscall param socketcall.sendto(msg) points to uninitialised byte(s)
==655==    at 0x5A25993: __sendto_nocancel (syscall-template.S:81)
==655==    by 0x729AC4: sendPacket(amudp_ep*, amudp_msg_t*, unsigned long, sockaddr_in, packet_type) (amudp_reqrep.cpp:91)
==655==    by 0x7304A4: AMUDP_RequestGeneric(amudp_category_t, amudp_ep*, unsigned int, unsigned char, void*, unsigned long, unsigned long, int, __va_list_tag*, unsigned char, unsigned char) (amudp_reqrep.cpp:1040)
==655==    by 0x73164E: AMUDP_RequestIVA (amudp_reqrep.cpp:1218)
==655==    by 0x49C113: gasnetc_AMRequestMedium (gasnet_core.c:843)
==655==    by 0x49CA86: gasnetc_AMRequestMediumM (gasnet_core.c:875)
==655==    by 0x426CAA: upcxx::backend::gasnet::send_am_eager_master(upcxx::progress_level, upcxx::team const&, int, void*, unsigned long, unsigned long) (runtime.cpp:1259)
==655==    by 0x4121FA: void upcxx::backend::send_prepared_am_master<upcxx::backend::gasnet::am_send_buffer<upcxx::storage_size<24ul, 8ul>, true> >(upcxx::progress_level, upcxx::team const&, int, upcxx::backend::gasnet::am_send_buffer<upcxx::storage_size<24ul, 8ul>, true>&&) (runtime.hpp:421)
==655==    by 0x407473: _ZN5upcxx7backend14send_am_masterILNS_14progress_levelE1ENS_14bound_functionIZNS_6detail3rpcINS_11completionsIJNS_9future_cxINS_18operation_cx_eventELS2_1EEEEEEOZ4mainEUlvE_JEEENS4_10rpc_returnIFT0_DpT1_ET_NS4_18rpc_remote_resultsISH_vE4typeEE4typeERKNS_4teamEiSI_OSE_DpOSF_EUlONS3_ISB_JEEEE_JSU_EEEEEvSQ_iSR_ (runtime.hpp:455)
==655==    by 0x406818: upcxx::detail::rpc_return<main::{lambda()#1}&& (), upcxx::completions<upcxx::future_cx<upcxx::operation_cx_event, (upcxx::progress_level)1> >, upcxx::detail::rpc_remote_results<, void>::type>::type upcxx::detail::rpc<upcxx::completions<upcxx::future_cx<upcxx::operation_cx_event, (upcxx::progress_level)1> >, main::{lambda()#1}&&>(upcxx::team const&, int, main::{lambda()#1}&& (), upcxx::detail::rpc_return&&, (main::{lambda()#1}&&)...) (rpc.hpp:230)
==655==    by 0x405E74: upcxx::detail::rpc_return<main::{lambda()#1} (), upcxx::completions<upcxx::future_cx<upcxx::operation_cx_event, (upcxx::progress_level)1> >, main::{lambda()#1}::rpc_remote_results<, void>::type>::type upcxx::rpc<main::{lambda()#1}>(int, upcxx::detail::rpc_return&&, (main::{lambda()#1}&&)...) (rpc.hpp:298)
==655==    by 0x405BE2: main (rput.cpp:44)
==655==  Address 0x5faf5ec is 140 bytes inside a block of size 192 alloc'd
==655==    at 0x4C29F33: malloc (vg_replace_malloc.c:307)
==655==    by 0x6AE4A8: _gasneti_malloc_inner (gasnet_internal.c:2122)
==655==    by 0x6AEA2B: _gasneti_malloc (gasnet_internal.c:2185)
==655==    by 0x6AF2E2: _gasneti_extern_malloc (gasnet_internal.c:2298)
==655==    by 0x722ED5: AMUDP_AcquireBuffer (amudp_ep.cpp:110)
==655==    by 0x729FB1: AMUDP_DrainNetwork(amudp_ep*) (amudp_reqrep.cpp:214)
==655==    by 0x72F41A: AMUDP_ServiceIncomingMessages(amudp_ep*) (amudp_reqrep.cpp:829)
==655==    by 0x72F842: AM_Poll (amudp_reqrep.cpp:886)
==655==    by 0x49B09D: gasnetc_AMPoll (gasnet_core.c:771)
==655==    by 0x4A37A5: _gasneti_AMPoll (gasnet_help.h:1012)
==655==    by 0x4C1A2D: gasnete_amdbarrier_wait (gasnet_extended_refbarrier.c:996)
==655==    by 0x4C55C3: gasnete_barrier_default (gasnet_extended_refbarrier.c:2160)
==655==  Uninitialised value was created by a stack allocation
==655==    at 0x406771: upcxx::detail::rpc_return<main::{lambda()#1}&& (), upcxx::completions<upcxx::future_cx<upcxx::operation_cx_event, (upcxx::progress_level)1> >, upcxx::detail::rpc_remote_results<, void>::type>::type upcxx::detail::rpc<upcxx::completions<upcxx::future_cx<upcxx::operation_cx_event, (upcxx::progress_level)1> >, main::{lambda()#1}&&>(upcxx::team const&, int, main::{lambda()#1}&& (), upcxx::detail::rpc_return&&, (main::{lambda()#1}&&)...) (rpc.hpp:199)

==654== Syscall param socketcall.sendto(msg) points to uninitialised byte(s)
==654==    at 0x5A25993: __sendto_nocancel (syscall-template.S:81)
==654==    by 0x729AC4: sendPacket(amudp_ep*, amudp_msg_t*, unsigned long, sockaddr_in, packet_type) (amudp_reqrep.cpp:91)
==654==    by 0x7304A4: AMUDP_RequestGeneric(amudp_category_t, amudp_ep*, unsigned int, unsigned char, void*, unsigned long, unsigned long, int, __va_list_tag*, unsigned char, unsigned char) (amudp_reqrep.cpp:1040)
==654==    by 0x731D6B: AMUDP_RequestXferVA (amudp_reqrep.cpp:1284)
==654==    by 0x49CE86: gasnetc_AMRequestLong (gasnet_core.c:903)
==654==    by 0x49D813: gasnetc_AMRequestLongM (gasnet_core.c:937)
==654==    by 0x4320D8: upcxx::backend::gasnet::rma_put_then_am_sync upcxx::backend::gasnet::rma_put_then_am_master_protocol<(upcxx::backend::gasnet::rma_put_then_am_sync)0, true>(upcxx::team const&, int, void*, void const*, unsigned long, upcxx::progress_level, void*, unsigned long, unsigned long, upcxx::backend::gasnet::handle_cb*, upcxx::backend::gasnet::reply_cb*) (runtime.cpp:1994)
==654==    by 0x407911: upcxx::backend::gasnet::rma_put_then_am_sync upcxx::backend::gasnet::rma_put_then_am_master<(upcxx::backend::gasnet::rma_put_then_am_sync)0, upcxx::bound_function<upcxx::detail::completions_state<upcxx::detail::event_is_remote, upcxx::detail::rput_event_values, upcxx::completions<upcxx::future_cx<upcxx::operation_cx_event, (upcxx::progress_level)1>, upcxx::lpc_cx<upcxx::source_cx_event, main::{lambda()#2}>, upcxx::rpc_cx<upcxx::remote_cx_event, upcxx::bound_function<main::{lambda()#3}> > >, 0>::event_bound<upcxx::rpc_cx>, upcxx::rpc_cx<upcxx::remote_cx_event, upcxx::bound_function<main::{lambda()#3}> > > >(upcxx::team const&, int, void*, void const*, unsigned long, upcxx::progress_level, upcxx::bound_function<upcxx::detail::completions_state<upcxx::detail::event_is_remote, upcxx::detail::rput_event_values, upcxx::completions<upcxx::future_cx<upcxx::operation_cx_event, (upcxx::progress_level)1>, upcxx::lpc_cx<upcxx::source_cx_event, main::{lambda()#2}>, upcxx::rpc_cx<upcxx::remote_cx_event, upcxx::bound_function<main::{lambda()#3}> > >, 0>::event_bound<upcxx::rpc_cx>, upcxx::rpc_cx<upcxx::remote_cx_event, upcxx::bound_function<main::{lambda()#3}> > >&&, upcxx::backend::gasnet::handle_cb*, upcxx::backend::gasnet::reply_cb*) (runtime.hpp:680)
==654==    by 0x406D8F: upcxx::detail::rma_put_sync upcxx::detail::rput_obj_base<upcxx::detail::rput_obj<upcxx::completions<upcxx::future_cx<upcxx::operation_cx_event, (upcxx::progress_level)1>, upcxx::lpc_cx<upcxx::source_cx_event, main::{lambda()#2}>, upcxx::rpc_cx<upcxx::remote_cx_event, upcxx::bound_function<main::{lambda()#3}> > >, upcxx::detail::rput_traits<upcxx::bound_function<main::{lambda()#3}>, false> >, upcxx::completions<upcxx::future_cx<upcxx::operation_cx_event, (upcxx::progress_level)1>, upcxx::lpc_cx<upcxx::source_cx_event, main::{lambda()#2}>, upcxx::rpc_cx<upcxx::remote_cx_event, upcxx::bound_function<main::{lambda()#3}> > >, true, true, false, true>::inject<upcxx::remote_cx_event<upcxx::detail::completions_state<upcxx::detail::event_is_remote, upcxx::detail::rput_event_values, upcxx::bound_function<main::{lambda()#3}>, 0>::event_bound<upcxx::rpc_cx>, upcxx::detail::event_is_remote> >(int, void*, void const*, unsigned long, upcxx::remote_cx_event<upcxx::detail::completions_state<upcxx::detail::event_is_remote, upcxx::detail::rput_event_values, upcxx::bound_function<main::{lambda()#3}>, 0>::event_bound<upcxx::rpc_cx>, upcxx::detail::event_is_remote>&&) (rput.hpp:235)
==654==    by 0x4061FD: upcxx::detail::rput_traits<upcxx::completions<upcxx::future_cx<upcxx::operation_cx_event, (upcxx::progress_level)1>, upcxx::lpc_cx<upcxx::source_cx_event, main::{lambda()#2}>, upcxx::rpc_cx<upcxx::remote_cx_event, upcxx::bound_function<main::{lambda()#3}> > >, false>::return_t upcxx::rput<int, upcxx::completions<upcxx::future_cx<upcxx::operation_cx_event, (upcxx::progress_level)1>, upcxx::lpc_cx<upcxx::source_cx_event, main::{lambda()#2}>, upcxx::rpc_cx<upcxx::remote_cx_event, upcxx::bound_function<main::{lambda()#3}> > > >(int const*, upcxx::global_ptr<upcxx::detail::rput_traits<upcxx::completions<upcxx::future_cx<upcxx::operation_cx_event, (upcxx::progress_level)1>, upcxx::lpc_cx<upcxx::source_cx_event, main::{lambda()#2}>, upcxx::rpc_cx<upcxx::remote_cx_event, upcxx::bound_function<main::{lambda()#3}> > >, false>, (upcxx::memory_kind)1>, unsigned long, upcxx::detail) (rput.hpp:482)
==654==    by 0x405CEE: main (rput.cpp:58)
==654==  Address 0x5fb452c is 140 bytes inside a block of size 65,192 alloc'd
==654==    at 0x4C29F33: malloc (vg_replace_malloc.c:307)
==654==    by 0x6AE4A8: _gasneti_malloc_inner (gasnet_internal.c:2122)
==654==    by 0x6AEA2B: _gasneti_malloc (gasnet_internal.c:2185)
==654==    by 0x6AF2E2: _gasneti_extern_malloc (gasnet_internal.c:2298)
==654==    by 0x722ED5: AMUDP_AcquireBuffer (amudp_ep.cpp:110)
==654==    by 0x729FB1: AMUDP_DrainNetwork(amudp_ep*) (amudp_reqrep.cpp:214)
==654==    by 0x72F41A: AMUDP_ServiceIncomingMessages(amudp_ep*) (amudp_reqrep.cpp:829)
==654==    by 0x72F842: AM_Poll (amudp_reqrep.cpp:886)
==654==    by 0x72FACC: AMUDP_RequestGeneric(amudp_category_t, amudp_ep*, unsigned int, unsigned char, void*, unsigned long, unsigned long, int, __va_list_tag*, unsigned char, unsigned char) (amudp_reqrep.cpp:933)
==654==    by 0x731D6B: AMUDP_RequestXferVA (amudp_reqrep.cpp:1284)
==654==    by 0x49CE86: gasnetc_AMRequestLong (gasnet_core.c:903)
==654==    by 0x49D813: gasnetc_AMRequestLongM (gasnet_core.c:937)
==654==  Uninitialised value was created by a stack allocation
==654==    at 0x431DF2: upcxx::backend::gasnet::rma_put_then_am_sync upcxx::backend::gasnet::rma_put_then_am_master_protocol<(upcxx::backend::gasnet::rma_put_then_am_sync)0, true>(upcxx::team const&, int, void*, void const*, unsigned long, upcxx::progress_level, void*, unsigned long, unsigned long, upcxx::backend::gasnet::handle_cb*, upcxx::backend::gasnet::reply_cb*) (runtime.cpp:1953)

In reading these stacks, keep in mind that valgrind is reporting the internal AMUDP buffer that is used to send the uninitialized data (copied from client arguments) into the socket system call, but the higher levels of the stack show the frames responsible for injecting the offending AM call with arguments containing uninitialized bytes. The final attribution line in each entry shows the function where the uninitialized bytes originated (and were subsequently copied around).

These errors unfortunately mean that distributed-memory UPC++ runs over valgrind will always be "spammy", potentially obscuring real application-level errors. It might be possible to blacklist these errors in a valgrind suppression file, but based on my understanding such a pattern would also block real errors where an application program inadvertently sent uninitialized bytes via upcxx::rpc.

We should investigate the source of these uninitialized bytes, and consider workarounds. For example, we could have an --enable-valgrind mode directing the runtime to clear struct padding before sending it via AM, to avoid triggering these errors (at some cost in performance, so off by default).

Comments (5)

  1. Dan Bonachea reporter

    I'm now relatively confident the most common sources of these warnings are internal structure padding for trivially serializable types, and alignment padding inserted by serialization. It's conceivable we might do something about the latter, but the former problem is fundamental and not something we can solve as a library (i.e. we are responsible for faithfully sending all the bytes in user-provided TriviallySerializable objects, even if some of them happen to be uninitialized internal structure padding).

  2. Log in to comment