invalid inputs to std::aligned_storage, std::aligned_alloc and posix_memalign are non-portable

Issue #246 resolved
Paul Hargrove created an issue

I am working on determining the "floor" for PGI support.
In testing pgi/18.10 on Dirac I am seeing the assertion failures below.

@john bachan can you have a look at the corresponding source and let me know if you think these indicate a genuine problem with the compiler (vs the possibility the code is making in improper assumption)?

{rpc_barrier,view}-{seq,par}, view-par,

With 8 ranks on 2 nodes yielded the following:

*** FATAL ERROR (proc 4):
//////////////////////////////////////////////////
UPC++ assertion failure:
 rank=4
 file=/home/data2/upcnightly/dirac/EX-dirac-ibv-pgi_old/work/dbg/upcxx/.nobs/art/6e4f8906d31acfdeaea319f8183a046b816c2b96/upcxx/serialization.hpp:385

Failed condition: detail::is_aligned(initial_buf, serialization_align_max)

To have UPC++ freeze during these errors so you can attach a debugger, rerun the program with GASNET_FREEZE_ON_ERROR=1 in the environment.
//////////////////////////////////////////////////

view-histogram{1,2}-{seq,par}

With 8 ranks on 2 nodes yielded the following which is the same file/line, except the installed copy of the header

*** FATAL ERROR (proc 0):
//////////////////////////////////////////////////
UPC++ assertion failure:
 rank=0
 file=/home/data2/upcnightly/dirac/EX-dirac-ibv-pgi_old/work/dbg/upcxx-inst-gasnet_seq/include/upcxx/serialization.hpp:385

Failed condition: detail::is_aligned(initial_buf, serialization_align_max)

To have UPC++ freeze during these errors so you can attach a debugger, rerun the program with GASNET_FREEZE_ON_ERROR=1 in the environment.
//////////////////////////////////////////////////

Comments (5)

  1. Dan Bonachea

    The following code in looks like it relies upon undefined behavior, according to C++11:

    serialization.hpp:          constexpr std::uintptr_t serialization_align_max = 64;
    backend/gasnet/runtime.hpp: typename std::aligned_storage<512, serialization_align_max>::type tiny_;
    

    Quoting from the C++11 spec for std::aligned_storage<std::size_t Len, std::size_t Align = default-alignment>:

    Align shall be equal to alignof(T) for some type T or to default-alignment.

    This means if the value 64 exceeds the fundamental alignment of any compiler-supported type, the behavior is undefined. (FWIW, similar UB applies to the alignas(N) specifier for unsupported alignment values of N, which I recall we previously encountered). The undefined behavior may very well exhibit as the compiler providing a smaller alignment than requested, which would directly lead to this exact assertion failure.

    The fact this error only happens to manifest so far on one particular PGI version is not a behavior I think we can rely upon. Gnu has some language extensions for alignment, which may be why it faithfully implements values greater than what's guaranteed by C++, although even the Gnu manual cautions that in some cases system limitations will prevent large attribute(aligned) values from being honored.

    IMO the hard-coded value 64 cannot be relied upon, and should either be relaxed to alignof(std::max_align_t) (which C++11 provides), or made conditional on compiler versions known to implement that extended alignment (and possibly use the Gnu attribute explicitly). Alternatively, this code could be re-designed with manual alignment arithmetic, eg if we think cache-line alignment is important (in which case 64 is also the wrong value for PPC).

  2. Paul Hargrove reporter

    FWIW, no such assertions are seen with the same compiler version when running on PPC64.
    At the moment I am expecting to documenting different floor versions (18.10 on ppc64, 19.1 on x86_64) do to this issue.

  3. Dan Bonachea

    Copied from pull request #110:

    The detail::alloc_aligned function in src/utility.hpp appears to be relying upon non-portable behavior.

    C++17 std::aligned_alloc is only guaranteed to work for "a valid alignment supported by the implementation", which is only guaranteed to include alignof() for some type (and in particular less than alignof(max_align_t)). It additionally requires that "size shall be an integral multiple of alignment". Otherwise undefined behavior (in particular, it is not required to return null if these preconditions are violated).

    posix_memalign has a different set of preconditions - in particular it requires "The value of alignment shall be a multiple of sizeof (void *), that is also a power of two.", which means for 64-bit it's only guaranteed to work for align==8*2^n for some n>=0. This function at least is specced to return EINVAL if it fails due to invalid inputs, but in that case the code in our function silently returns (potentially non-null) garbage in opt mode that is likely to lead to errors which may or may not SEGV (could just be silent data corruption). FWIW the same occurs for an out-of-memory failure. Both of these conditions could potentially be intermittent based on system state, and might even happen to only fail in opt mode (in which case the debug-only assertion doesn't help).

    I'd like to see some more careful handling of these preconditions to prevent silent misbehavior - the std::aligned_alloc branch should assert all its preconditions, and the posix_memalign branch should check for an error return even in opt mode, or alternatively initialize p to NULL to ensure a SEGV upon use of the returned pointer.

    Note that both functions also have some squirrely corner-cases for size==0, so we should probably additionally assert size > 0, unless that's a case we want to try and handle.

  4. Dan Bonachea

    pull request #110 at 0cff3e1 resolves this issue:

    • preconditions for posix_memalign are now correctly respected
    • Use of C++17 std::aligned_alloc is removed
    • Extended alignments no longer use std::aligned_storage and now use detail::xaligned_storage instead
  5. Log in to comment