Consider renaming internal UPCXX_ macros

Issue #487 resolved
Dan Bonachea created an issue

I'm starting to grow concerned over the growth and conflation of our UPC++ macro space. I'll categorize the current usages as follows:

Cat 1: We have around a half-dozen "specified" UPCXX_ macros, eg:

  • UPCXX_SERIALIZED_FIELDS
  • UPCXX_SPEC_VERSION
  • UPCXX_VERSION
  • UPCXX_DEFER_COMPLETION

Cat 2. We have a set of macros that are explicitly documented as implementation-defined:

  • UPCXX_THREADMODE
  • UPCXX_CODEMODE
  • UPCXX_NETWORK_*

Cat 3. Another set of macros that are not explicitly documented, but probably should be:

  • UPCXX_CUDA_ENABLED
  • UPCXX_MAX_VALUE_SIZE
  • UPCXX_MAX_RPC_ARG_SIZE

Cat 4. Another set of macros that are undocumented, but used in tests and are relatively "stable":

  • UPCXX_ASSERT_ALWAYS
  • UPCXX_ASSERT_ENABLED - side-note, should probably be replaced with UPCXX_CODEMODE in tests
  • UPCXX_BACKEND_GASNET_SEQ - side-note, should probably be replaced with UPCXX_THREADMODE in tests
  • UPCXX_BACKEND - side-note, should probably be excised from tests

Cat 5. Finally there's a rapidly growing list of undocumented macros that are entirely internal to the implementation, that have no guarantees of stability and are not intended for user code:

  • Nearly everything in gen_include/upcxx/upcxx_config.hpp, eg :
    • UPCXX_ISSUE400_WORKAROUND
    • UPCXX_HAVE___BUILTIN_ASSUME_ALIGNED
    • UPCXX_GASNET_NATIVE_NP_ALLOC_REQ_MEDIUM
    • UPCXX_MPSC_QUEUE_ATOMIC
  • UPCXX_UNIFORM_LOCAL_VTABLES
  • UPCXX_HAS_OPERATION_CX_AS_BLOCKING
  • UPCXX_AD_METHODS
  • UPCXX_GPTR_CHK
  • UPCXX_ETYPE
  • UPCXX_ASSERT_MASTER_CURRENT_IFSEQ
  • Around 100 other macros used internally by the UPC++ headers

My concern is that currently these all live in the same UPCXX_ macro namespace, providing no obvious distinction regarding which of the five categories a given macro falls into. Worse, the lack of documentation for category 3..5 means that a user "fishing" for undocumented macros has no way to distinguish whether a given macro falls into category 3 or 5.

I propose the following (in priority order):

  1. We rename all the macros in category 5 into a unique prefix, for example UPCXXI_ (where the "I" stands for "internal"). This has the purpose of immediately breaking any external code relying on these macros we really don't want used outside the runtime, and "giving notice" to all future macro fishermen that these are unstable.
  2. Update implementation-defined.md to clearly state that all UPCXXI_* macros are off-limits to client code, and that client-code should never define symbols into UPCXX_* or UPCXXI_* (except where a given symbol is explicitly documented as a user-provided setting)
  3. Add documentation for category 3 macros in implementation-defined.md (promoting them to category 2)
  4. "Prune" category 4 by replacing test uses with documented macros, and possibly even replacing outdated ones in the implementation.

Comments (7)

  1. Paul Hargrove

    Except for some mild skepticism regarding the "side-note"s for category 4, I fully agree with this proposal. In fact, I had a similar though when reviewing recent PRs which added new identifiers to category 5.

    Given the scope of the task, doing this soon (while still months from a release) seems advisable.

    P.S. LOL at "macro fishermen"

  2. Amir Kamil

    Here’s my audit of the current situation.

    Macros defined by the UPC++ spec

    • UPCXX_SERIALIZED_BASE
    • UPCXX_SERIALIZED_DELETE
    • UPCXX_SERIALIZED_FIELDS
    • UPCXX_SERIALIZED_VALUES
    • UPCXX_SPEC_VERSION
    • UPCXX_VERSION

    Macros defined in implementation-defined.md

    • UPCXX_THREADMODE
    • UPCXX_CODEMODE
    • UPCXX_NETWORK_*
    • UPCXX_DEFER_COMPLETION

    Macros that probably should be defined (@Dan Bonachea 's category 3)

    • UPCXX_CUDA_ENABLED
    • UPCXX_MAX_VALUE_SIZE
    • UPCXX_MAX_RPC_ARG_SIZE

    Macros that are stable (@Dan Bonachea 's category 4)

    • UPCXX_ASSERT
    • UPCXX_ASSERT_ALWAYS
    • UPCXX_ASSERT_ENABLED
    • UPCXX_BACKEND
    • UPCXX_BACKEND_GASNET_PAR
    • UPCXX_BACKEND_GASNET_SEQ

    Environment variables (currently undocumented)

    • UPCXX_BUG4148_WORKAROUND
    • UPCXX_USE_UPC_ALLOC
    • UPCXX_UPC_HEAP_COLL
    • UPCXX_VERBOSE
    • UPCXX_SHARED_HEAP_SIZE
    • UPCXX_SEGMENT_MB
    • UPCXX_RPC_EAGER_THRESHOLD
    • UPCXX_RPC_EAGER_THRESHOLD_LOCAL
    • UPCXX_OVERSUBSCRIBED
    • UPCXX_WARN_COLLECTIVE_IN_PROGRESS
    • UPCXX_WARN_EMPTY_RMA

    Additional macros used in tests (in addition to the 4 macro categories above)

    • UPCXX_HAS_OPERATION_CX_AS_BLOCKING
    • UPCXX_NOINLINE
    • UPCXX_UNIFORM_LOCAL_VTABLES

    Environment variables used in tests

    • UPCXX_SEGMENT_MB
    • UPCXX_SHARED_HEAP_SIZE

    Macros only used in tests (not in the upcxx source)

    • UPCXX_USE_COLOR

    Macros that we might consider defining (in addition to @Dan Bonachea 's)

    • UPCXX_MAX_RPC_AM_ARGS
    • UPCXX_STRICT_SEGMENT
    • UPCXX_GIT_VERSION
    • UPCXX_RPC_STACK_MAX_FN_SIZE

    Ident strings (global variables)

    • UPCXX_IdentString_AssertEnabled
    • UPCXX_IdentString_BuildTimestamp
    • UPCXX_IdentString_CUDAEnabled
    • UPCXX_IdentString_CUDAGASNet
    • UPCXX_IdentString_CodeMode
    • UPCXX_IdentString_CompilerID
    • UPCXX_IdentString_CompilerStd
    • UPCXX_IdentString_GASNetVersion
    • UPCXX_IdentString_GitVersion
    • UPCXX_IdentString_LibraryVersion
    • UPCXX_IdentString_MPSCQueue
    • UPCXX_IdentString_Network
    • UPCXX_IdentString_SpecVersion
    • UPCXX_IdentString_ThreadMode

    Here’s the full set of identifiers and envvars in the source code that start with UPCXX_:

    UPCXX_AD_ANYTYPE
    UPCXX_AD_INTONLY
    UPCXX_AD_METHODS
    UPCXX_ALL_RANKS_DEFINITELY_LOCAL
    UPCXX_AM_INDEX_BASE
    UPCXX_ASSERT
    UPCXX_ASSERT_
    UPCXX_ASSERT_1
    UPCXX_ASSERT_2
    UPCXX_ASSERT_ALWAYS
    UPCXX_ASSERT_ALWAYS_MASTER
    UPCXX_ASSERT_COLLECTIVE_SAFE
    UPCXX_ASSERT_COLLECTIVE_SAFE_NAMED
    UPCXX_ASSERT_DISPATCH
    UPCXX_ASSERT_ENABLED
    UPCXX_ASSERT_INIT
    UPCXX_ASSERT_INIT_NAMED
    UPCXX_ASSERT_MASTER
    UPCXX_ASSERT_MASTER_CURRENT_IFSEQ
    UPCXX_ASSERT_MASTER_HELD_IFSEQ
    UPCXX_ASSERT_VALID_DEFINITELY_LOCAL
    UPCXX_BACKEND
    UPCXX_BACKEND_GASNET
    UPCXX_BACKEND_GASNET_PAR
    UPCXX_BACKEND_GASNET_SEQ
    UPCXX_BUG4148_WORKAROUND
    UPCXX_CODEMODE
    UPCXX_COLLECTIVES_IN_PROGRESS
    UPCXX_COMAPRE_OP
    UPCXX_COMPARATOR
    UPCXX_COMPARE_OP
    UPCXX_CONCAT
    UPCXX_CONCAT_
    UPCXX_COPY_OPTIMIZEHOST
    UPCXX_COPY_PROMOTEPRIVATE
    UPCXX_CREDUCE_SLIM
    UPCXX_CUDA_ENABLED
    UPCXX_CUDA_USE_MK
    UPCXX_DECAYED_GP
    UPCXX_DEFER_COMPLETION
    UPCXX_EAGER_DEFAULT
    UPCXX_ETYPE
    UPCXX_FATAL_ERROR
    UPCXX_FUNC
    UPCXX_GASNET_NATIVE_NP_ALLOC_REQ_MEDIUM
    UPCXX_GIT_VERSION
    UPCXX_GPTR_CHECK_ALIGNMENT
    UPCXX_GPTR_CHECK_ENABLED
    UPCXX_GPTR_CHECK_SCALE
    UPCXX_GPTR_CHK
    UPCXX_GPTR_CHK_NONNULL
    UPCXX_GPTYPE
    UPCXX_HARDCODE_STD_HASH
    UPCXX_HAS_OPERATION_CX_AS_BLOCKING
    UPCXX_HAVE_RTTI
    UPCXX_HAVE___BUILTIN_ASSUME_ALIGNED
    UPCXX_HAVE___BUILTIN_LAUNDER
    UPCXX_HIDDEN_AM_CONCURRENCY_LEVEL
    UPCXX_INFIX_OP
    UPCXX_INTERNAL_ONLY
    UPCXX_INVOKE_UB
    UPCXX_IN_GLOBAL_PTR_HPP
    UPCXX_ISSUE400_WORKAROUND
    UPCXX_ISSUE_485_SLOW_THE_ALWAYS
    UPCXX_IdentString_AssertEnabled
    UPCXX_IdentString_BuildTimestamp
    UPCXX_IdentString_CUDAEnabled
    UPCXX_IdentString_CUDAGASNet
    UPCXX_IdentString_CodeMode
    UPCXX_IdentString_CompilerID
    UPCXX_IdentString_CompilerStd
    UPCXX_IdentString_GASNetVersion
    UPCXX_IdentString_GitVersion
    UPCXX_IdentString_LibraryVersion
    UPCXX_IdentString_MPSCQueue
    UPCXX_IdentString_Network
    UPCXX_IdentString_SpecVersion
    UPCXX_IdentString_ThreadMode
    UPCXX_KTYPE
    UPCXX_MANY_KINDS
    UPCXX_MAXEPS
    UPCXX_MAX_RPC_AM_ARGS
    UPCXX_MAX_RPC_ARG_SIZE
    UPCXX_MAX_VALUE_SIZE
    UPCXX_MPSC_PAD_SIZE
    UPCXX_MPSC_QUEUE_ATOMIC
    UPCXX_MPSC_QUEUE_BIGLOCK
    UPCXX_MPSC_QUEUE_xxx
    UPCXX_MTYPE
    UPCXX_NETWORK_IBV
    UPCXX_NETWORK_SMP
    UPCXX_NODISCARD
    UPCXX_NOINLINE
    UPCXX_NUM_AM_HANDLERS
    UPCXX_OPNEW_AS_STD
    UPCXX_OVERSUBSCRIBED
    UPCXX_PROMISE_VTABLE_HACK
    UPCXX_PTYPE
    UPCXX_REQUIRES_GEX_SPEC_VERSION_MAJOR
    UPCXX_REQUIRES_GEX_SPEC_VERSION_MINOR
    UPCXX_RETURN_DECLTYPE
    UPCXX_RPC_EAGER_THRESHOLD
    UPCXX_RPC_EAGER_THRESHOLD_DEFAULT
    UPCXX_RPC_EAGER_THRESHOLD_LOCAL
    UPCXX_RPC_EAGER_THRESHOLD_LOCAL_DEFAULT
    UPCXX_RPC_STACK_MAX_FN_SIZE
    UPCXX_SEGMENT_MB
    UPCXX_SERIALIZED_BASE
    UPCXX_SERIALIZED_DELETE
    UPCXX_SERIALIZED_FIELDS
    UPCXX_SERIALIZED_VALUES
    UPCXX_SHARED_HEAP_SIZE
    UPCXX_SPEC_VERSION
    UPCXX_STATIC_ASSERT
    UPCXX_STATIC_ASSERT_RPC_MSG
    UPCXX_STATIC_ASSERT_VALUE_RETURN_SIZE
    UPCXX_STATIC_ASSERT_VALUE_SIZE
    UPCXX_STRICT_SEGMENT
    UPCXX_STRINGIFY
    UPCXX_STRINGIFY_HELPER
    UPCXX_THREADMODE
    UPCXX_UNIFORM_LOCAL_VTABLES
    UPCXX_UPC_HEAP_COLL
    UPCXX_USE_NODISCARD
    UPCXX_USE_NPAM
    UPCXX_USE_NPAM_STATIC
    UPCXX_USE_UPC_ALLOC
    UPCXX_VERBOSE
    UPCXX_VERSION
    UPCXX_WARN_COLLECTIVE_IN_PROGRESS
    UPCXX_WARN_EMPTY
    UPCXX_WARN_EMPTY_RMA
    UPCXX_a
    

    Here’s the full set in the test code (test/, bench/, example/):

    UPCXX_ASSERT
    UPCXX_ASSERT_ALWAYS
    UPCXX_ASSERT_ENABLED
    UPCXX_BACKEND
    UPCXX_BACKEND_GASNET_PAR
    UPCXX_BACKEND_GASNET_SEQ
    UPCXX_CODEMODE
    UPCXX_CUDA_ENABLED
    UPCXX_DEFER_COMPLETION
    UPCXX_HAS_OPERATION_CX_AS_BLOCKING
    UPCXX_NOINLINE
    UPCXX_SEGMENT_MB
    UPCXX_SERIALIZED_BASE
    UPCXX_SERIALIZED_DELETE
    UPCXX_SERIALIZED_FIELDS
    UPCXX_SERIALIZED_VALUES
    UPCXX_SHARED_HEAP_SIZE
    UPCXX_SPEC_VERSION
    UPCXX_TEST_SKIPPED
    UPCXX_THREADMODE
    UPCXX_UNIFORM_LOCAL_VTABLES
    UPCXX_USE_COLOR
    UPCXX_VERSION
    

  3. Dan Bonachea reporter

    Amir - thanks for your analysis. It's worth noting that some of the identifiers listed in his comment are not actually macros and/or are not visible to user code.

    I'd like to keep environment variables separate from this issue/proposal that is focused on macros. Despite the superficial similarity in spelling, the software engineering situation with envvars is actually quite different from macros in many ways. Most importantly, all our library envvars are produced by the client (or run-scripts) and consumed (or ignored) by the library in mostly self-documenting ways, and envvar changes are usually done in ways that are either backwards compatible or produce an explanatory deprecation message; envvar changes have a pretty low chance of breaking user code in non-obvious ways. Also our envvars are deliberately control knobs, and although some settings might produce surprising or unsupportable behaviors, none of them represent internal details that "accidentally leaked out" to where users can reach them.

    I think the next step here is to build consensus on my proposal for cleaning up our macros (or discuss any counter-proposal), agree on the naming philosophy to be applied (now and going forward) and then decide the mechanics of renaming.

  4. Amir Kamil

    We discussed this on 6/29/21. The following is a summary.

    Retain with UPCXX prefix:

    • UPCXX_MAX_VALUE_SIZE: this is “self-documenting” (instructions for adjusting are in a static assert that is triggered when relevant)
    • UPCXX_MAX_RPC_ARG_SIZE: also “self-documenting”
    • UPCXX_ASSERT: add documentation to implementation-defined.md
    • UPCXX_ASSERT_ALWAYS: add documentation to implementation-defined.md
    • all environment variables, most of which are already documented in various places or are “self-documenting”

      • UPCXX_VERBOSE: add a mention in implementation-defined.md
      • UPCXX_SHARED_HEAP_SIZE: add a mention in implementation-defined.md

    All other undocumented macros and ident strings are to be renamed with the UPCXXI prefix.

    In addition, we decided to specify a UPCXX_KIND_CUDA feature macro, initially defined with the value 202103L, to be added to the UPC++ specification. This macro is only defined in a CUDA-enabled build.

    We noted that renaming UPCXX_GIT_VERSION to UPCXXI_GIT_VERSION will require a corresponding change in the pushbuild tarball generator.

    We decided to avoid internal macros in tests, except for specific unit tests, and to avoid defining new macros with the UPCXX prefix in tests.

  5. Amir Kamil

    I am reconsidering whether we should document UPCXX_VERBOSE and UPCXX_SHARED_HEAP_SIZE. These are set by upcxx-run, and I’m not sure we should be encouraging users to manually set them. Thoughts?

  6. Dan Bonachea reporter

    I am reconsidering whether we should document UPCXX_VERBOSE and UPCXX_SHARED_HEAP_SIZE. These are set by upcxx-run, and I’m not sure we should be encouraging users to manually set them. Thoughts?

    In general I agree - upcxx-run is always preferred when it applies.

    The problem is the upcxx-run interface will never be powerful to encompass all the generality of all underlying system spawners (eg jsrun, srun, etc), especially on complicated systems like Summit. upcxx-run is sufficient for single node and casual/simple use cases on straightforward clusters. However "Power users" or those running on more complex distributed hardware will eventually need to "bypass" upcxx-run and invoke the underlying spawner with job-specific options, and when they do so they'll need to manually set UPCXX_SHARED_HEAP_SIZE (almost certainly), possibly GASNET_MAX_SEGSIZE, and optionally things like UPCXX_VERBOSE. Our current documentation for this is "indirect" via upcxx-run -v -show -- see Guide section 18.0. I'm fine with keeping this current indirect form of documentation.

    In any case envvars and their docs seem 100% orthogonal to this issue (which is about macro symbols in headers), so lets please keep it distinct from this work. Feel free to open a separate documentation issue if it seems justified.

  7. Log in to comment