Consider renaming internal UPCXX_ macros
I'm starting to grow concerned over the growth and conflation of our UPC++ macro space. I'll categorize the current usages as follows:
Cat 1: We have around a half-dozen "specified" UPCXX_ macros, eg:
UPCXX_SERIALIZED_FIELDS
UPCXX_SPEC_VERSION
UPCXX_VERSION
UPCXX_DEFER_COMPLETION
Cat 2. We have a set of macros that are explicitly documented as implementation-defined:
UPCXX_THREADMODE
UPCXX_CODEMODE
UPCXX_NETWORK_*
Cat 3. Another set of macros that are not explicitly documented, but probably should be:
UPCXX_CUDA_ENABLED
UPCXX_MAX_VALUE_SIZE
UPCXX_MAX_RPC_ARG_SIZE
Cat 4. Another set of macros that are undocumented, but used in tests and are relatively "stable":
UPCXX_ASSERT_ALWAYS
UPCXX_ASSERT_ENABLED
- side-note, should probably be replaced with UPCXX_CODEMODE in testsUPCXX_BACKEND_GASNET_SEQ
- side-note, should probably be replaced with UPCXX_THREADMODE in testsUPCXX_BACKEND
- side-note, should probably be excised from tests
Cat 5. Finally there's a rapidly growing list of undocumented macros that are entirely internal to the implementation, that have no guarantees of stability and are not intended for user code:
- Nearly everything in gen_include/upcxx/upcxx_config.hpp, eg :
UPCXX_ISSUE400_WORKAROUND
UPCXX_HAVE___BUILTIN_ASSUME_ALIGNED
UPCXX_GASNET_NATIVE_NP_ALLOC_REQ_MEDIUM
UPCXX_MPSC_QUEUE_ATOMIC
UPCXX_UNIFORM_LOCAL_VTABLES
UPCXX_HAS_OPERATION_CX_AS_BLOCKING
UPCXX_AD_METHODS
UPCXX_GPTR_CHK
UPCXX_ETYPE
UPCXX_ASSERT_MASTER_CURRENT_IFSEQ
- Around 100 other macros used internally by the UPC++ headers
My concern is that currently these all live in the same UPCXX_
macro namespace, providing no obvious distinction regarding which of the five categories a given macro falls into. Worse, the lack of documentation for category 3..5 means that a user "fishing" for undocumented macros has no way to distinguish whether a given macro falls into category 3 or 5.
I propose the following (in priority order):
- We rename all the macros in category 5 into a unique prefix, for example
UPCXXI_
(where the "I" stands for "internal"). This has the purpose of immediately breaking any external code relying on these macros we really don't want used outside the runtime, and "giving notice" to all future macro fishermen that these are unstable. - Update implementation-defined.md to clearly state that all
UPCXXI_*
macros are off-limits to client code, and that client-code should never define symbols intoUPCXX_*
orUPCXXI_*
(except where a given symbol is explicitly documented as a user-provided setting) - Add documentation for category 3 macros in implementation-defined.md (promoting them to category 2)
- "Prune" category 4 by replacing test uses with documented macros, and possibly even replacing outdated ones in the implementation.
Comments (7)
-
-
Here’s my audit of the current situation.
Macros defined by the UPC++ spec
UPCXX_SERIALIZED_BASE
UPCXX_SERIALIZED_DELETE
UPCXX_SERIALIZED_FIELDS
UPCXX_SERIALIZED_VALUES
UPCXX_SPEC_VERSION
UPCXX_VERSION
Macros defined in
implementation-defined.md
UPCXX_THREADMODE
UPCXX_CODEMODE
UPCXX_NETWORK_*
UPCXX_DEFER_COMPLETION
Macros that probably should be defined (@Dan Bonachea 's category 3)
UPCXX_CUDA_ENABLED
UPCXX_MAX_VALUE_SIZE
UPCXX_MAX_RPC_ARG_SIZE
Macros that are stable (@Dan Bonachea 's category 4)
UPCXX_ASSERT
UPCXX_ASSERT_ALWAYS
UPCXX_ASSERT_ENABLED
UPCXX_BACKEND
UPCXX_BACKEND_GASNET_PAR
UPCXX_BACKEND_GASNET_SEQ
Environment variables (currently undocumented)
UPCXX_BUG4148_WORKAROUND
UPCXX_USE_UPC_ALLOC
UPCXX_UPC_HEAP_COLL
UPCXX_VERBOSE
UPCXX_SHARED_HEAP_SIZE
UPCXX_SEGMENT_MB
UPCXX_RPC_EAGER_THRESHOLD
UPCXX_RPC_EAGER_THRESHOLD_LOCAL
UPCXX_OVERSUBSCRIBED
UPCXX_WARN_COLLECTIVE_IN_PROGRESS
UPCXX_WARN_EMPTY_RMA
Additional macros used in tests (in addition to the 4 macro categories above)
UPCXX_HAS_OPERATION_CX_AS_BLOCKING
UPCXX_NOINLINE
UPCXX_UNIFORM_LOCAL_VTABLES
Environment variables used in tests
UPCXX_SEGMENT_MB
UPCXX_SHARED_HEAP_SIZE
Macros only used in tests (not in the upcxx source)
UPCXX_USE_COLOR
Macros that we might consider defining (in addition to @Dan Bonachea 's)
UPCXX_MAX_RPC_AM_ARGS
UPCXX_STRICT_SEGMENT
UPCXX_GIT_VERSION
UPCXX_RPC_STACK_MAX_FN_SIZE
Ident strings (global variables)
UPCXX_IdentString_AssertEnabled
UPCXX_IdentString_BuildTimestamp
UPCXX_IdentString_CUDAEnabled
UPCXX_IdentString_CUDAGASNet
UPCXX_IdentString_CodeMode
UPCXX_IdentString_CompilerID
UPCXX_IdentString_CompilerStd
UPCXX_IdentString_GASNetVersion
UPCXX_IdentString_GitVersion
UPCXX_IdentString_LibraryVersion
UPCXX_IdentString_MPSCQueue
UPCXX_IdentString_Network
UPCXX_IdentString_SpecVersion
UPCXX_IdentString_ThreadMode
Here’s the full set of identifiers and envvars in the source code that start with
UPCXX_
:UPCXX_AD_ANYTYPE UPCXX_AD_INTONLY UPCXX_AD_METHODS UPCXX_ALL_RANKS_DEFINITELY_LOCAL UPCXX_AM_INDEX_BASE UPCXX_ASSERT UPCXX_ASSERT_ UPCXX_ASSERT_1 UPCXX_ASSERT_2 UPCXX_ASSERT_ALWAYS UPCXX_ASSERT_ALWAYS_MASTER UPCXX_ASSERT_COLLECTIVE_SAFE UPCXX_ASSERT_COLLECTIVE_SAFE_NAMED UPCXX_ASSERT_DISPATCH UPCXX_ASSERT_ENABLED UPCXX_ASSERT_INIT UPCXX_ASSERT_INIT_NAMED UPCXX_ASSERT_MASTER UPCXX_ASSERT_MASTER_CURRENT_IFSEQ UPCXX_ASSERT_MASTER_HELD_IFSEQ UPCXX_ASSERT_VALID_DEFINITELY_LOCAL UPCXX_BACKEND UPCXX_BACKEND_GASNET UPCXX_BACKEND_GASNET_PAR UPCXX_BACKEND_GASNET_SEQ UPCXX_BUG4148_WORKAROUND UPCXX_CODEMODE UPCXX_COLLECTIVES_IN_PROGRESS UPCXX_COMAPRE_OP UPCXX_COMPARATOR UPCXX_COMPARE_OP UPCXX_CONCAT UPCXX_CONCAT_ UPCXX_COPY_OPTIMIZEHOST UPCXX_COPY_PROMOTEPRIVATE UPCXX_CREDUCE_SLIM UPCXX_CUDA_ENABLED UPCXX_CUDA_USE_MK UPCXX_DECAYED_GP UPCXX_DEFER_COMPLETION UPCXX_EAGER_DEFAULT UPCXX_ETYPE UPCXX_FATAL_ERROR UPCXX_FUNC UPCXX_GASNET_NATIVE_NP_ALLOC_REQ_MEDIUM UPCXX_GIT_VERSION UPCXX_GPTR_CHECK_ALIGNMENT UPCXX_GPTR_CHECK_ENABLED UPCXX_GPTR_CHECK_SCALE UPCXX_GPTR_CHK UPCXX_GPTR_CHK_NONNULL UPCXX_GPTYPE UPCXX_HARDCODE_STD_HASH UPCXX_HAS_OPERATION_CX_AS_BLOCKING UPCXX_HAVE_RTTI UPCXX_HAVE___BUILTIN_ASSUME_ALIGNED UPCXX_HAVE___BUILTIN_LAUNDER UPCXX_HIDDEN_AM_CONCURRENCY_LEVEL UPCXX_INFIX_OP UPCXX_INTERNAL_ONLY UPCXX_INVOKE_UB UPCXX_IN_GLOBAL_PTR_HPP UPCXX_ISSUE400_WORKAROUND UPCXX_ISSUE_485_SLOW_THE_ALWAYS UPCXX_IdentString_AssertEnabled UPCXX_IdentString_BuildTimestamp UPCXX_IdentString_CUDAEnabled UPCXX_IdentString_CUDAGASNet UPCXX_IdentString_CodeMode UPCXX_IdentString_CompilerID UPCXX_IdentString_CompilerStd UPCXX_IdentString_GASNetVersion UPCXX_IdentString_GitVersion UPCXX_IdentString_LibraryVersion UPCXX_IdentString_MPSCQueue UPCXX_IdentString_Network UPCXX_IdentString_SpecVersion UPCXX_IdentString_ThreadMode UPCXX_KTYPE UPCXX_MANY_KINDS UPCXX_MAXEPS UPCXX_MAX_RPC_AM_ARGS UPCXX_MAX_RPC_ARG_SIZE UPCXX_MAX_VALUE_SIZE UPCXX_MPSC_PAD_SIZE UPCXX_MPSC_QUEUE_ATOMIC UPCXX_MPSC_QUEUE_BIGLOCK UPCXX_MPSC_QUEUE_xxx UPCXX_MTYPE UPCXX_NETWORK_IBV UPCXX_NETWORK_SMP UPCXX_NODISCARD UPCXX_NOINLINE UPCXX_NUM_AM_HANDLERS UPCXX_OPNEW_AS_STD UPCXX_OVERSUBSCRIBED UPCXX_PROMISE_VTABLE_HACK UPCXX_PTYPE UPCXX_REQUIRES_GEX_SPEC_VERSION_MAJOR UPCXX_REQUIRES_GEX_SPEC_VERSION_MINOR UPCXX_RETURN_DECLTYPE UPCXX_RPC_EAGER_THRESHOLD UPCXX_RPC_EAGER_THRESHOLD_DEFAULT UPCXX_RPC_EAGER_THRESHOLD_LOCAL UPCXX_RPC_EAGER_THRESHOLD_LOCAL_DEFAULT UPCXX_RPC_STACK_MAX_FN_SIZE UPCXX_SEGMENT_MB UPCXX_SERIALIZED_BASE UPCXX_SERIALIZED_DELETE UPCXX_SERIALIZED_FIELDS UPCXX_SERIALIZED_VALUES UPCXX_SHARED_HEAP_SIZE UPCXX_SPEC_VERSION UPCXX_STATIC_ASSERT UPCXX_STATIC_ASSERT_RPC_MSG UPCXX_STATIC_ASSERT_VALUE_RETURN_SIZE UPCXX_STATIC_ASSERT_VALUE_SIZE UPCXX_STRICT_SEGMENT UPCXX_STRINGIFY UPCXX_STRINGIFY_HELPER UPCXX_THREADMODE UPCXX_UNIFORM_LOCAL_VTABLES UPCXX_UPC_HEAP_COLL UPCXX_USE_NODISCARD UPCXX_USE_NPAM UPCXX_USE_NPAM_STATIC UPCXX_USE_UPC_ALLOC UPCXX_VERBOSE UPCXX_VERSION UPCXX_WARN_COLLECTIVE_IN_PROGRESS UPCXX_WARN_EMPTY UPCXX_WARN_EMPTY_RMA UPCXX_a
Here’s the full set in the test code (
test/
,bench/
,example/
):UPCXX_ASSERT UPCXX_ASSERT_ALWAYS UPCXX_ASSERT_ENABLED UPCXX_BACKEND UPCXX_BACKEND_GASNET_PAR UPCXX_BACKEND_GASNET_SEQ UPCXX_CODEMODE UPCXX_CUDA_ENABLED UPCXX_DEFER_COMPLETION UPCXX_HAS_OPERATION_CX_AS_BLOCKING UPCXX_NOINLINE UPCXX_SEGMENT_MB UPCXX_SERIALIZED_BASE UPCXX_SERIALIZED_DELETE UPCXX_SERIALIZED_FIELDS UPCXX_SERIALIZED_VALUES UPCXX_SHARED_HEAP_SIZE UPCXX_SPEC_VERSION UPCXX_TEST_SKIPPED UPCXX_THREADMODE UPCXX_UNIFORM_LOCAL_VTABLES UPCXX_USE_COLOR UPCXX_VERSION
-
reporter Amir - thanks for your analysis. It's worth noting that some of the identifiers listed in his comment are not actually macros and/or are not visible to user code.
I'd like to keep environment variables separate from this issue/proposal that is focused on macros. Despite the superficial similarity in spelling, the software engineering situation with envvars is actually quite different from macros in many ways. Most importantly, all our library envvars are produced by the client (or run-scripts) and consumed (or ignored) by the library in mostly self-documenting ways, and envvar changes are usually done in ways that are either backwards compatible or produce an explanatory deprecation message; envvar changes have a pretty low chance of breaking user code in non-obvious ways. Also our envvars are deliberately control knobs, and although some settings might produce surprising or unsupportable behaviors, none of them represent internal details that "accidentally leaked out" to where users can reach them.
I think the next step here is to build consensus on my proposal for cleaning up our macros (or discuss any counter-proposal), agree on the naming philosophy to be applied (now and going forward) and then decide the mechanics of renaming.
-
We discussed this on 6/29/21. The following is a summary.
Retain with
UPCXX
prefix:UPCXX_MAX_VALUE_SIZE
: this is “self-documenting” (instructions for adjusting are in a static assert that is triggered when relevant)UPCXX_MAX_RPC_ARG_SIZE
: also “self-documenting”UPCXX_ASSERT
: add documentation toimplementation-defined.md
UPCXX_ASSERT_ALWAYS
: add documentation toimplementation-defined.md
-
all environment variables, most of which are already documented in various places or are “self-documenting”
UPCXX_VERBOSE
: add a mention inimplementation-defined.md
UPCXX_SHARED_HEAP_SIZE
: add a mention inimplementation-defined.md
All other undocumented macros and ident strings are to be renamed with the
UPCXXI
prefix.In addition, we decided to specify a
UPCXX_KIND_CUDA
feature macro, initially defined with the value202103L
, to be added to the UPC++ specification. This macro is only defined in a CUDA-enabled build.We noted that renaming
UPCXX_GIT_VERSION
toUPCXXI_GIT_VERSION
will require a corresponding change in the pushbuild tarball generator.We decided to avoid internal macros in tests, except for specific unit tests, and to avoid defining new macros with the
UPCXX
prefix in tests. -
I am reconsidering whether we should document
UPCXX_VERBOSE
andUPCXX_SHARED_HEAP_SIZE
. These are set byupcxx-run
, and I’m not sure we should be encouraging users to manually set them. Thoughts? -
reporter I am reconsidering whether we should document UPCXX_VERBOSE and UPCXX_SHARED_HEAP_SIZE. These are set by upcxx-run, and I’m not sure we should be encouraging users to manually set them. Thoughts?
In general I agree - upcxx-run is always preferred when it applies.
The problem is the upcxx-run interface will never be powerful to encompass all the generality of all underlying system spawners (eg jsrun, srun, etc), especially on complicated systems like Summit. upcxx-run is sufficient for single node and casual/simple use cases on straightforward clusters. However "Power users" or those running on more complex distributed hardware will eventually need to "bypass" upcxx-run and invoke the underlying spawner with job-specific options, and when they do so they'll need to manually set
UPCXX_SHARED_HEAP_SIZE
(almost certainly), possiblyGASNET_MAX_SEGSIZE
, and optionally things likeUPCXX_VERBOSE
. Our current documentation for this is "indirect" viaupcxx-run -v -show
-- see Guide section 18.0. I'm fine with keeping this current indirect form of documentation.In any case envvars and their docs seem 100% orthogonal to this issue (which is about macro symbols in headers), so lets please keep it distinct from this work. Feel free to open a separate documentation issue if it seems justified.
-
reporter - changed status to resolved
Resolved in pull request 368
- Log in to comment
Except for some mild skepticism regarding the "side-note"s for category 4, I fully agree with this proposal. In fact, I had a similar though when reviewing recent PRs which added new identifiers to category 5.
Given the scope of the task, doing this soon (while still months from a release) seems advisable.
P.S. LOL at "macro fishermen"