upcxx-run confusing failure when using smp on multiple nodes
Issue #191
resolved
When executing upcxx-run with -N > 1 but with an executable compiled for smp, the script automatically sets GASNET_PSHM_NODES to be equal to the number of ranks, i.e. -n. This can result in failures such as
*** FATAL ERROR: Nodes requested (448) > maximum (255)
There should be a clearer failure message indicating that smp is not supported with multiple nodes.
Comments (4)
-
-
reporter Yes, exactly. A warning would be good. Then it would be easy to diagnose that sort of error.
-
Proposed solution in Pull request #64
-
- changed status to resolved
issue #191: issue a warning for upcxx-run -N w/ smp-conduit
Passing upcxx-run -N nodes for nodes > 1 now issues a warning on smp-conduit, which does not support multi-node operation.
Resolves issue #191
→ <<cset 193db619d3d0>>
@shofmeyr: the error message you've quoted has nothing to do with the upcxx-run -N argument or multi-node. That's the error message from GASNet smp-conduit from trying to spawn more than 255 processes (the default limit for this single-node conduit). You will get the same error for upcxx-run -n 448 my-smp-program, without a -N argument.
FWIW, the smp-conduit process limit can be raised (at a small cost in conduit metadata memory) by configuring GASNet with --enable-large-pshm.
The upcxx-run -N argument is simply ignored by upcxx-run for smp-conduit executables, because they do not support multi-node operation, so only -N 1 makes sense.
I think perhaps the behavior should be for upcxx-run to issue a warning if an smp-conduit executable is launched with upcxx-run -N <nodes> where nodes != 1.
Thoughts?
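For illustration only, the check being proposed could look roughly like the sketch below. This is not the code from upcxx-run or from the pull request: the helper name check_smp_launch, its arguments, and the message wording are all hypothetical, and the 255 figure is simply the default limit quoted in the error message above (it would differ for a GASNet build configured with --enable-large-pshm).

```python
import sys

# Default smp-conduit process limit, as quoted in the error message in this issue.
# Assumption: a build configured with --enable-large-pshm would have a higher limit.
SMP_DEFAULT_MAX_PROCS = 255


def check_smp_launch(conduit, ranks, nodes=None):
    """Hypothetical helper: return warning strings for a launch request.

    conduit -- conduit name the executable was built for, e.g. "smp"
    ranks   -- value passed as -n
    nodes   -- value passed as -N, or None if -N was not given
    """
    warnings = []
    if conduit != "smp":
        # Only smp-conduit is covered by this sketch.
        return warnings
    if nodes is not None and nodes != 1:
        warnings.append(
            "smp-conduit does not support multi-node operation; "
            "ignoring -N %d" % nodes
        )
    if ranks > SMP_DEFAULT_MAX_PROCS:
        warnings.append(
            "%d processes requested, which exceeds the default "
            "smp-conduit limit of %d" % (ranks, SMP_DEFAULT_MAX_PROCS)
        )
    return warnings


if __name__ == "__main__":
    # Mirrors the case from this issue: 448 ranks with -N greater than 1.
    for message in check_smp_launch("smp", ranks=448, nodes=2):
        print("WARNING: " + message, file=sys.stderr)
```

The two conditions are kept separate on purpose, since the 255-process failure occurs with or without -N.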