Running UPCXX on a single node

Issue #615 invalid
Esmail Abdul Fattah created an issue

Hi,

I am interested in running the symPACK library on a single node using UPC++, but I get this error:

abdulfe@kw60890:~/sympack/testexample/nasa2146$ export UPCXX_GASNET_CONDUIT=smp
abdulfe@kw60890:~/sympack/testexample/nasa2146$ /home/abdulfe/sympack/upcpp/upcxx-2023.3.0/bin/upcxx-run -n 1 -- /home/abdulfe/sympack/symPACK/run_sympack -in /home/abdulfe/sympack/nasa2146/nasa2146.rb -ordering METIS -nrhs 1

INFO: may need to build the required runtime. Please be patient.
No protocol specified
*** WARNING (proc 0): GASNet gasnetc_init returning an error code: GASNET_ERR_RESOURCE (Problem with requested resource)
at /home/abdulfe/sympack/upcpp/upcxx-2023.3.0/bld/GASNet-2023.3.0/ibv-conduit/gasnet_core.c:1890
reason: unable to open any HCA ports
*** WARNING (proc 0): GASNet gex_Client_Init_GASNET_202330SEQpshmFASTnodebugnotracenostatsnodebugmallocnosrclines returning an error code: GASNET_ERR_RESOURCE (Problem with requested resource)
at /home/abdulfe/sympack/upcpp/upcxx-2023.3.0/bld/GASNet-2023.3.0/ibv-conduit/gasnet_core.c:2690
*** FATAL ERROR (proc 0):
//////////////////////////////////////////////////////////////////////
UPC++ assertion failure:
on process unknown (kw60890)
at /home/abdulfe/sympack/upcpp/upcxx-2023.3.0/src/backend/gasnet/runtime.cpp:491
in function: void upcxx::init()

Failed condition: ok == 0

To have UPC++ freeze during these errors so you can attach a debugger,
rerun the program with GASNET_FREEZE_ON_ERROR=1 in the environment.
//////////////////////////////////////////////////////////////////////

*** NOTICE (proc 0): Before reporting bugs, run with GASNET_BACKTRACE=1 in the environment to generate a backtrace.

*** NOTICE (proc 0): We recommend linking the debug version of GASNet to assist you in resolving this application issue.
[kw60890:3506407] *** Process received signal ***
[kw60890:3506407] Signal: Aborted (6)
[kw60890:3506407] Signal code: (-6)
[kw60890:3506407] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x141f0)[0x7ff774d851f0]
[kw60890:3506407] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7ff76d80efbb]
[kw60890:3506407] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x116)[0x7ff76d7f4864]
[kw60890:3506407] [ 3] /home/abdulfe/sympack/symPACK/run_sympack(+0x2c73e)[0x55b42cf3573e]
[kw60890:3506407] [ 4] /home/abdulfe/sympack/symPACK/run_sympack(+0x2d8f0)[0x55b42cf368f0]
[kw60890:3506407] [ 5] /home/abdulfe/sympack/symPACK/run_sympack(+0x27791)[0x55b42cf30791]
[kw60890:3506407] [ 6] /home/abdulfe/sympack/symPACK/run_sympack(+0x277b3)[0x55b42cf307b3]
[kw60890:3506407] [ 7] /home/abdulfe/sympack/symPACK/run_sympack(+0xb8ceb)[0x55b42cfc1ceb]
[kw60890:3506407] [ 8] /home/abdulfe/sympack/symPACK/run_sympack(+0x23c57)[0x55b42cf2cc57]
[kw60890:3506407] [ 9] /home/abdulfe/sympack/symPACK/run_sympack(+0xa46b5)[0x55b42cfad6b5]
[kw60890:3506407] [10] /home/abdulfe/sympack/symPACK/run_sympack(+0x3b4a5)[0x55b42cf444a5]
[kw60890:3506407] [11] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xd5)[0x7ff76d7f6565]
[kw60890:3506407] [12] /home/abdulfe/sympack/symPACK/run_sympack(+0x3d10e)[0x55b42cf4610e]
[kw60890:3506407] *** End of error message ***

Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.


mpirun noticed that process rank 0 with PID 0 on node kw60890 exited on signal 6 (Aborted).

Comments (5)

  1. Dan Bonachea

    Hello Esmail - Thanks for reaching out.

    I see evidence of several problems with your current use of UPC++:

    1. The message "INFO: may need to build the required runtime. Please be patient." indicates you are invoking UPC++ from within its own build directory, rather than from an installation of UPC++. This is unofficially supported, but it will compile and launch more slowly than installing UPC++ (make install) and using the installed copy. See our install instructions for full details.
    2. In UPC++, the network backend is selected at compile time, not at runtime. In particular, the UPCXX_GASNET_CONDUIT=smp variable only affects behavior when compiling symPACK; setting it at runtime is "too late". The error messages show that the InfiniBand backend (ibv-conduit) was selected at compile time, and that it failed at runtime startup because it could not find an InfiniBand network card.
    3. UPCXX_GASNET_CONDUIT is a deprecated variable name. It's still recognized for backwards compatibility, but the new name is UPCXX_NETWORK.
    4. You can also change the default network at UPC++ configure time (configure --with-default-network=smp) or UPC++ install time (make install NETWORK=smp), so that you don't need to set either variable. This might be a good idea if your system lacks InfiniBand hardware. Again, see our install instructions for full details.
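The points above can be sketched as a small shell helper. This is an illustration only, not part of UPC++: the function names `conduit_from_log` and `print_smp_options` are hypothetical, and the sketch assumes the compiled-in backend shows up in the log as a `<name>-conduit` directory in a GASNet source path, as it does in the error output of this issue. The second function only prints the three options mentioned in the list; it does not run them.

```shell
#!/bin/sh
# Sketch only: nothing here configures, builds, or launches anything.

# Read an error log on stdin and print the first GASNet conduit named in it,
# e.g. "ibv" for a path containing ".../ibv-conduit/gasnet_core.c".
# Prints nothing if no "<name>-conduit" directory appears in the log.
conduit_from_log() {
    sed -n 's|.*/\([a-z0-9]*\)-conduit/.*|\1|p' | head -n 1
}

# Print (do not run) the three ways of selecting smp described above.
print_smp_options() {
    echo "configure time: configure --with-default-network=smp"
    echo "install time:   make install NETWORK=smp"
    echo "compile time:   export UPCXX_NETWORK=smp (set while compiling symPACK)"
}
```

Piping the output of the failing launch (with stderr redirected to stdout) into `conduit_from_log` would print `ibv` for the log in this issue, which confirms that the binary was built against the InfiniBand backend and needs to be recompiled for smp rather than relaunched with a different environment.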

    Hopefully this helps you get past these UPC++-related problems.

    I should also note that symPACK has its own issue tracker, so if you subsequently encounter problems with the behavior of symPACK itself, that is probably a better resource for such questions.

  2. Dan Bonachea

    Sounds like this issue is resolved. Feel free to open a new issue if you encounter another problem with UPC++ installation.
