Working with multiple nodes.

Issue #285 resolved
Ngoc Phuong Chau created an issue

Dear all,

I am running my program with UPC++. I only use global pointers to send and receive arrays over the network.

I ran it on 1, 2, 4, and 8 nodes. It worked, although sometimes there were connection errors and my program stopped.

However, with 16 or 32 nodes, I always get the output below:

WARNING: Found 4 IB HCAs, but GASNet was configured without multi-rail support. To utilize all your HCAs, you should reconfigure GASNet with '--enable-ibv-multirail --with-ibv-max-hcas=4'. You can silence this warning by setting the environment variable GASNET_IBV_PORTS as described in the file 'gasnet/ibv-conduit/README'.
WARNING: Found 4 IB HCAs, but GASNet was configured without multi-rail support. To utilize all your HCAs, you should reconfigure GASNet with '--enable-ibv-multirail --with-ibv-max-hcas=4'. You can silence this warning by setting the environment variable GASNET_IBV_PORTS as described in the file 'gasnet/ibv-conduit/README'.
WARNING: Found 4 IB HCAs, but GASNet was configured without multi-rail support. To utilize all your HCAs, you should reconfigure GASNet with '--enable-ibv-multirail --with-ibv-max-hcas=4'. You can silence this warning by setting the environment variable GASNET_IBV_PORTS as described in the file 'gasnet/ibv-conduit/README'.
WARNING: Found 4 IB HCAs, but GASNet was configured without multi-rail support. To utilize all your HCAs, you should reconfigure GASNet with '--enable-ibv-multirail --with-ibv-max-hcas=4'. You can silence this warning by setting the environment variable GASNET_IBV_PORTS as described in the file 'gasnet/ibv-conduit/README'.
WARNING: Found 4 IB HCAs, but GASNet was configured without multi-rail support. To utilize all your HCAs, you should reconfigure GASNet with '--enable-ibv-multirail --with-ibv-max-hcas=4'. You can silence this warning by setting the environment variable GASNET_IBV_PORTS as described in the file 'gasnet/ibv-conduit/README'.
WARNING: Found 4 IB HCAs, but GASNet was configured without multi-rail support. To utilize all your HCAs, you should reconfigure GASNet with '--enable-ibv-multirail --with-ibv-max-hcas=4'. You can silence this warning by setting the environment variable GASNET_IBV_PORTS as described in the file 'gasnet/ibv-conduit/README'.
WARNING: Found 4 IB HCAs, but GASNet was configured without multi-rail support. To utilize all your HCAs, you should reconfigure GASNet with '--enable-ibv-multirail --with-ibv-max-hcas=4'. You can silence this warning by setting the environment variable GASNET_IBV_PORTS as described in the file 'gasnet/ibv-conduit/README'.
WARNING: Found 4 IB HCAs, but GASNet was configured without multi-rail support. To utilize all your HCAs, you should reconfigure GASNet with '--enable-ibv-multirail --with-ibv-max-hcas=4'. You can silence this warning by setting the environment variable GASNET_IBV_PORTS as described in the file 'gasnet/ibv-conduit/README'.
*** Caught a signal (proc 1): SIGTERM(15)
*** Caught a signal (proc 0): SIGTERM(15)
*** Caught a signal (proc 4): SIGTERM(15)
*** Caught a signal (proc 7): SIGTERM(15)
*** Caught a signal (proc 6): SIGTERM(15)
*** Caught a signal (proc 5): SIGTERM(15)

mpirun noticed that process rank 3 with PID 0 on node c3-30 exited on signal 9 (Killed).

Is there any problem with UPCXX, GASNET or my server?

How do I configure GASNET with '--enable-ibv-multirail --with-ibv-max-hcas=4'?

Thank you so much!

Best regards,

Comments (2)

  1. Dan Bonachea

    Hi Ngoc - This warning is telling you that your nodes have network hardware that is not being used. However, this is just a warning and should not cause your program to be killed, so I think you are looking at two different problems here.

    If you want to use the multirail hardware, you can do that by reinstalling UPC++ as follows:

    env GASNET_CONFIGURE_ARGS='--enable-ibv-multirail --with-ibv-max-hcas=4' ./install /path/to/empty/directory
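
If reinstalling is not convenient, the warning text itself names a second option: setting GASNET_IBV_PORTS to silence the message. The lines below are only a minimal sketch. The HCA name mlx5_0 is a hypothetical placeholder (the names on your nodes can be listed with `ibv_devinfo -l`), and the exact value syntax should be checked against 'gasnet/ibv-conduit/README'.

```shell
# Assumption: "mlx5_0" stands in for one of the HCA names on your nodes.
# Naming a single HCA tells GASNet which adapter to use, so it no longer
# warns about the unused ones.
export GASNET_IBV_PORTS="mlx5_0"

# Confirm what the launcher will inherit from this shell.
echo "GASNET_IBV_PORTS=${GASNET_IBV_PORTS}"
```

The variable has to be visible to every process in the job, so set it in the environment that your job launcher propagates to the compute nodes. This only selects one adapter; it does not enable multi-rail.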
    

    To debug the other problem you'll need to use some debugging techniques - you've not provided enough information to know what might be wrong. Please follow this debugging guide.
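
As one possible first step (an assumption on my part, since the guide's contents are not reproduced in this thread), GASNet can be asked to print a backtrace when a process dies on a fatal signal. The sketch below only sets the environment variable. The launch line is left as a comment because the program name and process count are hypothetical.

```shell
# Ask GASNet to print a backtrace on a fatal signal.
export GASNET_BACKTRACE=1

# Hypothetical launch line; substitute your own binary and process count:
#   upcxx-run -n 16 ./my_program
echo "GASNET_BACKTRACE=${GASNET_BACKTRACE}"
```

Backtraces are easier to read when the program is built with debugging symbols (for example, compiling with `-g`).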
