Is a new release (2.1.0?) coming?
Hi Lisandro,
I wondered whether a new release is coming or whether I should try the current development snapshot. 2.0.0 finally started to fail to build/test on Debian sid (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=830440; there were upgrades to openmpi etc.), so instead of doing patch work I thought I'd try the "bleeding edge" ;)
Comments (13)
-
-
I tried to upgrade to openmpi 2.2 and ran into an mpi4py test issue:
ompi_mpi_init: ompi_rte_init failed
--> Returned "Unable to start a daemon on the local node" (-127) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[anatol:1013] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
Do you plan to make a new release that is compatible with openmpi?
-
Yes, I'm planning it. Actually, everything is ready for it. I'm waiting for the Microsoft folks to release MSMPI v8.1, and then I'll make a new mpi4py release. In the meantime, I would suggest using a development snapshot.
-
An attempt to compile mpi4py HEAD with openmpi 2.1.1 gives the same error in the tests:
running test
--------------------------------------------------------------------------
The value of the MCA parameter "plm_rsh_agent" was set to a path that could not be found:
plm_rsh_agent: ssh : rsh
Please either unset the parameter, or check that the path is correct
--------------------------------------------------------------------------
[anatol:01129] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 582
[anatol:01129] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 166
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during orte_init; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):
orte_ess_init failed
--> Returned value Unable to start a daemon on the local node (-127) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):
ompi_mpi_init: ompi_rte_init failed
--> Returned "Unable to start a daemon on the local node" (-127) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[anatol:1129] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
-
Are you able to run any other MPI program? Open MPI 2.1.1 is being used in our Bitbucket Pipelines builds https://bitbucket.org/mpi4py/mpi4py/addon/pipelines/home#!/, and the latest Open MPI builds are green for all Python versions. I'm afraid the problem is on your side. I bet it is a simple configuration problem. Do you have the rsh command installed? Also, check the Open MPI docs; you may set the parameter plm_rsh_agent to ssh.
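A minimal sketch of the check suggested above, assuming the usual Open MPI convention that an MCA parameter can be supplied through an environment variable named OMPI_MCA_<param>. The helper names pick_rsh_agent and openmpi_env are hypothetical and exist only for this illustration; the commented-out test command is likewise an assumption, not a confirmed invocation.

```python
import os
import shutil


def pick_rsh_agent(candidates=("ssh", "rsh")):
    """Return the first launcher found on PATH, or None if none exists."""
    for name in candidates:
        if shutil.which(name) is not None:
            return name
    return None


def openmpi_env(agent, base_env=None):
    """Copy an environment and set Open MPI's plm_rsh_agent MCA parameter.

    Assumption: Open MPI picks up MCA parameters from OMPI_MCA_<param>.
    """
    env = dict(os.environ if base_env is None else base_env)
    if agent is not None:
        env["OMPI_MCA_plm_rsh_agent"] = agent
    return env


if __name__ == "__main__":
    agent = pick_rsh_agent()
    print("launcher found on PATH:", agent)
    env = openmpi_env(agent)
    # Hypothetical next step: pass env to subprocess.run([...], env=env)
    # when launching the test suite.
    print("OMPI_MCA_plm_rsh_agent =", env.get("OMPI_MCA_plm_rsh_agent"))
```

If neither ssh nor rsh is found, installing one of them is probably the simpler fix than changing the parameter.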
-
First of all, Lisandro, I would like to thank you for this amazing feature and the great work you do on mpi4py. MPIPoolExecutor is exactly what I needed, as it fits into HPC workflows very nicely.
Is there any plan to release this version to PyPI? Currently I am just pinning a package of mine to a commit.
-
@costrouc I'm still working on some low-level things as time permits, and I still have to do some testing of mpi4py.futures on Cray systems.
-
@dalcinl I will need to check my remaining compute hours on NERSC, but I would be happy to help with testing on Cray systems. Cori is a Cray system. Would that be useful?
-
Sure! I would really appreciate it. Using python -m mpi4py.futures should of course work. Could you also try the other way? AFAIK, spawning was not supported, but maybe there was some recent upgrade I'm not aware of.
-
Okay, so I ran with python -m mpi4py.futures and without it, and it behaved exactly as you expected.
Summary of installation: NERSC prefers Anaconda, so I had to install that way:
conda create -n mpi python=3.6
source activate mpi
git clone git@bitbucket.org:mpi4py/mpi4py.git
cd mpi4py
# change setup.cfg so that they point to correct compilers (cc, CC, ftn)
python setup.py build
python setup.py install
Now to the jobs that I submitted for testing (they block MPI calls on login nodes, for good reason).
#!/bin/bash -l
#SBATCH -N 1
#SBATCH -t 00:15:00
#SBATCH -p debug
#SBATCH -L SCRATCH #Job requires $SCRATCH file system
#SBATCH -C haswell
module load python/3.6-anaconda-4.4
source activate mpi
pwd
python3.6 -m pip list
srun -n 1 python3.6 script.py
# srun -n 16 python3.6 -m mpi4py.futures script.py
script.py
import sys
from mpi4py import MPI
from mpi4py.futures import MPIPoolExecutor

def do_work(i):
    return "Did some work on %d" % i

def main():
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()
    print(size, rank)
    print(sys.executable)
    tasks = [i**2 for i in range(100)]
    with MPIPoolExecutor(max_workers=10) as executor:
        results = []
        for result in executor.map(do_work, tasks):
            print(type(result), result)
            results.append(result)
    print(results)

if __name__ == "__main__":
    main()
Running without -m mpi4py.futures led to the following error:
/global/homes/c/costrouc
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
mpi4py (2.0.1a0)
pip (9.0.1)
setuptools (27.2.0)
wheel (0.29.0)
Sun Aug 20 13:02:42 2017: [PE_0]:PMI2_Job_Spawn:PMI2_Job_Spawn not implemented.
Let me know if I can be of any more help. I think this is an awesome addition to Python for HPC.
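Since MPIPoolExecutor follows the concurrent.futures interface, the body of a script like the one above does not need to know where its workers come from. The following is only a sketch of that idea: make_executor, run, and the use_mpi flag are hypothetical names for this illustration, and a thread pool stands in for MPI so the snippet also runs on a machine without mpi4py.

```python
from concurrent.futures import ThreadPoolExecutor


def do_work(i):
    return "Did some work on %d" % i


def make_executor(use_mpi=False, max_workers=4):
    """Return an MPIPoolExecutor if requested, else a thread pool stand-in."""
    if use_mpi:
        # Imported lazily so the sketch also runs where mpi4py is absent.
        from mpi4py.futures import MPIPoolExecutor
        return MPIPoolExecutor(max_workers=max_workers)
    return ThreadPoolExecutor(max_workers=max_workers)


def run(tasks, use_mpi=False):
    """Map do_work over tasks with whichever executor was selected."""
    with make_executor(use_mpi=use_mpi) as executor:
        return list(executor.map(do_work, tasks))


if __name__ == "__main__":
    print(run([i**2 for i in range(5)]))
```

With use_mpi=True on a system where spawning is unavailable, the script would presumably still need to be started through -m mpi4py.futures, as in the commented-out srun line of the job script.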
-
OK, many thanks! The PMI2_Job_Spawn error is indeed expected. The community just needs to put a bit of pressure on Cray to support MPI process spawning :-).
Could you please share here your changes to setup.cfg or mpi.cfg?
-
The only changes were to setup.cfg. The beginning of the file was changed according to the "compiling on Cori" instructions. Maybe most important: at first I did not change any of the files, and I got an error along the lines of #include mpi.h not found. To fix this, you just need to make mpi4py use the correct compilers. In the case of Cori these are cc, CC, and ftn.
setup.cfg changes:
[config]
mpicc = cc
mpicxx = CC
mpifort = ftn
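As a rough pre-build sanity check, the [config] values above can be read back and tested against PATH before running the build. This is only a sketch using the standard library; read_compilers and missing_from_path are hypothetical helpers, and this is not how mpi4py itself processes setup.cfg.

```python
import configparser
import shutil

# The [config] section as quoted in the comment above.
SETUP_CFG_SNIPPET = """
[config]
mpicc = cc
mpicxx = CC
mpifort = ftn
"""


def read_compilers(text):
    """Parse the [config] section and return the compiler wrapper names."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["config"]
    return {key: section[key] for key in ("mpicc", "mpicxx", "mpifort")}


def missing_from_path(compilers):
    """Return the option names whose command is not found on PATH."""
    return [key for key, cmd in compilers.items() if shutil.which(cmd) is None]


if __name__ == "__main__":
    compilers = read_compilers(SETUP_CFG_SNIPPET)
    print(compilers)
    print("not on PATH:", missing_from_path(compilers))
```

An option reported as missing would suggest that the compiler wrappers are not available in the current shell, which could lead to the same kind of mpi.h error.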
-
- changed status to resolved
mpi4py 3.0.0 just released!
I'm working on a new feature related to Python 3's concurrent.futures interface. I would like to get this into the next release, but I'm still working on it.
Regarding the failure in Debian you linked, the failing test (see the full log) is not related to MPI, nor to any bug in mpi4py; it is just a bad assumption in mpi4py's test suite. The following patch should fix the test failure, so I would argue that the patch work is rather minimal:
https://bitbucket.org/mpi4py/mpi4py/diff/test/test_dl.py?diff2=74d8da24a9f4&at=maint