Is a new release (2.1.0?) coming?

Issue #50 resolved
Former user created an issue

Hi Lisandro,

I wondered whether a new release is coming or whether I should try the current development snapshot. 2.0.0 has finally started to fail to build/test on Debian sid (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=830440; there were upgrades to openmpi, etc.), so instead of doing patch work I thought I'd try the "bleeding edge" ;)

Comments (13)

  1. Lisandro Dalcin

    I'm working on a new feature related to Python 3's concurrent.futures interface. I would like to get it into the next release, but it is still a work in progress.

    Regarding the failure in Debian you linked, the failing test (see the full log) is not related to MPI, nor to any bug in mpi4py; it is just a bad assumption in mpi4py's testsuite. The following patch should fix the test failure, so I would argue that the patch work is rather minimal:

    https://bitbucket.org/mpi4py/mpi4py/diff/test/test_dl.py?diff2=74d8da24a9f4&at=maint

  2. Anatol Anatol

    I tried to upgrade to Open MPI 2.2 and ran into an mpi4py test failure:

      ompi_mpi_init: ompi_rte_init failed
      --> Returned "Unable to start a daemon on the local node" (-127) instead of "Success" (0)
    --------------------------------------------------------------------------
    *** An error occurred in MPI_Init_thread
    *** on a NULL communicator
    *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
    ***    and potentially your MPI job)
    [anatol:1013] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
    

    Do you plan to make a new release that is compatible with openmpi?

  3. Lisandro Dalcin

    Yes, I'm planning it. Actually, everything is ready. I'm waiting for the Microsoft folks to release MSMPI v8.1, and then I will make a new mpi4py release. In the meantime, I would suggest using a development snapshot.

  4. Anatol Anatol

    Compiling mpi4py HEAD against Open MPI 2.1.1 gives the same error in the tests:

    running test
    --------------------------------------------------------------------------
    The value of the MCA parameter "plm_rsh_agent" was set to a path
    that could not be found:
    
      plm_rsh_agent: ssh : rsh
    
    Please either unset the parameter, or check that the path is correct
    --------------------------------------------------------------------------
    [anatol:01129] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 582
    [anatol:01129] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 166
    --------------------------------------------------------------------------
    It looks like orte_init failed for some reason; your parallel process is
    likely to abort.  There are many reasons that a parallel process can
    fail during orte_init; some of which are due to configuration or
    environment problems.  This failure appears to be an internal failure;
    here's some additional information (which may only be relevant to an
    Open MPI developer):
    
      orte_ess_init failed
      --> Returned value Unable to start a daemon on the local node (-127) instead of ORTE_SUCCESS
    --------------------------------------------------------------------------
    --------------------------------------------------------------------------
    It looks like MPI_INIT failed for some reason; your parallel process is
    likely to abort.  There are many reasons that a parallel process can
    fail during MPI_INIT; some of which are due to configuration or environment
    problems.  This failure appears to be an internal failure; here's some
    additional information (which may only be relevant to an Open MPI
    developer):
    
      ompi_mpi_init: ompi_rte_init failed
      --> Returned "Unable to start a daemon on the local node" (-127) instead of "Success" (0)
    --------------------------------------------------------------------------
    *** An error occurred in MPI_Init_thread
    *** on a NULL communicator
    *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
    ***    and potentially your MPI job)
    [anatol:1129] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
    
  5. Lisandro Dalcin

    Are you able to run any other MPI program? Open MPI 2.1.1 is being used in our Bitbucket Pipelines builds (see https://bitbucket.org/mpi4py/mpi4py/addon/pipelines/home#!/), and the latest Open MPI builds are green for all Python versions. I'm afraid the problem is on your side; I bet it is a simple configuration issue. Do you have the rsh command installed? Also, check the Open MPI docs: you may set the parameter plm_rsh_agent to ssh.
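    As a rough diagnostic for the rsh/ssh question, the sketch below checks which remote-shell agent is on PATH and suggests a value for plm_rsh_agent. It assumes Open MPI's convention of reading MCA parameters from OMPI_MCA_<name> environment variables, and tries ssh before rsh to mirror the "ssh : rsh" default value shown in the log; the helper name rsh_agent_overrides is hypothetical, not part of mpi4py or Open MPI.

```python
import os
import shutil

# Same order as the default plm_rsh_agent value "ssh : rsh" reported in the log
CANDIDATE_AGENTS = ("ssh", "rsh")


def rsh_agent_overrides(env=None, which=shutil.which):
    """Suggest an OMPI_MCA_plm_rsh_agent value based on what is on PATH.

    Hypothetical helper. Returns an empty dict if the parameter is already
    set in `env` or if none of the candidate agents can be found.
    """
    env = os.environ if env is None else env
    if "OMPI_MCA_plm_rsh_agent" in env:
        return {}  # respect an explicit user choice
    for agent in CANDIDATE_AGENTS:
        if which(agent):
            return {"OMPI_MCA_plm_rsh_agent": agent}
    return {}  # neither ssh nor rsh found: the error above is likely to persist


if __name__ == "__main__":
    overrides = rsh_agent_overrides()
    print("suggested MCA overrides:", overrides or "none")
    # Must happen before `from mpi4py import MPI`, which initializes MPI by default
    os.environ.update(overrides)
```

    Since importing mpi4py.MPI initializes MPI by default, the environment would need to be adjusted before that import, or the variable exported in the shell before running the tests.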

  6. Christopher Ostrouchov

    First of all, Lisandro, I would like to thank you for this amazing feature and for the great work you do on mpi4py. MPIPoolExecutor is exactly what I needed, as it fits very nicely into HPC workflows.

    Is there any plan to release this version to PyPI? Currently I am just pinning a package of mine to a commit.

  7. Lisandro Dalcin

    @costrouc I'm still working on some low-level things as time permits, and I still have to do some testing of mpi4py.futures on Cray systems.

  8. Christopher Ostrouchov

    @dalcinl I will need to check my remaining compute hours at NERSC, but I would be happy to help with testing on Cray systems. Cori is a Cray system. Would that be useful?

  9. Lisandro Dalcin

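    Sure! I would really appreciate it. Using python -m mpi4py.futures should of course work. Could you also try the other way, i.e., running your script without -m mpi4py.futures, so that MPIPoolExecutor has to spawn its worker processes dynamically? AFAIK, process spawning was not supported on Cray systems, but maybe there has been a recent upgrade I'm not aware of.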

  10. Christopher Ostrouchov

    Okay, so I ran both with and without python -m mpi4py.futures, and it behaved exactly as you expected.

    Summary of the installation (NERSC prefers Anaconda, so I had to install it that way):

    conda create -n mpi python=3.6
    source activate mpi
    git clone git@bitbucket.org:mpi4py/mpi4py.git
    cd mpi4py
    # change setup.cfg so that they point to correct compilers (cc, CC, ftn)
    python setup.py build
    python setup.py install
    

    Now for the jobs that I submitted for testing (MPI calls are blocked on the login nodes, for good reason):

    #!/bin/bash -l
    
    #SBATCH -N 1
    #SBATCH -t 00:15:00
    #SBATCH -p debug
    #SBATCH -L SCRATCH   #Job requires $SCRATCH file system
    #SBATCH -C haswell
    
    module load python/3.6-anaconda-4.4
    source activate mpi
    
    pwd
    python3.6 -m pip list
    
    srun -n 1 python3.6 script.py
    # srun -n 16 python3.6 -m mpi4py.futures script.py
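    The two srun lines (one active, one commented out) correspond to the two launch modes discussed in this thread. A minimal sketch that builds either command line, assuming the task count and interpreter name from this job script; the helper srun_command is hypothetical:

```python
def srun_command(script, use_futures_module=False, ntasks=16, python="python3.6"):
    """Return the srun argument list for one of the two mpi4py.futures launch modes."""
    if use_futures_module:
        # All ranks start up front: rank 0 runs the script, the rest act as workers
        return ["srun", "-n", str(ntasks), python, "-m", "mpi4py.futures", script]
    # One process only: MPIPoolExecutor must spawn its workers at run time,
    # which requires MPI dynamic process support on the system
    return ["srun", "-n", "1", python, script]


if __name__ == "__main__":
    for mode in (False, True):
        print(" ".join(srun_command("script.py", use_futures_module=mode)))
```

    Inside an allocation, something like subprocess.run(srun_command("script.py", use_futures_module=True), check=True) would launch the second mode; the first mode only works where MPI process spawning is supported.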
    

    script.py

    import sys
    
    from mpi4py import MPI
    from mpi4py.futures import MPIPoolExecutor
    
    def do_work(i):
        return "Did some work on %d" % i
    
    def main():
        comm = MPI.COMM_WORLD
        rank = comm.Get_rank()
        size = comm.Get_size()
    
        print(size, rank)
        print(sys.executable)
    
        tasks = [i**2 for i in range(100)]
    
        with MPIPoolExecutor(max_workers=10) as executor:
            results = []
            for result in executor.map(do_work, tasks):
                print(type(result), result)
                results.append(result)
        print(results)
    
    if __name__ == "__main__":
        main()
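    Because MPI calls are blocked on the login nodes, it can help to dry-run the task logic there with a standard-library pool before submitting a job. MPIPoolExecutor follows the concurrent.futures Executor interface used in script.py (max_workers, map, context manager), so the same loop should work with either. A minimal sketch, assuming ThreadPoolExecutor as a local stand-in and a hypothetical run_tasks helper:

```python
from concurrent.futures import ThreadPoolExecutor


def do_work(i):
    return "Did some work on %d" % i


def run_tasks(executor_factory, n_tasks=100, max_workers=10):
    """Run do_work over the squared task list with any concurrent.futures-style pool.

    executor_factory is a hypothetical parameter: ThreadPoolExecutor for a
    local dry run, or mpi4py.futures.MPIPoolExecutor on a compute node.
    """
    tasks = [i**2 for i in range(n_tasks)]
    with executor_factory(max_workers=max_workers) as executor:
        # map() yields results in task order, regardless of which worker finishes first
        return list(executor.map(do_work, tasks))


if __name__ == "__main__":
    # Local dry run without MPI, e.g. on a login node
    print(run_tasks(ThreadPoolExecutor, n_tasks=5))
```

    On a compute node, passing MPIPoolExecutor (imported from mpi4py.futures) instead of ThreadPoolExecutor should exercise the same code path under MPI.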
    

    Running without -m mpi4py.futures led to the following error:

    /global/homes/c/costrouc
    DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
    mpi4py (2.0.1a0)
    pip (9.0.1)
    setuptools (27.2.0)
    wheel (0.29.0)
    Sun Aug 20 13:02:42 2017: [PE_0]:PMI2_Job_Spawn:PMI2_Job_Spawn not implemented.
    

    Let me know if I can be of any more help. I think this is an awesome addition to Python for HPC.

  11. Lisandro Dalcin

    OK, many thanks! The PMI2_Job_Spawn error is indeed expected. The community just needs to put a bit of pressure on Cray to support MPI process spawning :-).

    Could you please share here your changes to setup.cfg or mpi.cfg?

  12. Christopher Ostrouchov

    The only changes I made were to setup.cfg: I changed the beginning of the file following the instructions for compiling on Cori. Perhaps most important: when I first built without changing any files, I got an error along the lines of "#include mpi.h not found". To fix this, you just need to make mpi4py use the correct compilers; on Cori these are the Cray compiler wrappers cc, CC, and ftn.

    setup.cfg changes

    [config]
    mpicc   = cc
    mpicxx  = CC
    mpifort = ftn
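    For scripted builds, the same [config] edit could be applied with Python's configparser instead of by hand. A minimal sketch, assuming setup.cfg sits in the mpi4py checkout; note that ConfigParser.write drops any comments in the file, and set_cray_compilers is a hypothetical helper:

```python
import configparser

# Cray compiler wrappers used on Cori, as listed above
CRAY_COMPILERS = {"mpicc": "cc", "mpicxx": "CC", "mpifort": "ftn"}


def set_cray_compilers(path="setup.cfg", compilers=CRAY_COMPILERS):
    """Point the [config] section of an mpi4py setup.cfg at the given compilers."""
    # interpolation=None so any '%' characters in the file are left alone
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read(path)
    if not cfg.has_section("config"):
        cfg.add_section("config")
    for key, value in compilers.items():
        cfg.set("config", key, value)
    # Rewriting keeps the other sections and options but loses comments
    with open(path, "w") as f:
        cfg.write(f)
```

    Calling set_cray_compilers() from the mpi4py checkout before python setup.py build would reproduce the manual edit.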
    