openmpi-based mpi4py runtests.py error

Issue #78 resolved
Tim Jim created an issue

I'm not sure if this is the right place to post this, but I am having trouble running the unit tests on my mpi4py install. I am running Ubuntu 16.04 and have a working installation of Open MPI 3.0.0. Following https://mpi4py.readthedocs.io/en/stable/install.html I installed mpi4py using:

pip install mpi4py

It appears to have successfully picked up on my setup, and I could run the test

mpiexec -n 4 python -m mpi4py helloworld

I couldn't find where pip keeps the source files, so I downloaded the source from the mpi4py GitHub repository to run the unit tests, as suggested in the install guide. However, I got the error below. Is this a problem? Kind regards.

tjim@DESKTOP-TA3P0PS:~/Downloads/mpi4py-master$ mpiexec -n 4 python test/runtests.py
[2@DESKTOP-TA3P0PS] Python 3.6 (/home/tjim/anaconda3/bin/python)
[2@DESKTOP-TA3P0PS] MPI 3.1 (Open MPI 3.0.0)
[3@DESKTOP-TA3P0PS] Python 3.6 (/home/tjim/anaconda3/bin/python)
[2@DESKTOP-TA3P0PS] mpi4py 2.0.0 (/home/tjim/anaconda3/lib/python3.6/site-packages/mpi4py)
[0@DESKTOP-TA3P0PS] Python 3.6 (/home/tjim/anaconda3/bin/python)
[0@DESKTOP-TA3P0PS] MPI 3.1 (Open MPI 3.0.0)
[0@DESKTOP-TA3P0PS] mpi4py 2.0.0 (/home/tjim/anaconda3/lib/python3.6/site-packages/mpi4py)
[3@DESKTOP-TA3P0PS] MPI 3.1 (Open MPI 3.0.0)
[1@DESKTOP-TA3P0PS] Python 3.6 (/home/tjim/anaconda3/bin/python)
[3@DESKTOP-TA3P0PS] mpi4py 2.0.0 (/home/tjim/anaconda3/lib/python3.6/site-packages/mpi4py)
[1@DESKTOP-TA3P0PS] MPI 3.1 (Open MPI 3.0.0)
[1@DESKTOP-TA3P0PS] mpi4py 2.0.0 (/home/tjim/anaconda3/lib/python3.6/site-packages/mpi4py)
EEEEEEEE.sE...ss........F.FEEE.EE.EFEEEEEE.s....FEEE................FFFFEEEEEEEEEEEE............FFF.EEEEEE.EEE............FFF.EEEEEE.EEE............FFFEFEEEEEEEE.E............E..EEEEEEEE.EE...F.E...E......E................F..EE.E.......EEE.....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................EEEE................E....E.E................E...........EEEE........................EEEE....................................................
....................................................s.............s.............s.............s................................................................................................................................................................................................................................................................................................................................................................................................................................E...E............................................................................................................................................................................E.........EE.........E....................E...E.................................................................................................................ssssssssssssssssssssssss.....................................................ssssss.........ssssss..........ssssss.........ssssss...............ssssss.......s...s.................ssssss......s...s...........ssssss.........s.....s..........Fatal Python error: exception in user-defined reduction operation
Traceback (most recent call last):
  File "MPI/opimpl.pxi", line 99, in mpi4py.MPI.op_user_mpi (src/mpi4py.MPI.c:19531)
  File "MPI/opimpl.pxi", line 90, in mpi4py.MPI.op_user_py (src/mpi4py.MPI.c:19417)
  File "test/test_op.py", line 41, in mysum
    return mysum_buf(ba, bb, dt)
  File "test/test_op.py", line 35, in mysum_buf
    b[:] = mysum_obj(asarray('i', a), asarray('i', b))
ValueError: memoryview assignment: lvalue and rvalue have different structures
Fatal Python error: exception in user-defined reduction operation
Traceback (most recent call last):
  File "MPI/opimpl.pxi", line 99, in mpi4py.MPI.op_user_mpi (src/mpi4py.MPI.c:19531)
  File "MPI/opimpl.pxi", line 90, in mpi4py.MPI.op_user_py (src/mpi4py.MPI.c:19417)
  File "test/test_op.py", line 41, in mysum
    return mysum_buf(ba, bb, dt)
  File "test/test_op.py", line 35, in mysum_buf
    b[:] = mysum_obj(asarray('i', a), asarray('i', b))
ValueError: memoryview assignment: lvalue and rvalue have different structures
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[DESKTOP-TA3P0PS:04099] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[DESKTOP-TA3P0PS:04099] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

Comments (33)

  1. Lisandro Dalcin

    You should install mpi4py from the git master: pip install https://bitbucket.org/mpi4py/mpi4py/get/master.tar.gz. The master branch has some improvements that work around issues with Python's memoryview behavior across versions.
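
The ValueError in the log above can be reproduced with the standard library alone. The sketch below is not the mpi4py test code; the helper names copy_ints_naive and copy_ints_cast are hypothetical, and it is an assumption that test_op.py fails for this same reason. It shows that slice assignment into a memoryview requires both sides to share a format and shape, and that casting the destination view first satisfies that requirement.

```python
from array import array


def copy_ints_naive(dest: bytearray, values: array) -> None:
    """Assign an 'i' array into a byte-format view; the formats differ, so this raises."""
    view = memoryview(dest)  # format 'B', one item per byte
    view[:] = values         # raises ValueError: structures differ


def copy_ints_cast(dest: bytearray, values: array) -> None:
    """Cast the destination view to 'i' first so both sides share format and shape."""
    view = memoryview(dest).cast("i")
    view[:] = values


values = array("i", [1, 2, 3])
buf = bytearray(len(values) * values.itemsize)

try:
    copy_ints_naive(buf, values)
except ValueError as exc:
    print("naive copy failed:", exc)

copy_ints_cast(buf, values)
print(array("i", bytes(buf)).tolist())
```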

  2. Tim Jim reporter

    @dalcinl do I need to uninstall my current setup? What is the best way to do so? Kind regards.

  3. Tim Jim reporter

    @dalcinl I tried out your suggestion above, but it seems that it has broken the helloworld test above too. When I try to run

    mpiexec -n 4 python -m mpi4py helloworld
    

    I get the following error. What may have happened?

    tjim@DESKTOP-TA3P0PS:~$ mpiexec -n 4 python -m mpi4py helloworld 
    Traceback (most recent call last):
      File "/home/tjim/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/home/tjim/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/tjim/anaconda3/lib/python3.6/site-packages/mpi4py/__main__.py", line 7, in <module>
        main()
      File "/home/tjim/anaconda3/lib/python3.6/site-packages/mpi4py/run.py", line 196, in main
        run_command_line(args)
      File "/home/tjim/anaconda3/lib/python3.6/site-packages/mpi4py/run.py", line 47, in run_command_line
        run_path(sys.argv[0], run_name='__main__')
      File "/home/tjim/anaconda3/lib/python3.6/runpy.py", line 261, in run_path
        code, fname = _get_code_from_file(run_name, path_name)
      File "/home/tjim/anaconda3/lib/python3.6/runpy.py", line 236, in _get_code_from_file
        code = compile(f.read(), fname, 'exec')
    ValueError: source code string cannot contain null bytes
    Traceback (most recent call last):
      File "/home/tjim/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/home/tjim/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/tjim/anaconda3/lib/python3.6/site-packages/mpi4py/__main__.py", line 7, in <module>
        main()
      File "/home/tjim/anaconda3/lib/python3.6/site-packages/mpi4py/run.py", line 196, in main
        run_command_line(args)
      File "/home/tjim/anaconda3/lib/python3.6/site-packages/mpi4py/run.py", line 47, in run_command_line
        run_path(sys.argv[0], run_name='__main__')
      File "/home/tjim/anaconda3/lib/python3.6/runpy.py", line 261, in run_path
        code, fname = _get_code_from_file(run_name, path_name)
      File "/home/tjim/anaconda3/lib/python3.6/runpy.py", line 236, in _get_code_from_file
        code = compile(f.read(), fname, 'exec')
    ValueError: source code string cannot contain null bytes
    Traceback (most recent call last):
      File "/home/tjim/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/home/tjim/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/tjim/anaconda3/lib/python3.6/site-packages/mpi4py/__main__.py", line 7, in <module>
        main()
      File "/home/tjim/anaconda3/lib/python3.6/site-packages/mpi4py/run.py", line 196, in main
        run_command_line(args)
      File "/home/tjim/anaconda3/lib/python3.6/site-packages/mpi4py/run.py", line 47, in run_command_line
        run_path(sys.argv[0], run_name='__main__')
      File "/home/tjim/anaconda3/lib/python3.6/runpy.py", line 261, in run_path
        code, fname = _get_code_from_file(run_name, path_name)
      File "/home/tjim/anaconda3/lib/python3.6/runpy.py", line 236, in _get_code_from_file
        code = compile(f.read(), fname, 'exec')
    ValueError: source code string cannot contain null bytes
    Traceback (most recent call last):
      File "/home/tjim/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/home/tjim/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/tjim/anaconda3/lib/python3.6/site-packages/mpi4py/__main__.py", line 7, in <module>
        main()
      File "/home/tjim/anaconda3/lib/python3.6/site-packages/mpi4py/run.py", line 196, in main
        run_command_line(args)
      File "/home/tjim/anaconda3/lib/python3.6/site-packages/mpi4py/run.py", line 47, in run_command_line
        run_path(sys.argv[0], run_name='__main__')
      File "/home/tjim/anaconda3/lib/python3.6/runpy.py", line 261, in run_path
        code, fname = _get_code_from_file(run_name, path_name)
      File "/home/tjim/anaconda3/lib/python3.6/runpy.py", line 236, in _get_code_from_file
        code = compile(f.read(), fname, 'exec')
    ValueError: source code string cannot contain null bytes
    
  4. Tim Jim reporter

    @dalcinl Thanks for the update and information. The two tests you suggested appear to be running fine. Can I assume that this means I have a working install? Or is there a more comprehensive test I should run? Thanks again for the help.

  5. Lisandro Dalcin

    That should be enough; you should have a working install. You can also try running the full test suite with mpiexec -n 4 python test/runtests.py.

  6. Lisandro Dalcin

    Do export OMPI_MCA_rmaps_base_oversubscribe=yes before running the tests. Could you send the new output again? I saw some strange pickle failure.
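
For anyone who prefers to launch the tests from a script rather than export the variable in the shell, a minimal sketch follows. The function names oversubscribe_env and run_testsuite are hypothetical; the variable name and the mpiexec command line are the ones quoted in this thread, and run_testsuite assumes it is called from the mpi4py source directory.

```python
import os
import subprocess


def oversubscribe_env(base=None):
    """Return a copy of the environment with Open MPI oversubscription enabled."""
    env = dict(os.environ if base is None else base)
    env["OMPI_MCA_rmaps_base_oversubscribe"] = "yes"
    return env


def run_testsuite(nprocs=4, script="test/runtests.py"):
    """Run the test suite under mpiexec and return its exit code."""
    cmd = ["mpiexec", "-n", str(nprocs), "python", script]
    return subprocess.run(cmd, env=oversubscribe_env()).returncode
```

Only the child process sees the variable, so the calling shell's environment is left untouched.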

  7. Lisandro Dalcin

    @timjim333 After a second look at your last test log, I think you are not running the tests from the master branch of the git repo; that would explain the failures in test_pickle.py.

  8. Tim Jim reporter

    @dalcinl I am actually unsure where pip has installed mpi4py, so I just downloaded the tests from the GitHub master. I mainly wanted mpi4py in order to use SU2; however, while installing SU2, I get Error: Unable to find 'mpi4py/mpi4py.i'. Where is mpi4py located when installed by pip? Thanks and regards.

  9. Tim Jim reporter

    I realised that it is because SU2 uses python2.7, hence it's not picking up the mpi4py install in the 3.6 directories. Can I also use pip install https://bitbucket.org/mpi4py/mpi4py/get/master.tar.gz to install the latest mpi4py for python2.7 too?

  10. Tim Jim reporter

    I just created a python 2.7 environment and installed mpi4py as before. I noticed that for the helloworld test case on 4 cores, it prints "Hello, world...process 0 of 1 on..." 4 times. Is this the expected behavior? (I thought I remembered differently from before).

    From which directory might I be able to perform the runtests command? I'm not sure where pip puts the test directory.

  11. Lisandro Dalcin

    Something must be wrong. It seems you installed mpi4py with one MPI implementation, but you are then executing with an mpiexec command that belongs to a different MPI. About running tests: pip does not install the test scripts, so you have to explicitly clone the repository with git and run them as you did before.
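
The "process 0 of 1" symptom can be checked mechanically. The sketch below is a hypothetical helper (check_helloworld is not part of mpi4py) and assumes the line format "Hello, World! I am process R of N on HOST." that the helloworld run prints later in this thread. It returns False when every process reports a world of size 1, which is what happens when mpiexec and the MPI library that mpi4py was built against do not match.

```python
import re

# Line format assumed from the helloworld output quoted in this thread.
HELLO = re.compile(r"Hello, World! I am process (\d+) of (\d+) on (\S+?)\.$")


def check_helloworld(lines, nprocs):
    """Return True if the output shows ranks 0..nprocs-1 in one world of size nprocs."""
    ranks, sizes = [], set()
    for line in lines:
        match = HELLO.search(line.strip())
        if match:
            ranks.append(int(match.group(1)))
            sizes.add(int(match.group(2)))
    return sizes == {nprocs} and sorted(ranks) == list(range(nprocs))
```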

  12. Tim Jim reporter

    Is there any way to find out which MPI implementation mpiexec is calling? Will rerunning the installs of Open MPI and mpi4py solve this issue?

  13. Lisandro Dalcin

    You will have to figure it out yourself; try mpiexec --help or mpiexec -help. Rerunning the installs may help, but maybe you have a mess in your environment or system. You should check that the mpicc and mpiexec commands are in the same directory and that they correspond to each other.
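
The "same directory" check can be sketched with shutil.which. The helper names tool_dirs and same_install are hypothetical. The sketch only compares the directories in which the two commands are found on PATH; it does not prove that they belong to the same MPI build.

```python
import os
import shutil


def tool_dirs(names=("mpicc", "mpiexec"), path=None):
    """Map each command name to the directory it resolves to on PATH (None if missing)."""
    found = {}
    for name in names:
        exe = shutil.which(name, path=path)
        found[name] = os.path.dirname(exe) if exe else None
    return found


def same_install(names=("mpicc", "mpiexec"), path=None):
    """Return True if every command was found and all live in one directory."""
    dirs = set(tool_dirs(names, path).values())
    return None not in dirs and len(dirs) == 1


print(tool_dirs())
print("consistent:", same_install())
```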

  14. Tim Jim reporter

    Yes - you were completely correct; the wrong mpiexec was being picked up. For reference for anyone who has Paraview installed, please see below.

    Running which mpicc showed my openmpi install, at /opt/openmpi/openmpi-3.0.0/bin/mpicc but which mpiexec had hooked onto my Paraview mpiexec at /opt/paraview/ParaView-5.4.1-Qt5-OpenGL2-MPI-Linux-64bit/bin/mpiexec. Swapping my $PATH order solved the issue and running the process again gives the expected behaviour:

    tjim@DESKTOP-TA3P0PS:/opt/SU2/SU2v5.0.0/bin$ mpiexec -n 4 python -m mpi4py.bench helloworld
    Hello, World! I am process 0 of 4 on DESKTOP-TA3P0PS.
    Hello, World! I am process 1 of 4 on DESKTOP-TA3P0PS.
    Hello, World! I am process 2 of 4 on DESKTOP-TA3P0PS.
    Hello, World! I am process 3 of 4 on DESKTOP-TA3P0PS.
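
To see why the PATH order matters, it can help to list every mpiexec on PATH rather than only the first one that which reports. A minimal sketch, with the hypothetical helper all_on_path; the first entry in the returned list is the one the shell would run.

```python
import os


def all_on_path(name, path=None):
    """List every executable called `name` on PATH, in lookup order (first one wins)."""
    raw = os.environ.get("PATH", "") if path is None else path
    hits = []
    for entry in raw.split(os.pathsep):
        candidate = os.path.join(entry or os.curdir, name)
        is_exe = os.path.isfile(candidate) and os.access(candidate, os.X_OK)
        if is_exe and candidate not in hits:
            hits.append(candidate)
    return hits


for position, exe in enumerate(all_on_path("mpiexec")):
    print(position, exe)
```

More than one line of output means a later installation is being shadowed by an earlier PATH entry.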
    
  15. Tim Jim reporter

    @dalcinl I'm getting some strange behaviour when running in a conda environment. Since the project I need to run uses python2.7, I have set up a new conda environment which defaults to python2.7. I installed mpi4py in this environment as before and (with the environment active) it successfully runs mpiexec -n 4 python -m mpi4py.bench helloworld and ringtest, but for some reason it fails when attempting to run runtests.py with the following:

    (su2) tjim@DESKTOP-TA3P0PS:/opt/mpi4py/mpi4py_src$ mpiexec -n 4 python test/runtests.py
    Traceback (most recent call last):
      File "test/runtests.py", line 240, in <module>
        sys.exit(main())
      File "test/runtests.py", line 227, in main
        package = import_package(options, pkgname)
      File "test/runtests.py", line 103, in import_package
        import mpi4py.MPI
    ImportError: No module named MPI
    Traceback (most recent call last):
      File "test/runtests.py", line 240, in <module>
        sys.exit(main())
      File "test/runtests.py", line 227, in main
        package = import_package(options, pkgname)
      File "test/runtests.py", line 103, in import_package
        import mpi4py.MPI
    ImportError: No module named MPI
    Traceback (most recent call last):
      File "test/runtests.py", line 240, in <module>
        sys.exit(main())
      File "test/runtests.py", line 227, in main
        package = import_package(options, pkgname)
      File "test/runtests.py", line 103, in import_package
        import mpi4py.MPI
    ImportError: No module named MPI
    Traceback (most recent call last):
      File "test/runtests.py", line 240, in <module>
        sys.exit(main())
      File "test/runtests.py", line 227, in main
        package = import_package(options, pkgname)
      File "test/runtests.py", line 103, in import_package
        import mpi4py.MPI
    ImportError: No module named MPI
    

    This is strange since I can start up python in the terminal and successfully import mpi4py.MPI

    Is this something you have come across?

  16. Lisandro Dalcin

    Are you sure the python command corresponds with the one in your conda environment? I really don't know, you have to understand it is quite hard for me to guess all the issues you may have without access to your machine. Try running mpiexec -n 4 python -c "import sys;print(sys.executable)" to be sure you are using the right Python binary.
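
Alongside sys.executable, it can be useful to print which file a module name actually resolves to, since a package directory earlier on sys.path can shadow the installed one. A minimal sketch using importlib; the helper name module_origin is hypothetical, and running it from the source directory under mpiexec is an assumption about how one would use it here.

```python
import importlib.util
import sys


def module_origin(name):
    """Return the file a module name resolves to, or None if it cannot be found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised when a parent package is missing or has no usable spec.
        return None
    return spec.origin if spec else None


print(sys.executable)
for name in ("mpi4py", "mpi4py.MPI"):
    print(name, "->", module_origin(name))
```

If mpi4py resolves but mpi4py.MPI does not, the package being picked up lacks the compiled extension module.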

  17. Tim Jim reporter

    Yes, I'm fairly sure it's the right Python:

    (su2) tjim@DESKTOP-TA3P0PS:/opt/mpi4py/mpi4py_src$ mpiexec -n 4 python -c "import sys;print(sys.executable)"
    /home/tjim/anaconda3/envs/su2/bin/python
    /home/tjim/anaconda3/envs/su2/bin/python
    /home/tjim/anaconda3/envs/su2/bin/python
    /home/tjim/anaconda3/envs/su2/bin/python
    

    This is the same python location as the one I get when calling which python with the environment activated.

    I had a check and it seems that mpi4py is installed in /home/tjim/anaconda3/envs/su2/lib/python2.7/site-packages/mpi4py/include/mpi4py/.

    Thanks for the support so far - I appreciate it's a bit difficult to debug without being here!

  18. Tim Jim reporter

    Ok, for some reason, that solved the issue. Here is the output:

    (su2) tjim@DESKTOP-TA3P0PS:/opt/mpi4py/mpi4py_src$ git clean -dxf
    Removing .eggs/
    Removing build/
    Removing conf/cythonize.pyc
    Removing conf/mpiconfig.pyc
    Removing conf/mpidistutils.pyc
    Removing conf/mpiregexes.pyc
    Removing conf/mpiscanner.pyc
    Removing src/lib-mpi/config/config.h
    Removing src/mpi4py.MPI.c
    Removing src/mpi4py/include/mpi4py/mpi4py.MPI.h
    Removing src/mpi4py/include/mpi4py/mpi4py.MPI_api.h
    

    After that, mpiexec -n 4 python test/runtests.py runs just fine (aside from the pickle errors seen earlier)

  19. Lisandro Dalcin

    So you are still getting the pickle errors? Can you run git checkout master && git pull and try again?

  20. Tim Jim reporter

    Yes, I'm still getting them at the moment. I'm getting the below message on running git checkout master && git pull - I suppose I don't need to commit any changes

    (su2) tjim@DESKTOP-TA3P0PS:/opt/mpi4py/mpi4py_src$ git checkout master && git pull
    M   mpi.cfg
    Already on 'master'
    Your branch is up-to-date with 'origin/master'.
    remote: Counting objects: 667, done.
    remote: Compressing objects: 100% (428/428), done.
    remote: Total 667 (delta 545), reused 312 (delta 235)
    Receiving objects: 100% (667/667), 85.75 KiB | 0 bytes/s, done.
    Resolving deltas: 100% (545/545), completed with 135 local objects.
    From https://bitbucket.org/mpi4py/mpi4py
       5845476..0a6ee1d  master     -> origin/master
    Updating 5845476..0a6ee1d
    error: Your local changes to the following files would be overwritten by merge:
        mpi.cfg
    Please, commit your changes or stash them before you can merge.
    Aborting
    
  21. Tim Jim reporter

    On pulling the latest and rerunning runtests.py, I no longer have the pickle errors. I think all is good! That's a working install then?
