MPI_THREAD_MULTIPLE vs MPI_THREAD_SINGLE
Hey there,
I just had a look at src/python.c, line 33 and following:
33 static int
34 PyMPI_Main(int argc, char **argv)
35 {
36   int sts=0, flag=1, finalize=0;
37
38   /* MPI initalization */
39   (void)MPI_Initialized(&flag);
40   if (!flag) {
41 #if defined(MPI_VERSION) && (MPI_VERSION > 1)
42     int required = MPI_THREAD_MULTIPLE;
43     int provided = MPI_THREAD_SINGLE;
44     (void)MPI_Init_thread(&argc, &argv, required, &provided);
45 #else
46     (void)MPI_Init(&argc, &argv);
47 #endif
48     finalize = 1;
49   }
I wondered: why is MPI_THREAD_MULTIPLE required?
The MPI doc says:
MPI_THREAD_MULTIPLE Multiple threads may call MPI, with no restrictions.
[MPI doc page 488: http://mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf]
However, wouldn't MPI_THREAD_SINGLE be sufficient? As I understand it, multithreading with Python is pointless anyway (https://wiki.python.org/moin/GlobalInterpreterLock).
I realised this while tracing an MPI application using Score-P (https://github.com/score-p/scorep_binding_python/tree/mpi2), which warns about the MPI thread level.
Best,
Andreas
Comments (6)
changed status to wontfix
reporter: Multithreading in Python is not pointless, otherwise Python would not have a threading module, right? While it is true that the GIL prevents Python bytecode (or Python C-API code) from running concurrently, once a third-party pure C/C++/Fortran library is allowed to run with the GIL released, threads can indeed run concurrently, as long as the library does not call back into Python.
There is a point.
mpi4py itself (in the new mpi4py.futures module) is able to make concurrent MPI calls, and that requires MPI_THREAD_MULTIPLE support. FYI, if you call, let's say, comm.Send(), mpi4py ends up releasing the GIL to perform the (blocking) MPI_Send() call. Look at the example in the demo/threads directory; that code would not work if the threads were not able to run concurrently (and that happens thanks to mpi4py releasing the GIL).
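The GIL point can be illustrated without MPI at all. In the sketch below, time.sleep() stands in for a blocking call that releases the GIL (the way mpi4py releases it around a blocking MPI_Send()); because the GIL is released, four 0.2-second "calls" overlap instead of serializing to 0.8 seconds. This is an illustrative sketch, not mpi4py code:

```python
import threading
import time

def blocking_call():
    # time.sleep() releases the GIL while waiting, just as mpi4py
    # releases it around a blocking MPI call; other threads run meanwhile.
    time.sleep(0.2)

start = time.monotonic()
threads = [threading.Thread(target=blocking_call) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start

# The four sleeps overlap, so total wall time stays close to 0.2 s
# rather than the 0.8 s a serialized execution would take.
print(f"elapsed: {elapsed:.2f}s")
```

If the "library call" held the GIL instead (pure Python bytecode), the threads would take turns and the speedup would vanish.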
Interesting, I'll have a look at this.
All that being said, I acknowledge that src/python.c should have a user-controlled way (command line? environment variable?) to call MPI_Init() (or request a different level of thread support) instead. I would happily accept a patch providing such features.
I'll have a look at this as soon as I have some time left. So something like

mpirun python my_mpi.py --thread=single

would be OK for you?

Best,
Andreas
-
Yes. However note that it should be

mpiexec python --mpi-thread-level=single script.py

i.e., the flag has to be passed to the Python executable before the script, and you should intercept and remove the flag before calling Py_Main(), which might prove a bit difficult to get right (unless you restrict the flag to be exactly in argv[1]).

In Python 3, this could be handled with a -X option, i.e. something like -X mpi-thread-level=single. This could even be hacked to be supported in Python 2 (through interception and removal of the arg). As you see, lots of tiny details for a feature that may not pay off.
MPI_Init_thread() was added to MPI about 20 years ago. MPI implementations should just support it; we are in 2017, in a multicore world, there is no excuse!

Years ago (2008) I asked Bill Gropp (father of MPI) about this choice in mpi4py, and he replied:
I recommend that you initialize with MPI_Init_thread and MPI_THREAD_MULTIPLE . There is some overhead, but it is mainly an added latency and is thus most important for short messages. You can give users that want to optimize the option to select a lower level of thread support. At 5k entries, on a cluster, the added latency should not be too serious.
I would like to reopen this ticket.
I have compiled mpi4py from source on our cluster (where hyperthreading is disabled on the compute nodes), using the following module list (with OpenMPI/4.0.0, which is CUDA-aware).
Currently Loaded Modules:
  1) GCCcore/6.4.0
  2) binutils/2.28-GCCcore-6.4.0
  3) GCC/6.4.0-2.28
  4) zlib/1.2.11-GCCcore-6.4.0
  5) numactl/2.0.11-GCCcore-6.4.0
  6) XZ/5.2.3-GCCcore-6.4.0
  7) libxml2/2.9.7-GCCcore-6.4.0
  8) libpciaccess/0.14-GCCcore-6.4.0
  9) hwloc/2.0.2-GCCcore-6.4.0
 10) CUDA/10.0.130
 11) OpenMPI/4.0.0-GCC-6.4.0-2.28
 12) bzip2/1.0.6-GCCcore-6.4.0
 13) libreadline/7.0-GCCcore-6.4.0
 14) ncurses/6.0-GCCcore-6.4.0
 15) Tcl/8.6.8-GCCcore-6.4.0
 16) SQLite/3.21.0-GCCcore-6.4.0
 17) GMP/6.1.2-GCCcore-6.4.0
 18) libffi/3.2.1-GCCcore-6.4.0
 19) Python/3.6.5-GCCcore-6.4.0-bare
 20) mpi4py/3.0.1-GCC-6.4.0-2.28
However, when I try to run the basic mpi4py test example, I get the following error:
$ mpiexec -n 2 python -m mpi4py.bench helloworld
[r23i13n19:35098] pml_ucx.c:228 Error: UCP worker does not support MPI_THREAD_MULTIPLE
[r23i13n19:35099] pml_ucx.c:228 Error: UCP worker does not support MPI_THREAD_MULTIPLE
Hello, World! I am process 0 of 2 on r23i13n19.
Hello, World! I am process 1 of 2 on r23i13n19.
What's your verdict on this?
Thanks a lot.
@ehsan_moravveji The MPI standard provides a mechanism whereby you call MPI_Init_thread() with the required level of thread support you want/need, and the MPI implementation initializes the MPI runtime and answers back with the provided level of thread support it can offer. The provided value can be higher than, equal to, or lower than required. If it is lower, it is up to the caller of MPI_Init_thread() to handle the lack of sufficient thread support: the caller could error right away, or just proceed happily (that's what mpi4py does). So, from the point of view of the MPI standard, mpi4py is by default doing the right thing. Note also that import mpi4py.rc; mpi4py.rc.threads = False at the very beginning of your script will not initialize MPI with thread support. mpi4py.bench and mpi4py.run have options to disable threads if the need arises; run them with --help.

Now, about the Open MPI behavior:
- The script seems to proceed just fine, you just get a warning, despite it saying Error: ....
- The warning is annoying, but maybe Open MPI has a way to silence it through MCA parameters.
- Ultimately, this is an Open MPI issue; you should file a bug report there and ask them to take action.
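The mpi4py.rc knobs mentioned above act as a configuration fragment that must run before the first `from mpi4py import MPI`. The sketch below shows a lower thread level being requested; the attribute names and the accepted values ("single", "funneled", "serialized", "multiple") are quoted from memory of the mpi4py documentation, so double-check them against your installed version:

```python
import mpi4py

# Configure initialization *before* importing the MPI submodule.
mpi4py.rc.initialize = True        # let mpi4py call MPI_Init_thread()
mpi4py.rc.threads = True           # use MPI_Init_thread(), not MPI_Init()
mpi4py.rc.thread_level = "single"  # request MPI_THREAD_SINGLE instead of
                                   # the default "multiple"

from mpi4py import MPI

# Query the level the MPI implementation actually provided.
print(MPI.Query_thread())
```

With this in place, Score-P and the UCX warning above should both see a single-threaded initialization, at the cost of losing mpi4py.futures-style concurrent MPI calls.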
We cannot know in advance the level of MPI thread support required by user code. Thus, src/python.c (and the mpi4py.MPI module) request the maximum level to be safe.

About the Score-P warning, well, IMHO, you should complain to them. I've put a lot of effort into making mpi4py thread-safe. In the face of ambiguity (what mpi4py users need), I refuse the temptation to guess (and request from MPI the maximum level of thread support).