Working of numpy vector interface in parallel
The implementation of GenericVector.__setitem__() is strange in parallel, at least for PETScVector. Calling x[0] = 42.0 ends up calling the non-collective VecSetValues. This is done unnecessarily by every process, and in arbitrary order, as demonstrated by the random values in the following snippet when inconsistent values are supplied.
from dolfin import *

mesh = UnitSquareMesh(3, 3)
V = FunctionSpace(mesh, 'CG', 1)
u = Function(V)
x = u.vector()

for i in range(100):
    x[0] = MPI.process_number()  # set inconsistent value
    if MPI.process_number() == 0:
        print x[0]  # random results with mpirun -n 3

if MPI.process_number() == 0:
    x[0] = 42.0  # deadlocks because of apply("insert")
As __setitem__ calls apply("insert"), it is necessarily a collective operation. But it should then assume that the user supplies an across-processes-consistent value, throw away the off-process items, and call set_local.
In the present state, completely unnecessary interprocess communication takes place and values are set np times more than necessary.
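For illustration, the suggested set_local route would first filter the off-process entries against the owned range. A minimal pure-numpy sketch (filter_local and its signature are hypothetical, not DOLFIN API):

```python
import numpy as np

def filter_local(indices, values, local_range):
    # Hypothetical helper (not DOLFIN API): keep only entries whose
    # global index falls in this process's ownership range [lo, hi)
    # and shift them to local numbering, ready for set_local().
    lo, hi = local_range
    indices = np.asarray(indices)
    values = np.broadcast_to(np.asarray(values, dtype=float), indices.shape)
    mask = (indices >= lo) & (indices < hi)
    return indices[mask] - lo, values[mask]

# A process owning global rows [4, 8) throws away off-process items:
local_idx, local_vals = filter_local([2, 5, 7, 9], [1.0, 2.0, 3.0, 4.0], (4, 8))
# indices 5 and 7 survive, as local indices 1 and 3
```

Since every process filters against its own local_range, no interprocess communication is needed for consistent input.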
Comments (13)
-
reporter In the given code snippet one should do

    if MPI.process_number() == 0:
        x[0] = 42.0
    else:
        x.apply("insert")  # prevents deadlock

so that setting values of x locally is possible. But one could keep thinking about a useful design of the vector numpy interface. See also related issue 56.
-
reporter - changed title to Working of numpy vector interface in parallel
- marked as task
-
It is difficult to establish general rules for accessing data in a GenericVector through the __getitem__ and __setitem__ interface that are both pythonic and handle the distributed data seamlessly. One could argue that the above "fix" to the deadlock is ugly, but letting the SWIG layer or DOLFIN handle it for you might not be attractive either, as these things easily get out of hand. I am curious whether you, Jan, have any general suggestions for how we should handle the different cases of __getitem__ and __setitem__, for single items as well as for an index array with several item assignments.
-
reporter I have no strong opinion - that's why I said one should keep thinking :) You should also consider that no user but me has ever complained about the situation, right?
I'm slowly getting to think that an automatic call of apply is not such a poor solution. As numpy should work in a vectorized fashion, I think we should assume that x[indices] = ... is used rather than the loopy for i in indices: x[i] = .... Therefore it is logical to do an automatic apply. The user then must call apply on any process where no assignment takes place, to avoid a deadlock. Note that an automatic apply is needed for the most commonly used feature - whole-vector assignment x[:] = ....
Maybe it would be good if the original C++ set method were exposed with numpy arrays as arguments so the user can do whatever he needs. On the other hand, such custom things can be done with compiled modules.
-
You can add me to the list of people who have complained about the situation. I have, however, managed to avoid making it a show-stopper.
I think what you suggest makes sense. Vectorized assignment is more sensible than a for loop in Python. Calling apply after such an assignment could be handled by the wrapper.
Handling x[:] = value now just works if value is a scalar, as it is delegated to x._assign(value). However, if value is an ndarray with local data, it should be delegated to x.set_local(value). Then it should be safe to let the library call x.apply for you.
Handling x[ind] = value is more tricky. There are several cases to consider:

1. ind = int ndarray, value = ndarray
2. ind = int ndarray, value = scalar
3. ind = scalar, value = scalar

These can be handled generically by letting 2. and 3. be translated in the Python layer into 1. We could then implement a generic wrapper interface in C++, which calls x.apply on all processes. The C++ wrapper function should then only expect local values for ind, which could be ensured by looping over all passed ind, checking that the values are within local_range, and issuing an error if not. Additionally, ind and value should have the same local size. If no values are supposed to be assigned on the local process, an empty array is expected for both ind and value.
In addition to the cases mentioned above, there are cases where ind could be a boolean array or a Python slice. However, I think we should be able to translate all of these cases into 1. above in the Python layer.
-
reporter This sounds like a reasonable blueprint ;-)
-
Yeah, it will also get rid of all the nice but hard-to-maintain wrapper code in both la_get_set_items.i and Indices.i.
-
reporter I don't get it - you drop old wrapper code to write new wrapper code - that is not nicer by itself, is it?
I also remind you that the comparison operators and indexing by bool arrays probably need to be redesigned. Currently these arrays are of global size, filled with False for off-process items. This is not optimal, although it somehow works - see test/unit/la/python/test.py.
-
By itself it is not nicer, but if the result is a simpler and more maintainable interface, I would say it is better. Speed-wise it might be less efficient, but comparable with similar NumPy operations. I propose that most of the logic of the wrapper should now be in the Python/NumPy layer. We could also alter the generation of the boolean array so it is purely local, fixing the problem you discovered.
-
- changed status to resolved
Fixed in: 4e42d304ea6178eed2f93d12fe50cd006a491423
-
- removed milestone
Removing milestone: 1.5 (automated comment)
We could also prefer __setitem__ to be non-collective. As a consequence, apply("insert") would have to be called by the user after __setitem__.