Behavior of the NumPy vector interface in parallel

Issue #54 resolved
Jan Blechta created an issue

The implementation of GenericVector.__setitem__() is strange in parallel, at least for PETScVector. Calling x[0] = 42.0 ends up calling the non-collective VecSetValues. This call is made, unnecessarily, by every process and in arbitrary order, as the following snippet demonstrates: when inconsistent values are supplied, the results are random.

from dolfin import *
mesh = UnitSquareMesh(3, 3)
V = FunctionSpace(mesh, 'CG', 1)
u = Function(V)
x = u.vector()

for i in range(100):
  x[0] = MPI.process_number() # set inconsistent value
  if MPI.process_number() == 0:
    print x[0] # random results with mpirun -n 3

if MPI.process_number() == 0:
  x[0] = 42.0 # deadlocks because of apply("insert")

As __setitem__ calls apply("insert"), it is necessarily a collective operation. But then it should assume that the user supplies a value consistent across processes, throw away off-process items, and call set_local.

In the present state, completely unnecessary interprocess communication takes place, and the values are set np extra times (np being the number of processes).
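
The following is a minimal sketch of a __setitem__ along these lines, using only the GenericVector methods local_range(), get_local(), set_local() and apply() from the DOLFIN Python API; setitem_consistent is a hypothetical helper, not the actual implementation:

import numpy as np

def setitem_consistent(x, indices, values):
  # Hypothetical helper. Assumes every process passes the same indices
  # and values (values being an array of the same length as indices).
  start, end = x.local_range()                  # locally owned range [start, end)
  indices = np.asarray(indices)
  values = np.asarray(values, dtype='d')
  owned = (indices >= start) & (indices < end)  # throw away off-process items
  block = x.get_local()                         # local part as a NumPy array
  block[indices[owned] - start] = values[owned]
  x.set_local(block)
  x.apply("insert")                             # collective, but moves no data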

Comments (13)

  1. Jan Blechta reporter

    We could also prefer __setitem__ to be non-collective. As a consequence, apply("insert") would have to be called by the user after __setitem__.

  2. Jan Blechta reporter

    In the given code snippet one should do

    if MPI.process_number() == 0:
      x[0] = 42.0
    else:
      x.apply("insert") # prevents deadlock
    

    so that setting values of x locally is possible.

    But one could keep thinking about a useful design for the vector NumPy interface. See also the related issue #56.

  3. Johan Hake

    It is difficult to establish general rules for accessing data in a GenericVector through the __getitem__ and __setitem__ interface that are both Pythonic and handle the distributed data seamlessly. One could argue that the above "fix" to the deadlock is ugly, but letting the SWIG layer or DOLFIN handle it for you might not be attractive either, as these things easily get out of hand.

    I am curious whether you, Jan, have any general suggestions for how we should handle the different cases of __getitem__ and __setitem__, for single items as well as for an index array with several item assignments.

  4. Jan Blechta reporter

    I have no strong opinion - that's why I said one should keep thinking :) You should also consider that no user but me has ever complained about the situation, right?

    I'm slowly coming to think that an automatic call of apply is not such a poor solution. As NumPy is meant to work in a vectorized fashion, I think we should assume that x[indices] = ... is used rather than a loopy for i in indices: x[i] = .... Therefore it is logical to apply automatically. The user then must call apply on any process where no assignment takes place, to avoid a deadlock. Note that the automatic apply is needed for the most-used feature: whole-vector assignment, x[:] = ....
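
    To illustrate, a sketch of this proposed usage (the automatic apply is the suggested behavior, not necessarily the current one), with ind as a hypothetical index set:

    import numpy as np

    x[:] = 0.0                            # whole-vector assignment; apply done for you

    ind = np.array([0, 1, 2])             # hypothetical global indices
    if MPI.process_number() == 0:
      x[ind] = np.array([1.0, 2.0, 3.0])  # would trigger the automatic apply("insert")
    else:
      x.apply("insert")                   # ranks assigning nothing must still take part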

    Maybe it would be good if the original C++ set method were exposed with NumPy arrays as arguments, so the user can do whatever is needed. On the other hand, such custom things can be done with compiled modules.

  5. Johan Hake

    You can add me to the list of people who have complained about the situation. I have, however, managed to avoid making it a show-stopper.

    I think what you suggest makes sense. Vectorized assignment is more sensible than a for loop in Python. Calling apply after such an assignment could be handled by the wrapper.

    Handling x[:] = value now just works if value is a scalar, as it is delegated to x._assign(value). However, if value is an ndarray with local data, it should be delegated to x.set_local(value). Then it should be safe to let the library call x.apply for you.
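
    A minimal sketch of that dispatch (setitem_slice is a hypothetical helper; x._assign is the private method mentioned above):

    import numpy as np

    def setitem_slice(x, value):
      # Hypothetical dispatch for x[:] = value, per the description above.
      if np.isscalar(value):
        x._assign(float(value))                    # scalar: delegate to _assign
      else:
        x.set_local(np.asarray(value, dtype='d'))  # ndarray of local data
      x.apply("insert")                            # collective finish is then safe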

    Handling x[ind] = value is more tricky. There are several cases to consider:

    1. ind = int ndarray, value = ndarray
    2. ind = int ndarray, value = scalar
    3. ind = scalar, value = scalar

    These can be handled generically by translating 2. and 3. into 1. in the Python layer. We could then implement a generic wrapper interface in C++ which calls x.apply on all processes. The C++ wrapper function should then expect only local values for ind, which could be ensured by looping over all passed ind, checking whether the values are within local_range, and issuing an error if not. Additionally, ind and value should have the same local size. If no values are to be assigned on the local process, an empty array is expected for both ind and value.

    In addition to the above-mentioned cases, ind could also be a boolean array or a Python slice. However, I think we should be able to translate all of these cases into 1. above in the Python layer, as sketched below.
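
    A sketch of that Python-layer normalization (normalize_setitem is a hypothetical helper; only local_range() and size() from the GenericVector API are assumed):

    import numpy as np

    def normalize_setitem(x, ind, value):
      # Hypothetical helper: reduce cases 2. and 3., boolean masks and slices
      # to case 1. -- local integer indices plus values of the same length,
      # the contract the proposed C++ wrapper would expect.
      start, end = x.local_range()
      if isinstance(ind, slice):
        ind = np.arange(*ind.indices(x.size()))
      ind = np.asarray(ind)
      if ind.dtype == bool:                  # boolean mask -> integer indices
        ind = np.nonzero(ind)[0]
      value = np.asarray(value, dtype='d')
      if value.ndim == 0:                    # scalar -> array matching ind
        value = value * np.ones(len(ind))
      owned = (ind >= start) & (ind < end)   # keep only locally owned rows
      return ind[owned], value[owned]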

  6. Johan Hake

    Yeah, it will also get rid of all the nice but hard-to-maintain wrapper code in both la_get_set_items.i and Indices.i.

  7. Jan Blechta reporter

    I don't get it - you drop old wrapper code to write new wrapper code - that is not nice in itself, is it?

    I would also point out that comparison operators and indexing by bool arrays probably need to be redesigned. Currently these arrays have global size and are filled with False for off-process items. This is not optimal, although it somehow works - see test/unit/la/python/test.py.

  8. Johan Hake

    By itself it is not nicer, but if the result is a simpler and more maintainable interface, I would say it is better. Speed-wise it might be less efficient, but comparable with similar NumPy operations. I propose that most of the logic of the wrapper should now live in the Python/NumPy layer. We could also alter the generation of the boolean array so it is purely local, fixing the problem you discovered.
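
    For instance, with a purely local boolean mask the whole pattern stays communication-free apart from the final apply. A sketch using only get_local/set_local (today the mask from a comparison like x < 0.5 is global-sized, so this shows the intended behavior, not the current one):

    block = x.get_local()        # local part as a NumPy array
    mask = block < 0.5           # purely local boolean mask
    block[mask] = 0.0
    x.set_local(block)
    x.apply("insert")            # finish collectively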
