Python: implicit conversion to a continuous dataObject when creating a np.array from it

In itom < 4.0, the following code snippet worked:

import numpy as np
from itom import dataObject

d = dataObject([40, 100, 100])  # d is not continuous, its data is divided into chunks in memory
x = np.array(d, copy=False)

However, a crash occurred if d was deleted and x was used afterwards. The reason was that the Python class itom.dataObject internally created a continuous version of the wrapped C++ dataObject and passed its necessary information to the __reduce__ method, which is called by numpy to execute the shallow copy. The numpy array holds a reference to its base itom.dataObject, but this is the original object and not the continuous one, so the numpy array does not keep the memory of the continuous version alive.
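The lifetime problem can be reproduced with plain numpy. The sketch below is a hypothetical model, not itom's code: `ChunkedData` stands in for a non-continuous `itom.dataObject`, `ContinuousCopy` for the internal continuous C++ copy, and numpy's `__array_interface__` protocol is used as an assumed substitute for the mechanism itom actually uses. numpy stores the object whose interface it read (the original) as `base`, so nothing references the continuous copy once the interface has been returned.

```python
import weakref

import numpy as np


class ContinuousCopy:
    """Hypothetical stand-in for the internal continuous copy."""

    def __init__(self, planes):
        self.buffer = np.stack(planes)  # deep copy of all planes into one memory block


class ChunkedData:
    """Hypothetical stand-in for a non-continuous dataObject (one buffer per plane)."""

    def __init__(self, planes, rows, cols):
        self.planes = [np.zeros((rows, cols), dtype=np.uint8) for _ in range(planes)]
        self.last_copy = None  # weak reference, only used to observe the copy's lifetime

    @property
    def __array_interface__(self):
        # Flawed pattern: build a temporary continuous copy ...
        tmp = ContinuousCopy(self.planes)
        self.last_copy = weakref.ref(tmp)
        # ... and return only its raw description (pointer, shape, dtype).
        # Nothing references tmp after this method returns, so it is freed.
        return tmp.buffer.__array_interface__


d = ChunkedData(40, 100, 100)
x = np.asarray(d)
base_is_original = x.base is d           # numpy keeps the original object alive ...
copy_is_gone = d.last_copy() is None     # ... but the continuous copy is already gone
print(base_is_original, copy_is_gone)
del x  # x points to freed memory: never read or write it
```

Reading or writing x before the `del` would access freed memory, which is the kind of access that crashed in itom < 4.0.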

To avoid this, a workaround was implemented that only allows creating a numpy array from a dataObject if the dataObject is already continuous. Otherwise,

x = np.array(d.makeContinuous(), copy=False)

has to be called.

However, this requires more knowledge of the underlying mechanism and more careful programming. To avoid this, one of the __array_struct__, __array_interface__ or __array__ methods of itom.dataObject could create a continuous Python dataObject (not only a continuous C++ object) if the dataObject is not continuous yet, and pass this object to numpy, such that the numpy array keeps it as its base object.
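A minimal sketch of the proposed behaviour, again with hypothetical stand-in classes rather than itom's API: `ChunkedData.__array__` first creates a continuous Python object (`ContinuousData`, produced by a hypothetical `make_continuous`) and builds the array from it, so that this object becomes the array's `base`. The `copy` parameter is only accepted for compatibility with NumPy 2's `__array__` signature; this sketch does not honour it.

```python
import numpy as np


class ContinuousData:
    """Hypothetical stand-in for a continuous Python dataObject (one memory block)."""

    def __init__(self, data):
        self._buffer = np.ascontiguousarray(data)

    @property
    def __array_interface__(self):
        # Expose the single memory block; numpy stores *this* object as base.
        return self._buffer.__array_interface__


class ChunkedData:
    """Hypothetical stand-in for a non-continuous dataObject (one buffer per plane)."""

    def __init__(self, planes, rows, cols):
        self._planes = [np.zeros((rows, cols), dtype=np.uint8) for _ in range(planes)]

    def make_continuous(self):
        # Deep copy of all planes into one contiguous block.
        return ContinuousData(np.stack(self._planes))

    def __array__(self, dtype=None, copy=None):
        # Proposed behaviour: create a continuous *Python* object first and build
        # the array from it, so that this object is the array's base and lives
        # as long as the array does. `copy` is accepted but ignored here.
        arr = np.asarray(self.make_continuous())
        if dtype is not None:
            arr = arr.astype(dtype)
        return arr


d = ChunkedData(40, 100, 100)
x = np.asarray(d)
del d            # the original object can go away ...
x[0, 0, 0] = 7   # ... x stays valid because its base owns the memory
print(type(x.base).__name__, x.shape)
```

The price of this convenience is the silent deep copy inside `__array__`, which is the drawback discussed in the following paragraph.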

Drawback: the user is not directly informed that, in the worst case, a couple of deep copies are performed. A different way of programming might avoid some of them; however, if the user does not know that a continuous version is created in the background, they will not become aware that better solutions might be available. The worst case is as follows:


d = dataObject([40, 100, 100])
x = np.array(d)

# np.array(d) will internally create a continuous version of d (deep copy 1).
# Then, numpy reads this continuous dataObject via the __reduce__ method
# and creates a 2nd deep copy, since the optional copy argument is not set (default: True).
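The two copies of this worst case can be observed with plain numpy. In the sketch below, `ContinuousData` is a hypothetical stand-in for the continuous dataObject (not itom's API), and the list of planes stands in for the chunked data. Assuming the continuous object is exposed through `__array_interface__`, `np.array` (default `copy=True`) performs deep copy 2, whereas `np.asarray` is one example of a different way of programming that returns a view and avoids it.

```python
import numpy as np


class ContinuousData:
    """Hypothetical stand-in for a continuous Python dataObject."""

    def __init__(self, data):
        self.buffer = np.ascontiguousarray(data)

    @property
    def __array_interface__(self):
        # Describe the single memory block to numpy.
        return self.buffer.__array_interface__


planes = [np.zeros((100, 100), dtype=np.uint8) for _ in range(40)]  # chunked data

cont = ContinuousData(np.stack(planes))  # deep copy 1: chunks -> one block

x_copy = np.array(cont)    # copy=True by default: deep copy 2
x_view = np.asarray(cont)  # no further copy: a view on cont.buffer

print(np.shares_memory(x_copy, cont.buffer))  # prints False
print(np.shares_memory(x_view, cont.buffer))  # prints True
```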

Nevertheless, I suggest implementing the improved solution proposed above, in order to allow simple handling of dataObjects together with numpy arrays.
