Python: implicit conversion to a continuous dataObject when creating a np.array from it
In itom < 4.0, the following code snippet worked:
d = dataObject([40, 100, 100]) # d is not continuous; the data is divided into chunks in memory
x = np.array(d, copy=False)
However, a crash occurred if d was deleted and x was used afterwards. The reason was that the Python class itom.dataObject internally created a continuous version of the wrapped C++ dataObject and passed the necessary information to the __reduce__ method, which is called by numpy to execute the shallow copy. The numpy array holds a reference to the itom.dataObject it depends on, however to the original one and not to the continuous one.
To avoid this, a workaround was implemented that only allowed creating a numpy array from a dataObject if the dataObject is already continuous. Otherwise,
x = np.array(d.makeContinuous(), copy=False)
has to be called.
However, this requires more knowledge of the working principle and more careful programming. To avoid this, one of the __array_struct__, __array_interface__ or __array__ methods of itom.dataObject could create a continuous Python dataObject (not only the C++ object) if it is not continuous yet and pass this to Numpy, such that the Numpy array keeps this object as its base object.
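The proposed behaviour can be sketched with plain numpy and a hypothetical ChunkedObject class standing in for a non-continuous itom.dataObject (the class name, its chunk storage and the make_continuous helper are illustrative assumptions, not the actual itom implementation):

```python
import numpy as np

class ChunkedObject:
    """Toy stand-in for a non-continuous dataObject (hypothetical,
    not the real itom class): planes are stored as separate chunks."""
    def __init__(self, chunks):
        self._chunks = chunks  # list of 2D planes, mimicking chunked storage

    def make_continuous(self):
        # deep-copies the chunks into one contiguous buffer
        return np.stack(self._chunks)

    def __array__(self, dtype=None, copy=None):
        # hand numpy a contiguous array on demand; numpy keeps it alive
        # as long as the resulting array needs it
        arr = self.make_continuous()
        return arr if dtype is None else arr.astype(dtype)

chunks = [np.full((4, 4), i, dtype=np.uint8) for i in range(3)]
obj = ChunkedObject(chunks)
x = np.asarray(obj)   # triggers __array__, no second copy
print(x.shape)        # (3, 4, 4)
```

In this sketch the continuous buffer is built lazily inside __array__, so the caller never has to call a makeContinuous-style method explicitly.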
Drawback: the user is not directly informed that, in the worst case, a couple of deep copies have to be performed. A different way of programming might avoid these copies, but if the user does not know that a continuous version is created in the background, he or she will not be prompted to look for a better solution. The worst case is as follows:
d = dataObject([40, 100, 100])
x = np.array(d)
# Line 2 will internally create a continuous version of d (deep copy 1).
# Then, numpy reads this continuous dataObject via the __reduce__ method
# and creates a 2nd deep copy, since the optional copy argument is not set (default: True).
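The second copy follows from numpy's own default copy semantics, which can be observed with plain ndarrays as well: np.array copies by default, while np.asarray does not. A minimal illustration:

```python
import numpy as np

d = np.arange(12).reshape(3, 4)  # stand-in for an already-continuous object
a = np.array(d)                  # default copy=True -> deep copy
b = np.asarray(d)                # no copy when the input is already an ndarray

print(np.shares_memory(d, a))    # False: np.array copied the data
print(np.shares_memory(d, b))    # True: asarray returned the same buffer
```

So, even with an improved __array__ implementation, users who want to avoid the second copy would call np.asarray instead of np.array.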
Nevertheless, I would suggest implementing the improved solution proposed above, in order to allow simple handling of dataObjects together with numpy arrays.
Comments (4)

reporter It seems that with some Python / Numpy versions, the implicit conversion of a non-continuous dataObject to a np.array works, while with other versions it raises a RuntimeError:
the dataObject cannot be directly converted into a numpy array since it is not continuous...
import numpy as np
a = dataObject([4, 100, 100])
b = np.array(a) # runs with Python 3.7.0, Numpy 1.17.1; RuntimeError with Python 3.7.2, Numpy 1.18.1
Task: Check the array interface of Numpy again and verify if there were any changes between Numpy 1.17 and 1.18.

reporter OK, there was an implementation change between Numpy 1.17 and 1.18.
When an array_like object, such as a dataObject, is to be converted to a np.array, the following possibilities are tested in the given order:
1. The array_like object implements the Python buffer protocol (not the case for dataObject and not desired).
2. The array_like object implements __array_struct__ and / or __array_interface__ (both attributes). Numpy can then create a simple view.
3. The array_like object provides the __array__() method, which returns a np.array.
The dataObject provides both steps 2 and 3. However, the direct view of step 2 can only be created if the dataObject has a continuous memory layout. If not, NULL was returned there together with an exception (RuntimeError). Numpy <= 1.17 ignored the error and tried step 3. Numpy 1.18 or higher checks if NULL is returned and then proceeds to step 3; however, if step 2 also set an error, the conversion fails and the error is presented to the user. Therefore, the methods of step 2 should not set a RuntimeError if they fail due to a non-continuous dataObject.
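The required behaviour can be sketched in pure Python with a toy class (not the actual C implementation): if the step-2 attribute appears to be absent, rather than raising a RuntimeError, Numpy silently falls through to step 3. Raising AttributeError from a property makes the attribute look missing:

```python
import numpy as np

class FallbackObject:
    """Toy illustration of the fix: step 2 fails 'silently', so
    numpy falls back to step 3, the __array__ method."""
    @property
    def __array_interface__(self):
        # AttributeError makes the attribute look absent to numpy,
        # instead of aborting the conversion with an error.
        raise AttributeError("no continuous buffer available")

    def __array__(self, dtype=None, copy=None):
        # step 3: return a freshly built continuous array
        arr = np.arange(6).reshape(2, 3)
        return arr if dtype is None else arr.astype(dtype)

x = np.array(FallbackObject())
print(x.shape)   # (2, 3)
```

Had the property raised RuntimeError instead, a Numpy >= 1.18 conversion would fail with exactly the error described above.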

reporter  changed status to resolved
Fixes issue #154. The implicit conversion of a non-continuous dataObject to a numpy array works again, without the need to convert the dataObject to a continuous one beforehand. This conversion is again done in the background. The fix was necessary due to a minor implementation change from Numpy 1.18 on. For more information see the closed issue. → <<cset b7ad4cc6befa>>