Python: implicit conversion to a continuous dataObject when creating a np.array from it

Issue #154 resolved
M. Gronle created an issue

In itom < 4.0, the following code snippet worked:

d = dataObject([40, 100, 100])  # d is not continuous: its data is split into chunks in memory
x = np.array(d, copy=False)

However, a crash occurred if d was deleted and x was used afterwards. The reason was that the Python class itom.dataObject internally created a continuous version of the wrapped C++ dataObject and passed the necessary information to the __reduce__ method, which numpy calls to execute the shallow copy. The numpy array holds a reference to the itom.dataObject it depends on, but to the original one and not to the continuous one.

To avoid this, a workaround was implemented that only allows creating a numpy array from a dataObject if the dataObject is already continuous. Otherwise,

x = np.array(d.makeContinuous(), copy=False)

has to be called.

However, this requires more knowledge of the working principle and more careful programming. To avoid this, one of the __array_struct__, __array_interface__ or __array__ methods of itom.dataObject could create a continuous Python dataObject (not only the C++ object) if the object is not continuous yet, and pass it to Numpy, such that the Numpy array keeps this object as its base object.

Drawback: The user is not directly informed that, in the worst case, a couple of deep copies have to be performed. A different way of programming might avoid this, but if the user does not know that a continuous version is created in the background, they will not become aware that better solutions might be available. The worst case is as follows:

d = dataObject([40, 100, 100])
x = np.array(d)

# Line 2 will internally create a continuous version of d (deep copy 1).
# Then, numpy reads this continuous dataObject via the __reduce__ method
# and creates a 2nd deep copy, since the optional copy argument is not set (default: True).
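
The second copy in the worst case above comes from numpy's default copy behaviour and does not depend on itom. A minimal sketch with a plain numpy array as a stand-in for the already continuous data (the variable names are only illustrative):

```python
import numpy as np

# Stand-in for an already continuous block of data.
src = np.zeros((40, 100, 100))

# np.array copies by default, so this allocates a new buffer.
copied = np.array(src)

# np.asarray avoids the copy whenever the input can be used directly.
viewed = np.asarray(src)

print(np.shares_memory(src, copied))
print(viewed is src)
```

Using np.asarray (or an explicit copy argument) on the calling side would therefore avoid the second deep copy, while the first one remains necessary for a non-continuous object.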

Nevertheless, I would suggest implementing the improved solution proposed above, in order to allow simple handling of dataObjects together with numpy arrays.

Comments (4)

  1. M. Gronle reporter

    It seems that with some Python / Numpy versions, the implicit conversion of a non-continuous dataObject to a np.array works, while with other versions it raises a RuntimeError: the dataObject cannot be directly converted into a numpy array since it is not continuous....

    import numpy as np
    a = dataObject([4, 100, 100])
    b = np.array(a)  # runs with Python 3.7.00, Numpy 1.17.1; RuntimeError with Python 3.7.2, Numpy 1.18.1
    

    Task: Check the array interface of Numpy again and verify if there were any changes between Numpy 1.17 and 1.18.

  2. M. Gronle reporter

    OK, there was an implementation change between Numpy 1.17 and 1.18.

    When an array_like object, like a dataObject, should be converted to a np.array, the following possibilities are tested in the given order:

    1. array_like object implements the Python buffer protocol (not the case for dataObject and not desired)
    2. array_like object implements __array_struct__ and / or __array_interface__ (both attributes). Then a simple view can be done by Numpy
    3. array_like object provides the __array__() method and returns a np.array

    The dataObject provides both steps 2 and 3. However, the direct view of step 2 can only be created if the dataObject is continuous in memory.

    If not, NULL was returned there together with an exception (RuntimeError). Numpy <= 1.17 ignored the error and tried step 3. Numpy 1.18 or higher checks
    whether NULL is returned and then goes to step 3; however, if step 2 also sets an error, the conversion fails and the error is presented to the user.

    Therefore, the methods of step 2 should not set a RuntimeError if they fail due to a non-continuous dataObject.
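
The fallback from step 2 to step 3 can be sketched in pure Python. This is only an illustration under assumptions: ChunkedData is a hypothetical stand-in for a dataObject, not the itom implementation, and raising AttributeError is the pure-Python way to make numpy treat __array_interface__ as not provided (the actual fix lives in the C++ wrapper and is not shown here).

```python
import numpy as np


class ChunkedData:
    """Hypothetical stand-in for a dataObject (not the itom class)."""

    def __init__(self, planes, rows, cols, continuous):
        self.continuous = continuous
        if continuous:
            # One single block of memory.
            self._block = np.zeros((planes, rows, cols), dtype=np.float32)
            self._chunks = None
        else:
            # One separate buffer per plane, i.e. not continuous.
            self._block = None
            self._chunks = [np.zeros((rows, cols), dtype=np.float32)
                            for _ in range(planes)]

    @property
    def __array_interface__(self):
        # Step 2: only offer a direct view if the memory is continuous.
        if not self.continuous:
            # AttributeError means "attribute not available", so numpy
            # moves on to __array__ instead of reporting an error.
            raise AttributeError("no direct view: data is not continuous")
        return self._block.__array_interface__

    def __array__(self, dtype=None, copy=None):
        # Step 3: build a continuous array (this is a deep copy).
        if self.continuous:
            out = self._block.copy()
        else:
            out = np.stack(self._chunks)
        return out if dtype is None else out.astype(dtype)


d = ChunkedData(4, 100, 100, continuous=False)
b = np.array(d)      # step 2 is skipped, step 3 delivers the data
print(b.shape)

c = ChunkedData(4, 10, 10, continuous=True)
v = np.asarray(c)    # step 2 succeeds and yields a view
print(np.shares_memory(v, c._block))
```

In this sketch, the non-continuous object never raises a RuntimeError in step 2, so the conversion falls through to step 3 regardless of the Numpy version.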

  3. M. Gronle reporter

    Fixes issue #154. The implicit conversion of a non-continuous dataObject to a numpy array is working again, without the need to convert the dataObject to a continuous one before. This conversion is done again in the background. This fix is necessary due to a minor implementation change from Numpy 1.18 on. For more information see the closed issue.

    → <<cset b7ad4cc6befa>>
