scalar types should not be iterable.

Issue #10 resolved
Yichao Yu created an issue

The following code should fail, but on PyPy it actually prints six empty lists.

import numpy as np
print(list(np.int8(17)))
print(list(np.int16(17)))
print(list(np.int32(17)))
print(list(np.int64(17)))
print(list(np.float32(17)))
print(list(np.float64(17)))

Comments (11)

  1. Yichao Yu reporter

    Actually, after looking at the np.generic class, I don't really understand why it is not iterable in CPython, since it has a __getitem__ method. Is there some trick in CPython (at the C level?) to make an object with __getitem__ not iterable?

  2. lilydjwg

    @yuyichao Quoting from the docs: "object must be a collection object which supports the iteration protocol (the __iter__() method), or it must support the sequence protocol (the __getitem__() method with integer arguments starting at 0)." Since these types raise IndexError at index 0, they are not iterable.

  3. Yichao Yu reporter

    @lilydjwg Raising IndexError at index 0 does not make them non-iterable. The following code runs fine on every Python version I can find (CPython/PyPy, 2/3):

    class A:
        def __getitem__(self, key):
            raise IndexError
    
    list(A())
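
To make the point above concrete, here is a minimal sketch of what iteration does when a class defines only __getitem__. The class name `Squares` and its `requested` attribute are hypothetical, introduced only for illustration. Iteration asks for index 0, 1, 2, ... and stops at the first IndexError, so raising IndexError at index 0 simply yields an empty sequence.

```python
class Squares:
    """Hypothetical class with __getitem__ only, and no __iter__."""

    def __init__(self, n):
        self.n = n
        self.requested = []  # every index that iteration asks for

    def __getitem__(self, index):
        self.requested.append(index)
        if index >= self.n:
            raise IndexError(index)  # signals the end of the sequence
        return index * index


three = Squares(3)
items = list(three)      # asks for indices 0, 1, 2, then 3 (IndexError)

empty = Squares(0)
no_items = list(empty)   # IndexError at index 0: an empty list, not an error
```

Here `items` is `[0, 1, 4]` and `no_items` is `[]`, which matches the behaviour of the class A example.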
    
  4. Yichao Yu reporter

    P.S. The str type in Python 2 does not have an __iter__ method, but a zero-length str is still iterable.

  5. lilydjwg

    Oops, I was wrong.

    In PyObject_GetIter, if __iter__ is not defined, it calls PySequence_Check, which then checks the .tp_as_sequence field of the type. This is NULL for numpy.generic (it has the .tp_as_mapping field to provide __getitem__).

  6. Yichao Yu reporter

    So it is indeed a feature of the CPython C API. The best workaround I can think of so far is to add the following method to numpy.generic:

        @classmethod
        def __iter__(cls):
            raise TypeError("'%s.%s' object is not iterable" %
                            (cls.__module__, cls.__name__))
    

    However, this will make isinstance(numpy.int32(1), collections.Iterable) return True...
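
The trade-off can be sketched without numpy. The class `ScalarLike` below is a hypothetical stand-in for numpy.generic, not the real type, and it uses a plain method rather than a classmethod for brevity. The sketch assumes Python 3, where the ABC lives in collections.abc. Defining __iter__ is enough for the Iterable check to pass, even though calling it always fails.

```python
from collections.abc import Iterable


class ScalarLike:
    """Hypothetical stand-in for a scalar type that has __getitem__."""

    def __getitem__(self, key):
        raise IndexError("invalid index to scalar variable.")

    def __iter__(self):
        # The workaround: refuse iteration explicitly.
        cls = type(self)
        raise TypeError("'%s.%s' object is not iterable" %
                        (cls.__module__, cls.__name__))


value = ScalarLike()

try:
    list(value)
    message = None
except TypeError as exc:
    message = str(exc)

# The ABC only checks that __iter__ is defined, not that it works.
looks_iterable = isinstance(value, Iterable)
```

After this runs, `message` holds the "not iterable" text and `looks_iterable` is True, which is the side effect described above.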

  7. lilydjwg

    In Python code, creating a class that defines __getitem__ goes through type_call in Objects/typeobject.c. The address of the as_sequence member of a PyHeapTypeObject is assigned to the class's tp_as_sequence field. The PySequenceMethods struct it points to is initially all zeros, so tp_as_sequence->sq_item is NULL. Then update_one_slot (called from fixup_slot_dispatchers, which is called from type_new, the type's tp_new, which is in turn called from type_call) checks whether __getitem__ is defined. If it is, it assigns the slot_sq_item function to tp_as_sequence->sq_item, which makes PySequence_Check return true.
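
The Python-visible effect of this slot handling can be sketched as follows. The names `Plain` and `getitem` are hypothetical. The sketch only observes behaviour from Python and assumes, without showing it, that assigning __getitem__ to an existing class updates the slot in a similar way to class creation.

```python
class Plain:
    """Hypothetical class with neither __iter__ nor __getitem__."""


obj = Plain()

try:
    iter(obj)
    iterable_before = True
except TypeError:
    # No __iter__ and no sequence slot: iter() refuses the object.
    iterable_before = False


def getitem(self, index):
    if index >= 3:
        raise IndexError(index)
    return index * 10


# Defining __getitem__ on the class makes existing instances iterable.
Plain.__getitem__ = getitem
items_after = list(obj)
```

Before the assignment `iter(obj)` raises TypeError; afterwards `items_after` is `[0, 10, 20]`. A type written in C that fills only the mapping slot, as described for numpy.generic, never gets this sequence slot, so iter() keeps rejecting it.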

  8. Armin Rigo

    There is no way in Python, according to the language spec, to have an object with __getitem__ which is not iterable. I suppose that numpy implements that by obscure hacking at the C level. Yichao's is the only workaround. I don't think that collections.Iterable is a big blocker...

  9. Yichao Yu reporter

    LOL.

    I guess being collections.Iterable is indeed fine since it is already confusing enough and an object with __getitem__ is technically "iterable" anyway....
