Grid/Array naming

Issue #5 resolved
Dan Bonachea created an issue

Ch 8: "For a non-reference type T, the type “N-dimensional grid with element type T” is denoted: ndarray<T, N> where N is a positive compile-time int constant."

Currently the normative text of the spec uses the term "grid" almost everywhere, the chapter 8 headings use the term "Array", but the type name is defined to be "ndarray<>".

Is there a strong motivation for this inconsistency in terminology? Why isn't the type name "grid<>"? Why do Chapter 3 and Chapter 8 titles both use the term "Array" and yet discuss two completely different datatypes?

As a first-time "user" of UPC++ this naming inconsistency for core features seems like an invitation to future confusion and ambiguity. I would strongly recommend we choose one term for each datatype and apply it universally (in text, chapter titles and type names).

Comments (7)

  1. Dan Bonachea reporter

    On a related note, the name "shared_array" (for the datatype described in Ch 3) seems like a bad choice, since it seems to imply that other array data structures cannot be shared (and it invites confusion/ambiguity when discussing other arrays that live in shared space).

    Perhaps a better term would be "distributed_array", since parallel distribution seems to be their defining characteristic?

  2. Yili Zheng

    I agree with Dan that "distributed_array" is more appropriate than "shared_array". Originally I used shared_array to mimic UPC shared arrays, but I have no problem changing the name, especially before the first version of the spec.

    Additionally, inspired by Bill Carlson, I think it would be very nice to use a single "shared<T>" template for everything that lives in the global address space. But I'm not sure whether this is implementable or how complicated the semantics would be. We would need a bit more design and a proof-of-concept prototype before committing to this direction.

  3. Cy Chan

    The distinction between shared_arrays and the ndarrays described here needs to be clearer. The ndarrays are never distributed, while the shared_arrays are almost always distributed, correct? Having a single shared data type that can handle both cases would be great, but I agree with Yili that the semantics might become complicated. Perhaps a section near the beginning highlighting the differences between the various provided array/domain/grid types and how they are related would be helpful.

  4. Amir Kamil

    The term "grid" is carried over in the spec from Titanium. The typename "ndarray" is inspired by NumPy. I agree that this is inconsistent. Do you prefer one over the other?

    As for "shared_array", Yili at one point suggested combining shared arrays and shared variables in a single type, which hopefully would be simpler and more doable than a generic "shared<T>" template. If we can unify them, then maybe we can avoid the "shared_array" terminology.

    I think Dan makes a good point about distinguishing between an entity that lives in the global address space and an entity that has a specific distribution across UPC++ ranks. If I remember correctly, that's why we chose the term "global pointer" rather than "shared pointer" in UPC++ for the former. So I agree that we should make it clearer that "grids" (or whatever we decide to call them) are in the global address space but are not distributed across ranks.
