Library: Shared-pointer privatizability (castability) functions

Issue #37 new
Former user created an issue

Originally reported on Google Code with ID 37

This issue is responsive to open language issues item 62.

62. Shared-pointer privatizability (castability) functions.

The proposal and related discussion are included here by reference.
https://sites.google.com/a/lbl.gov/upc-proposals/privatizability-castability-functions

The gist of the proposal is summarized here.

Privatizability (castability) functions

A UPC implementation may map the shared space of other threads into the local address
space of the current thread.  To provide the programmer the ability to take advantage
of this local mapping in an implementation-independent manner, I propose the additon
of the functions described below.  I will use the term "castable" in reference to a
shared address to indicate that it can be converted to a usable local address.

void *upc_cast(shared void *ptr);

    Converts a pointer-to-shared to a pointer-to-local if possible.  If not possible,
a null pointer is returned.  (Alternative: undefined behavior if conversion is not
possible.)  Pointers to data with affinity to the current thread always succeed.  Null
pointers-to-shared yield null pointers-to-local.

int upc_castable(shared void *ptr);

    Returns TRUE if the pointer can be converted to a pointer-to-local, FALSE otherwise.
 Null pointers and pointers to data with affinity to the current thread return TRUE.

int upc_thread_castable(unsigned int threadnum);

    Returns TRUE if all of the specified thread's shared data may be referenced via
local addresses, FALSE otherwise.  In the event that some but not all of the thread's
data may be so referenced, returns FALSE.  If threadnum == MYTHREAD, returns TRUE.
 This function allows for one castability check prior to a large number of references.

At present the main point of discussion concerns the stated need by a compiler vendor
for the addition of a size argument that would delimit the range/extent of memory that
may be accessed after a successful call to upc_cast.

Rather than repeat the discussion thread here, participants are encouraged to discuss
the topic on the proposal page and to then post all relevant conclusions and recommendations
as comments attached to this issue.

Reported by gary.funck on 2012-05-22 05:56:40

Comments (46)

  1. Former user Account Deleted

    Reported by `phhargrove@lbl.gov` on 2012-06-01 04:04:21 - Labels added: Spec-1.3

  2. Former user Account Deleted

    Reported by `phhargrove@lbl.gov` on 2012-06-01 06:06:25 - Labels added: Milestone-Spec-1.3 - Labels removed: Spec-1.3

  3. Former user Account Deleted

    ``` (Updated 'Type' to "Enhancement") ```

    Reported by `nspark.work` on 2012-06-18 21:19:33 - Labels added: Type-Enhancement - Labels removed: Type-Defect

  4. Former user Account Deleted

    ``` Previously, I stated: "Rather than repeat the discussion thread here, participants are encouraged to discuss the topic on the proposal page and to then post all relevant conclusions and recommendations as comments attached to this issue."

    Based on our recent decision to move all spec. related discussions to those associated with particular issue tickets, I would like to retract my recommendation above and request that all discussion regarding the castability functions be logged here. ```

    Reported by `gary.funck` on 2012-06-28 22:33:51

  5. Former user Account Deleted

    ``` This enhancement is currently tagged for inclusion into the 1.3 specification.

    Per earlier discussions regarding the process for the introduction of proposed enhancements into the specification, the owner of the issue is responsible for leading discussions, documenting conclusions. If it concluded that the new functions will be included in the specification, the owner is also responsible for ensuring that the new functions are described in LaTeX in form suitable for inclusion into the UPC specification, or related library specifications.

    Since Brian initially propsed the castability functions, I am re-assigning ownerwhip to Brian.

    NOTE: please add the proposed library description in a unique directory under "lib/proposed" as time permits.

    ```

    Reported by `gary.funck` on 2012-06-28 22:44:19

  6. Former user Account Deleted

    Reported by `gary.funck` on 2012-07-03 18:07:50 - Labels added: Type-Lib-Required - Labels removed: Type-Enhancement

  7. Former user Account Deleted

    ``` A draft version of a specification for these functions has been added.

    Based on comments in the proposal discussion, I added this language for upc_cast:

    "The conversion will be considered possible only if the returned pointer-to-local may be used to read and write the entire contiguous portion of a shared object (which may be a portion of a larger shared object) beginning at the specified shared address. If only part of that region may be referenced, an undefined value is returned."

    I also clarified that the returned pointer is only valid in the calling thread.

    Based on discussion on the proposal page, I added this text for upc_castable:

    "The upc_castable function places no constraints on the order or number of subsequent calls to upc_cast, nor on their arguments. Calls to upc_cast are not required to be preceded by calls to upc_castable."

    I went with "castability" as the name for the section, given the names of the functions in the proposal. ```

    Reported by `brian.wibecan` on 2012-07-16 13:35:22

  8. Former user Account Deleted

    ``` I'd like to revisit the design choice to provide a separate upc_castable query function, rather than simply augment upc_cast to return NULL when the requested cast is not possible. Brian's original proposal argues this might be leveraged to provide a performance boost, but I can't think of a motivating example.

    It seems like the entire purpose of this library is to amortize pointer conversion overhead prior to a block of computation using local pointers, so both the upc_cast and upc_castable should appear side-by-side at the top of said block. Because we require upc_cast to succeed for an entire object at a time, only one call to each is required per object - so I don't see the savings of separating the calls.

    Additionally, separating the calls creates a need for additional semantics not reflected in the current document. Specifically, we'd need to guarantee the result of upc_castable is constant for the lifetime a given shared object, and doesn't change over time (eg in the time between the call to upc_castable and upc_cast).

    Proposed change: 1. Remove upc_castable 2. Specify that upc_cast returns NULL if the conversion cannot be performed (instead of undefined as currently written).

    Totally unrelated, I'd like to slightly strengthen the semantics of upc_cast. Current language (emphasis added):

    "The conversion will be considered possible only if the returned pointer-to-local may be used to read and write the entire contiguous portion of a shared object (which may be a portion of a larger shared object) BEGINNING AT the specified shared address. If only part of that region may be referenced, an undefined value is returned."

    I believe "beginning at" should be changed to "containing". Specifically, if the user casts a pointer to a shared array element which falls in the middle of a thread's "slice", then it should be legal for accesses via pointer-arithmetic to reference the elements which appear after OR before the selected sub-object. I think this was the intended semantics? We've already argued that successfully casting any pointer to a sub-object automatically nails down the possible local addresses for the entire, maximal containing shared object, so the guarantee should apply in both directions.

    ```

    Reported by `danbonachea` on 2012-08-03 14:10:45

  9. Former user Account Deleted

    ``` One more miscellaneous nitpick: the current proposal needs to define a library feature macro, eg: UPC_CASTABLE = 1 ```

    Reported by `danbonachea` on 2012-08-03 14:11:48

  10. Former user Account Deleted

    ``` I think Dan's suggested changes sound reasonable. ```

    Reported by `brian.wibecan` on 2012-08-03 14:50:18

  11. Former user Account Deleted

    ``` Dan asks why we need/want a distinct upc_castable() query.

    In the Berkeley UPC implementation of the currently proposed interfaces, the upc_cast() call performs arithmetic "blindly" without bounds checking, and therefore returns a "wrong" pointer rather than NULL in the case that the upc_castable() query would have returned NULL. This removes (at least) 1 branch from the implementation. This tiny performance "boost" is the ONLY benefit I can see from specifying a distinct upc_castable() call (and an undefined result from upc_cast() in place of a NULL return for the not-castable case).

    On the balance, I am "neutral" on Dan's proposed change. I assume the upc_thread_castable() interface will remain, right? ```

    Reported by `phhargrove@lbl.gov` on 2012-08-03 18:13:36

  12. Former user Account Deleted

    ``` "I assume the upc_thread_castable() interface will remain, right?"

    Yes - upc_thread_castable() provides a more general piece of information which I can see as being potentially valuable to some applications, and potentially very beneficial to performance on hybrid architectures.

    I acknowledge that the existence of upc_castable() may allow the removal of a few instructions from the upc_cast() operation, but the branch you mention has just been relocated into the upc_castable() operation (or possibly upc_thread_castable). So if we expect these to usually be called together, and in any case called infrequently, then the small savings doesn't seem worth the additional complexity. The user can presumably get a larger savings by caching the result of upc_cast() in a variable and hoisting the call to an outer preamble, to ensure the function is only ever called once per object/thread combo - thereby removing the cost of the entire call from the critical path (which is probably also a Good Idea as far as the serial optimizer is concerned).

    ```

    Reported by `danbonachea` on 2012-08-03 20:10:44

  13. Former user Account Deleted

    ``` I would like to get some more comments on this.

    Current text refers to upc_castable.h. In the interest of possibly making the suggested change now or in the future, I am suggesting changing the header name to upc_cast.h.

    If we go forward with Dan's suggested change:

    1. The first paragraph for upc_cast will change from: The upc_cast function converts the specified pointer-to-shared to a valid pointer-to-local. If such a conversion is not possible, an undefined value is returned. to: The upc_cast function converts the specified pointer-to-shared to a valid pointer-to-local. If such a conversion is not possible, NULL is returned. 2. The second paragraph will be revised as suggested by Dan in comment 9. 3. upc_castable will be removed.

    The performance savings in HP UPC for not performing the check in upc_cast are on the same basis as for Berkeley UPC, thus negligible, and if Dan's expectations regarding the pattern of use of these functions is correct, irrelevant. I like the simpler interface presented by Dan's suggestion. ```

    Reported by `brian.wibecan` on 2012-08-16 16:16:53

  14. Former user Account Deleted

    ``` If upc_cast() is now guaranteed to return NULL if the conversion is not possible, what is the point of upc_thread_castable()? Are NULL pointer checks really expensive enough to warrant this separate routine, given that its presence will encourage people to skip trying upc_cast() if it returns false? There are potentially implementations out there that can't provide local pointers to ALL of another thread's shared memory and thus have to return false to upc_thread_castable(), but do cover most common shared memory regions. ```

    Reported by `sdvormwa@cray.com` on 2012-08-16 16:30:41

  15. Former user Account Deleted

    ``` Steven wrote (in part):

    Are NULL pointer checks really expensive enough to warrant this separate routine, given that its presence will encourage people to skip trying upc_cast() if it returns false?

    I can't offer specific examples off the top of my head, but I can imagine that one might use a different algorithm entirely when you could be certain ALL of a given thread's memory was accessible -- perhaps some algorithm in which getting part way through a linked data-structure (known to have affinity to a specific thread) and finding a non-castable pointer would mean discarding a lot of work and starting over. Again, I don't have ACTUAL examples of such an algorithm, but this is how I imaging upc_thread_castable() might be used. ```

    Reported by `phhargrove@lbl.gov` on 2012-08-16 16:38:24

  16. Former user Account Deleted

    ``` The intent of upc_thread_castable is that an algorithm choice might be different if all of another thread's shared data are accessible locally. This decision could be made with only upc_cast if all relevant data structures are checked, but that might be overly complex. Consider, for example, a library routine with several possible implementations, accessed via pointer-to-function that is instantiated after the first call to choose the best algorithm for the current program run. upc_thread_castable might allow a different choice of routine to be made in this first call.

    "There are potentially implementations out there that can't provide local pointers to ALL of another thread's shared memory and thus have to return false to upc_thread_castable(), but do cover most common shared memory regions."

    Certainly. There are also implementations that can return true for upc_thread_castable, thus providing information to the program that might be useful.

    Perhaps there are additional simple cases that might be useful? Or perhaps there is a better way of summarizing for the programmer the breadth of what is guaranteed castable in the current thread? ```

    Reported by `brian.wibecan` on 2012-08-16 16:49:20

  17. Former user Account Deleted

    ``` "Certainly. There are also implementations that can return true for upc_thread_castable, thus providing information to the program that might be useful."

    Useful certainly--but also potentially misleading for the reasons I indicated in comment 15.

    I'm also slightly concerned that programmers will incorrectly assume that if upc_thread_castable() return false, upc_cast() will ALWAYS fail for memory with affinity to a given thread. The name upc_thread_castable(), to me at least, seems to imply that it is used to query whether or not upc_cast() is permitted/possible to use upc_cast() on memory with affinity to a given thread, which is quite different than the proposed usage. This incorrect assumption would be encouraged by seeing code that avoids using upc_cast() if upc_thread_castable() returns false.

    "Perhaps there are additional simple cases that might be useful? Or perhaps there is a better way of summarizing for the programmer the breadth of what is guaranteed castable in the current thread?"

    That would be nice. I can easily see implementations for which statically allocated shared memory and memory allocated with upc_all_alloc()/upc_global_alloc() is guaranteed to return non-NULL, while memory allocated with upc_alloc() (or the deprecated upc_local_alloc()) might return NULL in some circumstances. I'm not sure how we could neatly summarize this information though. ```

    Reported by `sdvormwa@cray.com` on 2012-08-16 17:18:33

  18. Former user Account Deleted

    ``` "...that it is used to query whether or not upc_cast() is permitted/possible to use upc_cast() on memory with affinity to a given thread,..."

    Oops, this should read, "...that it is used to query whether or not it is permitted/possible to use upc_cast() on memory with affinity to a given thread,...". ```

    Reported by `sdvormwa@cray.com` on 2012-08-16 17:23:39

  19. Former user Account Deleted

    ``` "I'm also slightly concerned that programmers will incorrectly assume that if upc_thread_castable() return false, upc_cast() will ALWAYS fail for memory with affinity to a given thread. The name upc_thread_castable(), to me at least, seems to imply that it is used to query whether or not upc_cast() is permitted/possible to use upc_cast() on memory with affinity to a given thread, which is quite different than the proposed usage. This incorrect assumption would be encouraged by seeing code that avoids using upc_cast() if upc_thread_castable() returns false."

    I dislike arguments of the form "this feature is a bad idea, because people who don't read the specification may misuse the feature". This vacuous argument can be made against any language feature. Programmers who don't read and understand the semantics for a function they blindly use get what they deserve. This should not be a motivation for omitting a feature.

    That being said, we could possibly choose a better function name which is more suggestive of its semantics and less conducive to incorrect interpretation by "lazy programmers". Would you be happier if the function was instead named:

    int upc_entiresharedmemory_castable(threadid)

    or perhaps:

    int upc_joint_affinity(threadid)

    or even:

    int upc_RTFM_castable(threadid) <joke>

    ```

    Reported by `danbonachea` on 2012-08-16 22:42:45

  20. Former user Account Deleted

    ``` "I dislike arguments of the form "this feature is a bad idea, because people who don't read the specification may misuse the feature". This vacuous argument can be made against any language feature. Programmers who don't read and understand the semantics for a function they blindly use get what they deserve. This should not be a motivation for omitting a feature."

    Please re-read my comment. I'm not saying this feature is a bad idea. On the contrary, I'm very much in favor of it. I just wanted to point out that to me the name seemed to intuitively imply different behavior than what is proposed, and we might therefore consider changing it to avoid confusing users if others also find the behavior unintuitive given the name. ```

    Reported by `sdvormwa@cray.com` on 2012-08-17 03:00:38

  21. Former user Account Deleted

    ``` "I'm very much in favor of it. I just wanted to point out that to me the name seemed to intuitively imply different behavior than what is proposed, and we might therefore consider changing it"

    It sounded like you were arguing to omit the upc_thread_castable function entirely on the basis that its existence might mislead users who don't read the manual, but perhaps I misread.

    In that case, by all means please suggest a new name for upc_thread_castable that you prefer..

    Here's another possible name I like:

    int upc_thread_fully_castable(threadid)

    ```

    Reported by `danbonachea` on 2012-08-17 03:44:55

  22. Former user Account Deleted

    ``` To clarify my position on this:

    1. upc_thread_castable()'s proposed behavior is unintuitive to me, so I'm worried it will be confusing to others. Dan's idea of upc_thread_fully_castable() seems much better in that regard. This is a very minor point in my mind though, so I won't contest it if other people don't think it will be a problem.

    2. Finer granularity for checking the castability of a remote thread's shared data is desirable to avoid programmers skipping trying upc_cast() merely because upc_thread_castable() (or whatever we end up calling it) returned false. I agree this call gives useful information, but I can easily see implementations that can provide local pointers in almost all cases, but since it is not ALL cases, must return false, and then users end up missing out on performance. This finer granularity would be in addition to the the full-out "all memory can be cast to local" that upc_thread_castable() currently details. Perhaps this could be provided in a future update after the base stuff goes in? ```

    Reported by `sdvormwa@cray.com` on 2012-08-17 16:20:48

  23. Former user Account Deleted

    ``` All "brand new" library proposals are targeted for starting in the "Optional" library document. Promotion to the "Required" document comes later after at least 6 months residence in the ratified Optional document, and other conditions described in the Appendix A spec process. ```

    Reported by `danbonachea` on 2012-08-17 17:53:59 - Labels added: Type-Lib-Opt - Labels removed: Type-Lib-Required

  24. Former user Account Deleted

    ``` Set default Consensus to "Low". ```

    Reported by `gary.funck` on 2012-08-19 23:26:19 - Labels added: Consensus-Low

  25. Former user Account Deleted

    Reported by `gary.funck` on 2012-08-19 23:46:24 - Labels added: Consensus-Medium - Labels removed: Consensus-Low

  26. Former user Account Deleted

    ``` A suggestion: create

    int upc_castable(threadid)

    and have it return a coded value: NONE = 0 = None of that thread's shared memory can be cast to local; ALL = All of that thread's shared memory can be cast to local; SOME = Some of that thread's shared memory can be cast to local, but not necessarily a well-defined segment, check each case; Coded bits for well-defined segments, any or all of which can be set (OR them together): ALL_ALLOC, GLOBAL_ALLOC, ALLOC = Memory from those heaps can be cast to local; STATIC = Non-heap shared memory can be cast to local.

    Coded values would be macros defined with prefix UPC_CASTABLE_. The well-defined segments would have prefix UPC_CASTABLE_SEGMENT_.

    If the segments are complete, ALL could be set to the value ALL_ALLOC | GLOBAL_ALLOC | ALLOC | STATIC

    ```

    Reported by `brian.wibecan` on 2012-08-20 19:02:41

  27. Former user Account Deleted

    ``` I like Brian's proposal in comment 27. It seems to capture most of the cases we want to expose, although if I understand correctly there is no "SOME" value defined, just an OR-ing together of one or more of: UPC_CASTABLE_ + ALL_ALLOC, GLOBAL_ALLOC, ALLOC, STATIC

    The only case this doesn't seem to describe is Gary's, where castability is selectively available on an object-by-object basis, so some objects in each category MAY be castable, but no category guarantees that all objects are castable. It also doesn't capture 32-bit systems where upc_cast may work for awhile and eventually fail once the caller's virtual address space is near exhaustion. Would be nice to somehow distinguish these cases from the "it's a remote node with guaranteed failure" case.

    One possible solution would be to state that upc_castable is a "tightest-possible-upper-bound" on what can be castable from a given thread, but not *guarantee* success of castability for any given object. This gives applications information about what objects are definitely NOT castable from the given thread (eg because it is physically remote), and what objects are PROBABLY castable from that thread. upc_cast would still be permitted to fail on an object-by-object basis (eg due to VM exhaustion), so robust applications should still include a check for the NULL return, but algorithmic design could still rely on the upc_thread_castable result and simply abort in the rare(?) case that a cast fails.

    ```

    Reported by `danbonachea` on 2012-08-20 23:02:42

  28. Former user Account Deleted

    ``` Dan wrote (in part):

    I like Brian's proposal in comment 27. It seems to capture most of the cases we want to expose, although if I understand correctly there is no "SOME" value defined, just an OR-ing together of one or more of: UPC_CASTABLE_ + ALL_ALLOC, GLOBAL_ALLOC, ALLOC, STATIC

    The only case this doesn't seem to describe is Gary's, where castability is selectively available on an object-by-object basis, [...]

    I actually read the proposal (which I do like) slightly differently than Dan. I read it as saying there *is* a "SOME" for cases exactly like Gary's in which there is no broad categorization possible which would be sufficiently accurate to be helpful. That was my interpretation based on the phrase "but not necessarily a well-defined segment".

    Whether Dan or I interpreted comment #27 as intended, I'd like to suggest/request that this more-that-none-but-not-easily-characterized case be distinguished, and be named "CHECK" (rather than "SOME"). ```

    Reported by `phhargrove@lbl.gov` on 2012-08-20 23:14:20

  29. Former user Account Deleted

    ``` Paul is correct that I intended there to be a specific SOME value, for cases where an object-by-object check was the only reliable way to make this determination.

    Regarding the name, CHECK is fine with me; it's what I had originally, before changing it to SOME.

    I intended this suggestion to *replace* upc_thread_castable. Since there was no emerging consensus on what upc_thread_castable should look like, given the variety of possible approaches to memory sharing, I concocted this function to provide more details. Roughly, replace:

    if (upc_thread_castable(n))

    with

    if (upc_castable(n) == UPC_CASTABLE_ALL)

    I'm not sure I see the value in indicating what is *probably* castable, but if that's useful, then perhaps the CHECK concept needs to be added as a bit flag:

    ALL = All of this thread's shared memory, guaranteed; ALL+CHECK = All of this thread's shared memory, probably, but check individual cases; CHECK = No indication of "probably", don't bank on it, check individual cases.

    And so on with the individual segment flags. Note that the replacement code example above still works as intended.

    With this modification, I see no need for the SEGMENT portion of the prefix, since everything works well as a set of flags. I do think it appropriate to reserve the value 0 for the NONE case. We thus have the following constants defined:

    UPC_CASTABLE_NONE = 1 UPC_CASTABLE_CHECK UPC_CASTABLE_ALL_ALLOC UPC_CASTABLE_GLOBAL_ALLOC UPC_CASTABLE_ALLOC UPC_CASTABLE_STATIC UPC_CASTABLE_ALL = (UPC_CASTABLE_ALL_ALLOC | UPC_CASTABLE_GLOBAL_ALLOC

    UPC_CASTABLE_ALLOCUPC_CASTABLE_STATIC)

    I think it reasonable to require UPC_CASTABLE_ALL to be defined this way. That way code can check, for instance, if static variables are shared without having to check both for ALL and for STATIC.

    All of these values can be ORed together as appropriate.

    Is STATIC an appropriate name? ```

    Reported by `brian.wibecan` on 2012-08-21 13:54:58

  30. Former user Account Deleted

    ``` I like the gist of Brian's emerging proposal, but I need a more precise semantic definition of what the flags indicate and guarantee (especially in combination), because I think there's some confusion even amongst the three of us. I think it's clear that (return

    UPC_CASTABLE_NONE == 0) means nothing is ever castable from thread n, and return

    UPC_CASTABLE_ALL indicates every shared object with affinity to n is (guaranteed?)

    castable. But we need more explanation of combinations.

    If I'm reading correctly then (return == UPC_CASTABLE_ALL|UPC_CASTABLE_CHECK) means everything is *probably* castable, but must be checked on a per-object basis? Is that what I return if I'm a 32-bit platform that can cast any shared object up to some VM space resource limit?

    As a different example, assume I'm a platform that can guarantee the all_alloc space is always castable for thread n, the statically-allocated stuff is never castable, and the other shared heaps are on an object-by-object basis. What is the right return value to indicate that? My reading is one of : return1 == UPC_CASTABLE_ALL_ALLOC return2 == UPC_CASTABLE_CHECK | UPC_CASTABLE_ALL_ALLOC | UPC_CASTABLE_GLOBAL_ALLOC | UPC_CASTABLE_ALLOC Option return1 indicates all_alloc is guaranteed. Option return2 indicates the non-static heaps *may* be castable. Neither option gives the full picture - do we just give up and not provide the expressiveness to indicate that? If so, how does the implementation decide which return value gives the client the information it cares about? (I can imagine clients that care about one property or the other).

    Here's a possible solution - remove the CHECK flag and add a second function argument:

    int upc_castable(int threadid, int queryGuaranteed)

    When queryGuaranteed is non-zero, the return value indicates the segments with affinity to threadid where objects are GUARANTEED to be castable. When it is zero, the return value indicates the segments with affinity to threadid where objects MAY castable, but may fail on an object-by-object basis. The return flags would be otherwise the same, minus the CHECK flag: UPC_CASTABLE_NONE = 0 UPC_CASTABLE_ALL_ALLOC UPC_CASTABLE_GLOBAL_ALLOC UPC_CASTABLE_ALLOC UPC_CASTABLE_STATIC UPC_CASTABLE_ALL = (UPC_CASTABLE_ALL_ALLOC | UPC_CASTABLE_GLOBAL_ALLOC

    UPC_CASTABLE_ALLOCUPC_CASTABLE_STATIC)

    This way it is the client that indicates whether he wants a "probably" or a "guaranteed", and the implementation supplies the appropriate answer.

    ```

    Reported by `danbonachea` on 2012-08-21 15:14:16

  31. Former user Account Deleted

    ``` Dan's suggested change sounds reasonable to me. ```

    Reported by `brian.wibecan` on 2012-08-21 15:48:55

  32. Former user Account Deleted

    ``` After a bit of thought, I have an alternative suggestion. Instead of

    upc_castable(threadid, queryGuaranteed)

    We could have two entry points

    upc_castable(threadid) upc_probably_castable(threadid) ```

    Reported by `brian.wibecan` on 2012-08-22 22:53:22

  33. Former user Account Deleted

    ``` "We could have two entry points"

    I slightly prefer the single entry point design, because requiring that second argument forces the user to think about whether he needs a "guaranteed" answer or if he can work with a "maybe" answer. With two entry points lazy users may get in the habit of calling only the 'guaranteed' entry point and forget the 'probably' one exists, and lose performance opportunities on platforms where those differ (the whole motivation for having both).

    Using a single function also probably makes the formal spec more concise, and groups two very related pieces of functionality into a single function. We're really dealing with a single logical query that returns a result comprised of two integers, but C doesn't cleanly support returning that. If you really don't like the queryGuaranteed argument, here's two alternate formulations:

    void upc_castable(int threadid, int *guaranteedHeaps, int *probablyHeaps) void upc_castable(int threadid, struct { int guaranteedHeaps; int probablyHeaps } *result)

    These allow the user to retrieve the full information with a single call, with the slight usability downside of requiring by-reference arguments for the callee to fill with the results. The first form could allow the user to pass a NULL for a result he doesn't care about, the second form offers extensibility if we ever decide it would be useful to return additional pieces of information.

    Any of these syntax provide the requested information to the user, so I think it's mostly a matter of taste.

    Thoughts? ```

    Reported by `danbonachea` on 2012-08-23 05:50:21

  34. Former user Account Deleted

    ``` Getting back to this. I think either formulation works well. Given that there seems to be more potential complication in the return status than I had envisioned, I'm inclined to support the second version. More specifically:

    typedef struct { int guaranteedHeaps; int probablyHeaps; } upc_castable_t;

    void upc_castable(int threadId, upc_castable_t *result);

    (It could return a struct by value, alternatively.)

    Then a user could create the original thread_castable_function easily enough:

    int thread_castable(int id) { upc_castable_t rval; upc_castable(id, &rval); return (rval.guaranteedHeaps == UPC_CASTABLE_ALL); }

    and a variety of other checks are possible. Plus the structure can be extended without affecting existing code.

    I kind of like returning a structure by value, as it allows any of the following:

    upc_castable_t rval = upc_castable(id);

    return (upc_castable(id).guaranteedHeaps == UPC_CASTABLE_ALL);

    1. define thread_castable(id) (upc_castable(id).guaranteedHeaps == UPC_CASTABLE_ALL)

    but those are not significant advantages. ```

    Reported by `brian.wibecan` on 2012-09-19 18:39:14

  35. Former user Account Deleted

    ``` As you said, the main advantage to the struct approach is extensibility, which seems worthwhile. I had a look through the C99 libraries, and the only analogous example I could find that returned information as a transparent struct type was:

    div_t div(int numer, int denom);

    Where div_t is a struct type that is basically: typedef struct { int quot; int rem; } div_t; although they bend over backwards to ensure the struct fields may be implemented in either order. Anyhow, following that example it seems reasonable to simply return the struct by value.

    Taking a step back and thinking ahead, there may be some value in promoting this "query information about a peer thread" function out of the "castable library" to a more general level, where it can return other interesting information about a peer thread, like the node "distance" to that thread (issue 52, currently slated for 1.4). We might imagine providing other queryable per-thread information in the future (eg hostname, file system info, who knows what else).

    So perhaps we could give the function a more general name, to be inclusive of such future usage?

    upc_castable.h defines a struct type upc_thread_info_t, which contains at least the following members (in unspecified order): int guaranteedCastable; int probablyCastable; The struct may include other implementation-defined fields.

    upc_castable.h defines the constants that are OR-ed together to form the values of the two fields above: UPC_CASTABLE_NONE = 0 UPC_CASTABLE_ALL_ALLOC UPC_CASTABLE_GLOBAL_ALLOC UPC_CASTABLE_ALLOC UPC_CASTABLE_STATIC UPC_CASTABLE_ALL = (UPC_CASTABLE_ALL_ALLOC | UPC_CASTABLE_GLOBAL_ALLOC

    UPC_CASTABLE_ALLOCUPC_CASTABLE_STATIC)

    upc_castable.h also defines two functions:

    upc_thread_info_t upc_thread_info(int threadId); /* the query function */

    void *upc_cast(shared void *ptr); /* the cast operation */

    In a future spec (once the library passes from optional to required), we can promote the upc_thread_info_t type and upc_thread_info query function into a more general header (upc.h?), and leave a placeholder include inside upc_castable.h.

    If you like this idea, lets proceed with writing/updating a concrete document ASAP, so that we can work on solidifying all the required verbiage.

    ```

    Reported by `danbonachea` on 2012-09-19 22:33:43

  36. Former user Account Deleted

    ``` "If you like this idea, lets proceed with writing/updating a concrete document ASAP, so that we can work on solidifying all the required verbiage."

    Yes, I like this idea. I can update the existing optional library document to reflect this revised form, if that's the appropriate action. ```

    Reported by `brian.wibecan` on 2012-09-20 22:21:25

  37. Former user Account Deleted
    Attached to this comment is a draft version of the "castability" proposed library document,
    with revisions incorporating what was discussed here.  I haven't checked any of this
    in.  I also added a feature macro.
    

    Reported by brian.wibecan on 2012-10-09 20:26:04

    <hr> * Attachment: upc-lib-castability-spec-draft.pdf

  38. Former user Account Deleted
    Brian - thanks for the new draft, I think we're getting close here. I'd say go ahead
    and commit the latex you have, so we have a complete record of the document evolution.
    
    A few comments on the current verbiage:
    
    * Please add the following paragraph to 7.4.1, for conformance with issue 91:
    
        Unless otherwise noted, all of the functions, types and macros
        specified in Section~\ref{upc-castable}
        are declared by the header {\tt <upc_castable.h>}.
    
    * 7.4.2.1: We should probably say something about the validity lifetime of the returned
    pointer. I think we probably want to guarantee that it remains valid for the lifetime
    of the referenced shared object, but not after a dynamically-allocated shared object
    is freed. So for example this code is not guaranteed to work:
    
      shared int *sp = upc_alloc(100);
      shared int *p = upc_cast(sp);
      upc_free(sp);
      shared int *sp2 = upc_alloc(100);
      if (sp == sp2 && p) { *p = 100; }
    
    * Similarly, we should say something about the behavior of repeated upc_cast calls
    on the same object with the same affinity by the same thread. Specifically, if upc_cast
    on a pointer with affinity X into a shared object Y returns a valid pointer, then future
    calls on pointers with affinity X to anywhere in the same shared object Y should also
    return a valid pointer (for the lifetime of shared object Y). My point above already
    ensures the first return pointer remains valid for the lifetime of the object, but
    this point additionally guarantees that upc_cast continues to return valid pointers
    into that object. 
    
    * Verbiage correction: upc_cast can return "a null pointer" (a special value defined
    in C99) but cannot return "NULL" (a macro).
    
    * 7.4.2.2: We need to clarify in para 2 and 4 that the info being queried/returned
    is relative to the calling thread.  
    
    * 7.4.2.2: I think we should remove the symbol UPC_CASTABLE_NONE, and simply let zero
    be zero. The statement "can be unambiguously tested using the bitwise AND operator"
    does not hold for a macro defined to be zero, and I don't really see the utility of
    a macro defined to always be zero. 
    
    * Be sure to add \index entries for the flag macros.
    

    Reported by danbonachea on 2012-10-10 19:11:55

  39. Former user Account Deleted
    Brian - here are my detailed comments on the latest draft:
    
    7.4.2.1:
    * Add a "const" qualifier to the signature of upc_cast
    
    * The term "target thread" is used in a few places and is not consistent with UPC spec
    terminology. A PTS does not reference a thread, but it can reference shared memory
    with affinity to a thread. Suggest changing:
    
    "The conversion will be considered possible only if the returned
    pointer-to-local may be used to read and write the target thread's
    entire contiguous portion of the largest shared object
    containing the specified shared address."
    
    to something like:
    
    "The conversion will be considered possible only if the returned
    pointer-to-local may be used to read and write the
    entire contiguous portion 
    with affinity to {\tt upc_threadof(ptr)}
    of the largest shared object
    containing the specified shared address."
    
    That's unfortunately still a rather convoluted sentence (and absolutely crucial to
    the operation of this function), so we might consider splitting this up or rephrasing
    further for clarity.
    
    * We should say something about the behavior of repeated upc_cast calls on the same
    object with the same affinity by the same thread. Specifically, if upc_cast on a pointer
    with affinity X into a shared object Y returns a valid pointer, then future calls on
    pointers with affinity X to anywhere in the same shared object Y should also return
    a valid pointer (for the lifetime of shared object Y). We already ensure the first
    return pointer remains valid for the lifetime of the object, but this point additionally
    guarantees that upc_cast continues to return valid pointers into that object (so the
    client is permitted to cache the first result, but not FORCED to do so - he can just
    call upc_cast again and be assured it still works the second time).
    
    7.4.2.2:
    * Currently upc_thread_info takes the threadID parameter as type "int". This is one
    of the very few places in the current spec that gives a specific type to a thread index,
    the other being upc_threadof() whose return value is a size_t. There was some discussion
    in issue 32 regarding introducing a new type to hold thread indexes, but unless/until
    we implement such a change we should probably use size_t for consistency with upc_threadof().
    
    * Another instance of "target thread":
    "If the flag for a particular region is set in the {\tt guaranteedCastable} field,
    it indicates that any pointer into that region in the specified target thread's memory
    is castable."
    
    change to something like:
    
    "If the flag for a particular region is set in the {\tt guaranteedCastable} field,
    it indicates that any pointer into that region with affinity to {\tt threadId} is castable."
    
    * "Multiple flag values may be combined using the bitwise OR operator" - This statement
    is true but redundant with the next sentence, and potentially confusing as the client
    never needs to construct these values. I suggest just deleting that sentence.
    
    * On the telecon we discussed the possibility allowing additional implementation-defined
    memory region values. Just need a sentence somewhere mentioning that, and specifying
    whether UPC_CASTABLE_ALL includes implementation-defined regions (probably yes). 
    
    In addition to the example you provided for a possible implementation-defined region
    (file scope static vs block scope static), I have another realistic example for consideration.
    Some implementation might decide to introduce an implementation-defined allocator that
    allocates shared memory which is more amenable to upc_cast, eg  shared void *xupc_alloc_castable(size_t
    nbytes), where this function allocates shared memory that the implementation has "worked
    extra hard" to make sure is castable to threads sharing a hardware addressable memory
    domain. So for example if a system has limited resources for cross-mapping shared memory
    into thread peers, the implementation might choose to only do that for memory allocated
    in this "special" way. I'm not advocating implementations to introduce such a feature
    (because of the portability pain it would create), but as a practical matter they may
    see it as preferable to no support for castability. This would constitute a new region
    of shared memory that this implementation could return in guaranteedCastable, even
    if all the spec-provided regions are not guaranteed.
    

    Reported by danbonachea on 2012-10-23 20:55:00

  40. Former user Account Deleted
    I think I have things in pretty good shape now.  The description of the conditions under
    which the upc_cast function succeeds is I think much better.  I've dealt with
    all the suggestions from comment 41.  Attached is a PDF of the section.
    

    Reported by brian.wibecan on 2012-11-04 03:57:01

    <hr> * Attachment: upc-lib-castability-spec-draft.pdf

  41. Former user Account Deleted
    Thanks Brian - Latest draft looks great!
    
    I suggest mailing the official change announcement and moving this issue to PendingApproval.
    

    Reported by danbonachea on 2012-11-04 23:14:02

  42. Former user Account Deleted

    Reported by brian.wibecan on 2012-11-04 23:52:13 - Status changed: PendingApproval

  43. Former user Account Deleted
    This PendingApproval change appeared in the SC12 Draft 3 release.
    It was officially ratified at the 11/29 telecon.
    

    Reported by danbonachea on 2012-11-29 20:03:22 - Status changed: Ratified

  44. Former user Account Deleted
    Attaching a public domain upc_castable.h header file, and a free-to-use compliance tester
    for the new library (also now in SVN).
    
    The upcoming release of BUPC passes this test and will include a fully compliant implementation
    of this library.
    

    Reported by danbonachea on 2013-03-23 20:27:16

    <hr> * Attachment: test-castable.upc * Attachment: upc_castable.h

  45. Log in to comment