Library: Collective Deallocation Functions

Issue #12 new
Former user created an issue

Originally reported on Google Code with ID 12

There is considerable interest from my user base in standardizing collective deallocation
functions:

    void upc_all_free( shared void *ptr );
    void upc_all_lock_free( upc_lock_t *ptr );

There was previously discussion on the Google Site (https://sites.google.com/a/lbl.gov/upc-proposals/cray-position-on-proposed-upc-library-extensions/collective-deallocation-functions),
but this item hasn't been discussed recently.  BUPC has some documentation on their
extension: http://upc.lbl.gov/docs/user/index.shtml#all_free

Are there any objections to collective deallocation functions?

For the Berkeley and Cray implementations, if (for example) upc_all_alloc() and upc_all_free()
were called in a loop (with other processing in between), would users expect to see
a performance increase/decrease in switching from upc_free() to upc_all_free()?  Our
interest is primarily in hopefully seeing a performance increase by letting each thread
do its own deallocation directly through a collective call.

Reported by nspark.work on 2012-05-03 21:02:32

Comments (21)

  1. Former user Account Deleted

    ``` I think there was unanimous consent to these when we discussed them before. As far as the performance goes, I think there is some room for debate there. It is likely both implementation and application dependent. Keep in mind that these have barrier semantics, so if the threads aren't in sync you might run into cases where the non-collective deallocations performs better than the collective ones depending on how the implementation works behind the scenes. ```

    Reported by `sdvormwa@cray.com` on 2012-05-11 17:04:18

  2. Former user Account Deleted

    ``` The previous comment states that the collective deallocation calls have barrier semantics, but I recall the teleconference as agreeing that they did NOT require barrier semantics. What we said was that the object could not be released until ALL threads had called the function, but Dan and others were emphatic that the specification should NOT require that all threads WAIT in the function for the last thread to arrive.

    Of course, the trivial implementation (which I believe both Cray and Berkeley have acknowledged using) does utilize a barrier, but that is not a property we want the spec to require. ```

    Reported by `phhargrove@lbl.gov` on 2012-05-11 19:01:11

  3. Former user Account Deleted

    ``` Gheorghe says that allowing upc_all_free() to not have barrier semantics is a recipe for disaster (or at least for deadlock). I am inclined to believe the argument.

    So, have we not arrived at the following?

    1. define upc_all_free(_p) \ do { upc_barrier; if (!MYTHREAD) upc_free(_p); } while(0)

    If so, what is the benefit to adding this to the spec?

    At least for Berkeley UPC we change the if-test in the definition above to (the internal equivalent of): ... if (MYTHREAD == upc_threadof(_p)) ... which ensures that the free is performed by the thread with affinity to the shared heap data structure and thus improves performance (very) slightly.

    I have no problem with specifying something most of us have already implemented, but if Gheorghe is right then the hoped-for performance benefit will not materialize. ```

    Reported by `phhargrove@lbl.gov` on 2012-05-24 04:05:40

  4. Former user Account Deleted

    Reported by `phhargrove@lbl.gov` on 2012-06-01 03:41:25 - Labels added: Spec-1.3

  5. Former user Account Deleted

    Reported by `phhargrove@lbl.gov` on 2012-06-01 06:08:43 - Labels added: Milestone-Spec-1.3 - Labels removed: Spec-1.3

  6. Former user Account Deleted

    ``` I think as specifiers of a parallel system we need a way to unambiguously specify that the implementation of collective operation X MAY imply a barrier synchronization without REQUIRING it to imply a barrier. Users need to understand this concept and that code whose correctness depends on one or the other possibility has undefined behavior (ie the deadlock is the user's fault).

    The proper C99 spec-ese here is it is "unspecified" whether or not collective operation X implies a barrier synchronization:

    3.4.4: unspecified behavior behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance 2 EXAMPLE An example of unspecified behavior is the order in which the arguments to a function are evaluated.

    I would like to see this approach applied to these functions and also to upc_notify/upc_wait for issue 4 (http://code.google.com/p/upc-specification/issues/detail?id=4). ```

    Reported by `danbonachea` on 2012-06-15 08:37:39

  7. Former user Account Deleted

    ``` Note UPC 3.7 already takes a stab at this issue:

    collective constraint placed on some language operations which requires evaluation of uch operations to be matched3 across all threads. The behavior of collective operations is undefined unless all threads execute the same sequence of collective operations.

    3A collective operation need not provide any actual synchronization between threads, unless otherwise noted. The collective requirement simply states a relative ordering property of calls to collective operations that must be maintained in the parallel execution trace for all executions of any legal program. Some implementations may include unspeci fied synchronization between threads within collective operations, but programs must not rely upon such unspeci fied synchronization for correctness.

    This should probably be amended to state: " programs must not rely upon THE PRESENSE OR ABSENCE OF such unspeci fied synchronization for correctness"

    ```

    Reported by `danbonachea` on 2012-06-15 08:41:44

  8. Former user Account Deleted

    ``` I'm with Dan. As far as collective anything is concerned, MPI3 is the law of the land, and regardless of what they say, they *mean* exactly what Dan just put so concisely. Collective X may - or may not - be completely synchronizing.

    It would be oh so nice if we could put limit on the amount of synchronization a collective is allowed to have. But we don't make their rules, and we don't pay their engineers who write things like MPI_Barrier_start.

    ```

    Reported by `ga10502` on 2012-06-15 14:42:57

  9. Former user Account Deleted

    Reported by `gary.funck` on 2012-07-03 18:07:50 - Labels added: Type-Lib-Required - Labels removed: Type-Enhancement

  10. Former user Account Deleted

    ``` Set default Consensus to "Low". ```

    Reported by `gary.funck` on 2012-08-19 23:26:19 - Labels added: Consensus-Low

  11. Former user Account Deleted

    Reported by `gary.funck` on 2012-08-19 23:45:19 - Labels added: Consensus-High - Labels removed: Consensus-Low

  12. Former user Account Deleted

    ``` The proposed specification language for these new functions has been committed to the new "issue12" branch. The diff against the working draft is copied below, and the attached PDF shows the output of the branch with the proposed additions. An official Change Announcement was mailed to the list on 9/16/2012.

    Proposed Change: -------------------------- --- upc-lib-core.tex (revision 118) +++ upc-lib-core.tex (working copy) @@ -52,6 +52,7 @@ storage, and terminates the execution for all active threads.

    \subsubsection{Shared memory allocation functions} +\label{shared_allocation} \index{shared object, allocation} \index{memory allocation}

    @@ -159,10 +160,50 @@ a null pointer, no action occurs. Otherwise, if the argument does not match a pointer earlier returned by the {\tt upc\_alloc, } {\tt upc\_global\_alloc,} {\tt upc\_all\_alloc,} or {\tt upc\_local\_alloc,} function, - or if the space has been deallocated by a previous call, by any - thread,\footnote{i.e., only one thread may call {\tt upc\_free} for - each allocation} to {\tt upc\_free,} the behavior is undefined. + or if the space has been deallocated by a previous call + to {\tt upc\_free} by any thread,\footnote{i.e., only one thread may call {\tt upc\_free} for + each allocation} + \xadded[id=DB]{12}{or a previous call to {\tt upc\_all\_free}}, + the behavior is undefined.

    +\cbstart +\paragraph{The {\tt upc\_all\_free} function} +\xchangenote[id=DB]{12}{SECTION ADDED} + +{\bf Synopsis} +\index{upc\_all\_free} + +\npf\vspace{-2.5em} +\begin{verbatim} + #include <upc.h> + void upc_all_free(shared void *ptr); +\end{verbatim} + +{\bf Description} + +\np {\tt upc\_all\_free} is a {\em collective} variant of {\tt upc\_free}, provided as a convenience. + It must be called collectively by all threads with the {\em single-valued} argument {\tt ptr}. + +\np The {\tt upc\_all\_free} function frees the dynamically + allocated shared storage pointed to by {\tt ptr}. If {\tt ptr} is + a null pointer, no action occurs. Otherwise, if the argument does + not match a pointer earlier returned by the {\tt upc\_alloc}, + {\tt upc\_global\_alloc}, {\tt upc\_all\_alloc} or {\tt upc\_local\_alloc} function, + or if the space has been deallocated by a previous call to {\tt upc\_free} + or {\tt upc\_all\_free}, the behavior is undefined. + +\np The shared storage referenced by {\tt ptr} is guaranteed to remain valid + until all threads have entered the call to {\tt upc\_all\_free}, but the + function does not otherwise guarantee any synchronization or strict reference. + +\np There is no required correspondence between the functions specified + in Section\ref{shared_allocation} to allocate and free objects. + Either of the {\tt upc\_free} or {\tt upc\_all\_free} functions + may be used to free shared space allocated using {\tt upc\_all\_alloc}, + {\tt upc\_alloc}, {\tt upc\_global\_alloc}, or {\tt upc\_local\_alloc}. + +\cbend + \subsubsection{Pointer-to-shared manipulation functions}

    \paragraph{The {\tt upc\_threadof} function} @@ -374,9 +415,12 @@ Otherwise, if the argument does not match a pointer earlier returned by the {\tt upc\_global\_lock\_alloc} or {\tt upc\_all\_lock\_alloc} function, or if the lock has been - deallocated by a previous call to {\tt upc\_lock\_free,}\footnote + deallocated by a previous call to {\tt upc\_lock\_free} by any thread,% + \footnote {i.e., only one thread may call {\tt upc\_lock\_free} for each - allocation} the behavior is undefined. + allocation} + \xadded[id=DB]{12}{or a previous call to {\tt upc\_all\_lock\_free},} + the behavior is undefined.

    \np{\tt upc\_lock\_free} succeeds regardless of whether the referenced lock is currently unlocked or currently locked (by any @@ -395,6 +439,52 @@ applies to any thread currently calling {\tt upc\_lock}. }

    +\cbstart +\paragraph{The {\tt upc\_all\_lock\_free} function} +\xchangenote[id=DB]{12}{SECTION ADDED} + +{\bf Synopsis} +\index{upc\_all\_lock\_free} + +\npf\vspace{-2.5em} +\begin{verbatim} + #include <upc.h> + void upc_all_lock_free(upc_lock_t *ptr); +\end{verbatim} + +{\bf Description} + +\np {\tt upc\_all\_lock\_free} is a {\em collective} variant of {\tt upc\_lock\_free}, provided as a convenience. + It must be called collectively by all threads with the {\em single-valued} argument {\tt ptr}. + +\np The {\tt upc\_all\_lock\_free} function frees all resources + associated with the dynamically allocated {\tt upc\_lock\_t} pointed to + by {\tt ptr}. If {\tt ptr} is a null pointer, no action occurs. + Otherwise, if the argument does not match a pointer earlier + returned by the {\tt upc\_global\_lock\_alloc} or {\tt + upc\_all\_lock\_alloc} function, or if the lock has been + deallocated by a previous call to {\tt upc\_lock\_free} + or {\tt upc\_all\_lock\_free}, the behavior is undefined. + +\np{\tt upc\_all\_lock\_free} succeeds regardless of whether the + referenced lock is currently unlocked or currently locked (by any + thread). + +\np The lock referenced by {\tt ptr} is guaranteed to remain valid + until all threads have entered the call to {\tt upc\_all\_lock\_free}, but the + function does not otherwise guarantee any synchronization or strict reference. + +\np Any subsequent calls to locking functions from any + thread using {\tt ptr} have undefined effects. + +\np There is no required correspondence between the functions specified + in Section\ref{upc_lock} to allocate and free locks. + Either of the {\tt upc\_lock\_free} or {\tt upc\_all\_lock\_free} functions + may be used to free locks allocated using {\tt upc\_global\_lock\_alloc} or + {\tt upc\_all\_lock\_alloc}. + +\cbend + \paragraph{The {\tt upc\_lock} function}

    {\bf Synopsis}

    ```

    Reported by `danbonachea` on 2012-09-17 00:50:41 - Status changed: `PendingApproval`

    <hr>

  13. Former user Account Deleted

    ``` I'm bumping this issue as a reminder.

    Of all the current PendingApprovals, this issue involves the greatest amount of "new text" by far, all of which was authored by myself with minimal outside input of actual verbiage. I'd appreciate several additional sets of eyes reviewing this text and signing off on it (even a simple "looks good"), to ensure we get it right "the first time". ```

    Reported by `danbonachea` on 2012-09-21 18:57:26 - Labels added: Priority-Critical - Labels removed: Priority-Medium

  14. Former user Account Deleted

    ``` Dan, I think your changes look good, however, I wonder whether the clarification you note in 7.2.2.5 Paragraph 5 might be more appropriate as 7.2.2 Paragraph 2, leading in to the shared memory (de)allocation functions, rather than as a follow-up to only one of them. Should this seem reasonable, the equivalent text in 7.2.4.5 Paragraph 7 might also want to move possibly to/as 7.2.4 Paragraph 1. ```

    Reported by `nspark.work` on 2012-09-21 20:00:44

  15. Former user Account Deleted

    ``` "I wonder whether the clarification you note in 7.2.2.5 Paragraph 5 might be more appropriate as 7.2.2 Paragraph 2"

    I like this idea - I'll go ahead and promote those two paragraphs to the top of their enclosing section. ```

    Reported by `danbonachea` on 2012-09-21 21:20:59

  16. Former user Account Deleted

    ``` There appear to several places in the Issue12 branch where upc_local_alloc is mentioned. For example,

    lang/upc-lib-core.tex-\np \xadded[id=DB]{12}{ lang/upc-lib-core.tex- There is no required correspondence between the functions specified lang/upc-lib-core.tex- in Section\ref{shared_allocation} to allocate and free objects. lang/upc-lib-core.tex- Either of the {\tt upc\_free} or {\tt upc\_all\_free} functions lang/upc-lib-core.tex- may be used to free shared space allocated using {\tt upc\_all\_alloc}, lang/upc-lib-core.tex: {\tt upc\_alloc}, {\tt upc\_global\_alloc}, or {\tt upc\_local\_alloc}. lang/upc-lib-core.tex- }

    Since upc_local_alloc() is deprecated, perhaps these references to upc_local_alloc() should be deleted?

    Also, is there any merit in merging the ttunk into the Issue12 branch to bring it up-to-date?

    Also, currently this issue is tagged as Type-Lib-Required. Should this be re-classified as Type-Lib-Core?

    ```

    Reported by `gary.funck` on 2012-09-25 05:49:34

  17. Former user Account Deleted

    ```

    Since upc_local_alloc() is deprecated, perhaps these references to upc_local_alloc()

    should be deleted?

    Good point - in fact as of the ratification of issue 82 upc_local_alloc is now completely removed, so I've gone ahead and removed the left-over references to it on the branch. Updated PDF is attached.

    Also, is there any merit in merging the ttunk into the Issue12 branch to bring it

    up-to-date?

    Unlike code projects, such merging seems unnecessary since there is no "testing" to be done. The branch is basically just convenient storage for a big documentation diff, and once the diff is ratified I will apply it directly to the trunk.

    I think a better question to ask is do we really need to let this diff sit around for another two weeks before application, or can the committee agree to expedite ratification on some of these PendingApprovals in the interests of making progress.. Perhaps now that we're meeting weekly we should shorten the approval cycle?

    ```

    Reported by `danbonachea` on 2012-09-25 22:05:56 - Labels added: Type-Lib-Core - Labels removed: Type-Lib-Required

    <hr>

  18. Former user Account Deleted
    This PendingApproval issue was ratified at the 10/19/2012 telecon, and merged into the
    working draft in SVN 175.
    

    Reported by danbonachea on 2012-10-22 18:43:13 - Status changed: Ratified

  19. Log in to comment