Library: Collective Deallocation Functions
Originally reported on Google Code with ID 12
There is considerable interest from my user base in standardizing collective deallocation
functions:
void upc_all_free( shared void *ptr );
void upc_all_lock_free( upc_lock_t *ptr );
There was previously discussion on the Google Site (https://sites.google.com/a/lbl.gov/upc-proposals/cray-position-on-proposed-upc-library-extensions/collective-deallocation-functions),
but this item hasn't been discussed recently. BUPC has some documentation on their
extension: http://upc.lbl.gov/docs/user/index.shtml#all_free
Are there any objections to collective deallocation functions?
For the Berkeley and Cray implementations, if (for example) upc_all_alloc() and upc_all_free()
were called in a loop (with other processing in between), would users expect to see
a performance increase/decrease in switching from upc_free() to upc_all_free()? Our
interest is primarily in hopefully seeing a performance increase by letting each thread
do its own deallocation directly through a collective call.
Reported by nspark.work
on 2012-05-03 21:02:32
Comments (21)
-
Account Deleted -
Account Deleted ``` The previous comment states that the collective deallocation calls have barrier semantics, but I recall the teleconference as agreeing that they did NOT require barrier semantics. What we said was that the object could not be released until ALL threads had called the function, but Dan and others were emphatic that the specification should NOT require that all threads WAIT in the function for the last thread to arrive.
Of course, the trivial implementation (which I believe both Cray and Berkeley have acknowledged using) does utilize a barrier, but that is not a property we want the spec to require. ```
Reported by `phhargrove@lbl.gov` on 2012-05-11 19:01:11
-
Account Deleted ``` Danger Will Robinson. If you allow upc_all_free() to *not* have barrier semantics, somebody will write code that deadlocks when upc_all_free() actually *does* have barrier semantics. See issue 4 for nasty interaction between locks and notify/wait pairs. http://code.google.com/p/upc-specification/issues/detail?id=4 ```
Reported by `ga10502` on 2012-05-24 03:37:15
-
Account Deleted ``` Gheorghe says that allowing upc_all_free() to not have barrier semantics is a recipe for disaster (or at least for deadlock). I am inclined to believe the argument.
So, have we not arrived at the following?
- define upc_all_free(_p) \ do { upc_barrier; if (!MYTHREAD) upc_free(_p); } while(0)
If so, what is the benefit to adding this to the spec?
At least for Berkeley UPC we change the if-test in the definition above to (the internal equivalent of): ... if (MYTHREAD == upc_threadof(_p)) ... which ensures that the free is performed by the thread with affinity to the shared heap data structure and thus improves performance (very) slightly.
I have no problem with specifying something most of us have already implemented, but if Gheorghe is right then the hoped-for performance benefit will not materialize. ```
Reported by `phhargrove@lbl.gov` on 2012-05-24 04:05:40
-
Account Deleted Reported by `phhargrove@lbl.gov` on 2012-06-01 03:41:25 - Labels added: Spec-1.3
-
Account Deleted Reported by `phhargrove@lbl.gov` on 2012-06-01 06:08:43 - Labels added: Milestone-Spec-1.3 - Labels removed: Spec-1.3
-
Account Deleted ``` I think as specifiers of a parallel system we need a way to unambiguously specify that the implementation of collective operation X MAY imply a barrier synchronization without REQUIRING it to imply a barrier. Users need to understand this concept and that code whose correctness depends on one or the other possibility has undefined behavior (ie the deadlock is the user's fault).
The proper C99 spec-ese here is it is "unspecified" whether or not collective operation X implies a barrier synchronization:
3.4.4: unspecified behavior behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance 2 EXAMPLE An example of unspecified behavior is the order in which the arguments to a function are evaluated.
I would like to see this approach applied to these functions and also to upc_notify/upc_wait for issue 4 (http://code.google.com/p/upc-specification/issues/detail?id=4). ```
Reported by `danbonachea` on 2012-06-15 08:37:39
-
Account Deleted ``` Note UPC 3.7 already takes a stab at this issue:
collective constraint placed on some language operations which requires evaluation of uch operations to be matched3 across all threads. The behavior of collective operations is undefined unless all threads execute the same sequence of collective operations.
3A collective operation need not provide any actual synchronization between threads, unless otherwise noted. The collective requirement simply states a relative ordering property of calls to collective operations that must be maintained in the parallel execution trace for all executions of any legal program. Some implementations may include unspecified synchronization between threads within collective operations, but programs must not rely upon such unspecified synchronization for correctness.
This should probably be amended to state: " programs must not rely upon THE PRESENSE OR ABSENCE OF such unspecified synchronization for correctness"
```
Reported by `danbonachea` on 2012-06-15 08:41:44
-
Account Deleted ``` I'm with Dan. As far as collective anything is concerned, MPI3 is the law of the land, and regardless of what they say, they *mean* exactly what Dan just put so concisely. Collective X may - or may not - be completely synchronizing.
It would be oh so nice if we could put limit on the amount of synchronization a collective is allowed to have. But we don't make their rules, and we don't pay their engineers who write things like MPI_Barrier_start.
```
Reported by `ga10502` on 2012-06-15 14:42:57
-
Account Deleted Reported by `gary.funck` on 2012-07-03 18:07:50 - Labels added: Type-Lib-Required - Labels removed: Type-Enhancement
-
Account Deleted Reported by `gary.funck` on 2012-07-03 18:10:45
-
Account Deleted ``` Set default Consensus to "Low". ```
Reported by `gary.funck` on 2012-08-19 23:26:19 - Labels added: Consensus-Low
-
Account Deleted Reported by `gary.funck` on 2012-08-19 23:45:19 - Labels added: Consensus-High - Labels removed: Consensus-Low
-
Account Deleted ``` The proposed specification language for these new functions has been committed to the new "issue12" branch. The diff against the working draft is copied below, and the attached PDF shows the output of the branch with the proposed additions. An official Change Announcement was mailed to the list on 9/16/2012.
Proposed Change: -------------------------- --- upc-lib-core.tex (revision 118) +++ upc-lib-core.tex (working copy) @@ -52,6 +52,7 @@ storage, and terminates the execution for all active threads.
\subsubsection{Shared memory allocation functions} +\label{shared_allocation} \index{shared object, allocation} \index{memory allocation}
@@ -159,10 +160,50 @@ a null pointer, no action occurs. Otherwise, if the argument does not match a pointer earlier returned by the {\tt upc\_alloc, } {\tt upc\_global\_alloc,} {\tt upc\_all\_alloc,} or {\tt upc\_local\_alloc,} function, - or if the space has been deallocated by a previous call, by any - thread,\footnote{i.e., only one thread may call {\tt upc\_free} for - each allocation} to {\tt upc\_free,} the behavior is undefined. + or if the space has been deallocated by a previous call + to {\tt upc\_free} by any thread,\footnote{i.e., only one thread may call {\tt upc\_free} for + each allocation} + \xadded[id=DB]{12}{or a previous call to {\tt upc\_all\_free}}, + the behavior is undefined.
+\cbstart +\paragraph{The {\tt upc\_all\_free} function} +\xchangenote[id=DB]{12}{SECTION ADDED} + +{\bf Synopsis} +\index{upc\_all\_free} + +\npf\vspace{-2.5em} +\begin{verbatim} + #include <upc.h> + void upc_all_free(shared void *ptr); +\end{verbatim} + +{\bf Description} + +\np {\tt upc\_all\_free} is a {\em collective} variant of {\tt upc\_free}, provided as a convenience. + It must be called collectively by all threads with the {\em single-valued} argument {\tt ptr}. + +\np The {\tt upc\_all\_free} function frees the dynamically + allocated shared storage pointed to by {\tt ptr}. If {\tt ptr} is + a null pointer, no action occurs. Otherwise, if the argument does + not match a pointer earlier returned by the {\tt upc\_alloc}, + {\tt upc\_global\_alloc}, {\tt upc\_all\_alloc} or {\tt upc\_local\_alloc} function, + or if the space has been deallocated by a previous call to {\tt upc\_free} + or {\tt upc\_all\_free}, the behavior is undefined. + +\np The shared storage referenced by {\tt ptr} is guaranteed to remain valid + until all threads have entered the call to {\tt upc\_all\_free}, but the + function does not otherwise guarantee any synchronization or strict reference. + +\np There is no required correspondence between the functions specified + in Section\ref{shared_allocation} to allocate and free objects. + Either of the {\tt upc\_free} or {\tt upc\_all\_free} functions + may be used to free shared space allocated using {\tt upc\_all\_alloc}, + {\tt upc\_alloc}, {\tt upc\_global\_alloc}, or {\tt upc\_local\_alloc}. + +\cbend + \subsubsection{Pointer-to-shared manipulation functions}
\paragraph{The {\tt upc\_threadof} function} @@ -374,9 +415,12 @@ Otherwise, if the argument does not match a pointer earlier returned by the {\tt upc\_global\_lock\_alloc} or {\tt upc\_all\_lock\_alloc} function, or if the lock has been - deallocated by a previous call to {\tt upc\_lock\_free,}\footnote + deallocated by a previous call to {\tt upc\_lock\_free} by any thread,% + \footnote {i.e., only one thread may call {\tt upc\_lock\_free} for each - allocation} the behavior is undefined. + allocation} + \xadded[id=DB]{12}{or a previous call to {\tt upc\_all\_lock\_free},} + the behavior is undefined.
\np{\tt upc\_lock\_free} succeeds regardless of whether the referenced lock is currently unlocked or currently locked (by any @@ -395,6 +439,52 @@ applies to any thread currently calling {\tt upc\_lock}. }
+\cbstart +\paragraph{The {\tt upc\_all\_lock\_free} function} +\xchangenote[id=DB]{12}{SECTION ADDED} + +{\bf Synopsis} +\index{upc\_all\_lock\_free} + +\npf\vspace{-2.5em} +\begin{verbatim} + #include <upc.h> + void upc_all_lock_free(upc_lock_t *ptr); +\end{verbatim} + +{\bf Description} + +\np {\tt upc\_all\_lock\_free} is a {\em collective} variant of {\tt upc\_lock\_free}, provided as a convenience. + It must be called collectively by all threads with the {\em single-valued} argument {\tt ptr}. + +\np The {\tt upc\_all\_lock\_free} function frees all resources + associated with the dynamically allocated {\tt upc\_lock\_t} pointed to + by {\tt ptr}. If {\tt ptr} is a null pointer, no action occurs. + Otherwise, if the argument does not match a pointer earlier + returned by the {\tt upc\_global\_lock\_alloc} or {\tt + upc\_all\_lock\_alloc} function, or if the lock has been + deallocated by a previous call to {\tt upc\_lock\_free} + or {\tt upc\_all\_lock\_free}, the behavior is undefined. + +\np{\tt upc\_all\_lock\_free} succeeds regardless of whether the + referenced lock is currently unlocked or currently locked (by any + thread). + +\np The lock referenced by {\tt ptr} is guaranteed to remain valid + until all threads have entered the call to {\tt upc\_all\_lock\_free}, but the + function does not otherwise guarantee any synchronization or strict reference. + +\np Any subsequent calls to locking functions from any + thread using {\tt ptr} have undefined effects. + +\np There is no required correspondence between the functions specified + in Section\ref{upc_lock} to allocate and free locks. + Either of the {\tt upc\_lock\_free} or {\tt upc\_all\_lock\_free} functions + may be used to free locks allocated using {\tt upc\_global\_lock\_alloc} or + {\tt upc\_all\_lock\_alloc}. + +\cbend + \paragraph{The {\tt upc\_lock} function}
{\bf Synopsis}
```
Reported by `danbonachea` on 2012-09-17 00:50:41 - Status changed: `PendingApproval`
<hr>
- *Attachment: [upc-lang-spec-issue12-draft.pdf](https://storage.googleapis.com/google-code-attachments/upc-specification/issue-12/comment-14/upc-lang-spec-issue12-draft.pdf)*
-
Account Deleted ``` I'm bumping this issue as a reminder.
Of all the current PendingApprovals, this issue involves the greatest amount of "new text" by far, all of which was authored by myself with minimal outside input of actual verbiage. I'd appreciate several additional sets of eyes reviewing this text and signing off on it (even a simple "looks good"), to ensure we get it right "the first time". ```
Reported by `danbonachea` on 2012-09-21 18:57:26 - Labels added: Priority-Critical - Labels removed: Priority-Medium
-
Account Deleted ``` Dan, I think your changes look good, however, I wonder whether the clarification you note in 7.2.2.5 Paragraph 5 might be more appropriate as 7.2.2 Paragraph 2, leading in to the shared memory (de)allocation functions, rather than as a follow-up to only one of them. Should this seem reasonable, the equivalent text in 7.2.4.5 Paragraph 7 might also want to move possibly to/as 7.2.4 Paragraph 1. ```
Reported by `nspark.work` on 2012-09-21 20:00:44
-
Account Deleted ``` "I wonder whether the clarification you note in 7.2.2.5 Paragraph 5 might be more appropriate as 7.2.2 Paragraph 2"
I like this idea - I'll go ahead and promote those two paragraphs to the top of their enclosing section. ```
Reported by `danbonachea` on 2012-09-21 21:20:59
-
Account Deleted ``` I've applied the suggestion in comment 16 to the branch, updated PDF is attached. ```
Reported by `danbonachea` on 2012-09-21 21:29:31
<hr>
- *Attachment: [upc-lang-spec-issue12-draftB.pdf](https://storage.googleapis.com/google-code-attachments/upc-specification/issue-12/comment-18/upc-lang-spec-issue12-draftB.pdf)*
-
Account Deleted ``` There appear to several places in the Issue12 branch where upc_local_alloc is mentioned. For example,
lang/upc-lib-core.tex-\np \xadded[id=DB]{12}{ lang/upc-lib-core.tex- There is no required correspondence between the functions specified lang/upc-lib-core.tex- in Section\ref{shared_allocation} to allocate and free objects. lang/upc-lib-core.tex- Either of the {\tt upc\_free} or {\tt upc\_all\_free} functions lang/upc-lib-core.tex- may be used to free shared space allocated using {\tt upc\_all\_alloc}, lang/upc-lib-core.tex: {\tt upc\_alloc}, {\tt upc\_global\_alloc}, or {\tt upc\_local\_alloc}. lang/upc-lib-core.tex- }
Since upc_local_alloc() is deprecated, perhaps these references to upc_local_alloc() should be deleted?
Also, is there any merit in merging the ttunk into the Issue12 branch to bring it up-to-date?
Also, currently this issue is tagged as Type-Lib-Required. Should this be re-classified as Type-Lib-Core?
```
Reported by `gary.funck` on 2012-09-25 05:49:34
-
Account Deleted ```
Since upc_local_alloc() is deprecated, perhaps these references to upc_local_alloc()
should be deleted?
Good point - in fact as of the ratification of issue 82 upc_local_alloc is now completely removed, so I've gone ahead and removed the left-over references to it on the branch. Updated PDF is attached.
Also, is there any merit in merging the ttunk into the Issue12 branch to bring it
up-to-date?
Unlike code projects, such merging seems unnecessary since there is no "testing" to be done. The branch is basically just convenient storage for a big documentation diff, and once the diff is ratified I will apply it directly to the trunk.
I think a better question to ask is do we really need to let this diff sit around for another two weeks before application, or can the committee agree to expedite ratification on some of these PendingApprovals in the interests of making progress.. Perhaps now that we're meeting weekly we should shorten the approval cycle?
```
Reported by `danbonachea` on 2012-09-25 22:05:56 - Labels added: Type-Lib-Core - Labels removed: Type-Lib-Required
<hr>
- *Attachment: [upc-lang-spec-issue12-draftC.pdf](https://storage.googleapis.com/google-code-attachments/upc-specification/issue-12/comment-20/upc-lang-spec-issue12-draftC.pdf)*
-
Account Deleted This PendingApproval issue was ratified at the 10/19/2012 telecon, and merged into the working draft in SVN 175.
Reported by
danbonachea
on 2012-10-22 18:43:13 - Status changed:Ratified
- Log in to comment
``` I think there was unanimous consent to these when we discussed them before. As far as the performance goes, I think there is some room for debate there. It is likely both implementation and application dependent. Keep in mind that these have barrier semantics, so if the threads aren't in sync you might run into cases where the non-collective deallocations performs better than the collective ones depending on how the implementation works behind the scenes. ```
Reported by `sdvormwa@cray.com` on 2012-05-11 17:04:18