UPC progress guarantees

Issue #48 new
Former user created an issue

Originally reported on Google Code with ID 48

```
What are the forward progress guarantees of UPC?

Background

At the risk of oversimplifying the matter, let us say that there are two major ways for modern networks to handle incoming messages: explicit (polling) vs. implicit (interrupt-based, special progress threads, or anything else invisible to the programmer, acting behind the scenes).

MPI has, for its part, gathered ample evidence that polling-based progress is "good enough", even for things like MPI-2 one-sided communication.

On the other hand, most PGAS languages tacitly assume that one-sided operations are truly one-sided. The problem is one of implementation: even on modern networks, asynchronous progress guarantees are tricky to provide.

So where does UPC stand?

Absolute progress guarantee: are we ready to say that UPC should be able to make steady forward progress while UPC thread X is engaged in a long-lasting computation and other threads are accessing data affine to X? What would implementors say?

upc_poll(): Berkeley UPC (and IBM xlUPC) provide the upc_poll() function that users can call explicitly to make progress. Is a UPC program allowed to deadlock/livelock due to failure on the programmer's part to call upc_poll() at the appropriate time? Should we look at upc_poll() as a tool for performance *optimization* or as a way to avoid embarrassing deadlocks on certain architectures?

```

Reported by `ga10502` on 2012-05-23 13:47:19
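
For concreteness, here is a minimal UPC sketch of the scenario the question describes; the data layout, loop bounds, and slow_local_work are illustrative assumptions, not taken from the report. Thread 0 runs a long computation without making any UPC calls while the other threads access data with affinity to thread 0; the commented-out call marks where an explicit upc_poll() (the Berkeley/IBM extension) or a upc_fence would have to go on a polling-only runtime.

```
#include <upc.h>

#define N 1024

/* Shared array whose first block of N elements has affinity to thread 0
 * (assuming THREADS <= N). */
shared [N] int data[N * THREADS];

/* Stand-in for a long-lasting local computation (illustrative only). */
static double slow_local_work(double x) {
    for (int i = 0; i < 100000; i++)
        x = x * 0.999999 + 1e-9;
    return x;
}

int main(void) {
    double acc = 0.0;

    upc_barrier;

    if (MYTHREAD == 0) {
        /* Thread 0 makes no UPC calls inside this loop.  On a purely
         * polling-based runtime, the commented-out call is where an
         * explicit upc_poll() (Berkeley/IBM extension) or a upc_fence
         * would have to go for the remote accesses below to complete
         * while the loop is still running. */
        for (int step = 0; step < 10000; step++) {
            acc = slow_local_work(acc);
            /* upc_poll(); */
        }
        data[0] = (int) acc;
    } else {
        /* Accesses to data affine to thread 0, assumed to be one-sided. */
        data[MYTHREAD] = MYTHREAD;
        int x = data[MYTHREAD];
        (void) x;
    }

    upc_barrier;
    return 0;
}
```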

Comments (4)

  1. Former user Account Deleted

    ```
    Tagging Usability and Performance. Will defer to Owner on Milestone version, but suggest 2.0.

    Cray UPC handles this issue by having upc_fence poll and explaining to the user that they may want to fence periodically during a long-running computation if they are doing things that aren't truly one-sided in our implementation (e.g., upc_free() of data with affinity to a different PE).
    ```

    Reported by `johnson.troy.a` on 2012-06-15 18:13:31 - Labels added: Usability, Performance

  2. Former user Account Deleted

    ```
    IBM and Berkeley[1] say: spin on upc_poll()
    Cray says: spin on upc_fence()

    In the Berkeley case, upc_fence() would also work, but we provide upc_poll() to "make progress" without also having the fencing property (becomes a no-op in the "pure pthreads" case where there is no network to poll).

    So, I am in favor of DISCUSSING whether we want upc_poll() in the language spec. There should be no problem with

    #define upc_poll() upc_fence

    as a trivially correct implementation. To me the crux of the discussion is whether the inclusion of upc_poll() is useful, or just a horrible substitute for a true progress guarantee.

    My initial thought is that if we believe that MPI's experience "proves" that explicit polling is good enough, then upc_poll() would just be an optimization as George suggests. HOWEVER, I don't think the current UPC specification does anything that precludes writing CORRECT code that {dead,live}locks if one assumes true asynchronous progress is made. So, I would argue that as the spec currently stands, any implementation (my own included) which requires insertion of poll/fence calls to ensure progress is BROKEN. Therefore, I would argue that upc_poll() does NOT belong in the specification (as a mechanism to avoid implementation limitations).

    [1] I've mentioned before that Berkeley avoids placing our extensions into the upc_* namespace. The case of upc_poll() predates our realization of how doing this can lead to later headaches.
    ```

    Reported by `phhargrove@lbl.gov` on 2012-06-18 22:32:26
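
    A minimal sketch of that trivially correct implementation, assuming the implementation does not already provide upc_poll() as a macro (the loop in main is illustrative only):

    ```
    #include <upc.h>

    /* Trivially correct upc_poll(): calling it simply executes a upc_fence.
     * A real runtime would instead service the network without paying for
     * the ordering semantics of a fence; this only shows that a upc_poll()
     * requirement costs implementors nothing. */
    #ifndef upc_poll
    #define upc_poll() upc_fence
    #endif

    int main(void) {
        for (int i = 0; i < 100; i++) {
            /* ... long-running local work ... */
            upc_poll();   /* expands to: upc_fence; */
        }
        return 0;
    }
    ```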

  3. Former user Account Deleted

    ```
    We need to identify what parts of UPC require polling for progress in current implementations. That information is useful to this discussion whether or not explicit polling is added to the UPC spec. (If polling is added, then users need guidance on when they should poll. If polling is not added, then we need the information to figure out how we can live without polling.)

    For example, Cray does not need polling to handle Get, Put, or AMO operations, but we need it to handle the following:

    1) upc_global_{lock_}alloc - One thread calls upc_global_{lock_}alloc and all threads must perform an allocation.
    2) upc_free - One thread calls upc_free to deallocate memory that has affinity to a different thread.
    3) upc_global_exit - One thread terminates all threads.

    In all of these cases, one thread does something that requires action by other threads. For (1), we discourage users from calling the function and advise them to use upc_all_{lock_}alloc for better performance. For (2), we see it in test cases for upc_free, but have not seen it in a real application. For (3), generally it is called after an error is detected and the function's description makes no guarantee about how quickly it must terminate the application, so performance is not a concern provided that the original thread continues to respond to other threads until they have exited.

    I'll note that we used to require polling for upc_memcpy in the case where neither the source nor the destination had affinity to the calling thread, because we used to perform a direct transfer from the source to the destination. It turned out, however, that when a user actually writes such code, they expect the calling thread to perform the copy itself via temporary buffering.
    ```

    Reported by `johnson.troy.a` on 2012-06-19 16:26:49
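
    A sketch of case (1) and the collective alternative recommended above; the variable names and sizes are illustrative assumptions, while upc_global_alloc, upc_all_alloc, and upc_free are the standard routines named in the comment.

    ```
    #include <upc.h>

    shared int *a;   /* private pointer-to-shared; only thread 0's copy is set */

    int main(void) {
        /* Case (1): non-collective allocation.  Only thread 0 calls it, yet in
         * an implementation like the one described above the other threads'
         * runtimes must take part, so those threads have to be making progress. */
        if (MYTHREAD == 0)
            a = (shared int *) upc_global_alloc(THREADS, sizeof(int));

        /* The collective form recommended above: every thread calls it and all
         * receive the same pointer, so no hidden cross-thread servicing is
         * required. */
        shared int *b = (shared int *) upc_all_alloc(THREADS, sizeof(int));

        upc_barrier;
        if (MYTHREAD == 0) {
            upc_free(a);   /* frees blocks with affinity to other threads; cf. case (2) */
            upc_free(b);
        }
        upc_barrier;
        return 0;
    }
    ```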

  4. Former user Account Deleted

    ```
    Tagged for the version 1.4 specification milestone.

    Although we may be able to reach consensus on the progress guarantees and the need (or lack thereof) for polling, I doubt that there is sufficient time to re-work implementations in the near term, for example if the decision is made to remove user-level polling requirements across the board.

    ```

    Reported by `gary.funck` on 2012-07-02 16:07:43 - Labels added: Milestone-Spec-1.4
