Relaxing the relaxed consistency same-address exception

Issue #6 new
Former user created an issue

Originally reported on Google Code with ID 6 ``` UPC 1.2 5.1.2.3.3 says that relaxed shared accesses issued by a given thread may appear to be arbitrarily reordered and different threads need not agree upon the order. "The only exception to the previous statement is that two relaxed accesses issued by a given thread to the same memory location where at least one is a write will always appear to all threads to have executed in program order."

The exception to the rule actually has a measurable performance impact. A UPC compiler generally does not know the target of a pointer-to-shared at compile time and this exception requires the compiler to be conservative. For any sequence of pointer accesses involving a write, such as:

  *p = 0;
  *q = 1;

The compiler must assume that p == q and ensure that the effects of these statements appear in program order to all threads.

Furthermore, any access near the end of a function must block until it is fully visible, even if that access is via an array. Consider:

a[i] = 0; return;

Unless there is full inlining, the compiler cannot tell whether there is another access to a[i] following the return but prior to the next fence.

There was a discussion with Bill Carlson a few years ago regarding why this exception was included in the spec. I recall him saying that it was to avoid surprising the user. The surprise would be that a thread executing the following code, for example:

a[i] = 0; a[i] = 1; upc_fence;

would not be able to rely on other threads seeing a specific value for a[i] following the fence; some threads may see 0, others may see 1. Cray has had the opposite experience with users. Some have not understood why the compiler was being conservative and have interpreted the "relaxed" model to be literally relaxed in every sense, including from the same thread to the same address. Generally we are not seeing users write to the same location multiple times between two fences. (When users do write multiple times, they are generally aware of it and are often using atomic operations.)

Cray has worked around the performance implications by adding a #pragma that can be applied to a statement to force any synchronization of communication performed by that statement to be deferred until the next fence. Continuing one of the above examples, Cray supports:

  #pragma pgas defer_sync
  a[i] = 0;
  return;

This initiates a Put to a[i] and does not wait for its completion before the return.

To eliminate the need for this directive, to provide more optimization opportunities, and to match what many users expect, we suggest revising the exception as follows:

"The only exception to the previous statement is that two relaxed accesses issued by a given thread to the same memory location with affinity to that same thread where at least one access is a write will always appear to all threads to have executed in program order."

Because nearly all implementations will handle local references immediately via processor loads and stores (for which the processor will ensure ordering from the same thread), this language will allow UPC compilers to be less conservative. ```
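
The difference between the two wordings can be sketched as a pair of predicates. This is only an abstract model in plain C (not UPC, so it builds with an ordinary C compiler); the type `access_t` and the functions `ordered_current` and `ordered_proposed` are hypothetical names introduced for illustration and are not part of the UPC specification or of any compiler.

```c
#include <stdbool.h>

/* Hypothetical model of one relaxed shared access (illustration only). */
typedef struct {
    int  issuer;    /* thread that issues the access */
    int  affinity;  /* thread the target location has affinity to */
    int  location;  /* abstract id of the memory location */
    bool is_write;  /* true for a write, false for a read */
} access_t;

/* UPC 1.2 wording: same issuing thread, same location,
 * and at least one of the two accesses is a write. */
bool ordered_current(access_t a, access_t b)
{
    return a.issuer == b.issuer &&
           a.location == b.location &&
           (a.is_write || b.is_write);
}

/* Proposed wording: as above, but only when the location
 * has affinity to the issuing thread. */
bool ordered_proposed(access_t a, access_t b)
{
    return ordered_current(a, b) && a.affinity == a.issuer;
}
```

Under this model, two writes by thread 0 to one location with affinity to thread 1 must appear in program order under the current wording but not under the proposed one, while two such writes to a location with affinity to thread 0 remain ordered under both.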

Reported by `johnson.troy.a` on 2012-03-13 19:48:31

Comments (3)

  1. Former user Account Deleted

    ``` My view on this is "complicated". On the one hand, I certainly recognize the performance implications here and have always advocated in favor of performance enhancing features in UPC. On the other hand, there are people who have been surprised by the following code:

    foo (shared int *x) {
      *x = 3;
      printf("%d", *x);
    }

    and not had "3" printed out. Such results can be explained, of course, but it would be nicer not to have to :)

    I think I like the proposed softening as it will lessen some surprise but allow good performance, but would like to hear other views.

    I'd also point out that if compilers generally did CSE on the above code, performance would improve and there would be less surprise!

    ```

    Reported by `wwc@uuuuc.us` on 2012-03-16 10:46:11

  2. Former user Account Deleted

    ``` My biggest concern with relaxing the relaxed memory model -- which accidentally monopolized our most recent telecon -- is that existing codes *may* see negative effects from this change that could be difficult to debug. I would probably be more in favor of adding a "super-relaxed" memory model for those who want the least overhead from runtime help and want to do all their own synchronization and fencing manually. In principle, this seems to me to be strictly an addition to the language, not a change, which is important from an "I don't want existing codes to start breaking" perspective. I think that a third reference type qualifier for this super-relaxed mode would allow developers to ease into this less protected and "more surprising" memory model. ```

    Reported by `nspark.work` on 2012-04-28 21:54:42

  3. Former user Account Deleted

    ``` Marking 2.0 and Performance. ```

    Reported by `johnson.troy.a` on 2012-06-15 18:04:56 - Labels added: Milestone-Spec-2.0, Performance
