Deprecate support for static THREADS compilation

Issue #30 new
Former user created an issue

Originally reported on Google Code with ID 30

This proposal is responsive to issue 46.

46. Given that any program that compiles under the dynamic compilation environment
will also compile in the static environment, might it help to simplify the language
and compilation environment by removing support for the static threads compilation
environment?

UPC programmers have some difficulty in determining when the use of the static threads
compilation environment is required.  Further, attempts to compile programs using the
dynamic threads compilation environment when those programs use constructs that can
be compiled only under a static threads environment can lead to situations where the
compiler emits some seemingly obscure error messages referring to "variably modified"
types.  

While these error messages can be improved, it isn't clear that the gains in expressivity
warrant the additional complexity of describing and supporting the static threads compilation
model.  In addition, since the dynamic threads compilation model naturally supports
an arbitrary level of parallel thread execution at runtime, it promotes the development
of more flexible and scalable UPC applications.

Reported by gary.funck on 2012-05-22 00:05:02

Comments (31)

  1. Former user Account Deleted

    ``` As an implementer I'd be happy to drop the static threads environment as a concept at the language specification level. By that I mean that anything the language says is only legal in a static threads environment would become illegal, and all references to static vs dynamic threads would be removed.

    The alleged "optimization opportunities" (such as loop unrolling) that come from a compile-time constant number of threads CAN REMAIN. Nothing in the changes I envision being made to the UPC spec would prohibit the compiler from taking the number of threads as a compiler option.

    Of course, to be realistic about backward compatibility, I cannot actually endorse full removal for UPC 1.3. It is, I think, worthy of consideration for removal in a 2.0 specification. So, I would propose that a UPC 1.3 compliant compiler would WARN that static-threads is deprecated ("scheduled for removal in a future specification"?) when passed the thread-count argument AND it encounters a legal-only-in-static-thread-environment construct in the code. As an implementer I don't look forward to adding the warnings, but am expecting that I'll find all the proper places in the compiler by searching for the if-static-threads checks.

    I would, of course, also encourage improvement in the error messages that result today when a static-only code is compiled in a dynamic-threads environment. (I say this knowing that BUPC is probably one of the worst offenders). ```

    Reported by `phhargrove@lbl.gov` on 2012-05-22 01:19:27

  2. Former user Account Deleted

    ``` Static THREADS is the only means in the standard language for declaring an array with multiple THREADS dimensions, or otherwise with a size that can only be determined at run time (e.g. THREADS*THREADS). Similarly, it is the only way to declare an array with a block size of THREADS. If the language had some expected ability to handle these cases in the dynamic THREADS environment, I expect getting rid of the static THREADS environment would be more palatable. ```

    Reported by `brian.wibecan` on 2012-05-22 21:14:12

  3. Former user Account Deleted

    ``` It isn't too difficult to contemplate an implementation where shared array declarations are re-written into a pointer-to-shared variable (either literally or conceptually) that is initialized via a call to upc_all_alloc() with the appropriate parameters, prior to "main" being called. With this approach, or something similar, most restrictions currently imposed by the dynamic threads compilation environment are eliminated.

    The trade-off is that some folding of array index calculations is not possible, but as Paul mentioned, a compiler can still implement static threads as an extension.

    ```

    Reported by `gary.funck` on 2012-05-22 21:27:39
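
Editorial sketch of the rewriting described in comment 3, written in plain C rather than UPC so that it compiles anywhere. Everything named here is an assumption made for illustration: `calloc` stands in for `upc_all_alloc()`, and `table`, `table_dim`, `startup_alloc_table` and `table_at` are hypothetical names, not part of any UPC implementation. It shows a declaration that conceptually reads `long table[THREADS][THREADS]` being turned into a pointer that a startup hook sizes once the thread count is known.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Conceptually: long table[THREADS][THREADS];
 * Rewritten form: a pointer that a startup hook fills in. */
static long *table = NULL;
static size_t table_dim = 0;

/* Hypothetical startup hook, run before user code once the thread
 * count is known. Returns 0 on success, -1 on failure.
 * calloc stands in for upc_all_alloc() in this plain-C sketch. */
int startup_alloc_table(size_t threads)
{
    assert(threads > 0);
    if (threads > SIZE_MAX / threads)
        return -1;                      /* threads * threads would overflow */
    free(table);                        /* drop any earlier allocation */
    table = calloc(threads * threads, sizeof *table);
    if (table == NULL) {
        table_dim = 0;
        return -1;
    }
    table_dim = threads;
    return 0;
}

/* table[i][j] becomes explicit index arithmetic on the pointer. */
long *table_at(size_t i, size_t j)
{
    assert(table != NULL && i < table_dim && j < table_dim);
    return &table[i * table_dim + j];
}
```

The cost mentioned in the comment is visible in `table_at`: the row stride is a run-time value, so the multiplication cannot be folded at compile time.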

  4. Former user Account Deleted

    ``` Static threads has always seemed like a hack to me and I wouldn't mind seeing it go. However, the issue Brian raised is really important -- this is the only way currently to use arrays whose size isn't known at compile time.

    Gary, I like the idea of an implementation that dynamically allocates the given array. Is this something that would be possible to prototype to increase our confidence before deprecating the static threads environment? ```

    Reported by `james.dinan` on 2012-05-31 19:08:51

  5. Former user Account Deleted

    ``` Jim wrote:

    Gary, I like the idea of an implementation that dynamically allocates the given array. Is this something that would be possible to prototype to increase our confidence before deprecating the static threads environment?

    At least the Berkeley UPC runtime already uses (internally) a variation of the mechanism Gary described. We don't have a way (at least not in a portable source-to-source implementation) to get the back-end linker to do what we want. So, our translation changes all shared array references to pointers and dynamically allocates (before main) any shared variables (including shared scalars) which were statically allocated in the user's UPC code.

    Don't ask about statically declared shared arrays with initializers! ```

    Reported by `phhargrove@lbl.gov` on 2012-05-31 19:17:03

  6. Former user Account Deleted

    ``` Given that there is some general support for this proposed change, and that it is out-of-scope for version 1.3 of the specification, will designate this as a 1.4 issue. ```

    Reported by `gary.funck` on 2012-07-02 15:50:36 - Labels added: Milestone-Spec-1.4

  7. Former user Account Deleted

    ``` Tagged as "Enhancement".

    ```

    Reported by `gary.funck` on 2012-07-02 16:08:39 - Labels added: Type-Enhancement - Labels removed: Type-Defect

  8. Former user Account Deleted
    "Static THREADS is the only means in the standard language for declaring an array with
    multiple THREADS dimensions, or otherwise with a size that can only be determined at
    run time (e.g. THREADS*THREADS)."
    
    This is false.  There is clearly a way to declare a _dynamically allocated_ array with
    multiple THREADS dimensions.  C programmers have figured out how to use dynamic memory
    allocation to implement multidimensional arrays since the dawn of time (C's time, that
    is), so why can't UPC programmers do the same?
    
    Can someone give an example of something that can only be done with the static model?
     I mean one that is impossible to do with dynamic allocation.
    
    In any case, as Paul says, the compiler can (easily?) implement the static case via
    the dynamic case, so there is really no point to having this feature in the language
    spec.  Do any UPC compilers make use of the static THREADS model to optimize multidimensional
    array expressions beyond what they can do for the dynamic THREADS model?
    
    I consider the static model to be profoundly useless since it would require me to compile
    my applications N times, where N is the maximum available number of UPC threads on
    a particular system.  On Hopper, I would have to compile thousands of binaries just
    to handle the case of fixed threads-per-node.  Why should UPC encourage this?
    

    Reported by jeff.science on 2012-10-14 15:25:30

  9. Former user Account Deleted
    I'm basically neutral on the issue of whether or not UPC should include a static THREADS
    translation environment, but I can answer some of the questions raised in comment 8:
    
    "Can someone give an example of something that can only be done with the static model?
     I mean one that is impossible to do with dynamic allocation."
    
    The static model allows additional shared array and pointer-to-shared declarations,
    where the compiler does more of the array indexing calculations "under the covers".
    All of it can be emulated under dynamic threads using dynamic allocation and manual
    indexing arithmetic, although it's potentially slower and definitely more error-prone.
    Jeff's argument is basically the same one that says C should not have multi-dimensional
    arrays at all, because everything can be (and ultimately is) implemented using linear
    memory and pointer arithmetic. The language feature exists to make certain code easier
    and less error-prone to write, not because it's impossible to live without it. UPC
    adds the additional wrinkle of array blocking, which complicates pointer arithmetic
    considerably - static threads are especially handy with non-trivial blocking factors,
    where the manual arithmetic can get rather "hairy". Consider this code:
    
    shared [THREADS*16] struct {
      char a;
      int b[THREADS];
      char c;
      double d[THREADS];
    } myarray[THREADS*12][THREADS*3700];
    
    myarray[i][j].d[k] = 42;
    
    This example is admittedly contrived, but the point is that despite the complexity
    it remains concise, and leaves all the "heavy lifting" arithmetic to the compiler.
    Emulating an equivalent allocation and assignment under dynamic threads would be considerably
    uglier, and even a veteran UPC programmer is likely to get it wrong on the first try.
    
    "Do any UPC compilers make use of the static THREADS model to optimize multidimensional
    array expressions beyond what they can do for the dynamic THREADS model?"
    
    Absolutely. Under static THREADS, the expression THREADS is a compile-time constant,
    and is therefore subject to the usual constant-folding optimizations that every optimizer
    performs. So for example, if you have code like:
    
    shared [] int array[THREADS][THREADS][THREADS];
    
    array[4][6][7] = 42;
    
    even a very simple compiler can generate code that looks something like (assuming THREADS==16):
    
    *(parray + 1127) = 42;
    
    and since array is statically declared, with a smart enough linker this could be assembled
    down to a direct address store:
    
    *(0x0ff00467) = 42;
    
    which is about as simple as it can get. Of course you don't usually deal with fully
    constant array indexes, but it illustrates the possibilities.
    
    Under dynamic threads, the declaration is disallowed, but assuming the user wrote something
    basically equivalent like:
    
    shared [] int *parr = upc_alloc(THREADS*THREADS*THREADS*sizeof(int));
    
    *(parr + 4 * THREADS * THREADS + 6 * THREADS + 7) = 42;
    
    Here THREADS is a runtime quantity, so basic compile-time constant folding optimizations
    do not apply. Assuming the optimizer is smart enough to realize THREADS is invariant
    (probably most UPC compilers), it could perform redundant subexpression elimination
    and possibly hoist some of those multiplies out of an enclosing loop; but since the
    value of THREADS is not known until runtime, the components of that calculation all
    still have to be evaluated during program execution.
    
    For simplicity this example assumes an indefinite block size, but once you introduce
    a constant blocking factor, that adds another source of compile-time constants which
    are amenable to folding with a statically-known THREADS value. Of course none of this
    answers the question of actual runtime performance impact, but it demonstrates the
    optimization possibilities.
    
    "I consider the static model to be profoundly useless since it would require me to
    compile my applications N times, where N is the maximum available number of UPC threads
    on a particular system.  On Hopper, I would have to compile thousands of binaries just
    to handle the case of fixed threads-per-node.  Why should UPC encourage this?"
    
    Bill is probably the right person to answer this. However my 2c: I believe the static
    THREADS model is not designed for "computer scientists" who like to run scaling experiments,
    ie repeated runs with varying thread count and data size, where they care about measured
    performance and not the numerical answer. It's designed for "domain scientists", who
    have finished debugging their UPC program and have a particular numerical problem to
    solve and a particular machine allocation in which to do it - so they compile their
    program for that machine size and launch their "big run".
    

    Reported by danbonachea on 2012-10-16 23:02:32
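
Editorial sketch of the index arithmetic in comment 9, in plain C rather than UPC. The names `flat_index3` and `FOLDED_4_6_7` are hypothetical and exist only for this illustration. The function is the row-major offset that has to be evaluated at run time when the dimension is not a constant; the enumerator is the same expression with the dimension fixed at 16, which C treats as an integer constant expression.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major offset of array[i][j][k] in an n x n x n array.
 * With a run-time n, both multiplications happen during execution. */
size_t flat_index3(size_t n, size_t i, size_t j, size_t k)
{
    assert(i < n && j < n && k < n);
    return (i * n + j) * n + k;
}

/* The same offset for array[4][6][7] with n fixed at 16.
 * This is an integer constant expression, so it is evaluated
 * by the compiler rather than at run time. */
enum { FOLDED_4_6_7 = (4 * 16 + 6) * 16 + 7 };
```

Both forms give 1127 for indices (4, 6, 7) with 16 threads, matching the `parray + 1127` offset quoted in the comment.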

  10. Former user Account Deleted
    As Brian alludes to in comment 2, we should look at adding library functionality for
    performing pointer arithmetic with parametric blocking factor (issue 93) before considering
    removing static THREADS. This would insulate users from much of the pain of manual
    blocked pointer arithmetic. We might even consider extending it to perform multi-dimensional
    offset calculation.
    

    Reported by danbonachea on 2012-10-16 23:42:06 - Blocked on: #93
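
Editorial sketch of the kind of blocked index arithmetic such a library would hide, in plain C. This is not the interface proposed in issue 93, whose details are not given in this thread; `blocked_pos` and `blocked_locate` are hypothetical names. The sketch assumes the usual UPC layout for a one-dimensional `shared [B]` array, where blocks of B elements are dealt out to threads round-robin.

```c
#include <assert.h>
#include <stddef.h>

/* Where element i of a block-cyclic array lives. */
typedef struct {
    size_t thread;  /* thread with affinity to the element */
    size_t phase;   /* position of the element within its block */
    size_t local;   /* element offset within that thread's portion */
} blocked_pos;

/* Locate element i given a run-time block size and thread count. */
blocked_pos blocked_locate(size_t i, size_t block, size_t threads)
{
    assert(block > 0 && threads > 0);
    blocked_pos p;
    size_t blk = i / block;             /* global block number */
    p.thread = blk % threads;           /* blocks are dealt round-robin */
    p.phase = i % block;
    p.local = (blk / threads) * block + p.phase;
    return p;
}
```

For example, with a block size of 4 and 3 threads, element 37 falls in global block 9, which is the fourth block held by thread 0, so it sits at local offset 13 with phase 1.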

  11. Former user Account Deleted
    In response to Dan's well-written comment #9 I wanted to take a moment to repeat something
    I said in comment #1:
    
    > The alleged "optimization opportunities" (such as loop unrolling) that come from
    > a compile-time constant number of threads CAN REMAIN.  Nothing in the changes I
    > envision being made to the UPC spec would prohibit the compiler from taking the
    > number of threads as a compiler option.
    
    In other words, I see removal of the "corners" of the type system which are only legal
    in a static-threads environment.  I see no reason why the UPC spec should "outlaw"
    a compiler mode in which the user supplies the value of THREADS on the command line
    and the compiler gets to do all the wonderful constant folding, loop unrolling, etc.
    However, such a mode would no longer be a REQUIREMENT for compliance.  The change
    I envision would reduce the set of legal type declarations in such a mode to match
    that of the static-threads environment.
    

    Reported by phhargrove@lbl.gov on 2012-10-16 23:53:25

  12. Former user Account Deleted
    OOPS!!
    In the previous comment the final sentence says "static-threads" where I actually meant
    "dynamic-threads".
    

    Reported by phhargrove@lbl.gov on 2012-10-16 23:55:21

  13. Former user Account Deleted
    I find this discussion interesting in light of the fact that at least a few users have
    asked me the opposite question, suggesting getting rid of the dynamic THREADS environment.
    

    Reported by brian.wibecan on 2012-10-17 00:12:59

  14. Former user Account Deleted
    Regarding Dan's comments that pointer arithmetic is hard, yes I agree, but this is why
    we have libraries.  If one wants distributed sparse matrices, there is e.g. PETSc.
     For distributed dense matrices in C, one can use Global Arrays.  Writing DGEMM is
    also hard, but thankfully we've had BLAS for 40 years and vendors who kindly implement
    it using assembly so that it runs crazy fast compared to even intelligent loop code
    written by domain scientists.
    
    If pointer arithmetic in UPC is hard for usage X, then someone needs to step up and
    write a library for X.  In fact, I plan to do exactly this for the case where X=tensors,
    which make 2D arrays look like a walk in the park.  Anyways...
    

    Reported by jeff.science on 2012-10-17 00:21:20

  15. Former user Account Deleted
    OKAY, Brian, I'll bite:
    
    What is the reason/motivation these users give for removing support for dynamic threads?
     Since the type system of dynamic is a proper subset of static, there is nothing much
    to "remove" from the language other than the implementation burden (which I doubt is
    the users' motivation).
    

    Reported by phhargrove@lbl.gov on 2012-10-17 00:25:36

  16. Former user Account Deleted
    "Regarding Dan's comments that pointer arithmetic is hard, yes I agree, but this is
    why we have libraries"
    
    So to restate, you're advocating that we remove a standardized language feature which
    is already available in every compiler that hides this complexity and enables optimizations,
    with the plan to  "put back" this syntactic convenience using a library? This seems
    backwards to me. 
    
    I agree that good libraries are a crucial tool for application writers, but they aren't
    a replacement for good language syntax. Well-tuned libraries are a great way to factor
    code and improve performance, but they don't make for clearer or more concise code,
    especially in a C-based language which lacks objects and operator overloading. And
    they don't help at all for cases that fall outside the usage pattern envisioned by
    the library writer. Now I'm not arguing that static THREADS + C's poor support for
    multi-dimensional arrays constitutes "good" language support, but it's at least a step
    in the right direction.
    

    Reported by danbonachea on 2012-10-17 00:45:03

  17. Former user Account Deleted
    Paul said:
    "I see no reason why the UPC spec should "outlaw" a compiler mode in which the user
    supplies the value of THREADS on the command line and the compiler gets to do all the
    wonderful constant folding, loop unrolling, etc.. "
    
    This idea is a valuable one and could be used to "get back" any optimization/performance
    loss from removing static threads - but it does nothing to address the usability issues.
    The user would still have to write his own error-prone address arithmetic (for cases
    not captured by a library), it would just run faster once he got it right.
    

    Reported by danbonachea on 2012-10-17 00:50:01

  18. Former user Account Deleted
    "they don't make for clearer or more concise code" 
    
    I meant to add "relative to built-in language support"
    

    Reported by danbonachea on 2012-10-17 00:55:31

  19. Former user Account Deleted
    Dan wrote, in part:
    > This idea is a valuable one and could be used to "get back" any
    > optimization/performance loss from removing static threads - but it does nothing
    > to address the usability issues. The user would still have to write his own
    > error-prone address arithmetic (for cases not captured by a library), it would
    > just run faster once he got it right.
    
    I only wanted to take issue with the performance issue.
    I am not disagreeing with the usability point(s) raised in comment #9 or elsewhere.
    

    Reported by phhargrove@lbl.gov on 2012-10-17 00:58:37

  20. Former user Account Deleted
    Regarding comment 16, static threads hides the complexity of distributed multidimensional
    arrays in the same way that FORTRAN77 hides the complexity of memory allocation by
    forcing the user to hard code that.  I don't believe that this constitutes a useful
    language feature.
    
    My argument is that the static model baits programmers into creating inflexible code
    because they are penalized for writing flexible code.  I don't necessarily want to deprecate
    the static model as much as I want the dynamic model to support the same things, as
    is clearly possible according to Paul's statements that BUPC implements the static-only
    features in terms of dynamic ones.
    
    Can the other implementors state that what Paul says in comment 5 cannot be implemented
    in other implementations?  If there is no major barrier to this, why not _add_ to the
    specification support for the dynamic model to do cool things like A[THREADS][THREADS]
    without restriction?
    

    Reported by jeff.science on 2012-10-17 01:10:41

  21. Former user Account Deleted
    Paul asked: "What is the reason/motivation these users give for removing support for
    dynamic threads?"
    
    As I recall, they liked to use declarations like:
    
      shared [THREADS] long correspondence[THREADS][THREADS];
    
    and others that were rejected by the compiler in dynamic THREADS mode, so they never
    used dynamic THREADS mode, and thought it was a generally useless mode that should
    go away to avoid confusion.
    

    Reported by brian.wibecan on 2012-10-17 01:14:29

  22. Former user Account Deleted
    Regarding comment 21, the constructive response would have been to ask that the dynamic
    model support that usage.  I still have not heard any compelling reason why implementations
    cannot provide that.
    

    Reported by jeff.science on 2012-10-17 01:16:20

  23. Former user Account Deleted
    In comment #20 Jeff wrote, in part:
    > If there is no major barrier to this, why not _add_ to the specification support
    > for the dynamic model to do cool things like A[THREADS][THREADS] without
    > restriction?
    
    Jeff,
    
    If you are serious (and I think you are) about championing this idea, then it may be
    best for you to open a NEW tracker issue for your requested "enhancement" to the dynamic
    threads model.  I'd suggest a 2.0 milestone.
    

    Reported by phhargrove@lbl.gov on 2012-10-17 01:21:59

  24. Former user Account Deleted
    On the topic of users suggesting removal of *dynamic* threads, Brian wrote:
    > ... so they never used dynamic THREADS mode, and thought it was a generally useless
    > mode that should go away to avoid confusion.
    
    I am surprised that these users couldn't see the benefit of allowing THREADS to be
    unknown at compile time.  Did they also propose removal of malloc() from C because
    the need to call free() is error-prone?  I can accept the reality that they *do* feel
    as they do, but (barring a traumatic brain injury) I won't be convinced to agree with
    them.
    

    Reported by phhargrove@lbl.gov on 2012-10-17 01:27:08

  25. Former user Account Deleted
    "Regarding comment 21, the constructive response would have been to ask that the dynamic
    model support that usage.  I still have not heard any compelling reason why implementations
    cannot provide that."
    
    I believe there are two basic issues with relaxing the restrictions on shared types
    under dynamic threads.
    
    1. The first issue deals with implementing the allocation of static shared data. UPC
    was designed to allow an implementation where the compiler can, at compile time, determine
    and allocate the shared space required on each thread for all statically-allocated
    objects. The dynamic threads restriction (described formally in issue 94, comment 11)
    ensures this calculation is always possible for shared arrays, so that statically-allocated
    shared arrays can be placed in the .bss linker section, just like every other statically-allocated
    object in C99.
    
    That being said, nothing REQUIRES this implementation approach, and as Paul and others
    pointed out several compilers allocate the "static" shared data dynamically at startup
    anyhow - so those compilers could easily relax the allocation-related part of the restriction.
    I don't know how many of the major compilers fall into this category and how many rely
    upon the guarantee mentioned above, but changing an existing implementation from one
    strategy to the other probably represents a significant undertaking. That alone probably
    delays this feature enhancement to 2.0 or later.
    
    2. The second, stickier issue is type-checking. Allowing the THREADS expression to
    appear in arbitrary places in array declarations under dynamic threads would mean allowing
    all those expressions to have a value which is not a compile-time constant. Various
    parts of type-checking for arrays in *C89* relied upon compile-time constant array
    dimensions, and at the time UPC was first specified many C implementations still shared
    that restriction. C99 relaxes this restriction somewhat with the variable-length array
    feature, whose implementation has now become more widespread and might be used to alleviate
    some of this issue. However even in C99 it is not legal to declare a statically-allocated
    array with a variable length (because of point 1 above), so it would need to be extended
    somewhat to handle UPC shared arrays which are always declared with a static lifetime
    (as opposed to stack variables, which are the target of C99's VLA). One would also
    have to decide "how variable" we allow the dimension expressions to become - ie just
    otherwise-constant expressions that include THREADS? What about expressions like ceil(sqrt(THREADS))?
    What about arbitrary user-provided function calls?
    
    In any case, the problem also affects blocksizes, eg consider this declaration:
    
    shared [THREADS] long a[1024];
    shared [16] long *p = &a;
    
    Is the second line valid? The type compatibility depends on whether THREADS==16, which
    is not known at compile time for dynamic threads (and therefore cannot be typechecked
    statically, which is fundamental to the C philosophy). In the past we've occasionally
    tossed around the possibility of adding language support for variable blocksizes and
    it's never really taken off, however we would probably need a facility like that to
    support dynamic THREADS in a blocksize expression (without imposing a bunch of artificial
    limitations).
    
    A minor (perhaps unimportant?) side effect of adding such a feature is it would make
    it easy to write programs that failed with memory exhaustion BEFORE REACHING main().
    For example a declaration like this:
    
    shared int x[THREADS*THREADS*THREADS*THREADS*THREADS];
    
    would work fine when run with small thread counts, but at larger thread counts would
    quickly lead to a spawn-time error that cannot be diagnosed at compile or link time.
    This should not be surprising (when written this obviously), but it is novel - in the
    current UPC/C99 language the linker can reject erroneous attempts to create ludicrously-sized
    static data. Users are accustomed to the possibility that upc_alloc() or malloc() might
    fail due to memory exhaustion, but this failure would happen at startup before reaching
    any user code (which might make it more difficult to diagnose, depending on implementation
    support).
    

    Reported by danbonachea on 2012-10-17 02:07:17
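
Editorial sketch of the size growth behind the memory-exhaustion concern in comment 25, in plain C. The function name `pow5_bytes` is hypothetical, and returning 0 for an unrepresentable size is a convention chosen for this sketch only. It computes the total footprint of an array with THREADS to the fifth power elements, which is the quantity a startup-time allocator would have to check once the thread count is known.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Total bytes for an array of threads^5 elements of elem_size bytes.
 * Returns 0 if the result does not fit in size_t. */
size_t pow5_bytes(size_t threads, size_t elem_size)
{
    assert(threads > 0 && elem_size > 0);
    size_t total = elem_size;
    for (int d = 0; d < 5; d++) {
        if (total > SIZE_MAX / threads)
            return 0;                   /* next multiply would overflow */
        total *= threads;
    }
    return total;
}
```

With 4-byte elements, 16 threads need 4194304 bytes (4 MiB) in total, while 1024 threads would need 2^52 bytes, which shows how quickly a declaration that works on a small run stops fitting.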

  26. Former user Account Deleted
    Paul wrote: "I am surprised that these users couldn't see the benefit of allowing THREADS
    to be unknown at compile time."
    
    I'm not inclined to agree with them, either, but their concerns echo some of those
    expressed in this discussion: having two sets of rules is confusing.  They found static
    THREADS to allow them to express declarations more easily, and so sought to end the
    confusion by removing the mode they didn't use.  They also had a lot of jobs that compiled
    and ran the program in one fell swoop, rather than re-using the compiled program many
    times.  For their work model, I can see why they might be inclined to prefer static
    THREADS.
    
    By the way, one other thing that is easier in static THREADS is a private array dimensioned
    to THREADS.
    
    To be clear, I do NOT advocate dropping support of either mode at this time.
    

    Reported by brian.wibecan on 2012-10-17 02:08:52

  27. Former user Account Deleted
    I have taken Paul's advice and created http://code.google.com/p/upc-specification/issues/detail?id=99.
    

    Reported by jeff.science on 2012-10-17 02:25:29

  28. Former user Account Deleted
    We appear to have low consensus regarding the original issue of whether to deprecate
    support for static threads compilation.
    

    Reported by danbonachea on 2012-10-17 02:28:57 - Labels added: Consensus-Low

  29. Former user Account Deleted
    I'd be fine with UPC moving to a single compilation model in which THREADS is dynamic.
     Implementations would continue to be free to offer a compile-time option to fix THREADS
    to a certain value for better optimization, but doing so would not change any UPC rules
    that the program needs to follow.
    

    Reported by johnson.troy.a on 2012-10-17 15:47:29
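
Editorial sketch of the single-model idea in comment 29, in plain C. The macro names `FIXED_THREADS` and `THREAD_COUNT` and the function `row_offset` are hypothetical; no UPC compiler is claimed to use them. The source follows one set of rules either way; building with something like `-DFIXED_THREADS=16` only gives the optimizer a constant to fold.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical build option: defining FIXED_THREADS pins the count
 * at compile time; otherwise the run-time value is used. */
#ifdef FIXED_THREADS
#define THREAD_COUNT(runtime) ((size_t)FIXED_THREADS)
#else
#define THREAD_COUNT(runtime) (runtime)
#endif

/* Offset of the first element of a row in an n-column table,
 * where n is the thread count. The code is identical in both builds. */
size_t row_offset(size_t runtime_threads, size_t row)
{
    (void)runtime_threads;              /* unused when FIXED_THREADS is set */
    size_t n = THREAD_COUNT(runtime_threads);
    assert(n > 0);
    return row * n;
}
```

Without the option, `row_offset(16, 4)` multiplies at run time and yields 64; with the option set to 16 the compiler is free to reduce the multiplication to a constant, and the result is the same.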

  30. Former user Account Deleted
    Regarding comment 29: this position seems to contradict what johnson....@gmail.com (sorry
    for not knowing who you are) says regarding issue 99.  Am I not understanding what
    you are saying or was my proposal in issue 99 too broad?  Sorry for being dense.
    

    Reported by jeff.science on 2012-10-17 15:59:43

  31. Former user Account Deleted
    > Regarding comment 29: this position seems to contradict what
    
    I'm a Cray compiler implementer responsible for UPC, but code.google didn't seem to
    work well with my other email address.
    
    It's not a contradiction.  I'm saying I support deprecating the static compilation
    model.  The dynamic compilation model would remain, including all declarations that
    are currently legal within it.  What would go away would be the declarations that currently
    are legal only in the static compilation model as well as any need to distinguish between
    compilation models in the spec.
    

    Reported by johnson.troy.a on 2012-10-17 17:10:38
