Deprecate support for static THREADS compilation
Originally reported on Google Code with ID 30
This proposal is responsive to issue 46.
46. Given that any program that compiles under the dynamic compilation environment
will also compile in the static environment, might it help to simplify the language
and compilation environment by removing support for the static threads compilation
environment?
UPC programmers have some difficulty in determining when the use of the static threads
compilation environment is required. Further, attempts to compile programs using the
dynamic threads compilation environment, when those programs use constructs that can
be compiled only under a static threads environment, can lead to situations where the
compiler emits seemingly obscure error messages referring to "variably modified"
types.
While these error messages can be improved, it isn't clear that the gains in expressivity
warrant the additional complexity of describing and supporting the static threads compilation
model. In addition, since the dynamic threads compilation model naturally supports
an arbitrary level of parallel thread execution at runtime, it promotes the development
of more flexible and scalable UPC applications.
Reported by `gary.funck` on 2012-05-22 00:05:02
Comments (31)
-
Account Deleted -
Account Deleted Static THREADS is the only means in the standard language for declaring an array with multiple THREADS dimensions, or otherwise with a size that can only be determined at run time (e.g. THREADS*THREADS). Similarly, it is the only way to declare an array with a block size of THREADS. If the language had some expected ability to handle these cases in the dynamic THREADS environment, I expect getting rid of the static THREADS environment would be more palatable.
Reported by `brian.wibecan` on 2012-05-22 21:14:12
-
Account Deleted It isn't too difficult to contemplate an implementation where shared array declarations are re-written into a pointer-to-shared variable (either literally or conceptually) that is initialized via a call to upc_all_alloc() with the appropriate parameters, prior to "main" being called. With this approach, or something similar, most restrictions currently imposed by the dynamic threads compilation environment are eliminated.
The trade-off is that some folding of array index calculations is not possible, but as Paul mentioned, a compiler can still implement static threads as an extension.
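The rewrite Gary describes can be sketched in plain C to show its shape. This is a single-process emulation under stated assumptions, not a UPC implementation: `rt_threads`, `A_data`, `startup_init` and `A_AT` are hypothetical names, and `calloc()` stands in for the collective `upc_all_alloc()` call a real translator would emit, so data distribution and affinity are ignored entirely.

```c
#include <stdlib.h>

/* Hypothetical stand-ins. In a real translator THREADS is supplied by the
 * UPC runtime and the storage comes from a collective upc_all_alloc() that
 * spreads it across threads; calloc() in one process only shows the shape. */
static size_t rt_threads;  /* value of THREADS, fixed when the job starts */
static int *A_data;        /* replaces: shared int A[THREADS][THREADS];   */

/* Startup code the translator would run before the user's main():
 * size and allocate every rewritten array now that THREADS is known. */
static int startup_init(size_t threads)
{
    rt_threads = threads;
    A_data = calloc(threads * threads, sizeof *A_data); /* zeroed, like static data */
    return A_data != NULL;
}

/* Each source-level reference A[i][j] becomes this row-major access. */
#define A_AT(i, j) (A_data[(size_t)(i) * rt_threads + (size_t)(j)])
```

After `startup_init(4)`, the assignment `A_AT(2, 3) = 7` writes `A_data[11]`. The cost is the trade-off Gary notes: the multiply by `rt_threads` can no longer be folded at compile time.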
Reported by `gary.funck` on 2012-05-22 21:27:39
-
Account Deleted Static threads has always seemed like a hack to me and I wouldn't mind seeing it go. However, the issue Brian raised is really important -- this is the only way currently to use arrays whose size isn't known at compile time.
Gary, I like the idea of an implementation that dynamically allocates the given array. Is this something that would be possible to prototype to increase our confidence before deprecating the static threads environment?
Reported by `james.dinan` on 2012-05-31 19:08:51
-
Account Deleted Jim wrote:
> Gary, I like the idea of an implementation that dynamically allocates the given array. Is this something that would be possible to prototype to increase our confidence before deprecating the static threads environment?
At least the Berkeley UPC runtime already uses (internally) a variation of the mechanism Gary described. We don't have a way (at least not in a portable source-to-source implementation) to get the back-end linker to do what we want. So, our translation changes all shared array references to pointers and dynamically allocates (before main) any shared variables (including shared scalars) which were statically allocated in the user's UPC code.
Don't ask about statically declared shared arrays with initializers!
Reported by `phhargrove@lbl.gov` on 2012-05-31 19:17:03
-
Account Deleted Given that there is some general support for this proposed change, and that it is out-of-scope for version 1.3 of the specification, I will designate this as a 1.4 issue.
Reported by `gary.funck` on 2012-07-02 15:50:36 - Labels added: Milestone-Spec-1.4
-
Account Deleted Tagged as "Enhancement".
Reported by `gary.funck` on 2012-07-02 16:08:39 - Labels added: Type-Enhancement - Labels removed: Type-Defect
-
Account Deleted "Static THREADS is the only means in the standard language for declaring an array with multiple THREADS dimensions, or otherwise with a size that can only be determined at run time (e.g. THREADS*THREADS)." This is false. There is clearly a way to declare a _dynamically allocated_ array with multiple THREADS dimensions. C programmers have figured out how to use dynamic memory allocation to implement multidimensional arrays since the dawn of time (C's time, that is), so why can't UPC programmers do the same?

Can someone give an example of something that can only be done with the static model? I mean one that is impossible to do with dynamic allocation. In any case, as Paul says, the compiler can (easily?) implement the static case via the dynamic case, so there is really no point to having this feature in the language spec.

Do any UPC compilers make use of the static THREADS model to optimize multidimensional array expressions beyond what they can do for the dynamic THREADS model?

I consider the static model to be profoundly useless since it would require me to compile my applications N times, where N is the maximum available number of UPC threads on a particular system. On Hopper, I would have to compile thousands of binaries just to handle the case of fixed threads-per-node. Why should UPC encourage this?
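The C idiom Jeff alludes to can be sketched in plain C99 (not UPC, and with no data distribution at all): a pointer to a variable-length array keeps ordinary `a[i][j]` indexing even though the dimension is only known at run time. `make_square` is a hypothetical helper, and the sketch assumes VLA support, which C11 made optional. Note that, as Dan points out later in the thread, such variably modified types cannot be used for objects with static storage duration, which is exactly where UPC shared arrays live.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocate an n x n int matrix as one block whose
 * dimension n is known only at run time, and fill a[i][j] = i*n + j.
 * The pointer-to-VLA type (a C99 "variably modified" type) keeps ordinary
 * a[i][j] indexing; the i*n + j arithmetic is generated by the compiler. */
static int *make_square(size_t n)
{
    int (*a)[n] = malloc(sizeof(int[n][n]));
    if (a == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            a[i][j] = (int)(i * n + j);
    return &a[0][0]; /* caller frees; element (i, j) is at index i*n + j */
}
```

A caller reads element (i, j) back as `p[i * n + j]` and releases the block with `free(p)`.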
Reported by `jeff.science` on 2012-10-14 15:25:30
-
Account Deleted I'm basically neutral on the issue of whether or not UPC should include a static THREADS translation environment, but I can answer some of the questions raised in comment 8:

"Can someone give an example of something that can only be done with the static model? I mean one that is impossible to do with dynamic allocation."

The static model allows additional shared array and pointer-to-shared declarations, where the compiler does more of the array indexing calculations "under the covers". All of it can be emulated under dynamic threads using dynamic allocation and manual indexing arithmetic, although it's potentially slower and definitely more error-prone. Jeff's argument is basically the same one that says C should not have multi-dimensional arrays at all, because everything can be (and ultimately is) implemented using linear memory and pointer arithmetic. The language feature exists to make certain code easier and less error-prone to write, not because it's impossible to live without it. UPC adds the additional wrinkle of array blocking, which complicates pointer arithmetic considerably - static threads are especially handy with non-trivial blocking factors, where the manual arithmetic can get rather "hairy". Consider this code:

```
shared [THREADS*16] struct {
    char   a;
    int    b[THREADS];
    char   c;
    double d[THREADS];
} myarray[THREADS*12][THREADS*3700];

myarray[i][j].d[k] = 42;
```

This example is admittedly contrived, but the point is that despite the complexity it remains concise, and leaves all the "heavy lifting" arithmetic to the compiler. Emulating an equivalent allocation and assignment under dynamic threads would be considerably uglier, and even a veteran UPC programmer is likely to get it wrong on the first try.

"Do any UPC compilers make use of the static THREADS model to optimize multidimensional array expressions beyond what they can do for the dynamic THREADS model?"

Absolutely.
Under static THREADS, the expression THREADS is a compile-time constant, and is therefore subject to the usual constant-folding optimizations that every optimizer performs. So for example, if you have code like:

```
shared [] int array[THREADS][THREADS][THREADS];
array[4][6][7] = 42;
```

even a very simple compiler can generate code that looks something like this (assuming THREADS==16, so the offset is 4*16*16 + 6*16 + 7 = 1127, with parray denoting the base address of array):

```
*(parray + 1127) = 42;
```

and since array is statically declared, with a smart enough linker this could be assembled down to a direct address store:

```
*(0x0ff00467) = 42;
```

which is about as simple as it can get. Of course you don't usually deal with fully constant array indexes, but it illustrates the possibilities. Under dynamic threads, the declaration is disallowed, but assuming the user wrote something basically equivalent like:

```
shared [] int *parr = upc_alloc(THREADS*THREADS*THREADS*sizeof(int));
*(parr + 4 * THREADS * THREADS + 6 * THREADS + 7) = 42;
```

Here THREADS is a runtime quantity, so basic compile-time constant folding optimizations do not apply. Assuming the optimizer is smart enough to realize THREADS is invariant (probably most UPC compilers), it could perform redundant subexpression elimination and possibly hoist some of those multiplies out of an enclosing loop; but since the value of THREADS is not known until runtime, the components of that calculation all still have to be evaluated during program execution. For simplicity this example assumes an indefinite block size, but once you introduce a constant blocking factor, that adds another source of compile-time constants which are amenable to folding with a statically-known THREADS value. Of course none of this answers the question of actual runtime performance impact, but it demonstrates the optimization possibilities.

"I consider the static model to be profoundly useless since it would require me to compile my applications N times, where N is the maximum available number of UPC threads on a particular system.
On Hopper, I would have to compile thousands of binaries just to handle the case of fixed threads-per-node. Why should UPC encourage this?"

Bill is probably the right person to answer this. However, my 2c: I believe the static THREADS model is not designed for "computer scientists" who like to run scaling experiments, i.e., repeated runs with varying thread count and data size, where they care about measured performance rather than the numerical answer. It's designed for "domain scientists" who have finished debugging their UPC program and have a particular numerical problem to solve and a particular machine allocation in which to do it, so they compile their program for that machine size and launch their "big run".
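The contrast Dan draws can be reproduced with a plain-C sketch (not UPC) of the two index calculations, written in Horner form for the same 4*THREADS*THREADS + 6*THREADS + 7 offset. `STATIC_THREADS`, `offset_static` and `offset_dynamic` are hypothetical names; whether a particular compiler actually folds the constant-index call down to 1127 depends on its optimizer.

```c
#include <stddef.h>

/* Plays the role of THREADS under static threads (e.g. a value fixed at
 * compile time). 16 matches the THREADS==16 assumption in the example. */
#define STATIC_THREADS 16

/* Row-major offset of array[i][j][k] in a THREADS x THREADS x THREADS
 * array with indefinite block size, i.e. one contiguous block. */
static inline size_t offset_static(size_t i, size_t j, size_t k)
{
    /* Every factor is a constant, so with constant i, j, k the whole
     * expression is a candidate for compile-time constant folding. */
    return (i * STATIC_THREADS + j) * STATIC_THREADS + k;
}

static inline size_t offset_dynamic(size_t threads, size_t i, size_t j, size_t k)
{
    /* threads is a run-time value: the multiplies remain in the generated
     * code, although an optimizer may hoist them out of loops. */
    return (i * threads + j) * threads + k;
}
```

With `STATIC_THREADS` at 16, `offset_static(4, 6, 7)` evaluates to 1127; `offset_dynamic(8, 4, 6, 7)` evaluates to 311, and that value can only be computed once the thread count is known.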
Reported by `danbonachea` on 2012-10-16 23:02:32
-
Account Deleted As Brian alludes to in comment 2, we should look at adding library functionality for performing pointer arithmetic with parametric blocking factor (issue 93) before considering removing static THREADS. This would insulate users from much of the pain of manual blocked pointer arithmetic. We might even consider extending it to perform multi-dimensional offset calculation.
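To make "pointer arithmetic with parametric blocking factor" concrete, here is a plain-C sketch of the calculation such a library routine would hide. It is hypothetical and is not the interface proposed in issue 93. Thread and phase follow the usual UPC rule of dealing blocks round-robin across threads. The local offset additionally assumes that each thread stores its own blocks contiguously, in order.

```c
#include <stddef.h>

/* Where one element of a blocked shared array lives. */
typedef struct {
    size_t thread; /* thread with affinity to the element      */
    size_t phase;  /* position of the element within its block */
    size_t local;  /* element offset within that thread's part */
} blocked_loc;

/* Hypothetical helper (not a UPC library function): locate element `index`
 * of a 1-D array declared conceptually as shared [blocksize] T a[N], with
 * blocks dealt round-robin to `threads` threads. blocksize and threads
 * must be nonzero. */
static blocked_loc blocked_locate(size_t index, size_t blocksize, size_t threads)
{
    size_t block = index / blocksize; /* global block number */
    blocked_loc loc;
    loc.phase  = index % blocksize;
    loc.thread = block % threads;
    loc.local  = (block / threads) * blocksize + loc.phase;
    return loc;
}
```

A multi-dimensional array reduces to this by first flattening the indices row-major. For example, for the `shared [THREADS] long correspondence[THREADS][THREADS]` declaration quoted later in this thread, element [i][j] has linear index i*THREADS + j and blocksize THREADS, so it lands on thread i at local offset j.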
Reported by `danbonachea` on 2012-10-16 23:42:06 - Blocked on: #93
-
Account Deleted In response to Dan's well-written comment #9 I wanted to take a moment to repeat something I said in comment #1:

> The alleged "optimization opportunities" (such as loop unrolling) that comes from
> a compile-time constant number of threads CAN REMAIN. Nothing in the changes I
> envision being made to the UPC spec would prohibit the compiler from taking the
> number of threads as a compiler option.

In other words, I envision removal of the "corners" of the type system which are only legal in a static-threads environment. I see no reason why the UPC spec should "outlaw" a compiler mode in which the user supplies the value of THREADS on the command line and the compiler gets to do all the wonderful constant folding, loop unrolling, etc. However, such a mode would no longer be a REQUIREMENT for compliance. The change I envision would reduce the set of legal type declarations in such a mode to match that of the static-threads environment.
Reported by `phhargrove@lbl.gov` on 2012-10-16 23:53:25
-
Account Deleted OOPS!! In the previous comment the final sentence says "static-threads" where I actually meant "dynamic-threads".
Reported by `phhargrove@lbl.gov` on 2012-10-16 23:55:21
-
Account Deleted I find this discussion interesting in light of the fact that at least a few users have asked me the opposite question, suggesting getting rid of the dynamic THREADS environment.
Reported by `brian.wibecan` on 2012-10-17 00:12:59
-
Account Deleted Regarding Dan's comments that pointer arithmetic is hard, yes I agree, but this is why we have libraries. If one wants distributed sparse matrices, there is e.g. PETSc. For distributed dense matrices in C, one can use Global Arrays. Writing DGEMM is also hard, but thankfully we've had BLAS for 40 years and vendors who kindly implement it using assembly so that it runs crazy fast compared to even intelligent loop code written by domain scientists. If pointer arithmetic in UPC is hard for usage X, then someone needs to step up and write a library for X. In fact, I plan to do exactly this for the case where X=tensors, which make 2D arrays look like a walk in the park. Anyways...
Reported by `jeff.science` on 2012-10-17 00:21:20
-
Account Deleted OKAY, Brian, I'll bite: What is the reason/motivation these users give for removing support for dynamic threads? Since the type system of dynamic is a proper subset of static, there is nothing much to "remove" from the language other than the implementation burden (which I doubt is the users' motivation).
Reported by `phhargrove@lbl.gov` on 2012-10-17 00:25:36
-
Account Deleted "Regarding Dan's comments that pointer arithmetic is hard, yes I agree, but this is why we have libraries" So to restate, you're advocating that we remove a standardized language feature which is already available in every compiler that hides this complexity and enables optimizations, with the plan to "put back" this syntactic convenience using a library? This seems backwards to me. I agree that good libraries are a crucial tool for application writers, but they aren't a replacement for good language syntax. Well-tuned libraries are a great way to factor code and improve performance, but they don't make for clearer or more concise code, especially in a C-based language which lacks objects and operator overloading. And they don't help at all for cases that fall outside the usage pattern envisioned by the library writer. Now I'm not arguing that static THREADS + C's poor support for multi-dimensional arrays constitutes "good" language support, but it's at least a step in the right direction.
Reported by `danbonachea` on 2012-10-17 00:45:03
-
Account Deleted Paul said: "I see no reason why the UPC spec should "outlaw" a compiler mode in which the user supplies the value of THREADS on the command line and the compiler gets to do all the wonderful constant folding, loop unrolling, etc.. " This idea is a valuable one and could be used to "get back" any optimization/performance loss from removing static threads - but it does nothing to address the usability issues. The user would still have to write his own error-prone address arithmetic (for cases not captured by a library), it would just run faster once he got it right.
Reported by `danbonachea` on 2012-10-17 00:50:01
-
Account Deleted "they don't make for clearer or more concise code" I meant to add "relative to built-in language support"
Reported by `danbonachea` on 2012-10-17 00:55:31
-
Account Deleted Dan wrote, in part:

> This idea is a valuable one and could be used to "get back" any
> optimization/performance loss from removing static threads - but it does nothing
> to address the usability issues. The user would still have to write his own
> error-prone address arithmetic (for cases not captured by a library), it would
> just run faster once he got it right.

I only meant to address the performance issue. I am not disagreeing with the usability point(s) raised in comment #9 or elsewhere.
Reported by `phhargrove@lbl.gov` on 2012-10-17 00:58:37
-
Account Deleted Regarding comment 16, static threads hides the complexity of distributed multidimensional arrays in the same way that FORTRAN77 hides the complexity of memory allocation by forcing the user to hard-code array sizes. I don't believe that this constitutes a useful language feature. My argument is that the static model baits programmers into creating inflexible code because they are penalized for writing flexible code. I don't necessarily want to deprecate the static model as much as I want the dynamic model to support the same things, which are clearly possible according to Paul's statements that BUPC implements the static-only features in terms of dynamic ones. Can the other implementors say whether anything prevents what Paul describes in comment 5 from being implemented in their compilers? If there is no major barrier to this, why not _add_ to the specification support for the dynamic model to do cool things like A[THREADS][THREADS] without restriction?
Reported by `jeff.science` on 2012-10-17 01:10:41
-
Account Deleted Paul asked: "What is the reason/motivation these users give for removing support for dynamic threads?" As I recall, they liked to use declarations like: shared [THREADS] long correspondence[THREADS][THREADS]; and others that were rejected by the compiler in dynamic THREADS mode, so they never used dynamic THREADS mode, and thought it was a generally useless mode that should go away to avoid confusion.
Reported by `brian.wibecan` on 2012-10-17 01:14:29
-
Account Deleted Regarding comment 21, the constructive response would have been to ask that the dynamic model support that usage. I still have not heard any compelling reason why implementations cannot provide that.
Reported by `jeff.science` on 2012-10-17 01:16:20
-
Account Deleted In comment #20 Jeff wrote, in part:

> If there is no major barrier to this, why not _add_ to the specification support
> for the dynamic model to do cool things like A[THREADS][THREADS] without
> restriction?

Jeff, if you are serious (and I think you are) about championing this idea, then it may be best for you to open a NEW tracker issue for your requested "enhancement" to the dynamic threads model. I'd suggest a 2.0 milestone.
Reported by `phhargrove@lbl.gov` on 2012-10-17 01:21:59
-
Account Deleted On the topic of users suggesting removal of *dynamic* threads, Brian wrote:

> ... so they never used dynamic THREADS mode, and thought it was a generally useless
> mode that should go away to avoid confusion.

I am surprised that these users couldn't see the benefit of allowing THREADS to be unknown at compile time. Did they also propose removal of malloc() from C because the need to call free() is error-prone? I can accept the reality that they *do* feel as they do, but (barring a traumatic brain injury) I won't be convinced to agree with them.
Reported by `phhargrove@lbl.gov` on 2012-10-17 01:27:08
-
Account Deleted "Regarding comment 21, the constructive response would have been to ask that the dynamic model support that usage. I still have not heard any compelling reason why implementations cannot provide that."

I believe there are two basic issues with relaxing the restrictions on shared types under dynamic threads.

1. The first issue deals with implementing the allocation of static shared data. UPC was designed to allow an implementation where the compiler can, at compile time, determine and allocate the shared space required on each thread for all statically-allocated objects. The dynamic threads restriction (described formally in issue 94, comment 11) ensures this calculation is always possible for shared arrays, so that statically-allocated shared arrays can be placed in the .bss linker section, just like every other statically-allocated object in C99. That being said, nothing REQUIRES this implementation approach, and as Paul and others pointed out, several compilers allocate the "static" shared data dynamically at startup anyhow - so those compilers could easily relax the allocation-related part of the restriction. I don't know how many of the major compilers fall into this category and how many rely upon the guarantee mentioned above, but changing an existing implementation from one strategy to the other probably represents a significant undertaking. That alone probably delays this feature enhancement to 2.0 or later.

2. The second, stickier issue is type-checking. Allowing the THREADS expression to appear in arbitrary places in array declarations under dynamic threads would mean allowing all those expressions to have a value which is not a compile-time constant. Various parts of type-checking for arrays in *C89* relied upon compile-time constant array dimensions, and at the time UPC was first specified many C implementations still shared that restriction. C99 relaxes this restriction somewhat with the variable-length array feature, whose implementation has now become more widespread and might be used to alleviate some of this issue. However, even in C99 it is not legal to declare a statically-allocated array with a variable length (because of point 1 above), so it would need to be extended somewhat to handle UPC shared arrays, which are always declared with a static lifetime (as opposed to stack variables, which are the target of C99's VLAs). One would also have to decide "how variable" we allow the dimension expressions to become: just otherwise-constant expressions that include THREADS? What about expressions like ceil(sqrt(THREADS))? What about arbitrary user-provided function calls? In any case, the problem also affects blocksizes; e.g., consider these declarations:

```
shared [THREADS] long a[1024];
shared [16] long *p = &a[0];
```

Is the second line valid? The type compatibility depends on whether THREADS==16, which is not known at compile time for dynamic threads (and therefore cannot be typechecked statically, which is fundamental to the C philosophy). In the past we've occasionally tossed around the possibility of adding language support for variable blocksizes and it's never really taken off; however, we would probably need a facility like that to support dynamic THREADS in a blocksize expression (without imposing a bunch of artificial limitations).

A minor (perhaps unimportant?) side effect of adding such a feature is that it would make it easy to write programs that fail with memory exhaustion BEFORE REACHING main(). For example, a declaration like this:

```
shared int x[THREADS*THREADS*THREADS*THREADS*THREADS];
```

would work fine when run with small thread counts, but at larger thread counts would quickly lead to a spawn-time error that cannot be diagnosed at compile or link time.
This should not be surprising (when written this obviously), but it is novel - in the current UPC/C99 language the linker can reject erroneous attempts to create ludicrously-sized static data. Users are accustomed to the possibility that upc_alloc() or malloc() might fail due to memory exhaustion, but this failure would happen at startup before reaching any user code (which might make it more difficult to diagnose, depending on implementation support).
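Both allocation points above can be checked with a little arithmetic. The plain-C sketch below uses hypothetical helper names and assumes round-robin block layout; it computes an upper bound on per-thread storage for a blocked shared array. For a declared size of the form N*THREADS, the bound works out to ceil(N/blocksize)*blocksize whatever THREADS is, which is why a compiler can size such arrays at compile time. For THREADS*THREADS*THREADS*THREADS*THREADS with the default block size of 1, the bound grows as THREADS^4 and is only known at launch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Upper bound on the elements any one thread must hold for a 1-D array
 * declared conceptually as shared [blocksize] T a[nelems], with blocks dealt
 * round-robin: thread 0 receives ceil(nblocks / threads) blocks. */
static uint64_t max_elems_per_thread(uint64_t nelems, uint64_t blocksize, uint64_t threads)
{
    uint64_t nblocks = (nelems + blocksize - 1) / blocksize;       /* ceil */
    uint64_t blocks_per_thread = (nblocks + threads - 1) / threads; /* ceil */
    return blocks_per_thread * blocksize;
}

/* base^exp, returning false instead of silently wrapping on overflow. */
static bool checked_pow(uint64_t base, unsigned exp, uint64_t *out)
{
    uint64_t r = 1;
    for (unsigned e = 0; e < exp; e++) {
        if (base != 0 && r > UINT64_MAX / base)
            return false;
        r *= base;
    }
    *out = r;
    return true;
}
```

With 16 threads, each thread holds 65,536 elements of the five-fold product. With 1,024 threads, that rises to 2^40 elements per thread. At 8,192 threads, the total element count no longer fits in 64 bits.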
Reported by `danbonachea` on 2012-10-17 02:07:17
-
Account Deleted Paul wrote: "I am surprised that these users couldn't see the benefit of allowing THREADS to be unknown at compile time." I'm not inclined to agree with them, either, but their concerns echo some of those expressed in this discussion: having two sets of rules is confusing. They found static THREADS to allow them to express declarations more easily, and so sought to end the confusion by removing the mode they didn't use. They also had a lot of jobs that compiled and ran the program in one fell swoop, rather than re-using the compiled program many times. For their work model, I can see why they might be inclined to prefer static THREADS. By the way, one other thing that is easier in static THREADS is a private array dimensioned to THREADS. To be clear, I do NOT advocate dropping support of either mode at this time.
Reported by `brian.wibecan` on 2012-10-17 02:08:52
-
Account Deleted I have taken Paul's advice and created http://code.google.com/p/upc-specification/issues/detail?id=99.
Reported by `jeff.science` on 2012-10-17 02:25:29
-
Account Deleted We appear to have low consensus regarding the original issue of whether to deprecate support for static threads compilation.
Reported by `danbonachea` on 2012-10-17 02:28:57 - Labels added: Consensus-Low
-
Account Deleted I'd be fine with UPC moving to a single compilation model in which THREADS is dynamic. Implementations would continue to be free to offer a compile-time option to fix THREADS to a certain value for better optimization, but doing so would not change any UPC rules that the program needs to follow.
Reported by `johnson.troy.a` on 2012-10-17 15:47:29
-
Account Deleted Regarding comment 29: this position seems to contradict what johnson....@gmail.com (sorry for not knowing who you are) says regarding issue 99. Am I not understanding what you are saying or was my proposal in issue 99 too broad? Sorry for being dense.
Reported by `jeff.science` on 2012-10-17 15:59:43
-
Account Deleted
> Regarding comment 29: this position seems to contradict what

I'm a Cray compiler implementer responsible for UPC, but code.google didn't seem to work well with my other email address. It's not a contradiction. I'm saying I support deprecating the static compilation model. The dynamic compilation model would remain, including all declarations that are currently legal within it. What would go away would be the declarations that currently are legal only in the static compilation model as well as any need to distinguish between compilation models in the spec.
Reported by `johnson.troy.a` on 2012-10-17 17:10:38
-
As an implementer I'd be happy to drop the static threads environment as a concept at the language specification level. By that I mean that anything the language says is only legal in a static threads environment would become illegal, and all references to static vs dynamic threads would be removed.
The alleged "optimization opportunities" (such as loop unrolling) that comes from a compile-time constant number of threads CAN REMAIN. Nothing in the changes I envision being made to the UPC spec would prohibit the compiler from taking the number of threads as a compiler option.
Of course, to be realistic about backward compatibility, I cannot actually endorse full removal for UPC 1.3. It is, I think, worthy of consideration for removal in a 2.0 specification. So, I would propose that a UPC 1.3 compliant compiler would WARN that static-threads is deprecated ("scheduled for removal in a future specification"?) when passed the thread-count argument AND it encounters a legal-only-in-static-thread-environment construct in the code. As an implementer I don't look forward to adding the warnings, but am expecting that I'll find all the proper places in the compiler by searching for the if-static-threads checks.
I would, of course, also encourage improvement in the error messages that result today when a static-only code is compiled in a dynamic-threads environment. (I say this knowing that BUPC is probably one of the worst offenders.)
Reported by `phhargrove@lbl.gov` on 2012-05-22 01:19:27