require all array declarations that work in the static model to work in the dynamic model

Issue #99 new
Former user created an issue

Originally reported on Google Code with ID 99

This discussion is a continuation of http://code.google.com/p/upc-specification/issues/detail?id=30
that focuses on the positive solution to the discrepancy between the static and dynamic
threads model w.r.t. array declarations.

Dan has provided numerous examples of what works only in the static model, so I'll just
name the following as my motivating example:

shared int A[THREADS][THREADS];

This should work in both the static and dynamic models.  Paul Hargrove has indicated
that BUPC can do this and no implementer has given any evidence that other UPC compilers
can't do this as well.

Reported by jeff.science on 2012-10-17 02:25:03

Comments (6)

  1. Former user Account Deleted
    "no implementer has given any evidence that other UPC compilers can't do this as well"
    
    Copying my response from issue 30:
    
    I believe there are two basic issues with relaxing the restrictions on shared types
    under dynamic threads.
    
    1. The first issue deals with implementing the allocation of static shared data. UPC
    was designed to allow an implementation where the compiler can, at compile time, determine
    and allocate the shared space required on each thread for all statically-allocated
    objects. The dynamic threads restriction (described formally in issue 94, comment 11)
    ensures this calculation is always possible for shared arrays, so that statically-allocated
    shared arrays can be placed in the .bss linker section, just like every other statically-allocated
    object in C99.
    
    That being said, nothing REQUIRES this implementation approach, and as Paul and others
    pointed out several compilers allocate the "static" shared data dynamically at startup
    anyhow - so those compilers could easily relax the allocation-related part of the restriction.
    I don't know how many of the major compilers fall into this category and how many rely
    upon the guarantee mentioned above, but changing an existing implementation from one
    strategy to the other probably represents a significant undertaking. That alone probably
    delays this feature enhancement to 2.0 or later.
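
    As a rough illustration of the calculation in point 1, here is a minimal sketch in
    plain C (not UPC). It assumes the usual round-robin distribution of blocks across
    threads; the names ceil_div and per_thread_elems are made up for this example. With
    a single THREADS factor in the extent, the per-thread bound is the same for every
    thread count, whereas an extent of THREADS*THREADS leaves a bound that grows with
    THREADS and so cannot be fixed at compile time.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Ceiling division for non-negative integers; d must be non-zero. */
    size_t ceil_div(size_t n, size_t d)
    {
        assert(d != 0);
        return (n + d - 1) / d;
    }

    /*
     * Upper bound on the number of elements any one thread holds for a shared
     * array of `total` elements, dealt out round-robin in blocks of `block`
     * elements across `threads` threads (assumed layout, see lead-in).
     */
    size_t per_thread_elems(size_t total, size_t block, size_t threads)
    {
        size_t blocks = ceil_div(total, block);   /* blocks in the whole array */
        return ceil_div(blocks, threads) * block; /* most blocks on one thread */
    }
    ```

    For instance, per_thread_elems(100*T, 1, T) is 100 for both T=4 and T=64, while
    per_thread_elems(T*T, 1, T) is T.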
    
    2. The second, stickier issue is type-checking. Allowing the THREADS expression to
    appear in arbitrary places in array declarations under dynamic threads would mean allowing
    all those expressions to have a value which is not a compile-time constant. Various
    parts of type-checking for arrays in *C89* relied upon compile-time constant array
    dimensions, and at the time UPC was first specified many C implementations still shared
    that restriction. C99 relaxes this restriction somewhat with the variable-length array
    feature, whose implementation has now become more widespread and might be used to alleviate
    some of this issue. However even in C99 it is not legal to declare a statically-allocated
    array with a variable length (because of point 1 above), so it would need to be extended
    somewhat to handle UPC shared arrays which are always declared with a static lifetime
    (as opposed to stack variables, which are the target of C99's VLA). One would also
    have to decide "how variable" we allow the dimension expressions to become - i.e. just
    otherwise-constant expressions that include THREADS? What about expressions like ceil(sqrt(THREADS))?
    What about arbitrary user-provided function calls?
    
    In any case, the problem also affects blocksizes, e.g. consider this declaration:
    
    shared [THREADS] long a[1024];
    shared [16] long *p = &a;
    
    Is the second line valid? The type compatibility depends on whether THREADS==16, which
    is not known at compile time for dynamic threads (and therefore cannot be typechecked
    statically, which is fundamental to the C philosophy). In the past we've occasionally
    tossed around the possibility of adding language support for variable blocksizes and
    it's never really taken off; however, we would probably need a facility like that to
    support dynamic THREADS in a blocksize expression (without imposing a bunch of artificial
    limitations).
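
    To make the blocksize question concrete, here is a small sketch in plain C (not UPC)
    that models only affinity, i.e. which thread owns each element, and ignores phase.
    It assumes the usual round-robin block layout; owner and same_affinity are
    hypothetical helper names. For the 1024-element array above, a block size of THREADS
    and a block size of 16 give the same placement when THREADS is 16 but differ for
    thread counts such as 8 or 32, which is exactly what a compiler cannot know under
    dynamic threads.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Thread with affinity to element i of an array with block size `block`. */
    size_t owner(size_t i, size_t block, size_t threads)
    {
        assert(block > 0 && threads > 0);
        return (i / block) % threads;
    }

    /* Returns 1 if block sizes b1 and b2 place all n elements on the same threads. */
    int same_affinity(size_t n, size_t b1, size_t b2, size_t threads)
    {
        for (size_t i = 0; i < n; i++)
            if (owner(i, b1, threads) != owner(i, b2, threads))
                return 0;
        return 1;
    }
    ```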
    
    A minor (perhaps unimportant?) side effect of adding such a feature is it would make
    it easy to write programs that failed with memory exhaustion BEFORE REACHING main().
    For example a declaration like this:
    
    shared int x[THREADS*THREADS*THREADS*THREADS*THREADS];
    
    would work fine when run with small thread counts, but at larger thread counts would
    quickly lead to a spawn-time error that cannot be diagnosed at compile or link time.
    This should not be surprising (when written this obviously), but it is novel - in the
    current UPC/C99 language the linker can reject erroneous attempts to create ludicrously-sized
    static data. Users are accustomed to the possibility that upc_alloc() or malloc() might
    fail due to memory exhaustion, but this failure would happen at startup before reaching
    any user code (which might make it more difficult to diagnose, depending on implementation
    support).
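
    A back-of-the-envelope version of that growth, sketched in plain C: with the default
    block size of 1, THREADS^5 elements spread over THREADS threads leave THREADS^4
    elements on each thread. The function name per_thread_bytes and the saturating
    overflow handling are choices made for this example only.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /*
     * Bytes each thread holds for an array of THREADS^5 elements of `elem_size`
     * bytes, assuming block size 1 (THREADS^4 elements per thread).
     * Returns UINT64_MAX if the result does not fit in 64 bits.
     */
    uint64_t per_thread_bytes(uint64_t threads, uint64_t elem_size)
    {
        assert(threads > 0 && elem_size > 0);
        uint64_t bytes = elem_size;
        for (int i = 0; i < 4; i++) {
            if (bytes > UINT64_MAX / threads)
                return UINT64_MAX;   /* saturate instead of wrapping around */
            bytes *= threads;
        }
        return bytes;
    }
    ```

    With a 4-byte int this is 1 KiB per thread at 4 threads but 4 TiB per thread at 1024
    threads.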
    

    Reported by danbonachea on 2012-10-17 02:26:53

  2. Former user Account Deleted
    I oppose this relaxation.  The Cray compiler is an example of a UPC compiler that allocates
    static UPC data statically in the data segment.  Both this static allocation and the
    representation of array extents would be hindered by the need to support things like
    shared int x[THREADS*THREADS] or shared int x[THREADS][THREADS].  Detection of local
    references becomes more complicated when the compiler cannot rely on there being a
    single THREADS dimension.  Finally, compilers that allow all of their usual C optimizations
    to apply to (loops making) UPC array references may need to modify the "non-UPC" parts
    of their compiler to understand this new kind of array extent.
    
    The mention of C99 VLAs is an interesting comparison, but I think that we ought not
    to use it as a model for any UPC feature.  VLAs are very controversial and are one
    of the C features that never made it into C++.  In a compiler that supports C, UPC,
    and C++, the C99 VLAs are already a bit of a one-off oddity/nuisance.
    

    Reported by johnson.troy.a on 2012-10-17 15:22:17

  3. Former user Account Deleted
    Regarding comment 1: I think having the dimensions being a simple polynomial of THREADS
    is sufficient.  I don't know how to formalize this, but how about this?
    
    dim = \sum_{i=1}^{n} c_i * THREADS^i, for any finite n less than some implementation-defined
    and specified value that is greater than or equal to 12 but hopefully much larger;
    each c_i is an integer constant.
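
    Reading the proposal as a polynomial with one integer coefficient per power of
    THREADS, the extent could be evaluated once at startup. The following is only a
    sketch in plain C using Horner's rule; poly_dim and the two coefficient tables are
    hypothetical names, and it does not check for overflow.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /*
     * Evaluate dim = c[0]*T + c[1]*T^2 + ... + c[n-1]*T^n by Horner's rule,
     * where T is the thread count. There is no constant term, matching a sum
     * that starts at the first power. No overflow checking.
     */
    long poly_dim(const long *c, size_t n, long threads)
    {
        assert(c != NULL && threads > 0);
        long dim = 0;
        for (size_t i = n; i > 0; i--)
            dim = (dim + c[i - 1]) * threads;
        return dim;
    }

    /* Example coefficient tables (illustrative only). */
    const long SQUARE_COEFFS[] = {0, 1};   /* THREADS*THREADS */
    const long MIXED_COEFFS[]  = {3, 2};   /* 3*THREADS + 2*THREADS*THREADS */
    ```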
    
    Regarding comment 2: So this sounds like the static model will provide better performance,
    which I think it should.  Clearly, more information enables greater optimization. 
    It sounds like the argument against this is that it requires work to implement.
    I would imagine any new feature has this property.  Paul suggested this feature target
    UPC 2.0, which seems like it might have other additions that would require additional
    implementation effort.  Should the language not be allowed to progress because it requires
    work to implement?  I imagine Cray's C compiler is going to need a lot of work to support
    C11 as well.
    
    I understand that this feature may require more work for implementers.  However, I
    think it has tremendous value for the user community outside of the power users who
    build their application once for a particular piece of hardware and rarely modify it
    again.  
    
    I believe that this feature will make it significantly easier to attract new users,
    particularly students who read the UPC book or look at the tutorials.  Every example
    of multidimensional arrays I've found in those resources does not work with the dynamic
    model.  This is incredibly discouraging to new users and makes UPC feel like a domain-specific
    language.
    
    Furthermore, it is effectively impossible to write a general purpose science code that
    uses multidimensional arrays because, as noted by others, this requires all sorts of
    error-prone pointer arithmetic that e.g. quantum chemists are not highly skilled at.
     At the very least, it has to wait until someone like me writes a full-service replacement
    to GA in UPC to do all of that for them, but as anyone can see from the general behavior
    of the HPC community, the existence of a good library for X does not reduce the probability
    that a domain scientist will try to re-implement X anyways.  Forcing users to adopt
    libraries because the language lacks features seems like an unfriendly model.
    

    Reported by jeff.science on 2012-10-17 15:56:57

  4. Former user Account Deleted
    My argument is not "this feature would require work, therefore it is bad."  The argument
    is, as Dan has also pointed out, that static allocation of an array with more than
    one THREADS dimension cannot be implemented, no matter how much work is put into it.
     Adding this feature effectively prohibits a UPC compiler from locating such an array
    in the static data segment.  Therefore, the compiler would need to use dynamic allocation
    (an under-the-covers upc_all_alloc), which returns a pointer, but ideally still optimize
    references through that pointer as well as if they were referencing through an array
    in the static data segment.  This is the part that _could_ be done, but I'm not convinced
    that we _should_.
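
    A rough picture of what "dynamic allocation under the covers" means, sketched in
    plain C rather than UPC: calloc stands in for the hidden upc_all_alloc call, and
    alloc_square and elem are hypothetical names. Because the extent is only known at
    startup, every access goes through a pointer plus hand-computed index arithmetic
    instead of a fixed address in the data segment.

    ```c
    #include <assert.h>
    #include <stdlib.h>

    /*
     * Hypothetical startup hook: allocate a zero-filled threads x threads array
     * of int. Returns NULL if the allocation fails.
     */
    int *alloc_square(size_t threads)
    {
        assert(threads > 0);
        return calloc(threads * threads, sizeof(int));
    }

    /* Address of element [i][j]; the row length is a run-time value. */
    int *elem(int *base, size_t threads, size_t i, size_t j)
    {
        assert(base != NULL && i < threads && j < threads);
        return base + i * threads + j;
    }
    ```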
    
    There is another issue open for providing a library routine to help with UPC pointer
    arithmetic.  Would that be sufficient to at least lower the difficulty of using UPC
    multidimensional arrays to the level of using C multidimensional arrays?  I guess I'm
    having trouble seeing how we could add better support for multidimensional arrays to
    UPC when it doesn't already exist in C and that's the base language.  If we were successful,
    you'd have C programmers wanting to declare their multidimensional arrays as shared
    simply to take advantage of some nicer way of dealing with them.
    

    Reported by johnson.troy.a on 2012-10-17 16:41:47

  5. Former user Account Deleted
    "All array declarations that work in the static model" includes these:
    
    int foo[THREADS+3]; // Not shared
    struct {
      char names[THREADS][20]; // Structure field
      int flags;
    } tdata[THREADS][THREADS]; // Also not shared
    

    Reported by brian.wibecan on 2012-10-17 19:10:52

  6. Former user Account Deleted
    Thanks, Brian.  I was way too general in my issue proposal.  I will make a much more
    restrictive proposal shortly once I capture more of the semantics in my head.
    

    Reported by jeff.science on 2012-10-17 22:26:47
