upc_all_(prefix_)reduce: behavior is under-specified for floating-point NaNs

Issue #96 new
Former user created an issue

Originally reported on Google Code with ID 96

Two of the ops defined for upc_all_(prefix_)reduce in UPC 1.2 are:

UPC_MIN For all data types, find the minimum value.
UPC_MAX For all data types, find the maximum value.

However, since min and max are not C operators, the specification does not currently
define what should happen when a floating-point MIN/MAX reduction encounters a NaN value
(and by C99 Sec. 4, behavior with no explicit definition is undefined behavior).

We could choose to specify this as implementation-defined, which is not much help to
users, but at least raises awareness of the issue.

In an ideal world the operation would mimic the behavior of fmin() and fmax() defined
in math.h, i.e.:

"NaN arguments are treated as missing data: if one argument is a NaN and the other
numeric, then the fmax functions choose the numeric value."

However, I don't know if it's reasonable to require this behavior of all implementations
(especially ones using any hardware support for reductions). I don't see any mention
of this issue in the MPI collectives spec. Does anyone know how this case is handled?

Reported by danbonachea on 2012-10-08 01:02:51

Comments (6)

  1. Former user Account Deleted

    ``` I agree with Dan that this current specification leaves the case of MIN/MAX ill-defined for NaN. In fact, I have changed the summary to be slightly broader because we also do not (and should not, IMHO) specify anything about where (on which thread, if any) a signaling NaN will raise an exception when another reduction operation is applied. [Dan, feel free to split the issue if you really don't like me lumping them together].

    I am happy enough with "cementing" the current state by explicitly making the case of NaN arguments either undefined or possibly implementation-defined (each implementation must document its behavior).

    We ```

    Reported by `phhargrove@lbl.gov` on 2012-10-08 22:31:25

  2. Former user Account Deleted

    ``` I agree with this. Implementation-defined or undefined are the right solutions.

    Regarding MPI (http://www.mcs.anl.gov/research/projects/mpi/www/www3/MPI_Reduce.html):

    Notes on collective operations

    "The reduction functions (MPI_Op) do not return an error value. As a result, if the functions detect an error, all they can do is either call MPI_Abort or silently skip the problem. Thus, if you change the error handler from MPI_ERRORS_ARE_FATAL to something else, for example, MPI_ERRORS_RETURN, then no error may be indicated."

    If this is not enough, I would be happy to query the MPI Forum. ```

    Reported by `jeff.science` on 2012-10-08 22:51:26

  3. Former user Account Deleted
    Could we not define UPC_MAX and UPC_MAX_IGNORE_NAN (likewise for MIN) to support both
    behaviors?  I would not want to get a finite value from a reduction that included 1.0/0
    in the inputs.
    

    Reported by jeff.science on 2012-10-11 01:49:49

  4. Former user Account Deleted
    "I would not want to get a finite value from a reduction that included 1.0/0 in the
    inputs."
    
    Actually, 1.0/0 is +Infinity, which is different from NaN. MAX and MIN are well-defined
    for infinities, so the implementation had better already handle them correctly. NaN
    represents a value that is truly missing or not a well-defined value, like sqrt(-2).
    Comparison operators (<, >, ==, etc.) on NaNs are meaningless, so the straightforward
    definition of MIN and MAX fails for NaN.
    
    The semantic question is whether the user wants a MIN/MAX reduction that encounters
    a NaN to ignore it (as you might want when performing a reduction where one thread has
    "no contribution") or to let it infect the result, so the whole reduction returns NaN
    (as you might want for debugging when the NaN is the result of a programming error or
    other numeric failure). Both behaviors are potentially useful in different situations.
    
    "Could we not define UPC_MAX and UPC_MAX_IGNORE_NAN (likewise for MIN) to support both
    behaviors?  "
    
    Adding new features to the 1.0 collectives doesn't make sense, since we're about to
    replace them anyhow. However, we should revisit this decision for the 2.0 collectives
    and consider providing programmer control for floating-point exception behavior.
    
    I think the best solution for 1.3 is to call it implementation-defined.
    

    Reported by danbonachea on 2012-10-11 02:43:57 - Status changed: Accepted - Labels added: Consensus-Medium - Labels removed: Consensus-Low

  5. Former user Account Deleted
    Change proposal mailed 10/29:
    
    
    --- upc-lib-collectives.tex     (revision 179)
    +++ upc-lib-collectives.tex     (working copy)
    @@ -651,6 +657,12 @@
     for $0 \leq$ {\tt i} $\leq$ {\tt nelems-1} and
     where ``$\oplus$'' is the operator specified by the variable {\tt op}.
    
    +\np \xadded[id=DB]{96}{
    +   If a floating-point variant of either function encounters an
    +   operand with a {\em NaN} value (as defined in [ISO/IEC00 Sec 5.2.4.2.2]),
    +   behavior is implementation-defined.
    +}
    +
     \np
     If the value of {\tt blk\_size} passed to these functions is
     greater than 0 then they treat the {\tt src} pointer
    

    Reported by danbonachea on 2012-10-30 01:35:54 - Status changed: PendingApproval

  6. Former user Account Deleted
    The proposed change was ratified at the 1/17/13 telecon and committed as SVN r201.
    

    Reported by danbonachea on 2013-01-17 19:53:00 - Status changed: Ratified
