Support na_ignore argument for more (all?) is_* functions

Issue #17 resolved
Former user created an issue

Statistical data is often incomplete and therefore riddled with NAs. So, in the spirit of Issue #8, I propose supporting the na_ignore argument in more (all?) is_* functions, e.g. to assert a variable is numeric or NA, character or NA, etc.

Comments (7)

  1. Richard Cotton repo owner

    I decided that it makes more sense for na_ignore to occur at the assert level rather than the is level. I've already provided support for this argument in more functions. It doesn't make sense everywhere: in your example

    assert a variable is numeric or NA

    this doesn't quite seem semantically right. The first thing is a check on the class of the variable, and the second thing is a check on missingness, so the assertive way of doing it is to keep these checks separate (which makes it easier to understand what is going on). Something like this:

    x %>%
      assert_is_numeric %>%
      assert_all_are_not_na
    

    If there are any functions in particular that you think would benefit from an na_ignore argument or different NA handling, give me an example in the comments.

  2. Luj Omu

    Having the na_ignore argument at the assert level makes perfect sense; I just assumed this would need to be implemented at the is level.

    Sorry for not making myself clear. I want to make sure a variable consists only of numbers OR only of NAs OR of both mixed. (In contrast to allowing only numbers BUT NO NAs.) This works for mixed vectors. However, the problem lies in vectors consisting solely of NAs, as their class will be 'logical'.

    Consider these five variables, all of which I deem valid for my function:

    vec1 <- 1;              assert_is_numeric( vec1) # Passes.
    vec2 <- c( 1, 2, 3);    assert_is_numeric( vec2) # Passes.
    vec3 <- c( 1, NA, 3);   assert_is_numeric( vec3) # Passes.
    vec4 <- c( NA, NA, NA); assert_is_numeric( vec4) # Error: vec4 is not of type 'numeric'; it has class 'logical'.
    vec5 <- NA;             assert_is_numeric( vec5) # Error: vec5 is not of type 'numeric'; it has class 'logical'.
    

    I guess I just want a concise version of the following:

    stopifnot( is.numeric( vec) || all( is.na( vec)))
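
    The one-liner above could be wrapped in a small helper so the intent is named at the call site. This is a minimal base-R sketch; assert_is_numeric_or_all_na is a hypothetical name and not a function in assertive:

    ```r
    # Hypothetical helper, not part of assertive: accept numeric input,
    # or input made up entirely of NAs (which R stores as logical by default).
    assert_is_numeric_or_all_na <- function(x) {
      if (!(is.numeric(x) || all(is.na(x)))) {
        stop(
          sprintf(
            "%s is neither numeric nor all NA; it has class '%s'.",
            deparse(substitute(x)), class(x)[1]
          ),
          call. = FALSE
        )
      }
      invisible(x)
    }

    assert_is_numeric_or_all_na(c(1, NA, 3))    # passes
    assert_is_numeric_or_all_na(c(NA, NA, NA))  # passes
    assert_is_numeric_or_all_na(NA)             # passes
    # assert_is_numeric_or_all_na(c(TRUE, NA))  # would stop with an error
    ```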
    
  3. Richard Cotton repo owner

    There are a few solutions to this already.

    The one that makes most sense to me, if your input is sometimes being passed as a logical vector but you want it to be numeric, is to coerce it to numeric.

    Either as.numeric(vec) or coerce_to(vec, "numeric") (the latter gives a warning).

    If you are happy with the input being either numeric or logical, then you could use assert_is_any_of(vec, c("numeric", "logical")).

    Or you can do more complicated things like

    if(all(is_na(vec)))
    {
      assert_is_any_of(vec, c("numeric", "logical"))
    } else
    {
      assert_is_numeric(vec)
    }
    

    I just want a concise version

    BTW, assertive isn't really designed to be concise. I've tried to optimize for having easily readable code, and informative error messages. This sometimes means more typing, but hopefully less thinking.

  4. Luj Omu

    First, let me say thanks for providing and maintaining this package and for taking the time to respond to me!

    to coerce it to numeric.

    I see where you are coming from, but I don't think unconditional casting is a good idea. It will circumvent the very thing I would like to check for. E.g. consider the vector c( 1, "2", NA): It will be cast to 'numeric' without error, although it does not fulfill the requirement of being 'numeric, with NAs allowed'.

    If you are happy with the input being either numeric or logical

    This would not catch the erroneous input of e.g. c( TRUE, FALSE, NA), which is of type 'logical', but will be coerced (implicitly) to 'numeric' the moment a mathematical operation is performed on it: c( TRUE, FALSE, NA) + 1 # c( 2, 1, NA).
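
    Both pitfalls can be reproduced in base R. A short sketch (the variable names vec and flags are only for illustration):

    ```r
    # Unconditional coercion hides a character vector.
    vec <- c(1, "2", NA)
    class(vec)        # "character"
    is.numeric(vec)   # FALSE
    as.numeric(vec)   # 1 2 NA, no warning

    # A logical vector is silently promoted in arithmetic.
    flags <- c(TRUE, FALSE, NA)
    class(flags)      # "logical"
    flags + 1         # 2 1 NA
    ```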

    I've tried to optimize for having easily readable code

    Yes, I appreciate that, and that's my foremost concern, too (after the code working, of course). And that's actually what I meant when I misused the word "concise".

    I think my "problem" is that I don't perceive NA as its own type ... and really it isn't. As I understand it, 'logical' is just the most "narrow" type that allows for NAs. So that is used as long as there are only NAs in a vector.
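
    A quick base-R check of this behaviour, as a sketch: a bare NA is logical, it takes on the type of whatever it is combined with, and typed constants such as NA_real_ also exist.

    ```r
    class(NA)              # "logical"
    class(c(NA, NA, NA))   # "logical"
    class(c(1, NA, 3))     # "numeric"
    class(NA_real_)        # "numeric"
    class(NA_character_)   # "character"
    is.numeric(NA)         # FALSE
    is.numeric(NA_real_)   # TRUE
    ```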

    So for the sake of readability and clarity, I personally think the 'na_ignore' parameter would make sense for any type check:

    assert_is_numeric( vec, na_ignore = TRUE)
    

    That said, I think I will use the following for now:

    if( all( is.na( vec)))
      vec <- as.numeric( vec)
    assert_is_numeric( vec)
    
  5. Richard Cotton repo owner

    Problem seems highly specific; having a "check the type unless it is all NAs" isn't worth implementing for general use.

  6. Luj Omu

    Decision accepted.

    Just let me make one last point to show you that this situation is not as unusual as it may first seem. Let's say you have the following function where you want to restrict the argument type to 'numeric'.

      plusOne <- function( num) {
        assert_is_numeric( num)
        num + 1
      }
      plusOne( 1)             # 2
      plusOne( c( 1, 2, 3))   # c( 2, 3, 4)
      plusOne( c( 1, NA, 3))  # c( 2, NA, 4)
      plusOne( NA)            # Error
    

    From a purely logical perspective, if you have a function that operates

    • on a single numerical value,
    • and on a numerical vector,
    • and on a numerical vector containing NAs,

    what justification is there that a single NA should throw an error? (Unless of course you deliberately deny that special case using e.g. assert_all_are_not_na.)

    For comparison, you would not expect sum( NA) to produce an error, while sum( c( 1, NA, 3)) returns a result.

  7. Richard Cotton repo owner

    I still think that it's a bit weird behaviour to accept either numbers or a logical vector of NAs.

    If your function is going to be used directly by other people, then you should be helpful and either support arithmetic with logical vectors (treating them as 0/1), or coerce the input to numeric, explaining to the user that you are doing so.

    If your function is at a lower level (not intended to be user-callable), or is somewhere where you need to be stricter with types (maybe R is the backend to some application), then you should just have assert_is_numeric and document that the caller needs to write plusOne(NA_real_).
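
    The two options could look roughly like this. It is a base-R sketch only: plusOneLenient and plusOneStrict are hypothetical names, and stopifnot(is.numeric(num)) stands in for assert_is_numeric(num) so that the example runs without assertive installed.

    ```r
    # User-facing variant: coerce logical input and tell the caller about it.
    plusOneLenient <- function(num) {
      if (is.logical(num)) {
        message("Coercing logical input to numeric.")
        num <- as.numeric(num)
      }
      stopifnot(is.numeric(num))  # stand-in for assert_is_numeric(num)
      num + 1
    }

    # Strict variant: numeric only; callers pass NA_real_ for a missing value.
    plusOneStrict <- function(num) {
      stopifnot(is.numeric(num))  # stand-in for assert_is_numeric(num)
      num + 1
    }

    plusOneLenient(NA)        # NA, with a message
    plusOneLenient(TRUE)      # 2, with a message
    plusOneStrict(NA_real_)   # NA
    # plusOneStrict(NA)       # would stop with an error
    ```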
