Bug when trying to unfactor a 44-digit value stored in a factor

Issue #6 new
Former user created an issue

Try to unfactor the column in the attached RDS file:

> x <- readRDS(file = "bug_unfactor_example.rds")
> x[3]
[1] 13191203519135000156550020000312221580784296
Levels: 13191203519135000156550020000312221580784296
> unfactor(x[3])
[1] 13191203519135001231215059440580646550896640

Comments (2)

  1. Mehrad Mahmoudian repo owner

    Thank you for this very interesting report. This is the type of report I always appreciate (a clear explanation and a reproducible example).

    After 10 minutes of searching, it became apparent that this is technically not a bug but a known limitation of the 64-bit double-precision format that R uses to store numbers. See the following links for a detailed explanation (the first one is more straightforward, imho):

    1. https://stackoverflow.com/a/52718903/1613005
    2. http://www.win-vector.com/blog/2015/06/r-in-a-64-bit-world/

    As you can also see in your example, everything is correct up to the 16th significant digit (counting from the left). This is because R stores numbers as IEEE 754 double-precision values, whose 53-bit significand holds only about 15 to 17 significant decimal digits. This limit is the same on 32-bit and 64-bit builds of R.
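    A minimal base-R sketch of what happens to the reported value. The string is copied from the issue; `s`, `v` and `exact` are illustrative names. `sprintf("%.0f", v)` prints every integer digit of the stored double, and the exact trailing digits it shows may depend on the platform's C library, so only the mismatch is claimed here:

```r
# The reported factor level, kept as a string
s <- "13191203519135000156550020000312221580784296"

# as.numeric() stores the nearest IEEE 754 double, not the exact integer
v <- as.numeric(s)

# Print all integer digits of that double
exact <- sprintf("%.0f", v)
exact == s      # FALSE: the trailing digits have changed
nchar(exact)    # 44: the order of magnitude is still right

# The same limit in miniature: above 2^53, not every integer is representable
sprintf("%.0f", 2^53)   # "9007199254740992"
2^53 + 1 == 2^53        # TRUE: 2^53 + 1 rounds back to 2^53
```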

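    You can check the size of the C long double type on your machine (note, however, that R's numeric vectors are always stored as 8-byte doubles, so this value does not change the precision seen above):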

    .Machine$sizeof.longdouble
    

    which on my machine returns:

    [1] 16

    I also encourage you (and everyone who, like me, finds this challenging and fun) to read the help page for .Machine:

    ?.Machine
    

    Especially this part:

    sizeof.longdouble: the number of bytes in a C ‘long double’ type. Will
    be zero if there is no such type (or its use was disabled
    when R was built), otherwise possibly ‘12’ (most 32-bit
    builds) or ‘16’ (most 64-bit builds).
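    The same help page also lists the fields that describe R's double type directly. A short sketch querying them; the commented values are what IEEE 754 platforms report, while sizeof.longdouble varies by build as the excerpt above says:

```r
# Bits in the significand of a double: 53 on IEEE 754 platforms
.Machine$double.digits

# Smallest x such that 1 + x != 1, i.e. 2^-52 (printed as 2.220446e-16)
.Machine$double.eps

# Bytes in a C long double; R numeric vectors are not stored in this type
.Machine$sizeof.longdouble

# Number of decimal digits a double is guaranteed to hold
floor(.Machine$double.digits * log10(2))   # 15
```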

    Both links above conclude that the way to correct this behavior is arbitrary-precision arithmetic, using either the multiple precision library (the gmp package) or the Multiple Precision Floating-Point Reliable package (Rmpfr).
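    A minimal sketch of both options for the reported value, assuming the gmp package is installed (the block skips that part otherwise). `gmp::as.bigz()` parses the string directly into an exact big integer, so no double is involved; plain `as.character()` on the factor, which needs no extra dependency, also keeps every digit:

```r
s <- "13191203519135000156550020000312221580784296"

# Base R: converting the factor to character keeps the value intact
as.character(factor(s)) == s   # TRUE

# gmp: exact arbitrary-precision integers, parsed from the string
if (requireNamespace("gmp", quietly = TRUE)) {
  big <- gmp::as.bigz(s)
  print(as.character(big) == s)   # TRUE: every digit survives
  print(as.character(big + 1))    # exact integer arithmetic on the full value
}
```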

    At this point I'm not sure whether I want to add such a dependency to the varhandle package, but for the time being I will do two things:

    1. Keep this issue open until it is fully solved, by whatever means necessary.
    2. Make varhandle::unfactor() warn the user when such a precision loss occurs (or perhaps, besides warning, return the values as character to keep them intact).
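    One possible shape for point 2, as a hypothetical sketch only: `safe_unfactor()` and `count_sig_digits()` are illustrative names, not part of varhandle. It relies on the fact that a decimal with at most 15 significant digits survives a round trip through a double; levels with more digits trigger a warning and come back as character. As a heuristic, trailing zeros are treated as non-significant:

```r
# Hypothetical helper (not varhandle API): count significant digits in a numeric string
count_sig_digits <- function(x) {
  m <- sub("[eE][-+]?[0-9]+$", "", x)  # drop an exponent such as "e+05"
  m <- gsub("[^0-9]", "", m)           # keep digits only (drops sign and decimal point)
  m <- sub("^0+", "", m)               # leading zeros are not significant
  m <- sub("0+$", "", m)               # heuristic: ignore trailing zeros too
  nchar(m)
}

# Hypothetical variant of unfactor() that refuses to lose digits silently
safe_unfactor <- function(f, max_digits = 15) {
  stopifnot(is.factor(f))
  lv <- levels(f)
  num <- suppressWarnings(as.numeric(lv))
  if (anyNA(num)) {
    # At least one level is not numeric: keep everything as character
    return(as.character(f))
  }
  if (any(count_sig_digits(lv) > max_digits)) {
    warning("some levels have more than ", max_digits,
            " significant digits and cannot be stored exactly as double; ",
            "returning character instead")
    return(as.character(f))
  }
  # Map each element to its converted level (NA elements stay NA)
  num[as.integer(f)]
}

# Usage
safe_unfactor(factor(c("1.5", "2", "1.5")))   # 1.5 2.0 1.5
safe_unfactor(factor("13191203519135000156550020000312221580784296"))  # warns, returns the string
```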

    Any suggestion/opinion is always welcome. Feel free to discuss this matter further here.
