pandas2ri conversion from R to python doesn't preserve dates

Issue #594 resolved
Todor Markov created an issue

Using rpy2 3.2.0 with Python 3.6, r ('littler') version 0.3.2, on Linux Mint 19.1. Dates aren't preserved when a DataFrame is converted from Python to R and back; I get floating-point numbers instead:

import pandas as pd
from rpy2 import robjects
from rpy2.robjects import pandas2ri
pandas2ri.activate()
df = pd.DataFrame([['2019-01-01', 1]])
df[0] = pd.to_datetime(df[0])
# >>> df
#            0  1
# 0 2019-01-01  1

robjects.globalenv['data'] = df
# Printed from the R side, the date is still stored correctly.
# But converting back to Python:
df2 = robjects.globalenv['data']
# >>> df2
#              X0  X1
# 0  1.546294e+09   1
# The date is now a float, and it is not obvious how to convert it back:
# >>> pd.to_datetime(df2['X0'])
# 0   1970-01-01 00:00:01.546293600 - wrong date
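As a stopgap, the float can be turned back into a date: R stores POSIXct values as seconds since 1970-01-01 UTC, while `pd.to_datetime()` assumes nanoseconds for numeric input, which is why the date above lands in 1970. A minimal sketch, assuming the exact value is 1546293600 (read off the last output line) and that the local timezone is UTC+2, which is only a guess from the two-hour offset from midnight UTC:

```python
from datetime import timedelta, timezone

import pandas as pd

# Stand-in for df2 above; 1546293600.0 is the float rpy2 returned.
df2 = pd.DataFrame({'X0': [1546293600.0], 'X1': [1]})

# unit='s' tells pandas the numbers are seconds since the Unix epoch.
recovered = pd.to_datetime(df2['X0'], unit='s', utc=True)

# Assumption: the original naive date was interpreted as UTC+2 local time.
# Convert to that offset, then drop the timezone to get a naive date again.
local = recovered.dt.tz_convert(timezone(timedelta(hours=2))).dt.tz_localize(None)
print(local[0])
```

This only recovers the value; it does not fix the converter, and the timezone has to be known to get back the original wall-clock date.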

This has also been reported on the pandas GitHub before, but it was not resolved there: https://github.com/pandas-dev/pandas/issues/21044

Comments (4)

  1. Laurent Gautier

    It is easier to control when and how conversion happens by using local converters:

    import pandas as pd
    from rpy2 import robjects
    from rpy2.robjects import pandas2ri
    from rpy2.robjects.conversion import localconverter
    
    df = pd.DataFrame([['2019-01-01', 1]])
    df[0] = pd.to_datetime(df[0])
    
    with localconverter(robjects.default_converter + pandas2ri.converter):
        robjects.globalenv['mydata'] = df
    
    # Outside the `with` block, only the default converter is used
    r_datecol = robjects.globalenv['mydata'][0]
    
    # The column with a date was converted to an R vector of dates in the R data.frame:
    # >>> r_datecol  
    # R object with classes: ('POSIXct', 'POSIXt') mapped to:
    # [2019-01-01]
    
    # Use default + pandas converters
    with localconverter(robjects.default_converter + pandas2ri.converter) as cv:
        py_datecol = cv.rpy2py(r_datecol)
    
    # The conversion of the R vector back to a pandas array of dates appears to be working:
    # >>> py_datecol                                                               
    # DatetimeIndex(['2019-01-01 00:00:00-05:00'], dtype='datetime64[ns, America/New_York]', freq=None)
    
    # The issue is therefore with what happens at the DataFrame level. Somehow the POSIXct column
    # in the R data.frame is mapped to an array of floats (which is the C-level type in R, and should
    # only happen at the `rpy2.rinterface` level) rather than to the R-level class for date vectors.
    # The reported issue is broader than date/time vectors alone.
    

    I think that I can come up with a fix later today.
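The distinction above, between the C-level storage type and the R-level class, can be illustrated without R. This is a toy model only: `ToyRVector`, `convert_storage_only` and `convert_by_class` are hypothetical names, not rpy2 API, and the real converter is more involved. It shows why a converter that looks only at storage returns floats, while one that checks the `class` attribute first returns dates:

```python
import pandas as pd


class ToyRVector:
    """Hypothetical stand-in for an R vector.

    R stores POSIXct values as plain doubles (seconds since 1970-01-01 UTC)
    and marks them as dates only through the "class" attribute.
    """

    def __init__(self, values, rclass=()):
        self.values = list(values)
        self.rclass = tuple(rclass)


def convert_storage_only(vec):
    # Mimics the reported behaviour: only the storage type (double) is used.
    return pd.Series(vec.values, dtype='float64')


def convert_by_class(vec):
    # Dispatch on the R-level class first, then fall back to the storage type.
    if 'POSIXct' in vec.rclass:
        return pd.Series(pd.to_datetime(vec.values, unit='s', utc=True))
    return pd.Series(vec.values)


dates = ToyRVector([1546293600.0], rclass=('POSIXct', 'POSIXt'))
as_floats = convert_storage_only(dates)
as_dates = convert_by_class(dates)
print(as_floats.dtype, as_dates.dtype)
```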
