Cannot get dataframe with dates out of R into Pandas

Issue #418 resolved
Timothy Hopper created an issue

I'm trying to use Twitter's AnomalyDetection library from Python.

It includes a simple dataset called raw_data that has a date column.

When I try to get it into Python I get ValueError: array-shape mismatch in array 1.

My code is

from rpy2.robjects.packages import importr
from rpy2.robjects import r, pandas2ri
import pandas
import rpy2

print(pandas.__version__)
print(rpy2.__version__)
pandas2ri.activate()
ad = importr('AnomalyDetection')
print(r['raw_data'])

The full output is

0.20.3
2.8.6
Traceback (most recent call last):
  File "ad.py", line 10, in <module>
    print(r['raw_data'])
  File "/Users/tdhopper/.virtualenvs/anomalydetection/lib/python3.6/site-packages/rpy2-2.8.6-py3.6-macosx-10.6-intel.egg/rpy2/robjects/__init__.py", line 342, in __getitem__
    res = conversion.ri2py(res)
  File "/Users/tdhopper/.virtualenvs/anomalydetection/lib/python3.6/functools.py", line 803, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/Users/tdhopper/.virtualenvs/anomalydetection/lib/python3.6/site-packages/rpy2-2.8.6-py3.6-macosx-10.6-intel.egg/rpy2/robjects/pandas2ri.py", line 142, in ri2py_listvector
    res = ri2py.registry[DataFrame](obj)
  File "/Users/tdhopper/.virtualenvs/anomalydetection/lib/python3.6/site-packages/rpy2-2.8.6-py3.6-macosx-10.6-intel.egg/rpy2/robjects/pandas2ri.py", line 150, in ri2py_dataframe
    recarray = numpy2ri.ri2py(obj)
  File "/Users/tdhopper/.virtualenvs/anomalydetection/lib/python3.6/functools.py", line 803, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/Users/tdhopper/.virtualenvs/anomalydetection/lib/python3.6/site-packages/rpy2-2.8.6-py3.6-macosx-10.6-intel.egg/rpy2/robjects/numpy2ri.py", line 144, in ri2py_list
    res = numpy.rec.fromarrays(o2, names=tuple(names))
  File "/Users/tdhopper/.virtualenvs/anomalydetection/lib/python3.6/site-packages/numpy/core/records.py", line 619, in fromarrays
    raise ValueError("array-shape mismatch in array %d" % k)
ValueError: array-shape mismatch in array 1

Comments (4)

  1. Laurent Gautier

    The conversion code seems to handle the conversion of dates from pandas to R, but not the other way around. I'll look at it. At the moment I don't remember whether it is an oversight or whether there were ambiguities in how the conversion could be made and no decision was reached.

    In the meantime, a workaround could be to either:

    - extract the date components you are interested in (e.g., year, month, day, hour) into columns in R and retrieve that modified data frame, or
    - convert the date to a string, retrieve that modified data frame, and convert the string back to a Python date.
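
A minimal sketch of the Python half of the string workaround, using only the standard library. It assumes the date column was turned into strings on the R side with a known layout (for instance via R's `format()` with `"%Y-%m-%d %H:%M:%S"`) before the data frame was retrieved. The helper name `parse_r_timestamps` and the sample values are hypothetical, not part of rpy2 or AnomalyDetection.

```python
from datetime import datetime

# Assumed string layout; it must match whatever format was used on the
# R side when converting the date column to character.
R_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_r_timestamps(strings, fmt=R_TIME_FORMAT):
    """Convert timestamp strings back to naive datetime objects."""
    return [datetime.strptime(s, fmt) for s in strings]


# Hypothetical values standing in for a string column retrieved from R.
sample = ["1980-09-25 14:01:00", "1980-09-25 14:02:00"]
parsed = parse_r_timestamps(sample)
```

The resulting datetimes are naive (no timezone attached); whether that is acceptable depends on how the dates were stored in R.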

  2. Ray Donnelly

    This breaks on any GNU/Linux system that does not have an /etc/timezone file (and many do not).

    ======================================================================
    ERROR: testTimeR2Pandas (robjects.tests.testPandasConversions.PandasConversionsTestCase)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/opt/conda/conda-bld/rpy2_1507554941144/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold$
        py_time = robjects.conversion.ri2py(r_time)
      File "/opt/conda/conda-bld/rpy2_1507554941144/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold$
        return dispatch(args[0].__class__)(*args, **kw)
      File "/opt/conda/conda-bld/rpy2_1507554941144/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold$
        res = pandas.to_datetime(tuple(foo))
      File "/opt/conda/conda-bld/rpy2_1507554941144/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold$
        foo = (tzone.localize(datetime.fromtimestamp(x)) for x in obj)
    AttributeError: 'NoneType' object has no attribute 'localize'
    
    Stderr:
    /opt/conda/conda-bld/rpy2_1507554941144/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho$
      warnings.warn('No file %s' % etc_timezone)
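
The traceback above fails because `tzone` is `None` when no timezone file is found. One defensive pattern, sketched here with the standard library only, is to fall back to UTC when no timezone could be determined. This is an illustration of the idea, not the fix applied in rpy2; the function name `localize_timestamps` is hypothetical, and UTC as the fallback is an assumption.

```python
from datetime import datetime, timezone


def localize_timestamps(seconds, tzone=None):
    """Turn POSIX timestamps into timezone-aware datetimes.

    Falls back to UTC (an assumed default) when tzone is None,
    instead of raising AttributeError on a missing timezone.
    """
    tz = tzone if tzone is not None else timezone.utc
    return [datetime.fromtimestamp(x, tz=tz) for x in seconds]


# No timezone available: the UTC fallback is used.
stamps = localize_timestamps([0, 86400])
```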
    