Converting a pandas dataframe with dtype=int fails

Issue #159 resolved
Former user created an issue

pandas2ri.pandas2ri(obj) converts each series of the pandas data frame in turn. For an integer data frame, obj.dtype is 'int64', so the series gets passed to

        # converted as a numpy array
        res = original_conversion(obj)

which doesn't know how to handle a pandas Series.
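
To make the reported mechanism concrete, here is a minimal sketch that needs neither rpy2 nor R. The names convert_ndarray and convert_series are hypothetical stand-ins, not rpy2's actual functions; they only mimic a dtype-based dispatch that hands a column to a converter expecting a plain numpy array. Unwrapping the Series with .values is one possible way around the mismatch, not necessarily the fix rpy2 adopted.

```python
import numpy as np
import pandas as pd


def convert_ndarray(obj):
    """Hypothetical stand-in for a converter that only accepts numpy arrays."""
    if not isinstance(obj, np.ndarray):
        raise ValueError("cannot convert %s" % type(obj).__name__)
    return obj.tolist()


def convert_series(series):
    """Hypothetical dtype-based dispatch for one data frame column."""
    if series.dtype == np.dtype("int64"):
        # Passing the Series itself would raise ValueError in convert_ndarray;
        # .values hands over the underlying numpy array instead.
        return convert_ndarray(series.values)
    # Fallback for other dtypes: treat the values as plain Python objects.
    return list(series)


df = pd.DataFrame(np.random.randint(0, 5, (50, 5)), dtype="int64")
first = df[df.columns[0]]
print(first.dtype)                 # dtype of a single column
converted = convert_series(first)
print(len(converted))
```

Calling convert_ndarray(first) directly raises ValueError, which is the kind of failure the test script below catches.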

pandas.rpy.common.convert_to_r_dataframe

seems to work fine. Perhaps change pandas2ri to use that instead?

Here is a test script to illustrate:

%load_ext rmagic

import rpy2
import collections
import pandas as pd
import numpy as np
import pandas.rpy.common as com

print "Version:", rpy2.__version__

def test(dtype, pandas=False):
    cols = map(str, np.unique(np.random.randint(0, 1000000, 5)))
    index = map(str, np.unique(np.random.randint(0, 1000000, 50)))
    df = pd.DataFrame(np.random.randint(0,5,(len(index), len(cols))),
                      index=index, columns=cols, dtype=dtype)

    if pandas:
        rdf = com.convert_to_r_dataframe(df)
        %R -i rdf print(summary(df))
    else:
        %R -i df print(summary(df))

print "-----"
try:
    test(int)
except ValueError:
    print "Failed to convert a data frame of integer types"

print "-----"
try:
    test(object)
except ValueError:
    print "Failed to convert a data frame of object types"

print "-----"
try:
    test(int, True)
except ValueError:
    print "Failed to convert a data frame of integer types"

I get the output:

Version: 2.3.8
-----
Failed to convert a data frame of integer types
-----
 X56676 X103974 X147252 X533895 X560637
 0: 9   0: 7    0:10    0: 8    0:11   
 1: 7   1: 9    1: 9    1:11    1:15   
 2:11   2: 7    2:13    2:13    2: 5   
 3: 9   3:18    3: 9    3: 8    3:11   
 4:14   4: 9    4: 9    4:10    4: 8   

-----
 X56676 X103974 X147252 X533895 X560637
 0: 9   0: 7    0:10    0: 8    0:11   
 1: 7   1: 9    1: 9    1:11    1:15   
 2:11   2: 7    2:13    2:13    2: 5   
 3: 9   3:18    3: 9    3: 8    3:11   
 4:14   4: 9    4: 9    4:10    4: 8  

Comments (7)

  1. Job Evers‐Meltzer

    Ignore my suggestion about using pandas.rpy.common.convert_to_r_dataframe. At least, I get errors every time I pass its output to the grm function of the ltm package. Sorry.

  2. Laurent Gautier

    The converter for pandas in rpy2 is probably not without bugs.

    The most recent work on conversion with pandas is in the current development branch of rpy2 (branch version_2.4.x). If you want to try it, it is only a matter of running:

    pip install https://bitbucket.org/lgautier/rpy2/get/version_2.4.x.tar.gz
    

    Note: IPython's "rmagic" is being merged into rpy2 (see https://github.com/ipython/ipython/issues/3803). I have not tested it much yet, and I don't know whether this would interfere with your current use of rmagic (@Dav Clark would be able to tell).

  3. Dav Clark

    I'll preface this by saying that most tests are currently failing (at least for @Laurent Gautier), and the code will be moved to a new location soon.

    But, if you want to test it out, you just run:

    %load_ext rpy2.interactive.rmagic
    

    The plan is to (very shortly) move that to:

    %load_ext rpy2.ipython
    
  4. Laurent Gautier

    Closing. It is working with:

    %load_ext rpy2.ipython
    
    import rpy2
    import collections
    import pandas as pd
    import numpy as np
    import rpy2.robjects.pandas2ri
    
    def test(dtype, pandas=False):
        cols = tuple(str(x) for x in np.unique(np.random.randint(0, 1000000, 5)))
        index = tuple(str(x) for x in np.unique(np.random.randint(0, 1000000, 50)))
        df = pd.DataFrame(np.random.randint(0,5,(len(index), len(cols))),
                          index=index, columns=cols, dtype=dtype)
    
        if pandas:
            rpy2.robjects.pandas2ri.activate()
            %R -i df print(summary(df))
            rpy2.robjects.pandas2ri.deactivate()
        else:
            %R -i df print(summary(df))
    
    print("-----")
    try:
        test(int)
    except ValueError:
        print("Failed to convert a data frame of integer types")
    
    print("-----")
    try:
        test(object)
    except ValueError:
        print("Failed to convert a data frame of object types")
    
    print("-----")
    try:
        test(int, True)
    except ValueError:
        print("Failed to convert a data frame of integer types")
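
One difference from the original script is worth spelling out: the labels are built with tuple(str(x) for x in ...) rather than map(str, ...). Under Python 3, map returns a lazy iterator with no len(), so the len(index) and len(cols) calls in the original would fail there. The sketch below shows only that behaviour, assuming a Python 3 interpreter; it needs no rpy2, and the variable names are illustrative.

```python
import numpy as np

values = np.unique(np.random.randint(0, 1000000, 5))

# Python 3: map() gives a one-shot iterator, which has no length.
lazy_cols = map(str, values)
try:
    len(lazy_cols)
    map_has_len = True
except TypeError:
    map_has_len = False

# A tuple is a real sequence, so len() works and it can be reused.
cols = tuple(str(x) for x in values)
print(map_has_len, len(cols) == len(values))
```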
    