runtime error with biglm package

Issue #134 invalid
Lev Givon created an issue

I've been trying to port the following R code (which runs without any problem using R 2.15.2) to Python 2.7.4 using rpy2 2.3.6 and pandas 0.11.0:

library(biglm)

set.seed(0)

fm = c(rep("A", 5), rep("B", 5), rep("C", 5), rep("D", 5))
yr = rep(c(1985,1986,1987,1988,1989), 4)
d = data.frame(fm, yr, y=rnorm(length(yr)), x=rnorm(length(yr)))

model_fm = y~x+as.factor(fm)-1

print(summary(lm(model_fm, data=d)))
print(summary(biglm(model_fm, data=d)))

The following code, however, raises a runtime exception (which seems to come from R) when calling biglm:

import numpy as np
import pandas
import pandas.rpy.common as comm
import rpy2.robjects as robjects

# Set this to the path containing locally installed R libraries:
robjects.r['.libPaths']('/home/lev/Work/prod-root/lib/R/library')
biglm = robjects.packages.importr('biglm')

np.random.seed(0)
fm = ['A']*5+['B']*5+['C']*5+['D']*5
yr = [1985,1986,1987,1988,1989]*4
N = len(fm)
d = pandas.DataFrame({'fm': fm, 'yr': yr, 
    'x': np.random.normal(size=N), 'y': np.random.normal(size=N)})
d_r = comm.convert_to_r_dataframe(d)

model_fm = 'y~x+as.factor(fm)-1'

print robjects.r.summary(robjects.r.lm(model_fm, data=d_r))
print robjects.r.summary(robjects.r.biglm(model_fm, data=d_r))

Here is the exception:

Error: $ operator is invalid for atomic vectors
Traceback (most recent call last):
  File "rpy2_biglm_demo.py", line 32, in <module>
    print robjects.r.summary(robjects.r.biglm(model_fm, data=d_r))
  File "/home/lev/Work/PYTHON/lib/python2.7/site-packages/rpy2/robjects/functions.py", line 86, in __call__
    return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
  File "/home/lev/Work/PYTHON/lib/python2.7/site-packages/rpy2/robjects/functions.py", line 35, in __call__
    res = super(Function, self).__call__(*new_args, **new_kwargs)
rpy2.rinterface.RRuntimeError: Error: $ operator is invalid for atomic vectors

I don't believe the problem is due to pandas because the same exception is raised when the R dataframe is created directly with robjects.vectors.DataFrame() rather than pandas. Moreover, I can invoke the function using the following Python code:

robjects.globalenv['d'] = d_r
print robjects.r.summary(robjects.r('biglm(y~x+as.factor(fm)-1, data=d)'))

Any ideas why attempting to access the biglm function directly causes an error?

Comments (2)

  1. Laurent Gautier

    The RRuntimeError means that the error is triggered by R when the call is evaluated.

    Besides narrowing down the problem on the Python side to the exact call that triggers it, R can be asked for hints:

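    from rpy2.robjects.packages import importr
    base = importr('base')
    # print() and summary() play no part in the error,
    # so we leave them out
    fit = robjects.r.biglm(model_fm, data=d_r)  # raises the RRuntimeError
    # ask R for the call stack of the last error
    tb = base.traceback()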
    

    This tells us that the problem occurs during the call terms(formula).

    The call itself, retrieved with traceback() above (copied below), can be pasted into R, where it gives the same error.

    (function (formula, data, weights = NULL, sandwich = FALSE) 
       {
           tt <- terms(formula)
           if (!is.null(weights)) {
               if (!inherits(weights, "formula")) 
                   stop("`weights' must be a formula")
               w <- model.frame(weights, data)[[1]]
           }
           else w <- NULL
           mf <- model.frame(tt, data)
           if (is.null(off <- model.offset(mf))) 
               off <- 0
           mm <- model.matrix(tt, mf)
           qr <- bigqr.init(NCOL(mm))
           qr <- update(qr, mm, model.response(mf) - off, w)
           rval <- list(call = sys.call(), qr = qr, assign = attr(mm, 
               "assign"), terms = tt, n = NROW(mm), names = colnames(mm), 
               weights = weights)
           if (sandwich) {
               p <- ncol(mm)
               n <- nrow(mm)
               xyqr <- bigqr.init(p * (p + 1))
               xx <- matrix(nrow = n, ncol = p * (p + 1))
               xx[, 1:p] <- mm * (model.response(mf) - off)
               for (i in 1:p) xx[, p * i + (1:p)] <- mm * mm[, i]
               xyqr <- update(xyqr, xx, rep(0, n), w * w)
               rval$sandwich <- list(xy = xyqr)
           }
           rval$df.resid <- rval$n - length(qr$D)
           class(rval) <- "biglm"
           rval
       })("y~x+as.factor(fm)-1", data = list(fm = c("A", "A", "A", "A", 
       "A", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C", "D", "D", 
       "D", "D", "D"), x = c(1.76405234596766, 0.400157208367223, 0.978737984105739, 
       2.24089319920146, 1.86755799014997, -0.977277879876411, 0.950088417525589, 
       -0.151357208297698, -0.103218851793558, 0.410598501938372, 0.144043571160878, 
       1.45427350696298, 0.761037725146993, 0.121675016492828, 0.443863232745426, 
       0.333674327374267, 1.49407907315761, -0.205158263765801, 0.313067701650901, 
       -0.854095739301725), y = c(-2.55298981583408, 0.653618595440361, 
       0.864436198859506, -0.742165020406442, 2.26975462398761, -1.45436567459876, 
       0.0457585173014461, -0.187183850025834, 1.53277921435846, 1.46935876990029, 
       0.154947425696916, 0.378162519602174, -0.887785747630113, -1.98079646822393, 
       -0.347912149326153, 0.15634896910398, 1.23029068072772, 1.20237984878441, 
       -0.387326817407952, -0.302302750575336), yr = c(1985L, 1986L, 
       1987L, 1988L, 1989L, 1985L, 1986L, 1987L, 1988L, 1989L, 1985L, 
       1986L, 1987L, 1988L, 1989L, 1985L, 1986L, 1987L, 1988L, 1989L
       )))
    

    This leaves two possibilities:

    • a bug in biglm()

    • an error in the way it is called.

    The answer is the second option: you are passing the formula as a string, and R sees only a string.
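
The distinction between a string and a formula can be checked from Python without involving biglm. The following is a minimal sketch, written for Python 3 and assuming rpy2 is installed; `r_error_message` and its `evaluate` parameter are hypothetical names introduced here for illustration, not part of rpy2. It evaluates a snippet of R code and returns the error text, if any.

```python
def r_error_message(r_code, evaluate=None):
    """Evaluate R code and return the error text, or None on success.

    `evaluate` is a hypothetical hook for this sketch; by default the
    code is handed to rpy2's `robjects.r`.
    """
    if evaluate is None:
        # Deferred import: loading rpy2 starts an embedded R session.
        import rpy2.robjects as robjects
        evaluate = robjects.r
    try:
        evaluate(r_code)
    except Exception as exc:
        # R errors surface as RRuntimeError; the broad catch is used here
        # because that class's import path differs between rpy2 releases.
        return str(exc)
    return None
```

With R available, `r_error_message('terms("y~x")')` is expected to return text containing "$ operator is invalid for atomic vectors", because `terms()` receives a character vector, while `r_error_message('terms(y~x)')` is expected to return `None`. This matches the report: `lm()` accepts the string, whereas `biglm()` passes it straight to `terms()`.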

    It should have been:

    model_fm = robjects.Formula('y~x+as.factor(fm)-1')
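
A small wrapper can make the conversion explicit, so that a plain string never reaches `biglm()`. This is a minimal sketch for Python 3, assuming rpy2 and the R package biglm are installed; `ensure_formula` and `fit_biglm` are hypothetical helper names, not part of rpy2 or biglm.

```python
def ensure_formula(model):
    """Return `model` as an R formula, wrapping plain Python strings."""
    if isinstance(model, str):
        # Deferred import: loading rpy2 starts an embedded R session.
        from rpy2.robjects import Formula
        return Formula(model)
    # Anything else (e.g. an existing Formula) is passed through unchanged.
    return model


def fit_biglm(model, data):
    """Fit biglm on an R data frame, accepting a string or a Formula."""
    from rpy2.robjects.packages import importr
    biglm = importr('biglm')
    return biglm.biglm(ensure_formula(model), data=data)
```

With `d_r` built as in the report, `fit_biglm('y~x+as.factor(fm)-1', d_r)` hands R a formula object instead of a character vector.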
    