custom import hook for R package

Issue #336 open
Antony Lee
created an issue

A rather minor itch, which may or may not be worth implementing.

Writing foo = importr("foo") looks slightly unpythonic to me. With the flexibility of Python's import system (e.g. https://github.com/njsmith/metamodule) it should be possible to provide a special package (e.g. rpy2.rpackages) so that from rpy2.rpackages import foo does "the right thing".

Comments (15)

  1. Laurent Gautier

    While importr predates the flexibility that Python's import system has since acquired, there is definitely something to do in that direction (there are initial steps: for instance, the Package object returned by importr inherits from ModuleType).

    I suspect that the devil will be in details such as what happens when a user "detaches" an R package, for example...

    Pull request anyone ?

  2. Greg Werbin

    IMO R objects loaded into Python from a package should always act as though they were called with :: syntax.

    That is, if I do something like

    cluster = importr('cluster')
    cluster.pam(x)
    

    then it should behave as if, in R, I ran:

    cluster::pam(x)
    

    This would make the issue of attachment somewhat moot. I haven't dug into the implementation details myself, but this is the behavior I would expect from RPy2 regardless of how cluster was imported.

  3. Laurent Gautier

    "IMO R objects loaded into Python from a package should always act as though they were called with :: syntax."

    I am not sure I follow. What would make one think that this is not what we have with importr?

  4. Laurent Gautier

    Consider the following:

    cluster = importr('cluster')
    
    ## the following is identical to `pam <- cluster::pam`
    pam = cluster.pam
    

    Where things get more complicated is what happens when one "detaches" the package cluster on the R side, or installs a new version of the R package. This doesn't mean that this is intractable, just that I think that some thought and experimentation(*) should be put in... and there is no one like the person proposing the idea to do that! ;-)

    (*: yep, that means demonstrate through an implementation)

  5. Antony Lee reporter

    Coming back to the original issue: in fact, the feature I requested is trivial to implement once you have metamodule as a dependency. Create an rpy2/robjects/rpackages.py with the following contents:

    from types import ModuleType
    import metamodule
    
    
    class RImporter(ModuleType):
        def __metamodule_init__(self): # Minor issue in metamodule, reported upstream.
            pass
    
        __path__ = None # This is not a package.
    
        _cache = {} # Not sure about the cache.
    
        def __getattr__(self, name):
            try:
                return self._cache[name]
            except KeyError:
                from .packages import importr
                return self._cache.setdefault(name, importr(name))
    
    
    metamodule.install(__name__, RImporter)
    del ModuleType, RImporter, metamodule
    

    and magic happens...

    >>> from rpy2.robjects.rpackages import ggplot2
    >>> ggplot2
    rpy2.robjects.packages.Package as a <module 'ggplot2'>
    

    or

    >>> import rpy2.robjects.rpackages
    >>> rpy2.robjects.rpackages.ggplot2
    rpy2.robjects.packages.Package as a <module 'ggplot2'>
    

    I'm not sure about the caching because I don't really understand R's import model; for example if importing is cheap and detaching is a problem then it may be best to simply not cache the import results.

    Please let me know if you'd be willing to merge in such a feature.

  6. Laurent Gautier

    I am heading for release rpy2-2.8.0 in about 2 weeks and I feel that this is a relatively short time to be working on this for that release, but let's look at it for rpy2 2.9.0.

  7. Antony Lee reporter

    I think the code snippet is ready to go in as it is if I remove the caching (in which case it is exactly syntactic sugar for importr, with the same implications w.r.t. detaching, if any).

  8. Laurent Gautier

    I will create the branch version_2.8.x soon, and bump the branch default to become the future 2.9.0.

    We can start looking into including it right after. I was thinking that, while we are at it, we may want to make optional behavior such as the automatic download/installation of R packages (that is, if the R package is missing, try to download and install it) easy to enable, and we may also think about the handling of non-trivial name conversions (see http://rpy2.readthedocs.io/en/version_2.7.x/robjects_rpackages.html#importing-r-packages)

  9. Antony Lee reporter

    Do you want to create a dependency on metamodule, or just include the source itself? (It's a self-contained 160-line module.)

    Name conversions could be handled with a syntax like

    with rpy2.robjects.rpackages.renames(...):
        from rpy2.robjects.rpackages import ...
    

    Automatic download/installation would likely be a global flag? (I don't see why you'd want to switch it module by module.)

  10. Laurent Gautier

    I am not familiar with metamodule. Is its purpose to mostly backport functionalities available in Python >= 3.5 ?

    If so, I'd be fine with a Python 3.5-only feature (I am removing workarounds that only existed for Python 2.6, which is no longer supported by rpy2, and I can't wait to drop Python 2.7 and clean up the headache of supporting unicode, str and bytes).

  11. Antony Lee reporter

    Python 3.5 basically lets you write

    import sys; sys.modules[__name__].__class__ = RImporter
    

    instead of

    import metamodule; metamodule.install(__name__, RImporter)
    

    so the gain is not that big (other than avoiding a dependency). I don't really have a preference, it's up to you.

    I still don't understand uses of detaching packages, but I guess this could be supported as del rpy2.robjects.rpackages.<foo> too.
