plotting genomic features to the margins of an ExtendableHeatmap should share a common, reusable API

Issue #55 new
Thomas Gilgenast created an issue

currently ChipSeqExtendableHeatmap, GeneExtendableHeatmap, SNPExtendableHeatmap, MotifExtendableHeatmap, and BedExtendableHeatmap all contain lots of duplicated logic

all of these features do the same fundamental operation: they take some genomic features and plot some shapes in the margin at positions dictated by those features

it sounds like this pattern could be extracted to make it easier to write new features that are similar to this

it's clear that there's incentive to make small tweaks to the existing features since that's basically how SNPExtendableHeatmap, MotifExtendableHeatmap, and BedExtendableHeatmap arose

by exposing a common API for this, all features that use this API could benefit from improvements such as #28 without us having to rewrite all the features every time there's an improvement

this is relatively minor because the tech demos that address #28 come with major speedups to ChipSeqExtendableHeatmap and GeneExtendableHeatmap which are the two most commonly used features

Comments (8)

  1. Thomas Gilgenast reporter

    ExtendableHeatmap has become one of the most-used parts of the library and people are increasingly interested in extending it - this deserves a priority bump

  2. Thomas Gilgenast reporter

    one design option here is to make the plotting “compositional” by creating functions like draw_rectangles(ax, features, color='k', ...) that return the axis they draw on

    this comment on #79 suggested creating a MarginAxis class that could be used to chain bound method calls - in this context, functions like draw_rectangles() could also be bound methods of MarginAxis, leading to calls like

    h.add_margin_ax(...)\
        .draw_rectangles(features)\
        .chipseq_style()\
        .label('my label')
    

    then the final piece of the puzzle would be to allow add_margin_ax() to accept a tuple of loc values (and return a list of MarginAxes added in all requested locations)
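
    a minimal sketch of the tuple-of-loc idea, using a stand-in class rather than the real ExtendableHeatmap; StubHeatmap and _add_one_margin_ax() are hypothetical names, and always returning a list (even for a single loc) is an assumption that would change the current return type:

```python
class StubHeatmap:
    """Stand-in for ExtendableHeatmap; only records which margins were requested."""

    def __init__(self):
        self.margins = []

    def _add_one_margin_ax(self, loc):
        # hypothetical single-location helper; the real method would create an axis
        self.margins.append(loc)
        return 'axis@%s' % loc

    def add_margin_ax(self, loc='bottom'):
        # accept either one loc string or any iterable of loc strings
        locs = (loc,) if isinstance(loc, str) else tuple(loc)
        return [self._add_one_margin_ax(one_loc) for one_loc in locs]


h = StubHeatmap()
single = h.add_margin_ax('bottom')
both = h.add_margin_ax(('bottom', 'left'))
```

    the string check matters because a bare string is itself iterable and would otherwise be split into characters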

    the only current margin axis plotter that would not fit this pattern is the gene stack (and refgene stack) plotters, because they don’t know how many tracks will be in the stack until after the genes are loaded - these functions could use this pattern internally however:

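    def add_refgene_stack(self, assembly, loc='bottom', **kwargs):
        # genes run along x for top/bottom margins and along y otherwise
        orientation = 'h' if loc in {'top', 'bottom'} else 'v'
        genes = load_refgenes(
            assembly,
            self.grange_x if orientation == 'h' else self.grange_y
        )
        # the number of rows (and so of margin axes) is only known after loading
        rows = pack_genes(genes)
        axes = []
        for row in rows:
            axes.append(
                # loc is a named parameter, so it must be forwarded explicitly
                self.add_margin_ax(loc=loc, **kwargs)
                    .draw_genes(row, **kwargs)
            )
        return axes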
    

    where load_refgenes() refers to #46 and draw_genes() represents one of these bound methods on MarginAxes that returns the MarginAxes it draws on

    callers can then call h.add_refgene_stack('mm9').some_style().label('genes')
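
    since add_refgene_stack() returns a plain list in the snippet above, chaining off its return value would need some list-like wrapper that forwards calls to every member. one possible sketch, where AxesGroup and StubMarginAxes are hypothetical names and not existing lib5c classes:

```python
class AxesGroup(list):
    """Hypothetical list of margin axes that forwards chained calls to every member."""

    def __getattr__(self, name):
        # only reached for names that list itself does not define;
        # refuse dunder lookups so copy/pickle protocols behave normally
        if name.startswith('__'):
            raise AttributeError(name)

        def broadcast(*args, **kwargs):
            for member in self:
                getattr(member, name)(*args, **kwargs)
            return self  # returning the group keeps the chain going
        return broadcast


class StubMarginAxes:
    """Stand-in for MarginAxes that records the calls made on it."""

    def __init__(self):
        self.calls = []

    def some_style(self):
        self.calls.append('some_style')
        return self

    def label(self, text):
        self.calls.append(('label', text))
        return self


group = AxesGroup([StubMarginAxes(), StubMarginAxes()])
result = group.some_style().label('genes')
```

    whether label() should apply to every row of the stack or only to one of them is a separate design question that this sketch does not settle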

    in this case it seems correct to make MarginAxes a subclass of pyplot.Axes so that users can call things like scatter() on it directly, though we may have to override the return values to allow chaining; the alternative is to write __getattr__() to check whether the underlying pyplot.Axes object has the requested attribute and pass it through

  3. Thomas Gilgenast reporter

    subclassing plt.Axes and then automatically overriding first-level calls to the superclass’s methods seems a bit tricky

    a third option is to write

    from functools import wraps
    
    
    def force_return(value):
        """
        Decorator factory to create decorators that override the return value of
        decorated functions.
    
        Parameters
        ----------
        value : Any
            The value that will be returned.
    
        Returns
        -------
        function
            The decorator.
    
        Examples
        --------
        >>> # modify sum() so that it always returns 5
        >>> force_return(5)(sum)([2, 2])
        5
        """
        def dec(f):
            @wraps(f)
            def new_f(*args, **kwargs):
                f(*args, **kwargs)
                return value
            return new_f
        return dec
    
    
    class MarginAxes(object):
        """
        Wrapper around a ``pyplot.Axes`` instance providing custom plotting
        functions and function chaining.
    
        Examples
        --------
        >>> ax = plt.axes()
        >>> ma = MarginAxes(ax)
        >>> ma.sayhi().sayhi()
        hi
        hi
        <lib5c.plotters.extendable.margin_axes.MarginAxes object at ...>
        >>> ma.scatter([1], [0]).sayhi()
        hi
        <lib5c.plotters.extendable.margin_axes.MarginAxes object at ...>
        >>> ma.scatter.__doc__ == ax.scatter.__doc__
        True
        """
        def __init__(self, ax):
            self.ax = ax
    
        def __getattr__(self, name):
            attr = getattr(self.ax, name)
            if callable(attr):
                return force_return(self)(attr)
            return attr
    
        def sayhi(self):
            print('hi')
            return self
    

  4. Thomas Gilgenast reporter

    if we go down this road I think the plan is to use the mixin strategy we used for ExtendableHeatmap to add functionality to MarginAxes
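
    a rough sketch of what that could look like, assuming the mixin strategy means plain multiple inheritance; RecordingAxes is a stand-in for a real matplotlib axis, and BaseMarginAxes, RectangleMixin, and LabelMixin are hypothetical names:

```python
class RecordingAxes:
    """Stand-in for a matplotlib Axes that records the calls it receives."""

    def __init__(self):
        self.calls = []

    def axvspan(self, xmin, xmax, **kwargs):
        self.calls.append(('axvspan', xmin, xmax))

    def set_ylabel(self, text):
        self.calls.append(('set_ylabel', text))


class BaseMarginAxes:
    """Holds the wrapped axis; each mixin only adds methods on top of it."""

    def __init__(self, ax):
        self.ax = ax


class RectangleMixin:
    def draw_rectangles(self, features, **kwargs):
        # features are assumed to be (start, end) pairs
        for start, end in features:
            self.ax.axvspan(start, end, **kwargs)
        return self


class LabelMixin:
    def label(self, text):
        self.ax.set_ylabel(text)
        return self


class MarginAxes(RectangleMixin, LabelMixin, BaseMarginAxes):
    """Final class assembled from the mixins."""


ax = RecordingAxes()
ma = MarginAxes(ax).draw_rectangles([(10, 20), (40, 55)], color='k').label('my label')
```

    each mixin returns self, so new drawing methods stay chainable without touching the base class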

  5. Thomas Gilgenast reporter

    perhaps more important than the syntactic sugar prototyped above is the actual design of composition-friendly signatures for the actual plotting functions

    one early example is seen in this notebook: https://colab.research.google.com/drive/1j_G5ObGQZOGWmaFdlqZHHEizJD7q3B5U

    import numpy as np
    from matplotlib import collections
    
    
    def plot_chipseq(ax, starts, ends, heights, transpose=False, **kwargs):
        zeros = np.zeros(len(starts))
        verts = np.array([[starts, zeros],
                          [ends, zeros],
                          [ends, heights],
                          [starts, heights]]).transpose((2, 0, 1))
        if transpose:
            verts = verts[:, :, ::-1]
        ax.add_collection(collections.PolyCollection(verts, **kwargs))
    

    this cuts chipseq track plotting down to a 9-line function whose complexity is mostly dictated by the signature of PolyCollection; its data inputs are plain numpy arrays, and it can be used on any matplotlib axis

    the plotters should probably

    • take ax as the first positional arg
    • take other args that describe the data to be plotted as numpy arrays when possible
    • accept a transpose kwarg
    • pass through **kwargs to keep the function definitions lightweight

    under this design:

    • setting track limits or mandating the units on starts and ends is “someone else’s problem”

      • e.g., ExtendableHeatmap.add_margin_ax() already sets the limits to basepair units for you, while the notebook shows how you can do everything manually in arbitrary units
    • the plotter can be used outside the ExtendableHeatmap context, including in standalone figures and in gridspec layouts as shown in the notebook
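
    as an illustration of these guidelines, here is a sketch of a second plotter written the same way; plot_rectangles is a hypothetical name, and drawing each feature as a box spanning 0 to 1 on the other axis is an assumption made for the example:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import collections


def plot_rectangles(ax, starts, ends, transpose=False, **kwargs):
    """Hypothetical plotter: one unit-height box per feature, drawn on ``ax``."""
    starts = np.asarray(starts, dtype=float)
    ends = np.asarray(ends, dtype=float)
    lo = np.zeros(len(starts))
    hi = np.ones(len(starts))
    # four corners per box, rearranged to shape (n_features, 4, 2)
    verts = np.array([[starts, lo],
                      [ends, lo],
                      [ends, hi],
                      [starts, hi]]).transpose((2, 0, 1))
    if transpose:
        # swap x and y so the boxes run along a vertical margin
        verts = verts[:, :, ::-1]
    ax.add_collection(collections.PolyCollection(verts, **kwargs))


# standalone use: the caller, not the plotter, decides the limits and units
fig, (ax_h, ax_v) = plt.subplots(1, 2)
ax_h.set_xlim(0, 100)
ax_v.set_ylim(0, 100)
plot_rectangles(ax_h, [10, 60], [30, 90], facecolor='k')
plot_rectangles(ax_v, [10, 60], [30, 90], transpose=True, facecolor='k')
```

    nothing here depends on ExtendableHeatmap, so the same function could be called on a margin axis or on any other matplotlib axis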

  6. Thomas Gilgenast reporter

    an additional detail for designing the signatures of these methods: if they take the matplotlib axes as a kwarg instead of as the first positional arg, they could be decorated with @plotter and generally exist at the level of a fully independent plotter entirely outside the scope of the extendable heatmap plotting subpackage
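
    to make the ax-as-kwarg idea concrete without relying on the real @plotter decorator, the sketch below uses a hypothetical stand-in called simple_plotter whose only job is to fill in a missing ax; plot_ticks is likewise a made-up example plotter:

```python
import functools

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def simple_plotter(f):
    """Hypothetical stand-in for @plotter: supply an axis when ``ax`` is omitted."""
    @functools.wraps(f)
    def wrapped(*args, ax=None, **kwargs):
        if ax is None:
            ax = plt.gca()  # fall back to the current axis, creating one if needed
        f(*args, ax=ax, **kwargs)
        return ax
    return wrapped


@simple_plotter
def plot_ticks(positions, ax=None, **kwargs):
    # one vertical line per feature position
    ax.vlines(positions, 0, 1, **kwargs)


# fully standalone call: no axis is passed in
standalone_ax = plot_ticks([10, 20, 30])

# the same plotter drawing onto an axis supplied by the caller
fig, margin_ax = plt.subplots()
same_ax = plot_ticks([5, 15], ax=margin_ax, colors='r')
```

    with ax as a kwarg, the same function works both as an independent plotter and as something a margin axis can call internally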

    this takes us in the direction of our ultimate goal, which is to simplify the margin plotters and simultaneously decouple them from the ExtendableHeatmap class hierarchy. if this goal is accomplished, the functionality that remains in the ExtendableHeatmap class will be limited to adding margin axes (via the matplotlib divider, handled within ExtendableFigure) and setting the heatmap-aligned axes limits (in the units requested)

    my current thinking is that it would be better to first design this system of base plotting functions (extracting functionality out of the current ExtendableHeatmap subclasses into fully-decoupled and simplified plotters) and only then begin brainstorming designs and testing implementations for the “syntactic sugar” that would allow nice client experiences such as chaining, etc.
