would heatmap plotting be faster if we used PatchCollection?

Issue #28 closed
Thomas Gilgenast created an issue

currently, each Patch (Rectangle or Polygon) is added to the ax with its own call to ax.add_artist()

we are wondering whether it would be faster to instead collect all the Patches we want to draw in a list, build a PatchCollection from that list, and add the PatchCollection to the ax with a single call to ax.add_collection()

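patches = []
triangle = mpl.patches.Polygon(np.array([[x_1, y_1], [x_2, y_2], [x_3, y_3]]), closed=True)
patches.append(triangle)  # collect every Patch instead of calling ax.add_artist() on it
p = mpl.collections.PatchCollection(patches)
ax.add_collection(p)  # one call adds all collected patches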
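a minimal benchmark sketch of the two approaches, assuming matplotlib with the headless Agg backend. make_rectangles, draw_with_add_artist, draw_with_patch_collection and time_draw are hypothetical helpers, and the random rectangles only stand in for real track data; actual timings will depend on the machine and matplotlib version.

```python
import time

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import PatchCollection
from matplotlib.patches import Rectangle


def make_rectangles(n, seed=0):
    """Hypothetical test data: n narrow rectangles of random height."""
    rng = np.random.default_rng(seed)
    xs = rng.uniform(0, 100, n)
    heights = rng.uniform(0, 1, n)
    return [Rectangle((x, 0), 0.01, h) for x, h in zip(xs, heights)]


def draw_with_add_artist(ax, rects):
    # current approach: one ax.add_artist() call per Patch
    for rect in rects:
        ax.add_artist(rect)


def draw_with_patch_collection(ax, rects):
    # proposed approach: gather the patches and add them with one call
    ax.add_collection(PatchCollection(rects))


def time_draw(draw, n=10_000):
    """Return seconds spent adding n rectangles and rendering the figure once."""
    fig, ax = plt.subplots()
    rects = make_rectangles(n)
    start = time.perf_counter()
    draw(ax, rects)
    ax.set_xlim(0, 100)
    ax.set_ylim(0, 1)
    fig.canvas.draw()  # include rendering time, not just artist creation
    elapsed = time.perf_counter() - start
    plt.close(fig)
    return elapsed


if __name__ == "__main__":
    for draw in (draw_with_add_artist, draw_with_patch_collection):
        print(f"{draw.__name__}: {time_draw(draw):.3f} s")
```

rendering is forced with fig.canvas.draw() so that both variants pay the full drawing cost rather than just the cost of creating artists.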

Comments (5)

  1. Thomas Gilgenast reporter

    according to a quick test drawing 10,000 rectangles on a chipseq track, PatchCollection is definitely faster than separate add_artist() calls (~3x), and PolyCollection is even faster (~20x)
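PolyCollection can be built directly from vertex arrays, so no Rectangle objects need to be created at all. a sketch for a bar-style track, assuming each bin is described by start, end and value arrays (bar_vertices and plot_track are hypothetical names):

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for the example
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import PolyCollection


def bar_vertices(starts, ends, values):
    """Return an (n, 4, 2) array holding the corners of one bar per bin."""
    starts = np.asarray(starts, dtype=float)
    ends = np.asarray(ends, dtype=float)
    values = np.asarray(values, dtype=float)
    zeros = np.zeros_like(values)
    # corner order: bottom-left, top-left, top-right, bottom-right
    xs = np.stack([starts, starts, ends, ends], axis=1)
    ys = np.stack([zeros, values, values, zeros], axis=1)
    return np.stack([xs, ys], axis=2)


def plot_track(ax, starts, ends, values, color="k"):
    """Draw every bin as a filled bar using a single PolyCollection."""
    coll = PolyCollection(bar_vertices(starts, ends, values),
                          facecolors=color, edgecolors="none")
    ax.add_collection(coll)
    ax.autoscale_view()  # add_collection updates data limits; this applies them
    return coll
```

skipping the per-bin Patch objects may be part of why PolyCollection came out ahead of PatchCollection, though that is a guess rather than something the test measured.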

    gene tracks plotted with a combination of PolyCollection and LineCollection are roughly 40% faster
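a sketch of how a gene track could be split into just two collections: one LineCollection for the gene bodies and one PolyCollection for the exon boxes. the gene dict layout ("start", "end", "y", "exons") and the function names are assumptions for illustration, not the existing plotting code's format.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for the example
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection, PolyCollection


def gene_track_collections(genes, exon_height=0.3):
    """Build one LineCollection (gene bodies) and one PolyCollection (exons).

    Each gene is a hypothetical dict: {"start", "end", "y", "exons": [(s, e), ...]}.
    """
    segments = []
    boxes = []
    half = exon_height / 2
    for g in genes:
        y = g["y"]
        # thin line spanning the whole gene
        segments.append([(g["start"], y), (g["end"], y)])
        # one rectangle per exon, centred on the gene line
        for s, e in g["exons"]:
            boxes.append([(s, y - half), (s, y + half), (e, y + half), (e, y - half)])
    lines = LineCollection(segments, colors="k", linewidths=1)
    exons = PolyCollection(boxes, facecolors="k", edgecolors="none")
    return lines, exons


def plot_genes(ax, genes):
    lines, exons = gene_track_collections(genes)
    ax.add_collection(lines)  # one call for all gene bodies
    ax.add_collection(exons)  # one call for all exons
    ax.autoscale_view()
```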


    gene track plotting still feels slow. profiling shows that loading the gene table is the most expensive step; in fact, the gene table is loaded twice, once for each axis. pre-loading the gene table gives a 10x speedup (total time under half a second for a 20 Mb region). we will need to look into this separately, perhaps as part of #46
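one way to guarantee the table is loaded only once even though each axis asks for it is to memoize the loader, sketched here with functools.lru_cache. load_gene_table, draw_gene_axis and plot_gene_track are hypothetical names, and the sleep only simulates a slow load; passing a pre-loaded table into both axis plotters explicitly would work just as well.

```python
import time
from functools import lru_cache


@lru_cache(maxsize=None)
def load_gene_table(path):
    """Hypothetical stand-in for the expensive gene table load.

    Returns an immutable tuple so the cached object cannot be modified by
    one axis and silently affect the other.
    """
    time.sleep(0.2)  # simulate slow file parsing
    return (("GENE_A", "chr1", 1000, 5000), ("GENE_B", "chr1", 8000, 12000))


def draw_gene_axis(ax_name, genes):
    """Hypothetical per-axis plotter; just reports how many genes it received."""
    return f"{ax_name}: {len(genes)} genes"


def plot_gene_track(path):
    # both axes request the table, but only the first call actually loads it
    return [draw_gene_axis(name, load_gene_table(path)) for name in ("top", "bottom")]
```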

  2. Thomas Gilgenast reporter

    as a follow-up, we proposed querying UCSC on the fly as a solution to #46; this brings the time to plot a 20 Mb region down to 1.35 s (a 4x speedup overall, including database connection and access time)
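a sketch of building the per-region query, assuming UCSC's public MySQL server (genome-mysql.soe.ucsc.edu, user genome) and its refGene table. region_query and fetch_genes are hypothetical helpers, and opening the connection (for example with pymysql) is left to the caller. the condition txEnd > start AND txStart < end also picks up genes that only partially overlap the region.

```python
# Assumption: UCSC's public MySQL server (host genome-mysql.soe.ucsc.edu,
# user "genome", no password) and its refGene table; the connection is
# opened elsewhere, e.g. with pymysql.connect(..., database="hg19").

GENE_COLUMNS = ("name2", "chrom", "strand", "txStart", "txEnd", "exonStarts", "exonEnds")


def region_query(chrom, start, end, table="refGene"):
    """Return (sql, params) selecting every gene that overlaps chrom:start-end."""
    if not table.isidentifier():  # table names cannot be bound as parameters
        raise ValueError(f"unexpected table name: {table!r}")
    sql = (
        f"SELECT {', '.join(GENE_COLUMNS)} FROM {table} "
        "WHERE chrom = %s AND txEnd > %s AND txStart < %s"
    )
    return sql, (chrom, int(start), int(end))


def fetch_genes(connection, chrom, start, end):
    """Run the region query on an open DB-API connection and return all rows."""
    sql, params = region_query(chrom, start, end)
    cursor = connection.cursor()
    try:
        cursor.execute(sql, params)  # %s placeholders, as used by MySQL drivers
        return cursor.fetchall()
    finally:
        cursor.close()
```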

    the cells at the bottom of the notebook linked above show an example of this

    the ugliest step is now converting the DataFrame of genes to a nested dict structure for use with the existing gene plotting code, but we will probably leave this as-is
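a sketch of that conversion, assuming UCSC-style rows (one row per transcript, exonStarts/exonEnds stored as comma-separated lists such as b"100,300,") and a hypothetical target layout of {gene name: {"chrom", "strand", "start", "end", "exons"}}. merging transcripts of the same gene into the widest span plus the union of exons is also an assumption; the existing gene plotting code may expect something different.

```python
import pandas as pd


def parse_coords(blob):
    """Parse a UCSC-style comma-separated coordinate list such as b"100,300,"."""
    if isinstance(blob, bytes):
        blob = blob.decode("ascii")
    return [int(x) for x in blob.split(",") if x]


def genes_to_nested_dict(df: pd.DataFrame) -> dict:
    """Convert a one-row-per-transcript DataFrame into a hypothetical nested dict."""
    genes = {}
    for row in df.itertuples(index=False):
        exons = zip(parse_coords(row.exonStarts), parse_coords(row.exonEnds))
        gene = genes.setdefault(row.name2, {
            "chrom": row.chrom,
            "strand": row.strand,
            "start": row.txStart,
            "end": row.txEnd,
            "exons": set(),
        })
        # merge transcripts of the same gene: widest span, union of exons
        gene["start"] = min(gene["start"], row.txStart)
        gene["end"] = max(gene["end"], row.txEnd)
        gene["exons"].update(exons)
    for gene in genes.values():
        gene["exons"] = sorted(gene["exons"])
    return genes
```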

  3. Thomas Gilgenast reporter

    we should be able to apply the same concept of using Collections to speed up all the plotters (not just chipseq and gene track plotting), but we're going to defer that to be addressed by #55
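for the other plotters, a small helper could replace existing add_artist() loops without changing how the patches are styled. this is a sketch with a hypothetical add_patches name; it relies on PatchCollection's match_original=True, which makes the collection reuse each patch's own face color, edge color and line width instead of applying one style to all of them.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for the example
import matplotlib.pyplot as plt
from matplotlib.collections import PatchCollection
from matplotlib.patches import Polygon, Rectangle


def add_patches(ax, patches):
    """Hypothetical drop-in replacement for a loop of ax.add_artist(patch) calls."""
    # match_original=True keeps each patch's own colors and line widths
    collection = PatchCollection(patches, match_original=True)
    ax.add_collection(collection)
    return collection


if __name__ == "__main__":
    fig, ax = plt.subplots()
    add_patches(ax, [
        Rectangle((0, 0), 1, 1, facecolor="red"),
        Polygon([[2, 0], [3, 0], [2.5, 1]], closed=True, facecolor="blue"),
    ])
    ax.autoscale_view()
    print(len(ax.collections))
```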
