binning could be faster in certain cases

Issue #37 new
Thomas Gilgenast created an issue

Non-overlapping, equal-sized bins are easy to compute, but bins with arbitrary overlap seem to be more complicated. However, many practical bin parameter choices have the property that they can be computed by doing non-overlapping binning on a finer grid, then performing a convolution over that finer grid.

For example, 4/12 binning (4 kb wide bins, smoothing window side length 12 kb) can be computed by first binning on a 4 kb grid, then convolving with np.ones((3, 3)). To compute a mean, you can compute two quantities on the initial grid: running sum (or sum of logged counts if doing gmean) and total number of FFLJs. After passing both of these quantities through the convolution, the mean is just the ratio of the two quantities.
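A minimal sketch of the idea, using NumPy and SciPy with made-up fine-grid arrays (the variable names and data here are illustrative, not from the pipeline):

```python
import numpy as np
from scipy.signal import convolve2d

# hypothetical fine-grid quantities on a 4 kb grid: the per-bin running sum
# of counts and the per-bin number of contributing FFLJs
rng = np.random.default_rng(0)
counts_sum = rng.random((10, 10))
fflj_count = rng.integers(1, 5, size=(10, 10)).astype(float)

# a 12 kb smoothing window is 3x3 fine bins; pass both quantities through
# the same all-ones convolution
kernel = np.ones((3, 3))
smoothed_sum = convolve2d(counts_sum, kernel, mode='same')
smoothed_count = convolve2d(fflj_count, kernel, mode='same')

# the mean is just the ratio of the two smoothed quantities
mean = smoothed_sum / smoothed_count
```

For a gmean, `counts_sum` would instead hold the sum of logged counts, and the ratio would be exponentiated at the end.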

We can continue thinking about whether a similar approach works when the smoothing window width is not an odd integer multiple of the bin width, but one workaround is to pick the smoothing window width first and adapt the bin width as needed (since the bin width isn't as important, as long as it's small enough). For example, 4/16 binning could be rounded to 3.2/16.
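The rounding rule described above could be sketched as a small helper (the function name and the choice to round the bin count up to the next odd integer are assumptions, not from the issue):

```python
def adapt_bin_width(window_kb, target_bin_kb):
    """Shrink the bin width so the window is an odd multiple of it.

    Picks the number of fine bins per window nearest to the requested
    ratio, bumped to the next odd integer so the window stays centered
    on a bin, then returns the implied (possibly fractional) bin width.
    """
    n = round(window_kb / target_bin_kb)
    if n % 2 == 0:
        n += 1  # bump to odd; smaller bins are fine per the note above
    return window_kb / n

# the 4/16 case from the issue rounds to 3.2/16
adapt_bin_width(16, 4)
```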

Comments (3)

  1. Thomas Gilgenast reporter

    This is getting a priority bump because binning is probably the slowest pipeline step (thanks to recent improvements in the donut expected) and is likely to see heavy usage in the future for all 5C analyses.
