Add subsampling to distToNearest

Issue #110 resolved
Jason Vander Heiden created an issue

It might be good to add subsampling to distToNearest. There is already a --subsample argument for the density method of findThreshold, but subsampling before the distance calculation should be quicker, since the distances are the expensive step.

Assuming it works well, which we'd have to test.

Also, subsampling in distToNearest should probably be on V/J/junction groups and not the full data set, because the bottleneck is distance matrix calculation on large groups.
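To make the bottleneck argument concrete, here is a rough cost count in Python (not the package's own code). The group keys, group sizes, and the `max_rows` cap are all made up for illustration. It only counts distance evaluations, comparing the full matrix for each group against a version where at most `max_rows` query sequences per group are compared to the whole group.

```python
from collections import Counter

# Hypothetical records keyed by (V gene, J gene, junction length).
# The gene names and group sizes are illustrative, not real data.
records = (
    [("IGHV1-2", "IGHJ4", 45)] * 1000
    + [("IGHV3-23", "IGHJ6", 48)] * 50
    + [("IGHV4-34", "IGHJ5", 39)] * 10
)

# Number of sequences in each V/J/junction-length group.
group_sizes = Counter(records)


def pairwise_cost(n, max_rows=None):
    """Count distance evaluations for a group of n sequences.

    With max_rows set, only min(n, max_rows) query rows are compared
    against all n sequences in the group.
    """
    rows = n if max_rows is None else min(n, max_rows)
    return rows * n


full_cost = sum(pairwise_cost(n) for n in group_sizes.values())
capped_cost = sum(pairwise_cost(n, max_rows=100) for n in group_sizes.values())

print(full_cost, capped_cost)
```

In this toy case nearly all of the work comes from the single large group, and the cap leaves the small groups untouched, which is the reason to subsample per group rather than across the full data set.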

Comments (2)

  1. Steven Kleinstein

    Set a maximum size for each V/J/junction group. Subsample the rows of the distance matrix but keep all of its columns, so that the distance to nearest is still correct for every sequence we keep.

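A minimal sketch of the row-subsampling idea from the comment above, written in Python rather than as the package's actual implementation. The function names, the `max_rows` and `seed` parameters, and the use of plain Hamming distance are assumptions for illustration. The real distToNearest may treat identical sequences and distance models differently.

```python
import random


def hamming(a, b):
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def dist_to_nearest_subsampled(seqs, max_rows, seed=0):
    """Nearest-neighbour distance for a subsample of one group.

    seqs is assumed to be a single V/J/junction group of equal-length
    sequences. At most max_rows sequences are kept as query rows, but
    each kept row is compared against every sequence in the group
    (all columns), so its nearest distance matches the full calculation.
    Returns a dict mapping the kept index to its nearest distance.
    """
    rng = random.Random(seed)
    n = len(seqs)
    if n <= max_rows:
        rows = range(n)  # small group: keep every row
    else:
        rows = sorted(rng.sample(range(n), max_rows))

    result = {}
    for i in rows:
        # Compare against all other sequences, not just the sampled ones.
        dists = [hamming(seqs[i], seqs[j]) for j in range(n) if j != i]
        result[i] = min(dists) if dists else None
    return result


# Usage: a toy group of five junctions, keeping two query rows.
group = ["AAAA", "AAAT", "AATT", "TTTT", "AAAA"]
print(dist_to_nearest_subsampled(group, max_rows=2))
```

The cost per group drops from n x n to max_rows x n distance evaluations, while each reported distance is unchanged for the sequences that are kept. Whether the resulting distance distribution is still good enough for threshold finding is the part that would need testing.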