Add subsampling to distToNearest
Issue #110
resolved
Might be good to add subsampling to distToNearest. There is a --subsample argument for the density method of findThreshold, but it's surely quicker to do the subsampling before distance calculation. Assuming it works well, which we'd have to test.
Also, subsampling in distToNearest should probably be on V/J/junction groups and not the full data set, because the bottleneck is distance matrix calculation on large groups.
Comments (2)
- changed status to resolved
Resolved here 424709f
Set a maximum size for each V/J/junction group. Subsample the rows of the distance matrix, but keep all columns, so that the distance to the nearest neighbor is still correct for every sequence that we keep.
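
A minimal sketch of the row-subsampling idea, written in Python for illustration rather than in the package's own code. Everything here is an assumption: the function names (hamming, nearest_distances), the max_rows and seed parameters, and the use of plain Hamming distance as a stand-in for whatever distance model the real function applies. It is not the change made in 424709f. It only shows why keeping all columns preserves the nearest distance for the rows that are kept.

```python
import random


def hamming(a, b):
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def nearest_distances(junctions, max_rows=None, seed=0):
    """Distance to the nearest other sequence within one group.

    junctions: sequences from a single V/J/junction-length group.
    max_rows:  hypothetical cap on how many sequences (rows) to score.
    Returns a dict mapping each kept row index to its nearest distance.
    """
    n = len(junctions)
    rows = list(range(n))
    # Subsample rows only when the group exceeds the cap.
    if max_rows is not None and n > max_rows:
        rows = sorted(random.Random(seed).sample(rows, max_rows))

    result = {}
    for i in rows:
        # Compare against ALL columns (every other sequence in the group),
        # not just the subsampled ones, so the minimum is unchanged.
        others = [hamming(junctions[i], junctions[j]) for j in range(n) if j != i]
        result[i] = min(others) if others else None
    return result


group = ["AAAA", "AAAT", "AATT", "TTTT", "CCCC"]
full = nearest_distances(group)              # scores all 5 rows
sub = nearest_distances(group, max_rows=2)   # scores only 2 rows
# Every kept row has the same nearest distance as in the full calculation.
print(all(sub[i] == full[i] for i in sub))
```

For a group of n sequences with a cap of m rows, this does roughly m × n comparisons instead of n × n, which is where the saving on large groups would come from.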