findThreshold()
Issue #69
resolved
Dear folks!
I ran the findThreshold function on a dataset of ~30,000 sequences and it took several hours, so I have no choice but to run it overnight. Have you also encountered long run times with this function?
Sivan
Comments (3)
- marked as enhancement
- changed status to resolved
We've added a new Gaussian Mixture Model method to the findThreshold function (method="gmm"), which is significantly faster than the old smoothed density estimate. This is now the default.
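For readers wondering why a mixture model can be so much cheaper than a smoothed density: a two-component model is fitted by a fixed number of EM passes, each linear in the number of distances, and there is no bandwidth to tune. The sketch below is a minimal Python illustration under stated assumptions, not SHazaM's implementation. It assumes both modes of the distance-to-nearest distribution are roughly Gaussian, places the threshold where the two weighted component densities cross, and uses hypothetical helper names (fit_gmm2, gmm_threshold) and simulated data.

```python
import math
import random
import statistics


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def fit_gmm2(xs, iters=100):
    """Fit a two-component 1D Gaussian mixture by EM (hypothetical helper).

    Each iteration is O(n), so the total cost grows linearly with n.
    """
    n = len(xs)
    xs_sorted = sorted(xs)
    # Initialise one component from the lower half of the data, one from the upper half.
    lower, upper = xs_sorted[: n // 2], xs_sorted[n // 2 :]
    w = [0.5, 0.5]
    mu = [statistics.fmean(lower), statistics.fmean(upper)]
    sd = [max(statistics.pstdev(lower), 1e-3), max(statistics.pstdev(upper), 1e-3)]
    for _ in range(iters):
        # E-step: responsibility of component 0 for each point.
        r0 = []
        for x in xs:
            p0 = w[0] * normal_pdf(x, mu[0], sd[0])
            p1 = w[1] * normal_pdf(x, mu[1], sd[1])
            r0.append(p0 / (p0 + p1) if p0 + p1 > 0 else 0.5)
        # M-step: re-estimate weights, means and standard deviations.
        s0 = sum(r0)
        n0, n1 = max(s0, 1e-12), max(n - s0, 1e-12)
        w = [n0 / n, n1 / n]
        mu = [sum(r * x for r, x in zip(r0, xs)) / n0,
              sum((1 - r) * x for r, x in zip(r0, xs)) / n1]
        sd = [max(math.sqrt(sum(r * (x - mu[0]) ** 2 for r, x in zip(r0, xs)) / n0), 1e-3),
              max(math.sqrt(sum((1 - r) * (x - mu[1]) ** 2 for r, x in zip(r0, xs)) / n1), 1e-3)]
    return w, mu, sd


def gmm_threshold(w, mu, sd, tol=1e-9):
    """Point between the two means where the weighted densities cross (bisection)."""
    def diff(x):
        return w[0] * normal_pdf(x, mu[0], sd[0]) - w[1] * normal_pdf(x, mu[1], sd[1])

    lo, hi = min(mu), max(mu)
    f_lo = diff(lo)
    if f_lo * diff(hi) >= 0:
        raise ValueError("weighted densities do not cross between the means")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = diff(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2


random.seed(1)
# Simulated distance-to-nearest values (not real data): a tight clonal
# mode near 0.05 and a broad non-clonal mode near 0.35.
dists = ([random.gauss(0.05, 0.02) for _ in range(300)]
         + [random.gauss(0.35, 0.08) for _ in range(700)])
w, mu, sd = fit_gmm2(dists)
threshold = gmm_threshold(w, mu, sd)
print(f"means={[round(m, 3) for m in mu]}, threshold={threshold:.3f}")
```

Real distance-to-nearest distributions are bounded and often skewed, so a production method may fit other component families; the Gaussian choice here only keeps the sketch short.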
Unfortunately, yes, it is very slow. The slow step is finding an appropriate bandwidth parameter for smoothing the distance-to-nearest distribution. I checked that the results remain very robust when the data are subsampled down to 15,000 sequences, and it may be possible to subsample even further if you want to explore that.
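As a rough illustration of why bandwidth selection dominates the run time, the Python sketch below scores each candidate bandwidth by leave-one-out likelihood cross-validation. That criterion needs on the order of n² kernel evaluations per candidate, so doubling the number of sequences roughly quadruples the cost, and halving it by subsampling cuts the cost about fourfold. The sketch then places the threshold at the lowest point of the smoothed density inside a search window and repeats the calculation on a random half of the data, mirroring the subsampling check described above. The cross-validation criterion, candidate grid, search window, simulated data, and all function names are assumptions for illustration; SHazaM's own bandwidth selector may differ.

```python
import math
import random

SQRT_2PI = math.sqrt(2 * math.pi)


def kde(x, data, h):
    """Gaussian kernel density estimate at x with bandwidth h: O(n)."""
    return sum(math.exp(-0.5 * ((x - d) / h) ** 2) for d in data) / (len(data) * h * SQRT_2PI)


def loo_log_likelihood(data, h):
    """Leave-one-out log-likelihood of bandwidth h: O(n^2) kernel evaluations."""
    n = len(data)
    total = 0.0
    for i, xi in enumerate(data):
        s = sum(math.exp(-0.5 * ((xi - xj) / h) ** 2) for j, xj in enumerate(data) if j != i)
        dens = s / ((n - 1) * h * SQRT_2PI)
        total += math.log(max(dens, 1e-300))  # guard against log(0) on isolated points
    return total


def select_bandwidth(data, candidates):
    """Pick the candidate with the best LOO score; each candidate costs a full O(n^2) pass."""
    return max(candidates, key=lambda h: loo_log_likelihood(data, h))


def density_threshold(data, h, lo, hi, steps=200):
    """Grid point in [lo, hi] where the smoothed density is lowest (the valley)."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return min(grid, key=lambda x: kde(x, data, h))


random.seed(2)
# Simulated distance-to-nearest values (not real data).
full = ([random.gauss(0.05, 0.015) for _ in range(120)]
        + [random.gauss(0.40, 0.07) for _ in range(280)])
sub = random.sample(full, 200)  # random half of the data

candidates = [0.005, 0.01, 0.02, 0.04, 0.08]
results = {}
for name, data in (("full", full), ("subsample", sub)):
    h = select_bandwidth(data, candidates)
    # Search window between the two modes is an assumption of this sketch.
    results[name] = density_threshold(data, h, 0.05, 0.40)
    print(name, len(data), "h =", h, "threshold ~", round(results[name], 3))
```

If the full-data and subsample thresholds land close together, the cheaper subsampled run is probably adequate; how far one can safely subsample depends on how well populated the valley between the two modes is.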