findThreshold()

Issue #69 resolved
sivagnon@gmail.com created an issue

Dear folks!

I run the findThreshold function on a dataset of ~30,000 sequences and it takes several hours, so I have no choice but to run it overnight. Do you also see long run times with this function?

Sivan

Comments (3)

  1. Namita Gupta

    Unfortunately, yes, it is very slow. The slow step is finding the appropriate bandwidth parameter for smoothing the distance-to-nearest distribution. I did check, by subsampling down to 15,000 sequences, that the results are very robust, but it may be possible to subsample even further if you want to explore that.

  2. Jason Vander Heiden

    We've added a new Gaussian Mixture Model method to the findThreshold function (method="gmm"), which is significantly faster than the old smoothed density estimate. This is now the default.

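
The two ideas raised in this thread, subsampling the distance-to-nearest values and separating the two modes with a Gaussian mixture, can be sketched roughly as follows. This is a standard-library Python illustration only and is not the code behind findThreshold. The data are synthetic, and the names normal_pdf, fit_two_gaussians and mixture_threshold are hypothetical helpers made up for this example. It assumes a bimodal distribution with one low-distance and one high-distance mode.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, iterations=50):
    """Fit a two-component 1D Gaussian mixture with plain EM (illustrative)."""
    ordered = sorted(values)
    n = len(ordered)
    # Start the two means at the lower and upper quartiles.
    mu = [ordered[n // 4], ordered[(3 * n) // 4]]
    spread = (ordered[-1] - ordered[0]) / 4.0 or 1.0
    sigma = [spread, spread]
    weight = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of component 0 for every value.
        resp = []
        for x in ordered:
            p0 = weight[0] * normal_pdf(x, mu[0], sigma[0])
            p1 = weight[1] * normal_pdf(x, mu[1], sigma[1])
            total = p0 + p1
            resp.append(p0 / total if total > 0.0 else 0.5)
        # M-step: re-estimate weight, mean and std dev of each component.
        for k in (0, 1):
            r = resp if k == 0 else [1.0 - v for v in resp]
            mass = max(sum(r), 1e-12)
            mu[k] = sum(ri * x for ri, x in zip(r, ordered)) / mass
            var = sum(ri * (x - mu[k]) ** 2 for ri, x in zip(r, ordered)) / mass
            sigma[k] = max(math.sqrt(var), 1e-6)
            weight[k] = mass / n
    return weight, mu, sigma


def mixture_threshold(values):
    """Point between the two fitted means where the weighted densities cross."""
    weight, mu, sigma = fit_two_gaussians(values)
    (m0, s0, w0), (m1, s1, w1) = sorted(zip(mu, sigma, weight))
    lo, hi = m0, m1
    for _ in range(60):  # bisection on the sign of the density difference
        mid = (lo + hi) / 2.0
        diff = w0 * normal_pdf(mid, m0, s0) - w1 * normal_pdf(mid, m1, s1)
        if diff > 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Synthetic stand-in for ~30,000 distance-to-nearest values (assumed bimodal).
rng = random.Random(42)
distances = [
    rng.gauss(0.08, 0.02) if rng.random() < 0.6 else rng.gauss(0.35, 0.06)
    for _ in range(30000)
]

# Random subsample of 15,000 values, drawn without replacement.
subsample = rng.sample(distances, 15000)

full_threshold = mixture_threshold(distances)
sub_threshold = mixture_threshold(subsample)
print(f"full: {full_threshold:.3f}  subsample: {sub_threshold:.3f}")
```

On data like this the two thresholds should land close together, which is the kind of check described in the first reply. Real distance-to-nearest distributions may be less cleanly separated, so how far one can subsample is something to verify per dataset.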