Need UID clustering script

Issue #23 resolved
Jason Vander Heiden created an issue

UID errors seem to be a bigger problem than anticipated, particularly with data sets that have high reads per UID. We need to add a tool that creates cluster annotations by clustering UIDs, which can then be specified in the --bf flag for BuildConsensus, possibly after the data have been run through ClusterSets first. It might even be best to make it an option or subcommand of ClusterSets.

Comments (3)

  1. Jason Vander Heiden reporter

    Needs metrics for cluster error/diversity for logging/tuning/plotting purposes.

  2. Jason Vander Heiden reporter

    Can try usearch clustering. May also want to set a k-mer based pre-grouping threshold, followed by hierarchical clustering. Maybe complete linkage. Or some sort of centroid linkage where the centroid is the UID with the maximum read count.

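For discussion, here is a rough sketch of the centroid idea from the comments, where each cluster's centroid is the UID with the most reads. It is not existing pRESTO code: the function names (hamming, cluster_uids, cluster_summary), the Hamming distance threshold, and the "minor_fraction" metric are all assumptions made for illustration. The cluster label is the kind of annotation that could be handed to BuildConsensus through --bf.

```python
from collections import Counter


def hamming(a, b):
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def cluster_uids(uid_counts, max_dist=1):
    """Greedy centroid clustering of UIDs (hypothetical helper).

    uid_counts maps each UID string to its read count. UIDs are visited
    from most to least abundant, so the first member of every cluster
    (its centroid) is the UID with the maximum read count.
    Returns (labels, centroids): labels maps UID -> cluster index.
    """
    # Ties in read count are broken alphabetically to keep runs reproducible.
    ordered = sorted(uid_counts, key=lambda u: (-uid_counts[u], u))
    centroids = []
    labels = {}
    for uid in ordered:
        for idx, centroid in enumerate(centroids):
            # max_dist is an assumed tuning parameter, not a known default.
            if len(centroid) == len(uid) and hamming(centroid, uid) <= max_dist:
                labels[uid] = idx
                break
        else:
            # No centroid is close enough, so this UID starts a new cluster.
            centroids.append(uid)
            labels[uid] = len(centroids) - 1
    return labels, centroids


def cluster_summary(uid_counts, labels):
    """Per-cluster metrics for logging or tuning (assumed, not a pRESTO metric).

    minor_fraction is the share of a cluster's reads that do not carry
    the most abundant UID, a crude proxy for UID error within the cluster.
    """
    summary = {}
    for uid, label in labels.items():
        entry = summary.setdefault(label, {"uids": 0, "reads": 0, "max_reads": 0})
        entry["uids"] += 1
        entry["reads"] += uid_counts[uid]
        entry["max_reads"] = max(entry["max_reads"], uid_counts[uid])
    for entry in summary.values():
        entry["minor_fraction"] = 1 - entry["max_reads"] / entry["reads"]
    return summary


# Toy data: two abundant UIDs, each with a one-mismatch variant, plus one singleton group.
reads = (["ACGTACGT"] * 50 + ["ACGTACGA"] * 3 +
         ["TTGGCCAA"] * 20 + ["TTGGCCAT"] * 1 +
         ["GGGGCCCC"] * 5)
uid_counts = Counter(reads)
labels, centroids = cluster_uids(uid_counts, max_dist=1)
summary = cluster_summary(uid_counts, labels)

for uid, label in sorted(labels.items(), key=lambda kv: kv[1]):
    print(uid, "-> cluster", label, "centroid", centroids[label])
```

The greedy pass compares each UID against every existing centroid, which is quadratic in the worst case. A k-mer based pre-grouping step, as suggested above, would be one way to limit those comparisons, and complete linkage would need a different approach (for example, hierarchical clustering on a pairwise distance matrix).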