Need UID clustering script
Issue #23
resolved
UID errors seem to be a bigger problem than anticipated, particularly with data sets that have many reads per UID. We need to add a tool that creates cluster annotations by clustering UIDs, which can then be specified in the --bf flag for BuildConsensus, possibly after having been run through ClusterSets first. It might even be best to make it an option or subcommand of ClusterSets.
Comments (3)
-
reporter -
reporter Can try usearch clustering. May also want to set a k-mer based pre-grouping threshold, followed by hierarchical clustering. Maybe complete linkage, or some sort of centroid linkage where the centroid is the UID with the maximum read count.
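A rough sketch of the last idea, assuming equal-length UIDs and a Hamming distance cutoff. This is a simple greedy centroid approach, not complete-linkage hierarchical clustering and not what usearch or ClusterSets actually does. The names `hamming`, `cluster_uids`, and `max_dist`, and the example UIDs, are made up for illustration.

```python
from collections import Counter


def hamming(a, b):
    """Count mismatched positions between two equal-length UIDs."""
    if len(a) != len(b):
        raise ValueError("UIDs must have equal length")
    return sum(x != y for x, y in zip(a, b))


def cluster_uids(read_uids, max_dist=1):
    """Greedily assign each UID to a centroid within max_dist mismatches.

    UIDs are visited from highest to lowest read count, so every centroid
    is the most abundant UID in its cluster. A UID joins the first
    (most abundant) centroid that is close enough. Returns {uid: centroid}.
    """
    counts = Counter(read_uids)
    # Sort by descending read count, then alphabetically so ties are deterministic.
    ordered = sorted(counts, key=lambda u: (-counts[u], u))
    centroids = []
    assignment = {}
    for uid in ordered:
        for c in centroids:
            if hamming(uid, c) <= max_dist:
                assignment[uid] = c
                break
        else:
            # No existing centroid is close enough: start a new cluster.
            centroids.append(uid)
            assignment[uid] = uid
    return assignment


# Hypothetical input: one UID per read.
reads = ["ACGTACGT"] * 5 + ["ACGTACGA"] + ["TTGCAAGC"] * 3 + ["TTGCAAGG"]
clusters = cluster_uids(reads, max_dist=1)
for uid, centroid in sorted(clusters.items()):
    print(uid, "->", centroid)
```

The centroid could serve as the cluster annotation value. A k-mer pre-grouping step would mainly cut down the number of pairwise comparisons before this loop runs.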
-
reporter - changed status to resolved
Added as ClusterSets subcommands.
Needs metrics for cluster error/diversity for logging/tuning/plotting purposes.