collapseDuplicates edge case (every sequence in a clone_id is ambiguous except for one sequence)
Issue #78
resolved
Attached minimal example files.
Example behavior…
library(alakazam)
library(dplyr)
sub_df <- read.table("example.tab", sep = "\t", header = TRUE)
TEXT_FIELDS <- c("PRCONS")
NUM_FIELDS <- c("CONSCOUNT", "DUPCOUNT")
SEQ_FIELDS <- c("SEQUENCE_INPUT", "JUNCTION")
collapseDuplicates(sub_df, id = "SEQUENCE_ID", seq = "SEQUENCE_IMGT",
text_fields = TEXT_FIELDS, num_fields = NUM_FIELDS, seq_fields = SEQ_FIELDS,
add_count = TRUE, ignore = c("N", "-", ".", "?"), sep = ",",
verbose = FALSE)
Error…
Error in if (taxa %in% done_taxa) {: argument is of length zero
Traceback:
1. collapseDuplicates(sub_df, id = "SEQUENCE_ID", seq = "SEQUENCE_IMGT",
. text_fields = TEXT_FIELDS, num_fields = NUM_FIELDS, seq_fields = SEQ_FIELDS,
. add_count = TRUE, ignore = c("N", "-", ".", "?"), sep = ",",
. verbose = FALSE)
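The message itself is what base R's `if` raises when its condition has length zero: `x %in% y` returns a zero-length logical when `x` is zero-length. The sketch below reproduces only that message. The values of `taxa` and `done_taxa` are hypothetical stand-ins, and the issue does not show how `taxa` ends up empty inside `collapseDuplicates`.

```r
# Hypothetical stand-ins for the variables in the failing check.
# `taxa` is deliberately zero-length to mimic the edge case.
taxa <- character(0)
done_taxa <- c("seq1", "seq2")

# character(0) %in% done_taxa is logical(0), and `if` rejects a
# zero-length condition, which reproduces the reported error message.
msg <- tryCatch(
  {
    if (taxa %in% done_taxa) "already seen" else "new"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```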
The likely cause is line 504 of Sequence.R:
if (discard_count == nrow(d_mat)) {
We currently stop the analysis only if all sequences are discarded/ambiguous. However, if all but one sequence is discarded/ambiguous, we should also stop the analysis, i.e.:
if (discard_count == nrow(d_mat) || discard_count + 1 == nrow(d_mat)) {
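A minimal sketch of the proposed guard, assuming `d_mat` has one row per sequence in the clone and `discard_count` counts the ambiguous/discarded ones. The wrapper `stop_collapsing` is a hypothetical helper for illustration, not a function in alakazam. `||` is used rather than `|` because `if` needs a single logical value.

```r
# Hypothetical helper mirroring the proposed check on line 504 of Sequence.R.
# Returns TRUE when zero or one usable sequence is left.
stop_collapsing <- function(d_mat, discard_count) {
  discard_count == nrow(d_mat) || discard_count + 1 == nrow(d_mat)
}

# Toy 3 x 3 distance matrix standing in for three sequences in one clone.
d_mat <- matrix(0, nrow = 3, ncol = 3)

all_gone  <- stop_collapsing(d_mat, 3)  # every sequence discarded
one_left  <- stop_collapsing(d_mat, 2)  # all but one discarded: this issue
two_left  <- stop_collapsing(d_mat, 1)  # two sequences remain, keep going
print(c(all_gone, one_left, two_left))
```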
Comments (4)
- reporter: Alternatively:
if (nrow(d_mat) - discard_count <= 1) {
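A quick sanity check that the two proposed conditions agree, assuming `discard_count` always lies between 0 and `nrow(d_mat)`. The functions `old_guard` and `new_guard` are hypothetical names for the two forms, with `n` standing in for `nrow(d_mat)` and `k` for `discard_count`.

```r
# Hypothetical names for the two proposed conditions.
old_guard <- function(n, k) k == n || k + 1 == n
new_guard <- function(n, k) n - k <= 1

# Every valid combination: 0 <= k <= n, including the empty case n = 0.
grid <- expand.grid(n = 0:10, k = 0:10)
grid <- grid[grid$k <= grid$n, ]

same <- mapply(function(n, k) old_guard(n, k) == new_guard(n, k),
               grid$n, grid$k)
print(all(same))
```

Within that range the two forms are equivalent. The `<= 1` form is shorter and also returns TRUE if `discard_count` ever exceeded `nrow(d_mat)`, where the `||` form would return FALSE.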
assigned issue to
-
assigned issue to
-
reporter - changed status to resolved
Resolved with commit fdc30b8
Also, Bitbucket apparently allows only one attachment per issue, so email me if you want the full analysis.