collapseDuplicates edge case (every sequence in a clone_id is ambiguous except for one sequence)

Issue #78 resolved
Roy Jiang created an issue

Attached minimal example files.

Example behavior…

library(alakazam)
library(dplyr)

sub_df <- read.table("example.tab", sep = "\t", header = TRUE)

TEXT_FIELDS <- c("PRCONS")
NUM_FIELDS <- c("CONSCOUNT", "DUPCOUNT")
SEQ_FIELDS <- c("SEQUENCE_INPUT", "JUNCTION")

collapseDuplicates(sub_df, id = "SEQUENCE_ID", seq = "SEQUENCE_IMGT",
                   text_fields = TEXT_FIELDS, num_fields = NUM_FIELDS,
                   seq_fields = SEQ_FIELDS, add_count = TRUE,
                   ignore = c("N", "-", ".", "?"), sep = ",",
                   verbose = FALSE)

Error…

Error in if (taxa %in% done_taxa) {: argument is of length zero
Traceback:

1. collapseDuplicates(sub_df, id = "SEQUENCE_ID", seq = "SEQUENCE_IMGT", 
 .     text_fields = TEXT_FIELDS, num_fields = NUM_FIELDS, seq_fields = SEQ_FIELDS, 
 .     add_count = TRUE, ignore = c("N", "-", ".", "?"), sep = ",", 
 .     verbose = FALSE)
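
For reference, R raises this message whenever the condition passed to `if` has length zero. The sketch below reproduces it in isolation, assuming (not confirmed from the alakazam source) that `taxa` ends up empty once the ambiguous sequences are dropped. The values of `taxa` and `done_taxa` are hypothetical.

```r
# Hypothetical values; not taken from the alakazam source.
taxa <- character(0)            # assumption: nothing left to look up
done_taxa <- c("SEQ1", "SEQ2")

# %in% returns one logical per element of its left operand,
# so an empty left operand gives logical(0).
cond <- taxa %in% done_taxa
print(length(cond))

# if () needs a length-one condition; a zero-length one is an error.
msg <- tryCatch(
    {
        if (cond) "found" else "not found"
    },
    error = function(e) conditionMessage(e)
)
print(msg)
```

Guarding the call site with a length check, or returning early before this point is reached, would avoid the error.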

The likely cause is line 504 of Sequence.R:

if (discard_count == nrow(d_mat)) {

The current check stops the analysis only when every sequence is discarded as ambiguous. It should also stop when all but one sequence is discarded, i.e.:

if (discard_count == nrow(d_mat) || discard_count + 1 == nrow(d_mat)) {
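
The proposed condition is equivalent to asking whether at most one sequence survives the discard step. A minimal sketch of that check as a standalone function follows; `too_few_to_collapse` and `n_seq` are hypothetical names for illustration, with `n_seq` standing in for `nrow(d_mat)`, and none of this is alakazam code.

```r
# Hypothetical helper, not part of alakazam: TRUE when at most one
# sequence would remain after the ambiguous ones are discarded.
too_few_to_collapse <- function(discard_count, n_seq) {
    # Both inputs must be scalars so the result is safe to use in if ().
    stopifnot(length(discard_count) == 1, length(n_seq) == 1)
    (n_seq - discard_count) <= 1
}

print(too_few_to_collapse(3, 3))  # all sequences discarded
print(too_few_to_collapse(2, 3))  # all but one discarded
print(too_few_to_collapse(1, 3))  # two sequences remain
```

Writing the test as a single comparison on the number of remaining sequences covers both cases in the proposal without repeating `nrow(d_mat)`.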

Comments (4)

  1. Roy Jiang reporter

    Also, apparently Bitbucket only allows one attachment per issue. So email me if you want the full analysis.
