Error in {: task 658 failed - "missing value where TRUE/FALSE needed"
Issue #154
resolved
Greetings,
Thanks for the highly useful shazam
package. I've been using it and encountered an error with collapseClones
when running (the repertoire file repertoire_sample.tsv
is attached):
repertoire = read.table("repertoire_sample.tsv", sep='\t')
clonal_sequences = shazam::collapseClones(
db = repertoire[, c("sequence_alignment", "germline_alignment", "clone_id")],
cloneColumn="clone_id",
sequenceColumn="sequence_alignment",
germlineColumn="germline_alignment",
regionDefinition=NULL, #shazam::IMGT_V,
method="thresholdedFreq", minimumFrequency=0.6,
includeAmbiguous=FALSE, breakTiesStochastic=FALSE,
nproc=12)
The error log is:
Error in {: task 658 failed - "missing value where TRUE/FALSE needed"
Traceback:
1. shazam::collapseClones(db = repertoire[1:5000, c("sequence_alignment",
. "germline_alignment", "clone_id")], cloneColumn = "clone_id",
. sequenceColumn = "sequence_alignment", germlineColumn = "germline_alignment",
. regionDefinition = NULL, method = "thresholdedFreq", minimumFrequency = 0.6,
. includeAmbiguous = FALSE, breakTiesStochastic = FALSE, nproc = 1)
2. foreach(idx = 1:length(uniqueClonesIdx), .verbose = FALSE, .errorhandling = "stop") %dopar%
. {
. cloneIdx <- uniqueClonesIdx[[idx]]
. cloneDb <- db[cloneIdx, ]
. clone_num <- unique(cloneDb[["fields_clone_id"]])
. if (length(unique(cloneDb[[juncLengthColumn]])) > 1) {
. stop("Expecting all sequences in the same clone with the same junction lenght.")
. }
. cloneRegionDefinition <- regionDefinition
. if (regionDefinitionName %in% c("IMGT_VDJ_BY_REGIONS",
. "IMGT_VDJ")) {
. cloneRegionDefinition <- getCloneRegion(clone_num = clone_num,
. db = cloneDb, seq_col = sequenceColumn, juncLengthColumn = juncLengthColumn,
. clone_col = "fields_clone_id", regionDefinition = regionDefinition)
. }
. calcClonalConsensus(db = cloneDb, sequenceColumn = sequenceColumn,
. germlineColumn = germlineColumn, muFreqColumn = muFreqColumn,
. regionDefinition = cloneRegionDefinition, method = method,
. minimumFrequency = minimumFrequency, includeAmbiguous = includeAmbiguous,
. breakTiesStochastic = breakTiesStochastic, breakTiesByColumns = breakTiesByColumns)
. }
3. e$fun(obj, substitute(ex), parent.frame(), e$data)
The call doesn’t return an error if I significantly reduce the input db size:
clonal_sequences = shazam::collapseClones(
db = repertoire[1:10, c("sequence_alignment", "germline_alignment", "clone_id")],
cloneColumn="clone_id",
sequenceColumn="sequence_alignment",
germlineColumn="germline_alignment",
regionDefinition=NULL, #shazam::IMGT_V,
method="thresholdedFreq", minimumFrequency=0.6,
includeAmbiguous=FALSE, breakTiesStochastic=FALSE,
nproc=12)
Even though I could run it successfully if I reduce the size of repertoire
, I would like to understand what is the source of the error. It would also be useful to add an informative error message.
Thank you and looking forward,
Eugen
Comments (2)
-
-
- changed status to resolved
Yes, there are NAs in one of the columns. I have added an informative error message to the devel vesion of shazam, in 70b19e0.
With your example data, you will now get something like this:
Error in shazam::collapseClones(db = repertoire[, c("sequence_alignment", : NA values found in column(s): germline_alignment. 1 sequence(s) affected.
- Log in to comment
Hi Eugen,
I haven’t been able to check your file yet, but could you see if any of the input columns (especially the junction or junction_length column) contain missing data or NA values? That would be my guess based on the error message.
Best,
-Ken