observedMutations breaking with additional columns
A database with additional sample identifier columns prepended to the db object breaks the observedMutations
function, resulting in a cbind error.
The code below, together with the attachment, should reproduce the error.
## ---------------------------------------------------
## This code breaks on observedMutations
clones <- collapseClones(sub_db, regionDefinition=IMGT_V,
                         method="thresholdedFreq", minimumFrequency=0.6,
                         includeAmbiguous=FALSE, breakTiesStochastic=FALSE,
                         nproc=1)
observed <- observedMutations(clones,
                              sequenceColumn="CLONAL_SEQUENCE",
                              germlineColumn="CLONAL_GERMLINE",
                              regionDefinition=IMGT_V,
                              nproc=1)
#### `Error in cbind_all(x) : Argument 2 must have names` ####
## ------------------------------------------------
## This code works - drops first two columns
sub_db_1 <- sub_db[, - c(1, 2)]
clones_1 <- collapseClones(sub_db_1, regionDefinition=IMGT_V,
                           method="thresholdedFreq", minimumFrequency=0.6,
                           includeAmbiguous=FALSE, breakTiesStochastic=FALSE,
                           nproc=1)
observed_1 <- observedMutations(clones_1,
                                sequenceColumn="CLONAL_SEQUENCE",
                                germlineColumn="CLONAL_GERMLINE",
                                regionDefinition=IMGT_V,
                                nproc=1)
[thanks to Yisi for finding and for making workaround, blame Rob for adding too many damn columns]
Comments (6)
- reporter
- Does the `SEQUENCE_ID` column contain duplicate values? Or does the input already have duplicate column names (possibly duplicating the output of `observedMutations`)?
- reporter `sum(duplicated(sub_db$SEQUENCE_ID))` outputs 0, so nay on that count. The columns that are prepended (ID and flox) have duplicated values (they specify the sample ID and a condition, respectively). [Also, for clones there are no dups in `SEQUENCE_ID`.]
- changed status to wontfix
I looked into this since I rewrote these functions. Your call to `collapseClones()` did not call `observedMutations()` because the method is not one based on mutation frequency (i.e. `mostMutated` or `leastMutated`). It failed because you did not give a proper `db` to `collapseClones()`. As the doc specifies, `db` should be "data.frame containing sequence data. Required."
Upon examining `sub_db` from `sub_db.rda`, `sub_db` is more than a data.frame; it is:
> class(sub_db)
[1] "grouped_df" "tbl_df" "tbl" "data.frame"
The reason the "workaround" worked is that `sub_db_1 <- sub_db[, - c(1, 2)]` got rid of the front two items somehow.
- Ah, you probably need to do `ungroup(sub_db)` first.
- reporter @javh you're right, it was the grouping that messed up the procedure; when it's ungrouped it works perfectly fine. Removing the first two columns gets rid of the "grouped_df" type, similar to an `ungroup()` call.
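For anyone hitting the same error, here is a minimal sketch of the fix the thread settled on, assuming dplyr is installed. `toy_db` and its values are hypothetical stand-ins for `sub_db` (the real data is in the attachment), and the final conversion with `as.data.frame()` is an extra precaution, not something the thread says is required.

```r
library(dplyr)

# Hypothetical stand-in for sub_db: two prepended sample columns (ID, flox)
# plus a sequence identifier column.
toy_db <- data.frame(
    ID = c("s1", "s1", "s2"),
    flox = c("a", "a", "b"),
    SEQUENCE_ID = c("seq1", "seq2", "seq3"),
    stringsAsFactors = FALSE
)

# Grouping adds the "grouped_df" class that caused the trouble in this issue.
grouped_db <- group_by(toy_db, ID, flox)
class(grouped_db)

# Drop the grouping, then convert to a plain data.frame to be safe.
plain_db <- as.data.frame(ungroup(grouped_db))
class(plain_db)
```

The ungrouped object can then be passed to `collapseClones()` in place of the grouped one, with the same arguments as in the original report.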