observedMutations breaking with additional columns

Issue #100 wontfix
Robert Amezquita created an issue

A database with additional sample identifier columns prepended to the db object breaks the observedMutations function, resulting in a cbind error.

The code below, run on the attached data, should reproduce the error.

## ---------------------------------------------------
## This code breaks on observedMutations
clones <- collapseClones(sub_db, regionDefinition=IMGT_V, 
                         method="thresholdedFreq", minimumFrequency=0.6,
                         includeAmbiguous=FALSE, breakTiesStochastic=FALSE, 
                         nproc=1)

observed <- observedMutations(clones, 
                              sequenceColumn="CLONAL_SEQUENCE",
                              germlineColumn="CLONAL_GERMLINE",
                              regionDefinition=IMGT_V,
                              nproc=1)

#### `Error in cbind_all(x) : Argument 2 must have names` ####

## ------------------------------------------------
## This code works - drops first two columns
sub_db_1 <- sub_db[, - c(1, 2)]

clones_1 <- collapseClones(sub_db_1, regionDefinition=IMGT_V, 
                           method="thresholdedFreq", minimumFrequency=0.6,
                           includeAmbiguous=FALSE, breakTiesStochastic=FALSE, 
                           nproc=1)

observed_1 <- observedMutations(clones_1, 
                                sequenceColumn="CLONAL_SEQUENCE",
                                germlineColumn="CLONAL_GERMLINE",
                                regionDefinition=IMGT_V,
                                nproc=1)

[thanks to Yisi for finding and for making workaround, blame Rob for adding too many damn columns]

Comments (6)

  1. Jason Vander Heiden

    Does the SEQUENCE_ID column contain duplicate values? Or does the input already have duplicate column names (possibly duplicating the output columns of observedMutations)?

  2. Robert Amezquita reporter

    sum(duplicated(sub_db$SEQUENCE_ID)) outputs 0, so nay on that count. The columns that are prepended (ID and flox) have duplicated values (they specify the sample ID and a condition, respectively). [Also, for clones there are no dups in SEQUENCE_ID.]

  3. Julian Zhou

    I looked into this since I rewrote these functions. Your call to collapseClones() did not call observedMutations(), because thresholdedFreq is not one of the methods based on mutation frequency (those are mostMutated and leastMutated). It failed because you did not give a proper db to collapseClones(). As the doc specifies, db should be "data.frame containing sequence data. Required."

    Upon examining sub_db from sub_db.rda, sub_db is more than a data.frame, it is:

    > class(sub_db)
    [1] "grouped_df" "tbl_df"     "tbl"        "data.frame"
    

    The "workaround" worked because sub_db_1 <- sub_db[, - c(1, 2)] got rid of the front two items somehow.

  4. Robert Amezquita reporter

    @javh you're right, it was the grouping that messed up the procedure. When it's ungrouped it works perfectly fine. Removing the first two columns gets rid of the "grouped_df" type, similar to an ungroup() call.
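
For anyone hitting the same error, here is a minimal sketch of the ungrouping fix described above. It assumes dplyr is installed and uses a made-up toy table (toy_db, with invented values) in place of the attached sub_db; it is an illustration of the idea, not the exact code that was run.

```r
library(dplyr)

# Toy stand-in for sub_db (hypothetical values): two sample-level columns
# prepended to a sequence-level column.
toy_db <- data.frame(
  ID = c("s1", "s1", "s2"),
  flox = c("wt", "wt", "ko"),
  SEQUENCE_ID = c("seq1", "seq2", "seq3"),
  stringsAsFactors = FALSE
)

# Grouping adds "grouped_df" to the class vector, as seen for sub_db.
grouped_db <- group_by(toy_db, ID, flox)
print(class(grouped_db))

# ungroup() removes the grouping; as.data.frame() also drops the tibble classes.
plain_db <- as.data.frame(ungroup(grouped_db))
print(class(plain_db))

# Guard that could be run before handing the table to collapseClones().
stopifnot(!inherits(plain_db, "grouped_df"))
```

With the real data, the equivalent would presumably be to pass as.data.frame(ungroup(sub_db)) to collapseClones() in place of sub_db, keeping the ID and flox columns intact.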
