Adding indel support to buildPhylipLineage

Issue #54 resolved
Eric Mukherjee created an issue

It'd be super helpful to add indel support to the buildPhylipLineage function (maybe using the SEQUENCE_INPUT column?). Our data has a ton of indels (both real and erroneous), and we're really interested in seeing how indels increase diversity in our lineages of interest. I'd be happy to talk more about this by email or whatever.

Comments (5)

  1. Jason Vander Heiden

    Hey @emukherj,

    I added a mask_char option to makeChangeoClone and a dist_mat option to buildPhylipLineage, which will, hopefully, let you count indels as mismatches.

    For example:

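    library(alakazam)

    # Select one clone and build the clone object, masking gaps with "-"
    db <- subset(ExampleDb, CLONE == 3138)
    clone <- makeChangeoClone(db,
        text_fields=c("SAMPLE", "ISOTYPE"),
        num_fields="DUPCOUNT", mask_char="-")

    # Path to the PHYLIP dnapars executable
    dnapars_exec <- "~/apps/phylip-3.69/dnapars"
    # Build the lineage, scoring each gap run as a single mismatch
    graph <- buildPhylipLineage(clone, dnapars_exec,
        dist_mat=getDNAMatrix(gap=-1), rm_temp=TRUE)
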

    This:

    1. Replaces all dot and dash characters with dashes (keeping alignment gaps and converting IMGT gaps to dashes).
    2. Treats indels (runs of dashes of differing lengths) of any length as a single mismatch (gap=-1). If you instead set gap=1, each individual gap character counts as a mismatch (unless it's a gap in both sequences).
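
    To make the two steps above concrete, here is a minimal base-R sketch of the masking and gap-counting rules as described in this comment. It is not alakazam's implementation: the helper names mask_gaps and count_mismatches and the collapse_gaps argument are hypothetical, made up for illustration, and the real behavior of getDNAMatrix(gap=-1) may differ in edge cases.

```r
# Hypothetical helper: replace every "." or "-" with a single mask character.
mask_gaps <- function(seq, mask_char = "-") {
    gsub("[.-]", mask_char, seq)
}

# Hypothetical helper: count mismatches between two aligned sequences of
# equal length.
#   collapse_gaps = TRUE  : each contiguous run of gap-vs-base positions
#                           counts as one mismatch (the gap=-1 idea)
#   collapse_gaps = FALSE : each gap-vs-base position counts as one
#                           mismatch (the gap=1 idea)
count_mismatches <- function(seq1, seq2, collapse_gaps = TRUE, gap_char = "-") {
    a <- strsplit(seq1, "")[[1]]
    b <- strsplit(seq2, "")[[1]]
    stopifnot(length(a) == length(b))

    # Positions where exactly one of the two sequences has a gap
    gap_pos <- xor(a == gap_char, b == gap_char)
    # Ordinary substitutions: neither character is a gap and they differ
    sub_pos <- a != gap_char & b != gap_char & a != b

    if (collapse_gaps) {
        # Number of contiguous TRUE runs in gap_pos
        gap_count <- sum(rle(gap_pos)$values)
    } else {
        gap_count <- sum(gap_pos)
    }
    sum(sub_pos) + gap_count
}

s1 <- mask_gaps("ATGGC")
s2 <- mask_gaps("AT..C")                          # IMGT gaps become "AT--C"
count_mismatches(s1, s2, collapse_gaps = TRUE)    # 1: one gap run
count_mismatches(s1, s2, collapse_gaps = FALSE)   # 2: two gap characters
```

    Positions that are gaps in both sequences are skipped in either mode, which matches the "if it's not a gap in both sequences" caveat above.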

    I didn't test extensively, but let me know if this accomplishes what you wanted.

  2. Eric Mukherjee reporter

    Hey Jason, thanks for all your help. I tried AlignRecords just now but it didn't work; can I send you my data and the error message to see if it's something you can fix?

    Thanks!

  3. Jason Vander Heiden

    Sure, please email me the data, error message and command used and I'll see if I can figure out what's wrong.
