Adding indel support to buildPhylipLineage

Issue #54 resolved
Eric Mukherjee created an issue

It'd be super helpful to add indel support to the buildPhylipLineage function (maybe using the SEQUENCE_INPUT column?). Our data has a ton of indels (both real and erroneous), and we're really interested in seeing how indels increase diversity in our lineages of interest. I'd be happy to talk more about this by email or whatever.


  1. Jason Vander Heiden

    Hey @Eric Mukherjee,

    I added a mask_char option to makeChangeoClone and a dist_mat option to buildPhylipLineage, which will, hopefully, let you count indels as mismatches.

    For example:

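    library(alakazam)

    # Subset the example data to a single clone
    db <- subset(ExampleDb, CLONE == 3138)
    # Build the clone object; mask_char="-" converts dots and dashes to "-"
    clone <- makeChangeoClone(db,
        text_fields=c("SAMPLE", "ISOTYPE"),
        num_fields="DUPCOUNT", mask_char="-")

    # Path to the PHYLIP dnapars executable (adjust for your system)
    dnapars_exec <- "~/apps/phylip-3.69/dnapars"
    # gap=-1 counts an indel of any length as a single mismatch
    graph <- buildPhylipLineage(clone, dnapars_exec,
        dist_mat=getDNAMatrix(gap=-1), rm_temp=TRUE)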
    

    This:

    1. Replaces all dot and dash characters with dashes (so alignment gaps are kept and IMGT gaps are converted to dashes).
    2. Treats an indel (a contiguous run of dashes) of any length as a single mismatch (gap=-1). If you instead set gap=1, each individual gap character counts as a mismatch, unless it is a gap in both sequences.
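    To make the difference between the two gap settings concrete, here is a minimal base-R sketch of the counting rule described above. `indel_aware_dist` is a hypothetical helper written for illustration only, not alakazam's implementation. It assumes two pre-aligned, equal-length sequences that use only "-" as the gap character (as after mask_char="-"), scores every non-gap mismatch as 1, and ignores positions that are gaps in both sequences.

```r
# Hypothetical helper (NOT part of alakazam): illustrates the two gap modes
# for two aligned, equal-length sequences that use "-" as the only gap char.
indel_aware_dist <- function(seq1, seq2, gap = -1) {
  a <- strsplit(seq1, "")[[1]]
  b <- strsplit(seq2, "")[[1]]
  stopifnot(length(a) == length(b))

  gap_a <- a == "-"
  gap_b <- b == "-"

  # Substitutions: neither position is a gap and the characters differ
  n_sub <- sum(!gap_a & !gap_b & a != b)

  # Positions where exactly one sequence has a gap (shared gaps are ignored)
  one_gap <- xor(gap_a, gap_b)

  if (gap == -1) {
    # Each contiguous run of one-sided gap positions counts as one indel
    n_indel <- sum(rle(one_gap)$values)
  } else {
    # Otherwise every one-sided gap position is scored with the gap value
    n_indel <- gap * sum(one_gap)
  }
  n_sub + n_indel
}

# A 2-base deletion plus one substitution
indel_aware_dist("ACGT--ACGT", "ACGTTTACGA", gap = -1)  # 2
indel_aware_dist("ACGT--ACGT", "ACGTTTACGA", gap = 1)   # 3
```

    With gap=-1 the 2-base deletion adds 1 to the distance; with gap=1 it adds 2, one per gap character.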

    I didn't test it extensively, but let me know if this accomplishes what you wanted.

  2. Eric Mukherjee (reporter)

    Hey Jason, thanks for all your help. I tried AlignRecords just now, but it didn't work. Can I send you my data and the error message to see if it's something you can fix?

    Thanks!
