controversial result for the same input data

Issue #3 resolved
echo created an issue

Hi, I’m using dowser to plot phylogenetic trees. However, I got totally different plots when re-running the same code on the same input data. The two figures imply different differentiation paths. How could this happen?

Comments (9)

  1. echo reporter

    In the first figure, it looks like the ln clone differentiated from the brain clone, but the second figure implies something completely different.

  2. Kenneth Hoehn

    What are the specific commands you’re using to format the clones, build the tree, and plot the figure?

  3. echo reporter

    I used CreateGermlines.py and DefineClones.py (--act set --model ham --norm len --dist 0.16) in Change-O to create germline_d_mask and cluster the clones, and then loaded the clone-clustered file into dowser.

    The R code is:

    data<-read.delim("combine_db-pass_germ-pass_clone-pass.tsv")
    try = data[data$clone_id %in% "49",]

    clones <- formatClones(try, trait="sample")

    trees <- getTrees(clones)

    plots <- plotTrees(trees, tips="sample", tip_palette="Set1",tipsize=4)

    plots[[1]]

  4. echo reporter

    I tried it again (same input, same code), and sometimes the circle on the same line as the germline looks like this:

    So, in this figure, can I also say that the ln clone comes from the brain clone? I’m also confused: is it better to use igphyml rather than dowser to elucidate evolutionary signatures among BCR clones? (I want to know the differentiation path of the brain B cells: do they come from the ln?)

  5. Kenneth Hoehn

    I’m guessing this is due to uncertainty in the tree topology, with multiple trees being equally likely. A simple explanation is that most mutations in these sequences are in the CDR3. Because the germline CDR3 is not known, all of these trees would be equally likely. You can check this by looking at the sequence_alignment and germline_alignment_d_mask columns for that clone. You could also remove the CDR3 from the sequences by using:

    try = maskSequences(try)

    clones <- formatClones(try, trait="sample", seq="sequence_masked")

    trees <- getTrees(clones)

    If all the mutations are in the CDR3, all sequences would be the same and the trees would be flat with the above commands. It’s possible there is other ambiguity in the possible trees though.
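
    The column check suggested above can be sketched in base R. This is only an illustration under stated assumptions: mutation_positions is a hypothetical helper, not a dowser function, and the two sequences are toy strings rather than real data. It lists the positions where an observed sequence differs from its germline, skipping gaps and masked (N) germline sites, where a mutation cannot be called.

    ```r
    # Hypothetical helper (not part of dowser): positions where an observed
    # sequence differs from its germline, ignoring gaps and masked (N) sites.
    mutation_positions <- function(observed, germline) {
      obs <- strsplit(toupper(observed), "")[[1]]
      germ <- strsplit(toupper(germline), "")[[1]]
      # Compare only the overlapping part if the lengths differ
      n <- min(length(obs), length(germ))
      obs <- obs[seq_len(n)]
      germ <- germ[seq_len(n)]
      # A site is informative only if both sequences carry a real base there
      bases <- c("A", "C", "G", "T")
      informative <- obs %in% bases & germ %in% bases
      which(informative & obs != germ)
    }

    # Toy data (made up for illustration): positions 9-12 are masked in the germline
    toy_germline <- "ACGTACGTNNNNACGT"
    toy_observed <- "ACGAACGTTTTTACGA"
    mutation_positions(toy_observed, toy_germline)
    ```

    On real data, the same helper could be applied row by row to the sequence_alignment and germline_alignment_d_mask columns of the clone. Assuming the alignment is IMGT-gapped, the CDR3 conventionally starts at nucleotide position 313, so few or no informative differences before that position would be consistent with the explanation above.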

    The different tree building methods should give similar tree topologies. It’s an open question as to which is better in general for building trees.
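
    One way to put a number on how much two trees disagree, whether they come from repeated runs or from different building methods, is a topological distance. The sketch below uses ape::dist.topo, which is not mentioned in this thread and is only one possible choice. The Newick strings and tip names are made up for illustration; in a dowser workflow the phylo objects would instead come from the trees column returned by getTrees.

    ```r
    library(ape)

    # Toy trees standing in for two runs on the same clone (made-up topologies)
    tree_run1 <- read.tree(text = "(((brain1,brain2),ln1),(ln2,Germline));")
    tree_run2 <- read.tree(text = "(((brain1,ln1),brain2),(ln2,Germline));")

    # Topological distance: 0 means identical topologies,
    # larger values mean more splits differ between the two trees
    d_same <- as.numeric(dist.topo(tree_run1, tree_run1))
    d_diff <- as.numeric(dist.topo(tree_run1, tree_run2))
    d_same
    d_diff
    ```

    A distance of zero across repeated runs would suggest the topology is stable; nonzero distances would be consistent with several trees fitting the data about equally well.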

    In general, you can’t just look at a tree and tell where the clone originated, especially if it’s a very small tree. Trees are a model of the mutation events leading to the sequences. They don’t always reflect the pattern of migrations, and there is usually a lot of uncertainty in the sampling and tree building. We’re actively developing methods for using the trees for this purpose, and we’ve published one of them: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009885 which is detailed in a tutorial here: https://dowser.readthedocs.io/en/latest/vignettes/Discrete-Trait-Vignette/.
