--md option for BuildTrees.py cannot accept more than one field

Issue #154 resolved
Roy Jiang created an issue

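This appears to be caused by the way the GERM sequence is labelled in the Newick tree when more than one --md field is given. In the example output below, the germline tip is written as <CLONE>_GERM_GERM (for example 64_GERM_GERM), whereas readIgphyml reroots on paste0(CLONE, "_GERM"), so the outgroup is not found among the tip labels. The error is only encountered when the output is read back in using readIgphyml.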

Recommendations:

1. Warn or raise an error when multiple --md fields are given.
2. Either concatenate the --md fields and separate them again later, or leave the --md fields out of the FASTA headers altogether and look them up by sequence ID instead (a hashmap approach).
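The hashmap idea in the second recommendation could look roughly like the sketch below. It is only an illustration, not code from BuildTrees.py: the function names, the `SEQUENCE_ID` column name, and the record layout are all assumptions made for the example. The point is that headers carry the sequence ID alone, metadata lives in a lookup table, and the germline label no longer depends on how many --md fields were requested.

```python
# Hypothetical sketch: none of these names come from BuildTrees.py.

def split_metadata(records, md_fields, id_field="SEQUENCE_ID"):
    """Return (plain FASTA headers, lookup table of metadata keyed by sequence ID)."""
    headers = []
    metadata = {}
    for rec in records:
        seq_id = rec[id_field]
        headers.append(seq_id)  # header carries the ID only, no metadata
        metadata[seq_id] = {field: rec[field] for field in md_fields}
    return headers, metadata


def germline_label(clone_id):
    """Germline tip label in the <CLONE>_GERM form that readIgphyml looks up."""
    return f"{clone_id}_GERM"


# Assumed example rows, using the CREGION and STATUS values seen in the report.
records = [
    {"SEQUENCE_ID": "GTTGTTAGGGGAATTAT", "CREGION": "IgM", "STATUS": "Blood"},
    {"SEQUENCE_ID": "CGGTTAACATGAAGTA", "CREGION": "IgM", "STATUS": "EM"},
]
headers, metadata = split_metadata(records, ["CREGION", "STATUS"])
print(headers)
print(metadata["CGGTTAACATGAAGTA"]["STATUS"])
print(germline_label(64))
```

With this layout any number of metadata fields can be attached after the tree is read back in, at the cost of having to ship the lookup table alongside the tree output.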

Example of the error, where the two --md fields are CREGION and STATUS:

BuildTrees.py -d /tmp/RtmpvprCSd/1f15ff111827c6.tab --md CREGION STATUS --collapse --igphyml --outdir /tmp/RtmpvprCSd

[ruoyi@localhost ~]$ cat /tmp/RtmpvprCSd/1f15ff111827c6_igphyml-pass.tab
CLONE NSEQ NSITE TREE_LENGTH LHOOD KAPPA_MLE OMEGA_FWR_MLE OMEGA_CDR_MLE WRC_2_MLE GYW_0_MLE WA_1_MLE TW_0_MLE SYC_2_MLE GRS_0_MLE TREE
REPERTOIRE 17 121.0000 0.0794 -269.7126 1.1754 0.9969 0.7190 -0.9900 0.3775 -0.9595 -0.9900 -0.9900 0.1944 `igphyml --repfile /tmp/RtmpvprCSd/1f15ff111827c6_lineages_gy.tsv -m HLP --run_id hlp --threads 1 -o lr --omega e,e -t e --motifs WRC_2:0,GYW_0:1,WA_1:2,TW_0:3,SYC_2:4,GRS_0:5 --hotness e,e,e,e,e,e --oformat tab --outname /tmp/RtmpvprCSd/1f15ff111827c6_igphyml-pass.tab `
64 12 126 0.2382 -199.7973 1.1754 0.9969 0.7190 -0.9900 0.3775 -0.9595 -0.9900 -0.9900 0.1944 (((GTTGTTAGGGGAATTAT_IgM_Blood:0.0000004587,GGGAAACCACAACAGAT_IgM_Blood:0.1004575634):0.0081976464,(((((GAAGATCCTAGACAAAA_IgM_Blood:0.0081956606,ACTACTGCTTATAACAG_IgM_Blood:0.0082185742):0.0000006321,CGGTTAACATGAAGTA_IgM_EM:0.0000000100):0.0000006623,TAAGATTTCAATCGGAT_IgG_Blood:0.0000000100):0.0000005433,TTAGTAGCGAAAAGGTG_IgM_Blood:0.0000000100):0.0000008593,((((TTTTGATGTTCAGGCGC_IgM_Blood:0.0475059979,TGGTCTGTGCCTCTAAC_IgM_Blood:0.0357340864):0.0001482744,CTCAATATTTCAACGGG_IgM_Blood:0.0101405192):0.0003165248,CACCGAACGTTTTATGC_IgM_Blood:0.0101843372):0.0008570526,TCGGTTGTGTATCTGAC_IgM_Blood:0.0081070635):0.0000924770):0.0000009152):0.0000000100,64_GERM_GERM:0.0000100000);
1404 3 116 0.0000 -17.1915 1.1754 0.9969 0.7190 -0.9900 0.3775 -0.9595 -0.9900 -0.9900 0.1944 (((GCCTTTGTAACATTCTT_IgM_Blood:0.0000004398,GACGGATAACGTTAGTA_IgG_Blood:0.0000004398):0.0000000100,ACGCCGACATGTTGAC_IgM_EM:0.0000004398):0.0000004398,1404_GERM_GERM:0.0000100000);
543 2 121 0.0000 -52.7238 1.1754 0.9969 0.7190 -0.9900 0.3775 -0.9595 -0.9900 -0.9900 0.1944 ((CTAAGTATACATTAGTG_IgM_Blood:0.0000004398,CTCTACGTCTTTAGGG_IgM_EM:0.0000004398):0.0000004398,543_GERM_GERM:0.0000100000);

trees_df <- readIgphyml('/tmp/RtmpvprCSd/1f15ff111827c6_igphyml-pass.tab')

Error in root.phylo(phy = tree, outgroup = germid, resolve.root = T, edge.label = TRUE): specified outgroup not in labels of the tree
Traceback:

1. readIgphyml("/tmp/RtmpvprCSd/1f15ff111827c6_igphyml-pass.tab")
2. rerootGermline(tree, paste0(df[["CLONE"]][i], "_GERM"), resolve = TRUE)
3. ape::root(phy = tree, outgroup = germid, resolve.root = T, edge.label = TRUE)
4. root.phylo(phy = tree, outgroup = germid, resolve.root = T, edge.label = TRUE)
5. stop("specified outgroup not in labels of the tree")
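The mismatch behind this traceback can be checked outside R with a few lines of Python. This is a rough diagnostic sketch, not part of either tool: the helper name and the regular expression are assumptions, and the regex only handles simple unquoted Newick labels like the ones in the output above. It pulls the tip labels out of the tree for clone 543 and tests for the `<CLONE>_GERM` label that `rerootGermline` is called with.

```python
import re


def tip_labels(newick):
    """Extract tip labels: names that follow '(' or ',' and precede ':'."""
    return re.findall(r"[(,]([^(),:;]+):", newick)


# Tree for clone 543, copied from the igphyml-pass.tab output in this report.
newick = (
    "((CTAAGTATACATTAGTG_IgM_Blood:0.0000004398,"
    "CTCTACGTCTTTAGGG_IgM_EM:0.0000004398):0.0000004398,"
    "543_GERM_GERM:0.0000100000);"
)

labels = tip_labels(newick)
expected = "543_GERM"  # paste0(CLONE, "_GERM") in the traceback

print(expected in labels)  # False: the exact label is absent
print([lab for lab in labels if lab.startswith(expected)])  # ['543_GERM_GERM']
```

The germline tip is present, but under a longer label than the one used as the outgroup, which matches the "specified outgroup not in labels of the tree" message.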
