CreateGermlines gives warning about IMGT-numbering spacers
Hello! I am trying to run the Immcantation pipeline on some data I have. Everything goes smoothly up to CreateGermlines, which gives the warning:
WARNING> Germline reference sequences do not appear to contain IMGT-numbering spacers. Results may be incorrect.
and no sequences pass this step.
The germline files I am using are the ones downloaded via the fetchimgt.sh script in the webinar folder and the V sequences generated by TIgGER with findNovelAlleles, inferGenotype, genotypeFasta and writeFasta (which seem to have all the IMGT-numbering spacers one could hope for).
Am I doing something silly I have not realized?
Command I am using and output:
CreateGermlines.py -d WTCHG_460561_701501_igh_genotyped_clone-pass.tab -r ../data/imgt/human/vdj/*IGH[DJ].fasta genotype/WTCHG_460561_701501_v_genotype.fasta -g full --cloned --vf V_CALL_GENOTYPED --failed --log WTCHG_460561_701501_CG.log --cloned
START> CreateGermlines
FILE> WTCHG_460561_701501_igh_genotyped_clone-pass.tab
GERM_TYPES> full
SEQ_FIELD> SEQUENCE_IMGT
V_FIELD> V_CALL_GENOTYPED
D_FIELD> D_CALL
J_FIELD> J_CALL
CLONED> True
CLONE_FIELD> CLONE
PROGRESS> 11:34:53 |Sorting by clone | 0.0 min
PROGRESS> 11:34:56 |Done | 0.0 min
PROGRESS> 11:34:58 |####################| 100% (14,457) 0.0 min
OUTPUT> None
RECORDS> 14457
PASS> 0
FAIL> 14457
END> CreateGermlines
Head of the fasta file with the V sequences:
>IGHV1-2*02
CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTGGGGCCTCAGTGAAG
GTCTCCTGCAAGGCTTCTGGATACACCTTC............ACCGGCTACTATATGCAC
TGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAACCCTAAC...
...AGTGGTGGCACAAACTATGCACAGAAGTTTCAG...GGCAGGGTCACCATGACCAGG
GACACGTCCATCAGCACAGCCTACATGGAGCTGAGCAGGCTGAGATCTGACGACACGGCC
GTGTATTACTGTGCGAGAGA
>IGHV1-3*01
CAGGTCCAGCTTGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTGGGGCCTCAGTGAAG
GTTTCCTGCAAGGCTTCTGGATACACCTTC............ACTAGCTATGCTATGCAT
TGGGTGCGCCAGGCCCCCGGACAAAGGCTTGAGTGGATGGGATGGATCAACGCTGGC...
...AATGGTAACACAAAATATTCACAGAAGTTCCAG...GGCAGAGTCACCATTACCAGG
GACACATCCGCGAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAAGACACGGCT
GTGTATTACTGTGCGAGAGA
Comments (8)
-
reporter -
Can you check the FASTA file output by TIgGER, that is, genotype/WTCHG_460561_701501_v_genotype.fasta?
I think this might be because the germline FASTA output by TIgGER has IMGT gaps as "---" instead of "...". So if in that file you see things like ATGCATGCC---TTTATG, try changing all instances of "---" to "...".
I remember running into this issue the first time I tried including inferred novel germline alleles from TIgGER, and I've always made a mental note since to switch the "---"s to "..."s before passing it to Change-O.
If this doesn't work, then we'll have to wait for @javh.
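For anyone wanting to script that substitution, here is a minimal Python sketch. It assumes every "-" on a sequence line is a gap character and leaves header lines alone, since allele names such as IGHV4-59*01_T288C contain hyphens themselves. The function names and the file-based helper are hypothetical, not part of TIgGER or Change-O.

```python
def dashes_to_dots(fasta_text):
    """Return FASTA text with '-' replaced by '.' on sequence lines only.

    Header lines (starting with '>') are copied unchanged because allele
    names such as IGHV4-59*01_T288C legitimately contain '-'.
    """
    converted = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            converted.append(line)                    # keep allele name as-is
        else:
            converted.append(line.replace("-", "."))  # assumed: '-' is a gap
    return "\n".join(converted) + "\n"


def convert_file(src_path, dst_path):
    """Hypothetical helper: read one FASTA file and write the converted copy."""
    with open(src_path) as src:
        text = src.read()
    with open(dst_path, "w") as dst:
        dst.write(dashes_to_dots(text))
```

Writing to a new file rather than editing in place keeps the original TIgGER output around for comparison.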
-
reporter This is what the novel V sequences files look like:
>IGHV1-2*02
CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTGGGGCCTCAGTGAAG
GTCTCCTGCAAGGCTTCTGGATACACCTTC............ACCGGCTACTATATGCAC
TGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAACCCTAAC...
...AGTGGTGGCACAAACTATGCACAGAAGTTTCAG...GGCAGGGTCACCATGACCAGG
GACACGTCCATCAGCACAGCCTACATGGAGCTGAGCAGGCTGAGATCTGACGACACGGCC
GTGTATTACTGTGCGAGAGA
>IGHV1-3*01
CAGGTCCAGCTTGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTGGGGCCTCAGTGAAG
GTTTCCTGCAAGGCTTCTGGATACACCTTC............ACTAGCTATGCTATGCAT
TGGGTGCGCCAGGCCCCCGGACAAAGGCTTGAGTGGATGGGATGGATCAACGCTGGC...
...AATGGTAACACAAAATATTCACAGAAGTTCCAG...GGCAGAGTCACCATTACCAGG
GACACATCCGCGAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAAGACACGGCT
GTGTATTACTGTGCGAGAGA
I think they are fine...
-
You were looking at the non-novel ones in the file. Look for allele names containing a suffix like "_T288C"; those are the ones added by TIgGER. For example, scrolling through one of the FASTA files I got from TIgGER, I found things like
>IGHV4-39*07
CAGCTGCAGCTGCAGGAGTCGGGCCCA...GGACTGGTGAAGCCTTCGGAGACCCTGTCC
CTCACCTGCACTGTCTCTGGTGGCTCCATCAGC......AGTAGTAGTTACTACTGGGGC
TGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGAGTATCTATTATAGT...
......GGGAGCACCTACTACAACCCGTCCCTCAAG...AGTCGAGTCACCATATCAGTA
GACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCC
GTGTATTACTGTGCGAGAGA
>IGHV4-59*01_T288C
CAGGTGCAGCTGCAGGAGTCGGGCCCA---GGACTGGTGAAGCCTTCGGAGACCCTGTCC
CTCACCTGCACTGTCTCTGGTGGCTCCATC------------AGTAGTTACTACTGGAGC
TGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGTATATCTATTACAGT---
------GGGAGCACCAACTACAACCCCTCCCTCAAG---AGTCGAGTCACCATATCAGTA
GACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCC
GTGTATTACTGTGCGAGAGA
Notice how the non-novel allele IGHV4-39*07 has all gaps as "...", whereas the novel allele IGHV4-59*01_T288C has all gaps as "---".
-
reporter Yes, I was a fool and trusted the first few sequences. I found the novel sequences and I have exactly the situation you described. I'll change that and try again, but I am pretty confident it is just that.
Thank you so much!
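Since the first few records of a genotype FASTA can look fine while later ones do not, a quick per-record report of gap characters may save some scrolling. The following is a rough Python sketch; the function name is hypothetical and it simply collects which of "." and "-" appear in each record's sequence.

```python
def gap_styles(fasta_text):
    """Map each FASTA record name to the set of gap characters it contains.

    Only '.' and '-' are tracked; a record with no gaps maps to an empty set.
    """
    styles = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]            # record name without the leading '>'
            styles[name] = set()
        elif name is not None:
            styles[name].update(ch for ch in line if ch in ".-")
    return styles


example = ">IGHV4-39*07\nCCA...GGA\n>IGHV4-59*01_T288C\nCCA---GGA\n"
for allele, chars in gap_styles(example).items():
    print(allele, sorted(chars))
```

Any record whose set contains "-" is a candidate for the dash-to-dot substitution discussed above.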
-
Hrm. Even if the novel alleles are missing the "." characters, which will cause the germline reconstruction for those alleles to fail, that warning about the missing "IMGT-numbering spacers" shouldn't occur. It really only checks that some sequences have them, not all.
We're trying to get a TIgGER release together, so I'll make a note to fix the output there, but something else might be going on.
If the fix @jqz suggested doesn't work, could you email the input files (germlines and tab file) to immcantation@googlegroups.com? We can take a look. Your command looks fine, so I don't have any suggestion that wouldn't require some debugging.
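To illustrate the "some, not all" point: a check of that shape passes as soon as any one reference sequence contains a "." spacer, so a file mixing "..." and "---" records would not be expected to trigger the warning. The snippet below is only a sketch of that logic under this assumption; it is not the actual Change-O code, and the function name is made up.

```python
def any_has_spacers(sequences):
    """Return True if at least one sequence contains a '.' spacer.

    Illustrative only: mirrors the 'some, not all' behaviour described
    in the comment above, not Change-O's real implementation.
    """
    return any("." in seq for seq in sequences)


mixed = ["CCA...GGA", "CCA---GGA"]   # one IMGT-style record, one dash-gapped
dashes_only = ["CCA---GGA"]          # no '.' spacers anywhere

print(any_has_spacers(mixed))        # a single '.' record is enough
print(any_has_spacers(dashes_only))
```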
-
reporter The problem persists after changing the spacers.
I sent you an email, thank you for your help!
-
- changed status to resolved
Seems to have fixed itself. Please reopen if it pops up again.