_imgt_check fails for mouse TRA contigs
Issue #198
resolved
Hi, thanks for all the great tools. I’m noticing an issue when running MakeDb.py on mouse TCR data: TRA contigs are flagged as invalid at the _imgt_check step.
In my test case, the second sequence was the first TRA contig:
with open(aligner_file, 'r') as f:
    parse_iter = parser(f, seq_dict, references, regions=regions, asis_calls=asis_calls, infer_junction=infer_junction)
    germ_iter = (addGermline(x, references, amino_acid=amino_acid) for x in parse_iter)
    count = 0
    for record in germ_iter:
        count += 1
        # Inspect only the second record (the first TRA contig), then stop
        if count == 2:
            print(record.v_call)
            print(record.j_call)
            print(record.functional)
            print(record.sequence_imgt)
            print(record.junction)
            print(_imgt_check(record))
            break
and the outputs are:
TRAV3-3*01
TRAJ23*01
True
GGCGAGCAGGTGGAGCAGCGCCCT...CCTCACCTGAGTGTCCGGGAGGGAGACAGTGCCGTTATCACCTGCACCTACACAGACCCTAAC..................AGTTATTACTTCTTCTGGTACAAGCAAGAGCCGGGGGCAAGTCTTCAGTTGCTTATGAAGGTTTTCTCAAGT.........ACGGAAATAAACGAAGGA...............CAAGGATTCACTGTCCTACTGAACAAGAAAGAC...AAACGACTCTCTCTGAACCTCACAGCTGCCCATCCTGGGGACTCAGCCGCGTACTTCTGCGCAGTCAGTGCATTTGGCTATAACCAGGGGAAGCTTATCTTTGGACAGGGAACCAAGTTATCTATCAAGCCCA
TGCGCAGTCAGTGCATTTGGCTATAACCAGGGGAAGCTTATCTTT
False
TGCGCAGTCAGTGCATTTGGCTATAACCAGGGGAAGCTTATCTTT is the junction, and it matches what’s reported in the igblast.fmt7 output. But within _imgt_check, record.sequence_imgt[x:y] gives:
TACTTCTGCGCAGTCAGTGCATTTGGCTATAACCAGGGGAAGCTT
so check = (rec.junction == rec.sequence_imgt[x:y]) fails.
Any potential ideas on what could be going on?
Comments (2)
- changed status to resolved
Hi. Some mouse TRA germlines have extra IMGT-numbered positions that push the CDR3 start position past 312 in the alignment. MakeDb checks that the junction starts at 312, and in these cases the check fails. Until we implement a solution, you can use the --partial flag to skip this check, but you will then need to filter out records without a V call, a J call, and a junction sequence.
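The filtering step mentioned above could look roughly like the sketch below. It is only an illustration, not part of Change-O: it assumes the MakeDb output is a tab-separated file with columns named v_call, j_call and junction (the same names as the record attributes printed in the question), and the helper name keep_complete, the contig IDs and the shortened junction string are made up for the example.

```python
import csv
import io

# Columns that must be non-empty for a record to be kept.
# Assumption: the output table uses these column names.
REQUIRED = ("v_call", "j_call", "junction")


def keep_complete(rows, required=REQUIRED):
    """Yield only the rows in which every required field is non-empty."""
    for row in rows:
        if all((row.get(col) or "").strip() for col in required):
            yield row


# Tiny in-memory stand-in for a MakeDb output table (hypothetical contents).
example = io.StringIO(
    "sequence_id\tv_call\tj_call\tjunction\n"
    "contig_1\tTRAV3-3*01\tTRAJ23*01\tTGCGCAGTC\n"
    "contig_2\tTRAV3-3*01\t\t\n"
)

kept = list(keep_complete(csv.DictReader(example, delimiter="\t")))
print([row["sequence_id"] for row in kept])
```

For a real file, the io.StringIO object would be replaced by an opened TSV, and the kept rows written back out with csv.DictWriter.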
If you submit to IMGT the reference germline sequence:
You will see the V-region alignment has additional positions 8A and 84A (not shown here):
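As a rough consistency check, two extra codon positions would account for six nucleotides, which appears to match the offset between the junction and the slice that _imgt_check compared in the question. The sketch below measures that offset using only the ungapped 3' tail of the reported sequence_imgt; the variable names are illustrative, and linking the shift to positions 8A and 84A is an inference from this thread rather than something the code proves.

```python
# 3' tail of record.sequence_imgt from the question (the gapped 5' part is omitted).
imgt_tail = (
    "GCCGCGTACTTCTGCGCAGTCAGTGCATTTGGCTATAACCAGGGGAAGCTTATCTTTGGACAGGG"
)

# Junction reported by IgBLAST and stored in record.junction.
junction = "TGCGCAGTCAGTGCATTTGGCTATAACCAGGGGAAGCTTATCTTT"

# What record.sequence_imgt[x:y] returned inside _imgt_check.
observed_slice = "TACTTCTGCGCAGTCAGTGCATTTGGCTATAACCAGGGGAAGCTT"

# Distance (in nucleotides) between where the check looked and where
# the junction actually begins in the IMGT-gapped sequence.
shift = imgt_tail.find(junction) - imgt_tail.find(observed_slice)

print(f"junction starts {shift} nt ({shift // 3} codons) downstream of the checked slice")
```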