_imgt_check fails for mouse TRA contigs

Issue #198 resolved
zewen.tuong created an issue

Hi, thanks for all the great tools. I'm noticing an issue when running MakeDb.py on mouse TCR data: TRA contigs are flagged as invalid at the _imgt_check step.

In my test case, the second sequence was the first TRA contig:

with open(aligner_file, 'r') as f:
    parse_iter = parser(f, seq_dict, references, regions=regions, asis_calls=asis_calls, infer_junction=infer_junction)
    germ_iter = (addGermline(x, references, amino_acid=amino_acid) for x in parse_iter)
    # Inspect only the second record (the first TRA contig), then stop
    for count, record in enumerate(germ_iter, start=1):
        if count == 2:
            print(record.v_call)
            print(record.j_call)
            print(record.functional)
            print(record.sequence_imgt)
            print(record.junction)
            print(_imgt_check(record))
            break

and the outputs are:

TRAV3-3*01
TRAJ23*01
True
GGCGAGCAGGTGGAGCAGCGCCCT...CCTCACCTGAGTGTCCGGGAGGGAGACAGTGCCGTTATCACCTGCACCTACACAGACCCTAAC..................AGTTATTACTTCTTCTGGTACAAGCAAGAGCCGGGGGCAAGTCTTCAGTTGCTTATGAAGGTTTTCTCAAGT.........ACGGAAATAAACGAAGGA...............CAAGGATTCACTGTCCTACTGAACAAGAAAGAC...AAACGACTCTCTCTGAACCTCACAGCTGCCCATCCTGGGGACTCAGCCGCGTACTTCTGCGCAGTCAGTGCATTTGGCTATAACCAGGGGAAGCTTATCTTTGGACAGGGAACCAAGTTATCTATCAAGCCCA
TGCGCAGTCAGTGCATTTGGCTATAACCAGGGGAAGCTTATCTTT
False

TGCGCAGTCAGTGCATTTGGCTATAACCAGGGGAAGCTTATCTTT is the junction and is what’s reported in the igblast.fmt7 output

but within _imgt_check, the output of record.sequence_imgt[x:y] is:

TACTTCTGCGCAGTCAGTGCATTTGGCTATAACCAGGGGAAGCTT

so check = (rec.junction == rec.sequence_imgt[x:y]) fails.
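The size of the mismatch can be measured directly from the strings printed above. The following is a minimal sketch, not part of MakeDb: the helper name find_junction_shift is hypothetical, and to keep it short it uses only the 3' portion of record.sequence_imgt (everything after the last gap). Because the result is a relative offset, the truncation does not affect it.

```python
def find_junction_shift(sequence_imgt, junction, observed_slice):
    """Return how many nucleotides downstream of the slice start the junction begins.

    Hypothetical helper for illustration; it is not part of changeo.
    """
    junction_start = sequence_imgt.find(junction)
    slice_start = sequence_imgt.find(observed_slice)
    if junction_start < 0 or slice_start < 0:
        raise ValueError("junction or slice not found in sequence_imgt")
    return junction_start - slice_start


# 3' portion of record.sequence_imgt from the output above (after the last gap)
sequence_imgt_tail = (
    "AAACGACTCTCTCTGAACCTCACAGCTGCCCATCCTGGGGACTCAGCCGCGTACTTCTGC"
    "GCAGTCAGTGCATTTGGCTATAACCAGGGGAAGCTTATCTTTGGACAGGGAACCAAGTTA"
    "TCTATCAAGCCCA"
)
# record.junction, as reported by IgBLAST
junction = "TGCGCAGTCAGTGCATTTGGCTATAACCAGGGGAAGCTTATCTTT"
# What record.sequence_imgt[x:y] returned inside _imgt_check
observed_slice = "TACTTCTGCGCAGTCAGTGCATTTGGCTATAACCAGGGGAAGCTT"

shift = find_junction_shift(sequence_imgt_tail, junction, observed_slice)
print(shift)  # 6
```

With these strings the junction begins 6 nucleotides (two codons) downstream of where the slice starts, so the comparison in _imgt_check cannot succeed for this record.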

Any potential ideas on what could be going on?

Comments (2)

  1. ssnn

    Hi. Some mouse TRA germlines have extra IMGT-numbered positions that push the CDR3 start position past 312 in the alignment. MakeDb checks that the junction starts at 312, and in these cases the check fails. Until we implement a solution, you can use the --partial flag to skip this check, but you will then need to filter out records without a V call, J call, and junction sequence.

    If you submit to IMGT the reference germline sequence:

    >TRAV3-3*01
    ggcgagcaggtggagcagcgccctcctcacctgagtgtccgggagggagacagtgccgttatcacctgcacctacacagaccctaacagttattacttcttctggtacaagcaagagccgggggcaagtcttcagttgcttatgaaggttttctcaagtacggaaataaacgaaggacaaggattcactgtcctactgaacaagaaagacaaacgactctctctgaacctcacagctgcccatcctggggactcagccgcgtacttctgcgcagtcagtg
    

    You will see that the V-region alignment has additional positions 8A and 84A (84A is not shown here):

                                        <----------------------------------------------  FR1 - IMGT 
    
                                        1               5               8A      10                  
    
    TRAV3-3*01                          ggc gag cag gtg gag cag cgc cct ... cct cac ctg agt gtc cgg 
    
    AC003997 Musmus TRAV3-3*01 F        --- --- --- --- --- --- --- --- ... --- --- --- --- --- --- 
    
