MakeDb all TRA goes to fail file

Issue #185 resolved
mathis nozais created an issue

Hello,

I’m new to TCR analysis and I’m trying to use IgBLAST through AssignGenes and then MakeDb. I run MakeDb like this: MakeDb.py igblast -i all_contig_igblast.fmt7 -s all_contig.fasta -r /usr/local/share/germlines/imgt/mouse/vdj/imgt_mouse_TR*.fasta --10x all_contig_annotations.csv --failed --extended

In the db-pass file I have all my TRB sequences but no TRA; I have to go into the fail file to find them. I don’t understand why they are filtered out. Most of the TRA records are labelled as productive and have a V and J assignment and a junction region (which seems to be shorter than in the pass file).

Does anyone have an idea about this issue? Maybe I’m doing something wrong. I have attached an extract of the two TSV outputs from MakeDb.

Have a good day !

Comments (5)

  1. Jason Vander Heiden

    Greetings @mathis nozais, I don’t see an obvious problem in the attached files. About half of the failed sequences don’t have a J gene or junction, so those are obvious, but I’m not sure about the other half. Would it be possible for you to email the three input files (-i, -s, and --10x arguments) to immcantation@googlegroups.com, please? We can test on our end and see if we have the same problem.

    Also, you could try adding the --partial argument to MakeDb. That won’t necessarily fix the issue, but it will relax some of the checks and then most records will end up in one file.

  2. mathis nozais reporter

    Thanks for your quick answer! Yes, for those sequences it’s obvious, but my issue is with the other half, which have a productive TCR and a V and J assignment. In fact I already use the --partial argument and then filter the TRA records myself. However, the MakeDb documentation defines an incomplete VDJ alignment as one missing a V or J call, a productivity call, and so on. Since I have that information for half of my failed sequences, the only remaining possibility was a junction issue. But the junctions seem to be fine, so I should be able to use those sequences.

    I have sent the email with my 3 input files.

  3. Jason Vander Heiden

    Hi Mathis,

    The unexpected failures are due to a few mouse TRA germlines that have extra IMGT-numbered positions. When I add the --log argument to MakeDb-igblast I see a lot of these:

            ID> AAAGCAACACAACTGT-1_contig_1
        V_CALL> TRAV14D-1*01,TRAV14N-1*01
        D_CALL> None
        J_CALL> TRAJ31*01,TRAJ31*02
    PRODUCTIVE> True
         ERROR> Junction does not match the sequence starting at position 310 in the IMGT numbered V(D)J sequence.
    

    Most of these are TRAV14D-1*01 and other germlines with a similar structure (e.g., TRAV6D-7*04). These are failing because the germlines have two additional amino acid positions, 8A (actually 9) and 84A (actually 86), which push the CDR3 start position past 312 in the alignment. If you drop one of these into IMGT/V-QUEST you'll see:

                                        <----------------------------------------------  FR1 - IMGT 
                                        1               5               8A      10                  
    3                                   cag cag cag gtg aga caa agt ccc ... caa tct ctg aca gtc tgg 
    L77150 Musmus TRAV14D-1*01 F        --- --- --- --- --- --- --- --- ... --- --- --- --- --- ---
    
                                        --------------------------------------  FR3 - IMGT  --------
                                        75                  80                  84A 85              
    3                                   cga ttc aca atc ttc ttc aat aaa agg gag ... aaa aag ctc tcc 
    L77150 Musmus TRAV14D-1*01 F        --- --- --- --- --- --- --- --- --- --- ... --- --- --- --- 
    

    This is something largely specific to Rhesus IGL and mouse TRA germlines. We have a partial solution for Rhesus implemented as --regions rhesus-igl.
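
The position shift caused by extra numbered columns such as 8A and 84A can be illustrated with a little arithmetic. The sketch below is not MakeDb's code; it only assumes that each IMGT codon occupies three nucleotide columns, that the junction begins at codon 104 (the conserved 2nd-CYS), and that every extra numbered position upstream adds one full 3-nucleotide column to the gapped alignment. The function name `codon_start` is hypothetical.

```python
# Illustrative arithmetic only; not taken from the MakeDb source.
JUNCTION_CODON = 104  # assumed: the IMGT junction starts at the 2nd-CYS codon


def codon_start(codon, extra_columns=0):
    """Return the 1-based nucleotide position of the first base of `codon`.

    `extra_columns` is the number of additional 3-nt codon columns
    (such as 8A or 84A) inserted upstream of that codon.
    """
    return 3 * (codon - 1) + 1 + 3 * extra_columns


standard = codon_start(JUNCTION_CODON)                   # no extra columns
shifted = codon_start(JUNCTION_CODON, extra_columns=2)   # 8A and 84A present
print(standard, shifted)
```

Under these assumptions the junction is expected at position 310 in the standard numbering but would sit six nucleotides further along when both extra columns are present, which is consistent with the "starting at position 310" error in the log above.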

    We’re seeing more paired TRB:TRA and heavy:light data, so we need a better solution for this. For now, I recommend just going with the --partial flag and manually filtering out records without a V, J and junction sequence.
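
A minimal sketch of that manual filtering step, using only the Python standard library. It assumes the MakeDb output is a tab-separated file with AIRR-style column names `v_call`, `j_call` and `junction`; if your output uses different column names, adjust `REQUIRED` accordingly. The function names and the file names in the usage note are hypothetical.

```python
import csv

# Assumed AIRR-style column names; change these if your file differs.
REQUIRED = ("v_call", "j_call", "junction")


def has_v_j_junction(row):
    """True if the record has non-empty V call, J call and junction fields."""
    return all((row.get(field) or "").strip() for field in REQUIRED)


def filter_partial(in_path, out_path):
    """Copy records with a V call, J call and junction to out_path.

    Returns the number of records kept.
    """
    kept = 0
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        writer = csv.DictWriter(
            dst, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        for row in reader:
            if has_v_j_junction(row):
                writer.writerow(row)
                kept += 1
    return kept
```

For example, `filter_partial("all_contig_db-pass.tsv", "all_contig_filtered.tsv")` would write only the records that carry all three fields. Productivity is deliberately not checked here, since the recommendation above only mentions V, J and junction.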

    We’ll come up with something better, but we need to think about the best approach.

  4. mathis nozais reporter

    Hi,

    Thank you so much for the feedback! OK, so I’ll keep my manual filtering for now.
