junction_aa contains extra AA before conserved C residue for TCR alpha

Issue #115 resolved
Scott Christley created an issue

I've been in the process of updating the apps on VDJServer, including moving to IgBlast 1.8.0 and more recent pRESTO and ChangeO. I'm using the default branch so that I get the AIRR format.

$ hg summary
parent: 665:ac5ceba07a80 tip
 Merged kleinstein/changeo into default
branch: airr-handle-only
commit: 35 unknown (new branch)
update: (current)
phases: 5 draft

Anyway, I was processing a TCR alpha data set and noticed extra AAs before the conserved C residue. I don't process TCR alpha very often, so I don't know whether this was an issue with earlier versions of ChangeO. Here are a couple of sequences as examples:

CCTATCCCCTGTGTGCCTTGGCAGTCTCAGCAGGTCTTCAGTTGCTTATGAAGGTTTTCTCAAGTACGGAAATAAACGAAGGACAAGGATTCACTGTCCTACTGAACAAGAAAGACAAACAACTCTCTCTGAACCTCACAGCTGCCCATCCTGGGGACTCAGCCGTGTACTTCTGCGCAGCCCCCGGGACTGGAGGCTATAAAGTGGTCTTTGGAAGTGGGACTCGATTGCTGGTAAGCCCTGACATCCAGAACCCGGAACCTGCTGT
CCTATCCCCTGTGTGCCTTGGCAGTCTCAGGAGAAGGTCCACAACTCCTCTTTAGAGCCTCAAGGGACAAAGAGAAAGGAAGCAGCAGAGGTTTTGAAGCTACATATGATAAAGGGACCACCTCCTTCCACTTGCGGAAAGCTTCAGTGCAAGAGTCAGACTCGGCTGTGTACTACTGTGCTCTGAGTGGTAGGCACTATGGAAATGAGAAAATAACTTTTGGGGCTGGAACCAAACTCACCATTAAACCCAACATCCAGAACCCGGAACCTGCTGT

with junction_nt coming out as

TACTTCTGCGCAGCCCCCGGGACTGGAGGCTATAAAGTGGTCTTT
TACTACTGTGCTCTGAGTGGTAGGCACTATGGAAATGAGAAAATAACTTTT

and junction_aa coming out as

YFCAAPGTGGYKVVF
YYCALSGRHYGNEKITF

IgBlast gives the CDR3 as this

AAPGTGGYKVV
ALSGRHYGNEKIT 
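
The mismatch can be checked directly from the values above. This is a minimal sketch, assuming the junction should be the IgBLAST CDR3 plus exactly one conserved residue on each side. The helper name `flanking_residues` is hypothetical and is not part of ChangeO.

```python
def flanking_residues(junction_aa, cdr3_aa):
    """Return the residues of junction_aa that flank cdr3_aa as (leading, trailing)."""
    start = junction_aa.find(cdr3_aa)
    if start < 0:
        raise ValueError("CDR3 not found inside junction")
    return junction_aa[:start], junction_aa[start + len(cdr3_aa):]


# junction_aa and IgBLAST CDR3 values copied from the report above.
examples = [
    ("YFCAAPGTGGYKVVF", "AAPGTGGYKVV"),
    ("YYCALSGRHYGNEKITF", "ALSGRHYGNEKIT"),
]

for junction_aa, cdr3_aa in examples:
    leading, trailing = flanking_residues(junction_aa, cdr3_aa)
    # Assumption: a well-formed junction has exactly one leading residue,
    # the conserved C, so anything beyond that is counted as extra.
    extra = len(leading) - 1
    print(junction_aa, "leading:", leading, "trailing:", trailing, "extra:", extra)
```

For both examples the leading flank is three residues long instead of one, so each junction carries two extra amino acids before the conserved C.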

Comments (10)

  1. Scott Christley reporter

    My version of ChangeO might be a bit older than I expect, so I'm gonna update it and try again.

  2. Scott Christley reporter

    Okay, I think I'm at the most current and I get the same result.

    $ hg summary
    parent: 696:35caac882b75 tip
     Added AIRR format support to ConverDb-genbank.
    branch: default
    commit: 28 unknown (clean)
    update: (current)
    
  3. Jason Vander Heiden

    Okay. I'll take a look at this soon (hopefully today). This would mean the IMGT numbering isn't correct, because the junction start position is fixed at 309.

  4. Jason Vander Heiden

    This happens because amino acid 104 is actually in position 106 in these TCRA genes.

    IMGT alignment:

                                        1               5               8A      10                  
    
    1                                   ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... 
                                         G   E   Q   V   E   Q   R   P       P   H   L   S   V   R  
    AC163653 Musmus TRAV3D-3*02 F       ggc gag cag gtg gag cag cgc cct ... cct cac ctg agt gtc cgg 
    
                                        75                  80                  84A 85              
                                         G   F   T   V   L   L   N   K   K   D       K   Q   L   S  
    1                                   gga ttc act gtc cta ctg aac aag aaa gac ... aaa caa ctc tct
    

    Note that TRAV3D-3*02 has both a position 8 and an 8A (9), as well as an 84 (85) and an 84A (86). That's what's screwing it up. The IMGT-gapping itself is fine with respect to the IMGT germlines; it's just that these particular germlines do not adhere to the IMGT numbering scheme. :/

    This is an instance where it's probably easier to just pass the IgBLAST CDR3 to JUNCTION (with +/-3 nt) rather than add a fix.

    I might fix it anyway by adding a motif search for the conserved residue instead of relying on the IMGT numbering, but... probably more trouble than it's worth.
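
The size of the shift follows from the codon arithmetic. This is a rough sketch, assuming that each IMGT codon position occupies three nucleotides of the gapped sequence and that the conserved C sits two codons later than expected, as described above. `codon_start` is a hypothetical helper, not a ChangeO function.

```python
def codon_start(imgt_position):
    """0-based nucleotide offset of a codon position, at 3 nt per position."""
    return (imgt_position - 1) * 3


expected_start = codon_start(104)  # the fixed junction start used by the parser
actual_start = codon_start(106)    # where the conserved C lands in these TRAV genes
shift_nt = actual_start - expected_start

# First junction_nt value from the report; dropping the shift should leave
# a junction that begins with a cysteine codon (TGT or TGC).
junction_nt = "TACTTCTGCGCAGCCCCCGGGACTGGAGGCTATAAAGTGGTCTTT"
corrected = junction_nt[shift_nt:]

print(expected_start, actual_start, shift_nt)
print(corrected[:3])
```

The two extra codons (6 nt) match the two extra amino acids seen in junction_aa.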

  5. Jason Vander Heiden

    Okay, I updated the parser to use IgBLAST's CDR3 for determining the junction. Only works for IgBLAST v1.7+. Falls back to the old IMGT-gapping method for older versions.
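
The approach described above can be sketched roughly as follows. This is not the actual ChangeO parser code. The function names and the `cdr3` argument are hypothetical, and the sketch assumes IgBLAST reports the CDR3 as 1-based inclusive coordinates on the query. Here the coordinates are derived by locating the CDR3 in an excerpt of the first read from the report, as a stand-in for IgBLAST output.

```python
def junction_from_cdr3(sequence, cdr3_start, cdr3_end, flank=3):
    """Extend the CDR3 by one codon (3 nt) on each side to get the junction.

    Coordinates are assumed to be 1-based and inclusive.
    """
    start = cdr3_start - 1 - flank
    end = cdr3_end + flank
    if start < 0 or end > len(sequence):
        return None
    return sequence[start:end]


def junction_from_imgt(gapped_sequence, junction_length, junction_start=309):
    """Fallback: slice the IMGT-gapped sequence at the fixed junction start."""
    return gapped_sequence[junction_start:junction_start + junction_length]


def get_junction(sequence, gapped_sequence, junction_length, cdr3=None):
    """Prefer CDR3 coordinates when available, else use the fixed IMGT offset."""
    if cdr3 is not None:
        junction = junction_from_cdr3(sequence, *cdr3)
        if junction is not None:
            return junction
    return junction_from_imgt(gapped_sequence, junction_length)


# Excerpt of the first read in the report, and the nucleotides of its CDR3.
excerpt = "GCCGTGTACTTCTGCGCAGCCCCCGGGACTGGAGGCTATAAAGTGGTCTTTGGAAGTGGG"
cdr3_nt = "GCAGCCCCCGGGACTGGAGGCTATAAAGTGGTC"

cdr3_start = excerpt.find(cdr3_nt) + 1       # convert to 1-based
cdr3_end = cdr3_start + len(cdr3_nt) - 1     # inclusive end

junction = get_junction(excerpt, "", 0, cdr3=(cdr3_start, cdr3_end))
print(junction)
```

With the CDR3 coordinates, the junction starts at the conserved cysteine codon (TGC) and ends with the TTT codon, regardless of how the germline is numbered.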
