junction_aa contains extra AA before conserved C residue for TCR alpha
I've been updating the apps on VDJServer, including moving to IgBlast 1.8.0 and more recent versions of pRESTO and ChangeO. I'm using the default branch so that I get the AIRR format.
$ hg summary
parent: 665:ac5ceba07a80 tip
Merged kleinstein/changeo into default
branch: airr-handle-only
commit: 35 unknown (new branch)
update: (current)
phases: 5 draft
Anyway, I was processing a TCR alpha data set and noticed extra AAs before the conserved C residue. I don't process TCR alpha very often, so I don't know whether this was also an issue with earlier versions of ChangeO. Here are a couple of sequences as examples:
CCTATCCCCTGTGTGCCTTGGCAGTCTCAGCAGGTCTTCAGTTGCTTATGAAGGTTTTCTCAAGTACGGAAATAAACGAAGGACAAGGATTCACTGTCCTACTGAACAAGAAAGACAAACAACTCTCTCTGAACCTCACAGCTGCCCATCCTGGGGACTCAGCCGTGTACTTCTGCGCAGCCCCCGGGACTGGAGGCTATAAAGTGGTCTTTGGAAGTGGGACTCGATTGCTGGTAAGCCCTGACATCCAGAACCCGGAACCTGCTGT
CCTATCCCCTGTGTGCCTTGGCAGTCTCAGGAGAAGGTCCACAACTCCTCTTTAGAGCCTCAAGGGACAAAGAGAAAGGAAGCAGCAGAGGTTTTGAAGCTACATATGATAAAGGGACCACCTCCTTCCACTTGCGGAAAGCTTCAGTGCAAGAGTCAGACTCGGCTGTGTACTACTGTGCTCTGAGTGGTAGGCACTATGGAAATGAGAAAATAACTTTTGGGGCTGGAACCAAACTCACCATTAAACCCAACATCCAGAACCCGGAACCTGCTGT
with junction_nt coming out as
TACTTCTGCGCAGCCCCCGGGACTGGAGGCTATAAAGTGGTCTTT
TACTACTGTGCTCTGAGTGGTAGGCACTATGGAAATGAGAAAATAACTTTT
and junction_aa coming out as
YFCAAPGTGGYKVVF
YYCALSGRHYGNEKITF
IgBlast gives the CDR3 as
AAPGTGGYKVV
ALSGRHYGNEKIT
Comments (10)
-
reporter -
reporter This is mouse.
-
reporter Okay, I think I'm at the most current and I get the same result.
$ hg summary
parent: 696:35caac882b75 tip
Added AIRR format support to ConverDb-genbank.
branch: default
commit: 28 unknown (clean)
update: (current)
-
Okay. I'll take a look at this soon (hopefully today). This would mean the IMGT numbering isn't correct, because the junction start position is fixed at 309.
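For reference, here is a minimal sketch of the arithmetic behind the fixed start position. It is not ChangeO's code; the helper names are hypothetical, and it only assumes that codon 104 of an IMGT-gapped sequence begins at 0-based nucleotide index 309. It also counts how many residues precede the first C in the junctions reported above.

```python
def imgt_codon_start(codon_number: int) -> int:
    """Return the 0-based nucleotide index where a 1-based IMGT codon begins."""
    return (codon_number - 1) * 3


# Codon 104 (the conserved Cys in IMGT numbering) starts at index 309.
JUNCTION_START = imgt_codon_start(104)


def residues_before_cys(junction_aa: str) -> int:
    """Count residues that precede the first Cys in a reported junction.

    Hypothetical helper: a correctly extracted junction should give 0.
    """
    return junction_aa.index("C")


print(JUNCTION_START)                            # 309
print(residues_before_cys("YFCAAPGTGGYKVVF"))    # 2
print(residues_before_cys("YYCALSGRHYGNEKITF"))  # 2
```

Both reported junctions carry two residues ahead of the Cys, which is what a fixed start would produce if the Cys sat two codons later in the gapped sequence than expected.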
-
So this happens because amino acid 104 is actually at position 106 in these TCRA genes.
IMGT alignment:
1 5 8A 10 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
G E Q V E Q R P P H L S V R
AC163653 Musmus TRAV3D-3*02 F
ggc gag cag gtg gag cag cgc cct ... cct cac ctg agt gtc cgg

75 80 84A 85
G F T V L L N K K D K Q L S
1 gga ttc act gtc cta ctg aac aag aaa gac ... aaa caa ctc tct
Note that TRAV3D-3*02 has both a position 8 and an 8A (9), as well as an 84 (85) and an 84A (86). That's what's screwing it up. The IMGT-gapping itself is fine with respect to the IMGT germlines; it's just that these particular germlines do not adhere to the IMGT numbering scheme. :/
This is an instance where it's probably easier to pass the IgBLAST CDR3 to JUNCTION (extended by 3 nucleotides on each side) rather than add a fix.
I might fix it anyway by adding a motif search for the conserved residue instead of relying on the IMGT numbering, but... probably more trouble than it's worth.
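A rough sketch of what such a motif search could look like, for illustration only. This was not implemented; the regular expression, the length bounds, and the function name are all assumptions. It looks for a Cys followed by a short stretch ending in the F/W of an [FW]GXG J-region motif, on an in-frame amino acid translation.

```python
import re

# Hypothetical pattern: conserved Cys, 3 to 30 residues, then the F/W that
# is immediately followed by G-X-G (checked with a lookahead, not consumed).
JUNCTION_MOTIF = re.compile(r"C[A-Z]{3,30}?[FW](?=G[A-Z]G)")


def find_junction_aa(aa_seq: str):
    """Return the first Cys...[FW] span preceding GXG, or None if absent."""
    match = JUNCTION_MOTIF.search(aa_seq.upper())
    return match.group(0) if match else None


# In-frame translations of the region around the junction in the two
# example reads from the report.
print(find_junction_aa("SAVYFCAAPGTGGYKVVFGSGTRL"))    # CAAPGTGGYKVVF
print(find_junction_aa("SAVYYCALSGRHYGNEKITFGAGTKL"))  # CALSGRHYGNEKITF
```

A search like this takes the leftmost Cys that satisfies the pattern, so on a full-length translation it could lock onto the wrong Cys or miss atypical J genes, which is part of why it may be more trouble than it's worth.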
-
Okay, I updated the parser to use IgBLAST's CDR3 for determining the junction. Only works for IgBLAST v1.7+. Falls back to the old IMGT-gapping method for older versions.
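The CDR3-based approach can be sketched roughly as follows. This is not the actual parser code: the function names are hypothetical, and it assumes the CDR3 coordinates are 1-based and inclusive on the same sequence being sliced. The junction is the CDR3 extended by one codon (3 nucleotides) on each side.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(nt: str) -> str:
    """Translate complete codons only; unknown codons become 'X'."""
    nt = nt.upper()
    usable = len(nt) - len(nt) % 3
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, usable, 3))


def junction_from_cdr3(seq: str, cdr3_start: int, cdr3_end: int, flank: int = 3):
    """Extend a CDR3 (assumed 1-based, inclusive) by `flank` nt on each side."""
    start = cdr3_start - 1 - flank
    end = cdr3_end + flank
    if start < 0 or end > len(seq):
        return None  # not enough flanking sequence to build the junction
    return seq[start:end]


# Fragment of the first example read (the junction_nt reported above).
fragment = "TACTTCTGCGCAGCCCCCGGGACTGGAGGCTATAAAGTGGTCTTT"
junction = junction_from_cdr3(fragment, 10, 42)
print(translate(fragment[9:42]))  # AAPGTGGYKVV
print(translate(junction))        # CAAPGTGGYKVVF
```

With these coordinates the junction starts at the conserved C instead of two residues upstream of it.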
-
Hey @schristley, can I close this? Is this working?
-
reporter Test jobs are running now so should know soon.
-
reporter - changed status to resolved
CDR3 looks to be coming out correctly for my data sets.
My version of ChangeO might be a bit older than I expect, so I'm gonna update it and try again.