Parse CDR3 from IgBLAST v1.5.0+

Issue #90 resolved
Derek Matthew Croote created an issue

The April 25, 2016 release of IgBLAST v1.5.0 added a CDR3 annotation to the output (mouse example below, with surrounding output for context). It would be nice to extract this information. Perhaps this could be done by adding an optional argument, so as not to break compatibility with older IgBLAST versions?

# V-(D)-J junction details based on top germline gene matches (V end, V-D junction, D region, D-J junction, J start).  Note that possible overlapping nucleotides at VDJ junction (i.e, nucleotides that could be assigned to either rearranging gene) are indicated in parentheses (i.e., (TACT)) but are not included under the V, D, or J gene itself
CTGTG   TGAGAGATCGGGGCTATGATAGTAGTGG    TTATTAC GGAAATCTTGACTG  CTGGG   

# Sub-region sequence details (nucleotide sequence, translation)
CDR3    GTGAGAGATCGGGGCTATGATAGTAGTGGTTATTACGGAAATCTTGACTGC     VRDRGYDSSGYYGNLDC

# Alignment summary between query and top germline V gene hit (from, to, length, matches, mismatches, gaps, percent identity)
...
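
As a rough illustration of the request, the CDR3 row could be pulled out of output shaped like the excerpt above with a few lines of Python. This is only a sketch, not MakeDb's actual parser: the function name `parse_igblast_cdr3` is hypothetical, and it assumes the CDR3 row directly follows the "# Sub-region sequence details" header with whitespace-separated fields, as in the example. It returns `None` when no CDR3 row is present, which is what output from older IgBLAST versions would produce.

```python
# Hypothetical helper (not part of MakeDb): extract the CDR3 sub-region
# from IgBLAST output lines shaped like the excerpt in this issue.
SUBREGION_HEADER = "# Sub-region sequence details"


def parse_igblast_cdr3(lines):
    """Return (nucleotide, amino_acid) for the CDR3 row, or None if absent."""
    in_subregion = False
    for raw in lines:
        line = raw.strip()
        if line.startswith(SUBREGION_HEADER):
            in_subregion = True
            continue
        if in_subregion:
            # A blank line or a new comment header ends the sub-region block.
            if not line or line.startswith("#"):
                in_subregion = False
                continue
            fields = line.split()
            if fields[0] == "CDR3" and len(fields) >= 3:
                return fields[1], fields[2]
    return None


# The sub-region block copied from the example output above.
example = """\
# Sub-region sequence details (nucleotide sequence, translation)
CDR3    GTGAGAGATCGGGGCTATGATAGTAGTGGTTATTACGGAAATCTTGACTGC     VRDRGYDSSGYYGNLDC

# Alignment summary between query and top germline V gene hit
"""

cdr3 = parse_igblast_cdr3(example.splitlines())
no_cdr3 = parse_igblast_cdr3(["# Alignment summary between query and top germline V gene hit"])
```

Returning `None` rather than raising keeps the same code usable on output from releases before v1.5.0, where the sub-region block does not exist.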

Comments (7)

  1. Jason Vander Heiden

    Hi @dcroote, this should be pretty simple to add. Though, MakeDb does already have a --regions argument that will add a column containing the CDR3 region, as defined by IMGT.

    We could certainly add an extra argument to add the IgBLAST CDR3, assuming the two differ in some way.

    I'll take a look at it.

  2. Derek Matthew Croote reporter

    Hi Jason,

    Much appreciated! One use of this additional argument is for parsing IgBLAST results generated with the Kabat domain system, although I agree that the MakeDb CDR3_IMGT should be equivalent to the IgBLAST CDR3 when using the IMGT domain system and the --regions argument.

    I could work on a pull request?

    Best, Derek

  3. Jason Vander Heiden

    Certainly, @dcroote! We're quite happy to accept pull requests.

    I can also take a crack at it, if you'd prefer, but not until sometime next week.

  4. Jason Vander Heiden

    Hey, @dcroote. I merged your pull request, updated the docs accordingly, and made minor stylistic tweaks:

    • Renamed argument to --cdr3, just because it's shorter.
    • Renamed columns to CDR3_IGBLAST_NT and CDR3_IGBLAST_AA, just so they look like the CDR3_IMGT column.

    I ran the same raw data through IgBLAST 1.4, 1.5 and 1.6 and tested. Looks good! Thanks again.
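
For anyone reading the new columns downstream, a minimal sketch follows. It assumes the MakeDb output is a tab-delimited file with a header row; the `SEQUENCE_ID` column name, the `seq1` identifier, and the function name `read_igblast_cdr3` are illustrative assumptions. Only `CDR3_IGBLAST_NT` and `CDR3_IGBLAST_AA` come from this thread, and the sequence values are the ones from the example in the issue.

```python
import csv
import io


def read_igblast_cdr3(handle):
    """Yield (id, cdr3_nt, cdr3_aa) per row of a tab-delimited table.

    Columns that are missing (for example, when --cdr3 was not used)
    come back as None instead of raising a KeyError.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        yield (
            row.get("SEQUENCE_ID"),      # assumed identifier column
            row.get("CDR3_IGBLAST_NT"),  # column name from this thread
            row.get("CDR3_IGBLAST_AA"),  # column name from this thread
        )


# In-memory stand-in for a MakeDb output file; "seq1" is a placeholder ID.
table = io.StringIO(
    "SEQUENCE_ID\tCDR3_IGBLAST_NT\tCDR3_IGBLAST_AA\n"
    "seq1\tGTGAGAGATCGGGGCTATGATAGTAGTGGTTATTACGGAAATCTTGACTGC\tVRDRGYDSSGYYGNLDC\n"
)
rows = list(read_igblast_cdr3(table))
```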
