Rhesus macaque

Issue #194 new
Yupeng Feng created an issue

Hi there, can I use MakeDb.py imgt for rhesus macaque Ig analysis?

Comments (5)

  1. Yupeng Feng reporter

    When I run human data, it works very well. But when I run the rhesus macaque data, every sequence fails and I get no output.

  2. ssnn

    Hi! We are working on a solution for this. For now, use --partial.

    In February 2021 (https://www.imgt.org/IMGTinformation/creations/#2021), IMGT introduced two new codons into the alignment of rhesus IGH (15A and 26A), along with a large number of gene name changes. If you work with IMGT/V-QUEST or IMGT/HighV-QUEST, you will see the additional codons in the alignments in the results. I don't think IgBLAST has updated rhesus_monkey.ndm.imgt to account for the two new codons (at least I have not seen this change in the release notes), so it is probably best to use IMGT for the alignments.

    Because of the new positions, `MakeDb.py imgt` can't find the second cysteine in the expected location. You can use --partial to skip this check; MakeDb will then apply a loose check (https://bitbucket.org/kleinstein/changeo/src/8a3ba4cdcf78b3a2f93132da529a18ad5ab90595/bin/MakeDb.py#lines-226:231) instead of the strict checks (https://bitbucket.org/kleinstein/changeo/src/8a3ba4cdcf78b3a2f93132da529a18ad5ab90595/bin/MakeDb.py#lines-194:224). Keep this in mind for downstream analyses that rely on the location of the CDRs as well.

  3. ssnn

    @Jason Vander Heiden What do you think about changing imgt_regions to accommodate something like:

                    'rhesus': {
                            'igh': { 'fwr1': 1,
                                     'cdr1': 29,
                                     'fwr2': 41,
                                     'cdr2': 58,
                                     'fwr3': 68,
                                     'cdr3': 107},
                            'igk': { 'fwr1': 1,
                                     'cdr1': 28,
                                     'fwr2': 40,
                                     'cdr2': 57,
                                     'fwr3': 67,
                                     'cdr3': 106},
                            'igl': { 'fwr1': 1,
                                     'cdr1': 28,
                                     'fwr2': 40,
                                     'cdr2': 59,
                                     'fwr3': 69,
                                     'cdr3': 108}
                            }
    

    The boundaries come from IMGT’s protein display at https://www.imgt.org/IMGTrepertoire/Proteins/proteinDisplays.php?species=Rhesus%20monkey&latin=Macaca%20mulatta&group=IGHV (replace the final IGHV with IGKV or IGLV for the other loci). I tested these boundaries with some rhesus data I have, and almost all sequences passed.
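As a rough sketch of how a locus-specific table like the one proposed in comment 3 might be consumed, the Python below pairs the rhesus boundaries from this thread with the standard IMGT start codons as a fallback, then converts them to nucleotide slices of a gapped alignment. The names `DEFAULT_REGIONS`, `RHESUS_REGIONS`, `region_starts`, and `region_slices` are hypothetical and are not taken from changeo; the actual layout of `imgt_regions` may differ.

```python
# Standard IMGT region start codons (1-based). Used here as an assumed fallback.
DEFAULT_REGIONS = {'fwr1': 1, 'cdr1': 27, 'fwr2': 39,
                   'cdr2': 56, 'fwr3': 66, 'cdr3': 105}

# Rhesus start codons as proposed in this thread (hypothetical container name).
RHESUS_REGIONS = {
    'igh': {'fwr1': 1, 'cdr1': 29, 'fwr2': 41,
            'cdr2': 58, 'fwr3': 68, 'cdr3': 107},
    'igk': {'fwr1': 1, 'cdr1': 28, 'fwr2': 40,
            'cdr2': 57, 'fwr3': 67, 'cdr3': 106},
    'igl': {'fwr1': 1, 'cdr1': 28, 'fwr2': 40,
            'cdr2': 59, 'fwr3': 69, 'cdr3': 108},
}

REGION_ORDER = ['fwr1', 'cdr1', 'fwr2', 'cdr2', 'fwr3', 'cdr3']


def region_starts(species, locus):
    """Return the start-codon table for a species and locus.

    Falls back to the standard IMGT table for anything other than
    rhesus IGH, IGK, or IGL.
    """
    if species.lower() == 'rhesus':
        return RHESUS_REGIONS.get(locus.lower(), DEFAULT_REGIONS)
    return DEFAULT_REGIONS


def region_slices(starts):
    """Convert 1-based codon starts into 0-based nucleotide slices.

    Covers fwr1 through fwr3 only; the cdr3 start is used as the
    end of fwr3, since the CDR3 end depends on the junction length.
    """
    slices = {}
    for name, following in zip(REGION_ORDER, REGION_ORDER[1:]):
        slices[name] = slice((starts[name] - 1) * 3,
                             (starts[following] - 1) * 3)
    return slices
```

For example, `region_slices(region_starts('rhesus', 'IGH'))['cdr1']` covers codons 29 through 40 of the gapped alignment, while any non-rhesus input falls back to the standard boundaries.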

  4. Jason Vander Heiden

    In principle I’m fine with it, assuming those boundaries are universally true within a given locus' germline reference, which I’m not certain is the case. It certainly isn’t the case for mouse TRA where this problem also exists. We’d have to add a check for the locus before doing the CDR3 check, which shouldn’t be a big deal.

    The lazy solution would be to have the validation check each frame from positions 105-108 and assume the alignment is correct within that window. If we did do that, then we could “look back” and fix the other region positions according to this table based on the CDR3 start.

    We’d also need fixes in downstream applications that depend upon IMGT-numbering, but… first things first.
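The "lazy solution" from comment 4 could look roughly like the sketch below: try each candidate CDR3 start codon from 105 to 108, check whether the junction sits one codon earlier in the gapped alignment, and then "look back" by selecting the region table whose CDR3 start matches. This is only an illustration under stated assumptions: the function and variable names are hypothetical, the junction is assumed to appear ungapped in the alignment and to begin at the conserved cysteine one codon before the CDR3 start, and this is not how MakeDb.py currently performs its validation.

```python
# Region start codons (1-based): the standard IMGT table plus the
# rhesus tables proposed in this thread. Each has a distinct CDR3 start.
REGION_TABLES = [
    {'fwr1': 1, 'cdr1': 27, 'fwr2': 39, 'cdr2': 56, 'fwr3': 66, 'cdr3': 105},
    {'fwr1': 1, 'cdr1': 28, 'fwr2': 40, 'cdr2': 57, 'fwr3': 67, 'cdr3': 106},
    {'fwr1': 1, 'cdr1': 29, 'fwr2': 41, 'cdr2': 58, 'fwr3': 68, 'cdr3': 107},
    {'fwr1': 1, 'cdr1': 28, 'fwr2': 40, 'cdr2': 59, 'fwr3': 69, 'cdr3': 108},
]

# Index the tables by CDR3 start so the other regions can be "looked back".
TABLE_BY_CDR3_START = {table['cdr3']: table for table in REGION_TABLES}


def infer_regions(alignment, junction, window=range(105, 109)):
    """Return the region table matching where the junction is found.

    alignment: gapped nucleotide alignment (string).
    junction:  junction nucleotides, starting at the conserved cysteine.
    window:    candidate CDR3 start codons (1-based) to try, in order.

    Returns None when the junction is empty or is not found at any
    candidate position.
    """
    if not junction:
        return None
    for cdr3_start in window:
        # The junction starts one codon before CDR3; convert that
        # 1-based codon number to a 0-based nucleotide offset.
        offset = (cdr3_start - 2) * 3
        if alignment.startswith(junction, offset):
            return TABLE_BY_CDR3_START.get(cdr3_start)
    return None
```

A lookup keyed on the CDR3 start works here only because the four tables have distinct CDR3 start codons; if two loci shared a CDR3 start but differed elsewhere, the locus would still have to be checked explicitly.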

  5. Jason Vander Heiden

    In the interest of having some sort of fix while we iron out the details, I changed the default filtering behavior to exclude the junction region start position validation. The old behavior is now invoked by the new ``--strict`` argument (d0dc4fb).
