`GermlineIGHV` contains an allele that is not found in IMGT reference

Issue #23 resolved
Julian Zhou created an issue

The GermlineIGHV object that ships with TIgGER v0.3.1 contains IGHV3-43D*01. When cross-referenced against the IMGT IGHV reference database as of Oct 27 2018, this allele name is missing from IMGT.

A closer look reveals that the sequence TIgGER's GermlineIGHV calls IGHV3-43D*01 is exactly the same as the one IMGT calls IGHV3-43D*03.
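The cross-reference described above can be sketched roughly as follows. This is a minimal illustration, not the procedure actually used for this report: the function name `cross_reference` is hypothetical, both germline sets are assumed to be already loaded as name-to-sequence mappings, and the sequences shown are short placeholders rather than real germline sequences.

```python
def cross_reference(package_germlines, imgt_reference):
    """Map each package allele name missing from the reference to the
    reference names (if any) that carry the same sequence."""

    def normalize(seq):
        # Compare case-insensitively and ignore IMGT gap characters.
        return seq.upper().replace(".", "")

    # Index the reference by normalized sequence.
    by_sequence = {}
    for name, seq in imgt_reference.items():
        by_sequence.setdefault(normalize(seq), []).append(name)

    report = {}
    for name, seq in package_germlines.items():
        if name not in imgt_reference:
            report[name] = sorted(by_sequence.get(normalize(seq), []))
    return report


# Placeholder sequences for illustration only (not real germlines).
package = {"IGHV3-43D*01": "gaag..tgca", "IGHV1-2*02": "caggtgcag"}
imgt = {"IGHV3-43D*03": "GAAGTGCA", "IGHV1-2*02": "CAGGTGCAG"}

result = cross_reference(package, imgt)
print(result)
```

An empty list for a name would mean the allele is absent from the reference under any name; a non-empty list points to a likely rename, as with IGHV3-43D*01 here.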

Comments (6)

  1. Jason Vander Heiden

    It'd probably be wise to just update the database to the latest germline set, and to add kappa (GermlineIGKV) and lambda (GermlineIGLV) germlines linked to the same man page as the heavy chain germlines.

  2. ssnn

    I suspect @dgadala created the GermlineIGHV object (although IIRC under a different name) using whatever germline set he used in his analysis. This germline set, together with the example data, is used as an example of how to find novel alleles. If we change the germlines, we need to make sure the new set still works to find novel alleles. We may also need to update the gene calls in the example data, because if an allele that has been removed from the germlines is still present in the db, findNovelAlleles will die with something like ‘the allele x is not present in the germline set’.
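A check along these lines could flag the problem before findNovelAlleles runs. This is only a sketch under stated assumptions: the helper name `missing_germline_alleles` is hypothetical, the gene calls are assumed to be plain strings with ambiguous calls separated by commas, and the example values are illustrative rather than taken from the package's sample data.

```python
def missing_germline_alleles(v_calls, germline_names):
    """Return alleles named in the gene calls but absent from the germline set."""
    known = set(germline_names)
    missing = set()
    for call in v_calls:
        # Ambiguous calls are assumed to be comma-separated allele names.
        for allele in call.split(","):
            allele = allele.strip()
            if allele and allele not in known:
                missing.add(allele)
    return sorted(missing)


# Illustrative values: calls made against the old set, checked against an updated one.
calls = ["IGHV3-43D*01", "IGHV1-2*02,IGHV3-43D*01", "IGHV1-2*02"]
updated_germlines = ["IGHV1-2*02", "IGHV3-43D*03"]

missing = missing_germline_alleles(calls, updated_germlines)
print(missing)
```

Any allele reported here would need either its gene calls updated or an entry kept in the germline set.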

    So if GermlineIGHV's main purpose is demonstration, I am not sure I would worry about providing updated reference germlines in TIgGER, both because of the extra maintenance and because users can easily load their own germlines with readIgFasta.

  3. ssnn

    I can confirm that updating the germlines breaks the examples. We could add a SampleGermlinesIGHV with the 2014 germlines to be used with SampleDb in the examples and tests. (a76c6b5)

    I don’t see where updateAlleleNames is used (I am using ‘Find in files’ and the function name doesn’t appear outside the definition or comments/docs lines). Do we need this function? If yes, we should probably update the mappings too.
