Does CreateGermlines work with IgBLAST results?

Issue #47 resolved
Jason Vander Heiden created an issue

We need to test if the IgBLAST output works with CreateGermlines... And probably fix it because it probably doesn't.

Comments (7)

  1. Jason Vander Heiden reporter

    Unsurprisingly, it doesn't work. From @sonia_t :

    "in case it's useful, I DID try CreateGermlines with MakeDb-parsed igblast, using this command:

    CreateGermlines.py -d IGHV_igblast_db-pass.tab --sf SEQUENCE_VDJ --failed --log CG_try3.log --outdir try3 -g vonly -r ./germlines

    all sequences fail with this error (regardless of -g option)

    ERROR> Germline sequence is 131 nucleotides longer than input sequence

    Where the number of nucleotides (131 in this case) seems to be V_SEQ_START - 1"

  2. Jason Vander Heiden reporter

    I made several changes to CreateGermlines and MakeDb. It seems to be working now with the SEQUENCE_VDJ column, but it needs more testing. I discovered another fantastic "feature" of IgBLAST in the process... It allows the end/start positions for V/D, D/J and V/J to overlap, with reasonably high frequency. That was the main problem, though there were a few smaller problems as well.
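For illustration, one possible way to handle overlapping segment coordinates is to let the upstream segment keep the contested positions and trim the downstream one. This is only a sketch under that assumption; it is not taken from the CreateGermlines or MakeDb code, and the function name `resolve_overlaps` and the example coordinates are hypothetical.

```python
def resolve_overlaps(segments):
    """Trim overlapping 1-based inclusive (start, end) ranges, left to right.

    Each later segment gives up any positions already claimed by the
    previous one. A segment that is missing, or that ends up empty after
    trimming, is returned as None.
    """
    resolved = []
    last_end = 0
    for seg in segments:
        if seg is None:
            resolved.append(None)
            continue
        start, end = seg
        # Push the start past the end of the previous kept segment.
        start = max(start, last_end + 1)
        if start > end:
            resolved.append(None)
        else:
            resolved.append((start, end))
            last_end = end
    return resolved


# Hypothetical V, D and J alignment ranges with V/D and D/J overlaps.
v, d, j = resolve_overlaps([(1, 290), (288, 300), (299, 340)])
print(v, d, j)
```

Giving precedence to the upstream segment is an arbitrary choice here; trimming the upstream segment instead would be equally easy to write.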

    Let me know if it isn't working correctly... Reopen the issue and I'll try to fix it more better.

  3. Jason Vander Heiden reporter
    • changed status to open

    Doesn't work if there are gaps in the alignment. I'm guessing this will require the following to fix:

    1. Require addition of the btop column to the IgBLAST output.
    2. Autodetect columns in the hit table. Maybe via the comment string?
    3. Parse BTOP string and adjust SEQUENCE_VDJ to include gaps/deletions during MakeDb step.
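Step 3 above could be sketched roughly as follows. This assumes the usual BLAST BTOP encoding: a number is a run of matches, and each two-character pair is query character then subject character, with `-` marking a gap on that side. The function name `gap_query_from_btop` and the example strings are hypothetical, and this is not the MakeDb implementation.

```python
import re

# A BTOP token is either a run length or a two-character query/subject pair.
_BTOP_TOKEN = re.compile(r"(\d+)|([A-Za-z*-]{2})")


def gap_query_from_btop(query_aligned, btop):
    """Rebuild the gapped query from its ungapped aligned region and a BTOP string.

    query_aligned is the ungapped stretch of the query covered by the
    alignment (q. start to q. end).
    """
    gapped = []
    pos = 0
    for match in _BTOP_TOKEN.finditer(btop):
        run, pair = match.groups()
        if run is not None:
            # Run of matching positions: copy the query bases unchanged.
            n = int(run)
            gapped.append(query_aligned[pos:pos + n])
            pos += n
        elif pair[0] == "-":
            # Gap in the query (extra base in the subject): emit a gap character.
            gapped.append("-")
        else:
            # Mismatch, or a gap in the subject: the query base is consumed.
            gapped.append(query_aligned[pos])
            pos += 1
    if pos != len(query_aligned):
        raise ValueError("BTOP string does not cover the aligned query region")
    return "".join(gapped)


# Made-up example: 3 matches, a T/A mismatch, 2 matches, a query gap, 2 matches.
print(gap_query_from_btop("ACGTACGT", "3TA2-C2"))  # ACGTAC-GT
```

Positions where the subject is gapped are kept as ordinary query bases here; whether those should be removed before germline reconstruction is a separate decision that this sketch does not make.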

  4. Namita Gupta

    Minor change to how igblast must be run (-outfmt '7 std qseq'), then MakeDb igblast will get the gapped query sequence for SEQUENCE_VDJ and CreateGermlines seems to work.

  5. David Koppstein

    Thanks for figuring this out. We'll update our igblast command and use CreateGermlines with getSeqDistance or calcObservedMutations as you suggested and report back.

  6. Jason Vander Heiden reporter

    You could also use presto.Sequence.scoreSeqPair() for a Python option (by picking the right ignore_chars and score_dict arguments). shm::calcDBObservedMutations() has the most options though.
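To make the idea of ignored characters concrete, here is a standalone sketch of counting mismatches between an aligned sequence and its germline while skipping gap and N positions. It does not call presto or shm and does not reproduce their scoring; the function name `count_mismatches`, its default `ignore_chars` and the example sequences are hypothetical.

```python
def count_mismatches(seq, germline, ignore_chars="N-."):
    """Count mismatches between two equal-length aligned sequences.

    Positions where either sequence has a character from ignore_chars are
    skipped. Returns (mismatches, positions_compared).
    """
    if len(seq) != len(germline):
        raise ValueError("sequences must be the same length")
    ignore = set(ignore_chars.upper())
    mismatches = 0
    compared = 0
    for a, b in zip(seq.upper(), germline.upper()):
        if a in ignore or b in ignore:
            continue
        compared += 1
        if a != b:
            mismatches += 1
    return mismatches, compared


# Made-up aligned pair: the gap and the N positions are not scored.
print(count_mismatches("ACGT-ANT", "ACCTGANA"))  # (2, 6)
```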

    This also means you should be able to build lineage trees from the IgBLAST output, which was previously not possible because of the germline requirement.
