CreateGermlines.py fails

Issue #187 resolved
Tartu Immunology created an issue

Hi,

This might be the same as a previously reported issue, or maybe I just don’t understand the workflow.

I’ve processed my data according to the RACE + UMI vignette and want to run spectral clustering with the vj method. For that, I need the germline_alignment_d_mask column, so I ran the following command:

CreateGermlines.py -d my_sample_ph_parse-select.tsv \
  -r ~/share/germlines/imgt/human/vdj/*IGH[DJ].fasta \
  -g dmask --vf v_call \
  --format airr --outname my_sample_ph

The IMGT data was downloaded following the IgBLAST section of the tutorial. I get the following warning:

WARNING> Germline reference sequences do not appear to contain IMGT-numbering spacers.
Results may be incorrect.

None of the sequences get annotated. Any idea what could be wrong?

Best regards

Comments (3)

  1. Jason Vander Heiden

    That old issue seems unrelated. It was an issue specifically with the format of novel germline sequences output by tigger.

    I think you just have a typo in your command. The second line should be:

    -r ~/share/germlines/imgt/human/vdj/*IGH[VDJ].fasta
    

    I.e., change [DJ] to [VDJ].
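    To see why the pattern matters: [DJ] is a glob character class that matches exactly one D or J, so the V file never reaches -r. The shell expands the pattern before CreateGermlines.py sees it, and Python's fnmatch uses the same bracket rules, so this sketch shows what each pattern expands to. The file names are hypothetical; the actual names in your germline directory may differ.

```python
import fnmatch

# Hypothetical file names; the real names in your germline directory may differ.
files = ["imgt_human_IGHV.fasta", "imgt_human_IGHD.fasta", "imgt_human_IGHJ.fasta"]

# [DJ] matches a single character that is D or J, as in shell globbing.
typo = fnmatch.filter(files, "*IGH[DJ].fasta")
fixed = fnmatch.filter(files, "*IGH[VDJ].fasta")

print(typo)   # ['imgt_human_IGHD.fasta', 'imgt_human_IGHJ.fasta']
print(fixed)  # all three files, including the V germlines
```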

    You’re only passing the D and J germlines to the tool. Those don’t contain IMGT-numbering spacers (as intended; spacers are a property of the V segments), so CreateGermlines doesn’t see any sequences with spacers.
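    If you want to check which reference files actually carry spacers, here is a rough sketch. It assumes the IMGT-gapped convention of '.' as the spacer character in V sequences; it only approximates the condition behind the warning and is not CreateGermlines' own check. The toy sequences are made up, not real germlines.

```python
import tempfile
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def has_imgt_spacers(paths):
    """Return True if any sequence in the given files contains a '.' gap."""
    return any("." in seq for path in paths for _, seq in read_fasta(path))


# Toy reference files (made-up sequences, not real germlines).
tmp = Path(tempfile.mkdtemp())
(tmp / "IGHV.fasta").write_text(">toyV\nCAGGTG...CAGCTG\n")
(tmp / "IGHD.fasta").write_text(">toyD\nGGTACAACTGG\n")
(tmp / "IGHJ.fasta").write_text(">toyJ\nACTACTTTGACTAC\n")

dj_only = [tmp / "IGHD.fasta", tmp / "IGHJ.fasta"]
with_v = dj_only + [tmp / "IGHV.fasta"]

print(has_imgt_spacers(dj_only))  # False: no V file, so no '.' spacers
print(has_imgt_spacers(with_v))   # True
```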

    You can also pass it the directory, and it’ll load all the files in it, assuming you only have FASTA files in that directory. E.g.:

    -r ~/share/germlines/imgt/human/vdj
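    Since the directory form loads every file, a quick pre-flight check for stray non-FASTA files can save a confusing run. This is a minimal sketch: the set of FASTA extensions is an assumption, and whether CreateGermlines itself skips files by extension is not stated here.

```python
import tempfile
from pathlib import Path

# Assumption: the extensions you treat as FASTA; adjust to match your files.
FASTA_SUFFIXES = {".fasta", ".fa", ".fna"}


def non_fasta_files(directory):
    """Return sorted names of regular files in `directory` without a FASTA suffix."""
    root = Path(directory).expanduser()
    return sorted(
        p.name
        for p in root.iterdir()
        if p.is_file() and p.suffix.lower() not in FASTA_SUFFIXES
    )


# Toy directory standing in for ~/share/germlines/imgt/human/vdj.
tmp = Path(tempfile.mkdtemp())
for name in ("IGHV.fasta", "IGHD.fasta", "IGHJ.fasta", "README.txt"):
    (tmp / name).write_text(">toy\nACGT\n")

print(non_fasta_files(tmp))  # ['README.txt']
```

    An empty list means every file in the directory has a FASTA extension.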
    

  2. Tartu Immunology reporter

    Thanks! I just copied it from the intro lab and didn’t think twice. Should have paid more attention, sorry for bothering!
