MakeDb.py igblast --format description needs adjusting

Issue #175 resolved
Jesse Connell created an issue

I’m following along with the “Using IgBLAST” section in the documentation:

https://changeo.readthedocs.io/en/latest/examples/igblast.html

and was confused by MakeDb.py’s parser help text. It looks like the input should be a .fmt7 file, but the description for --format says "Specify input and output format. (default: airr)". For this script in particular that’ll just apply to the output, not the input, right? (Looking at AssignGenes.py, it sets format=False in its call to getCommonArgParser and then adds a custom --format argument. Should MakeDb.py do something similar, or am I misunderstanding the format options?)
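To make the suggestion concrete, here is a minimal sketch of the pattern described for AssignGenes.py: build the shared parser without its generic --format option, then add a script-specific one whose help text mentions only the output. This is not changeo's actual code. The function get_common_arg_parser below is a hypothetical stand-in for getCommonArgParser, and the choices and wording are assumptions for illustration.

```python
import argparse


def get_common_arg_parser(format=True):
    """Hypothetical stand-in for a shared parser factory (not changeo's real one).

    The keyword is named `format` only to mirror the call described above.
    """
    parser = argparse.ArgumentParser(add_help=False)
    if format:
        # Generic wording that is misleading for scripts with a fixed input type
        parser.add_argument('--format', dest='format', default='airr',
                            choices=('airr', 'changeo'),
                            help='Specify input and output format.')
    return parser


def build_output_only_parser():
    """Build a parser whose --format help refers to the output only."""
    # Skip the shared --format so the script can define its own
    parent = get_common_arg_parser(format=False)
    parser = argparse.ArgumentParser(
        parents=[parent],
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    # Script-specific option: the input is always IgBLAST output here
    parser.add_argument('--format', dest='format', default='airr',
                        choices=('airr', 'changeo'),
                        help='Specify output format.')
    return parser


if __name__ == '__main__':
    build_output_only_parser().print_help()
```

With something like this, the help for --format would no longer suggest that the option also controls how the input is read.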

Comments (5)

  1. Jason Vander Heiden

    Yeah, that’s right. Only the output for MakeDb. We can fix the command-line help; that wording is confusing, especially now that IgBLAST has native AIRR output.

  2. Jesse Connell reporter

    No worries, I just wanted to make sure I wasn't mixing up my inputs and outputs for these. One last question there: Even though igblastn can give an AIRR TSV file out, I should still use the fmt7 and funnel it through MakeDb.py since that does some of its own processing/deduplicating/filtering, right?

  3. Jason Vander Heiden

    Yeah, I would stick with feeding the fmt7 file into MakeDb for now. There’s not much that differs - the IgBLAST AIRR file should be fine as well. There is some filtering (disable with --partial), but no deduplication.

    The biggest difference is that MakeDb inserts IMGT numbering spacers into the *_alignment fields, which a few downstream tools expect (e.g., tigger/shazam). We’re going to add a mode to MakeDb that’ll take as input an AIRR file from cellranger/igblast and add those IMGT numbering spacers. Just haven’t gotten to it yet.

  4. Jesse Connell reporter

    Thanks! (Ah and you're right about the deduplicating. I see now it's just reading DUPCOUNT from the sequence descriptions and then writing it as a duplicate_count column for AIRR.)

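The DUPCOUNT handling mentioned in comment 4 can be sketched roughly as follows. This is only an illustration, not MakeDb.py's implementation: it assumes pRESTO-style sequence descriptions of the form ID|KEY=VALUE|KEY=VALUE, and the helper names parse_description and duplicate_count are hypothetical.

```python
def parse_description(description):
    """Split an 'ID|KEY=VALUE|...' description into (id, annotations).

    Assumes pRESTO-style headers; fields without '=' are ignored.
    """
    fields = description.split('|')
    annotations = {}
    for field in fields[1:]:
        key, sep, value = field.partition('=')
        if sep:
            annotations[key] = value
    return fields[0], annotations


def duplicate_count(description):
    """Return DUPCOUNT as an int, or None if the annotation is absent."""
    _, annotations = parse_description(description)
    value = annotations.get('DUPCOUNT')
    return int(value) if value is not None else None


if __name__ == '__main__':
    # The count is copied from the header, not computed by collapsing reads
    seq_id, _ = parse_description('SEQ1|DUPCOUNT=5|PRCONS=IGHM')
    print(seq_id, duplicate_count('SEQ1|DUPCOUNT=5|PRCONS=IGHM'))
```

The point is that the value is read from an annotation already present on each sequence and passed through as the duplicate_count column; no sequences are merged at this step.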