MakeDb.py igblast --format description needs adjusting
I’m following along with the “Using IgBLAST” section in the documentation:
https://changeo.readthedocs.io/en/latest/examples/igblast.html
and had some confusion with MakeDb.py’s parser help text. It looks like the input should be a .fmt7 file, but the description for --format says "Specify input and output format. (default: airr)". For this script in particular that’ll just be for the output, right, not input? (Looking at AssignGenes.py, it sets format=False in its call to getCommonArgParser and then adds a custom --format argument. Should MakeDb.py do something similar, or am I misunderstanding the format options?)
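For illustration, the pattern described above (turn off the shared --format option, then add a script-specific one with its own help text) might look roughly like this in plain argparse. This is only a sketch: get_common_arg_parser here is a hypothetical stand-in and not changeo's actual getCommonArgParser, and the -o option and prog name are included purely for illustration.

```python
import argparse


def get_common_arg_parser(format=True):
    """Hypothetical stand-in for a shared parent parser (NOT changeo's real code)."""
    parser = argparse.ArgumentParser(add_help=False)
    # Illustrative shared option; assumed for the sketch.
    parser.add_argument("-o", dest="out_file", help="Output file name.")
    if format:
        # Generic wording that is misleading for scripts with a fixed input format.
        parser.add_argument("--format", dest="format", default="airr",
                            choices=("airr", "changeo"),
                            help="Specify input and output format. (default: airr)")
    return parser


def build_parser():
    """Build a script-level parser that overrides the generic --format help."""
    # Suppress the shared --format so the script can define its own.
    parent = get_common_arg_parser(format=False)
    parser = argparse.ArgumentParser(prog="MakeDb.py", parents=[parent])
    parser.add_argument("--format", dest="format", default="airr",
                        choices=("airr", "changeo"),
                        help="Specify output format. (default: airr)")
    return parser


if __name__ == "__main__":
    build_parser().print_help()
```

Running the sketch prints a help message in which --format is described as an output-only option.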
Comments (5)
reporter No worries, I just wanted to make sure I wasn't mixing up my inputs and outputs for these. One last question there: Even though igblastn can give an AIRR TSV file out, I should still use the fmt7 and funnel it through MakeDb.py since that does some of its own processing/deduplicating/filtering, right?
-
Yeah, I would stick with feeding the fmt7 file into MakeDb for now. There’s not much that differs - the IgBLAST AIRR file should be fine as well. There is some filtering (disable with --partial), but no deduplication. The biggest difference is that MakeDb inserts IMGT numbering spacers into the *_alignment fields, which a few downstream tools expect (e.g., tigger/shazam). We’re going to add a mode to MakeDb that’ll take as input an AIRR file from cellranger/igblast and add those IMGT numbering spacers. Just haven’t gotten to it yet.
- changed status to resolved
Clarified --format help in 0b456bb.
reporter Thanks! (Ah and you're right about the deduplicating. I see now it's just reading DUPCOUNT from the sequence descriptions and then writing it as a duplicate_count column for AIRR.)
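As a rough sketch of what that amounts to: assuming pRESTO-style sequence descriptions such as SEQ1|DUPCOUNT=5|CONSCOUNT=3 (fields separated by "|", key and value separated by "="), the count can be pulled from the header like this. The function names are hypothetical and this is not MakeDb's actual code.

```python
def parse_header(description):
    """Split an assumed pRESTO-style description into a dict of annotations."""
    fields = description.split("|")
    # The first field is taken to be the sequence identifier.
    annotations = {"ID": fields[0]}
    for field in fields[1:]:
        key, _, value = field.partition("=")
        annotations[key] = value
    return annotations


def duplicate_count(description):
    """Return DUPCOUNT as an int, or None when the annotation is absent."""
    value = parse_header(description).get("DUPCOUNT")
    return int(value) if value else None


if __name__ == "__main__":
    # Value that would populate a duplicate_count column for this record.
    print(duplicate_count("SEQ1|DUPCOUNT=5|CONSCOUNT=3"))
```

Nothing is collapsed or merged here; the value is simply copied from the header annotation into the output column.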
Yeah, that’s right. Only the output for MakeDb. We can fix the command-line help - that wording is confusing, especially now that IgBLAST has native AIRR output.