Creating gapped IgBlast database
Issue #97
resolved
Based on the documentation, it seems MakeDb requires both a gapped reference and for IgBLAST to have been run using a gapped reference. When I try to create the IgBLAST database from IMGT-gapped sequences, I get:
BLAST Database creation error: Near line 358, there's a line that doesn't look like plausible data, but it's not marked as defline or comment.
How are you supposed to set this up so that the --region flag works? Right now the CDR3-IMGT is incorrect.
Comments (4)
- reporter: Got it working now. Thank you.
- reporter changed status to resolved
- Sure thing.
Hi @peterch405,
To import IMGT reference sequences into IgBLAST you need to both modify the format of the sequence headers and remove the IMGT gaps (dots). IgBLAST provides the edit_imgt_file.pl script to do this on their FTP site. We also have a version of the cleaning script that will remove duplicate alleles as well (which appear in some TCR files) in the immcantation repo under scripts: clean_imgtdb.py. If you download all the scripts from the immcantation repo you can do this in one step via:
Or a more complete example that includes downloading the references:
Then, just pass the fasta files that contain the original IMGT-gapped sequences to MakeDb. IgBLAST doesn't require (and can't use) gapped sequences, but as long as the nucleotide sequences and allele names are the same in the IgBLAST database it should work fine.
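For anyone who wants to see what the cleaning step amounts to, here is a minimal Python sketch. It is not the actual edit_imgt_file.pl or clean_imgtdb.py code. The function name clean_imgt_fasta is made up for illustration. It assumes IMGT headers are "|"-delimited with the allele name in the second field, and that keeping the first copy of a duplicated allele is acceptable.

```python
def clean_imgt_fasta(text):
    """Turn IMGT-gapped FASTA text into ungapped FASTA text for IgBLAST.

    Assumptions (not taken from the real scripts):
      - headers look like ">accession|ALLELE*01|species|...";
      - when an allele name repeats, the first record wins.
    """
    records = {}    # allele name -> list of ungapped sequence chunks
    current = None  # chunk list of the record being read, or None to skip

    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split("|")
            # Use the allele name alone as the new header.
            name = fields[1] if len(fields) > 1 else fields[0]
            if name in records:
                current = None  # duplicate allele: ignore its sequence lines
            else:
                current = records.setdefault(name, [])
        elif current is not None:
            # IMGT gaps are written as dots; IgBLAST needs them removed.
            current.append(line.replace(".", ""))

    out = []
    for name, chunks in records.items():
        out.append(">" + name + "\n" + "".join(chunks) + "\n")
    return "".join(out)


if __name__ == "__main__":
    # Made-up records, only to show the header and gap handling.
    example = (
        ">ACC1|IGHV1-18*01|Homo sapiens|F|V-REGION|\n"
        "caggtt...cagctg\n"
        "gtgcag\n"
        ">ACC2|IGHV1-18*01|Homo sapiens|F|V-REGION|\n"
        "aaaa\n"
    )
    print(clean_imgt_fasta(example), end="")
```

The cleaned output is what would go into the BLAST database build, while the untouched gapped file is the one to hand to MakeDb. The allele names and the ungapped nucleotides need to match between the two.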