Creating gapped IgBlast database

Issue #97 resolved
Peter Chovanec created an issue

Based on the documentation, it seems MakeDb requires both a gapped reference and that IgBLAST was run against a gapped reference. Trying to create the IgBLAST database from IMGT-gapped sequences fails with: "BLAST Database creation error: Near line 358, there's a line that doesn't look like plausible data, but it's not marked as defline or comment."

How should the database be built so that the --region flag works? Right now the reported CDR3-IMGT is incorrect.

Comments (4)

  1. Jason Vander Heiden

    Hi @peterch405,

    To import IMGT reference sequences into IgBLAST, you need to both reformat the sequence headers and remove the IMGT gaps (dots). IgBLAST provides the edit_imgt_file.pl script on its FTP site to do this.

    We also have a version of the cleaning script that will remove duplicate alleles as well (which appear in some TCR files) in the immcantation repo under scripts: clean_imgtdb.py.
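
    For anyone curious what that cleaning step amounts to, here is a minimal sketch. It is not the actual edit_imgt_file.pl or clean_imgtdb.py logic: the function name clean_imgt_fasta is hypothetical, and it assumes IMGT-style headers where the allele name is the second |-delimited field. It does not remove duplicate alleles.

```shell
# Hypothetical helper (not part of IgBLAST or immcantation).
# Assumes IMGT-style headers such as:
#   >M99641|IGHV1-18*01|Homo sapiens|F|V-REGION|...
clean_imgt_fasta() {
    awk -F'|' '
        # Header lines: keep only the allele name (2nd field) when present.
        /^>/ { if (NF > 1) print ">" $2; else print; next }
        # Sequence lines: drop the IMGT gap characters (dots).
        { gsub(/\./, ""); print }
    ' "$1"
}
```

    Usage would be something like `clean_imgt_fasta gapped.fasta > ungapped.fasta`, where both file names are placeholders. The ungapped output is what goes into the IgBLAST database; keep the gapped original for MakeDb.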

    If you download all the scripts from the immcantation repo you can do this in one step via:

    imgt2igblast.sh -i <imgt directory> -o <igblast directory>
    

    Or a more complete example that includes downloading the references:

    fetch_igblastdb.sh -o /data/igblast
    fetch_imgtdb.sh -o /data/imgt
    imgt2igblast.sh -i /data/imgt -o /data/igblast
    

    Then pass the FASTA files that contain the original IMGT-gapped sequences to MakeDb. IgBLAST doesn't require (and can't use) gapped sequences, but as long as the nucleotide sequences and allele names in the IgBLAST database match those in the gapped files, it should work fine.
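
    A rough sketch of that last step, wrapped in a hypothetical function so the paths stay explicit. The -i, -s and -r options are the usual MakeDb.py igblast inputs; the --regions flag is an assumption based on the Change-O release discussed in this issue and may be named differently in other versions, so check `MakeDb.py igblast --help` first.

```shell
# Hypothetical wrapper; every path is a placeholder supplied by the caller.
run_makedb_gapped() {
    igblast_out="$1"   # IgBLAST output, produced with the ungapped database
    reads="$2"         # FASTA file of the reads that were given to IgBLAST
    shift 2
    # Remaining arguments: the original IMGT-gapped germline FASTA files.
    # --regions is assumed here; the flag name may differ between versions.
    MakeDb.py igblast -i "$igblast_out" -s "$reads" -r "$@" --regions
}
```

    For example, `run_makedb_gapped sample.fmt7 sample.fasta <gapped V fasta> <gapped D fasta> <gapped J fasta>`, with the gapped files taken from the IMGT download directory rather than from the IgBLAST database directory.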
