MakeDb.py igblast fails with KeyError: '# Query:'

I’ve been using the Docker container from the tutorial to align my own data. The following command runs without errors on the file I generated with pRESTO:

%%bash
AssignGenes.py igblast \
-s results/Sample_1_L001-C_atleast-2.fastq \
-b /usr/local/share/igblast --organism human \
--loci ig --format blast --outdir results/igblast --nproc 8

The resulting file looks like:

# IGBLASTN
# Query: 
# Database: /usr/local/share/igblast/database/imgt_human_ig_v /usr/local/share/igblast/database/imgt_human_ig_d /usr/local/share/igblast/database/imgt_human_ig_j
# Domain classification requested: imgt

# V-(D)-J rearrangement summary for query sequence (Top V gene match, Top D gene match, Top J gene match, Chain type, stop codon, V-J frame, Productive, Strand, V Frame shift).  Multiple equivalent top matches, if present, are separated by a comma.
IGHV6-1*01  IGHD6-13*01,IGHD6-25*01 IGHJ4*02    VH  No  In-frame    Yes +   No

# V-(D)-J junction details based on top germline gene matches (V end, V-D junction, D region, D-J junction, J start).  Note that possible overlapping nucleotides at VDJ junction (i.e, nucleotides that could be assigned to either rearranging gene) are indicated in parentheses (i.e., (TACT)) but are not included under the V, D, or J gene itself
AGAGA   TC  GGTATAGCAGC CT  CTTTG

Now, when I run

%%bash
sudo mkdir -p results/changeo
sudo MakeDb.py igblast \
-s results/Sample_1_L001-C_atleast-2.fastq -i results/igblast/Sample_1_L001-C_atleast-2_igblast.fmt7 \
--format airr \
-r /usr/local/share/germlines/imgt/human/vdj/ --outdir results/changeo \
--outname Sample_1

it fails with

Traceback (most recent call last):
  File "/usr/local/bin/MakeDb.py", line 897, in <module>
    args.func(**args_dict)
  File "/usr/local/bin/MakeDb.py", line 542, in parseIgBLAST
    output = writeDb(germ_iter, fields=fields, aligner_file=aligner_file, total_count=total_count,
  File "/usr/local/bin/MakeDb.py", line 274, in writeDb
    for i, record in enumerate(records, start=1):
  File "/usr/local/bin/MakeDb.py", line 541, in <genexpr>
    germ_iter = (addGermline(x, references, amino_acid=amino_acid) for x in parse_iter)
  File "/usr/local/lib/python3.9/site-packages/changeo/IO.py", line 1531, in __next__
    db = self.parseSections(sections)
  File "/usr/local/lib/python3.9/site-packages/changeo/IO.py", line 1438, in parseSections
    db['sequence_input'] = str(self.sequences[query].seq)
KeyError: '# Query:'
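
The key in the error matches the empty `# Query:` line in the IgBLAST output above, so it looks like the query ID is missing from the fmt7 file. To see how many records are affected, a quick check could look like the sketch below. The function name and the sample lines are made up for illustration (the second query line just reuses the ID from my FASTQ header); only the `# Query:` prefix is taken from the actual output.

```python
def summarize_queries(lines):
    """Count '# Query:' header lines and how many of them carry no sequence ID."""
    prefix = "# Query:"
    total = 0
    empty = 0
    for line in lines:
        if line.startswith(prefix):
            total += 1
            # Anything after the prefix is the query ID; whitespace only means it is missing.
            if not line[len(prefix):].strip():
                empty += 1
    return total, empty


# Hypothetical sample: one empty query line (as in my output) and one with an ID.
sample = [
    "# IGBLASTN",
    "# Query: ",
    "# Database: ...",
    "# Query: 1CTCTCATT|PRCONS=IGHM|CONSCOUNT=155|DUPCOUNT=2",
]
print(summarize_queries(sample))
```

On the real file this would be called as `summarize_queries(open("results/igblast/Sample_1_L001-C_atleast-2_igblast.fmt7"))`. If every query line comes back empty, the problem is probably in what was passed to AssignGenes.py rather than in MakeDb.py itself.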

Is that an issue with the input fastq file? Here’s a snippet:

@1CTCTCATT|PRCONS=IGHM|CONSCOUNT=155|DUPCOUNT=2
SOME SEQUENCE
+
{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{

The last command I ran to get this file was based on

CollapseSeq.py -s HD09N-C_primers-pass_reheader.fastq -n 20 --inner \
    --uf CREGION --cf CONSCOUNT --act sum --outname HD09N-C

from the pRESTO tutorial.

Am I missing some step? Thanks in advance!
