MakeDb crashes with sequence id
Issue #92
resolved
I found this sequence because it was causing a crash in repsum. I then made a test case for myself and ran across this MakeDb crash by accident. The sequence id is definitely the cause: if I change it, MakeDb runs fine.
Input file with just a single sequence:
>repsum_issue_36
GCAGTTTGTCTGACCCCCTGCTAACTGCAAGCCTCCAGGTCCAGTCTGATTCCATTCTTA
Here is the IgBLAST output:
# IGBLASTN 2.2.29+
# Query: repsum_issue_36
# Database: /work/01114/vdj/lonestar/../common/igblast-db/db/10_05_2016//human/ReferenceDirectorySet/human_TR_V.fna /work/01114/vdj/lonestar/../common/igblast-db/db/10_05_2016//human/ReferenceDirectorySet/human_TR_D.fna /work/01114/vdj/lonestar/../common/igblast-db/db/10_05_2016//human/ReferenceDirectorySet/human_TR_J.fna
# Domain classification requested: imgt
# Note that your query represents the minus strand of a V gene and has been converted to the plus strand. The sequence positions refer to the converted sequence.
# V-(D)-J rearrangement summary for query sequence (Top V gene match, Top J gene match, Chain type, stop codon, V-J frame, Productive, Strand). Multiple equivalent top matches having the same score and percent identity, if present, are separated by a comma.
TRAV7*01 TRAJ9*01 VA No Out-of-frame No -
# V-(D)-J junction details based on top germline gene matches (V end, V-J junction, J start). Note that possible overlapping nucleotides at VDJ junction (i.e, nucleotides that could be assigned to either rearranging gene) are indicated in parentheses (i.e., (TACT)) but are not included under the V, D, or J gene itself
TGGAC N/A CTGGA
# Alignment summary between query and top germline V gene hit (from, to, length, matches, mismatches, gaps, percent identity)
FR3-IMGT 2 22 21 17 4 0 81
Total N/A N/A 21 17 4 0 81
# Hit table (the first field indicates the chain type of the hit)
# Fields: query id, query gi, query acc., query acc.ver, query length, subject id, subject ids, subject gi, subject gis, subject acc., subject acc.ver, subject accs., subject length, q. start, q. end, s. start, s. end, query seq, subject seq, evalue, bit score, score, alignment length, % identity, identical, mismatches, positives, gap opens, gaps, % positives, query/sbjct frames, query frame, sbjct frame, BTOP
# 6 hits found
V reversed|repsum_issue_36 0 reversed|repsum_issue_36 reversed|repsum_issue_36 60 TRAV7*01 TRAV7*01 0 0 TRAV7*01 TRAV7*01 TRAV7*01 274 2 22 202 222 AAGAATGGAATCAGACTGGAC AAGAATGGAAGCAGCTTGTAC 0.59 22.1 13 21 80.95 17 4 17 0 0 80.95 1/1 1 1 10TG3ACCT2GT2
V reversed|repsum_issue_36 0 reversed|repsum_issue_36 reversed|repsum_issue_36 60 TRDV3*01 TRDV3*01 0 0 TRDV3*01 TRDV3*01 TRDV3*01 290 50 59 145 136 AGACAAACTG AGACAAACTG 15 17.4 10 10 100.00 10 0 10 0 0 100.00 1/1 1 1 10
V reversed|repsum_issue_36 0 reversed|repsum_issue_36 reversed|repsum_issue_36 60 TRAV8-7*01 TRAV8-7*01 0 0 TRAV8-7*01 TRAV8-7*01 TRAV8-7*01 290 23 32 135 126 CTGGAGGCTT CTGGAGGCTT 15 17.4 10 10 100.00 10 0 10 0 0 100.00 1/1 1 1 10
J reversed|repsum_issue_36 0 reversed|repsum_issue_36 reversed|repsum_issue_36 60 TRAJ9*01 TRAJ9*01 0 0 TRAJ9*01 TRAJ9*01 TRAJ9*01 61 23 32 8 17 CTGGAGGCTT CTGGAGGCTT 0.18 20.3 10 10 100.00 10 0 10 0 0 100.00 1/1 1 1 10
J reversed|repsum_issue_36 0 reversed|repsum_issue_36 reversed|repsum_issue_36 60 TRAJ24*02 TRAJ24*02 0 0 TRAJ24*02 TRAJ24*02 TRAJ24*02 63 31 38 24 31 TTGCAGTT TTGCAGTT 2.8 16.4 8 8 100.00 8 0 8 0 0 100.00 1/1 1 1 8
J reversed|repsum_issue_36 0 reversed|repsum_issue_36 reversed|repsum_issue_36 60 TRAJ47*01 TRAJ47*01 0 0 TRAJ47*01 TRAJ47*01 TRAJ47*01 57 52 59 13 20 ACAAACTG ACAAACTG 2.8 16.4 8 8 100.00 8 0 8 0 0 100.00 1/1 1 1 8
# BLAST processed 1 queries
And the command line for MakeDb:
MakeDb.py igblast -s $1 -i $2 -r $VDJ_DB_ROOT/human/ReferenceDirectorySet/TR_VDJ.fna --regions --scores
Let me know if you need the germline db file. And the stack trace:
  File "/scratch/01114/vdj/vdj/job-6157425024944893465-242ac11c-0001-007-igblast_test/bin/MakeDb.py", line 4, in <module>
    __import__('pkg_resources').run_script('changeo==0.3.4.999', 'MakeDb.py')
  File "/opt/apps/gcc5_2/python3/3.5.1/lib/python3.5/site-packages/pkg_resources/__init__.py", line 735, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/opt/apps/gcc5_2/python3/3.5.1/lib/python3.5/site-packages/pkg_resources/__init__.py", line 1652, in run_script
    exec(code, namespace, namespace)
  File "/scratch/01114/vdj/vdj/job-6157425024944893465-242ac11c-0001-007-igblast_test/lib/python3.5/site-packages/changeo-0.3.4.999-py3.5.egg/EGG-INFO/scripts/MakeDb.py", line 555, in <module>
    args.func(**args_dict)
  File "/scratch/01114/vdj/vdj/job-6157425024944893465-242ac11c-0001-007-igblast_test/lib/python3.5/site-packages/changeo-0.3.4.999-py3.5.egg/EGG-INFO/scripts/MakeDb.py", line 289, in parseIgBLAST
    no_parse=no_parse, partial=partial, out_args=out_args)
  File "/scratch/01114/vdj/vdj/job-6157425024944893465-242ac11c-0001-007-igblast_test/lib/python3.5/site-packages/changeo-0.3.4.999-py3.5.egg/EGG-INFO/scripts/MakeDb.py", line 122, in writeDb
    for i, record in enumerate(db, start=1):
  File "/scratch/01114/vdj/vdj/job-6157425024944893465-242ac11c-0001-007-igblast_test/lib/python3.5/site-packages/changeo-0.3.4.999-py3.5.egg/changeo/Parsers.py", line 1096, in __next__
    db = self.parseSections(sections)
  File "/scratch/01114/vdj/vdj/job-6157425024944893465-242ac11c-0001-007-igblast_test/lib/python3.5/site-packages/changeo-0.3.4.999-py3.5.egg/changeo/Parsers.py", line 1021, in parseSections
    db['SEQUENCE_INPUT'] = str(self.seq_dict[query].seq)
KeyError: 'psum_issue_36'
Comments (5)
- That should be enough, thanks. Off the top of my head, I see no good reason why it would truncate the sequence id to psum_issue_36 unless there are special characters hidden in the id. I'll take a look tomorrow.
- Should be fixed in e60a32d. It was a misuse of str.lstrip. Let me know if it works for you now.
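For anyone hitting the same KeyError, here is a minimal sketch of how this kind of str.lstrip misuse produces the truncated key. It assumes the parser was removing the reversed| prefix (seen in the hit table above for this minus-strand query) by calling lstrip on the query id; the actual change in e60a32d is not shown in this thread, and strip_prefix is a hypothetical helper, not a changeo function.

```python
PREFIX = "reversed|"


def strip_prefix(query_id, prefix=PREFIX):
    """Hypothetical helper: drop `prefix` once, only if it is really there."""
    if query_id.startswith(prefix):
        return query_id[len(prefix):]
    return query_id


raw_id = "reversed|repsum_issue_36"

# str.lstrip treats its argument as a SET of characters, not a prefix.
# It keeps removing leading characters while they are in {r, e, v, s, d, |},
# so the leading "re" of "repsum" is eaten as well.
buggy = raw_id.lstrip(PREFIX)

# Removing the literal prefix leaves the id untouched.
fixed = strip_prefix(raw_id)

print(buggy)  # psum_issue_36
print(fixed)  # repsum_issue_36
```

Any id that starts with one of the characters r, e, v, s, d or | would be truncated the same way, which is why renaming the sequence made the crash go away. On Python 3.9 and later, str.removeprefix does the same job as the helper; the trace above is from Python 3.5, where it is not available.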
- changed status to resolved
Should be fixed in e60a32d.