xml.parsers.expat.ExpatError: no element found (running with blast on assembled sequences)

Issue #82 resolved
Jessica Rowell created an issue

Command I ran:

 run_resfinder.py -o amr -s Other -l 0.7 -t 0.8 -db_res /home/src/resfinder/db_resfinder/ --acquired -ifa SAMPLE.mapped.fasta --blastPath /home/src/ncbi-blast-2.13.0+/bin/blastn

How I got SAMPLE.mapped.fasta (input file):

Mapped a metagenomic sample to a collection of reference bacterial genomes using bowtie2 → got a sorted BAM file → samtools fasta SAMPLE.sorted.bam > SAMPLE.mapped.fasta

ResFinder version: 4.1.11

Installed as per the “Installation” instructions here.

Error:

After 4 hours' run time and tons of cryptic “Found: gene entry” prints to STDOUT, I finally get this:

Traceback (most recent call last):
  File "/home/src/resfinder/run_resfinder.py", line 390, in <module>
    blast_results = acquired_finder.blast(inputfile=args.inputfasta,
  File "/home/src/resfinder/cge/resfinder.py", line 148, in blast
    blast_run = Blaster(inputfile=inputfile, databases=self.databases,
  File "/usr/local/lib/python3.8/dist-packages/cgecore/blaster/blaster.py", line 100, in __init__
    for blast_record in blast_records:
  File "/usr/local/lib/python3.8/dist-packages/Bio/Blast/NCBIXML.py", line 824, in parse
    expat_parser.Parse(NULL, True) # End of XML record
xml.parsers.expat.ExpatError: no element found: line 134501892, column 12

I can’t tell if this is resulting from something I’ve done, or a problem somewhere else. I’m not very familiar with the xml.parsers.expat module. Some initial searching suggests that “expat_parser.Parse(NULL, True)” means the XML file it’s trying to parse might be empty?

Any help on this would be appreciated. Thanks! I’m happy to try to provide whatever additional info you need.
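
For anyone hitting the same message: the sketch below shows how expat reports "no element found" both for empty input and for XML that stops before its root element is closed. It uses only the standard library. The element names are made up for illustration and are not real BLAST output. The line number in the error is the useful clue. An empty file fails at line 1, so a failure at line 134501892 suggests the parser read a long way into the output before running out of data.

```python
import xml.parsers.expat


def parse_error(xml_text):
    """Return the ExpatError raised while parsing xml_text, or None if it parses."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this as the final chunk of the document,
        # like the Parse(NULL, True) call in the traceback.
        parser.Parse(xml_text, True)
    except xml.parsers.expat.ExpatError as err:
        return err
    return None


# Hypothetical documents, not real BLAST XML.
empty = parse_error("")
truncated = parse_error("<BlastOutput>\n<Hit>1</Hit>\n")
complete = parse_error("<BlastOutput>\n<Hit>1</Hit>\n</BlastOutput>\n")

for name, err in [("empty", empty), ("truncated", truncated), ("complete", complete)]:
    if err is None:
        print(f"{name}: parsed without error")
    else:
        print(f"{name}: {err} (lineno={err.lineno})")
```

Both failing cases raise the same error text, so the message alone cannot tell an empty file from an incomplete one.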

Comments (4)

  1. CGE Helpdesk

    Dear Jessica,

    Thank you for your interest in ResFinder.
    It seems the job failed because the input file with the metagenomic sample was too big to be processed by BLAST and consequently no XML file was written. 
    You could either split the metagenomic sample up into multiple fasta files or try to run the raw reads with KMA.

    Best regards,
    Karen, CGE Helpdesk.

  2. Jessica Rowell reporter

    Thanks, Karen. That makes sense…it was a humongous file (~60M bp). I did run it on the raw reads with KMA (that works fabulously!) and I was just trying out the Blast match after mapping to some bacterial genomes just for comparison. So not a big deal! I appreciate your help.

  3. Jessica Rowell reporter

    Problem was that the assembled fasta file was too large. It would be good if Blast could run multi-threaded, or if there were some kind of checkpoints in place to flag an extra-large file with a warning or something. But I understand with a small team that wouldn't be a high priority. Thanks for all your efforts to maintain and improve ResFinder, and for responding to issues quickly!

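
A minimal sketch of the workaround suggested in the thread, splitting one large FASTA file into several smaller ones, is given below. It is not part of ResFinder. The function names `read_fasta` and `split_fasta`, the `part_NNNN.fasta` naming scheme, and the default limit of 5,000,000 bases per file are all assumptions chosen for illustration. A suitable limit depends on the machine and the database.

```python
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line, []
            elif header is not None:
                chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(path, out_dir, max_bases=5_000_000):
    """Write records to numbered files holding at most max_bases bases each.

    max_bases is an illustrative default, not a ResFinder or BLAST limit.
    A single record longer than max_bases still gets a file of its own.
    Returns the list of files written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written, handle, bases_in_part = [], None, 0
    for header, seq in read_fasta(path):
        # Start a new part for the first record, or when this record would
        # push a non-empty part over the limit.
        if handle is None or (bases_in_part and bases_in_part + len(seq) > max_bases):
            if handle is not None:
                handle.close()
            part = out_dir / f"part_{len(written) + 1:04d}.fasta"
            handle = open(part, "w")
            written.append(part)
            bases_in_part = 0
        handle.write(f"{header}\n{seq}\n")
        bases_in_part += len(seq)
    if handle is not None:
        handle.close()
    return written
```

Each resulting file could then be passed to `run_resfinder.py` with `-ifa` in place of the original input, one run per part, each with its own output directory. Summing `len(seq)` over `read_fasta(path)` before starting would also give a simple way to warn about very large inputs, along the lines of the checkpoint idea above.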