Issue #7 new

Possible error with taxonomy assignment?

aaunins
created an issue

Hi - This may not be an error/bug, but rather an issue with interpretation on my end. I apologize if I am missing something simple.

In any case, I am running the command-line version of metannotate. I can run the test data set as instructed in the README file with no errors, and all the output files are created. When I run the program on my own data with a different HMM, none of the taxonomy-assignment analyses appear to run. This is the command I am running:

sudo python run_metannotate.py --orf_files=/media/4TB_drive1/Cladophora_Metannotate/Calumet20B_all.out.out.faa,/media/4TB_drive1/Cladophora_Metannotate/JeorsePark22B_all.out.out.faa --hmm_files=data/hmms/TIGR01287.HMM --reference_database=data/Refseq.fa --output_dir=Cladophora_test --tmp_dir=test_tmp --run_mode=both --orfs_hmm_evalue=0.01 --refseq_hmm_evalue=0.01

This is the output:

Running commands:
hmmstat data/hmms/TIGR01287.HMM
Running hmmsearch on provided sequences.
Running commands:
hmmsearch -o /dev/null -A /media/4TB_drive1/doxeylab-metannotateinstaller-0207d4a79dad/metannotate/test_tmp/tmpZJNLh9 --domtblout /media/4TB_drive1/doxeylab-metannotateinstaller-0207d4a79dad/metannotate/test_tmp/tmpiTH6EV --domE 0.01 --cpu 6 data/hmms/TIGR01287.HMM /media/4TB_drive1/Cladophora_Metannotate/Calumet20B_all.out.out.faa
Running commands:
esl-reformat -o /media/4TB_drive1/doxeylab-metannotateinstaller-0207d4a79dad/metannotate/test_tmp/tmp2R_5e1 fasta /media/4TB_drive1/doxeylab-metannotateinstaller-0207d4a79dad/metannotate/test_tmp/tmpZJNLh9
Running commands:
hmmsearch -o /dev/null -A /media/4TB_drive1/doxeylab-metannotateinstaller-0207d4a79dad/metannotate/test_tmp/tmpYMAp94 --domtblout /media/4TB_drive1/doxeylab-metannotateinstaller-0207d4a79dad/metannotate/test_tmp/tmp_9b6rQ --domE 0.01 --cpu 6 data/hmms/TIGR01287.HMM /media/4TB_drive1/Cladophora_Metannotate/JeorsePark22B_all.out.out.faa
Running commands:
esl-reformat -o /media/4TB_drive1/doxeylab-metannotateinstaller-0207d4a79dad/metannotate/test_tmp/tmpaLVs54 fasta /media/4TB_drive1/doxeylab-metannotateinstaller-0207d4a79dad/metannotate/test_tmp/tmpYMAp94
Running hmmsearch on Reference database.
Running commands:
hmmsearch -o /dev/null -A /media/4TB_drive1/doxeylab-metannotateinstaller-0207d4a79dad/metannotate/test_tmp/tmpZZQwtt --domE 0.01 --domtblout cache/60189aab19612de37ecc2709d76ad4cb749037d221324b25.domtblout --cpu 6 data/hmms/TIGR01287.HMM data/Refseq.fa
Running commands:
esl-reformat -o cache/60189aab19612de37ecc2709d76ad4cb749037d221324b25.converted.msa fasta /media/4TB_drive1/doxeylab-metannotateinstaller-0207d4a79dad/metannotate/test_tmp/tmpZZQwtt
Job ran successfully. The following files are now available:

/media/4TB_drive1/doxeylab-metannotateinstaller-0207d4a79dad/metannotate/Cladophora_test/TIGR01287_0_Calumet20B_all_out_out_0_reads_7F_PeF437900228.fa
/media/4TB_drive1/doxeylab-metannotateinstaller-0207d4a79dad/metannotate/Cladophora_test/TIGR01287_0_JeorsePark22B_all_out_out_1_reads_S_e50R501771404.fa
/media/4TB_drive1/doxeylab-metannotateinstaller-0207d4a79dad/metannotate/Cladophora_test/raw_counts_pnkpNC831889948.csv
/media/4TB_drive1/doxeylab-metannotateinstaller-0207d4a79dad/metannotate/Cladophora_test/normalized_counts_sCmrJs397902270.csv

The program does not run usearch, FastTree, pplacer, or any of the other taxonomic-assignment analyses on my data. If I rerun the analysis on my data with RPOB.HMM, as in the README, it works fine and all the output files are generated.

So, does this mean that none of the sequences in my data matching TIGR01287.HMM can be assigned taxonomy, or is there some kind of error here?

Thanks for your time and help -

Comments (3)

  1. doxeylab repo owner

    Hello. Yes, I am guessing that there were no hits. What are the contents of the normalized_counts_sCmrJs397902270.csv file? To verify that no hits were detected, you could try running hmmsearch directly against your fasta file.
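One way to check the hit count by hand is to run hmmsearch with `--domtblout` (the same option that appears in the log above) and tally the rows of the resulting table. The sketch below is only an illustration: the function name `count_domtblout_hits` is hypothetical, and filtering on the independent E-value column (i-Evalue) is an assumption, since this thread does not say which column metannotate uses for its own counts.

```python
def count_domtblout_hits(lines, max_i_evalue=0.01):
    """Count domain hits and distinct target sequences in hmmsearch --domtblout output.

    `lines` is any iterable of text lines (for example, an open file).
    Assumption: hits are filtered on the i-Evalue column (13th field, index 12).
    """
    n_domains = 0
    targets = set()
    for line in lines:
        # Skip blank lines and the '#' comment/header lines of the table.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        # A domtblout row has 22 fixed fields followed by a free-text description.
        if len(fields) < 22:
            continue
        if float(fields[12]) <= max_i_evalue:
            n_domains += 1
            targets.add(fields[0])  # first field is the target sequence name
    return n_domains, len(targets)
```

With a real table, this could be called as `count_domtblout_hits(open("hits.domtblout"))`, where `hits.domtblout` is a placeholder file name. A non-zero result would suggest that the missing taxonomy step is not caused by an absence of hits.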

  2. doxeylab repo owner

    Sorry, this may actually be a bug caused by how the newer Refseq databases are processed. NCBI has changed its database format. This will be fixed soon.
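To see which header style a local Refseq FASTA file uses, something like the following sketch could help. It assumes, and this is only an assumption since the comment above does not say what changed, that the relevant difference is NCBI's removal of GI numbers from FASTA headers (older files start with `>gi|<number>|ref|<accession>|`, newer ones with `>` followed directly by the accession). The names `classify_header` and `summarize_headers` are hypothetical and not part of metannotate.

```python
import re
from collections import Counter

# Older NCBI FASTA headers begin with a GI number, e.g. ">gi|12345|ref|NP_000001.1| ..."
LEGACY_GI = re.compile(r"^>gi\|\d+\|")


def classify_header(header):
    """Label a FASTA header line as 'legacy_gi' or 'other'."""
    return "legacy_gi" if LEGACY_GI.match(header) else "other"


def summarize_headers(lines, limit=1000):
    """Tally header styles over at most `limit` header lines."""
    counts = Counter()
    seen = 0
    for line in lines:
        if line.startswith(">"):
            counts[classify_header(line)] += 1
            seen += 1
            if seen >= limit:
                break
    return counts
```

Calling `summarize_headers(open("data/Refseq.fa"))` on the reference database from the command above would give a rough picture of which style the file uses; a count made up entirely of `other` would be consistent with the newer format, though it would not by itself confirm the bug.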

  3. aaunins reporter

    Thanks for the reply. In the raw counts csv file for my own dataset, there are 992, 1002, and 751 hits from the hmmsearch of my data against the TIGR01287 HMM. I can reproduce these numbers by running hmmsearch independently of the run_metannotate.py script. Since there were many hits, I am guessing it is indeed a bug related to the format of the Refseq database.

    I'm looking forward to the fix, and I appreciate your making the metannotate tool available - the results I generate will be very useful for my analyses.
