did not find a decorated file for the archaeal tree

Issue #110 closed
Yi Liu created an issue

Hi Simon,

Thanks for your useful tool!

However, I ran into an issue when adding MAGs to the database. I first ran the wetland MAGs test and got the correct output, but when I added my custom MAG database, iPHoP was unable to recognize my decorated.tree-taxonomy files even though they exist.

I am using GTDB-Tk version 1.5.0:

emma@emma-PowerEdge-R730xd:/media/HD1/LY/MAGs_GTDB-tk_results$ ll
total 10076
-rw-rw-r-- 1 emma emma 145267 Aug 28 15:28 gtdbtk.ar122.decorated.tree
-rw-rw-r-- 1 emma emma 391682 Sep 1 21:58 gtdbtk.ar122.decorated.tree-taxonomy
-rw-rw-r-- 1 emma emma 2538861 Aug 28 15:28 gtdbtk.bac120.decorated.tree
-rw-rw-r-- 1 emma emma 7231288 Sep 1 21:58 gtdbtk.bac120.decorated.tree-taxonomy
drwxrwxr-x 2 emma emma 4096 Sep 2 17:07 infer

emma@emma-PowerEdge-R730xd:/media/HD1/LY/MAGs_GTDB-tk_results$ cd ./infer
emma@emma-PowerEdge-R730xd:/media/HD1/LY/MAGs_GTDB-tk_results/infer$ ll
total 2624
-rw-rw-r-- 1 emma emma 145267 Sep 1 21:58 gtdbtk.ar122.decorated.tree
-rw-rw-r-- 1 emma emma 2538861 Sep 1 21:58 gtdbtk.bac120.decorated.tree

My command:

nohup iphop add_to_db --fna_dir /media/HD1/LY/dereplicated_genomes --gtdb_dir /media/HD1/LY/MAGs_GTDB-tk_results --out_dir /media/HD1/LY/host_predictions/iphop_db_with_custom_mags --db_dir /media/HD1/LY/Aug_2023_pub_rw --num_threads 32 &

Head of the log file:

Starting
[1] Get a list of genomes to import...
[2] Import information from GTDBtk trees...
Reading /media/HD1/LY/MAGs_GTDB-tk_results/gtdbtk.ar122.decorated.tree
Reading /media/HD1/LY/MAGs_GTDB-tk_results/gtdbtk.bac120.decorated.tree
[3] Load new host genomes in blast database...

Building a new DB, current time: 09/02/2024 17:13:30
New DB name: /media/HD1/LY/host_predictions/iphop_db_with_custom_mags/db/Host_Genomes/New_host_genomes
New DB title: /media/HD1/LY/host_predictions/iphop_db_with_custom_mags/db/Host_Genomes/New_host_genomes.fna
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 2255758 sequences in 171.545 seconds.

Created nucleotide BLAST (alias) database /media/HD1/LY/host_predictions/iphop_db_with_custom_mags/db/Host_Genomes/Host_Genomes with 25402714 sequences
[4] Get CRISPR arrays from new MAGs and add to database...
python /home/emma/mambaforge/envs/iphop_env/lib/python3.8/site-packages/iphop/utils/CRISPR/identify_crispr.folder.py -i /media/HD1/LY/dereplicated_genomes -o /media/HD1/LY/host_predictions/iphop_db_with_custom_mags/db/Tmp_CRISPRs
python /home/emma/mambaforge/envs/iphop_env/lib/python3.8/site-packages/iphop/utils/CRISPR/get_crispr_database.py -d /media/HD1/LY/host_predictions/iphop_db_with_custom_mags/db/Tmp_CRISPRs
[5] Add new genomes to WIsH database...

[6] Add new genomes to VHM database...
[7] Add new genomes to PHP database...
counting kmer ...
Processing adh_s_bin463.fna
Processing dbc_w_bin326.fna
Processing blk_s_bin263.fna

Tail of the log file:

Note: xcd_s_bin691 will not be included in the database because it could not be added to any tree (bacteria or archaea)
Note: gskl_s_bin495 will not be included in the database because it could not be added to any tree (bacteria or archaea)
Note: dbc_s_bin326 will not be included in the database because it could not be added to any tree (bacteria or archaea)
Note: dbc_w_bin82 will not be included in the database because it could not be added to any tree (bacteria or archaea)

We added 0 additional bacteria genomes and 0 additional archaea genomes
[9] All done

!#!#!#!#!#! WARNING --- SOME UNEXPECTED EVENTS HAPPENED -- WE LIST THEM BELOW, IT COULD BE NOTHING, BUT YOU SHOULD STILL DOUBLE-CHECK #!#!#!#!#!#!#

Note - we did not find a decorated file for the archaeal tree, so we did not use any data from a new archaeal genome
Note - we did not find a decorated file for the bacterial tree, so we did not use any data from a new bacterial genome

!#!#!#!#!!#!#!#!#!!#!#!#!#!!#!#!#!#!!#!#!#!#!!#!#!#!#!!#!#!#!#!!#!#!#!#!!#!#!#!#!!#!#!#!#!!#!#!#!#!!#!#!#!#!!#!#!#!#!!#!#!#!#!!#!#!#!#!!#!#!#!#!#!#!

Comments (6)

  1. Simon Roux repo owner

    Hi,
    So it seems like iPHoP does find the tree files, but cannot identify the bins in these trees. Can you check whether the bins (e.g. dbc_w_bin82) are indeed listed in these files, and whether they have exactly the same names?

  2. Yi Liu reporter

    Thank you for your quick response. I checked the tree files. The bins are indeed listed in these files, and they have exactly the same names.

    dbc_w_bin500 d__Bacteria; p__Bdellovibrionota; c__Bdellovibrionia; o__Bdellovibrionales; f__UBA1609; g__; s__

    GB_GCA_002453155.1 d__Bacteria; p__Bdellovibrionota; c__Bdellovibrionia; o__Bdellovibrionales; f__UBA1609; g__UBA6776; s__UBA6776 sp002453155
    gskl_w_bin81 d__Bacteria; p__Bdellovibrionota; c__Bdellovibrionia; o__Bdellovibrionales; f__UBA1609; g__; s__
    GB_GCA_013216195.1 d__Bacteria; p__Bdellovibrionota; c__Bdellovibrionia; o__Bdellovibrionales; f__UBA1609; g__JABSOQ01; s__JABSOQ01 sp013216195
    xcd_s_bin612 d__Bacteria; p__Bdellovibrionota; c__Bdellovibrionia; o__Bdellovibrionales; f__UBA1609; g__; s__
    dbc_w_bin520 d__Bacteria; p__Bdellovibrionota; c__Bdellovibrionia; o__Bdellovibrionales; f__UBA1609; g__; s__
    dbc_w_bin82 d__Bacteria; p__Bdellovibrionota; c__Bdellovibrionia; o__Bdellovibrionales; f__UBA1609; g__; s__
    GB_GCA_001769925.1 d__Bacteria; p__Bdellovibrionota; c__Bdellovibrionia; o__Bdellovibrionales; f__UBA1609; g__RBG-16-40-8; s__RBG-16-40-8 sp001769925
    GB_GCA_903833275.1 d__Bacteria; p__Bdellovibrionota; c__Bdellovibrionia; o__Bdellovibrionales; f__UBA1609; g__RBG-16-40-8; s__RBG-16-40-8 sp903833275
    GB_GCA_003694355.1 d__Bacteria; p__Bdellovibrionota; c__Bdellovibrionia; o__Bdellovibrionales; f__UBA1609; g__J124; s__J124 sp003694355
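
The name check discussed above can also be scripted instead of done by eye. The following is a minimal sketch, not part of iPHoP or GTDB-Tk: the helper names `genomes_in_tree_taxonomy` and `missing_bins` are hypothetical, and it assumes that each non-empty line of a tree-taxonomy file starts with the genome identifier followed by whitespace and the taxonomy string (as in the excerpt above), and that a bin's name is its FASTA file name without the `.fna` extension.

```python
from pathlib import Path


def genomes_in_tree_taxonomy(path):
    """Return the set of genome identifiers listed in a tree-taxonomy file."""
    names = set()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            # Assumption: the identifier is the first whitespace-delimited
            # field; everything after it is the taxonomy string.
            names.add(line.split(None, 1)[0])
    return names


def missing_bins(fasta_dir, taxonomy_files, suffix=".fna"):
    """Return bin names with a FASTA file but no entry in any taxonomy file."""
    # Assumption: bin name == FASTA file name minus its extension.
    expected = {p.stem for p in Path(fasta_dir).iterdir() if p.suffix == suffix}
    found = set()
    for tax_file in taxonomy_files:
        found |= genomes_in_tree_taxonomy(tax_file)
    return sorted(expected - found)
```

Calling `missing_bins` with the directory passed to `--fna_dir` and the two `decorated.tree-taxonomy` files returns the bins that appear in neither file; an empty list means every bin name matches an entry exactly.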

  3. Simon Roux repo owner

    Ok, I think I see what happens. GTDB-Tk changed where some output files are stored between versions, which confuses iPHoP. Can you try copying the files “gtdbtk.ar122.decorated.tree-taxonomy” and “gtdbtk.bac120.decorated.tree-taxonomy” from “MAGs_GTDB-tk_results/” to “MAGs_GTDB-tk_results/infer/”? iPHoP expects these two files to be in “infer/”, but with GTDB-Tk 1.5 it seems they are only in the base directory, and “infer/” only contains the “decorated.tree” files. Then delete the custom database directory you already generated and re-run “add_to_db”. Hopefully you won’t see these warnings anymore, and you will see your MAGs in the “Host_Genomes.tsv” file.
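
The copy step suggested above can be sketched in a few lines of Python. This is an illustration only, not an iPHoP command: the function name `copy_taxonomy_into_infer` is hypothetical, and the glob pattern assumes the files are named as in the directory listing from this issue.

```python
import shutil
from pathlib import Path


def copy_taxonomy_into_infer(gtdb_dir,
                             pattern="gtdbtk.*.decorated.tree-taxonomy"):
    """Copy tree-taxonomy files from gtdb_dir into gtdb_dir/infer.

    Returns the names of the files that were copied. Files already
    present in infer/ are left untouched.
    """
    base = Path(gtdb_dir)
    infer = base / "infer"
    infer.mkdir(exist_ok=True)  # infer/ normally exists already
    copied = []
    for src in sorted(base.glob(pattern)):
        dest = infer / src.name
        if not dest.exists():
            shutil.copy2(src, dest)  # copy2 also preserves timestamps
            copied.append(dest.name)
    return copied
```

For the layout in this issue, the function would be called with the same path that is given to `--gtdb_dir`. The originals stay in the base directory, so the change is easy to undo. The custom database directory still has to be deleted and `add_to_db` re-run afterwards, as described above.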

  4. Yi Liu reporter

    Hi Simon,

    It works. Thank you so much!

    [8] Now build the new host genome metadata file...

    We added 4271 additional bacteria genomes and 465 additional archaea genomes
    [9] All done
