No new spacers in added custom MAGs

Issue #7 open
Zhichao Zhou created an issue

Hi iPHoP authors,

I found an error report when adding custom MAGs to the db:

In lines 1104-1108:

[4] Get CRISPR arrays from new MAGs and add to database...
python /home/zhichao/miniconda3/envs/Test-iPHoP/lib/python3.8/site-packages/iphop/utils/CRISPR/identify_crispr.folder.py -i /storage1/data11/Test/Guaymas_bins -o /storage1/data11/Test/Test_db/iPHoP_db_custom/db/Tmp_CRISPRs
python /home/zhichao/miniconda3/envs/Test-iPHoP/lib/python3.8/site-packages/iphop/utils/CRISPR/get_crispr_database.py -d /storage1/data11/Test/Test_db/iPHoP_db_custom/db/Tmp_CRISPRs
Count total new spacers -> 0
No new spacers, we just link to the old db

In the last lines:

Traceback (most recent call last):
File "/home/zhichao/miniconda3/envs/Test-iPHoP/bin/iphop", line 10, in <module>
sys.exit(cli())
File "/home/zhichao/miniconda3/envs/Test-iPHoP/lib/python3.8/site-packages/iphop/iphop.py", line 121, in cli
args["func"](args)
File "/home/zhichao/miniconda3/envs/Test-iPHoP/lib/python3.8/site-packages/iphop/modules/master_predict.py", line 70, in main
blast_crispr.run_and_parse_blast_to_crispr(args)
File "/home/zhichao/miniconda3/envs/Test-iPHoP/lib/python3.8/site-packages/iphop/modules/blast_crispr.py", line 25, in run_and_parse_blast_to_crispr
get_crispr_results(args["crisprparsed"],args["blastcrisprout"],args["array_file"],args["spacer_complexity"],args["messages"])
File "/home/zhichao/miniconda3/envs/Test-iPHoP/lib/python3.8/site-packages/iphop/modules/blast_crispr.py", line 59, in get_crispr_results
with open(array_file, "r", newline='') as f:
FileNotFoundError: [Errno 2] No such file or directory: './Test_db/iPHoP_db_custom/db_infos/All_CRISPR_array_size.tsv'

I am wondering whether the error is caused by the fact that no new CRISPRs were found in the added custom MAGs.

Comments (26)

  1. Simon Roux repo owner

    Hi,

    Thanks for reporting. Do you expect CRISPR arrays/spacers to be detected in the new MAGs? And does the addition of custom MAGs still complete successfully despite this absence of CRISPR spacers?

  2. Zhichao Zhou reporter

    I am not specifically counting on CRISPR arrays/spacers being detected to add more potential host matches; it is OK if there are none in the new input MAGs. It is also OK if the CRISPR part does not work, as long as the other parts (for example blastn against genomes, k-mer-based virus-host associations) work.

  3. Simon Roux repo owner

    Ok, let me know if the new database works as expected then, even without CRISPR spacers. If that is the case, then it’s just an exception we have to handle better, and not a major issue.

  4. Zhichao Zhou reporter

    I found that the error is not caused by the absence of CRISPRs in the custom MAGs.

    The cause is that one has to provide the full path to the database directory: the custom db links files from the standard db, and full paths are needed to make these links.

  5. Simon Roux repo owner

    Good point, because the database is very large we opted to use symbolic links rather than copying all files over, but it is true that you need to provide the full path. I’ll update the README, thanks !
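
    A minimal Python sketch of why a relative path can produce dangling links. This is an illustration only, not iPHoP's actual code: the helper `link_db_file` and all file and folder names are hypothetical, and it assumes a Linux-like system where creating symlinks is allowed. A relative link target is stored verbatim and resolved relative to the directory that holds the link, so it breaks; resolving the path with `os.path.abspath` first avoids this.

```python
import os
import shutil
import tempfile


def link_db_file(source_path, link_path):
    """Create a symlink at link_path pointing to the absolute location of source_path."""
    # Resolve against the current working directory so the stored target
    # does not depend on where the link itself lives.
    target = os.path.abspath(source_path)
    os.symlink(target, link_path)
    return target


# Throwaway layout standing in for a standard db and a custom db (hypothetical names).
workdir = os.path.realpath(tempfile.mkdtemp())
os.makedirs(os.path.join(workdir, "standard_db"))
os.makedirs(os.path.join(workdir, "custom_db", "db_infos"))
with open(os.path.join(workdir, "standard_db", "array_size.tsv"), "w") as handle:
    handle.write("array_1\t12\n")

previous_dir = os.getcwd()
os.chdir(workdir)
try:
    relative_source = os.path.join("standard_db", "array_size.tsv")

    # Relative target: resolved from custom_db/db_infos/, where it does not exist.
    broken = os.path.join("custom_db", "db_infos", "broken.tsv")
    os.symlink(relative_source, broken)
    broken_ok = os.path.exists(broken)

    # Absolute target: resolves correctly from any directory.
    fixed = os.path.join("custom_db", "db_infos", "fixed.tsv")
    link_db_file(relative_source, fixed)
    fixed_ok = os.path.exists(fixed)
finally:
    os.chdir(previous_dir)
    shutil.rmtree(workdir)

print("relative link usable:", broken_ok)
print("absolute link usable:", fixed_ok)
```

    On the command line, the equivalent precaution is simply to pass the full path of the database directory rather than something like `./Test_db/iPHoP_db_custom`.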

  6. Zhichao Zhou reporter

    Hi Simon,

    I have some new reports about this issue.

    Previously, I used 98 MAGs as the input custom MAGs to build the custom db. The “All_additional_spacers.nr.clean.fasta” file in the “Tmp_CRISPRs” folder had a size of 0, so I thought that these 98 MAGs might contain few or no spacers.

    But now I have used thousands of MAGs from highly diverse environments (wetland soil), and the result is the same: the files in the “Tmp_CRISPRs” folder have a size of 0. I am wondering whether there is a mistake in your script when it extracts spacers from custom MAGs and builds the new CRISPR blast db.

    Maybe it is just a mistake on my side, and I hope so. But I still recommend checking this.

    I see you have “Wish_extra”, “php_models_extra”, and “rewish_models_extra” in your custom db folder, so I think the extra MAG information was added for those host-prediction methods. I also see prediction results from “iPHoP-RF” with the custom db, which means that the RF part works well.

    PS: I used the v1.3.3 and the most updated standard db.

    Best,

    Chao

  7. Zhichao Zhou reporter

    Hi Simon,

    Many thanks for your fast reply! We have always benefited from your contributions to viromics and tools.

    I have uploaded all MAGs to the folder. If you need the assembly and reads, here is the link: /storage1/data11/ViWrap/

  8. Zhichao Zhou reporter

    By the way, “db/Tmp_CRISPRs/” contains 0-size files after running custom db building:

  9. Simon Roux repo owner

    Hi Zhichao,

    I was able to confirm that there is nothing wrong with your MAGs, as I got some CRISPR predicted from the full set (and from the smaller set, I got 0 spacers, but I still got output files as opposed to just the two fasta files in Tmp_CRISPRs).

    Could you try running just this CRISPR prediction part, i.e.:

    $ mkdir Test_CRISPRs
    $ python /path-to-your-conda-install-dir/.conda/envs/iphop/lib/python3.8/site-packages/iphop/utils/CRISPR/identify_crispr.folder.py -i Guaymas_bins/ -o Test_CRISPRs/
    

  10. Zhichao Zhou reporter

    Hi Simon,

    I get good results like this:

    This is the script that I have used:

    # Within the conda environment: /storage1/data11/yml_environments/ViWrap-iPHoP
    
    mkdir Test_CRISPRs
    python /storage1/data11/yml_environments/ViWrap-iPHoP/lib/python3.8/site-packages/iphop/utils/CRISPR/identify_crispr.folder.py -i Guaymas_bins/ -o Test_CRISPRs/
    

  11. Simon Roux repo owner

    Interesting, so that part works. Can you re-run the “iphop add_to_db” and capture the stdout and stderr in a file ? The command line you mention above is the one that should be run by add_to_db, so I’m not sure why it would work here but not as part of the larger pipeline (maybe a difference in the conda environment ?)
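
    One way to capture both streams in a single file, sketched in Python with `subprocess.run`. The helper `run_and_log` and the log file name are hypothetical, and the command shown is a stand-in so the sketch runs anywhere; it would be replaced by the real “iphop add_to_db” call with its own arguments.

```python
import os
import subprocess
import sys
import tempfile


def run_and_log(command, log_path):
    """Run command and write its stdout and stderr, merged, into log_path."""
    with open(log_path, "w") as log:
        # stderr=subprocess.STDOUT sends error messages to the same file as stdout.
        result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


# Stand-in command (not an iPHoP command): prints one line to each stream.
demo_command = [
    sys.executable,
    "-c",
    "import sys; print('to stdout'); print('to stderr', file=sys.stderr)",
]
log_file = os.path.join(tempfile.gettempdir(), "add_to_db_demo.log")
exit_code = run_and_log(demo_command, log_file)

with open(log_file) as handle:
    log_text = handle.read()
print("exit code:", exit_code)
print(log_text)
```

    The resulting file can then be attached to the issue as-is, and the exit code shows whether the run failed.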

  12. Zhichao Zhou reporter

    Hi Simon,

    Here is the result. It seems that something is wrong with BLAST DB building:

  13. Simon Roux repo owner

    That is so weird. Let me try the full add_to_db on my side. In the meantime, if you can re-run the same command (“iphop add_to_db …”) with the option “--debug”, iphop will be (much) more talkative, but there may be some interesting information as to why it’s encountering issues

  14. Zhichao Zhou reporter

    I think I found the problem. I had set the iPHoP db path to “/storage1/data11/ViWrap/ViWrap_db/iPHoP_db/iPHoP_db” (two nested iPHoP_db folders; I thought it would be OK, but it seems not). Running standard iPHoP with the default db works, but running it with the custom db causes problems. I have changed this and am testing now. So it was entirely my own mistake. Many thanks!

  15. Simon Roux repo owner

    Oh interesting, I can confirm that on my side the “add_to_db” step with all the bins seemed to work fine, so hopefully this was it and it works for you moving forward !

  16. Zhichao Zhou reporter

    Hi Simon,

    I found some additional issues on my side:

    It seems that these two steps do not work: the output “Tmp_CRISPRs” folder was not generated.

    I ran this under the iPHoP conda env (v1.3.3).

  17. Simon Roux repo owner

    Right, it seems like these steps work when you call the command line directly though, right ? i.e. if you stay in the same conda env, and you copy paste the first line, you get data in “Tmp_CRISPRs” ? If so, that may be something to do with the “subprocess” module we use to run these from the main iPHoP python script

  18. Zhichao Zhou reporter

    Hi Simon, I copied the first line and ran it within the same conda env, but the “Tmp_CRISPRs” folder was not generated.
    I even tried a random folder elsewhere containing several fasta files, but I still got no meaningful results.

    I am not sure if this is just my case or a general one

  19. Simon Roux repo owner

    Do you get any output when running this first line, e.g. “Processing …” (the line should have more text after “Processing”)? If not, can you try again after adding a “/” after “Guaymas_bins”? (I just realized this script does not use os.path, so I’m wondering if it’s something like this :-( )

  20. Zhichao Zhou reporter

    Yes, that may be the trick. After adding '/', it seems to be running. I will also use “/storage1/data11/ViWrap/tmp_run_mangrove_MAGs/MAGs/” as the input MAG folder to run another test.

  21. Simon Roux repo owner

    That would be an easy fix at least :-) Let me know once it finishes if indeed you get the expected results, and I will make sure this is fixed in the next version (should also be easy to fix by simply using os.path.join). Thx !
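
    A short Python sketch of the kind of bug discussed here. It is an assumed illustration, not the actual code of identify_crispr.folder.py: the two helper functions and the file name `bin_1.fna` are hypothetical, and the behaviour shown is for POSIX-style paths. Building a path by string concatenation only works if the folder already ends with “/”, whereas `os.path.join` adds the separator only when it is missing.

```python
import os


def genome_path_concat(folder, filename):
    # Plain concatenation: correct only if folder already ends with "/".
    return folder + filename


def genome_path_join(folder, filename):
    # os.path.join inserts the separator only when it is missing.
    return os.path.join(folder, filename)


for folder in ("Guaymas_bins", "Guaymas_bins/"):
    print(
        repr(folder),
        "concat:", genome_path_concat(folder, "bin_1.fna"),
        "join:", genome_path_join(folder, "bin_1.fna"),
    )
```

    Without the trailing slash, concatenation yields `Guaymas_binsbin_1.fna`, a path that does not exist, which would be consistent with no genome being processed and no spacers being found. With `os.path.join`, both forms of the folder name give `Guaymas_bins/bin_1.fna`.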

  22. James Kosmopoulos

    Hi Chao and Simon,

    Just letting you know that I also ran into this issue, and adding a trailing forward slash / to the --fna_dir as you both suggested solved it. The resulting database contained CRISPRs from my custom MAGs, as desired.

    Take care,

    James
