Error downloading the iPHoP host database

Issue #90 closed
Jinghong Xu created an issue

Hi developer:

I tried two ways to download the large iPHoP host database to our server, i.e. iphop download --db_dir path_to_iPHoP_db or wget https://portal.nersc.gov/cfs/m342/iphop/db/iPHoP.latest_rw.tar.gz, but both failed.

The end of the wget output looked like this:

21071450K .......... .......... .......... .......... .......... 11% 10.8M 8h9m
21071500K .......... .......... .......... .......... .......... 11% 10.7M 8h9m
21071550K .......... .......... .......... .......... .......... 11% 11.3M 8h9m
21071600K .......... .......... .......... .......... .......... 11% 10.4M 8h9m
21071650K .......... .......... .......... .......... .......... 11% 11.3M 8h9m
21071700K .......... .......... .......... .......... .......... 11% 10.4M 8h9m
21071750K .......... 11% 18.1M=3m18s

2024-02-25 00:52:37 (5.22 MB/s) - Connection closed at byte 21577482770. Giving up.

I guess the database is so big that the connection is closed during the download? I tried several times and it always stops at 11%.

I don’t know why this happens or how to solve it.

Another question: if I use iPHoP to predict hosts, is it still necessary to also predict hosts with VirHostMatcher and a CRISPR spacer BLAST (with spacers identified by the MinCED tool)? I am not sure whether they are algorithmically redundant, since many papers predict hosts using more than one tool.

Thanks!

Comments (7)

  1. Simon Roux repo owner

    Hi !

    Unfortunately, the database is very large, and on some connections it cannot be downloaded in one step. You can try the parameter “--split” in “iphop download”: the database is then downloaded in chunks of 10Gb, which may be sufficient. If this does not work, my recommendation would be to use a download manager like aria2c (https://aria2.github.io/manual/en/html/aria2c.html), and ask it to download the file https://portal.nersc.gov/cfs/m342/iphop/db/iPHoP.latest_rw.tar.gz. Hopefully it would handle the closed connections better, and let you download the whole database eventually.

  2. Simon Roux repo owner

    As for the use of other tools, iPHoP is designed to run a series of tools (which include VirHostMatcher and CRISPR spacer blast) and then provide a single consensus host prediction. So in my opinion you do not need to run another tool, unless you have specific signal and/or host genomes you want to look into.

  3. Jinghong Xu reporter

    Thank you so much for your help! I will try to use '--split' to download the large database.

  4. Simon Roux repo owner

    I got an email saying there was another message in this issue, but I can’t see it here. So just checking: was there an issue on Bitbucket, or was this message about something you fixed in the meantime before deleting the message? Thanks!

  5. Jinghong Xu reporter

    Yes, I’m sorry, I was about to ask something related and then I understood what was wrong. Thank you!😁
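
For anyone who hits the same "Connection closed ... Giving up" error: the advice in comment 1 amounts to using a download that can resume where it stopped. The sketch below is one possible way to do that, not something taken from iPHoP itself. The `retry` helper and the attempt and wait values are hypothetical and only for illustration. It assumes the server honors resume (range) requests, and it relies on the `-c` (continue) option that both wget and aria2c provide. The `--split` line is the option mentioned in comment 1, with the same placeholder path used in the original post.

```shell
#!/bin/sh
# Hypothetical helper (not part of iPHoP): rerun a command until it succeeds.
# Usage: retry MAX_ATTEMPTS WAIT_SECONDS COMMAND [ARGS...]
retry() {
  retry_max="$1"
  retry_wait="$2"
  shift 2
  retry_n=1
  # "$@" is now the command itself; loop until it exits with status 0.
  until "$@"; do
    if [ "$retry_n" -ge "$retry_max" ]; then
      echo "giving up after $retry_n attempts: $*" >&2
      return 1
    fi
    retry_n=$((retry_n + 1))
    sleep "$retry_wait"
  done
}

DB_URL="https://portal.nersc.gov/cfs/m342/iphop/db/iPHoP.latest_rw.tar.gz"

# Uncomment ONE of the lines below. The numbers (50 attempts, 30 s pause)
# are arbitrary illustrative values.
# With -c, each new attempt continues the partial file already on disk
# instead of starting again from byte 0.
# retry 50 30 wget -c "$DB_URL"
# retry 50 30 aria2c -c "$DB_URL"

# Alternative from comment 1: let iPHoP fetch the database in chunks.
# iphop download --db_dir path_to_iPHoP_db --split
```

Because the download lines are commented out, running the script as-is only defines the helper; uncomment the variant you want and run it from the directory where the partial file should live.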
