WIsH Issues with pipeline

Issue #99 closed
Christina Renee Rathwell created an issue

Hello there,

Thanks for integrating these tools! I’m excited to give it a try. (I’m running on a Linux instance with 32 vCPUs and 128 GiB RAM.)

I have a set of viruses from a single sample I’m testing against the provided database. Some are concatenated vMAGs and some are contigs.

Here’s my (failing) command:

iphop predict --fa_file All_vOTUs_017.fa --db_dir iphop/DB/Aug_2023_pub_rw/ --out_dir iphop/2024_017_vs_Aug_2023_DB --num_threads 30

So the pipeline just stalls at step:

[3/1/Run] Running (recoded)WIsH...

There is no error message, and the step never stops “running”, even though I can see there is no longer any load on my CPU. Nothing is written to the wish.log file or to the wish.cmd file, and each retry leaves a different number of Batch.results files in the wish_results directory.

I’ve tried:

rerunning without doing anything

rerunning with the --single_thread_wish flag

rerunning after deleting all WIsH output first (not necessary; it looks like the output is deleted on each run)

letting it “run” for hours (no new Batch.results appear)

Without an error message, I can’t quite figure out what else to try. I suppose I could run WIsH outside iPHoP and restart the pipeline downstream? Which result files would be needed, and would any parsing be required for iPHoP to proceed automatically to the next step?

I was wondering whether you’d heard of this happening before. I read all the issues mentioning WIsH and didn’t see anything quite like it.

Thanks!


Comments

  1. Christina Renee Rathwell reporter

    EDIT: I had read in the issues that datasets of around 2,000-3,000 sequences should be okay, but I split this dataset of 2,200 into batches of 550 anyway, and it worked. I just ignored it when it looked like it had stalled, and it went through.

    I still wonder whether there is a way to skip any of the 5 integrated steps and make predictions based on a subset of the tools. Thanks!

  2. Simon Roux repo owner

    Hi! Yes, WIsH can take a while, especially if your sequences are full-size genomes (the rule of thumb of “~2000-3000” sequences is OK, but it will depend on how large your input sequences are).

    Re: skipping some of the tools, we have not implemented this option yet, mostly because we would need to train an additional classifier for each combination of tools (in our benchmarks, the general classifier does not perform well when the entire set of results from one of the 5 tools is missing). So unfortunately, for now you do have to run all 5.

    Best,

    Simon
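The workaround that resolved this thread was splitting the ~2,200-sequence input into smaller batches (550 sequences each) and running iPHoP on each batch. Below is a minimal Python 3.9+ sketch of that splitting step, assuming a plain nucleotide FASTA input. The function names and the batch_NNN.fa file naming are hypothetical and not part of iPHoP. Since the reply above notes that a safe batch size depends on sequence length, the sketch also includes an optional length-based splitter; the base-pair budget you pass to it is your own choice, not a value recommended by the iPHoP authors.

```python
from pathlib import Path
from typing import Iterable, Iterator

Record = tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)  # multi-line sequences are concatenated
    if header is not None:
        yield header, "".join(parts)


def batch_by_count(records: Iterable[Record], max_seqs: int) -> Iterator[list[Record]]:
    """Split records into batches of at most max_seqs sequences each."""
    batch: list[Record] = []
    for rec in records:
        batch.append(rec)
        if len(batch) == max_seqs:
            yield batch
            batch = []
    if batch:
        yield batch


def batch_by_length(records: Iterable[Record], max_bases: int) -> Iterator[list[Record]]:
    """Split records so each batch holds at most max_bases nucleotides in total.
    A single sequence longer than max_bases is placed in a batch of its own."""
    batch: list[Record] = []
    total = 0
    for header, seq in records:
        if batch and total + len(seq) > max_bases:
            yield batch
            batch, total = [], 0
        batch.append((header, seq))
        total += len(seq)
    if batch:
        yield batch


def write_batches(batches: Iterable[list[Record]], out_dir: str) -> list[Path]:
    """Write each batch to out_dir/batch_NNN.fa (hypothetical naming) and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, batch in enumerate(batches, start=1):
        path = out / f"batch_{i:03d}.fa"
        path.write_text("".join(f">{h}\n{s}\n" for h, s in batch))
        paths.append(path)
    return paths
```

For example, `write_batches(batch_by_count(read_fasta(open("All_vOTUs_017.fa")), 550), "batches")` would reproduce the 550-sequence split from the thread. Each resulting file can then be passed to `iphop predict` as `--fa_file`, with its own `--out_dir` and the same `--db_dir` and `--num_threads` as in the original command. Combining the per-batch predictions afterwards is not covered by this sketch.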
