Bioconda and Git Clone Install difficulty

Issue #38 closed
Stefanie Huttelmaier created an issue

I’m trying to install iPHoP on my institution’s HPCC and hitting a couple of roadblocks. When I try the Bioconda route with conda install, I get into a long loop of failing to solve the environment and starting over.

Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: -

And when I use mamba install, I get this error:

Encountered problems while solving:
  - package iphop-1.0.0-pyhdfd78af_0 requires perl-bioperl 1.6.924.*, but none of the providers can be installed

When I took the git clone route, I encountered the known issue with RaFAH and was able to fix it using the suggested method. Now I’m getting an error from what I think is TensorFlow. I was initially trying to install this in a public directory, using --prefix when creating the conda environment, so my whole lab would have access to it. I thought the issue might be that TensorFlow was using the wrong path, but when I tried reinstalling in my home directory I got the same error.

[8/1.2] Run blast classifier Model_blast_Conv-87 (by batch)..
Traceback (most recent call last):
  File "/projects/b1180/software/conda_envs/iphop/bin/iphop", line 8, in <module>
    sys.exit(cli())
  File "/projects/b1180/software/conda_envs/iphop/lib/python3.8/site-packages/iphop/iphop.py", line 128, in cli
    args["func"](args)
  File "/projects/b1180/software/conda_envs/iphop/lib/python3.8/site-packages/iphop/modules/master_predict.py", line 106, in main
    runmodels.run_individual_models(args)
  File "/projects/b1180/software/conda_envs/iphop/lib/python3.8/site-packages/iphop/modules/runmodels.py", line 58, in run_individual_models
    full_predicted = run_single_classifier(classifier,tensors,args)
  File "/projects/b1180/software/conda_envs/iphop/lib/python3.8/site-packages/iphop/modules/runmodels.py", line 229, in run_single_classifier
    best_model = keras.models.load_model(h5_file, compile=False)
  File "/projects/b1180/software/conda_envs/iphop/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/projects/b1180/software/conda_envs/iphop/lib/python3.8/site-packages/tensorflow/python/saved_model/loader_impl.py", line 118, in parse_saved_model
    raise IOError(
OSError: SavedModel file does not exist at: /projects/b1180/software/conda_envs/iphop/lib/python3.8/site-packages/iphop/classifiers/Model_blast_Conv-87.h5/{saved_model.pbtxt|saved_model.pb}

Any suggestions?

Comments (3)

  1. Simon Roux repo owner

    Hi Stefanie,

    Sorry you are encountering so many errors. I am not sure why mamba is not working for you (I’m less surprised about conda freezing and being unable to solve environments.. :-) ). The mamba error is especially weird, as it refers to iphop-1.0.0 while the latest version is iphop-1.3.2. Maybe it would be worth trying one more time, “forcing” the latest version, i.e. mamba install -n iphop iphop=1.3.2 ?

    However, in the (likely) case that this does not magically solve the problem, I think we can try a few things for the “SavedModel file does not exist” error. The first thing: can you look into /projects/b1180/software/conda_envs/iphop/lib/python3.8/site-packages/iphop/classifiers/ and check which files are there and what their sizes are? I suspect that these will not exactly match the list of files in https://bitbucket.org/srouxjgi/iphop/src/main/iphop/classifiers/ . The reason is that a number of files in the “classifiers” directory are too big for git, so we use “git lfs” to add them to the repo. What I think may be happening is that some classifier files are not downloaded when you use ‘git clone’, and then iPHoP cannot complete its run (see the sketch below for a quick way to check).
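
    For example, something along these lines should flag the problem (a minimal sketch; the path is the one from your traceback). When the real content was not fetched, git-lfs leaves behind tiny text “pointer” files (only about 130 bytes) in place of the actual models:

    # List the classifier files with human-readable sizes; real models are
    # large, while un-fetched git-lfs pointers are tiny text files
    ls -lh /projects/b1180/software/conda_envs/iphop/lib/python3.8/site-packages/iphop/classifiers/

    # Pointer files contain the lfs spec header, so this prints any file
    # whose content was never actually downloaded
    grep -rl "git-lfs.github.com/spec" /projects/b1180/software/conda_envs/iphop/lib/python3.8/site-packages/iphop/classifiers/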

    If indeed you see that your classifiers folder is incomplete, what would be worth trying is to install git-lfs, possibly through conda (https://anaconda.org/conda-forge/git-lfs/), and rerun the git clone command. Nothing should change except these larger files in “classifiers”, which should now be downloaded. From there, you can retry an iPHoP run (you should be able to reuse the same output folder, and iPHoP will try to pick up at the step where it stopped).
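
    In practice, the sequence could look like this (a sketch, assuming a conda-based install of git-lfs and a fresh clone):

    # Install the git-lfs client (one-time)
    conda install -c conda-forge git-lfs

    # Register the lfs filters with git (one-time, per user)
    git lfs install

    # Re-clone; with git-lfs active, the large classifier files are
    # fetched automatically during the clone
    git clone https://bitbucket.org/srouxjgi/iphop.git

    # Alternatively, to fix up an existing clone in place:
    git lfs pull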

    Let me know if any of these solves the issues!

    Best,

    Simon

  2. Stefanie Huttelmaier reporter

    Hi Simon,

    Thanks so much for your response! I’ve tried a couple of things and was able to get iPHoP running. All classifiers were present in my original install, which was one of the reasons I initially thought the --prefix flag was causing confusion with the path. I tried git lfs anyway just to see what would happen, and that broke the environment, so I ended up starting over with a fresh install.

    First, I tried to force mamba to the latest iphop version as you suggested, and of course this was not the magic fix:

    Looking for: ['iphop=1.3.2']
    
    bioconda/linux-64        [====================] (00m:00s) No change
    bioconda/noarch          [====================] (00m:00s) No change
    biobakery/linux-64       [====================] (00m:00s) No change
    biobakery/noarch         [====================] (00m:00s) No change
    ursky/linux-64           [====================] (00m:00s) No change
    ursky/noarch             [====================] (00m:00s) No change
    pkgs/main/noarch         [====================] (00m:00s) No change
    pkgs/main/linux-64       [====================] (00m:00s) No change
    pkgs/r/linux-64          [====================] (00m:00s) No change
    pkgs/r/noarch            [====================] (00m:00s) No change
    conda-forge/noarch       [====================] (00m:02s) Done
    conda-forge/linux-64     [====================] (00m:07s) Done
    
    Pinned packages:
      - python 3.8.*
    
    
    Encountered problems while solving:
      - nothing provides __cuda needed by tensorflow-2.7.0-cuda102py310hcf4adbc_0
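
    A side note for anyone who hits the same error: as I understand it, __cuda is one of conda’s “virtual” packages, detected from the machine’s GPU driver, so a GPU-less login node does not provide it. Conda and mamba let you override that detection with an environment variable, so a sketch like the following might let the solve proceed (the CUDA version is just an illustrative value):

    # Pretend a CUDA 10.2 driver is present so that packages needing __cuda
    # become installable on a node without a GPU (version is an example)
    CONDA_OVERRIDE_CUDA="10.2" mamba install -n iphop iphop=1.3.2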
    

    Next, I tried to install from the .yaml. At first, it was giving me unsolvable errors similar to the ones I was encountering with the Bioconda route. One of my lab mates was having a similar issue installing another tool, getting unsolvable errors for environments that should be solvable. She fixed it by removing extra channels from her .yml, and that did the trick for me as well.

    The fix:

    https://stackoverflow.com/questions/75065878/installing-jupyter-error-nothing-provides-openssl-1-1-1-1-1-2-0a0-needed-by
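
    Beyond trimming the channels: section of the .yml itself, it may also be worth checking which channels are configured globally in ~/.condarc, since those get merged into every solve. A minimal sketch using standard conda commands (the channel names are just the extras from my own setup):

    # Show every channel conda/mamba will consult during a solve
    conda config --show channels

    # Remove extra channels that can derail the solver (example names)
    conda config --remove channels biobakery
    conda config --remove channels ursky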

    The environment:

    mamba env create -f iphop_environment.yml --prefix /projects/b1180/software/conda_envs/iphop/
    

    The .yaml:

    name: iphop
    channels:
      - bioconda
    #  - conda-forge
    #  - biocore
    #  - defaults
    dependencies:
      - blast=2.12
      - python=3.8
      - biopython=1.79
      - pandas=1.3
      - perl<6
      - git-lfs=3.1
      - hmmer=3.3.2
      - perl-bioperl<=1.7
      - click=8.0
      - prodigal=2.6
      - r-base=4.0
      - r-ranger=0.13
      - diamond=2.0
      - crisper_recognition_tool=1.2
      - piler-cr=1.06
      - joblib=1.0.1
      - scikit-learn=0.22.0
      - pip=21.2
      - numpy=1.23
      - pip:
        - keras==2.7.0
    

    After activation, git clone installed all classifiers without issue. I still got the known issue with RaFAH, so I followed the bioperl update instructions. What’s strange is that once I relaunched the environment, deleted the whole test output folder, and reran the install test, I get all the CDS and Full Genome prediction outputs from RaFAH in rafah_out, but the rafahparsed.csv is empty, so iPHoP doesn’t report anything from RaFAH on completion. This is what my rafah.log looks like:

    Running host prediction mode
    Indexing sequences from iphop_test_results/test_input_phages_iphop/Wdir/split_input/
    Processing AJ421943.1.fasta
    Processing CP017905.1.fasta
    Processing IMGVR_UViG_3300013274_000001.fasta
    Processing IMGVR_UViG_3300013456_000001.fasta
    Processing MT657335.1.fasta
    Processed 5 Genomic Sequences
    Running Prodigal
    Indexing sequences from iphop_test_results/test_input_phages_iphop/Wdir/rafah_out/Full_CDS_Prediction.faa
    Running hmmsearch. Query: iphop_test_results/test_input_phages_iphop/Wdir/rafah_out/Full_CDS_Prediction.faa DB: iphop_db/Test_db_rw/db/rafah_data/HP_Ranger_Model_3_Filtered_0.9_Valids.hmm
    Obtained 43644 ids from iphop_db/Test_db_rw/db/rafah_data/HP_Ranger_Model_3_Valid_Cols.txt
    Parsing iphop_test_results/test_input_phages_iphop/Wdir/rafah_out/Full_CDSxClusters_Prediction
    Detected 653 OGs across 5 genomic sequences
    Performing host prediction
    [1] "Loading Model from  iphop_db/Test_db_rw/db/rafah_data/MMSeqs_Clusters_Ranger_Model_1+2+3_Clean.RData"
    No such file or directory at /projects/b1180/software/conda_envs/iphop/lib/python3.8/site-packages/iphop/utils/RaFAH_v0.3.pl line 313.
    Parsing output of host prediction iphop_test_results/test_input_phages_iphop/Wdir/rafah_out/Full_Host_Predictions.tsv
    

    I’m unsure if this is the same issue that has already been reported, but I thought it might be helpful to document. iPHoP does finish for me now, it just doesn’t include the RaFAH results. I’m currently running my own data and looking forward to some results!! Please let me know if there is anything else I can share or test that would be of use to you.

  3. Simon Roux repo owner

    Hi Stefanie,

    Thanks for the update, and glad that you have iPHoP (nearly) working! The RaFAH error seems consistent with a known issue: the R part of RaFAH can easily run out of memory, and when it does, the file “Full_Host_Predictions.tsv” is not generated. One way to check this would be to run the same iPHoP calculation on a node with more memory, if one is available. Unfortunately, memory usage can be an issue even with 5 input sequences (this is something we are looking at and hope to fix eventually by recoding this part of RaFAH).
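
    If your cluster uses a scheduler such as SLURM (an assumption on my part; adjust for your site), requesting a roomier allocation for the rerun could look roughly like this:

    # Ask for an interactive shell with a generous memory limit, then rerun
    # the same iPHoP command inside it (64G is an arbitrary example)
    srun --mem=64G --cpus-per-task=8 --pty bash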

    The conda channels issue in the yaml install is not something I had encountered before, but it’s good to know this workaround! I’ll add it to the README as soon as I can, to help others who may be faced with a similar problem.

    Thanks!

    Best,

    Simon
